
Spark to Snowpark Pre-Migration Checklist

Author: Anant Kashyap


What is Pre-migration?

Pre-migration refers to the activities performed before an application or system is migrated from one environment to another. Completing them before the migration starts helps ensure that the migrated application or system functions as expected, and it surfaces issues early so they can be resolved before they disrupt the migration. For a PySpark to Snowpark migration, these activities include setting up a PySpark/Databricks account, creating and configuring a cluster, importing the required Python libraries, setting up the Snowpark environment, and creating a Snowflake account. Together they help the migrated system operate smoothly and deliver the expected business benefits.


Here are the steps to perform before migrating workloads from PySpark to Snowpark:


  • Setting up a PySpark/Databricks account

  • Cluster creation

  • Importing PySpark libraries

  • Environment set-up for Snowpark

  • Snowflake account


PySpark/Databricks account set-up

Create a trial account on Databricks Community Edition so that you can execute PySpark code and perform transformations on data. Sign up for Databricks Community Edition.


Cluster creation

To run PySpark code you need a cluster, which is the engine that processes the code. Once the account has been created, the next step is to create a cluster.


Importing PySpark libraries

First, import SparkSession from pyspark.sql to create a session.

You can also import individual functions from pyspark.sql.functions as required.
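As a minimal sketch of the imports described above, the snippet below creates a session and applies one function from pyspark.sql.functions. The helper names (create_spark_session, double_values), the application name, and the sample data are hypothetical and only for illustration. The imports sit inside the functions so that the file still loads on a machine without PySpark; in a notebook they would normally go at the top of a cell.

```python
def create_spark_session(app_name="pre_migration_check"):
    """Return a SparkSession, reusing the active one if it already exists."""
    # Assumption: PySpark is installed (it is preinstalled on Databricks clusters).
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName(app_name).getOrCreate()


def double_values(spark):
    """Build a tiny DataFrame and add a column using pyspark.sql.functions."""
    from pyspark.sql import functions as F

    # Hypothetical sample data, used only to exercise the imports.
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])
    return df.withColumn("doubled", F.col("value") * 2)


# Usage (in a Databricks notebook cell or with a local PySpark install):
#   spark = create_spark_session()
#   double_values(spark).show()
```

On Databricks a session named spark is already available in every notebook, so getOrCreate() simply returns it.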


Environment set-up for Snowpark

1. Download Miniconda and install it.


2. Open the terminal or command prompt and activate the base environment by running: conda activate base


3. Create the environment by running: conda create --name snowpark_migration -c https://repo.anaconda.com/pkgs/snowflake python=3.8


4. Activate the conda environment by running: conda activate snowpark_migration


5. Install Snowpark for Python, pandas, and scikit-learn by running: conda install -c https://repo.anaconda.com/pkgs/snowflake snowflake-snowpark-python pandas scikit-learn


6. Install Jupyter Notebook by running: conda install jupyter


7. Register the environment as a Jupyter kernel by running: ipython kernel install --name "snowpark_migration"


8. Open Jupyter Notebook by running: jupyter notebook
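Once the steps above are done, it can be useful to confirm from inside the snowpark_migration kernel that the packages are actually importable. The following is a rough sketch using only the standard library; the helper names and the list of module names are assumptions (for example, scikit-learn is imported as sklearn, and ipykernel is assumed to come with the jupyter package), so adjust them to your set-up.

```python
import importlib.util
import sys

# Assumed import names for the packages installed in steps 5 and 6.
REQUIRED_MODULES = ("snowflake.snowpark", "pandas", "sklearn", "ipykernel")


def module_available(name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "snowflake") is itself missing.
        return False


def check_environment(modules=REQUIRED_MODULES):
    """Map each module name to whether it is available in this interpreter."""
    return {name: module_available(name) for name in modules}


if __name__ == "__main__":
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    for name, found in check_environment().items():
        print(f"{name}: {'OK' if found else 'MISSING'}")
```

If any entry is reported as MISSING, the notebook is probably running on a different kernel than the one registered in step 7.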



Snowflake account

Create a Snowflake trial account, then connect to it from the Jupyter notebook using your Snowflake credentials so that you can process the data stored in Snowflake tables.
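One possible way to make that connection is sketched below. It reads the credentials from environment variables rather than hard-coding them in the notebook. The SNOWFLAKE_ prefix, the helper names, and the choice of which keys are required are assumptions made for this example; Session.builder.configs(...).create() is the Snowpark for Python call that opens the session.

```python
import os

# Assumed split between mandatory and optional connection settings.
REQUIRED_KEYS = ("account", "user", "password")
OPTIONAL_KEYS = ("role", "warehouse", "database", "schema")


def connection_parameters_from_env(env=os.environ, prefix="SNOWFLAKE_"):
    """Collect Snowflake connection parameters from environment variables."""
    params = {}
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        value = env.get(prefix + key.upper())
        if value:
            params[key] = value
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise KeyError(f"Missing environment variables for: {', '.join(missing)}")
    return params


def create_snowpark_session(params):
    """Open a Snowpark session from a dictionary of connection parameters."""
    # Imported here so the helper above works even without Snowpark installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(params).create()


# Usage (inside the notebook, after exporting SNOWFLAKE_ACCOUNT and so on):
#   session = create_snowpark_session(connection_parameters_from_env())
#   session.sql("select current_version()").show()
#   session.close()
```

Keeping the password out of the notebook file makes it safer to share or commit the notebook later in the migration.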


Conclusion

In this blog, we covered the pre-migration steps to perform before a PySpark to Snowpark migration.

