Author: Amit Khandelwal
Snowpark Features Continue to Revolutionize the Data Landscape
Snowflake continues to enhance its unified platform by introducing new features and capabilities. At the Snowflake Summit in June 2023, exciting updates were announced for Snowpark and DevOps. These advancements will make it easier for data engineers to migrate ELT/ETL workloads from Spark, let data scientists build and deploy ML models natively, and help data developers build applications using Snowpark.
Before going further, I am excited to share that Kipi.bi has been awarded The Americas System Integrator Growth Partner of the Year Award for excellence in helping joint customers build future-proof data solutions at scale.
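Recordings of all the Snowflake features launched during the Summit are linked at the end of this post. Here is a summary of the new Snowpark features Snowflake announced during the Summit. Each feature is labeled with its release status: PrPr (Private Preview), PuPr (Public Preview), or GA (Generally Available).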
Support for Python 3.9 and 3.10 (PuPr)
Snowpark users will be able to update to newer Python versions in order to take advantage of Python enhancements and compatible third-party packages. With this, users will also be able to take advantage of new Python libraries using Anaconda Integration inside Snowflake.
For more information, refer to the Docs on Support for Python 3.9 and 3.10.
Snowpark Container Services (PrPr)
The show-stopper of Snowflake Summit 2023…
Snowpark Container Services empowers users to operate any workload or application in Snowflake, including full-stack applications, front-end web applications, secure hosting of LLMs, robust model training, and more, all within the secure environment of Snowflake. As a fully managed container offering, Snowpark Container Services (SPCS) facilitates the easy registration, deployment, management, and scaling of containerized services, jobs, and functions.
Users have the freedom to build a container image of their application in any language or framework they prefer (e.g. Python, R, C/C++, React), and deploy and execute it using configurable hardware options, including CPUs and GPUs.
With Snowpark Container Services, users can run GPU-enabled machine learning and AI workloads without the need to move data to an external compute infrastructure.
For more information, refer to the Docs on Snowpark Container Services.
Snowpark ML APIs
Snowpark ML offers a comprehensive toolkit, inclusive of SDKs and foundational infrastructure, designed for the creation and deployment of machine learning models. With Snowpark ML, users have the ability to pre-process data, train, manage, and deploy ML models entirely within the Snowflake environment.
Snowpark ML consists of APIs that cater to the complete Machine Learning Development and deployment cycle. It primarily consists of two components:
ML Modeling API or Snowpark ML Development API (PuPr)
Built atop well-known ML libraries like scikit-learn, xgboost, and lightgbm, these APIs enable users to conduct feature engineering and train ML models directly in Snowflake using Snowpark DataFrames. They leverage the computational resources of Snowpark-optimized Warehouses for scalable data transformations.
ML Operations API (PrPr)
The ML Operations API, or Snowpark ML Ops, enhances the functionality of the Snowpark ML Development API. It offers model management capabilities and facilitates integrated deployment for inference directly into Snowflake via the Snowflake Model Registry.
For more information, refer to the Docs on Snowpark ML.
ML-Powered Functions (PuPr)
Experience the power of ML-powered functions, a unique blend of SQL and Machine Learning that simplifies the complexities of ML frameworks. These functions utilize ML to assist analysts in making faster, more accurate decisions without the need to build a full ML Pipeline.
Benefit from three native ML-powered functions designed for Forecasting, Anomaly Detection, and Contribution Explorer. Invoke these functions directly from SQL. They're user-friendly, robust, deliver insights quickly, require no complex infrastructure, and scale effortlessly.
Unstructured Data processing (PuPr)
Users can use UDFs, UDTFs, and Stored Procedures written in Java, Python, or Scala to securely extract insights from unstructured files such as documents, images, audio, video, emails, and industry-specific formats. The files can be read from internal or external stages or from on-premise storage, all within Snowflake with Snowpark.
For detailed information, refer to the docs on Processing Files and Unstructured Data with Snowpark for Python, Quickstart on getting started with Unstructured Data, and Introduction to Unstructured Data Support.
External Network Access (PrPr)
Enables users to connect to external endpoints from their Snowpark code, such as UDFs, UDTFs, and Stored Procedures, while upholding stringent security and governance policies. Users can establish a network rule that specifies the external network's location and access restrictions.
NOTE: This feature is currently available only in specific AWS regions.
For detailed information, refer to the External Network Access Docs.
Python Package Policies (PrPr)
A packages policy enables you to set allowlists and blocklists for third-party Python packages from Anaconda at the account level. This gives you more fine-grained control over which packages are available or blocked in your environment.
For detailed information, refer to the Docs on Package Policies.
Snowpark Local Testing (PrPr)
Local testing helps reduce infrastructure costs for development…
The Snowpark Local Testing Feature allows users to establish a Snowpark session and DataFrames without needing a live Snowflake connection. This feature not only accelerates Snowpark development and testing but also conserves credits by using a local Session. Users can effortlessly transition to a live connection without making any code modifications.
For detailed information, refer to the video on Snowpark Local Testing
Vectorized UDTFs (PuPr soon)
Vectorized UDTFs allow users to define Python table functions that take a partition as a Pandas DataFrame and return results as either a Pandas DataFrame or arrays/series. This feature facilitates seamless partition-by-partition processing, a faster alternative to the row-by-row processing of scalar UDTFs. It enhances performance in various use cases, such as distributed training of multiple models, time series analysis, forecasting, and model inference with multiple outputs.
For detailed information, refer to the Docs on Vectorized Python UDTF.
Python UDAFs: User-Defined Aggregate Functions (PrPr)
Python UDAFs provide a way for you to write your own aggregate functions that are similar to the Snowflake system-defined SQL Aggregate Functions.
An aggregate function acts on one or more input rows and returns a single aggregated value. The operation can be mathematical, such as a sum, average, count, minimum, maximum, standard deviation, or estimation, or non-mathematical.
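As a rough illustration of how such an aggregate can be structured, here is a minimal sketch of a handler class that computes an average. It assumes the handler shape described in Snowflake's Python UDAF documentation (an `aggregate_state` property plus `accumulate`, `merge`, and `finish` methods). The class name `AverageHandler` and the driver function `simulate` are hypothetical; `simulate` only mimics, in plain Python, how partial states might be combined, and registering the function in Snowflake is not shown.

```python
class AverageHandler:
    """Hypothetical UDAF-style handler that computes an average."""

    def __init__(self):
        # Running state: total of the values seen and how many were seen.
        self._sum = 0.0
        self._count = 0

    @property
    def aggregate_state(self):
        # Intermediate state that can be passed to merge() on another instance.
        return (self._sum, self._count)

    def accumulate(self, value):
        # Called once per input row; NULL (None) inputs are skipped.
        if value is None:
            return
        self._sum += value
        self._count += 1

    def merge(self, other_state):
        # Combine the partial state produced by another handler instance.
        other_sum, other_count = other_state
        self._sum += other_sum
        self._count += other_count

    def finish(self):
        # Produce the single aggregated value (None when there was no input).
        return self._sum / self._count if self._count else None


def simulate(chunks):
    """Local stand-in for the engine: aggregate each chunk, then merge."""
    final = AverageHandler()
    for chunk in chunks:
        partial = AverageHandler()
        for value in chunk:
            partial.accumulate(value)
        final.merge(partial.aggregate_state)
    return final.finish()
```

Keeping the state as a (sum, count) pair rather than a running average is what makes `merge` straightforward: partial results from separate chunks can be combined in any order.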
For detailed information, refer to the Docs on Python UDAFs.
Table Stored Procs (PuPr)
Table Stored Procs now allow data to be returned in a tabular format. Previously, a stored procedure could return only scalar values; now users can receive results as tables. This enhancement facilitates easier downstream processing within Snowpark code.
For detailed information, refer to the Docs on Table Stored Procs.
Anonymous Stored Procedures (GA)
Create and call an anonymous procedure, which behaves like a stored procedure but is not stored for later use. This is ideal for constructing Snowpark apps/integrations that require Snowpark code execution without persistence. For instance, dbt Python models and Snowflake Python worksheets take advantage of these anonymous procedures.
For detailed information, refer to the Docs on Anonymous Stored Procedures.
Python Tasks API (PrPr Soon)
Provides first-class Python APIs for creating and managing Snowflake Tasks/DAGs.
Logging and Tracing with Event Tables (PuPr)
Logging and Tracing features allow users to capture logs and traces from UDFs, UDTFs, Stored Procedures (including code written using Snowpark APIs), and Snowpark containers. These are seamlessly directed to a secure, customer-owned Event Table. Users can then query and analyze log and trace data in Event Tables for troubleshooting applications or gaining insights into code performance and behavior.
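In Python handlers, log messages can be emitted with the standard `logging` module, which is the approach the Snowflake documentation describes for Python. The sketch below is illustrative only: the logger name `order_pipeline` and the function `clean_amount` are hypothetical, and routing the messages into an Event Table depends on account configuration that is not shown here.

```python
import logging

# Hypothetical logger name; any name can be used to group related messages.
logger = logging.getLogger("order_pipeline")


def clean_amount(raw):
    """Hypothetical UDF-style handler: parse a string amount, logging problems."""
    try:
        amount = float(raw)
    except (TypeError, ValueError):
        # Record the unparseable input at WARNING level and return NULL (None).
        logger.warning("Could not parse amount: %r", raw)
        return None
    logger.info("Parsed amount %s", amount)
    return amount
```

Because the handler uses only the standard library, the same function can be exercised locally with an ordinary logging handler before it is deployed.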
Native Git Integration (PrPr Soon)
Snowflake now offers native git repository integration, enhancing version control, CI/CD workflows, and testing for pipelines, ML models, and applications. This feature lets users securely link a git repo to their Snowflake account, granting access to any branch, tag, or commit. Post-integration, users can effortlessly create UDFs, stored procedures, Streamlit Apps, and more by referencing the repo and branch as they would a stage file.
For detailed information, refer to the video on Git integration for Snowpark
Snowflake CLI (PrPr)
Native Command Line Interface (CLI) for optimized development and testing within Snowflake. It enables users to create, manage, update, and view apps, and to build automation and CI/CD capabilities across app-centric workloads.
Triggered Tasks (PrPr)
Triggered Tasks let users consume data from a Snowflake Stream more efficiently. Previously, a task might not run until as much as a minute after the data arrived; a Triggered Task consumes data as soon as it arrives. This significantly reduces latency, optimizes resource usage, and cuts costs.
The Snowflake Summit delivered technical innovation and business value across roles: data engineers constructing data pipelines, data scientists crafting AI models, machine learning developers creating ML models, testers evaluating data pipelines locally with Snowpark, and DevOps engineers. With the launch of new Snowpark features and other enhancements to the Snowflake environment, users now have a comprehensive platform to create and run end-to-end applications. From data ingestion, through processing with Snowpark and AI/ML modeling, to visualization with Streamlit, everything is possible within Snowflake's data-centric environment.
As an ELITE Partner, we at Kipi.bi have access to some of the recently announced Private Preview features, and we are working hard to bring you first-hand feedback on each of them. So stay tuned!
Recordings of Snowflake features launched in Summit 2023: https://www.snowflake.com/summit/on-demand/agenda/?login=ML