top of page

Snowpark UDFs and UDTfs

Updated: Mar 14

Authors: Iram Ahmed & Maseed Mohammad Ilyas


Image Credit: interworks

Snowpark allows developers to write their own code in their preferred language (Python, Java, Scala) and translates the code into the respective SQL query in Snowflake. Because of this, developers who are not well versed in SQL can write the code in the language they are comfortable with and do complex data transformations easily.


Also, one of the perks of using Snowpark is that the data need not move from the client machine; only the required JAR files will be moved to Snowflake by Snowpark automatically.


Snowpark UDFs

These UDFs are typically used when using jupyter notebooks because there is no need to manage any other python files. It is automatically handled by the notebook, and the transformation happens row by row.


There are two ways in which UDFs can be created with the help of Snowpark.

  1. Anonymous UDFs

  2. Named UDFs


Anonymous UDFs: As the name suggests, this user-defined function does not have a name. It is a temporary UDF that is only valid until the snowpark session is active. We do not need to specify the stage location while making this UDF.


It is useful if there is a one-time use of a particular code.


Named UDFs: A named UDF can be defined as temporary or permanent, depending on the requirement. If the named UDF is permanent, then the stage location must also be mentioned when defining this UDF.


If a block of code needs to be called many times, then Named UDFs can be used.


Snowpark UDFs can be of two types, Scalar UDFs or Vectorized UDFs. The only difference between the two is that the Vectorised UDFs batch data rows into Pandas data frames and process them by using vectorization to boost the performance, while Scalar UDFs are called one per row.


Snowpark UDTFs

A UDTF stands for “User Defined Table Function,” which receives scalar inputs and returns the tabular output. In many scenarios, we might want to return the output of a user-defined function as a table instead of a scalar value. In such cases, UDTFs can be used. Snowpark UDTFs can be created in the same way as a Scalar UDF. The only difference is that a UDTF must have a return type as a table. UDTFs can be scaled out, and we can also specify the partition key according to the requirement.


UDTF can be of two types:


Anonymous UDTFs: As the name suggests, this user-defined function does not have a name. It is a temporary UDTF that is only valid while the snowpark session is active. We do not need to specify the stage location while making this UDTF.


It is useful if there is a one-time use of a particular code.


Named UDTFs: A named udtf can be defined as temporary or permanent, depending on the requirement. If the named udtf is permanent, then the stage location must also be mentioned when defining this UDTF.


If a block of code needs to be called many times, then Named UDTFs can be used.


Note: Stage location must be specified only when creating permanent UDF’s or permanent UDTF’s from outside snowflake worksheets (jupyter notebooks etc.)


When to choose a UDF or UDTF:


Conclusion

This blog covered UDFs and UDTFs in Snowpark. It also talked about the different types of UDFs and UDTFs available in Snowpark and the circumstances under which it is best to use a UDF or a UDTF.


References


78 views0 comments

Recent Posts

See All
bottom of page