Different User Defined Function In Snowflake

Author: Mridul Vij


If you want to do operations that are not directly available by Snowflake as built-in functions, Snowflake provides users with User Defined Functions (UDF) that can be used with different languages that will be explained below to perform various operations within Snowflake, calling a User Defined Function in Snowflake is just like calling a normal function which passes Name and Parameters of the UDF to Snowflake.


Note:

  • To avoid conflicts when calling functions, Snowflake does not allow creating any UDFs with the same name as any of the system-defined functions present in snowflake. But if the same name UDF has a different number of arguments or different types of arguments then UDF Overloading occurs.

  • Scalar UDF i.e UDF that returns a single value for a single row of input has a limit of 500 input arguments. On the other hand, Tabular UDTF has a limit of 500 input arguments and 500 output columns.

  • Body of a UDF can contain only SELECT statements but it cannot contain other DML statements or any DDL statement.

  • It is crucial to remember that UDFs do not now have access to the outside world. This security limitation was voluntarily implemented by Snowflake. It is advised to use external functions rather than creating a UDF if you want to develop a UDF that interacts with the outside world, such as to hit an API using the requests library.


JAVA UDF

In Java UDF, UDF contains Java code in a JAR file (referred to as the handler method) which is called within Snowflake. The output is then returned to Snowflake by the handler method, who subsequently sends it back to the client.

  • This is a preview feature for accounts on GCP.

  • The maximum size supported for a Java UDF output row is 16 MB.

  • For each row passed to a Java UDF, the UDF returns either a scalar (i.e. single) value or, if defined as a table function, a set of rows i.e tabular result.

  • The HANDLER clause identifies the class and method to use as the handler because Java source code might have several classes and many methods within a class.

  • Java UDF that is in-line or pre-compiled can call code in JAR files that are included in the CREATE FUNCTION statement’s IMPORTS clause.

  • Snowflake supports writing Java UDFs in java 11.x

  • Example: largest of two no.


create function larger_no(no1 float, no2 float)

returns integer

language java

handler='AddFunc.add'

target_path='@~/AddFunc.jar'

as

$$

class TestAddFunc {

public static int add(float no1, float no2) {

if(no1>no2)

return no1;

else

return no2;


}

}

$$;

Advantages of Java UDFs
  • We can use already prepared Java code (source or compiled) in UDF.

  • We can use functions that already exist in standard Java libraries.


Disadvantages of Java UDFs
  • Snowflake has some security constraints that disable some capabilities, such as writing to files. Even if Java method can use classes and methods in the standard Java libraries,

  • We cannot share Java UDF And database objects that use Java UDF can also not be shared, Which means one cannot share a view, function that calls a Java UDF or a table with access policy.

  • You cannot create a Java UDF with a secure option.

  • Giving Role USAGE privilege on a Java UDF can lead to leak of content of any file imported by Java UDF which can hinder security.


Python UDF

In Python UDF, the user passes the name of the UDF and the parameters of the UDF to Snowflake. If the UDF is a Python language, Snowflake calls the required Python code known as the handler function. The output is then returned to Snowflake by the handler function, which subsequently sends it back to the client. It's also crucial to remember that the arguments sent to the handler Python function on row 10 must match the arguments passed to the UDF on row 1 and must be given in the same intended sequence.


Snowflake currently supports writing UDFs in Python version 3.8.

  • Python UDFs are scalar functions; they return a value for each row that is provided to them. It does not support tableaular results.

  • Snowflake offers third-party Python packages thanks to a collaboration with Anaconda.

  • Snowflake currently supports writing UDFs in Python version 3.8.

Example:


create or replace function larger_no(no1 float, no2 float)

returns int

language python

runtime_version = '3.8'

as

$$

def addone_py(no1,no2):

if(no1>no2):

return no1;

else:

return no2;

$$;

Advantages of Python UDFs
  • We can use python code and functions that already exist in standard Python packages within Python UDF.

  • Python UDF can take advantage of Python’s rich 3rd-party ecosystem.


Disadvantages of Python UDFs
  • Snowflake has some security constraints that disable some capabilities, Even if Python UDF can use modules and functions in the standard Python packages.

  • UDFs and modules that are brought in through stages must be platform-independent i.e no specific CPU architecture or Operating System and must not contain native extensions.

  • We cannot share Python UDF And database objects that use Python UDF can also not be shared, Which means one cannot share a view or function that calls a Python UDF or a table with access policy.

  • You cannot create Python UDF with a secure option.

  • Giving Role USAGE privilege on a Java UDF can lead to leak of content of any file imported by Java UDF which can hinder security.

  • Right now creating or refreshing a secondary database is blocked if a Python UDF exists in the primary database.

  • Python UDF in Snowflake uses the Python ‘zipimport’ module to import Python code from stages. So any limitations with the ‘zipimport’ module will also be present with UDFs.


SQL UDF:

A SQL UDF runs an arbitrary expression written in SQL and gives the output of the sql expression.

  • SQL UDF returns either a scalar (i.e. single) value or, if defined as a table function, a set of rows (i.e tableaular values)..

  • The newly formed user defined function can be used in SQL statements.

  • Snowflake supports overloading of SQL UDF names. As long as the argument signatures of different SQL UDFs in the same schema differ by the number of arguments or the types of arguments, they can share the same name.

  • In SQL UDF body, one should not use semicolons to terminate the query expression.

  • Example: Largest of two no.


create or replace function larger_num (no1 float, no2 float)

returns float

as

$$

select case when no1 > no2 then no1 else no2 end

$$

;

Advantages of SQL UDFs
  • In SQL UDF function we can condition and can convert different types like String to Integer, which all can help to solve the common problem of machine learning that is converting categorical data to an integer.


Disadvantages of SQL UDFs
  • SQL UDF can call or refer to other user-defined functions within its expression. But it cannot call or refer to itself directly or even indirectly via other functions.

  • In SQL UDF when using masking policy, It can show error due to differences between data type of the column, UDF, and masking policy.


JAVASCRIPT UDF:

JavaScript UDFs can only access the data required to carry out the given function. They are unable to change the status of the underlying system. And also this UDF consume a reasonable amount of memory and processing time.

  • A JavaScript UDF returns either scalar (i.e single value) or tabular results (i.e set of values or rows).

  • You can also use a JavaScript try/catch block to handle errors.

  • For a nested level interval addition query, the Javascript-based UDF performs significantly better than the SQL-based UDF.

  • Unlike in SQL UDF, the JavaScript code must refer to the input parameter names as all uppercase.

  • Use the statement ‘language javascript’ for using javascript in UDF.

  • Example : largest of two no.


create or replace function larger_no (no1 float, no2 float)

returns float

language javascript

as

$$

if (no1>no2) { return no1 } else { return no2}

$$

;


Advantages of JavaScript UDFs
  • It uses standard JavaScript Library meaning that a JavaScript UDF can call the usual JavaScript v8 engine (supported by snowflake) function that a JavaScript interpreter would allow a JavaScript script to call. The V8 engine implements ECMAScript for which a language specification exists which includes the statements, objects, properties, functions, operators, etc.

  • JavaScript UDF allows one to handle errors that occur while running a function by using try/catch Block.

  • The JavaScript global state is typically kept between iterations of a UDF by Snowflake.


Disadvantages of JavaScript UDFs
  • Snowflake does not ensure that every row has a clean JavaScript environment. This is for performance reasons. This can lead to a recursion error while using the function in a select from a table with more than 1 row.

  • Snowflake limits the maximum size of the JavaScript source code in the body of a JavaScript UDF to 100 KB (This size is for compressed code and depends on compressibility of code).

  • JavaScript UDFs will result in an error if consume too much memory, And also if it takes too long to process then also give error and kill the process.

  • Excessive stack depth in JavaScript UDF due to recursion will result in an error.

  • Snowflake preserves the JavaScript global state between iterations of a UDF but one should not rely on the previous state being available between function calls. Also, you should not assume that all rows will execute within the same JavaScript environment. This method of preserving state is only beneficial for caching effects of code evaluation.

  • JavaScript UDF supports the JavaScript standard library. but does not include many objects and methods provided by browsers. Also does not have any mechanism to import, include, or call additional libraries.

  • JavaScript UDF Objects are currently limited to several megabytes (i.e if an object is too large it can return error when UDF is called), and nesting depth of 1000.


Conclusion:

We can create and use UDF with different languages to do a particular task that cannot be done with built-in functions already present in Snowflake. This makes work convenient and faster.


204 views0 comments

Recent Posts

See All