Best Practices For cloning

Author: Nagam Jayakrishna


Overview

The purpose of this blog is to highlight the best practices of cloning and some situations in which cloning will not provide effective results. This blog also provides solutions for those situations and will highlight some of the pros and cons of cloning various objects.


Cloning:

Cloning in Snowflake is a technique that produces point-in-time replicas of the source objects. In Snowflake, cloning will not copy the data physically; it happens thRough metadata. In this way, we don’t need to create the objects every time, and we can pay once for unique data.


In Snowflake, source objects are the objects on which the cloning is applied, and the cloned objects are the objects that are obtained after the process of cloning.



As cloning takes place without making the physical copy of the object, it is called “zero-copy cloning."


Syntax

CREATE [OBJECT] <CLONING_OBJECT_NAME> CLONE <SOURCE_OBJECT_NAME>;


Can we apply cloning to all objects?

Objects which you cannot apply cloning in Snowflake. The below list tells you about the object which you can clone and cannot clone. In Snowflake, we can apply cloning to most of the objects, but there are some.


Objects that support cloning:
  • Databases

  • Schemas

  • Tables

  • Pipes

  • External Named Stages

  • Streams and Tasks

  • Tags

  • Sequences

  • Foreign Keys


Objects that do not support cloning:
  • External Tables

  • Internal Named Stages


Although internally named stages cannot be cloned, internal table stages can be cloned by cloning the table. External named stages can be cloned by executing CREATE .. CLONE statement.


Cloning Database and Schema:

If a database or schema is cloned, the underlying objects within the database or schema will also get cloned. The cloning of a database or schema doesn’t incur data storage as it uses metadata, but if the existing data in the database or schema is modified, then the data storage is considered.


Cloning Table:

If you are cloning a table, your current role must have SELECT privileges on the source table.


Pros

If a table is cloned, the clustering keys within the table will also get cloned, but by default, the automatic clustering is suspended. It can be resumed by altering the table.


Cons
  1. If a table is cloned, the cloned table will not include the load history of the source table.

  2. If a table is cloned with COPY GRANTS it will not inherit FUTURE GRANTS.

Cloning Tags

Cloning the database or schema will also clone the tags. The tags that were cloned are completely independent of the source database or schema.


Cloning Streams and Tasks

In Snowflake, we can also clone streams, but you have OWNERSHIP privilege on the streams or tables which you are cloning.

Tasks can be cloned separately, or they can be cloned by cloning the database or schema for that, the current role should have OWNERSHIP privilege on the task.


Syntax

CREATE OR REPLACE TASK <CLONING_TASK_NAME> CLONE


<SOURCE_TASK_NAME>;


Cons

1. Suppose you are cloning a table or stream; if any records are present in the stream (i.e., unconsumed), they cannot be accessed in a clone.


Solution

1. It is suggested to perform cloning after consumption of the stream.


2. When a task is cloned by default, it is suspended. It can be resumed by using an ALTER statement. But in order to execute ALTER statement on the task your current role should have EXECUTE TASK privilege on account


Syntax

USE ROLE ACCOUNTADMIN;


GRANT EXECUTE TASK ON ACCOUNT TO ROLE <TASK_ROLE>;


ALTER TASK <CLONING_TASK_NAME> RESUME;


Cloning Sequences and Foreign Keys

In Snowflake, both Sequences and Foreign keys can be cloned by cloning a database or schema. It can also be cloned by cloning the table.

  • Sequences and Foreign keys will refer to the source tables if the table is cloned separately instead of cloning a database or schema.

  • If clone a database or schema the sequences and foreign keys will refer to the cloned table.


Cloning Pipes

While cloning a database or schema, if you define a pipe or COPY INTO statement if you mention <database_name>.<scheme_name>.<table_name>, it will duplicate the data into the source table.

Instead of mentioning in <database_name>.<scheme_name>.<table_name> format, if you just mention the table name liKE SALES_TABLE it will load the data into both the source and the cloned table.


Cons

1. By default while cloning the pipe AUTO_INGEST is set to false. It can be set to true by altering the pipe.


Cloning with Time Travel

In Snowflake, you can clone the data based on Time Travel. This can be done by AT clause or BEFORE clause. The following are the ways to clone based on offset, timestamp, and statement(Query ID).


AT clause:

CREATE TABLE <CLONED TABLE NAME> CLONE <SOURCE TABLE NAME> AT (OFFSET=> <OFFSET VALUE> );


CREATE TABLE <CLONED TABLE NAME> CLONE <SOURCE TABLE NAME> AT (timestamp => to_timestamp_tz(' <DATE> <TIME>', 'mm/dd/yyyy hh24:mi:ss'));


CREATE TABLE <CLONED TABLE NAME> CLONE<SOURCE TABLE NAME> AT (STATEMENT=>'<QUERY ID>');


BEFORE clause

CREATE TABLE <CLONED TABLE NAME> CLONE <SOURCE TABLE NAME> BEFORE (STATEMENT=><QUERY ID>');


Some Situations in which cloning will not give effective results

Performing DDL operations:


Consider a situation in which the cloning is in progress, and DDL is executed at the same time. Then the DDL will not reflect into the cloned object. If you are dropping an object while cloning is in progress, then the object will be present in the cloning database. The same will apply to other DDL operations.


Solution

It is suggested to perform DDL operations after or before cloning.


Performing DML operations:

In Snowflake, the DATA_RETENTION for Time Travel incurs storage costs. So, some users may set the DATA_RETENTION period to zero to avoid storage costs.


If a user starts the cloning process, in this case, the Snowflake performs cloning at the start of the process, and if the user performs DML operations, it will change the source table. In this case, as the DATA_RETENTION period is zero and the Snowflake can’t access the data at the start of the cloning, it will throw an error that it can't access the data.


Solution
  1. Set the DATA_RETENTION_PERIOD_IN_DAYS=1.

  2. Hold the DML operations till the cloning completes.


Conclusion

Thus we can conclude that by following the above steps and solutions, we can utilize the best practices of cloning across various objects.


References



26 views0 comments

Recent Posts

See All