Author: Unika Subha Gandepalli
The purpose of this blog is to highlight the best practices to reduce storage, computing, and data transfer costs associated with Snowflake. Before directly going into the best practices, let us first understand what these costs are and how they affect us.
Compute Cost
This is the cost associated with the number of credits consumed by virtual warehouses, including serverless features such as Snowpipe and automatic clustering.
A Snowflake credit is a unit of measure. Credits are consumed only when resources are utilized, for example while a warehouse is running or a serverless feature is in use.
Storage Cost
This is the cost associated with average monthly storage consumption. It is calculated per terabyte, and the rate varies depending on whether storage is prepaid or purchased on demand.
Data Transfer Cost
This is the cost associated with transferring data from a Snowflake account hosted in one region to another region, either on the same cloud platform or a different one (AWS, GCP, or Azure). It is charged per byte and also depends on where the Snowflake account is hosted.
Which cost bothers you the most?
The diagram below shows the Snowflake objects that incur a cost.
Compute Cost Best Practices
Terminate idle virtual warehouses:
Snowflake runs queries using virtual warehouses that incur computational costs.
It is better to stop virtual warehouses when they are not in use.
The minimum auto-suspend time that can be selected for a virtual warehouse in the web UI is 5 minutes, and the default is 10 minutes. However, you can set auto-suspend below 5 minutes by executing a SQL statement:
alter warehouse <warehouse_name> set auto_suspend = <num_in_seconds>;
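For example, a sketch that suspends a hypothetical warehouse named reporting_wh after 60 seconds of inactivity (the warehouse name and value are illustrative):

```sql
-- Suspend after 60 seconds of inactivity instead of the 5-minute UI minimum
alter warehouse reporting_wh set auto_suspend = 60;

-- Optionally let the warehouse restart automatically when the next query arrives
alter warehouse reporting_wh set auto_resume = true;
```

Very low auto-suspend values save credits on spiky workloads but can hurt performance on steady ones, since suspending a warehouse also drops its local cache.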
Resize warehouse based on load:
A significant part of a Snowflake bill is compute cost. Since the credit consumption of a virtual warehouse depends on its size, it is best to start with an X-Small virtual warehouse for all queries and observe for a few weeks before moving to medium, large, or larger sizes.
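Resizing is a single statement, so starting small carries little risk (the warehouse name here is illustrative):

```sql
-- Start small and observe query performance for a few weeks
alter warehouse reporting_wh set warehouse_size = 'XSMALL';

-- Resize upward only once the workload proves to need it
alter warehouse reporting_wh set warehouse_size = 'MEDIUM';
```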
Now let us understand how credit consumption billing happens based on warehouse size and the number of clusters used:
Consider a 2X-Large multi-cluster warehouse (32 credits per hour per cluster) that runs 1 cluster for the first two hours and then 2 clusters for the next hour; the total billed would be (1 × 2 × 32) + (2 × 1 × 32) = 64 + 64 = 128 credits.
As a formula, for each period of operation:
No. of Credits = No. of Clusters × No. of Hours × No. of Servers,
where the number of servers is determined by the warehouse size (32 for a 2X-Large), and the totals are summed across periods.
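The worked example above can be checked with plain arithmetic in SQL:

```sql
-- 1 cluster x 2 hours x 32 credits/hr  +  2 clusters x 1 hour x 32 credits/hr
select (1 * 2 * 32) + (2 * 1 * 32) as total_credits;  -- 128
```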
Set up a resource monitor:
A resource monitor can be used to monitor credit usage by user-managed virtual warehouses and virtual warehouses used by cloud services, but it can only terminate user-managed warehouses based on credit usage thresholds. Cloud service credit usage can continue after the user-managed warehouse is suspended.
Example: statements to create a resource monitor using SQL.
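A sketch using hypothetical monitor and warehouse names (the quota and thresholds are illustrative):

```sql
-- Create a monitor capped at 100 credits per month
create resource monitor monthly_wh_monitor
  with credit_quota = 100
       frequency = monthly
       start_timestamp = immediately
  triggers
    on 75 percent do notify               -- warn at 75% of the quota
    on 100 percent do suspend             -- suspend; running queries finish
    on 110 percent do suspend_immediate;  -- suspend and cancel running queries

-- Attach the monitor to a warehouse
alter warehouse reporting_wh set resource_monitor = monthly_wh_monitor;
```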
Example 2: Resource monitor creation using Web UI.
Establish alerts for reader accounts:
When a consumer runs queries in a reader account, the data provider is charged for the compute cost.
If you want to monitor the monthly amount of credits consumed by the virtual warehouse in the reader account, create one or more resource monitors.
You can monitor this credit consumption at the account level and warehouse level.
If you skip this task, the warehouses in the reader account can consume more credits, which are charged to your provider account each month.
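For example, a monitor created inside the reader account can cap its total monthly consumption at the account level (the names and quota are illustrative):

```sql
-- Run inside the reader account
create resource monitor reader_account_monitor
  with credit_quota = 50
       frequency = monthly
       start_timestamp = immediately
  triggers on 100 percent do suspend;

-- Apply at the account level so every warehouse in the reader account is covered
alter account set resource_monitor = reader_account_monitor;
```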
Storage Cost Best Practices
Consider account usage views related to Snowflake cost:
It is essential to monitor account usage views such as:
Storage usage view: This account usage view shows the average daily data storage usage in bytes over the last 365 days for the entire account, including database tables and files in internal stages.
If you want to monitor storage usage separately for databases and internal stages, you can query the DATABASE_STORAGE_USAGE_HISTORY and STAGE_STORAGE_USAGE_HISTORY views.
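Sketch queries against the SNOWFLAKE.ACCOUNT_USAGE schema:

```sql
-- Overall daily storage for the account
select usage_date, storage_bytes, stage_bytes, failsafe_bytes
from snowflake.account_usage.storage_usage
order by usage_date desc;

-- Storage broken down by database
select usage_date, database_name, average_database_bytes
from snowflake.account_usage.database_storage_usage_history
order by usage_date desc;
```

Note that ACCOUNT_USAGE views require appropriate privileges (by default, the ACCOUNTADMIN role).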
Metering history view: This view is used to return hourly credit usage for the entire account within the last 365 days.
Warehouse metering history view: This view is used to return hourly credit usage for a single warehouse or for all warehouses within the last 365 days.
Metering Daily History view: It is used to return the daily credit usage and cloud services deduction for the entire account within the last 365 days.
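For example, a sketch that ranks warehouses by credits consumed over the last week:

```sql
-- Credits consumed per warehouse over the last 7 days
select warehouse_name,
       sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history
where start_time >= dateadd(day, -7, current_timestamp())
group by warehouse_name
order by credits_used desc;
```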
Use zero-copy cloning productively:
Cloning only creates metadata for the cloned table that points to the same storage as the original table.
This means there is no additional storage cost at the time of cloning; storage charges accrue only as the clone and the original diverge, because modified data is stored separately.
Cloning therefore optimizes Snowflake storage costs, and creating a clone is nearly instantaneous compared to physically copying the data.
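A clone is a single statement (the object names are illustrative):

```sql
-- Create a zero-copy clone of a table, e.g. for development or testing
create table sales_dev clone sales;

-- Schemas and databases can be cloned the same way
create database analytics_dev clone analytics;
```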
Make use of temporary and transient tables to reduce storage costs:
To reduce the storage cost, make use of temporary and transient tables.
Use temporary tables for session-specific intermediate results in complex data processing workflows: a temporary table is dropped automatically when the session ends, so its storage cost is limited to the session's lifetime and it has no Fail-safe period.
Use transient tables for staging where frequent truncate or reload operations occur.
Create databases and schemas as transient so that tables created in them are transient by default.
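Each of these is a small variation on CREATE TABLE (the table and schema names here are illustrative):

```sql
-- Temporary table: visible only to this session, dropped when the session ends
create temporary table tmp_daily_results as
select * from raw_events where event_date = current_date();

-- Transient table: persists across sessions but has no Fail-safe period
create transient table staging_events like raw_events;

-- Transient schema: tables created inside it are transient by default
create transient schema staging;
```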
Data Transfer Cost Best Practices
Use the same cloud provider within the same region to store files and data:
We know that Snowflake supports three cloud providers, i.e., AWS, GCP, and Azure, for loading data. Data transfer costs are incurred when data is moved from a cloud provider or region other than the one hosting the Snowflake account.
Hence, to reduce data transfer costs, keep your data files and your Snowflake account on the same cloud provider and in the same region.
These are some of the best practices to follow to optimize the costs associated with Snowflake.