Tracking PII & Warehouse Costs Using Snowflake’s Object Tagging Feature

Created by: Nikhil Pai J



Introduction

This blog talks about how we can leverage Snowflake’s object tagging feature to track PII data and potentially mask them if required and track the cost associated with warehouses.


With Snowflake rolling out new features for governance and security object tagging plays an important role in helping you track where and how your data is being stored. Whether it is secure and being masked in case of being confidential data or if the costs on your warehouses are not exceeding the limit.


What are Tags?

Tags allow data stewards to track sensitive data about compliance, discovery, protection, and resource utilization use cases through a centralized or decentralized data governance management approach.


A tag is a schema-level object that you can assign to another Snowflake object. When assigning a tag to a snowflake object, you can assign any string value to the tag.


This blog uses object tagging to track inventory-related costs and track PII data during an audit to ensure that the required masking policies are in place.


Tag Lineage

Steps to create and apply tags on Tables, columns, and warehouses

Step 1: Create an overview of how your tags will be applied on the warehouse/table/column level.



List all the tag names that are to be applied on the warehouses and the corresponding values and document the same. Similarly, carry out the process for Table & column level tags.


Step 2: Create the object tags with allowed values (optional) and grant the apply tag to a role that will be applying the tags on the snowflake objects or to the owner of the data.


Code:


// create object tags using account admin role use role accountadmin;

--warehouse level tag

create or replace tag <tag_name> allowed_values <value_1>,<value_2>

comment = 'warehouse level tagging to track warehouse cost';

--Table level tag

create or replace tag <tag_name> allowed_values <value_1>,<value_2>

comment = 'table level tagging to categorise sensitive data';


--column level tag

create or replace tag <tag_name> allowed_values <value_1>,<value_2>

comment = 'sensitivity levels of each column the tag is applied on';


//grant apply tags to role sysadmin

grant apply on tag <tag_name> to role <role_name>; grant apply tag on account to role <role_name>;


Step 3: Apply the tags on the Snowflake objects using the role chosen for the applying of the tags.


Note: you can reference the tags created in one schema to the other when applying tags.


//apply tags on warehouses use role <role_name>;

alter warehouse <warehouse_name> set tag schema_name.tag_name= <tag_value>;


//apply tags on tables use role sysadmin;

--landing schema

alter table <table_name> set tag schema_name.tag_name = <tag_value>;


//apply tags on columns


alter table <table_name> modify <column_name> set tag schema_name.tag_name =

<tag_value>;


Step 4: Tracking the tags and costs associated with warehouses.


//track tags given to objects

--Identify tags in your account:

select * from snowflake.account_usage.tags order by tag_name

--Identify a value for a given tag

select system$get_tag(<tag_name>,<object_name>,<object_type>);


--Account-level query with lineage select * from

table(snowflake.account_usage.tag_references_with_lineage(<tag_name>));


--Account-level query without lineage

select * from snowflake.account_usage.tag_references order by tag_name, domain, object_id;


--Database-level query, with lineage select * from

table(production_db.information_schema.tag_references(<object_name>,<object_ty pe>));


--Database-level query for all of the tags on every column in a table or view, with lineage

select * from table(information_schema.tag_references_all_columns(<object_name>,<object_type

>));


Step 5: Using custom queries to check the cost tracking and audit queries to track sensitive data for masking policies.


//audit query to check for masking with column_with_tag

as (select object_name table_name, column_name, object_database table_db_name,object_schema table_schema_name

from "SNOWFLAKE"."ACCOUNT_USAGE"."TAG_REFERENCES"

where tag_schema='SCHEMA_NAME' AND tag_database='DB_NAME' and (tag_name='TAG_NAME_1' OR tag_name='TAG_NAME_2')),


column_with_policy

as (select ref_entity_name pol_table_name, policy_name from "SNOWFLAKE"."ACCOUNT_USAGE"."POLICY_REFERENCES"

where policy_kind ='MASKING_POLICY')

select distinct * from column_with_tag

Left join column_with_policy on

column_with_policy.pol_table_name=column_with_tag.table_name;


Note: This query provides information on all the tables and columns with tags on them along with the masking policy information applied on the columns or tables. This makes it easy during an audit to check on the columns with sensitive data without masking policies applied on them.


//cost tracking

select tag_value as department, sum(credits_used) as credits from

"SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_METERING_HISTORY","SNOWFLAKE"."ACCOUNT_USAGE"."TAG_REFERENCES"

where true

and warehouse_name = object_name

and tag_name= <tag_name> and tag_database= <DB_name>

and tag_schema= <schema_name>

and start_time >= dateadd('days',-30,current_date()) group by 1

order by 2 desc;


Note: The above query provides you with information on the warehouse’s credit usage for the past 30 days.


Summary

Snowflake’s object tagging feature is very handy in terms of tracking data and costs. This gives an overview of how it can be used during an audit to make it easier to apply security and governance on the Snowflake objects. This can also be used to dynamically apply masking policies on top of the Database objects and track them.


References
  1. Object Tagging snowflake blog

  2. Object Tagging snowflake documentation

52 views0 comments

Recent Posts

See All