Snowboard - Data Catalog

Author: Peruri Mohana Satya



This Blog provides information about what is a data catalog, how is it helpful to organizations, what is snowboard, how to integrate Snowboard with snowflake and

What are its features?


What is a Data Catalog and How is it useful ?
  • Data Catalog is a Metadata Management tool that maintains a neatly organized inventory of data assets across the organization in one place.

  • It is helpful for the organizations that are striving to be data-driven. Due to the generation of large volumes of data, it became difficult for the organizations to maintain all the data assets and keep track of how data flow happens between different assets in an organization.

  • In such cases, a data catalog helps organizations to better manage their data and get faster access to it. It also helps them to get meaningful insights and make quicker decisions to drive their businesses.


A Data Catalog helps organizations to answer questions such as:

  • What are the most frequently used objects?

  • How are objects related to each other?

  • Which user/role has what access to that object?

  • When was the object discovered and last modified?


There are many data catalog tools available such as Atlan, Alation, Dataedo, and data. world and so on. Here, this blog revolves around Snowboard which provides basic features of the data catalog.


What is Snowboard and its features ?
  • A Snowboard is a cloud-based software built for the snowflake data cloud. It provides both free and pay-as-you-go pricing models. It offers a free trial forever up to 2000 assets and costs 0.05$/asset for the pay-as-you-go model.

  • It can be hosted either on the cloud or self-hosted. It has two engines one for metadata and the other for metrics. The Metadata engine also connects to the DBT cloud, Tableau, and Power BI.


Some of the features of Snowboard are mentioned below

  • Search and filtering

  • Discover and observe data with data profiles

  • Monitor data freshness and usage

  • Understand data flow between assets down to column level - Lineage

  • View access rights and ownership of users on the assets.

  • Define the business glossary and metrics


Integrating snowboard with snowflake and exploring all features

1. Create a Snowboard free trial account using the link mentioned below. After creation, you will receive an email to verify your account, Click start now, and log in to your account. https://register.s8.software/



2. After logging in, you can see demo content to understand the features provided by snowboard. Next, To connect your snowflake account to snowboard, you can go to settings -> Click on delete and Connect. After that you will be redirected to the page shown below



3. A Technical user i.e, a user defined role, is created in snowboard to integrate it with snowflake by executing the below queries


/* For connecting snowboard with Snowflake, created a Snowboard user with a default role, warehouse, and namespace */


create role snowboard_role;

create warehouse SNOWBOARD_WH;


create or replace user snowboard_user

--Set a password and remember it for further use

password = ‘*****'

--specify warehouse

default_warehouse = 'SNOWBOARD_WH'

default_role = snowboard_role

default_namespace = snowboard

comment = 'Technical User created for Snowboard';


grant role snowboard_role to user snowboard_user;


/* allow usage on warehouse */


grant usage on warehouse SNOWBOARD_WH to role snowboard_role;


/* A database is required to store profiling and parsed query log results. So, Snowboard database is created and ownership on that database is granted to snowboard_role */


create database snowboard;

grant ownership on database snowboard to role snowboard_role;


/* granted access to query log and further metadata from snowflake */


grant imported privileges on database snowflake to role snowboard_role;


4. After creating a user, enter the details in the snowboard page shown in step 2, click on test and add the connection. Their Hostname should be in the form of “ab12345.us-east-2.aws” -> this you can get from your snowflake account URL.


5. Once a connection is established you can add objects from snowflake to Snowboard. To do that you can execute the below queries


/* for each database to be added to the data catalog execute these statements. Replace database_name with your specified database name*/


set db_name = 'database_name';

grant usage on database identifier($db_name) to role snowboard_role;

grant usage on all schemas in database identifier($db_name) to role snowboard_role;

grant usage on future schemas in database identifier($db_name) to role snowboard_role;

grant select, references on all tables in database identifier($db_name) to role snowboard_role;

grant select, references on future tables in database identifier($db_name) to role snowboard_role;

grant select, references on all views in database identifier($db_name) to role snowboard_role;

grant select, references on future views in database identifier($db_name) to role snowboard_role;

grant select, references on all materialized views in database identifier($db_name) to role snowboard_role;

grant select, references on future materialized views in database identifier($db_name) to role snowboard_role;


6. After adding required objects, you can see all the added objects in the snowboard. In the Search bar, you can type the required object name and access it quickly.



7. Most frequently used objects appear first as shown. You can also filter and sort the results based on the requirement.



8. For any object you can see metadata information such as the type of object (database, schema, table, or view), when it was created, owner of the object and you can also add tags, properties, and descriptions to that object. 9. For a table, you can get metadata information such as when it was created and last updated, number of rows, number of columns, size of the table, number of queries executed on the table last month, who are the most frequent users, and who is the owner of the table.



10. You can also get column-level details such as datatype of that column, column name, number of unique values in that column, number of queries executed on that column, and based on profiling it shows statistics related to that column.



11. The Timeline feature provides information about the number of updates and the number of selects that happened on a table on a particular date.



12. Lineage is one of the main features of a data catalog. It provides information about where the data originated, how data flow happened between assets, and what transformation it went through. The key benefits of lineage are:

  • Root cause analysis

  • Impact analysis


13. When you click on the arrow between assets you can view the data flow definition.


14. Access provides all access related information like who is the owner of that table and which roles have what access on that table.



15. In metrics, you can define your own metrics by specifying data source, time dimensions, and other related information to get more insights from your data.



16. In terms, you can define your business glossary and add a description to it.



  • We are getting all this metadata information based on scheduled background tasks which include discovering tables, profiling data, and parsing query logs.

  • Discover tables often refreshes the metadata of your table. It does not require a warehouse to start.

  • Query Parser is responsible for timeline, lineage, and usage statistics which requires a warehouse. It is good to run it every 8 hours or daily.

  • You can schedule these tasks in scheduled background tasks, under settings.

  • Set discovery paths, allows you to control which objects to be included and excluded in the catalog.

  • Set profile paths control which objects shall be profiled by the scheduler.

  • SSO can be enabled to snowboard using snowflake.

  • You can also add user-defined properties and tags.




Limitations of Snowboard
  • Data modifications that happened outside the snowflake cannot be captured.

  • Users should have the necessary privileges to access the tables, both source and target tables need to be indexed in snowboard otherwise, snowboard will ignore the connection.


Conclusion

Therefore, a Data Catalog is a metadata management tool that helps businesses better manage and organize their data assets and also helps in deriving meaningful insights from them to drive their businesses.


References:

[1]. Snowboard [Documentation]. Retrieved from https://docs.sled.so/

[2]. Nicholas Samuel. (2022, January 21). Snowflake data catalog: key functions, benefits, and 6 best tools [Blog post]. Retrieved from https://hevodata.com/learn/snowflake-data-catalog/

[3]. Chandan Gaur. (2021, October 21). Data catalog for snowflake Benefits and its functions [Blog post]. Retrieved from https://www.xenonstack.com/insights/data-catalog-for-snowflake


19 views0 comments

Recent Posts

See All