Continuous Ingestion Pipeline (Azure - Snowflake)

Author: Sankalp Jain


Introduction:

The continuous ingestion pipeline between Azure and Snowflake allows for near-real-time data streaming from Azure to Snowflake, giving organizations up-to-date data for analytics and decision-making. The pipeline uses Azure Blob Storage together with Event Grid notifications and Snowpipe to ingest the data into Snowflake, and can be configured to ingest data from multiple Azure data sources.


Configuration:

Step 1 - Log in to your Azure account and create a new storage account.



Step 2 - When creating the storage account, fill in the following fields:

  • Resource Group

  • Unique storage account name

  • Region

  • Performance tier, according to the requirements of the use case
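These portal steps can also be scripted. A minimal sketch with the Azure CLI, assuming a hypothetical resource group cyborg-rg and the storage account name cyborgsa used later in this guide:

```shell
# Create a resource group and a general-purpose v2 storage account.
# Names, region, and SKU are illustrative; adjust to your use case.
az group create --name cyborg-rg --location eastus

az storage account create \
  --name cyborgsa \
  --resource-group cyborg-rg \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2
```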



Step 3 - Once the storage account is created, create the containers where we will store the files.



Step 4 - To create a new container, go to the newly created storage account and find Containers in the navigation pane. Enter the container name and the public access level according to the use case.
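The equivalent Azure CLI sketch, using the container name that appears in the SQL later in this guide (the private access level is an assumption):

```shell
# Create a private container named "continuousdata" in the cyborgsa account.
az storage container create \
  --account-name cyborgsa \
  --name continuousdata \
  --public-access off \
  --auth-mode login
```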



Step 5 - Create a new queue in the storage account. You can find Queues below the Containers tab in the navigation pane.
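From the CLI, the queue (named as in the notification integration SQL later in this guide) can be created with:

```shell
# Create the storage queue that Event Grid will deliver blob notifications to.
az storage queue create \
  --account-name cyborgsa \
  --name continuousdataqueue \
  --auth-mode login
```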



Step 6 - Now look for the Access control (IAM) tab in the navigation pane of the storage account. Here we will grant the Snowflake integration access to the storage account.



Step 7 - Head back to the Snowflake environment. In a new worksheet we will create the storage integration and the notification integration using the SQL statements below.



To create the storage integration, you need the Azure tenant ID, which you can find in Azure Active Directory, and the URL of the container, which is listed under the container's properties.


--Creating storage integration for Azure

CREATE OR REPLACE STORAGE INTEGRATION azure_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '6d668ab6-df66-4d4e-9b32-05f76e7cbbee'
  STORAGE_ALLOWED_LOCATIONS = ('azure://cyborgsa.blob.core.windows.net/continuousdata/');

--Describing the storage integration to fetch AZURE_CONSENT_URL and the multi-tenant app name

DESC STORAGE INTEGRATION azure_int;



Similarly, to create the notification integration we need the Azure tenant ID and the URL of the queue created in the storage account.



--Creating notification integration for Azure

CREATE OR REPLACE NOTIFICATION INTEGRATION azure_data_event
  ENABLED = TRUE
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
  AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://cyborgsa.queue.core.windows.net/continuousdataqueue'
  AZURE_TENANT_ID = '6d668ab6-df66-4d4e-9b32-05f76e7cbbee';

--Describing the notification integration to fetch AZURE_CONSENT_URL

DESC NOTIFICATION INTEGRATION azure_data_event;



Step 8 - Once you run the describe integration SQL command for both the storage integration and the notification integration, look for AZURE_CONSENT_URL in the result pane. Open that link and authorize Snowflake's access to your Azure account.



Step 9 - Head back to the Access control (IAM) tab from Step 6 and add role assignments for Storage Blob Data Contributor and Storage Queue Data Contributor.
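The two role assignments can also be made from the CLI. A sketch, where the Snowflake service principal's object ID and the subscription and resource group names are placeholders you must replace with your own values:

```shell
# Grant the Snowflake service principal access to blobs and queues
# on the storage account. Replace the placeholders with your values.
SCOPE="/subscriptions/<subscription-id>/resourceGroups/cyborg-rg/providers/Microsoft.Storage/storageAccounts/cyborgsa"

az role assignment create \
  --assignee "<snowflake-sp-object-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "$SCOPE"

az role assignment create \
  --assignee "<snowflake-sp-object-id>" \
  --role "Storage Queue Data Contributor" \
  --scope "$SCOPE"
```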



Step 10 - When selecting members for the role assignments, search for the Snowflake service principal; its name is shown in the AZURE_MULTI_TENANT_APP_NAME property of the describe integration output.



Step 11 - We also have to create an event subscription in the storage account. Look for Events in the navigation pane of the storage account.



Step 12 - To create a new event subscription, fill in the following fields:

  • System Topic Name

  • Source Resource

  • Filter to Event Types (Blob Created)

  • Endpoint (Storage Queue)

  • Queue (the queue created in Step 5)



Step 13 - Click Create to create the event subscription.
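Steps 11-13 can be sketched with the Azure CLI as well, creating an Event Grid subscription that delivers Blob Created events to the queue (the subscription name and IDs are placeholders):

```shell
# Subscribe the storage queue to BlobCreated events on the storage account.
STORAGE_ID="/subscriptions/<subscription-id>/resourceGroups/cyborg-rg/providers/Microsoft.Storage/storageAccounts/cyborgsa"

az eventgrid event-subscription create \
  --name snowpipe-events \
  --source-resource-id "$STORAGE_ID" \
  --endpoint-type storagequeue \
  --endpoint "$STORAGE_ID/queueservices/default/queues/continuousdataqueue" \
  --included-event-types Microsoft.Storage.BlobCreated
```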



Conclusion:

Finally, your Azure account has been configured for the continuous ingestion pipeline. In your Snowflake account you need to create a database, schema, tables, and a file format for those tables. A stage needs to be created pointing at the files in Azure Blob Storage. A Snowpipe connected to a stream and a task then loads the data into the actual table.
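The Snowflake-side objects described above can be sketched as follows. This is a minimal illustration, not a definitive setup: the database, schema, table, pipe, stream, and task names are hypothetical, the file format assumes CSV, and the column list must match your files.

```sql
-- Hypothetical names throughout; adjust to your environment.
CREATE DATABASE IF NOT EXISTS ingest_db;
CREATE SCHEMA IF NOT EXISTS ingest_db.raw;

CREATE OR REPLACE FILE FORMAT ingest_db.raw.csv_ff
  TYPE = CSV
  SKIP_HEADER = 1;

-- External stage pointing at the container via the storage integration.
CREATE OR REPLACE STAGE ingest_db.raw.azure_stage
  URL = 'azure://cyborgsa.blob.core.windows.net/continuousdata/'
  STORAGE_INTEGRATION = azure_int
  FILE_FORMAT = ingest_db.raw.csv_ff;

-- Landing table and final target table (illustrative columns).
CREATE OR REPLACE TABLE ingest_db.raw.events_raw (
  id      NUMBER,
  payload VARCHAR
);
CREATE OR REPLACE TABLE ingest_db.raw.events LIKE ingest_db.raw.events_raw;

-- Snowpipe driven by the notification integration: new blobs are
-- loaded automatically as Event Grid messages arrive on the queue.
CREATE OR REPLACE PIPE ingest_db.raw.azure_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'AZURE_DATA_EVENT'
AS
  COPY INTO ingest_db.raw.events_raw
  FROM @ingest_db.raw.azure_stage;

-- Stream + task move newly landed rows into the target table.
CREATE OR REPLACE STREAM ingest_db.raw.events_stream
  ON TABLE ingest_db.raw.events_raw;

CREATE OR REPLACE TASK ingest_db.raw.load_events
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ingest_db.raw.events_stream')
AS
  INSERT INTO ingest_db.raw.events (id, payload)
  SELECT id, payload FROM ingest_db.raw.events_stream;
```

Note that tasks are created in a suspended state, so the task must be resumed (ALTER TASK ... RESUME) before it starts running.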

