Author: Sankalp Jain
The continuous ingestion pipeline between Snowflake and Azure allows for near-real-time data streaming from Azure to Snowflake. This enables organizations to have up-to-date data for analytics and decision-making. The pipeline uses Azure Event Hubs to ingest the data into Snowflake and can be configured to ingest data from multiple Azure data sources.
Step 1 - We need to login into our Azure account and create a new storage account.
Step 2 - To create a new storage account, we need to take care of certain fields such as:
Unique Storage account name.
Performance according to the requirement of the use case.
Step 3 - Once the storage account is created, then we need to create the containers where we will store the files.
Step 4 - To create a new container, first, we will go to the newly created storage account and then find the container in the navigation area. The name of the container and public access level needs to be entered according to the use case.
Step 5 - A new Queue needs to be created for the storage account. You can find the queue below the Container tab in the Navigation area.
Step 6 - Now look for the Access Control (IAM) tab in the navigation area of the storage account. Here we will have to Grant access to the storage account for Snowflake Integration.
Step 7 - Head back to the Snowflake environment. In the new worksheet, we will be creating the Storage Integration & the Notification Integration. You can find the Sql statement for the same below the screenshot.
To create Storage Integration, you need an Azure Tenant ID which you can find in Azure Active Directory, and the URL of the container in container properties.
--Creating Storage Integration for Azure
create or replace storage integration azure_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = Azure
ENABLED = TRUE
AZURE_TENANT_ID = '6d668ab6-df66-4d4e-9b32-05f76e7cbbee'
STORAGE_ALLOWED_LOCATIONS = ('azure://cyborgsa.blob.core.windows.net/continuousdata/');
--Checking storage integration properties to fetch external_id
desc storage integration azure_int;
Similarly, to create Notification Integration, we need the Azure Tenant ID and URL of the Queue present in the storage account.
--Creating Notification Integration for Azure
CREATE OR REPLACE NOTIFICATION INTEGRATION azure_data_event
ENABLED = TRUE
TYPE = QUEUE
NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://cyborgsa.queue.core.windows.net/continuousdataqueue'
AZURE_TENANT_ID = '6d668ab6-df66-4d4e-9b32-05f76e7cbbee';
--Checking notification integration properties to fetch external_id
desc integration azure_data_event;
Step 8 - Once you run the describe Integration SQL command for Storage Integration as well as Notification Integration, look for AZURE_CONSENT_URL output in the result pane. Click on the link present in it and authorize your access to the Azure account.
Step 9 - Head back to step 6 and Add role assignment as Storage Blob Data Contributor & Storage Queue Data Contributor.
Step 10 - To assign the members, we will have to search for the Snowflake keyword.
Step 11 - We will also have to create an Event Notification in the storage account. Look for Events in the navigation pane of the storage account.
Step 12 - To create a new event subscription, you need to take care of the following fields such as :
System Topic Name
Filter to Event type
Step 13 - Click on create to create the event.
Finally, your Azure account has been configured for Continuous Ingestion Pipeline. In your snowflake account, you need to create a database, schema, tables, and file format for those tables. A staging area needs to be created to stage the file present in the Azure Blob storage. Snowpipe connected with steam and task helps in loading the data to the actual table.