Kipi.bi / Insights / Partnerships / Extraction of Data from Mongodb to Snowflake Using SSIS

Extraction of Data from Mongodb to Snowflake Using SSIS

Author: Rachana Kadiam

Introduction to SSIS

  • SQL Server Integration Services (SSIS) is a powerful data integration and workflow automation tool developed by Microsoft. It is a flexible ETL tool that facilitates the creation of data integration solutions, helping organizations manage and transform their data to meet business requirements.
  • It is used for solving complex business problems by extracting from various sources like databases, flat files etc  and transforming data, and loading data into one/ multiple destinations.
  • It allows users to design and manage data integration processes through a graphical interface(GUI). 
  • Performs transformations to cleanse, aggregate, and manipulate the data before loading it into the target destination.
  • SSIS packages, which are sets of workflows created in the SSIS development environment, can be scheduled and executed as part of ETL and provides automation of data integration scenarios.
  • SSIS provides features for error handling, logging, and monitoring, allowing developers to identify and troubleshoot issues efficiently.
Your Data Journey Starts Here

Pre-requisities

  • SSIS installed in Microsoft visual studio.
  • SSIS extension packages for using Snowflake, Mongodb and AWS in the environment.(using MongoDB SSIS Components – SSIS Productivity Pack by kingswaysoft)

Steps

  • For using SSIS , we are using Microsoft Visual studio(2019). Open visual studio and create a new project and in the search explorer find the SSIS executables and create a new package.
  • We need to create connection managers for establishing a connection between different sources and destinations.
  • For Creating a connection manager, right click on the connection manager area and create a new connection.
  • Snowflake connection manager – fill in the required fields

Mongodb connection manager – fill the required fields and authentication modes and then test the connection.

Amazon S3 connection manager – fill in the required fields

  • Once the connection managers are set-up, we need to create a data flow task.
  • Drag the data flow task component from the SSIS Toolkit.
  • Drag the Mongodb source component and Snowflake destination component from SSIS Toolkit on to the data flow task.

In Mongodb source component, select the database and collection from the drop-down. We will get the list of databases present in Mongodb server based on Mongodb connection manager.

  • In document designer, we can update the data types of the columns if needed, And we can also add additional columns if required.
  • In Snowflake destination component:

  • Select the action you want to perform such as update, delete, insert
  • Select the database, schema and destination table
  • If we have huge data, then we have to use bulk copy, which uses the AWS S3 bucket as the intermediate location for storing the data and deletes once it’s loaded into a Snowflake table.
  • We can specify the batch size based on our requirements.

Incremental Load

  • If we want to load data incrementally, we have to create variables for storing the values

  • Drag a Snowflake command task , add a Snowflake command here and then assign it to an output variable.
  • Then we can use variables referenced here in the data flow task y dragging it from the variables section.

Conclusion

In this blog, we have seen how to extract the data from Mongodb to Snowflake using SSIS

References

November 23, 2023