
Data Migration Using Airbyte

Updated: Mar 9

Author: Anmol Joshi


Overview

Airbyte is an open-source data integration platform for the modern data stack. It is a data pipeline platform used to sync data from applications, databases, and APIs into data warehouses and data lakes. Airbyte is on a mission to make data integration pipelines commonplace, with maintenance-free connectors that can be set up in minutes.


For reference, we are using Redshift as the source and Snowflake as the destination, both of which are SaaS services, so you'll need an account on each platform to get started.


Airbyte Features

● Scheduled updates

● Manual full refresh

● Real-time monitoring

● Debugging autonomy

● Optionally normalized schemas

● Full control over the data

● Benefit from the long tail of connectors, and adapt them to your needs

● Build connectors in the language of your choice, as they run in Docker containers
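
Because connectors run as Docker containers and talk to Airbyte over stdin/stdout using a JSON-based protocol, you can write one in any language. Below is a minimal, hypothetical Python sketch of a connector handling the spec and check commands; the command names and message shapes are assumptions drawn from the public protocol documentation and may differ between Airbyte versions, so treat it as an illustration rather than a production connector.

# minimal_connector.py - hypothetical sketch of a connector speaking the
# Airbyte protocol on stdout; message shapes are assumptions and may vary
# by Airbyte protocol version.
import json
import sys


def spec():
    # Declare the configuration this connector expects (a JSON Schema).
    return {
        "type": "SPEC",
        "spec": {
            "connectionSpecification": {
                "type": "object",
                "required": ["api_key"],
                "properties": {"api_key": {"type": "string"}},
            }
        },
    }


def check(config):
    # Report whether the supplied config can reach the upstream system.
    status = "SUCCEEDED" if config.get("api_key") else "FAILED"
    return {"type": "CONNECTION_STATUS", "connectionStatus": {"status": status}}


if __name__ == "__main__":
    command = sys.argv[1] if len(sys.argv) > 1 else "spec"
    if command == "spec":
        print(json.dumps(spec()))
    elif command == "check":
        config_path = sys.argv[sys.argv.index("--config") + 1]
        with open(config_path) as f:
            print(json.dumps(check(json.load(f))))

Packaged into a Docker image, a script like this is what Airbyte invokes when testing or running a connector.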


Data Ingestion Using Airbyte: Step-by-Step Guide

Source setup:


● Go to Sources in the Airbyte UI.

● Select your source (we are using Redshift here).


(Source selection)
  • To set it up, just follow the instructions in the screenshot below.

  • Fill in the source setup’s required details as shown below

  • Our data is stored in the ‘redshift-cluster’ that we created within the ‘dev’ database.

  • Fill in the rest of the details according to the Redshift cluster you are using to store the data.




(Source Setup)
  • Connect to the source.

(To avoid any errors, check the details and fill them in correctly. Also, the Redshift cluster should be in a ready state.)



(Source Connection Ready)

Your source is now ready and connected to Airbyte. It will be used to fetch the data to be ingested into the destination (Snowflake).
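
If you prefer to script this step instead of clicking through the UI, a local Airbyte deployment also exposes a configuration API. The sketch below is a hedged Python example of creating the Redshift source with the requests library; the base URL, endpoint path, workspace ID, source definition ID, and configuration field names are all assumptions that depend on your Airbyte version and deployment, so verify them against your instance.

# Hypothetical sketch: create the Redshift source through Airbyte's local
# configuration API. The URL, IDs, and field names are assumptions -- check
# them against your Airbyte version before use.
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"   # assumed local deployment
WORKSPACE_ID = "<your-workspace-id>"           # from your Airbyte workspace
REDSHIFT_SOURCE_DEFINITION_ID = "<redshift-source-definition-id>"

source_payload = {
    "workspaceId": WORKSPACE_ID,
    "sourceDefinitionId": REDSHIFT_SOURCE_DEFINITION_ID,
    "name": "redshift-cluster-source",
    "connectionConfiguration": {
        # The same details you would type into the 'Set up the source' form.
        "host": "redshift-cluster.xxxxxx.us-east-1.redshift.amazonaws.com",
        "port": 5439,
        "database": "dev",
        "username": "awsuser",
        "password": "<password>",
    },
}

response = requests.post(f"{AIRBYTE_API}/sources/create", json=source_payload)
response.raise_for_status()
print("Created source:", response.json()["sourceId"])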


Destination setup

● After the Source is configured, proceed to the ‘Set up the destination’ page in Airbyte to configure the destination.

● For the Destination, you will need to create an empty database and warehouse within Snowflake to host your data.

● Give the database a name. In the example below, we have named our database ‘DEV_DB’.

● Now go to Destinations in the Airbyte UI.

● Select your destination (we are using Snowflake here).


(Destination selection)
  • Fill in the Destination setup’s required details.

  • For this example, we are using the ACCOUNTADMIN role in Snowflake, but it is highly recommended to create a custom role with reduced privileges for Airbyte to use. By default, Airbyte writes data to Snowflake's default schema, but you can point it at another schema if you want.

  • Fill in the rest of the details as shown in the screenshots below.



(Destination Setup)

● Connect to the destination

● Hit the “Set up the destination” button, and if everything goes well, you should see a message telling you that all the connection tests have passed.


(Destination Connection Ready)

(To avoid any error, check the details and fill them in correctly)
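
The destination can be scripted in the same way. The sketch below is a hedged Python example of registering the Snowflake destination through the same local configuration API; the endpoint, definition ID, and configuration field names are assumptions that may differ between Airbyte and Snowflake destination connector versions.

# Hypothetical sketch: create the Snowflake destination through Airbyte's
# local configuration API. Endpoint and field names are assumptions -- check
# them against your Airbyte version before use.
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"   # assumed local deployment
WORKSPACE_ID = "<your-workspace-id>"
SNOWFLAKE_DESTINATION_DEFINITION_ID = "<snowflake-destination-definition-id>"

destination_payload = {
    "workspaceId": WORKSPACE_ID,
    "destinationDefinitionId": SNOWFLAKE_DESTINATION_DEFINITION_ID,
    "name": "snowflake-dev-db",
    "connectionConfiguration": {
        # The same details as the 'Set up the destination' form.
        "host": "<account>.snowflakecomputing.com",
        "role": "ACCOUNTADMIN",      # prefer a reduced-privilege custom role
        "warehouse": "COMPUTE_WH",
        "database": "DEV_DB",
        "schema": "PUBLIC",          # default schema; change if needed
        "username": "<user>",
        "password": "<password>",
    },
}

response = requests.post(f"{AIRBYTE_API}/destinations/create", json=destination_payload)
response.raise_for_status()
print("Created destination:", response.json()["destinationId"])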


Data Sync: Set up the connection

Sync the data between the source and the destination:

  • Once the destination is set up, you will be directed to the ‘set up the connection’ screen in Airbyte. We can see that Airbyte has already detected all the tables and schemas to migrate into Snowflake. By default, all of the tables are selected for migration. But if you want to migrate only a subset of the data, you can unselect the tables you wish to skip.

  • You can then specify details such as the sync frequency between the source and the destination, as per your requirements.

  • The granularity of the sync operation can also be set by selecting the correct sync mode for your use case.

  • After we have specified all our customizations, we can click on ‘Set up Connection’ to kick off data migration from Redshift to Snowflake.

  • Finally, you'll be able to see the status of your connection's most recent sync and when it last ran.


(Connection setup for data sync-up)
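
Once the connection exists, syncs can also be triggered programmatically. The snippet below is a hedged sketch of starting a manual sync for an existing connection through the local configuration API; the endpoint name, payload, and response shape are assumptions based on Airbyte's open-source API and may differ in your version.

# Hypothetical sketch: trigger a manual sync for an existing Airbyte connection.
# The connection itself (tables, sync modes, frequency) was configured in the UI;
# endpoint and payload shape are assumptions -- verify against your Airbyte version.
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"   # assumed local deployment
CONNECTION_ID = "<your-connection-id>"         # visible in the connection's URL in the UI

response = requests.post(
    f"{AIRBYTE_API}/connections/sync",
    json={"connectionId": CONNECTION_ID},
)
response.raise_for_status()
job = response.json().get("job", {})
print("Sync job started:", job.get("id"), "status:", job.get("status"))
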
Evaluating the results

After successfully migrating the data, we can evaluate the key Snowflake features that motivated the migration.

  • In Snowflake, compute and storage are decoupled, which makes Snowflake's storage capacity independent of the cluster size.

  • Snowflake also enables us to scale in a few simple steps, which is much simpler and faster than Redshift's cumbersome resize process.
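
A quick sanity check after the migration is to compare row counts between the source and the destination. The sketch below assumes the psycopg2 and snowflake-connector-python packages; the hosts, credentials, and table name are placeholders rather than values from this setup.

# Hedged sketch: compare row counts for one migrated table in Redshift and
# Snowflake. All hosts, credentials, and table names below are placeholders.
import psycopg2
import snowflake.connector

TABLE = "orders"   # hypothetical table name

# Source: Redshift
with psycopg2.connect(
    host="redshift-cluster.xxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="<password>",
) as redshift_conn:
    with redshift_conn.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM public.{TABLE}")
        redshift_count = cur.fetchone()[0]

# Destination: Snowflake
snowflake_conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    warehouse="COMPUTE_WH",
    database="DEV_DB",
    schema="PUBLIC",
)
try:
    cur = snowflake_conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {TABLE}")
    snowflake_count = cur.fetchone()[0]
finally:
    snowflake_conn.close()

print(f"Redshift: {redshift_count} rows, Snowflake: {snowflake_count} rows")
assert redshift_count == snowflake_count, "Row counts differ; investigate the sync."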
