Author: Sanket Nalawade, Sachin Rai
Replication involves writing or copying the same data to different locations. For example, data can be copied between two on-premises hosts, between hosts in other locations, to multiple storage devices on the same host, or to or from a cloud-based host. Data can be copied on demand, transferred in bulk or batches according to a schedule, or replicated in real time as the data is written, changed, or deleted in the master source.
The Replication of data is done mainly by below two ways:
ETL(Extraction Transformation Load) / ELT(Extraction Load Transformation)
It is typically used to move data from one location to another with a transformation during the migration.
Real time data replication
How does it work?
It delivers data on records that changed for database functions such as inserts, updates, and deletes and makes a record of that change available either within a database itself or to other applications that rely on the data in a near real time manner.
CDC tools typically rely on the database’s transaction log, which keeps track internally of record changes for system recovery. CDC tools leverage that information to deliver database changes to an external system.
The use of timestamps.
Database transaction logs.
A database triggers log or publish change events to a table and shares those changes with the CDC system.
Qlik Replicate is a simple, powerful, easy-to-implement solution that provides Replication between various endpoints. Replicate lets you:
Load data efficiently and quickly to operational data stores/warehouses
Create copies of production endpoints
Distribute data across endpoints
Types of Replication supported:
Full Load Replication: Creates files or tables on the target endpoint, automatically defines the metadata that is required on the target, and populates the tables with data from the source
Change Processing, also called Change Data Capture (CDC): Captures changes in the source data or metadata as they occur and applies them to the target endpoint as soon as possible, in near-real time.
Store Changes: Captures changes in the source data or metadata as they occur but does not apply them to the target.
Replication is log based, which means that it reads only the changes. This reduces the impact on the source endpoints.
Qlik Replicate to ingest and deliver data in real-time to the Snowflake environment
Step 1: Spin up a Qlik Replicate cluster either from the Snowflake partner connect page or its official website.
Once you are logged in, you will see the main screen for Qlik Replicate.
Step 2: MySQL Source Configuration The first thing we need to do is create a source endpoint. We do this by clicking the Manage Endpoint Connections button at the top of the screen.
From there, click on Add New Endpoint Connection link or the + New Endpoint Connection button at the top of the screen.
We will now create a MySQL source endpoint:
Replace the text New Endpoint Connection 1 with something more descriptive like MySQL-Source,
make sure the Source radio button is selected,
and then choose MySQL from the dropdown selection box.
You will notice as we proceed that the content of the configuration window is context-sensitive.
Fill in the blanks as indicated in the images above:
Security/SSL Mode: None
And then click on Test Connection. Your screen should look like the following, indicating that your connection succeeded.
Assuming so, click Save, and the configuration of your MySQL source endpoint is complete. Click Close to close the window.
Step 2 — Snowflake Target Configuration
Similarly, create a Snowflake connection.
The below screenshot is taken from a Snowflake on AWS configuration, but the process is the same for Azure and Google.
Now click on the arrow (>) next to “Staging”:
Your options on AWS look like this:
Once you have configured your staging area, click on Test Connection. Your screen should look like the following, indicating that your connection succeeded.
Assuming so, click Save, and the configuration of your Snowflake on AWS target endpoint is complete. Click Close to close the window.
Step 3 — Configure Your Task
Now that we have configured our MySQL source and Snowflake target endpoints, we need to tie them together in a Replicate task. In short, a task defines the following:
A source endpoint
A target endpoint
The list of tables that we want to capture
Any transformations we want to make on the data.
We need to select a source endpoint (MySQL) and a target endpoint (Snowflake) to configure our task. You can either drag the MySQL Source endpoint from the box on the left of the screen and drop it into the circle that says Drop source endpoint here, or you can click on the arrow that appears just to the right of the endpoint when you highlight it. Repeat the same process for the Snowflake Target endpoint. Your screen should now look like this:
We have worked with some sample datasets in Mysql(Source) and replicated the data into Snowflake(Target).
Tables Used: offices, Customers
TASK 1: Filtering the records during Replication.
Source Database: MySQL Workbench
The objective is to filter the data with ‘officeCode’ value between 4 to 7
Step 1) Create a task in Qlik Replicate
To get started, we need to create a task. Click on the + New Task button at the top of the screen.
Once you do, a window like this will pop up:
Give this task a meaningful name like MySQL-to-Snowflake. For this task, we will take the defaults:
Full Load: enabled (Blue highlight is enabled; click to enable / disable.)
Apply Changes: enabled (Blue highlight is enabled; click to enable / disable.)
Store Changes: disabled (Blue highlight is enabled; click to enable / disable.)
Step 2) In Designer Mode, Select table ‘offices’ from schema ‘classicmodels’
Step 3) Add the filter condition to separate the ‘officeCode’ records from the table named ’offices.’
Step 4) Start the task, and after completion, analyze the result in the Monitor section.
Step 5) We will get the result in the target Snowflake database with only 4 records as expected.
TASK 2: Concatenation of existing columns from source data and introducing new columns at Target database
Source Database: MySQL Workbench
Here we concatenate two columns ‘ContactFirstName’ and ‘ContactLastName’ and introduce a new column, ‘contact full_name’into the target snowflake database.
Step1) Create a task in Qlik Replicate
Step 2) In Designer Mode, Select table ‘Customers’ from schema ‘classicmodels’
Step3) Click on Add column Button to add a new column, as shown below.
Step 4) Add logic to newly added column ’contact full_name’ as shown below:
Step 5) Save the work and run the task after successful completion in the monitor section. Check for the desired result in Snowflake.
TASK 3: Replicating the entire schema(exact copy)
Source: MySQL Workbench Target: Snowflake
Objective: Replicating the source schema to target as it is.
Step 1) Create a task in Qlik Replicate with a meaningful name and Replication Profile as Unidirectional.
Step 2) In Designer mode, click on Table Selection.
Step 3) Click on the drop down to select source schema and all the tables for Replication.
Step 4) Run the task and monitor the progress.