Authors: Ganesh Goel & Navanil Roy
In any development process, logging is one of the most important parts, and there are multiple approaches to it. This article describes an automated process that collects the logs of every component running in Matillion pipelines and stores them in Snowflake.
To make effective use of logging, storing logs alone is not sufficient. Sending out notifications of failure and success is equally important to make the review process fast.
Fig: The orchestration job for logging and email alerting.
Before setting up the pipeline, make sure the environment is configured correctly and that the roles and warehouse Matillion will use are set up in Snowflake.
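As a rough sketch, the Snowflake side of that setup might look like the following. All names here (warehouse, role, database, user) are illustrative assumptions, not taken from the article; adjust them to your environment.

```sql
-- Illustrative names; replace with your own
create warehouse if not exists MATILLION_WH
  with warehouse_size = 'XSMALL' auto_suspend = 60;
create role if not exists MATILLION_ROLE;
create database if not exists LOGGING_DB;

-- Let the role use the warehouse and write log tables
grant usage on warehouse MATILLION_WH to role MATILLION_ROLE;
grant usage on database LOGGING_DB to role MATILLION_ROLE;
grant usage, create table on all schemas in database LOGGING_DB to role MATILLION_ROLE;

-- Assign the role to the user Matillion connects as
grant role MATILLION_ROLE to user MATILLION_USER;
```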
Steps for setting up the Logging pipeline in Matillion
1. Creation of Job variables
Create three job variables to be used later in the job.
2. Check for existing logged data
For this, use the Query Result To Scalar component. It checks whether there is any existing data in the Snowflake log table. If data is present, the component stores the count of rows in the job variable created above, so that those rows are not loaded again.
Now, using the If component, branch the pipeline into two sections based on whether the row count is less than 1. If rows are present in the log table, use another Query Result To Scalar component to store the last run_id from the log table.
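The Query Result To Scalar component for this check only needs a simple count query. A minimal example, assuming the Snowflake log table is named MTLN_LOGS (the article does not name it):

```sql
-- Map the resulting scalar to the row-count job variable
select count(*) from MTLN_LOGS;
```

The If component's condition then simply compares that job variable against 1 to choose the branch.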
3. Fetch the last RunID
This is to prevent duplicate logs in case there are existing log records present in the log table (If there are no logs present in the table, you can skip this step).
For this, use the Query Result To Scalar component to fetch the last run_id and map it to the job variable created above.
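A minimal query for this component might look like the following (again assuming the log table is named MTLN_LOGS; the column name matches the "RUNID" alias used in the extraction query later in this article):

```sql
-- Map the resulting scalar to the last-run-id job variable
select max("RUNID") from MTLN_LOGS;
```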
4. Get Log Data
To get the log records, you need to query Matillion's internal PostgreSQL database and then parse the required data from JSON into a tabular format. This step uses the Database Query component. Below are the connection details for Matillion's PostgreSQL database.
JDBC URL: jdbc:postgresql://localhost/
Password: empty (there is no password)
Below is a sample query that gets data from this database and loads it incrementally into the logs table when existing records are present. For an empty table, the where clause can be removed and the flow adjusted accordingly.
select * from (
  select
    y.r AS "ID",
    t.*
  from (
    select
      batchid AS "RUNID",
      rec::json->'task'->>'jobName' AS "JOB_NAME",
      rec::json->>'taskID' AS "TASK_ID",
      rec::json->'task'->>'componentName' AS "COMPONENT_NAME",
      rec::json->'task'->>'startTime' AS "START_TIME",
      rec::json->'task'->>'endTime' AS "END_TIME",
      rec::json->'task'->>'state' AS "STATE",
      rec::json->'task'->>'message' AS "MESSAGE"
    from task_package_history
  ) t
  join (
    select row_number() over (order by batchid) r, batchid
    from (select distinct batchid from task_package_history) g
  ) y on y.batchid = t."RUNID"
  -- ${last_run_id} is the job variable populated in step 3;
  -- remove this clause when the log table is empty
  where t."RUNID" > ${last_run_id}
) h
order by "ID" desc, "TASK_ID"::int
Below is a sample log table generated in Snowflake.
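For reference, a log table holding the columns produced by the extraction query above, plus the NOTIFICATION_SENT flag used later for alerting, could be created along these lines. The table name and column types are assumptions based on the query, not definitions from the article.

```sql
create table if not exists MTLN_LOGS (
  "ID"                 number,           -- row_number assigned per run
  "RUNID"              number,           -- Matillion batchid
  "JOB_NAME"           varchar,
  "TASK_ID"            varchar,
  "COMPONENT_NAME"     varchar,
  "START_TIME"         varchar,
  "END_TIME"           varchar,
  "STATE"              varchar,          -- e.g. success/failure state of the task
  "MESSAGE"            varchar,
  "NOTIFICATION_SENT"  varchar default 'NO'  -- flipped to 'YES' after alerting
);
```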
Steps for setting up the Email Alerting in the same pipeline in Matillion
1. Check if any errors are logged
There is a possibility that no new errors were logged (in that case, you need to send a success email). So, use the Query Result To Scalar component to store the count of errors in the log table, so that success and failure emails can be sent separately.
Now use the If component to branch the pipeline into two parts based on whether the error count is less than 1.
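The error count can be fetched with a query along these lines. The table name and the exact STATE value are assumptions (the article does not specify them), so check what the state field in your log table actually contains:

```sql
-- Count only errors that have not yet been notified
select count(*)
from MTLN_LOGS
where "STATE" = 'FAILED'
  and "NOTIFICATION_SENT" = 'NO';
```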
2. Use the Send Email component
If there are no errors, the Send Email component can directly send a successful-run email based on its configuration. After the success email is sent, use the SQL Script component to set the NOTIFICATION_SENT status to 'YES', so that emails for those logs are not sent again.
Now, if there are errors present in the log, configure another Send Email component with the error message you want and send it to the required users. After the failure email is sent, use the SQL Script component to set NOTIFICATION_SENT to 'YES' for those rows as well.
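The statement run after each email can be a simple update (table name assumed, as above):

```sql
-- Mark all pending log rows as notified
update MTLN_LOGS
set "NOTIFICATION_SENT" = 'YES'
where "NOTIFICATION_SENT" = 'NO';
```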
3. Import the child job into the parent
Import this child orchestration job into the parent orchestration job and run the parent job in the correct environment. You will receive a success email if no errors are logged and there are logs for which a notification has not yet been sent, and a failure email if new errors are logged for which a notification has not been sent.
Below are the sample emails in each of the cases:
Successful Run email
Failure Run email