Matillion - Logging And Email Alerts

Updated: Aug 1

Authors: Ganesh Goel & Navanil Roy


Overview

In any development process, logging is the most important part. There are multiple approaches for it. This article highlights one such automated process which will collect logs of each and every component running in the Matillion pipelines and store them(logs) in Snowflake.


In order to make effective use of logging, just storing logs is not sufficient. Sending out a notification of failure and success is equally important in order to make the review process fast.


Fig. Below is the orchestration job for Logging and email alerting.


Parent Orchestration job for Logging and Email Alerts
Prerequisites
  • Environment setup is done with correct configuration

  • Roles, Warehouse Set-up in Snowflake, for Matillion to use

Implementation

Steps for setting up the Logging pipeline in Matillion


1. Creation of an Job variables


Create three job variables to be used later in the job.



2. Check for existing logged data


For this you need to import the Query Result to Scalar component. This component will check if there is any existing data in the above Snowflake log table. If there is any data present in the table, this component will store the count of rows present in the environment variable created above, so that those rows will not be added again.



Now using the ‘If’ component, the pipeline will be branched into two sections based on whether the count of rows is less than 1 or not. If there are any rows present in the log table, then import the Query Result to Scalar component to store the last run_id from the log table.


3. Fetch last RunID


This is to prevent duplicate logs in case there are existing log records present in the log table (If there are no logs present in the table, you can skip this step).

For this you need to use the Query Result to Scalar component and fetch the last id and then map it to a job variable created above.



4. Get Log Data


For getting the log records you need to query the Matillion’s PostgreSQL database and then parse the required data from JSON to a tabular format. This step uses the Database Query component. Below are the connection details of the Matillion’s PostgreSQL database.


JDBC URL: jdbc:postgresql://localhost/

Username: postgres

Password: empty(there is no password)


Below is a sample query to get data from this database and load it incrementally in the logs table, in case there are existing records. For an empty table, the where clause can be removed and the flow can be made accordingly.


select * from (select

y.r "ID",

x."JOB_NAME",

x."TASK_ID"::text "TASK_ID",

x."COMPONENT_NAME",

TO_TIMESTAMP(x."START_TIME"::int8/1000.0) "START_TIME",

TO_TIMESTAMP(x."END_TIME"::int8/1000.0) "END_TIME",

x."STATE",

x."MESSAGE",

'NO' "NOTIFICATION_SENT"

from(

select

batchid AS "RUNID",

rec::json->'task'->>'jobName' AS "JOB_NAME",

rec::json->'taskID' AS "TASK_ID",

rec::json->'task'->>'componentName' AS "COMPONENT_NAME",

rec::json->'task'->>'startTime'::text AS "START_TIME",

rec::json->'task'->>'endTime'::text AS "END_TIME",

rec::json->'task'->>'state' AS "STATE",

rec::json->'task'->>'message' AS "MESSAGE"

from task_package_history

order by batchid desc

) x

inner join

(

select row_number() over (partition by 1 order by batchid) r, batchid from

(select distinct batchid

from task_package_history) g group by batchid) y

on batchid="RUNID") h

where "ID">${l_id_job}

order by "ID" desc, "TASK_ID"::int



Below is a sample log table generated in Snowflake


Results after running the above SQL query
Steps for setting up the Email Alerting in the same pipeline in Matillion

1. Check if any errors are logged


There might be a possibility that no new errors are logged(in that case you need to send a success email). So, for this you need to import the Query Result to Scalar component. This component will be used to store count of errors in the log table, so that success and failure emails can be sent separately.



Now import the ‘If’ component to branch the pipeline into two parts based on whether the count of errors is less than 1 or not.


2. Use Send email component


If there are no errors then the send email component can directly send a successful run email based on the configuration. After sending a successful run email, import SQL query component to set the NOTIFICATION_SENT status to ‘YES’, so that email regarding those logs will not be sent again.

Now, if there are errors present in the log then you can configure another send email component with the error message you want and send it to the required users. After sending a failure email, import the SQL query component to set the NOTIFICATION_SENT status to ‘YES’, so that email regarding those logs will not be sent again.


3. Import Child job in Parent


Import this child orchestration job in the parent orchestration job and run the parent job in the correct environment to get the successful(If there are no errors logged and logs are present for which notification has not been sent) and failure emails(If there are new errors logged for which notification has not been sent).

Below are the sample emails in each of the cases:

  • Successful Run email



  • Failure Run email



185 views0 comments

Recent Posts

See All