
DBT Testing And CI/CD

Updated: Mar 15

Author: Amulya Nidhi


Overview Of DBT

Data Build Tool (DBT) is a tool for transforming data that already resides in the warehouse: it handles the transform step of ELT and is not responsible for the extract and load steps.


DBT also supports data testing and CI/CD, and in our project we have used both, as described below.


Use of DBT

A DBT project is a directory that contains .sql and .yml files. Models are .sql files, each containing a select statement, and a project file dbt_project.yml defines the DBT project. By default, each model is built as a view in the database, but it can be configured to be built as a table instead.
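
For example, the materialization can be switched from view to table either in dbt_project.yml or inside a model file; a minimal sketch, assuming illustrative project and folder names (not taken from our project):

# dbt_project.yml (illustrative)
models:
  my_dbt_project:
    +materialized: view        # project-wide default: build models as views
    transport_logistics:
      +materialized: table     # build the models in this folder as tables

The same override can be placed at the top of a single model file as {{ config(materialized='table') }}.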


Getting Started with DBT


First, we created a DBT account and connected it to a Bitbucket repository to store all the code.

Snowflake is the data warehouse we used for the database connection. The transformed source data is taken from the Snowflake database db_dev.


We built models for each sector and deployed them to the DBT schema of the production_env database.




Creation of Models

The models created in our project are shown below.


The models in our project are named after the stocks they reference.


The screenshot above shows the four models created for the two sectors.


One is the Transport_Logistics sector, in which we created three stock models: accuracy, jitfinfra, and total.


The other is the Banking sector, in which we created one stock model: aubank.


Code snippet for the .sql file of the accuracy model


select
    DATE,
    OPEN,
    HIGH,
    LOW,
    CLOSE,
    ADJCLOSE,
    VOLUME
from {{ source('transport_sector', 'ACCURACY_TRANSFORMATION') }}


Code snippet for the .sql file of the jitfinfra model


select
    DATE,
    OPEN,
    HIGH,
    LOW,
    CLOSE,
    ADJCLOSE,
    VOLUME
from {{ source('transport_sector', 'JITFINFRA_TRANSFORMATION') }}


Code snippet for the .sql file of the total model


select
    DATE,
    OPEN,
    HIGH,
    LOW,
    CLOSE,
    ADJCLOSE,
    VOLUME
from {{ source('transport_sector', 'TOTAL_TRANSFORMATION') }}


Code snippet for the .sql file of the aubank model


select
    DATE,
    OPEN,
    HIGH,
    LOW,
    CLOSE,
    ADJCLOSE,
    VOLUME
from {{ source('banking_sector', 'AUBANK_TRANSFORMATION') }}


The models inside the DBT project are run with the command dbt run.
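
dbt run also accepts a selector, so a single model can be rebuilt on its own:

dbt run                     # build every model in the project
dbt run --select accuracy   # build only the accuracy model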



We can confirm the created objects in Snowflake.


Go to the classic console of the Snowflake web UI and refresh the database objects. Under the PRODUCTION_ENV database, in the DBT schema, we can see all the tables.
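
The same check can be run from a Snowflake worksheet instead of the UI; a minimal sketch, assuming the models landed as tables named after their model files:

-- list everything DBT materialized in the target schema
SHOW TABLES IN SCHEMA PRODUCTION_ENV.DBT;

-- spot-check one model's output (table name assumed from the model name)
select * from PRODUCTION_ENV.DBT.ACCURACY limit 10;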



We have created a source.yml file declaring the sources used by the models.


Code snippet for the source.yml file

version: 2

sources:
  - name: transport_sector
    database: db_dev
    schema: TRANSPORT_LOGISTICS
    tables:
      - name: ACCURACY_TRANSFORMATION
      - name: JITFINFRA_TRANSFORMATION
      - name: TOTAL_TRANSFORMATION

  - name: banking_sector
    database: db_dev
    schema: BANKING
    tables:
      - name: AUBANK_TRANSFORMATION


Testing

Generic Test


Generic tests run against specific columns of a model. In our project, we performed generic tests on the date column; the two built-in tests we used are unique and not_null.


The unique test asserts that the values stored in the date column are distinct, and the not_null test asserts that the column contains no null values.
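
Under the hood, each generic test compiles into a query that selects the rows violating the assertion, and the test passes when that query returns no rows. Roughly, as a hand-written sketch rather than DBT's exact compiled SQL (table name assumed from the model name):

-- not_null on DATE: any row returned is a failure
select DATE
from PRODUCTION_ENV.DBT.ACCURACY
where DATE is null;

-- unique on DATE: any value returned appears more than once
select DATE, count(*) as n
from PRODUCTION_ENV.DBT.ACCURACY
group by DATE
having count(*) > 1;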


Write dbt test in the command line to run all the tests in the project.



We can also test one particular model; for example, typing dbt test --select accuracy runs all the tests defined for the model named accuracy.


Code snippet for the test.yml file


version: 2

models:
  - name: accuracy
    columns:
      - name: DATE
        tests:
          - unique
          - not_null

  - name: jitfinfra
    columns:
      - name: DATE
        tests:
          - unique
          - not_null

  - name: total
    columns:
      - name: DATE
        tests:
          - unique
          - not_null

  - name: aubank
    columns:
      - name: DATE
        tests:
          - unique
          - not_null


Singular Test

Singular tests are standalone .sql files that run against entire models rather than individual columns. We ran a singular test across the stock models accuracy, jitfinfra, total, and aubank, checking that the open, close, high, and low values are never negative.


Run the command dbt test --select open_check in the terminal to check that the open column in our models contains no negative values. Like generic tests, a singular test passes when its query returns zero rows, which is why the query below selects the offending rows.



Code snippet for the open_check.sql file


with open_value as (
    select * from {{ ref('accuracy') }}
    union
    select * from {{ ref('jitfinfra') }}
    union
    select * from {{ ref('total') }}
    union
    select * from {{ ref('aubank') }}
)

select open
from open_value
where open < 0


We used the same pattern to check that the high, close, and low values are never negative.
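
Rather than maintaining four nearly identical files, the checks could be collapsed into one singular test with a Jinja loop; a minimal sketch under an assumed file name negative_value_check.sql (not part of our project):

-- negative_value_check.sql (illustrative)
-- One singular test covering all four price columns across all four models.
{% set price_columns = ['open', 'high', 'low', 'close'] %}

with all_rows as (
    select * from {{ ref('accuracy') }}
    union all
    select * from {{ ref('jitfinfra') }}
    union all
    select * from {{ ref('total') }}
    union all
    select * from {{ ref('aubank') }}
)

select *
from all_rows
where {% for col in price_columns %}{{ col }} < 0{% if not loop.last %} or {% endif %}{% endfor %}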


Generate Docs

Write dbt docs generate on the command line to generate the documentation.

Click on view docs.
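
In dbt Cloud, view docs opens the generated site in the browser; with dbt Core on a local machine, the equivalent would be (assuming a local dbt installation):

dbt docs generate   # compile the catalog and manifest
dbt docs serve      # host the docs site locally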


The docs include each model's SQL, the tests defined on it, the DAG (lineage) graph, and more.



The above is the lineage graph of the aubank model.



After committing in DBT, we created a pull request in the Bitbucket repository to merge the code.



Once the pull request is created, it is sent to a reviewer for approval, after which the request is merged.

Setting Up The Production Environment

Click on the hamburger menu and select Environments.



We can see two environments, one for development and testing and the other for deployment.



We created a prod environment to handle the scheduled refresh of the modeled schemas.



DBT Refresh Schedule

We have created a Prod data loading schedule in the prod environment and scheduled it every day at 6:00 AM UTC.
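
In dbt Cloud, such a job pairs its commands with a cron trigger; an illustrative outline of the settings (the job name is taken from the schedule above, the commands are our assumption):

Job: Prod data loading
  Commands:
    dbt run
    dbt test
  Schedule (cron, UTC): 0 6 * * *   # every day at 06:00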




Setting Up Slack Notifications For The Job Execution

We have integrated a Slack account with DBT.



After integrating, under the notifications section we can choose which notifications should be delivered to Slack.



Slack notifications for the job execution are shown below:



Conclusion

We have covered the creation of models for stocks in two sectors and the generic and singular tests performed on them. We have also seen how to run and test all models as well as a single one, walked through the Continuous Integration and Continuous Deployment setup, and discussed generating documentation for the models.

