Author: Amulya Nidhi
Overview Of DBT
Data Build Tool (DBT) is widely used for transforming data inside the data warehouse.
It handles only the transform step of ELT; it is not responsible for the extract and load steps.
DBT also supports data testing and CI/CD, both of which we have used in our project.
Use of DBT
A DBT project is a directory that contains .sql and .yml files. Models are .sql files written as select statements. The project file dbt_project.yml defines our DBT project. By default, a model is built as a view in the database, but we can configure it to be built as a table instead.
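For illustration, here is a minimal sketch of a model .sql file that overrides the default and is built as a table (the config block is standard DBT syntax; the table name is a placeholder):
{{ config(materialized='table') }}

select *
from MY_SOURCE_TABLE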
Getting Started with DBT

First, we created a DBT account and connected it to a Bitbucket repository to store all the code.
The data warehouse we have used for the database connection is Snowflake.
The transformed data is read from the db_dev database in Snowflake.
We have built the models for each sector and deployed them in the DBT schema of the PRODUCTION_ENV database.


Creation of Models
The models created in our project are shown below.

The models we have made in our project are named after the stocks they reference.
Four models were created for two sectors.
One is the Transport_Logistics sector, in which we have created three stock models named accuracy, jitfinfra, and total.
The other is the Banking sector, in which we have created one stock model named aubank.
Code snippet for the .sql file (i.e., the accuracy model)
select
    DATE,
    OPEN,
    HIGH,
    LOW,
    CLOSE,
    ADJCLOSE,
    VOLUME
from {{ source('transport_sector', 'ACCURACY_TRANSFORMATION') }}
Code snippet for the .sql file (i.e., the jitfinfra model)
select
    DATE,
    OPEN,
    HIGH,
    LOW,
    CLOSE,
    ADJCLOSE,
    VOLUME
from {{ source('transport_sector', 'JITFINFRA_TRANSFORMATION') }}
Code snippet for the .sql file (i.e., the total model)
select
    DATE,
    OPEN,
    HIGH,
    LOW,
    CLOSE,
    ADJCLOSE,
    VOLUME
from {{ source('transport_sector', 'TOTAL_TRANSFORMATION') }}
Code snippet for the .sql file (i.e., the aubank model)
select
    DATE,
    OPEN,
    HIGH,
    LOW,
    CLOSE,
    ADJCLOSE,
    VOLUME
from {{ source('banking_sector', 'AUBANK_TRANSFORMATION') }}
The models in DBT are run with the dbt run command.
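A single model can also be built on its own; for example, to run only the accuracy model:
dbt run --select accuracy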

We can confirm the objects in Snowflake.
Go to the classic console of the Snowflake web UI and refresh the database objects. Under the PRODUCTION_ENV database, in the DBT schema, we can see all the tables.
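Alternatively, the same check can be done from a Snowflake worksheet with a standard command:
show tables in schema PRODUCTION_ENV.DBT;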

We have created a source.yml file declaring the upstream tables that the models reference through their source() calls.
Code snippet for the .yml file
version: 2

sources:
  - name: transport_sector
    database: db_dev
    schema: TRANSPORT_LOGISTICS
    tables:
      - name: ACCURACY_TRANSFORMATION
      - name: JITFINFRA_TRANSFORMATION
      - name: TOTAL_TRANSFORMATION
  - name: banking_sector
    database: db_dev
    schema: BANKING
    tables:
      - name: AUBANK_TRANSFORMATION
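For reference, when a model such as accuracy is compiled, DBT resolves its source() call against this file, so the compiled SQL reads from the fully qualified table (a sketch of the compiled output):
select
    DATE, OPEN, HIGH, LOW, CLOSE, ADJCLOSE, VOLUME
from db_dev.TRANSPORT_LOGISTICS.ACCURACY_TRANSFORMATION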
Testing
Generic Test
Generic tests are run on specific columns of a model. In our project, we performed generic tests on the date column. The two built-in tests applied to the date column are unique and not_null.
The unique test verifies that the values stored in the date column are distinct, and the not_null test verifies that no value in the date column is null.
Run dbt test on the command line to test all the models.

We can also test one particular model (let's take the accuracy model): typing dbt test --select accuracy will run all the tests defined for the model named accuracy.
Code snippet for the test.yml file
version: 2

models:
  - name: accuracy
    columns:
      - name: DATE
        tests:
          - unique
          - not_null
  - name: jitfinfra
    columns:
      - name: DATE
        tests:
          - unique
          - not_null
  - name: total
    columns:
      - name: DATE
        tests:
          - unique
          - not_null
  - name: aubank
    columns:
      - name: DATE
        tests:
          - unique
          - not_null
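Under the hood, each generic test compiles to a query that selects violating rows, and a test passes only when its query returns no rows. As a sketch (the exact SQL DBT generates may differ), the unique test on the date column of the accuracy model is equivalent to:
select DATE
from {{ ref('accuracy') }}
group by DATE
having count(*) > 1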
Singular Test
Singular tests are run against an entire model rather than a single column, and a singular test fails if its query returns any rows. We have run singular tests across the stock models accuracy, jitfinfra, total, and aubank, checking that the open, close, high, and low values are never negative.
Run the command dbt test --select open_check in the terminal to test that the open column in our models is never negative.

Code snippet for the open_check.sql file
with open_value as (
    select * from {{ ref('accuracy') }}
    union
    select * from {{ ref('jitfinfra') }}
    union
    select * from {{ ref('total') }}
    union
    select * from {{ ref('aubank') }}
)
select OPEN
from open_value
where OPEN < 0
We have used the same pattern to check that the high, close, and low values are never negative, as sketched below.
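As an illustration, the corresponding high_check.sql would follow the same pattern (a sketch; only the column under test changes):
with high_value as (
    select * from {{ ref('accuracy') }}
    union
    select * from {{ ref('jitfinfra') }}
    union
    select * from {{ ref('total') }}
    union
    select * from {{ ref('aubank') }}
)
select HIGH
from high_value
where HIGH < 0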
Generate Docs
Run dbt docs generate on the command line to generate the documentation.
Click on view docs.
The docs include the details of each model's SQL code, the tests performed on the model, the DAG (lineage) graph, and more.

The above is the lineage graph of the aubank model.

After committing in DBT, we created a pull request in the Bitbucket repository to merge the code.


Once the pull request is created, it is sent to the reviewer for approval, after which the request is merged.
Setting Up The Production Environment
Click on the hamburger menu and select Environments.

We can see two environments, one for development and testing and the other for deployment.

We have created a prod environment for refreshing the model schemas.

DBT Refresh Schedule
We have created a Prod data loading job in the prod environment and scheduled it to run every day at 6:00 AM UTC.
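For reference, assuming the job uses DBT's custom cron scheduling, the equivalent cron expression would be:
0 6 * * *
i.e., minute 0 of hour 6 (UTC), every day.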


Setting Up Slack Notifications For The Job Execution
We have integrated the Slack account with DBT.

After integrating, if we go to the Notifications section, we have the option to choose which notifications we want displayed on Slack.

Slack notifications for the job execution are shown below:

Conclusion
We have covered the creation of models for the stocks of two sectors and performed generic and singular tests on those models. We have also seen how to run and test all the models as well as a single model, walked through Continuous Integration and Continuous Deployment, and discussed the generation of documentation for the models.