
Bank Churn Analysis Using Machine Learning

Updated: Mar 9

Author: Anmol Gaba


Objective:

To predict customer churn on a quarterly basis using machine learning models so that the bank can take the necessary measures to retain customers.


This analysis is implemented in DATAIKU, which provides features such as data preparation, visualization, MLOps, and analytics.

DATAIKU integrates easily with SNOWFLAKE through SNOWFLAKE's Partner Connect.


Prerequisites:

A DATAIKU account
A SNOWFLAKE account

Steps:

1) Open SNOWFLAKE, click Partner Connect, search for DATAIKU, and click its icon.



Once the connection opens, click Launch. This opens the DATAIKU DSS Launchpad, from which we can start the services; manage users, roles, groups, and plugins; and manage subscriptions.



After the services are turned on, DSS takes some time to load. Once it is ready, you can create or import projects and browse existing projects and tutorials.


2) For our use case, we created a new project. Once the project is created, we create the datasets and recipes.



Importing the datasets and applying recipes builds the data flow, on which we later perform operations such as visualization and AUTOML.


3) In the Customer Churn Analysis project:


We created the data flow below, which consists of datasets, recipes, SQL scripts, and an ML model, to generate predictions from multiple features in our dataset.



In the diagram above, the data comes from three tables: the Customer table, which holds customer-specific data; the services table, which holds service-related data such as credit card, UPI, and Demat account; and the transactions table, which holds transaction-related data such as amount, mode, transaction ID, and account ID.




Apart from custom recipes, DATAIKU also offers built-in recipes that we simply drag and drop into the data flow. These include recipes such as group, distinct, join, merge, and split, along with code recipes in languages such as SQL and Python.
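To make the group and distinct recipes more concrete, here is a minimal plain-Python sketch of what a group recipe keyed on the customer does to the transactions table. It is not DATAIKU code: the field names (customer_id, amount, mode) and the toy rows are hypothetical.

```python
from collections import defaultdict

def group_transactions(transactions):
    """Aggregate transaction rows per customer, similar to a group recipe keyed on customer_id."""
    summary = defaultdict(lambda: {"txn_count": 0, "total_amount": 0.0, "modes": set()})
    for txn in transactions:
        agg = summary[txn["customer_id"]]
        agg["txn_count"] += 1
        agg["total_amount"] += txn["amount"]
        agg["modes"].add(txn["mode"])  # distinct payment modes, like a distinct recipe
    # Convert the sets to sorted lists so the output is deterministic.
    return {cid: {**agg, "modes": sorted(agg["modes"])} for cid, agg in summary.items()}

# Hypothetical toy rows; the column names are assumptions, not the article's schema.
transactions = [
    {"transaction_id": 1, "customer_id": "C1", "amount": 120.0, "mode": "UPI"},
    {"transaction_id": 2, "customer_id": "C1", "amount": 80.0, "mode": "Credit card"},
    {"transaction_id": 3, "customer_id": "C2", "amount": 50.0, "mode": "UPI"},
]
summary = group_transactions(transactions)
print(summary)
```

Each customer ends up as one row with a transaction count, a total amount, and the distinct modes used, which is the shape a per-customer feature table needs before joining.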





4) After importing the datasets, we perform data cleansing and preprocessing on them.
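The article does not list the exact cleansing steps, so the following is only a sketch of typical ones, assuming three hypothetical steps: dropping duplicate customer records, filling missing ages with the median age, and encoding gender as 0/1. The field names and the encoding are illustrative assumptions.

```python
import statistics

def clean_customers(rows):
    """Drop duplicate customers, impute missing ages with the median, and encode gender as 0/1."""
    seen, deduped = set(), []
    for row in rows:
        if row["customer_id"] in seen:
            continue  # keep only the first record per customer
        seen.add(row["customer_id"])
        deduped.append(dict(row))  # copy so the input rows are not mutated
    median_age = statistics.median(r["age"] for r in deduped if r["age"] is not None)
    for r in deduped:
        if r["age"] is None:
            r["age"] = median_age
        # Hypothetical encoding: 1 for "Female", 0 otherwise.
        r["gender"] = 1 if r["gender"] == "Female" else 0
    return deduped

rows = [
    {"customer_id": "C1", "age": 34, "gender": "Female"},
    {"customer_id": "C1", "age": 34, "gender": "Female"},  # duplicate record
    {"customer_id": "C2", "age": None, "gender": "Male"},  # missing age
    {"customer_id": "C3", "age": 50, "gender": "Male"},
]
cleaned = clean_customers(rows)
print(cleaned)
```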


Once all three data sources were ready, we joined them using an inner join to create a new dataset that is ready for machine learning algorithms.
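An inner join keeps only the customers that appear in every source. The sketch below shows this in plain Python, assuming the three sources share a hypothetical customer_id key and that the transactions have already been summarized to one row per customer; both assumptions are not stated in the article.

```python
def inner_join(left, right, key):
    """Inner join on `key`: keep left rows whose key also appears in right, merging their fields.

    Assumes `key` is unique in `right` (one row per customer).
    """
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]

# Hypothetical toy tables.
customers = [
    {"customer_id": "C1", "age": 34, "balance": 1500.0},
    {"customer_id": "C2", "age": 42, "balance": 300.0},
    {"customer_id": "C3", "age": 50, "balance": 9000.0},
]
services = [
    {"customer_id": "C1", "credit_card": 1, "upi": 1, "demat": 0},
    {"customer_id": "C3", "credit_card": 0, "upi": 1, "demat": 1},
]
txn_summary = [
    {"customer_id": "C1", "txn_count": 2, "total_amount": 200.0},
    {"customer_id": "C2", "txn_count": 1, "total_amount": 50.0},
    {"customer_id": "C3", "txn_count": 5, "total_amount": 700.0},
]

joined = inner_join(inner_join(customers, services, "customer_id"), txn_summary, "customer_id")
print([row["customer_id"] for row in joined])  # ['C1', 'C3']
```

Customer C2 has no services row, so the inner join drops it. This is why an inner join can shrink the training set, and it is worth checking how many customers are lost at this step.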



The dataset above has around 17 features, including the services each customer uses, age, balance, gender, qualification, and inactivity. These act as independent features, while CHURN is the dependent (target) feature.
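Separating the independent features from the CHURN target, and holding out some rows for evaluation, can be sketched as follows. The YES/NO encoding of CHURN, the exclusion of the customer_id identifier, and the 25% hold-out fraction are assumptions for illustration. The split function is a hand-rolled stand-in, not a DATAIKU or library API.

```python
import random

def split_features_target(rows, target="CHURN", exclude=("customer_id",)):
    """Separate independent features (X) from the dependent CHURN label (y); YES -> 1, NO -> 0."""
    X = [{k: v for k, v in row.items() if k != target and k not in exclude} for row in rows]
    y = [1 if row[target] == "YES" else 0 for row in rows]
    return X, y

def train_test_split(X, y, test_fraction=0.25, seed=42):
    """Shuffle row indices with a fixed seed and hold out a fraction of rows for evaluation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

# Hypothetical joined rows showing a few of the ~17 features.
rows = [
    {"customer_id": "C1", "age": 34, "balance": 1500.0, "inactive_months": 0, "CHURN": "NO"},
    {"customer_id": "C2", "age": 42, "balance": 300.0, "inactive_months": 6, "CHURN": "YES"},
    {"customer_id": "C3", "age": 50, "balance": 9000.0, "inactive_months": 1, "CHURN": "NO"},
    {"customer_id": "C4", "age": 29, "balance": 120.0, "inactive_months": 8, "CHURN": "YES"},
]
X, y = split_features_target(rows)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 3 1
```

The identifier is dropped because a customer ID carries no predictive signal and would only let a model memorize rows.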


5) After feature engineering, the next steps are model training, hyperparameter tuning, and optimization.


In our case, we trained classification algorithms, namely KNN, Random Forest, Decision Tree, and Logistic Regression, and then compared evaluation metrics such as accuracy to select a model.
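The selection logic (train several classifiers, score each on held-out data, keep the best) can be sketched from scratch. This is a simplified stand-in for what DATAIKU automates: it uses KNN, a decision stump (a decision tree of depth 1), and a majority-class baseline, and it omits Random Forest and Logistic Regression for brevity. The two features and the toy data are hypothetical.

```python
import math
from collections import Counter

def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def knn_predict(train_X, train_y, x, k=3):
    """KNN: majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def fit_stump(train_X, train_y):
    """Depth-1 decision tree: pick the feature/threshold split with the best training accuracy."""
    best = None
    for f in range(len(train_X[0])):
        for t in sorted({row[f] for row in train_X}):
            for label_above in (0, 1):  # class predicted when row[f] > t
                preds = [label_above if row[f] > t else 1 - label_above for row in train_X]
                acc = accuracy(preds, train_y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, label_above)
    _, f, t, label_above = best
    return lambda row: label_above if row[f] > t else 1 - label_above

# Hypothetical toy data: [inactive_months, balance_in_thousands]; 1 = churned, 0 = stayed.
train_X = [[0, 5.0], [1, 8.0], [2, 3.0], [6, 1.0], [7, 2.0], [8, 0.5]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[1, 6.0], [7, 1.5]]
test_y = [0, 1]

stump = fit_stump(train_X, train_y)
majority = Counter(train_y).most_common(1)[0][0]
results = {
    "KNN (k=3)": accuracy([knn_predict(train_X, train_y, x) for x in test_X], test_y),
    "Decision stump": accuracy([stump(x) for x in test_X], test_y),
    "Majority baseline": accuracy([majority] * len(test_X), test_y),
}
best_model = max(results, key=results.get)  # ties go to the first model listed
print(results, "->", best_model)
```

Including a majority-class baseline is a cheap sanity check: a model that cannot beat it is not learning anything useful. With a test set this small, ties between models are likely, so a real comparison needs a much larger held-out set.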



Each algorithm has hyperparameters that must be set before training, and for hyperparameter tuning we can supply multiple values or a range of values. In the figure below, we supply multiple values of k, the number of nearest neighbors in KNN. During training and evaluation, the parameter values with the highest accuracy are selected automatically. Once this is done, we publish the model to the data flow for further analysis.
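The idea of supplying several values of k and keeping the most accurate one is a grid search. The sketch below evaluates each candidate k on a single validation set. That is an assumption for simplicity: the article does not say which validation scheme DATAIKU uses. The data is a hypothetical 1-D toy example with one mislabeled point near class 0.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def tune_k(train_X, train_y, val_X, val_y, k_values):
    """Score each candidate k by validation accuracy and return (best_k, scores)."""
    scores = {}
    for k in k_values:
        preds = [knn_predict(train_X, train_y, x, k) for x in val_X]
        scores[k] = sum(p == y for p, y in zip(preds, val_y)) / len(val_y)
    best_k = max(k_values, key=lambda k: scores[k])  # ties go to the first k listed
    return best_k, scores

# Hypothetical data: [2.5] is a noisy label-1 point sitting among class-0 points.
train_X = [[1], [2], [3], [4], [10], [11], [12], [2.5]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
val_X, val_y = [[2.4], [11.5]], [0, 1]

best_k, scores = tune_k(train_X, train_y, val_X, val_y, k_values=[1, 3, 5])
print(best_k, scores)  # 3 {1: 0.5, 3: 1.0, 5: 1.0}
```

With k=1 the noisy neighbor decides the prediction for [2.4], so accuracy drops; larger k smooths that out. Listing candidate values in increasing order means ties favor the smaller k.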



6) After publishing the model, we use it to predict churn for the customers who are still using our services.


In the image below, the predicted churn for the customer is NO, which indicates that the customer is likely to keep using the services the bank offers.
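Scoring only the customers who are still active, and turning the model output into the YES/NO labels shown above, might look like the following sketch. The threshold rule in `predict` is a hypothetical stand-in for the published model, and the is_active flag and field names are likewise assumptions.

```python
def score_active_customers(customers, predict):
    """Attach a YES/NO churn prediction to every customer who is still active."""
    return [
        {
            "customer_id": c["customer_id"],
            "churn_prediction": "YES" if predict(c["features"]) == 1 else "NO",
        }
        for c in customers
        if c["is_active"]  # customers who already left are not scored
    ]

# Hypothetical stand-in for the published model: flag long inactivity as churn risk.
def predict(features):
    return 1 if features["inactive_months"] > 2 else 0

customers = [
    {"customer_id": "C1", "is_active": True, "features": {"inactive_months": 0}},
    {"customer_id": "C2", "is_active": True, "features": {"inactive_months": 6}},
    {"customer_id": "C3", "is_active": False, "features": {"inactive_months": 9}},
]
predictions = score_active_customers(customers, predict)
print(predictions)
```

The rows labeled YES are the ones a retention team would prioritize; rows labeled NO, like the customer in the image, need no immediate action.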



Conclusion:

The analysis of customer churn data is complete. The bank's representative team will analyze the results and work on customer retention by providing customers with more offers, services, benefits, and discount vouchers, because acquiring new customers is more expensive than retaining existing ones.


