Project Scope
Deliverable: Trained Model File / Feature Importance Chart
Machine Learning Task: Classification
Target Variable: Customer Churn Status (True/False)
Control Metric: AUROC 0.78

This project showcases a classification workflow for customer churn prediction. The data is from an anonymized real-world data set for e-commerce retailers.

The goal is to verify two client customer-service hypotheses:

  • Sales decline leads to end of business and churn.
  • Significant increases in sales lead to the client securing other financing options and churning.

Two notebooks are presented. In the first, the data is cleaned and ordered chronologically, and the sales data is rolled up through extensive feature engineering to capture trend information. Data visualizations are presented to improve our understanding of the sales trend data and to examine the hypotheses visually.
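The roll-up of sales history into trend features could look roughly like the sketch below. It is illustrative only and is not taken from the notebooks: the column names (`customer_id`, `month`, `sales`), the comparison of the most recent window against the window before it, and the ±20% cut-off for a "significant" change are all assumptions.

```python
import numpy as np
import pandas as pd

WINDOWS = (3, 6)   # look-back windows in months, as mentioned in the text
THRESHOLD = 0.20   # assumed cut-off for a "significant" change (hypothetical)


def sales_trend_features(sales: pd.DataFrame) -> pd.DataFrame:
    """Roll monthly sales up into one row of trend features per customer.

    Expects hypothetical columns: customer_id, month (datetime), sales (float).
    """
    # Order each customer's history chronologically before windowing.
    sales = sales.sort_values(["customer_id", "month"])
    rows = []
    for customer_id, grp in sales.groupby("customer_id"):
        s = grp["sales"].to_numpy()
        row = {"customer_id": customer_id}
        for window in WINDOWS:
            pct = float("nan")
            if len(s) >= 2 * window:
                recent = s[-window:].mean()             # latest window
                prior = s[-2 * window:-window].mean()   # window before it
                if prior != 0:
                    pct = (recent - prior) / prior
            row[f"pct_change_{window}m"] = pct
        rows.append(row)

    features = pd.DataFrame(rows)
    # Bucket each relative change into a coarse trend category.
    for window in WINDOWS:
        features[f"trend_{window}m"] = pd.cut(
            features[f"pct_change_{window}m"],
            bins=[-np.inf, -THRESHOLD, THRESHOLD, np.inf],
            labels=["decline", "stable", "growth"],
        )
    return features
```

Customers with fewer than two full windows of history get missing values here rather than a guessed trend; whether the notebooks handle short histories the same way is not stated in the source.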

The second notebook contains a chi-squared (chi2) test of feature significance, a churn prediction classifier, and a Random Forest feature importance examination. Over the previous 3- and 6-month periods, our customers' sales histories are heavily skewed toward significant sales decline and significant growth. As a result, most churn events will fall into one of these two categories by frequency alone. The workflows show that the trend data is a useful predictor of churn and that significant sales increases are more likely to lead to churn. There is insufficient evidence to statistically verify the customer-service hypotheses.
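A minimal sketch of these three steps follows, run on synthetic stand-in data rather than the project's data set. The source does not say which chi-squared routine or model settings were used, so `scipy.stats.chi2_contingency`, the feature names, the trend cut-offs, and the hyperparameters here are assumptions. Numbers produced by this sketch say nothing about the real results.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, NOT the project's data set.
rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "pct_change_3m": rng.normal(0.0, 0.5, n),
    "pct_change_6m": rng.normal(0.0, 0.5, n),
})
# Hypothetical churn rule for the demo: large moves in either
# direction raise the probability of churn.
p_churn = 0.1 + 0.4 * (X["pct_change_6m"].abs() > 0.5)
y = pd.Series(rng.random(n) < p_churn, name="churned")

# 1) Chi-squared test of independence: trend category vs. churn.
trend = pd.cut(
    X["pct_change_6m"],
    bins=[-np.inf, -0.2, 0.2, np.inf],   # assumed cut-offs
    labels=["decline", "stable", "growth"],
)
table = pd.crosstab(trend, y)
chi2_stat, p_value, dof, _ = chi2_contingency(table)

# 2) Churn classifier, scored with AUROC on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
auroc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# 3) Impurity-based feature importances from the fitted forest.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(f"chi2={chi2_stat:.1f}, dof={dof}, p={p_value:.3g}")
print(f"AUROC={auroc:.3f}")
print(importances)
```

A significant chi-squared result only indicates that trend category and churn are associated. It does not by itself separate the decline hypothesis from the growth hypothesis, which is consistent with the caution expressed above about category frequencies.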

You can find the code for this project at: github repository