Jupyter Notebook Walkthrough
Notebook Workflow
The `Customer Churn Analyzer.ipynb` notebook is designed as an end-to-end data science pipeline. Each section is modular, allowing you to move from raw data ingestion to a functional prediction model.
1. Environment Setup and Data Loading
The initial cells handle the import of essential libraries such as pandas, numpy, and matplotlib.
- Data Source: The notebook defaults to loading `churn_sample.csv`. If you are running this in Google Colab, ensure your file path matches your Drive mounting point or local directory.
- Initial Inspection: It executes `df.info()` and `value_counts()` to help you understand the data distribution and identify any immediate data quality issues.
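The setup cells can be sketched as follows. This is a minimal, hedged illustration: the column names and values in the stand-in frame are assumptions based on this walkthrough, not the notebook's actual data, and in the notebook itself `df` comes from `pd.read_csv('churn_sample.csv')`.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # used by the EDA cells later

# In the notebook: df = pd.read_csv('churn_sample.csv')
# Here, a tiny stand-in frame so the snippet runs anywhere (assumed columns).
df = pd.DataFrame({
    'CustomerID': [101, 102, 103, 104],
    'Contract': ['Monthly', 'Yearly', 'Monthly', 'Monthly'],
    'MonthlyBill': [120.5, 45.0, 89.9, 60.0],
    'Churn': ['Yes', 'No', 'Yes', 'No'],
})

df.info()                          # dtypes and non-null counts
print(df['Churn'].value_counts())  # class balance check
```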
2. Data Preprocessing
Before training, the data undergoes several transformations:
- Feature Dropping: The `CustomerID` column is removed, as it contains no predictive value.
- Label Encoding: Categorical variables (`Contract`, `PaymentMethod`, `AutoPay`, and `Churn`) are converted into numerical formats using `LabelEncoder`.
- Mapping: A dictionary (`le_dict`) is maintained to store the encoders, allowing you to reverse-transform the numerical predictions back into human-readable labels (e.g., "Churn" or "No Churn") later.
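The preprocessing steps above can be sketched as a short loop. The sample values are assumptions for illustration; the notebook's exact code may differ, but the pattern of dropping `CustomerID` and collecting fitted encoders in `le_dict` is the one this walkthrough describes.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in data (assumed values, for illustration only)
df = pd.DataFrame({
    'CustomerID': [101, 102, 103],
    'Contract': ['Monthly', 'Yearly', 'Monthly'],
    'PaymentMethod': ['ElectronicCheck', 'CreditCard', 'ElectronicCheck'],
    'AutoPay': [0, 1, 0],
    'Churn': ['Yes', 'No', 'Yes'],
})

# Feature dropping: CustomerID carries no predictive signal
df = df.drop(columns=['CustomerID'])

# Label encoding: keep each fitted encoder in le_dict so numeric
# predictions can be mapped back to human-readable labels later
le_dict = {}
for col in ['Contract', 'PaymentMethod', 'AutoPay', 'Churn']:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    le_dict[col] = le

print(le_dict['Churn'].inverse_transform([0, 1]))  # ['No' 'Yes']
```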
3. Exploratory Data Analysis (EDA)
This section generates visual insights to identify trends within the dataset:
- Correlation Matrix: A heatmap using `seaborn` to visualize how features like `MonthlyBill` or `SupportCalls` relate to one another.
- Billing Trends: A combined Box and Swarm plot illustrates the distribution of monthly bills among churned versus retained customers.
- Density Analysis: Kernel Density Estimate (KDE) plots provide a view of where "Churn" clusters are most dense relative to billing amounts.
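The correlation heatmap can be sketched as below. The feature names and randomly generated data are assumptions for illustration; in the notebook the frame would be the preprocessed `df`.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in features (assumed names from this walkthrough)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'MonthlyBill': rng.uniform(20, 150, 50),
    'SupportCalls': rng.integers(0, 8, 50),
    'TenureMonths': rng.integers(1, 60, 50),
})

corr = df.corr()  # pairwise Pearson correlations
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Feature Correlation Matrix')
plt.tight_layout()
```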
4. Model Training & Evaluation
The notebook utilizes a Logistic Regression model, serving as a robust baseline for binary classification.
- Data Splitting: The dataset is divided using a 70/30 train-test split to ensure unbiased evaluation.
- Metric Analysis: After training, the notebook outputs a Confusion Matrix and a Classification Report. Pay close attention to the F1 Score, which provides a balanced view of the model's precision and recall.
5. Feature Influence
A horizontal bar chart visualizes the model's coefficients. This allows you to see which factors (like SupportCalls or TenureMonths) have the strongest positive or negative impact on a customer's likelihood to leave.
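The coefficient chart can be sketched as below. The feature names are assumptions drawn from this walkthrough, and the model here is a freshly fitted `LogisticRegression` on synthetic data rather than the notebook's trained model.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
features = ['SupportCalls', 'MonthlyBill', 'TenureMonths', 'AutoPay']
X = rng.normal(size=(150, len(features)))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# One coefficient per feature; the sign shows whether the feature
# pushes toward churn (positive) or retention (negative)
coefs = pd.Series(model.coef_[0], index=features).sort_values()
coefs.plot(kind='barh')
plt.title('Logistic Regression Coefficients')
plt.tight_layout()
```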
6. Making Custom Predictions
The final section of the notebook is an interactive inference block. You can manually input customer attributes to see how the model categorizes them.
```python
# Example of testing a custom profile in the final cell:
new_customer = pd.DataFrame({
    'Contract': [le_dict['Contract'].transform(['Monthly'])[0]],
    'SupportCalls': [6],
    'MonthlyBill': [120.5],
    'PaymentMethod': [le_dict['PaymentMethod'].transform(['ElectronicCheck'])[0]],
    'BillingIssues': [1],
    'DataUsageGB': [150.0],
    'TenureMonths': [2],
    'AutoPay': [le_dict['AutoPay'].transform([0])[0]]
})
pred = model.predict(new_customer)
print("Predicted Churn Status:", le_dict['Churn'].inverse_transform(pred)[0])
```
To predict for a different customer, simply modify the values in the dictionary above and re-run the cell.