CLI Execution Guide

The customer_churn_analyzer.py script provides a standalone interface for executing the end-to-end machine learning pipeline. This script automates data preprocessing, exploratory analysis, model training, and performance evaluation.

Prerequisites

Before running the script, ensure your environment meets the following requirements:

Python Version: Python 3.8 or higher.

Dependencies: Install required libraries via pip:

pip install pandas scikit-learn matplotlib seaborn numpy
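
As a quick sanity check before the first run, the prerequisites above can be verified from Python itself. This is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and is not part of customer_churn_analyzer.py. Note that the scikit-learn package is imported under the module name sklearn.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 8)

# pip package name -> importable module name
# (scikit-learn is installed as "scikit-learn" but imported as "sklearn").
REQUIRED_MODULES = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "numpy": "numpy",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pkg for pkg, mod in modules.items()
        if importlib.util.find_spec(mod) is None
    ]


python_ok = sys.version_info >= REQUIRED_PYTHON
missing = missing_packages()

print("Python version OK:", python_ok)
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```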

Environment Configuration

The script currently reads the source dataset from a Google Drive path as mounted in Google Colab (/content/drive/MyDrive/). To run it locally, place the dataset where the script can read it and update the path as described below.

Locate the Dataset: Ensure churn_sample.csv is in your project directory.

Update Path: Open customer_churn_analyzer.py and change the pd.read_csv() call on line 23 to point to your local file:

# From:
df = pd.read_csv('/content/drive/MyDrive/churn_sample.csv')

# To:
df = pd.read_csv('churn_sample.csv')
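
If you would rather not edit the path each time you switch between Colab and a local machine, one option is to resolve it at run time. The sketch below is an assumption, not what the script currently does: the function resolve_dataset_path and the environment variable CHURN_DATA_PATH are hypothetical names introduced for illustration.

```python
import os
from pathlib import Path


def resolve_dataset_path(filename="churn_sample.csv", env_var="CHURN_DATA_PATH"):
    """Pick the dataset location without hardcoding a Colab path.

    If the (hypothetical) environment variable is set, use it;
    otherwise fall back to the file name in the current working directory.
    """
    override = os.environ.get(env_var)
    if override:
        return Path(override).expanduser()
    return Path.cwd() / filename


dataset_path = resolve_dataset_path()
if dataset_path.is_file():
    print(f"Dataset found: {dataset_path}")
else:
    print(f"Dataset not found at {dataset_path}; check the file location.")
```

The resulting path can then be passed to pd.read_csv(dataset_path) in place of the hardcoded string.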

Execution Steps

To run the full analysis pipeline, execute the script from your terminal:

python customer_churn_analyzer.py

Script Workflow

When executed, the CLI tool performs the following operations sequentially:

Data Ingestion & Cleaning: Loads the CSV, drops non-predictive identifiers (e.g., CustomerID), and converts categorical features to integer codes using label encoding.
Exploratory Data Analysis (EDA): Generates and displays statistical visualizations, including a Correlation Heatmap and Monthly Bill distribution plots.
Note: Each plot opens in a separate window and pauses the script. Close the window to resume execution.
Model Training: Splits the data into a 70% training set and a 30% test set, then trains a Logistic Regression classifier.
Evaluation: Outputs the Confusion Matrix, F1 Score, and a detailed Classification Report directly to the terminal.
Feature Influence: Displays a horizontal bar chart showing which variables (like SupportCalls or Contract) most heavily impact the churn prediction.
Inference Test: Runs a hardcoded "New Customer" test case and prints the predicted churn status.
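
The steps above can be sketched as a short, self-contained program. This is an illustrative approximation under stated assumptions, not the script's actual code: it uses synthetic data in place of churn_sample.csv, the column names MonthlyBill and Churn are guesses, the random seeds and max_iter value are arbitrary, and the plotting steps are omitted so that it runs without a display.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for churn_sample.csv (column names are assumptions).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "CustomerID": np.arange(n),
    "SupportCalls": rng.integers(0, 10, size=n),
    "MonthlyBill": rng.uniform(20, 120, size=n).round(2),
    "Contract": rng.choice(["Monthly", "Yearly"], size=n),
})
df["Churn"] = np.where(df["SupportCalls"] >= 5, "Yes", "No")

# Data cleaning: drop the identifier and label-encode categorical columns.
df = df.drop(columns=["CustomerID"])
for col in ["Contract", "Churn"]:
    df[col] = LabelEncoder().fit_transform(df[col])  # classes sorted: No=0, Yes=1

# Model training: 70/30 split and a logistic regression classifier.
X = df.drop(columns=["Churn"])
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluation: confusion matrix, F1 score, and classification report.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print("Confusion Matrix:\n", cm)
print("F1 Score:", round(f1, 2))
print(classification_report(y_test, y_pred))

# Feature influence: coefficients of the fitted model, one per feature.
influence = pd.Series(model.coef_[0], index=X.columns).sort_values()
print(influence)

# Inference test: a single hardcoded customer, columns in training order.
new_customer = pd.DataFrame([[8, 95.0, 0]], columns=X.columns)
prediction = model.predict(new_customer)[0]
print("Predicted Churn:", "Yes" if prediction == 1 else "No")
```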

Interpreting Terminal Output

The script prints its results to the terminal as it runs. Look for the following key sections; the numbers shown here are examples, and your values will depend on the dataset:

Model Performance:

Confusion Matrix:
[[150  25]
 [ 30 145]]

Classification Report:
              precision    recall  f1-score   support
           0       0.83      0.86      0.85       175
           1       0.85      0.83      0.84       175

Custom Prediction:

Predicted Churn: Yes
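
To see how the per-class figures in a classification report relate to the confusion matrix, the arithmetic can be reproduced by hand. The sketch below uses only the example matrix shown above and follows scikit-learn's convention that rows are true classes and columns are predicted classes; the helper per_class_metrics is a hypothetical name for illustration.

```python
# Example confusion matrix: rows = true class, columns = predicted class.
cm = [[150, 25],
      [30, 145]]


def per_class_metrics(cm, cls):
    """Return (precision, recall, f1, support) for one class."""
    tp = cm[cls][cls]
    fp = sum(row[cls] for row in cm) - tp  # predicted as cls, but another class
    fn = sum(cm[cls]) - tp                 # truly cls, but predicted otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    support = tp + fn                      # number of true samples of cls
    return precision, recall, f1, support


for cls in range(len(cm)):
    p, r, f1, n = per_class_metrics(cm, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={n}")
```

Precision answers "of the customers predicted as this class, how many were correct?", while recall answers "of the customers truly in this class, how many did the model find?".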

Troubleshooting

File Not Found: If the script exits immediately with a FileNotFoundError, verify that the path passed to pd.read_csv() matches the name and location of your local file.
Plotting Errors: In a headless environment (such as a remote server without a GUI), the script may fail at plt.show(). In that case, comment out the plt.show() lines, or switch Matplotlib to a non-interactive backend such as Agg and save the figures to files with savefig().
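
A minimal sketch of the non-interactive approach is shown below. It selects the Agg backend before pyplot is imported and writes the chart to disk instead of opening a window. The feature values are placeholders for illustration, not real model coefficients, and the output file name is hypothetical.

```python
import matplotlib

# Select the non-interactive Agg backend before importing pyplot,
# so that no GUI window is ever requested.
matplotlib.use("Agg")

import matplotlib.pyplot as plt

# Placeholder values for illustration only; not real model coefficients.
features = ["SupportCalls", "Contract"]
influence = [0.8, -0.4]

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(features, influence)
ax.set_xlabel("Coefficient")
fig.tight_layout()

# Write the chart to a file instead of calling plt.show().
output_file = "feature_influence.png"  # hypothetical file name
fig.savefig(output_file)
plt.close(fig)
print(f"Saved plot to {output_file}")
```

Alternatively, setting the MPLBACKEND environment variable to Agg before launching the script selects the same backend without editing the code.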