CLI Execution Guide
The customer_churn_analyzer.py script provides a standalone interface for executing the end-to-end machine learning pipeline. This script automates data preprocessing, exploratory analysis, model training, and performance evaluation.
Prerequisites
Before running the script, ensure your environment meets the following requirements:
- Python Version: Python 3.8 or higher.
- Dependencies: Install required libraries via pip:
pip install pandas scikit-learn matplotlib seaborn numpy
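A small pre-flight check can confirm the environment before the first run. The sketch below is not part of customer_churn_analyzer.py, and the helper name missing_modules is hypothetical. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util
import sys

# Import names for the packages in the pip command above
# (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["pandas", "sklearn", "matplotlib", "seaborn", "numpy"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8 or higher is required.")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```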
Environment Configuration
The script currently reads the source dataset from a Google Colab path (a mounted Google Drive folder). To run it locally, point the script at a local copy of the dataset.
- Locate the Dataset: Ensure churn_sample.csv is in your project directory.
- Update Path: Open customer_churn_analyzer.py and modify line 23 to point to your local file:
# From:
df = pd.read_csv('/content/drive/MyDrive/churn_sample.csv')
# To:
df = pd.read_csv('churn_sample.csv')
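If you would rather not edit the path by hand for each environment, one option is to resolve it at run time. The following is a sketch of that idea, not the script's current behavior. The CHURN_DATA_PATH environment variable and the resolve_dataset_path helper are hypothetical names introduced for illustration.

```python
import os
from pathlib import Path

COLAB_PATH = "/content/drive/MyDrive/churn_sample.csv"
LOCAL_PATH = "churn_sample.csv"


def resolve_dataset_path(env=None):
    """Pick the dataset path: env override, then local file, then Colab path."""
    env = os.environ if env is None else env
    override = env.get("CHURN_DATA_PATH")  # hypothetical variable name
    if override:
        return Path(override)
    if Path(LOCAL_PATH).exists():
        return Path(LOCAL_PATH)
    return Path(COLAB_PATH)


# The read_csv call in the script could then become:
# df = pd.read_csv(resolve_dataset_path())
```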
Execution Steps
To run the full analysis pipeline, execute the script from your terminal:
python customer_churn_analyzer.py
Script Workflow
When executed, the CLI tool performs the following operations sequentially:
- Data Ingestion & Cleaning: Loads the CSV, drops non-predictive identifiers (e.g., CustomerID), and applies label encoding to categorical features.
- Exploratory Data Analysis (EDA): Generates and displays statistical visualizations, including a Correlation Heatmap and Monthly Bill distribution plots.
- Note: These plots will open in a new window. You must close the window to resume script execution.
- Model Training: Splits the data (70/30) and trains a Logistic Regression classifier.
- Evaluation: Outputs the Confusion Matrix, F1 Score, and a detailed Classification Report directly to the terminal.
- Feature Influence: Displays a horizontal bar chart showing which variables (like SupportCalls or Contract) most heavily impact the churn prediction.
- Inference Test: Runs a hardcoded "New Customer" test case and prints the predicted churn status.
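The workflow above can be sketched end to end on synthetic data. This is an illustration under stated assumptions, not the contents of customer_churn_analyzer.py. Only CustomerID, SupportCalls, Contract, the 70/30 split, and the Logistic Regression classifier come from this guide; the MonthlyBill and Churn column names, the generated data, random_state, and max_iter are assumptions. The EDA plots are omitted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for churn_sample.csv (assumed schema, made-up values).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "CustomerID": np.arange(n),
    "SupportCalls": rng.integers(0, 10, size=n),
    "MonthlyBill": rng.uniform(20, 120, size=n).round(2),
    "Contract": rng.choice(["Monthly", "Yearly"], size=n),
})
# Tie churn to support calls so the model has a signal to learn.
noisy_calls = df["SupportCalls"] + rng.normal(0, 1.5, size=n)
df["Churn"] = np.where(noisy_calls > 5, "Yes", "No")

# 1. Cleaning: drop the identifier and label-encode the text columns.
df = df.drop(columns=["CustomerID"])
for col in ["Contract", "Churn"]:
    df[col] = LabelEncoder().fit_transform(df[col])  # No -> 0, Yes -> 1

# 2. Split 70/30 and train a Logistic Regression classifier.
X = df.drop(columns=["Churn"])
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 3. Evaluation printed to the terminal.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)
print("F1 Score:", round(f1_score(y_test, y_pred), 2))
print(classification_report(y_test, y_pred))

# 4. Feature influence: absolute coefficients, largest first.
influence = pd.Series(model.coef_[0], index=X.columns).abs()
influence = influence.sort_values(ascending=False)
print(influence)

# 5. Inference on one hypothetical new customer (Contract already encoded).
new_customer = pd.DataFrame(
    [{"SupportCalls": 8, "MonthlyBill": 95.0, "Contract": 0}]
)
predicted = model.predict(new_customer)[0]
print("Predicted Churn:", "Yes" if predicted == 1 else "No")
```

Raw coefficient magnitudes depend on each feature's scale, so the ranking in step 4 is only a rough guide unless the features are standardized first.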
Interpreting Terminal Output
The script provides real-time feedback. Look for the following key sections in your terminal:
Model Performance:
Confusion Matrix:
[[150 25]
[ 30 145]]
Classification Report:
              precision    recall  f1-score   support
           0       0.83      0.86      0.85       175
           1       0.85      0.83      0.84       175
Custom Prediction:
Predicted Churn: Yes
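Each row of the Classification Report can be derived from the Confusion Matrix. The sketch below recomputes the per-class figures from the sample matrix, assuming scikit-learn's convention that rows are actual classes and columns are predicted classes. The helper name per_class_metrics is hypothetical.

```python
# Confusion matrix from the sample output:
# rows are actual classes, columns are predicted classes.
matrix = [[150, 25],
          [30, 145]]


def per_class_metrics(matrix, cls):
    """Return (precision, recall, f1, support) for class index `cls`."""
    true_pos = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column total
    support = sum(matrix[cls])                   # row total
    precision = true_pos / predicted
    recall = true_pos / support
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, support


for cls in (0, 1):
    p, r, f1, n = per_class_metrics(matrix, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={n}")
```

Here class 1 is churn, so its recall is the share of actual churners the model caught, and its precision is the share of churn predictions that were correct.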
Troubleshooting
- File Not Found: If the script fails immediately, verify that the path in pd.read_csv() matches your local filename.
- Plotting Errors: If running in a headless environment (like a remote server without a GUI), the script may error during plt.show(). In such cases, comment out the plt.show() lines or use a non-interactive backend for Matplotlib.
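For the headless case, a minimal sketch of the non-interactive approach is to select Matplotlib's Agg backend and save figures to disk instead of showing them. The output filename and the bar values below are placeholders, not results from the script.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # select a non-interactive backend before importing pyplot
import matplotlib.pyplot as plt

output_path = Path("feature_influence.png")  # hypothetical output filename

fig, ax = plt.subplots()
# Placeholder values for illustration only.
ax.barh(["SupportCalls", "Contract"], [0.8, 0.5])
ax.set_title("Feature Influence")
fig.savefig(output_path)  # write the chart to a file instead of opening a window
plt.close(fig)
```

Saving to a file also removes the need to close plot windows by hand before the script continues.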