Logistic Regression Implementation
The Customer Churn Analyzer uses Logistic Regression as its baseline machine learning algorithm. It is a natural starting point for binary classification: it trains quickly, its coefficients are easy to interpret, and it outputs a probability of customer attrition rather than only a yes/no label.
Model Selection & Configuration
The implementation uses the LogisticRegression class from the scikit-learn library. Because the features sit on very different scales (e.g., Monthly Bill vs. Tenure), the solver can need more iterations to converge, so the iteration limit is raised above scikit-learn's default of 100.
Key Hyperparameters:
- max_iter=1000: Ensures the solver has sufficient iterations to find the optimal coefficients for the dataset.
- random_state=42: Guarantees reproducibility across different runs.
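As a minimal sketch of this configuration, the snippet below fits the model on a small synthetic dataset and reads the fitted `n_iter_` attribute to see how many iterations the solver actually used. The data, the labelling rule, and the names `X_demo`, `y_demo`, and `model_demo` are assumptions made for illustration; they are not part of the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption for illustration): two features on
# very different scales, loosely mimicking a monthly bill and a tenure in months.
rng = np.random.default_rng(42)
monthly_bill = rng.uniform(20.0, 150.0, size=200)
tenure = rng.integers(1, 72, size=200)
X_demo = np.column_stack([monthly_bill, tenure])

# Hypothetical rule used only to generate labels for the demo.
noise = rng.normal(0, 10, size=200)
y_demo = (monthly_bill - 1.5 * tenure + noise > 40).astype(int)

model_demo = LogisticRegression(max_iter=1000, random_state=42)
model_demo.fit(X_demo, y_demo)

# n_iter_ holds the number of iterations the solver actually ran;
# a value below max_iter means it stopped before reaching the limit.
iterations_used = int(model_demo.n_iter_[0])
print("Iterations used:", iterations_used, "of", model_demo.max_iter)
```

If the reported count equals `max_iter`, scikit-learn also emits a `ConvergenceWarning`; raising the limit further or standardizing the features are the usual remedies.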
Training Workflow
The model is trained in a supervised learning pipeline in which the dataset is split so that performance can be validated on unseen data.
- Feature Selection: The target variable Churn is isolated from the feature set.
- Data Splitting: The data is divided into a 70% training set and a 30% testing set.
- Model Fitting: The algorithm learns the relationship between customer behaviors (like SupportCalls and Contract type) and the likelihood of churn.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# Define features and target
X = df.drop('Churn', axis=1)
y = df['Churn']
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train the baseline model
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)
Performance Evaluation
Post-training, the model is evaluated using a standard classification suite to measure its ability to distinguish between retained and churned customers:
- Confusion Matrix: Tabulates true positives, false positives, true negatives, and false negatives, showing which kinds of errors the model makes.
- F1-Score: Used as the primary metric to balance precision and recall, especially important if churn classes are imbalanced.
- Coefficient Analysis: The model exposes the "weight" of each feature, allowing the business to see which factors (like high monthly bills) most strongly drive a customer to leave.
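The three checks above can be sketched as follows. This is an illustrative example under stated assumptions rather than the project's evaluation script: the DataFrame `demo`, its three columns, and the rule that generates the Churn labels are synthetic stand-ins for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn dataset (assumed for illustration).
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "SupportCalls": rng.integers(0, 10, size=n),
    "MonthlyBill": rng.uniform(20.0, 150.0, size=n),
    "TenureMonths": rng.integers(1, 72, size=n),
})
# Hypothetical labelling rule so the demo has a learnable signal.
score = (0.4 * demo["SupportCalls"] + 0.02 * demo["MonthlyBill"]
         - 0.05 * demo["TenureMonths"])
demo["Churn"] = (score + rng.normal(0, 1, size=n) > 1.5).astype(int)

X = demo.drop("Churn", axis=1)
y = demo["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# F1 for the positive (churn = 1) class.
f1 = f1_score(y_test, y_pred)

# One coefficient per feature; the sign gives the direction of the effect
# on the log-odds of churn.
coefficients = pd.Series(model.coef_[0], index=X.columns).sort_values()

print(cm)
print("F1:", round(f1, 3))
print(coefficients)
```

Coefficient magnitudes are only directly comparable when the features share a scale, so a large weight on an unscaled column such as a monthly bill should be read with that in mind.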
Usage in Inference
Once trained, the model can predict the churn status of a new customer profile. The input data must undergo the same label encoding as the training set before being passed to the predict() method.
# Predicting churn for a hypothetical customer
# Input features: [Contract, SupportCalls, MonthlyBill, PaymentMethod, BillingIssues, DataUsageGB, TenureMonths, AutoPay]
new_customer = [[0, 4, 110.0, 1, 0, 85.0, 12, 1]]
prediction = model.predict(new_customer)
print("Predicted Churn Status:", "Yes" if prediction[0] == 1 else "No")
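To illustrate the encoding requirement, the sketch below fits a `LabelEncoder` on the training values, reuses that same fitted encoder on a new customer, and reads the churn probability from `predict_proba` instead of the hard label. The contract categories, the synthetic data, and the names `encoder`, `clf`, and `new_encoded` are assumptions for illustration; the project's own encoding step may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Synthetic training data (assumed for illustration only).
rng = np.random.default_rng(1)
n = 200
contracts = rng.choice(["Month-to-month", "One year", "Two year"], size=n)
support_calls = rng.integers(0, 10, size=n)

# Hypothetical labelling rule: more calls and monthly contracts raise churn.
is_monthly = (contracts == "Month-to-month").astype(int)
churn = (support_calls + 3 * is_monthly + rng.normal(0, 2, size=n) > 6).astype(int)

# Fit the encoder once, on the training values, and keep it for inference.
encoder = LabelEncoder()
train = pd.DataFrame({
    "Contract": encoder.fit_transform(contracts),
    "SupportCalls": support_calls,
})
clf = LogisticRegression(max_iter=1000).fit(train, churn)

# Encode the new customer with the SAME fitted encoder (transform, not fit),
# and keep the column names and order used during training.
new_encoded = pd.DataFrame({
    "Contract": encoder.transform(["Month-to-month"]),
    "SupportCalls": [4],
})

# predict_proba returns one row per sample and one column per class,
# ordered as in clf.classes_.
proba = clf.predict_proba(new_encoded)
churn_probability = float(proba[0, 1])
print(f"Estimated churn probability: {churn_probability:.2f}")
```

Refitting the encoder on new data could assign different integers to the same category, which would silently change what the model sees; persisting the fitted encoder alongside the model avoids that.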