Dataset Specification

The Customer Churn Analyzer relies on a structured dataset of customer demographics, usage metrics, and account details. The data is used to train a logistic regression model that identifies patterns associated with customer churn.

Data Source and Profile

Filename: churn_sample.csv
Total Records: 1,002 customers
Total Columns: 10 (8 predictive features, 1 unique identifier, 1 target variable)
Missing Values: None (Cleaned)
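
The profile above can be checked before training. The following is a minimal sketch, not the project's own validation code: the helper name summarize_profile is hypothetical, and the column names are taken from the sample record and preprocessing notes elsewhere in this section.

```python
import pandas as pd

# Column names as they appear in this specification:
# the identifier, the eight predictive features, and the target.
EXPECTED_COLUMNS = [
    "CustomerID", "Contract", "SupportCalls", "MonthlyBill", "PaymentMethod",
    "BillingIssues", "DataUsageGB", "TenureMonths", "AutoPay", "Churn",
]


def summarize_profile(df: pd.DataFrame) -> dict:
    """Return basic facts to compare against the documented profile.

    Hypothetical helper: reports the record count, any expected columns
    that are absent, and the total number of missing cells.
    """
    return {
        "records": len(df),
        "absent_columns": sorted(set(EXPECTED_COLUMNS) - set(df.columns)),
        "missing_values": int(df.isna().sum().sum()),
    }
```

For the documented file, summarize_profile(pd.read_csv("churn_sample.csv")) would be expected to report 1,002 records, no absent columns, and zero missing values.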

Feature Schema

The following table describes the variables used during model training and inference. When providing data for a Prediction Example, ensure the inputs align with these definitions.

Target Variable

Data Preprocessing Requirements

To prepare the raw churn_sample.csv for the machine learning pipeline, the following transformations are applied:

Feature Exclusion: The CustomerID column is dropped during preprocessing because a unique identifier carries no predictive signal.
Label Encoding: Categorical variables (Contract, PaymentMethod, AutoPay, and Churn) are converted to integer codes using scikit-learn's LabelEncoder.
- Note: The le_dict object stores the fitted encoder for each column so that prediction results can be mapped back to their original labels via inverse transformation.
Train-Test Split: The data is partitioned into a training set (70%) and a testing set (30%) so that model performance is evaluated on records the model has not seen during training.
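
The three steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's actual pipeline code: the function name preprocess and the random_state value are hypothetical, and one encoder per categorical column is assumed to be how le_dict is populated.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Categorical columns named in this specification.
CATEGORICAL_COLUMNS = ["Contract", "PaymentMethod", "AutoPay", "Churn"]


def preprocess(df: pd.DataFrame, test_size: float = 0.3, random_state: int = 42):
    """Drop the identifier, label-encode categoricals, and split 70/30.

    random_state is an assumption added for reproducibility; the source
    does not state which seed, if any, is used.
    """
    # drop() returns a new DataFrame, so the caller's data is left untouched.
    df = df.drop(columns=["CustomerID"])

    # Fit one encoder per column and keep it for later inverse transforms.
    le_dict = {}
    for col in CATEGORICAL_COLUMNS:
        encoder = LabelEncoder()
        df[col] = encoder.fit_transform(df[col])
        le_dict[col] = encoder

    X = df.drop(columns=["Churn"])
    y = df["Churn"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    return X_train, X_test, y_train, y_test, le_dict
```

After training, le_dict["Churn"].inverse_transform(predictions) converts the model's integer predictions back to the original Churn labels.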

Sample Data Format (JSON Representation)

For programmatic access or API integration, the input data structure should mirror the following. The record contains only the eight predictive features; CustomerID and Churn are omitted:

{
  "Contract": "Monthly",
  "SupportCalls": 4,
  "MonthlyBill": 110.0,
  "PaymentMethod": "CreditCard",
  "BillingIssues": 0,
  "DataUsageGB": 85.0,
  "TenureMonths": 12,
  "AutoPay": 1
}
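
A record in this shape still needs the same label encoding that was applied during training before it can be passed to the model. The sketch below shows one way to do that; it is an assumption-laden illustration, not the project's inference code. The function name encode_record is hypothetical, the encoders are fitted here on made-up category lists ("Yearly" and "BankTransfer" are placeholders, not documented values), and the feature order is assumed to match the key order of the sample record. In practice, the le_dict and column order produced during training would be used instead.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_record(record: dict, le_dict: dict, feature_order: list) -> pd.DataFrame:
    """Turn one input record into a single-row, model-ready DataFrame."""
    row = pd.DataFrame([record])
    # Apply each stored encoder to its column; the Churn encoder is skipped
    # automatically because the input record has no Churn key.
    for col, encoder in le_dict.items():
        if col in row.columns:
            row[col] = encoder.transform(row[col])
    # Reorder the columns to the order the model was trained on.
    return row[feature_order]


# Stand-in encoders for illustration only (category lists are assumptions).
le_dict = {
    "Contract": LabelEncoder().fit(["Monthly", "Yearly"]),
    "PaymentMethod": LabelEncoder().fit(["BankTransfer", "CreditCard"]),
    "AutoPay": LabelEncoder().fit([0, 1]),
}

record = {
    "Contract": "Monthly",
    "SupportCalls": 4,
    "MonthlyBill": 110.0,
    "PaymentMethod": "CreditCard",
    "BillingIssues": 0,
    "DataUsageGB": 85.0,
    "TenureMonths": 12,
    "AutoPay": 1,
}

feature_order = list(record)  # assumed to match the training column order
encoded = encode_record(record, le_dict, feature_order)
```

LabelEncoder.transform raises a ValueError for a category it did not see during fitting, so an input with an unknown Contract or PaymentMethod value fails loudly rather than being silently mis-encoded.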