Dataset Specification
Dataset Specification
The Customer Churn Analyzer relies on a structured dataset containing customer demographics, usage metrics, and account details. This data is used to train the Logistic Regression model to identify patterns associated with customer attrition.
Data Source and Profile
- Filename:
churn_sample.csv - Total Records: 1,002 customers
- Total Features: 8 predictive features + 1 unique identifier + 1 target variable
- Missing Values: None (Cleaned)
Feature Schema
The following table describes the variables used during model training and inference. When providing data for a Prediction Example, ensure the inputs align with these definitions.
| Feature Name | Data Type | Description | Values / Range |
| :--- | :--- | :--- | :--- |
| Contract | Categorical | The billing cycle of the customer's account. | Monthly, Yearly, Two-Year |
| SupportCalls | Integer | Total number of customer service interactions. | 0 to 10+ |
| MonthlyBill | Float | The current monthly charge amount (INR). | e.g., ₹110.00 |
| PaymentMethod | Categorical | Primary method used for paying invoices. | CreditCard, Bank Transfer, Electronic Check |
| BillingIssues | Integer | Count of recorded disputes or payment failures. | 0, 1, 2... |
| DataUsageGB | Float | Total data consumed in Gigabytes (GB). | 0 to 500+ GB |
| TenureMonths | Integer | Number of months the customer has been with the provider. | 1 to 72 months |
| AutoPay | Boolean | Whether the customer is enrolled in automatic billing. | 0 (No), 1 (Yes) |
Target Variable
| Variable | Definition | Class Labels |
| :--- | :--- | :--- |
| Churn | Indicates if the customer canceled their subscription. | 0: Retained<br>1: Churned |
Data Preprocessing Requirements
To prepare the raw churn_sample.csv for the machine learning pipeline, the following transformations are applied:
- Feature Exclusion: The
CustomerIDcolumn is dropped during preprocessing as it contains unique identifiers that do not contribute to predictive power. - Label Encoding: Categorical variables (
Contract,PaymentMethod,AutoPay, andChurn) are converted into numerical formats usingScikit-Learn'sLabelEncoder.- Note: The
le_dictobject stores the encoders to allow for inverse transformation of prediction results.
- Note: The
- Train-Test Split: The data is partitioned into a training set (70%) and a testing set (30%) to ensure unbiased evaluation of model performance.
Sample Data Format (JSON Representation)
For programmatic access or API integration, the input data structure should mirror the following:
{
"Contract": "Monthly",
"SupportCalls": 4,
"MonthlyBill": 110.0,
"PaymentMethod": "CreditCard",
"BillingIssues": 0,
"DataUsageGB": 85.0,
"TenureMonths": 12,
"AutoPay": 1
}