What Is Regression and When Do You Use It
How Regression Differs From Classification
Classification answers multiple-choice questions: is this customer going to churn (yes or no)? Is this lead hot, warm, or cold? The output is a label from a fixed set of options.
Regression answers open-ended numeric questions: how much will this customer spend next quarter ($347.82)? How many orders will we receive on Tuesday (214)? How many days until this machine needs maintenance (17)? The output is a number on a continuous scale.
The same business problem can often be framed either way. "Will this customer spend more than $500?" is classification. "How much will this customer spend?" is regression. Choose the framing that gives your team the most actionable information.
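The two framings differ only in how the target column is defined. A minimal sketch (the spend figures are made up for illustration):

```python
import numpy as np

# Hypothetical quarterly spend for five customers (made-up numbers)
spend = np.array([120.0, 640.5, 80.0, 510.0, 347.82])

# Regression framing: predict the dollar amount itself
regression_target = spend

# Classification framing: predict whether spend exceeds $500
classification_target = spend > 500

print(classification_target.tolist())  # [False, True, False, True, False]
```

Both targets come from the same column; the regression framing tells you *how much*, the classification framing only *which side of the threshold*.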
Available Regression Algorithms
The Data Aggregator app provides several regression algorithms:
- Linear regression: The simplest and fastest option. Assumes a straight-line relationship between input features and the output number. Works well when relationships are roughly linear and you want interpretable results. Good starting point for any regression problem.
- Ridge regression: A version of linear regression that handles correlated features better by adding a penalty for overly large feature weights. Use it when you have many input columns that may be related to each other, like multiple spending metrics or overlapping time-based features.
- Random forest regression: Builds many decision trees that each predict a number, then averages their outputs. Handles non-linear relationships, mixed data types, and noisy data well. More accurate than linear regression for complex patterns. One of the best general-purpose regressors.
- Gradient boosting regression: Builds trees sequentially, with each tree correcting the errors of the previous ones. Often the most accurate option for large datasets with complex patterns. Takes longer to train but delivers the best results when accuracy matters most.
If you are not sure where to start, try random forest regression. It handles most real-world data patterns without requiring you to understand the math behind it. See How to Choose the Right Algorithm for detailed guidance.
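The app runs these algorithms without code, but their relative behavior can be sketched with scikit-learn equivalents (an illustrative stand-in, not the app's internals) on a synthetic dataset:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic numeric data standing in for real business columns
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The four algorithm families described above
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: mean absolute error {mae:.1f}")
```

Comparing held-out error like this is exactly the judgment the app makes for you when you evaluate a trained model.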
Real Business Examples
Revenue Forecasting
Train a regressor on monthly revenue history along with features like marketing spend, seasonal indicators, new customer count, and economic indicators. The model learns how these factors combine to produce revenue numbers. Then forecast future months by plugging in planned marketing spend and expected conditions. See How to Forecast Sales With Machine Learning.
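In code, the workflow amounts to fitting on history and then predicting with planned inputs. A minimal sketch with scikit-learn (the feature columns and dollar figures are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical monthly history: [marketing_spend, is_holiday_season, new_customers]
features = np.array([
    [10_000, 0, 120],
    [12_000, 0, 150],
    [15_000, 1, 210],
    [11_000, 0, 130],
    [18_000, 1, 260],
    [13_000, 0, 160],
])
revenue = np.array([95_000, 110_000, 160_000, 100_000, 185_000, 118_000])

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(features, revenue)

# Forecast a planned month: $14,000 marketing spend, off-season, ~170 new customers
planned = [[14_000, 0, 170]]
print(f"forecast revenue: ${model.predict(planned)[0]:,.0f}")
```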
Customer Lifetime Value
Train a regressor on past customer data where you know the total amount each customer spent over their entire relationship. Features include first purchase value, acquisition channel, product category, geographic region, and early engagement metrics. The model predicts how much a new customer will eventually spend, helping you decide how much to invest in acquiring similar customers.
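Because early engagement metrics tend to be correlated with each other, ridge regression is a reasonable fit here. A sketch with made-up customer records:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical completed-customer records:
# [first_purchase_value, emails_opened_first_month, support_tickets_first_month]
early_signals = np.array([
    [50, 2, 0],
    [200, 8, 1],
    [30, 1, 2],
    [150, 6, 0],
    [80, 4, 1],
    [250, 9, 0],
])
lifetime_spend = np.array([180, 1400, 60, 950, 420, 1900])

# Ridge's penalty keeps correlated engagement columns from destabilizing the fit
model = Ridge(alpha=1.0).fit(early_signals, lifetime_spend)

# Estimate lifetime value for a new customer after their first month
new_customer = [[120, 5, 1]]
print(f"predicted lifetime value: ${model.predict(new_customer)[0]:.0f}")
```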
Support Ticket Volume Prediction
Train a regressor on daily ticket counts with features like day of week, month, recent product releases, marketing campaign status, and user growth rate. The model predicts how many tickets you will receive on any given day, letting you staff your support team accordingly. See How to Predict Support Ticket Volume.
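Calendar features like day of week are usually derived from a date column rather than stored directly. A sketch with synthetic ticket counts (busier on weekdays, quieter on weekends):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=90, freq="D")

# Synthetic counts: ~200 on weekdays, ~140 on weekends, plus noise
weekday = dates.dayofweek
tickets = 200 - 60 * (weekday >= 5) + rng.integers(-10, 10, size=len(dates))

# Derive calendar features from the date column
frame = pd.DataFrame({"day_of_week": weekday, "month": dates.month})
model = GradientBoostingRegressor(random_state=0).fit(frame, tickets)

# Predict staffing load for a Tuesday (day_of_week=1) in March
tuesday = pd.DataFrame({"day_of_week": [1], "month": [3]})
print(f"expected tickets: {model.predict(tuesday)[0]:.0f}")
```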
Pricing Estimation
Train a regressor on past sales with the sale price as the target and product attributes as features (size, condition, age, brand, location, season). The model estimates what price similar items should be listed at. Useful for real estate, used goods, wholesale pricing, and any market where prices are not fixed.
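Pricing data typically mixes numbers (size, age) with text attributes (condition, brand), which must be converted to numeric columns before training. A sketch using pandas one-hot encoding on hypothetical property sales:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical past sales with mixed attribute types
sales = pd.DataFrame({
    "size_sqft": [800, 1200, 950, 1500, 700, 1100],
    "age_years": [10, 3, 25, 1, 40, 15],
    "condition": ["good", "excellent", "fair", "excellent", "fair", "good"],
    "price": [210_000, 390_000, 180_000, 480_000, 120_000, 280_000],
})

# Turn the text-valued "condition" column into numeric indicator columns
X = pd.get_dummies(sales.drop(columns="price"), columns=["condition"])
y = sales["price"]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Estimate a listing price for a comparable property; reindex aligns the
# one-hot columns with those seen during training
listing = pd.DataFrame({"size_sqft": [1000], "age_years": [8], "condition": ["good"]})
listing_X = pd.get_dummies(listing, columns=["condition"]).reindex(columns=X.columns, fill_value=0)
print(f"suggested price: ${model.predict(listing_X)[0]:,.0f}")
```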
What Good Regression Data Looks Like
Regression requires numeric target values. Your training data needs a column containing the actual numbers you want to predict (revenue, price, count, time). The other columns should contain features that logically relate to that number.
Common issues to watch for:
- Outliers: A few extreme values can skew the model. One $50,000 order in a dataset where most are $50-$200 will distort predictions. Consider removing or capping extreme outliers before training.
- Missing values: Rows with blank cells in important columns reduce data quality. Fill in missing values with averages or medians, or remove incomplete rows before training.
- Irrelevant features: Columns that have no logical relationship to the target number add noise. Customer ID numbers, random codes, and auto-generated timestamps usually do not help predict business outcomes.
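All three fixes are a few lines with pandas (the column names and order values here are invented for illustration):

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({
    "order_value": [55.0, 120.0, np.nan, 90.0, 50_000.0, 75.0],
    "customer_id": ["A17", "B02", "C31", "D44", "E09", "F12"],
})

# Missing values: fill the blank cell with the column median
orders["order_value"] = orders["order_value"].fillna(orders["order_value"].median())

# Outliers: cap extreme values at the 95th percentile instead of deleting rows
cap = orders["order_value"].quantile(0.95)
orders["order_value"] = orders["order_value"].clip(upper=cap)

# Irrelevant features: drop the ID column before training
training = orders.drop(columns="customer_id")
print(training["order_value"].tolist())
```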
See How to Prepare Your Data for Machine Learning for a complete data preparation guide.
Build regression models that forecast numbers from your own business data. Zero per-request cost for predictions.
Get Started Free