How to Predict Which Customers Will Buy Again
Framing the Problem
The first decision is your prediction window. "Will this customer buy again" is too vague for a model. "Will this customer make another purchase within 30 days" is specific and trainable. Choose a window that matches your business cycle:
- E-commerce with consumables: 30-60 days (customers reorder supplies regularly)
- Retail with seasonal products: 90 days (captures seasonal purchase patterns)
- SaaS or subscription renewals: the renewal window (for example, the 30 days before the renewal date)
- High-value infrequent purchases: 6-12 months (furniture, electronics, professional services)
Once you choose the window, your target column becomes binary: did this customer make another purchase within X days (yes/no). This turns the problem into a straightforward classification task.
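To make the yes/no target concrete, here is a minimal Python sketch that labels one customer. It assumes you have the customer's purchase dates and a reference date that starts the window. `bought_again` is a hypothetical helper, and counting the window as the X days after (not including) the reference date is an illustrative convention.

```python
from datetime import date, timedelta


def bought_again(reference_date: date, purchase_dates: list[date], window_days: int) -> bool:
    """Return True if any purchase falls within window_days after reference_date.

    Assumed convention: the window is (reference_date, reference_date + window_days],
    so the reference purchase itself never counts as a repeat.
    """
    window_end = reference_date + timedelta(days=window_days)
    return any(reference_date < d <= window_end for d in purchase_dates)


# Example: a 30-day window for a customer who bought on 1 March and 20 March.
history = [date(2024, 3, 1), date(2024, 3, 20)]
print(bought_again(date(2024, 3, 1), history, 30))   # True: 19 days later
print(bought_again(date(2024, 3, 20), history, 30))  # False: no later purchase
```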
Features That Predict Repeat Purchases
The strongest predictors of repeat purchase behavior usually include:
- Recency: Days since last purchase. Customers who bought recently are more likely to buy again.
- Frequency: Total number of past purchases. Previous repeat buyers are the most likely future repeat buyers.
- Monetary value: Total amount spent and average order value. Higher spenders tend to be more engaged.
- First purchase details: What they bought first, how much they spent, which channel they came from. The first purchase often predicts long-term behavior.
- Product category: Consumable products drive more repeat purchases than durable goods.
- Engagement metrics: Email opens, site visits, app logins since the last purchase.
- Satisfaction signals: Reviews left, support tickets filed, returns made.
- Time-based patterns: Day of week and month of previous purchases, average days between purchases.
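Most of these features are simple aggregates over a customer's order history. If you prefer to prepare them yourself before uploading, here is a minimal Python sketch covering recency, frequency, monetary value and timing. It assumes each order is just a date and an amount. `Purchase` and `rfm_features` are hypothetical names. Engagement and satisfaction signals would come from your email, analytics and support tools rather than the order table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Purchase:
    day: date
    amount: float


def rfm_features(purchases: list[Purchase], as_of: date) -> dict[str, float]:
    """Recency, frequency, monetary and timing features as of `as_of`.

    Purchases after `as_of` are ignored, so the features only use
    information that existed on that date.
    """
    past = sorted((p for p in purchases if p.day <= as_of), key=lambda p: p.day)
    if not past:
        raise ValueError("customer has no purchases on or before as_of")
    total = sum(p.amount for p in past)
    # Days between consecutive purchases.
    gaps = [(later.day - earlier.day).days for earlier, later in zip(past, past[1:])]
    return {
        "recency_days": (as_of - past[-1].day).days,
        "frequency": len(past),
        "total_spent": total,
        "avg_order_value": total / len(past),
        # 0.0 for one-time buyers, who have no gap between purchases yet.
        "avg_days_between": sum(gaps) / len(gaps) if gaps else 0.0,
        "last_purchase_weekday": past[-1].day.weekday(),  # Monday = 0
    }


orders = [
    Purchase(date(2024, 1, 5), 40.0),
    Purchase(date(2024, 2, 4), 60.0),
    Purchase(date(2024, 3, 5), 50.0),
]
print(rfm_features(orders, as_of=date(2024, 3, 15)))
```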
Building the Model
Select customers who have at least one purchase made X or more days ago (where X is your prediction window), and treat the most recent such purchase as each customer's reference purchase. Because the full window after that purchase has already passed, you know the outcome for everyone. For a 30-day window, the reference purchase must be at least 30 days old. This ensures the model trains on complete outcomes, not on customers who are still inside their window.
For each customer, calculate the features above as of their reference purchase date (not as of today). This prevents data leakage: the model should only see information that was available at the time you would make the prediction. Then add a "bought_again" column (yes/no) that records whether they made another purchase within X days after the reference purchase.
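One way to build a training row is sketched below in Python. Here the "reference purchase" is the customer's most recent purchase that is at least X days old, so its whole window has elapsed. `training_row` and the two features it computes are hypothetical and kept deliberately small. A real table would use the full feature set, always computed only from purchases on or before the reference date.

```python
from datetime import date, timedelta
from typing import Optional


def training_row(purchase_dates: list[date], today: date,
                 window_days: int) -> Optional[tuple[dict[str, int], bool]]:
    """Return (features, bought_again) for one customer, or None if the outcome is unknown.

    The reference purchase is the most recent purchase at least `window_days`
    old, so its whole prediction window has already elapsed.
    """
    cutoff = today - timedelta(days=window_days)
    eligible = [d for d in purchase_dates if d <= cutoff]
    if not eligible:
        return None  # every purchase is too recent to have a known outcome
    reference = max(eligible)
    # Features only look at purchases on or before the reference date (no leakage).
    history = [d for d in purchase_dates if d <= reference]
    features = {
        "frequency": len(history),
        "tenure_days": (reference - min(history)).days,
    }
    window_end = reference + timedelta(days=window_days)
    bought_again = any(reference < d <= window_end for d in purchase_dates)
    return features, bought_again


today = date(2024, 6, 30)
customers = {
    "cust_a": [date(2024, 4, 10), date(2024, 5, 20), date(2024, 6, 10)],
    "cust_b": [date(2024, 6, 15)],
    "cust_c": [date(2024, 1, 1), date(2024, 3, 1)],
}
for cust, dates in customers.items():
    print(cust, training_row(dates, today, window_days=30))
```

In this example, cust_b is skipped because their only purchase is still inside its 30-day window.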
Upload the training table to the Data Aggregator app. Logistic regression gives probability scores that are easy to interpret, while random forest can capture complex interactions between features. Try both and compare their accuracy. See How to Test Model Accuracy.
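The app trains these models for you. Purely to show what logistic regression does under the hood, here is a small stdlib-only Python sketch that fits one by batch gradient descent on the log-loss. The toy data, feature scaling, learning rate and epoch count are made up. This is not how the Data Aggregator app is implemented, and random forest is left out for brevity.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: list[list[float]], y: list[int],
                   lr: float = 1.0, epochs: int = 5000) -> tuple[list[float], float]:
    """Fit weights and a bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def accuracy(w: list[float], b: float, X: list[list[float]], y: list[int],
             threshold: float = 0.5) -> float:
    hits = sum((predict_proba(w, b, xi) >= threshold) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)


# Toy rows: [recency_days / 100, frequency / 10]; label 1 = bought again.
X = [[0.05, 0.8], [0.10, 0.6], [0.20, 0.5], [0.15, 0.9],
     [0.90, 0.1], [0.70, 0.2], [0.80, 0.1], [0.60, 0.3]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(X, y)
print("weights:", w, "bias:", b, "train accuracy:", accuracy(w, b, X, y))
```

A negative recency weight and a positive frequency weight match the intuition from the feature list: recent, frequent buyers score higher.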
Calculate the same features for your current active customers, this time as of today, and run them through the model to get repeat purchase probabilities. Sort by score to see who is most and least likely to come back. Scoring costs zero credits per customer.
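Scoring boils down to applying the fitted model to each active customer and sorting by the result. A minimal Python sketch, assuming a logistic regression with two features: the coefficients in `WEIGHTS` and `BIAS` and the customer IDs are invented for illustration, not output from the app.

```python
import math

# Hypothetical coefficients standing in for a fitted logistic regression.
WEIGHTS = {"recency_days": -0.04, "frequency": 0.5}
BIAS = -0.5


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def repeat_probability(features: dict[str, float]) -> float:
    """Probability of a repeat purchase for features computed as of today."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return sigmoid(z)


active = {
    "cust_a": {"recency_days": 5, "frequency": 6},
    "cust_b": {"recency_days": 40, "frequency": 2},
    "cust_c": {"recency_days": 120, "frequency": 1},
}
# Highest probability first.
ranked = sorted(active, key=lambda c: repeat_probability(active[c]), reverse=True)
for cust in ranked:
    print(cust, round(repeat_probability(active[cust]), 2))
```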
Acting on Predictions
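The three tiers below are simple probability cut-offs. Here is a minimal sketch of the mapping. The function name is hypothetical. Placing exactly 70% in the high tier and exactly 30% in the medium tier is an assumption about how the boundaries are handled.

```python
def probability_tier(p: float) -> str:
    """Map a repeat-purchase probability to a high / medium / low tier."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must be between 0 and 1, got {p}")
    if p >= 0.70:
        return "high"    # grow order value, skip heavy discounts
    if p >= 0.30:
        return "medium"  # best target for nudges and drip campaigns
    return "low"         # test win-back offers on a small segment first


print([probability_tier(p) for p in (0.85, 0.70, 0.45, 0.30, 0.12)])
```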
High Probability Customers (70%+)
These customers are already likely to come back. Do not waste marketing budget on heavy discounts for them. Instead, focus on increasing their order value. Send personalized product recommendations, cross-sell complementary items, or offer early access to new products. A well-timed "you might also like" email or SMS can increase average order value without discounting.
Medium Probability Customers (30-70%)
These are your best marketing targets because a nudge can tip them toward purchasing. Send a targeted drip campaign with relevant content, limited-time offers, or reminders about items they viewed. Time your outreach based on their typical purchase interval. If they usually buy every 45 days, reach out around day 35.
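Timing outreach from the purchase interval is simple date arithmetic. Here is a minimal sketch, assuming you already know each customer's average days between purchases. `outreach_date` is a hypothetical helper. The 10-day lead time just reproduces the 45-day / day-35 example above and is meant to be tuned.

```python
from datetime import date, timedelta


def outreach_date(last_purchase: date, avg_interval_days: float, lead_days: int = 10) -> date:
    """Schedule outreach `lead_days` before the customer's usual next purchase.

    lead_days=10 is an assumption matching the 45-day / day-35 example.
    The result is never earlier than the day after the last purchase.
    """
    offset = max(1, round(avg_interval_days) - lead_days)
    return last_purchase + timedelta(days=offset)


print(outreach_date(date(2024, 5, 1), 45))  # 2024-06-05, i.e. day 35
```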
Low Probability Customers (below 30%)
These customers are unlikely to return on their own. Test a win-back offer (deeper discount, free shipping, loyalty bonus) on a small segment to see if the economics make sense. If the cost of the incentive exceeds the expected profit from a repeat purchase, save your budget for the medium-probability group.
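The break-even check for a win-back test can be written as a small function. Here is a minimal sketch, assuming you record how many customers in the test segment received the offer, how many came back, and the average incentive cost per customer offered. The names are hypothetical. `baseline_return_rate` (the return rate of comparable customers who got no offer) is an optional addition, and its default of 0 matches the simple comparison described above.

```python
def win_back_worth_it(test_size: int, test_buyers: int, profit_per_order: float,
                      incentive_cost_per_customer: float,
                      baseline_return_rate: float = 0.0) -> bool:
    """Return True if expected profit per offered customer exceeds the incentive cost.

    The return rate observed in the test segment (minus an optional baseline
    rate for customers who got no offer) is used as the repeat-purchase probability.
    """
    if test_size <= 0:
        raise ValueError("test segment must contain at least one customer")
    return_rate = test_buyers / test_size
    expected_profit = (return_rate - baseline_return_rate) * profit_per_order
    return expected_profit > incentive_cost_per_customer


print(win_back_worth_it(200, 30, profit_per_order=40.0, incentive_cost_per_customer=5.0))
```

For example, if 30 of 200 test customers return and each repeat order earns 40 in profit, the expected profit is 6 per customer offered. An incentive costing 5 per customer pays off, and one costing 8 does not.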
Predict which customers will buy again and focus your marketing where it matters most. No coding or data science required.