How to Predict Employee Turnover
Why Predicting Turnover Matters
Replacing an employee costs between 50% and 200% of their annual salary when you account for recruiting, onboarding, lost productivity, and the learning curve for a replacement. Most companies only find out someone is leaving when that person hands in their notice, which is too late to intervene. A turnover prediction model shifts this from reactive to proactive by flagging at-risk employees weeks or months before they resign.
The practical value is straightforward: if you know which employees are likely to leave, you can have a conversation, adjust workload, offer a development opportunity, or address a compensation gap before the decision is made. Even catching a few key departures per quarter can save tens of thousands of dollars.
What Data You Need
A turnover model needs historical records of employees who have already left and employees who stayed. Each row represents one employee, and you need columns that describe their work situation before the outcome was known.
Useful features for turnover prediction include:
- Tenure: Months or years at the company
- Department and role: Which team or function they work in
- Compensation: Salary band, time since last raise, comparison to market rate
- Performance: Most recent review score, number of promotions, time since last promotion
- Workload: Average hours per week, overtime frequency, project count
- Manager: Manager ID or team lead (some managers have higher turnover rates than others)
- Commute or location: Remote vs in-office, distance to office if applicable
- Training: Number of training sessions attended, development opportunities offered
- Engagement signals: Survey scores, participation in optional activities, PTO usage patterns
- Target column: A "left" column with values like "yes" or "no" (or 1/0)
You do not need every one of these features. Start with whatever your HR system can export. Even five or six solid features can produce a useful model. The most predictive features in most organizations are tenure, time since last promotion, compensation relative to peers, and manager. See How to Prepare Your Data for Machine Learning for detailed formatting guidance.
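As a rough illustration of the layout described above, the sketch below parses a tiny, made-up export with Python's standard csv module. The column names (such as tenure_months and months_since_promotion) and all values are hypothetical placeholders, not a required schema.

```python
import csv
import io

# Hypothetical HR export: one row per employee, feature columns plus a "left" target.
# Column names and values are invented for illustration only.
SAMPLE_CSV = """employee_id,tenure_months,department,salary_band,months_since_promotion,avg_weekly_hours,manager_id,left
E001,14,Engineering,B,14,46,M07,yes
E002,62,Sales,C,9,41,M03,no
E003,30,Support,A,30,44,M07,yes
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns to int."""
    numeric = {"tenure_months", "months_since_promotion", "avg_weekly_hours"}
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for name in numeric:
            row[name] = int(row[name])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)
print(rows[0]["department"], rows[0]["left"])
```

Each dict is one employee, and the "left" value is the outcome the model learns to predict.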
How to Build the Model
Pull records for employees who left in the past two to three years and a matching set of employees who stayed during the same period. You need at least 200 total records, with at least 50 in the "left" category. If your company is small, include as much history as you have. Exclude employees who left through layoffs or restructuring, since those departures are not voluntary turnover.
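The filtering and size checks in this step could be sketched as follows. This assumes each record carries a hypothetical exit_type field that distinguishes resignations from layoffs; your HR system may label this differently or not at all.

```python
MIN_TOTAL = 200  # minimum total records suggested above
MIN_LEFT = 50    # minimum records in the "left" category


def prepare_training_rows(rows):
    """Drop involuntary exits, then report whether the size guidelines are met."""
    # "exit_type" is a hypothetical field used only for filtering here.
    involuntary = {"layoff", "restructuring"}
    kept = [r for r in rows if r.get("exit_type") not in involuntary]
    n_left = sum(1 for r in kept if r["left"] == "yes")
    enough = len(kept) >= MIN_TOTAL and n_left >= MIN_LEFT
    return kept, n_left, enough


# Made-up records: 60 resignations, 15 layoffs, 150 employees who stayed.
rows = (
    [{"left": "yes", "exit_type": "resignation"}] * 60
    + [{"left": "yes", "exit_type": "layoff"}] * 15
    + [{"left": "no", "exit_type": ""}] * 150
)
kept, n_left, enough = prepare_training_rows(rows)
print(len(kept), n_left, enough)  # prints: 210 60 True
```

A field like exit_type is only known after someone leaves, so it should be used for filtering and then dropped before training.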
Save the data as a CSV and upload it through the Data Aggregator app. Select the "left" column as your target variable and all other columns as input features. Remove any columns that leak the outcome, such as "exit interview date" or "resignation reason," since those would not be available for current employees.
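One possible way to strip leaky columns before saving the CSV is shown below. The column names exit_interview_date and resignation_reason are assumed spellings of the examples above, and the in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical names for columns that only exist once someone has left.
LEAKY_COLUMNS = {"exit_interview_date", "resignation_reason"}


def write_training_csv(rows, out_file, leaky=LEAKY_COLUMNS):
    """Write rows to CSV without the leaky columns; return the kept column names."""
    columns = [c for c in rows[0] if c not in leaky]
    # extrasaction="ignore" silently skips the dropped keys in each row.
    writer = csv.DictWriter(out_file, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return columns


rows = [
    {"tenure_months": 14, "department": "Engineering",
     "resignation_reason": "new job", "exit_interview_date": "2024-05-01", "left": "yes"},
    {"tenure_months": 62, "department": "Sales",
     "resignation_reason": "", "exit_interview_date": "", "left": "no"},
]
buffer = io.StringIO()  # stands in for open("turnover_training.csv", "w", newline="")
kept_columns = write_training_csv(rows, buffer)
print(kept_columns)  # prints: ['tenure_months', 'department', 'left']
```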
Random forest works well for turnover prediction because it handles both numeric features (salary, tenure) and categorical features (department, role) without extensive preprocessing. It also tells you which features matter most, which is valuable for understanding why people leave. Gradient boosting is another strong option if you want to experiment with accuracy improvements.
After training, check the model's recall (what percentage of actual departures it predicted) and precision (when it flags someone as at-risk, how often that person actually leaves). For HR use, recall matters more: it is better to flag a few false positives than to miss real departures. Also review which features the model weighted most heavily. If "manager" ranks high, that tells you something important about your organization. See How to Test Model Accuracy.
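To make the two metrics concrete, the sketch below computes them by hand from a small set of made-up labels, where 1 means the employee left and 0 means they stayed. The function name and the data are illustrative only.

```python
def recall_precision(actual, predicted):
    """Return (recall, precision) for binary labels where 1 means 'left'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # departures correctly flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # departures the model missed
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # stayers flagged by mistake
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Made-up held-out labels: 1 = left, 0 = stayed.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
recall, precision = recall_precision(actual, predicted)
print(f"recall={recall:.2f} precision={precision:.2f}")  # prints: recall=0.75 precision=0.60
```

Here the model catches three of four real departures (recall 0.75), and three of its five flags are correct (precision 0.60).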
Export the same features for your active employees (without the "left" column) and run them through the trained model. Each employee gets a turnover probability score between 0 and 1. Sort by score to create a prioritized watch list. This scoring step costs zero credits per employee, so you can rescore weekly or monthly as data changes.
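Turning the scores into a watch list is a simple sort. The sketch below assumes the scores have been collected into a dictionary keyed by a hypothetical employee ID. How you retrieve them from the platform is not shown.

```python
def build_watch_list(scores, top_n=None):
    """Sort (employee_id, probability) pairs from highest to lowest risk."""
    for employee_id, p in scores.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"score for {employee_id} is outside [0, 1]: {p}")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Made-up turnover probabilities for four active employees.
scores = {"E101": 0.12, "E102": 0.81, "E103": 0.47, "E104": 0.66}
watch_list = build_watch_list(scores, top_n=3)
print(watch_list)  # prints: [('E102', 0.81), ('E104', 0.66), ('E103', 0.47)]
```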
What to Do With Turnover Predictions
Predictions are only valuable if they drive action. Here are practical ways to use turnover scores:
- Retention conversations: Have managers check in with high-risk employees. A casual "How are things going, is there anything you need?" is often enough to surface issues before they become deal-breakers.
- Compensation reviews: Cross-reference high-risk scores with time since last raise. If someone is flagged as at-risk and has not had a compensation adjustment in 18 months, that is a clear action item.
- Development opportunities: Offer training, stretch assignments, or mentorship to high-risk employees who seem disengaged. Sometimes the problem is not money but growth.
- Succession planning: For high-risk employees in critical roles, start cross-training and knowledge transfer proactively instead of scrambling after a resignation.
- Department analysis: If one department shows consistently higher turnover risk, investigate the common factors. It might be a management issue, workload problem, or compensation gap.
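The compensation cross-reference described above can be sketched as a simple filter. The 18-month limit comes from the example in the list. The 0.6 risk cutoff and the field names are assumptions you would adjust to your own data.

```python
RISK_THRESHOLD = 0.6           # assumed cutoff for "high risk"; tune for your data
MAX_MONTHS_WITHOUT_RAISE = 18  # from the compensation-review example above


def compensation_review_candidates(employees):
    """Return IDs of high-risk employees who are overdue for a raise."""
    return [
        e["employee_id"]
        for e in employees
        if e["risk"] >= RISK_THRESHOLD
        and e["months_since_raise"] >= MAX_MONTHS_WITHOUT_RAISE
    ]


# Made-up scored employees; field names are hypothetical.
employees = [
    {"employee_id": "E101", "risk": 0.12, "months_since_raise": 30},
    {"employee_id": "E102", "risk": 0.81, "months_since_raise": 22},
    {"employee_id": "E104", "risk": 0.66, "months_since_raise": 6},
]
candidates = compensation_review_candidates(employees)
print(candidates)  # prints: ['E102']
```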
Keeping the Model Current
Workforce dynamics change over time. Market conditions shift, new competitors enter your hiring market, company culture evolves, and new policies take effect. A model trained on 2024 departure patterns may not capture 2026 realities accurately.
Retrain the model every six months, or whenever you notice the predictions becoming less useful. Each retraining cycle incorporates recent departures, which helps the model learn new patterns. If your company goes through a major change like an acquisition, office relocation, or large restructuring, retrain immediately afterward since the old patterns may no longer apply. See How to Retrain Models With New Data.
Example: Mid-Size Company Retention Program
A 400-person software company exports three years of employee data including tenure, department, salary band, last promotion date, manager, average weekly hours, and departure status. They have 180 departures and 600 retained employees in the dataset. After training a random forest classifier, the model achieves 72% recall and 58% precision.
They score all 400 current employees. The model flags 65 as elevated risk. HR reviews the top 20 highest-risk employees in critical roles and initiates targeted retention efforts, including four compensation adjustments, three role changes, and six development plans. Over the following quarter, voluntary turnover drops by 30% compared to the same quarter the prior year. The training cost was under $3 in platform credits.
Identify flight risk across your workforce before you lose your best people. Train a turnover prediction model on your own HR data today.