How to Run Predictions at Zero Per-Request Cost
How the Pricing Model Works
Most AI services charge per API call. Send a question to GPT or Claude and you pay for the tokens. That makes sense for language models, which do heavy computation on every request. Machine learning predictions are different. Once a model is trained, running a prediction is just math: multiplying input values by learned weights and producing an output. For the kinds of models typically trained on business data, that often takes microseconds and negligible compute resources.
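To make the "just math" point concrete, here is a minimal sketch of a prediction from a linear model. The weights, bias, and feature values are made up for illustration and do not come from any real trained model. Other algorithms (trees, clustering) use different arithmetic, but the prediction step stays small in the same way.

```python
# Hypothetical learned parameters for a three-feature linear model.
# A real model would get these values from training.
WEIGHTS = [0.8, -0.5, 1.2]
BIAS = 0.1


def predict(features, weights=WEIGHTS, bias=BIAS):
    """Return the weighted sum of the input values plus the bias term."""
    if len(features) != len(weights):
        raise ValueError("expected one feature value per weight")
    return sum(w * x for w, x in zip(weights, features)) + bias


# One prediction is three multiplications and a few additions.
score = predict([1.0, 2.0, 3.0])
print(score)  # roughly 3.5, subject to floating-point rounding
```

All of the expensive work, finding good values for the weights, happens during training. The prediction itself only reuses them.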
The AI Apps API pricing reflects this reality. Training the model costs credits because it requires real compute time: processing your dataset, fitting the algorithm, and running validation. After that, the trained model sits ready to answer queries. Each prediction is a lightweight calculation that runs locally, so there is no per-request charge.
What This Means for Your Business
Zero per-request cost changes which problems are worth solving with ML. Consider these scenarios:
- Lead scoring at scale: If you generate 10,000 leads per month and each prediction cost 1 credit, that would be 10,000 credits every month just for scoring. With zero-cost predictions, you score every lead automatically regardless of volume.
- Real-time fraud detection: E-commerce sites might process thousands of transactions daily. Charging per prediction would make ML fraud detection prohibitively expensive. With free predictions, you can check every transaction.
- Customer segmentation: Re-segment your entire customer base whenever you want, not just when you can justify the cost. Run clustering on 100,000 customers as often as your data updates.
- Embedded predictions: Build predictions into custom apps or automated workflows that run continuously without accumulating prediction costs.
Comparing Costs: ML Predictions vs AI Chat Models
It is worth understanding the difference between ML predictions and AI model queries, because they solve different problems at very different cost structures.
An AI chat model (GPT, Claude) processes natural language. Every request involves reading your prompt, reasoning about it, and generating a response. This costs tokens, typically 2-15 credits per interaction on our platform. Chat models are powerful for open-ended questions, content generation, and conversational AI.
An ML prediction model processes structured data. You give it numbers and categories, and it returns a prediction. After training, each prediction is free. ML models are ideal for scored decisions: will this customer churn, how much will this item sell for, is this transaction fraudulent.
If you are asking the same type of question repeatedly on structured data, ML predictions are dramatically cheaper than routing each question through a language model. If you need to handle unique, unstructured questions, you need a chat model instead.
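The gap can be sketched with simple arithmetic. The 2-15 credits per chat interaction comes from the range quoted above. The training cost of 5 credits is an assumed stand-in for "a few credits", and the function names are hypothetical, not part of any platform API.

```python
def chat_cost_range(requests, low_per_request=2, high_per_request=15):
    """Return the (low, high) credit cost of sending every request to a chat model."""
    return requests * low_per_request, requests * high_per_request


def ml_cost(requests, training_credits, cost_per_prediction=0):
    """Return the credit cost of training once and then running the predictions."""
    return training_credits + requests * cost_per_prediction


requests = 10_000
low, high = chat_cost_range(requests)
ml = ml_cost(requests, training_credits=5)  # 5 credits is an assumed training cost

print(f"chat model: {low} to {high} credits")  # chat model: 20000 to 150000 credits
print(f"ML model:   {ml} credits")             # ML model:   5 credits
```

The comparison only applies when the question is repetitive and the data is structured. For unique, unstructured questions the ML model is not a substitute, whatever it costs.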
Your Only Cost: Training and Retraining
The training cost depends on your dataset size and the algorithm you choose. For most business datasets (thousands to tens of thousands of rows), training costs a few credits. More complex algorithms or very large datasets cost more, but the cost is still a one-time expense per model version.
When you retrain the model with new data, you pay the training cost again. If you retrain monthly, your total ML cost is just the monthly training fee, regardless of whether you run 100 predictions or 100,000 predictions between retrains.
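A short sketch shows how the fixed training fee spreads across predictions. The training cost of 5 credits and the monthly retraining schedule are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def monthly_ml_cost(training_credits, retrains_per_month=1):
    """Return the monthly credit cost. Prediction volume does not enter into it."""
    return training_credits * retrains_per_month


def credits_per_prediction(monthly_cost, predictions):
    """Return the training cost amortized over the predictions run that month."""
    if predictions <= 0:
        raise ValueError("predictions must be positive")
    return monthly_cost / predictions


cost = monthly_ml_cost(training_credits=5)  # assumed: 5 credits, retrained monthly
for volume in (100, 100_000):
    # The monthly cost is identical for both volumes; only the amortized figure changes.
    print(volume, cost, credits_per_prediction(cost, volume))
```

The monthly total stays flat while the effective cost per prediction shrinks toward zero as volume grows.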
How to Run Predictions
After training a model, you run predictions by uploading a CSV file with the same input columns used in training (minus the target column). The platform returns a new file with prediction values appended. You can also run single predictions through the API if you want to embed ML scoring into a workflow or custom app.
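Before uploading, it can help to confirm that the prediction file has exactly the training input columns and no target column. The sketch below uses only Python's standard csv module. The column names (age, plan, churned) are invented for the example, and the function is a local preparation step, not a call to the platform.

```python
import csv
import io


def prepare_prediction_csv(source_csv, training_columns, target_column):
    """Return CSV text with only the training input columns, in training order."""
    input_columns = [c for c in training_columns if c != target_column]
    reader = csv.DictReader(io.StringIO(source_csv))
    present = reader.fieldnames or []
    missing = [c for c in input_columns if c not in present]
    if missing:
        raise ValueError(f"missing input columns: {missing}")

    out = io.StringIO()
    # extrasaction="ignore" silently drops the target and any other extra columns.
    writer = csv.DictWriter(
        out, fieldnames=input_columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical data: "churned" was the target column during training.
sample = "age,plan,churned\n34,pro,1\n51,basic,0\n"
print(prepare_prediction_csv(sample, ["age", "plan", "churned"], "churned"))
```

Failing early on a missing column is a deliberate choice here: a file that does not match the training inputs is easier to fix before upload than after.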
For step-by-step training instructions, see How to Train a Machine Learning Model Without Code. To choose the right algorithm for your prediction task, see the algorithm selection guide.
Train once, predict forever. Zero per-request cost for all ML predictions.