How to Upload Training Data From CSV or S3
Uploading a CSV File Directly
Make sure your file has column headers in the first row. Each subsequent row is one data point. If you are training a supervised model (classifier or regressor), include the target column with known outcomes. Save the file with a .csv extension using UTF-8 encoding.
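If you generate the training file from a script rather than a spreadsheet, the requirements above translate directly into a few lines of code. The sketch below uses Python's standard `csv` module. The column names, the `churned` target column, and the file name `training_data.csv` are hypothetical examples, not anything the platform requires.

```python
import csv
from pathlib import Path

# Hypothetical example data: each dict is one data point, and "churned" is the
# target column holding known outcomes for a supervised classifier.
rows = [
    {"customer_age": 34, "total_spend": 1250.50, "plan": "pro", "churned": 0},
    {"customer_age": 52, "total_spend": 310.00, "plan": "basic", "churned": 1},
    {"customer_age": 27, "total_spend": 780.25, "plan": "basic", "churned": 0},
]

def write_training_csv(path, rows):
    """Write rows to a UTF-8 CSV whose header row comes from the first row's keys."""
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()    # first row: column names
        writer.writerows(rows)  # one row per data point

write_training_csv("training_data.csv", rows)
print(Path("training_data.csv").read_text(encoding="utf-8"))
```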
Log in to your admin panel and navigate to the Data Aggregator app. Click the dataset or model you want to train, or create a new one.
Use the file upload form to select your CSV. The platform reads the file, displays the detected columns and their data types, and shows a preview of the first several rows so you can verify the data looks correct.
Confirm the column types are correct (numeric columns detected as numeric, categorical as categorical). If a column is misdetected, you can adjust it. Then proceed to select an algorithm and train.
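Before uploading, it can help to predict which columns might be misdetected. The function below is a rough local heuristic: a column counts as numeric if every non-empty value in a sample parses as a number. It is an assumption for illustration, not the platform's own detection logic, which may use different rules. The example shows a classic misdetection: ZIP codes look numeric but should usually be treated as categorical.

```python
import csv
import io

def infer_column_types(csv_text, sample_rows=100):
    """Guess 'numeric' or 'categorical' for each column from a sample of rows.

    A local pre-check heuristic only; the platform's detection may differ.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = [row for _, row in zip(range(sample_rows), reader)]
    types = {}
    for i, name in enumerate(header):
        # Skip empty cells: they are missing values, not evidence of a type.
        values = [row[i] for row in samples if i < len(row) and row[i].strip()]
        try:
            for v in values:
                float(v)
            types[name] = "numeric" if values else "unknown"
        except ValueError:
            types[name] = "categorical"
    return types

sample = (
    "customer_age,total_spend,plan,zip_code\n"
    "34,1250.5,pro,02139\n"
    "52,,basic,94105\n"
)
print(infer_column_types(sample))
# → {'customer_age': 'numeric', 'total_spend': 'numeric', 'plan': 'categorical', 'zip_code': 'numeric'}
```

A column like `zip_code` is exactly the kind you would switch to categorical in the column-type step before training.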
Uploading From Amazon S3
If your data lives in an S3 bucket (common for businesses that already use AWS), you can connect the Data Aggregator directly to S3 instead of downloading and re-uploading the file manually.
Upload your prepared CSV to any S3 bucket your AWS account has access to. Note the bucket name and file path (key).
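If you prefer scripting the upload, the sketch below uses the AWS SDK for Python (`boto3`), whose S3 client provides `upload_file(Filename, Bucket, Key)`. It assumes `boto3` is installed and AWS credentials are already configured on your machine. The helper `split_s3_uri` is a hypothetical convenience for getting the bucket name and key you need to note, and the bucket name in the example is made up.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/path/to/file.csv' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key

def upload_csv(local_path, uri):
    """Upload a local CSV to S3 with boto3, using your configured AWS credentials."""
    import boto3  # imported here so split_s3_uri works without boto3 installed
    bucket, key = split_s3_uri(uri)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return bucket, key

print(split_s3_uri("s3://my-company-data/ml/training_data.csv"))
# → ('my-company-data', 'ml/training_data.csv')
```

Calling `upload_csv("training_data.csv", "s3://my-company-data/ml/training_data.csv")` would perform the upload and return the same bucket and key pair to enter in the Data Aggregator form.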
In the Data Aggregator upload form, select the S3 source option and enter your bucket name, file key, and AWS credentials (access key and secret key). The platform uses these to read the file directly from S3.
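An access key and secret key can do anything their IAM user is allowed to do, so a safer approach is to create a dedicated IAM user whose policy only allows reading the training file. The sketch below builds such a policy document. It assumes the platform needs nothing beyond `s3:GetObject` on that one key; if it also lists bucket contents, `s3:ListBucket` on the bucket ARN would have to be added. The bucket and key names are hypothetical.

```python
import json

def read_only_policy(bucket, key):
    """Build an IAM policy document that only allows reading one S3 object.

    Assumption: reading the file needs only s3:GetObject on that key.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{key}"],
            }
        ],
    }

# Paste the printed JSON into the IAM console when creating the user's policy.
print(json.dumps(read_only_policy("my-company-data", "ml/training_data.csv"), indent=2))
```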
The platform downloads the CSV from S3, parses it the same way as a direct upload, and presents the column preview for your verification.
S3 upload is especially useful for automated workflows. If your data pipeline already writes processed data to S3 on a schedule, you can point the Data Aggregator at that same file and retrain on the latest version without any manual steps.
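On the pipeline side, the key point is to overwrite the same S3 key on every run so the Data Aggregator always finds the newest data at the location you configured. A minimal sketch, assuming `boto3` is installed and the bucket and key are placeholders for your own; scheduling the job and triggering retraining are outside its scope. `put_object(Bucket=..., Key=..., Body=...)` is a standard `boto3` S3 client call.

```python
import csv
import io

def rows_to_csv_bytes(header, rows):
    """Serialize rows to UTF-8 CSV bytes; csv.writer quotes values that contain commas."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

def publish_latest(header, rows, bucket, key):
    """Overwrite the same S3 key on each run so retraining reads the newest file."""
    import boto3  # requires boto3 and AWS credentials at run time
    body = rows_to_csv_bytes(header, rows)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body, ContentType="text/csv")
```

Writing to a fixed key like `ml/latest.csv` (hypothetical) means nothing changes in the Data Aggregator's S3 settings between runs.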
CSV Formatting Requirements
The platform is flexible about CSV formatting, but following these guidelines prevents import errors:
- Headers required: The first row must contain column names. Use simple names made of letters, numbers, and underscores. "customer_age" and "total_spend" are good. "Customer's Age ($)" will work, but cleaner names are easier to work with.
- Consistent column count: Every row must have the same number of columns. A row with a missing comma breaks the parser.
- Encoding: Use UTF-8. If you export from Excel, choose "CSV UTF-8" as the save format. Other encodings can corrupt special characters.
- Quoting: If any values contain commas (like "Smith, John"), they should be wrapped in double quotes. Most spreadsheet programs handle this automatically on export.
- No trailing rows: Remove any summary rows, total rows, or blank rows at the bottom. These get treated as data points and can confuse the model.
- Missing values: Empty cells are acceptable. The platform handles them based on the algorithm's requirements. However, too many missing values in a column reduce its usefulness. If more than 30% of a column's values are missing, consider dropping that column or filling in defaults before uploading.
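Most of these guidelines can be checked automatically before you upload. The function below is a local pre-flight sketch built on Python's `csv` module: it checks UTF-8 decoding, empty header names, trailing blank rows, inconsistent column counts, and columns that exceed the 30% missing-value threshold. It cannot tell a summary or total row from real data, so those still need a manual look. The function name and return format are illustrative assumptions.

```python
import csv

def check_csv(path, max_missing=0.30):
    """Return a list of problems found against the CSV formatting guidelines."""
    try:
        # utf-8-sig also accepts files saved with a UTF-8 byte-order mark (BOM).
        with open(path, encoding="utf-8-sig", newline="") as f:
            rows = list(csv.reader(f))  # handles quoted values like "Smith, John"
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    if not rows:
        return ["file is empty"]

    problems = []
    header, data = rows[0], rows[1:]
    if any(not name.strip() for name in header):
        problems.append("header row contains an empty column name")

    # Blank rows at the end would be read as (empty) data points.
    trailing = 0
    while data and all(not cell.strip() for cell in data[-1]):
        data.pop()
        trailing += 1
    if trailing:
        problems.append(f"{trailing} blank row(s) at the end of the file")

    # Every row must have as many columns as the header.
    for n, row in enumerate(data, start=1):
        if len(row) != len(header):
            problems.append(f"data row {n} has {len(row)} columns, expected {len(header)}")

    # Flag columns where more than max_missing of the values are empty.
    well_formed = [row for row in data if len(row) == len(header)]
    if well_formed:
        for i, name in enumerate(header):
            missing = sum(1 for row in well_formed if not row[i].strip())
            ratio = missing / len(well_formed)
            if ratio > max_missing:
                problems.append(f"column {name!r} is {ratio:.0%} empty")
    return problems
```

Running `for problem in check_csv("training_data.csv"): print(problem)` prints nothing for a file that meets all the checks.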
Data Size Limits and Performance
Direct CSV uploads support files up to several hundred megabytes. Typical business datasets (thousands to hundreds of thousands of rows) upload in a few seconds. Very large files (millions of rows) take longer but generally finish in under a minute, depending on your connection speed.
If your dataset exceeds the single-upload limit, split it into multiple files and upload them one after another using incremental training, or use the S3 method, which handles larger files more efficiently.
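Splitting a file for incremental training only works if every part is a valid CSV on its own, which means each part needs the header row repeated. A minimal sketch using the standard `csv` module; the `rows_per_file` value is up to you (pick one that keeps each part comfortably under the upload limit), and the `_part001` naming scheme is a hypothetical choice.

```python
import csv
from pathlib import Path

def split_csv(path, rows_per_file, out_dir="."):
    """Split a large CSV into numbered part files, repeating the header row in each."""
    stem = Path(path).stem
    out_paths = []
    out = None
    with open(path, encoding="utf-8", newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        try:
            for i, row in enumerate(reader):
                # Start a new part file every rows_per_file data rows.
                if i % rows_per_file == 0:
                    if out is not None:
                        out.close()
                    out_path = Path(out_dir) / f"{stem}_part{len(out_paths) + 1:03d}.csv"
                    out = open(out_path, "w", encoding="utf-8", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)  # every part needs its own header row
                    out_paths.append(out_path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return out_paths
```

For example, `split_csv("events.csv", rows_per_file=500_000)` would produce `events_part001.csv`, `events_part002.csv`, and so on, to be uploaded in order.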
Exporting Data From Common Sources
From a Spreadsheet
Google Sheets: File > Download > Comma-separated values. Excel: File > Save As > CSV UTF-8. Both produce clean CSV files ready for upload.
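If a file was already saved with Excel's plain "CSV" option instead of "CSV UTF-8", it may be in a legacy encoding. On Western-language Windows systems this is often Windows-1252, which is the assumption in the sketch below; if your file came from a different system, pass its actual encoding. The file names are placeholders.

```python
from pathlib import Path

def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-save a CSV from a legacy encoding (assumed Windows-1252) as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")

# Example: reencode_to_utf8("export.csv", "export_utf8.csv")
```

Decoding with the wrong source encoding will not always raise an error, so spot-check accented names or currency symbols in the preview after uploading.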
From a Database
If your data lives in MySQL or PostgreSQL, run a SELECT query for the data you want and export the results as CSV. Most database clients (DBeaver, pgAdmin, MySQL Workbench) have a one-click CSV export for query results.
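For a recurring export, scripting the query avoids clicking through a database client each time. The sketch below uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; drivers for PostgreSQL (such as `psycopg`) and MySQL follow the same Python DB-API cursor interface (`execute`, `description`, iteration), though their parameter placeholder is `%s` rather than `?`. The table and column names are hypothetical.

```python
import csv
import sqlite3

def export_query_to_csv(conn, query, path, params=()):
    """Run a SELECT and write the result set, with column names as headers, to a UTF-8 CSV."""
    cur = conn.cursor()
    cur.execute(query, params)
    header = [col[0] for col in cur.description]  # DB-API: item 0 is the column name
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cur)  # NULLs (None) are written as empty cells
    cur.close()

# Demo with an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER, spend REAL, churned INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [(34, 1250.5, 0), (52, None, 1)])
export_query_to_csv(conn, "SELECT age, spend, churned FROM customers", "customers.csv")
```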
From a SaaS Tool
Most business SaaS tools (CRM, email marketing, analytics, e-commerce) have a data export feature that produces CSV files. Look for "Export," "Download," or "Reports" in the tool's settings. Export all the fields you think might be relevant, since you can always drop columns later but cannot add fields you did not export.
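Once you have a wide export, trimming it down to the columns you actually want to train on is straightforward. A minimal sketch with `csv.DictReader`/`csv.DictWriter`; the file and column names are hypothetical. It fails loudly if a requested column is missing, which usually means it was not included in the export.

```python
import csv

def keep_columns(src, dst, columns):
    """Copy a CSV, keeping only the named columns in the given order."""
    with open(src, encoding="utf-8-sig", newline="") as fin:
        reader = csv.DictReader(fin)
        available = reader.fieldnames or []
        missing = [c for c in columns if c not in available]
        if missing:
            raise KeyError(f"columns not in export: {missing}")
        with open(dst, "w", encoding="utf-8", newline="") as fout:
            # extrasaction="ignore" drops every column not listed in `columns`.
            writer = csv.DictWriter(fout, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(reader)

# Example: keep_columns("crm_export.csv", "crm_trimmed.csv", ["plan", "lifetime_value"])
```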