How to Train AI on PDFs and Text Files

You can train your AI chatbot on PDF and text files by uploading them through the admin panel. The system extracts the text from each file, splits it into chunks, and creates searchable embeddings automatically. PDFs with selectable text work best. TXT and DOCX files are processed as-is, with no conversion needed. Processing takes seconds per file.

Supported File Types

The platform accepts three file formats for knowledge base uploads: PDF, TXT, and DOCX.

Preparing Your PDFs

Not all PDFs are created equal when it comes to AI training. The key distinction is whether the PDF contains actual text or just images of text.

Text-Based PDFs (ready to use)

If you can select and copy text from your PDF using a standard PDF reader, it contains real text and is ready to upload. This includes most digitally-created documents: files exported from Word, Google Docs, or other document editors, as well as reports generated by business software.

Scanned PDFs (need OCR first)

If you cannot select text in the PDF (the entire page appears as one image), the file is a scan. The AI cannot read image-based content. You need to run the file through OCR (optical character recognition) software first, which converts the scanned images into selectable text. Most modern scanning apps include OCR, and tools such as Adobe Acrobat Pro (paid) or OCRmyPDF (free, open source) can add a text layer to existing scans. The free Adobe Acrobat Reader does not perform OCR.
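If you have many PDFs to check, the "can I select text?" test can be approximated in a script. The sketch below is only an illustration and not part of the platform: it assumes the third-party pypdf library for text extraction, and the function names and the 20-character threshold are hypothetical choices. A page whose extracted text is nearly empty is flagged as a likely scan.

```python
def find_scanned_pages(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is nearly empty.

    min_chars is an assumed threshold, not a platform setting: a page that
    yields fewer characters than this is treated as a likely scanned image.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Extract text per page using pypdf (third-party: pip install pypdf)."""
    from pypdf import PdfReader  # imported here so the check above has no dependencies

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

For example, find_scanned_pages(extract_page_texts("manual.pdf")) lists the pages that probably need OCR before upload; an empty list suggests the file is text-based. The file name here is a placeholder.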

Tips for Better PDF Processing

Preparing Text Files

Plain text files need minimal preparation. A few things to check:

Upload Process

Step 1: Open your chatbot's knowledge base.
In the AI Chatbot app, select the chatbot you want to train, then navigate to the knowledge base or embeddings section.
Step 2: Click the file upload button.
Select your PDF, TXT, or DOCX file from your computer. The file starts uploading immediately.
Step 3: Wait for processing.
The system extracts text from the file, chunks it into pieces of 250 to 2,000 characters, and generates embeddings for each chunk. A 10-page PDF typically processes in 10 to 20 seconds.
Step 4: Verify the results.
Check the newly created chunks in the knowledge base list. Open a few to confirm the text was extracted correctly. Test the chatbot with questions the document should answer.
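To make the chunking in Step 3 concrete, here is a minimal sketch of splitting extracted text into pieces of 250 to 2,000 characters. It is an assumption about how such a splitter could work, not the platform's actual algorithm: the function name chunk_text and the rule of cutting at the last space inside each window are hypothetical. Embedding generation is left out.

```python
def chunk_text(text, min_chars=250, max_chars=2000):
    """Split text into chunks of at most max_chars characters.

    Every chunk except possibly the last is at least min_chars long.
    Cuts are made at spaces where possible so words stay intact.
    """
    text = " ".join(text.split())  # collapse all whitespace runs to single spaces
    chunks = []
    start = 0
    while len(text) - start > max_chars:
        window_end = start + max_chars
        # Look for the last space that keeps the chunk within [min_chars, max_chars].
        cut = text.rfind(" ", start + min_chars, window_end + 1)
        if cut == -1:
            # No usable space in the window: fall back to a hard cut.
            cut = window_end
            next_start = cut
        else:
            next_start = cut + 1  # skip the space itself
        chunks.append(text[start:cut])
        start = next_start
    if start < len(text):
        chunks.append(text[start:])  # final chunk may be shorter than min_chars
    return chunks
```

Opening a few chunks in the knowledge base list, as described in Step 4, is the way to see how the real splitter divided your document; this sketch only shows why a 10-page file ends up as dozens of separate entries.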

How Many Files Can You Upload

There is no limit on the number of files you can upload per chatbot. Each file is processed independently and its chunks are added to the knowledge base. You can upload files one at a time or in batches. Common setups include:

Cost reference: A standard 10-page PDF produces approximately 30 to 60 chunks, costing 90 to 180 credits ($0.09 to $0.18) to process. A single-page text file produces 3 to 8 chunks, costing 9 to 24 credits (under $0.03).
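The figures above imply roughly 3 credits per chunk and 1,000 credits per dollar (90 credits for 30 chunks, $0.09 for 90 credits). The short sketch below turns that into an estimator. Both rates are inferred from the cost reference rather than taken from a published price list, and the function name is hypothetical, so treat the output as a rough guide.

```python
CREDITS_PER_CHUNK = 3      # inferred: 30 chunks -> 90 credits
CREDITS_PER_DOLLAR = 1000  # inferred: 90 credits -> $0.09


def estimate_cost(chunks):
    """Return (credits, dollars) for processing the given number of chunks."""
    credits = chunks * CREDITS_PER_CHUNK
    return credits, credits / CREDITS_PER_DOLLAR


# Range for a standard 10-page PDF (30 to 60 chunks)
low_credits, low_dollars = estimate_cost(30)
high_credits, high_dollars = estimate_cost(60)
print(f"{low_credits}-{high_credits} credits (${low_dollars:.2f}-${high_dollars:.2f})")
```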

Updating Files

When a document changes (new product version, updated pricing, revised policy), you need to delete the old embeddings and upload the new file. The chatbot immediately stops referencing the old content and starts using the new version. There is no concept of "replacing" a file in place; you delete and re-upload.

Tag your uploads by topic or document name so you can easily find and delete the right chunks when updates are needed. See How to Delete or Update Specific Training Data for detailed instructions.
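The delete-then-re-upload workflow, and the reason tags help, can be modeled with a small in-memory sketch. This is purely illustrative: the KnowledgeBase class and its methods are hypothetical and do not correspond to any platform API. In the admin panel you perform the same two actions by hand.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy model of a chatbot knowledge base: a flat list of tagged chunks."""
    chunks: list = field(default_factory=list)

    def upload(self, tag, texts):
        """Add one chunk per text, all labeled with the same tag."""
        self.chunks.extend({"tag": tag, "text": text} for text in texts)

    def delete_by_tag(self, tag):
        """Remove every chunk carrying the tag; return how many were removed."""
        before = len(self.chunks)
        self.chunks = [c for c in self.chunks if c["tag"] != tag]
        return before - len(self.chunks)

    def update_document(self, tag, new_texts):
        """There is no in-place replace: delete the old chunks, then upload."""
        removed = self.delete_by_tag(tag)
        self.upload(tag, new_texts)
        return removed


kb = KnowledgeBase()
kb.upload("pricing-2023", ["Basic plan: $10", "Pro plan: $25"])
kb.upload("returns-policy", ["Returns accepted within 30 days"])
kb.update_document("pricing-2023", ["Basic plan: $12", "Pro plan: $29", "Team plan: $49"])
```

Because every chunk from one document shares a tag, the old version can be removed in one pass without touching unrelated content such as the returns policy. The tags and plan prices shown are made-up sample data.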
