
How to Train AI on PDFs and Text Files

You can train your AI chatbot on PDF and text files by uploading them directly through the admin panel. The system extracts the text from your files, splits it into chunks, and creates searchable embeddings automatically. PDFs with selectable text work best. TXT files are processed as-is, and DOCX files have their text extracted with formatting dropped. Processing takes seconds per file.

Supported File Types

The platform accepts three file formats for knowledge base uploads:

PDF (.pdf): The most common format. Works with any PDF that contains selectable text. Product manuals, datasheets, reports, whitepapers, and policy documents all work well.
Plain text (.txt): Simple text files with no formatting. Good for pasted content, exported notes, or raw text data.
Word documents (.docx): Microsoft Word format. The system extracts the text content and ignores formatting and images. Text inside tables is preserved, but the table layout is lost.

Preparing Your PDFs

Not all PDFs are created equal when it comes to AI training. The key distinction is whether the PDF contains actual text or just images of text.

Text-Based PDFs (ready to use)

If you can select and copy text from your PDF using a standard PDF reader, it contains real text and is ready to upload. This includes most digitally-created documents: files exported from Word, Google Docs, or other document editors, as well as reports generated by business software.

Scanned PDFs (need OCR first)

If you cannot select text in the PDF (the entire page appears as one image), the file is a scan. The AI cannot read image-based content. You need to run the file through OCR (optical character recognition) software first, which adds a selectable text layer to the scanned images. Most modern scanning apps include OCR. For existing scans, tools such as Adobe Acrobat Pro or the open-source OCRmyPDF can add the text layer. Note that the free Adobe Acrobat Reader does not include OCR.
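The select-and-copy test above can also be approximated in a script. The following is a minimal sketch, not the platform's own check: it assumes you have already extracted the text of each page with a PDF library of your choice, and the function name `looks_scanned` and the 20-character threshold are hypothetical choices for illustration.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Return True when most pages yield almost no extractable text.

    page_texts: one string per page, as returned by whatever PDF
    text-extraction tool you use (an assumption of this sketch).
    """
    if not page_texts:
        return True  # nothing extractable at all
    near_empty = sum(
        1 for text in page_texts if len(text.strip()) < min_chars_per_page
    )
    # Treat the file as a scan when more than half the pages are near-empty.
    return near_empty / len(page_texts) > 0.5


digital = [
    "Product manual, chapter 1. Install the bracket first.",
    "Chapter 2 covers wiring and safety checks.",
]
scanned = ["", "  \n", ""]

print(looks_scanned(digital))
print(looks_scanned(scanned))
```

A file flagged this way is a candidate for OCR before upload. The threshold is deliberately loose, since a cover page or a full-page diagram can legitimately contain little text.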

Tips for Better PDF Processing

Remove password protection before uploading. The system cannot open encrypted PDFs.
Tables of contents and repeating page headers and footers are extracted and embedded along with the body text. This is usually harmless but adds a few extra chunks with low-value content.
Multi-column layouts extract correctly in most cases, but if the extracted text appears jumbled, paste the content manually instead.
Very large PDFs (100+ pages) work fine but produce many chunks. Consider splitting by chapter or section for better organization.

Preparing Text Files

Plain text files need minimal preparation. A few things to check:

Encoding: UTF-8 is best. Files with unusual encoding may produce garbled text.
Structure: Add blank lines between topics or sections. This helps the chunking algorithm split the text at natural boundaries rather than mid-paragraph.
Length: There is no maximum file size, but very long files (hundreds of pages' worth of text) should be split into topical sections for better search accuracy.
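The encoding and structure checks above can be done before upload with a few lines of Python. This is a sketch using only the standard library. The helper names `decode_utf8` and `split_sections` are hypothetical, and the blank-line split only mirrors the advice given here, not the platform's actual chunking code.

```python
import re


def decode_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, failing loudly instead of producing garbled text."""
    try:
        # "utf-8-sig" also strips a leading byte-order mark if one is present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8 at byte {exc.start}") from exc


def split_sections(text: str) -> list[str]:
    """Split text on one or more blank lines and drop empty pieces."""
    normalized = text.replace("\r\n", "\n")
    parts = re.split(r"\n\s*\n", normalized)
    return [part.strip() for part in parts if part.strip()]


sample = "Returns\nWithin 30 days.\n\n\nShipping\nTwo days."
for section in split_sections(sample):
    print(repr(section))
```

If `decode_utf8` raises an error, re-save the file as UTF-8 in your editor before uploading. If `split_sections` returns one very long section, the file probably needs more blank lines between topics.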

Upload Process

Step 1: Open your chatbot's knowledge base.
In the AI Chatbot app, select the chatbot you want to train, then navigate to the knowledge base or embeddings section.

Step 2: Click the file upload button.
Select your PDF, TXT, or DOCX file from your computer. The file starts uploading immediately.

Step 3: Wait for processing.
The system extracts text from the file, chunks it into pieces of 250 to 2,000 characters, and generates embeddings for each chunk. A 10-page PDF typically processes in 10 to 20 seconds.

Step 4: Verify the results.
Check the newly created chunks in the knowledge base list. Open a few to confirm the text was extracted correctly. Test the chatbot with questions the document should answer.
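To get a feel for what Step 3 produces, the chunking stage can be sketched as a greedy paragraph packer. The platform's actual algorithm is not documented here, so treat this as an illustrative assumption: only the 250 and 2,000 character bounds come from the text above, and the function name `chunk_text` and the packing strategy are hypothetical.

```python
def chunk_text(text, min_chars=250, max_chars=2000):
    """Pack paragraphs into chunks of at most max_chars characters.

    min_chars is best effort: a short final chunk is folded into the
    previous one only when the result still fits under max_chars.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Hard-split any single paragraph that exceeds the maximum on its own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        fits_previous = (
            chunks and len(chunks[-1]) + 2 + len(current) <= max_chars
        )
        if len(current) < min_chars and fits_previous:
            chunks[-1] = f"{chunks[-1]}\n\n{current}"
        else:
            chunks.append(current)
    return chunks


document = "\n\n".join("x" * 300 for _ in range(10))
for chunk in chunk_text(document):
    print(len(chunk))
```

Because the sketch splits on blank lines first, well-separated sections stay together in one chunk, which is the reason the preparation advice recommends blank lines between topics.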

How Many Files Can You Upload

There is no limit on the number of files you can upload per chatbot. Each file is processed independently and its chunks are added to the knowledge base. You can upload files one at a time or in batches. Common setups include:

A small business uploading 3 to 5 files (FAQ, product catalog, policies): 50 to 200 chunks total
A support team uploading 20 to 30 files (product manuals, troubleshooting guides): 500 to 2,000 chunks
A large knowledge base with 100+ files: 5,000+ chunks

Cost reference: A standard 10-page PDF produces approximately 30 to 60 chunks, costing 90 to 180 credits ($0.09 to $0.18) to process. A single-page text file produces 3 to 8 chunks, costing 9 to 24 credits (under $0.03).
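The figures in the cost reference imply 3 credits per chunk (90 credits for 30 chunks) and 1,000 credits per dollar ($0.09 for 90 credits). The sketch below turns that arithmetic into a small estimator. Both rates are derived from the examples above rather than from a published price list, and the name `estimate_cost` is hypothetical.

```python
CREDITS_PER_CHUNK = 3    # derived: 90 credits / 30 chunks
CREDITS_PER_USD = 1000   # derived: 90 credits cost $0.09


def estimate_cost(chunk_count):
    """Return (credits, dollars) for processing the given number of chunks."""
    credits = chunk_count * CREDITS_PER_CHUNK
    return credits, credits / CREDITS_PER_USD


for label, chunk_count in [("10-page PDF, low", 30), ("10-page PDF, high", 60)]:
    credits, usd = estimate_cost(chunk_count)
    print(f"{label}: {credits} credits (${usd:.2f})")
```

Applied to the larger setups listed above, the same rates put a 2,000-chunk knowledge base at 6,000 credits, or about $6.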

Updating Files

When a document changes (new product version, updated pricing, revised policy), you need to delete the old embeddings and upload the new file. The chatbot immediately stops referencing the old content and starts using the new version. There is no concept of "replacing" a file in place; you delete and re-upload.

Tag your uploads by topic or document name so you can easily find and delete the right chunks when updates are needed. See How to Delete or Update Specific Training Data for detailed instructions.

Upload your PDFs and documents to give your chatbot expert knowledge of your business. Processing takes seconds.
