
How to Train a Chatbot on Your Own Documents

Training a chatbot on your documents means uploading your business content so the chatbot can reference it when answering questions. You upload files or paste text, the platform splits them into searchable chunks, and the chatbot pulls the most relevant chunks into each response. This is how you make a chatbot that knows your specific products, policies, and procedures instead of relying on general knowledge.

What Types of Documents Work

The system accepts plain text and PDF files. You can also paste text directly into the knowledge base interface without uploading a file. Common sources include FAQ pages, product information, policies, and procedures.

The content should be written in clear, factual language. The chatbot performs best when the source material states the information directly rather than implying it. If your FAQ says "We ship within 3-5 business days," the chatbot can repeat that detail accurately. If the information is buried in marketing language, the chatbot may struggle to extract the specific detail a visitor needs.

How to Upload Documents

Step 1: Open the knowledge base.
In your admin panel, go to the AI Chatbot app and select your chatbot. Navigate to the Knowledge Base or Embeddings section. This is where all your chatbot's training content is managed.
Step 2: Add content.
You have three options. You can paste text directly into the text area, which is good for short content like FAQ answers or quick reference information. You can upload a PDF or text file for longer documents. Or you can crawl your website to pull in all your published pages automatically.
Step 3: Review the chunks.
After uploading, the platform splits your content into chunks of roughly 500-1500 characters each. Each chunk becomes a separately searchable unit. The system tries to split at natural boundaries like paragraph breaks. You can review the chunks to make sure important information was not split awkwardly across two chunks.
Step 4: Test with real questions.
Go back to your chatbot and ask questions that your uploaded content should answer. Check that the responses are accurate and include the right details. If the chatbot gives incomplete or wrong answers, the training data may need to be reorganized or expanded. See How to Improve Chatbot Accuracy for troubleshooting tips.
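
The testing step can be made repeatable with a small script. The sketch below is one possible approach, not a platform feature: it assumes you have some way to send a question to your chatbot and get the answer back as text, represented here by a hypothetical ask function. The fake_ask stand-in and the sample questions are illustrative only.

```python
from typing import Callable


def check_answers(
    ask: Callable[[str], str], cases: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Return, for each question, the expected phrases missing from the answer."""
    failures: dict[str, list[str]] = {}
    for question, expected_phrases in cases.items():
        answer = ask(question).lower()
        missing = [p for p in expected_phrases if p.lower() not in answer]
        if missing:
            failures[question] = missing
    return failures


# Hypothetical stand-in for a real chatbot call. Replace it with whatever
# method you use to query your own chatbot.
def fake_ask(question: str) -> str:
    return "Standard shipping takes 3-5 business days."


cases = {
    "How long does standard shipping take?": ["3-5 business days"],
    "What is the free shipping threshold?": ["$50"],
}
failures = check_answers(fake_ask, cases)
print(failures)
```

Any question that appears in the result points to content that is missing from the knowledge base or was chunked in a way that hides the detail.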

How Chunking and Retrieval Work

When you upload a document, the platform does not feed the entire document to the AI model on every question. That would be slow and expensive. Instead, it breaks the content into chunks and converts each chunk into a vector embedding, which is a mathematical representation of the chunk's meaning. These embeddings are stored in a searchable index.
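
To make the chunking step concrete, here is a minimal sketch of one way to split text at paragraph breaks while keeping each chunk under a size limit. It is not the platform's actual algorithm. The function name chunk_text and the 1500-character limit (taken from the upper end of the range mentioned earlier) are assumptions for illustration.

```python
def chunk_text(text: str, max_chars: int = 1500) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split any single paragraph that is longer than the limit.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                # The piece still fits, so keep growing the current chunk.
                current = candidate
            else:
                # The piece does not fit: close the current chunk, start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "\n\n".join(["a" * 600, "b" * 600, "c" * 600])
sample_chunks = chunk_text(sample)
print([len(c) for c in sample_chunks])
```

Splitting at paragraph breaks first, and cutting mid-paragraph only as a fallback, is what keeps a single fact from being divided between two chunks in most cases.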

When a visitor asks a question, the system converts that question into an embedding too, then finds the chunks whose embeddings are most similar. Typically the 3-5 most relevant chunks are included in the prompt sent to the AI model. This process is called Retrieval-Augmented Generation (RAG). Because the model answers from your own content rather than from general knowledge, its responses are more specific and it is far less likely to hallucinate.
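
The retrieval step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the platform's implementation: the embed function below uses plain word counts as a stand-in for a real embedding model, and the names retrieve and build_prompt, as well as the prompt wording, are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, chunks: list[str], top_k: int = 3) -> str:
    """Place the retrieved chunks in the prompt ahead of the question."""
    context = "\n\n".join(retrieve(question, chunks, top_k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


kb_chunks = [
    "Standard shipping takes 3-5 business days.",
    "Express shipping takes 1-2 business days.",
    "Free shipping is available on orders over $50.",
]
best = retrieve("How long does express shipping take?", kb_chunks, top_k=1)
print(best)
```

A real embedding model captures meaning rather than exact word overlap, so it can match a question to a chunk even when the two share few words. The ranking and prompt-assembly logic stays the same.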

Chunking costs 3 credits per chunk as a one-time cost. A 10-page PDF might produce 20-40 chunks depending on content density, costing 60-120 credits (under $0.12). Once the chunks are created, they persist in your knowledge base until you delete them, and there is no ongoing storage cost.
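
The arithmetic behind that estimate is easy to reproduce for your own documents. In the sketch below, the 3 credits per chunk comes from the text above, while the price of $0.001 per credit is an assumption inferred from the "120 credits (under $0.12)" figure and should be treated as an upper bound, not a published rate.

```python
CREDITS_PER_CHUNK = 3            # stated above
ASSUMED_USD_PER_CREDIT = 0.001   # assumption inferred from the example, not an official price


def chunking_cost(num_chunks: int) -> tuple[int, float]:
    """Return the one-time cost of chunking as (credits, approximate USD)."""
    credits = num_chunks * CREDITS_PER_CHUNK
    return credits, credits * ASSUMED_USD_PER_CREDIT


for n in (20, 40):
    credits, usd = chunking_cost(n)
    print(f"{n} chunks: {credits} credits, about ${usd:.2f}")
```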

Best Practices for Training Data

Be Specific and Factual

Write content that states facts directly. Instead of "Our shipping is incredibly fast and reliable," write "Standard shipping takes 3-5 business days. Express shipping takes 1-2 business days. Free shipping is available on orders over $50." The chatbot can only be as specific as its source material.

Cover Common Questions First

Start with the 20-30 questions your customers ask most often. Look at your support inbox, chat history, or FAQ page for patterns. Training the chatbot on these first gives you the biggest immediate impact. You can always add more content later.

Keep Content Current

If your pricing, policies, or product details change, update the knowledge base too. You can delete old chunks and upload new versions at any time. The chatbot will immediately use the updated information. See How to Keep Your AI Training Data Up to Date for a maintenance strategy.

Separate Topics Clearly

If you upload one large document that covers many topics, a single chunk may end up combining information from two different topics, which makes retrieval less precise. It is often better to upload several smaller documents, each covering a single topic, so each chunk stays focused. See How to Organize Training Data for Best Results for more on this.
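
If your source material already lives in one long file, a short script can break it into per-topic pieces before you upload. The sketch below assumes each topic starts with a heading line beginning with "# ". That convention and the function name split_by_topic are assumptions for illustration, so adjust the heading test to match how your own document marks sections.

```python
def split_by_topic(document: str) -> dict[str, str]:
    """Split a document into {heading: body} using lines that start with '# '."""
    topics: dict[str, str] = {}
    title = None
    body: list[str] = []
    for line in document.splitlines():
        if line.startswith("# "):
            # A new heading closes the previous topic, if there was one.
            if title is not None:
                topics[title] = "\n".join(body).strip()
            title = line[2:].strip()
            body = []
        elif title is not None:
            body.append(line)
        # Text that appears before the first heading is ignored.
    if title is not None:
        topics[title] = "\n".join(body).strip()
    return topics


doc = (
    "# Standard shipping\n"
    "Standard shipping takes 3-5 business days.\n"
    "\n"
    "# Free shipping\n"
    "Free shipping is available on orders over $50.\n"
)
topics = split_by_topic(doc)
print(list(topics))
```

Each entry can then be saved as its own file or pasted as its own knowledge base item.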

How much content do you need? There is no minimum. Even a single FAQ page with 10 questions and answers can make a useful chatbot. More content generally means better coverage, but quality matters more than quantity. Ten well-written FAQ answers will outperform a hundred pages of vague marketing copy.
