How to Train AI on Your Website Content
Why Use Your Website as Training Data
Your website already contains the information your customers are looking for. Product pages, service descriptions, FAQ sections, about pages, pricing pages, and blog posts are all written specifically to answer customer questions. Using this existing content means you do not have to write new training material from scratch.
Website-based training is especially effective because the content is already written for your audience and uses the language and terminology your customers expect. Because the chatbot answers from this content, its replies match the tone and level of detail of your website, creating a consistent experience between browsing your site and chatting with your bot.
Method 1: Website Crawling
The crawling approach is best when you have many pages to process. Instead of copying content page by page, you provide a URL and the system visits the page, extracts the text, and processes it automatically.
Make a list of the important pages on your website. Focus on pages with substantive content: product pages, service descriptions, FAQ, pricing, about, and any knowledge base articles. Skip pages that are mostly images, forms, or navigation.
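If your site publishes a standard sitemap (commonly at /sitemap.xml), you can use it as a starting point for this list. The sketch below assumes a sitemap that follows the sitemaps.org format. The SKIP_FRAGMENTS values are hypothetical examples of paths that tend to be forms or navigation, so adjust them to your own URL structure.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical path fragments for pages that are mostly forms, images, or navigation.
SKIP_FRAGMENTS = ("/cart", "/login", "/checkout", "/gallery", "/tag/")


def list_candidate_pages(sitemap_xml: str) -> list[str]:
    """Return page URLs from a sitemap, dropping paths that look low-content."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [u for u in urls if not any(frag in u for frag in SKIP_FRAGMENTS)]
```

Treat the result as a draft list. You still need to judge which remaining pages have substantive content.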
In the AI Chatbot app, open the knowledge base section for your chatbot. Use the website crawl option and enter the URL of the page you want to crawl. The system will fetch the page, extract the text content, and process it into embeddings.
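The app handles crawling for you, and its internal implementation is not described here. To make the "fetch the page, extract the text" step concrete, here is a minimal sketch using only the Python standard library. It assumes your pages mark layout regions with semantic tags such as nav, header, footer, and aside, and it skips their contents along with scripts and styles. The embedding step is not shown.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags whose contents are usually layout or code rather than page content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_page_text(url: str) -> str:
    # Download the page, decode it with the charset the server reports, then extract text.
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_text(resp.read().decode(charset, errors="replace"))
```

Sites that build their layout from generic div elements will not be filtered by this tag list, which is one reason crawled text still needs the review described next.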
After crawling, check the chunks that were created. Make sure the extracted text is clean and relevant. Sometimes headers, footers, or sidebar content gets included. You can delete individual chunks that contain irrelevant content.
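A useful heuristic when reviewing chunks: text that appears verbatim on many different pages is usually a header, footer, or sidebar. The sketch below shows a simple paragraph-based chunker and a check for chunks repeated across pages. Both are assumptions for illustration. The app's actual chunk sizes and boundaries may differ.

```python
from collections import defaultdict


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Group whole lines into chunks of up to about max_chars.

    A single line longer than max_chars becomes its own (oversized) chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n") if p.strip()):
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def find_repeated_chunks(pages: dict[str, list[str]], min_pages: int = 2) -> set[str]:
    """Return chunks that appear on at least min_pages different pages (url -> chunks)."""
    seen_on = defaultdict(set)
    for url, chunks in pages.items():
        for chunk in chunks:
            seen_on[chunk].add(url)
    return {chunk for chunk, urls in seen_on.items() if len(urls) >= min_pages}
```

Chunks flagged this way are candidates for deletion, but check them first: a repeated chunk can also be a legitimate notice, such as a shipping policy summary.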
Crawl each important page on your site. For a detailed walkthrough of the full crawling process, see How to Crawl and Index a Website for AI Training.
Method 2: Copy and Paste
For smaller websites or when you want precise control over what content is included, copying and pasting text directly is simpler and more reliable. Select the text content from a web page (skip navigation, headers, and footers), paste it into the text input area in the knowledge base section, and submit.
This method is better when your pages have complex layouts, lots of images, or dynamic content that a crawler might not extract cleanly. It also gives you the opportunity to edit the content before uploading, removing anything that would not be useful as chatbot knowledge.
Which Pages to Prioritize
Not every page on your website is equally valuable for chatbot training. Prioritize in this order:
- FAQ and help pages: These directly match the question-answer format that chatbots handle best
- Product and service pages: Customers will ask about what you sell, so detailed product information is essential
- Pricing pages: One of the most common customer questions is "how much does it cost"
- Policy pages: Returns, shipping, warranties, terms of service
- About and contact pages: Business hours, locations, team information
- Blog posts with evergreen content: Tutorials, guides, and educational content that stays relevant
Skip pages that are primarily promotional (landing pages with little information), time-sensitive (event announcements), or redundant (pages that repeat content from other pages).
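If you are working from a long list of URLs, you can roughly sort it into the order above by looking for keywords in each URL path. This is a minimal sketch that assumes your URLs contain descriptive words such as "faq" or "pricing". The keyword table is hypothetical and should be adjusted to your own site.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL path keywords to the priority order above
# (lower number = crawl first).
PRIORITY_KEYWORDS = [
    (1, ("faq", "help", "support")),
    (2, ("product", "service")),
    (3, ("pricing", "plans")),
    (4, ("returns", "shipping", "warranty", "terms", "policy")),
    (5, ("about", "contact")),
    (6, ("blog", "guide", "tutorial")),
]
DEFAULT_PRIORITY = 7


def page_priority(url: str) -> int:
    path = urlparse(url).path.lower()
    for rank, keywords in PRIORITY_KEYWORDS:
        if any(k in path for k in keywords):
            return rank
    return DEFAULT_PRIORITY


def order_for_crawling(urls: list[str]) -> list[str]:
    # sorted() is stable, so pages with equal priority keep their original order.
    return sorted(urls, key=page_priority)
```

Keyword matching is only a first pass. It cannot tell an evergreen guide from a time-sensitive announcement, so the skip rules above still need a human decision.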
Keeping Website Training Data Current
Your website changes over time, and your chatbot's knowledge should change with it. When you update product information, change pricing, or revise policies on your website, you need to update the corresponding training data. Delete the old embeddings for the changed content and re-crawl or re-paste the updated pages.
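To find out which pages actually changed, one option is to keep a hash of each page's extracted text from your last crawl and compare it with a fresh fetch. The sketch below assumes you store those hashes yourself (url to hash). It does not rely on any feature of the app. Pages reported as "changed" are the ones whose old embeddings to delete and re-crawl, "new" pages still need to be added, and "removed" pages likely have outdated content still in the knowledge base.

```python
import hashlib


def content_hash(text: str) -> str:
    # Normalize whitespace so purely cosmetic reflows don't count as changes.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pages_to_refresh(stored: dict[str, str], current_text: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored hashes (url -> hash) with freshly extracted text (url -> text)."""
    changed = [
        url for url, text in current_text.items()
        if url in stored and stored[url] != content_hash(text)
    ]
    new = [url for url in current_text if url not in stored]
    removed = [url for url in stored if url not in current_text]
    return {"changed": changed, "new": new, "removed": removed}
```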
A good practice is to schedule a monthly review of your chatbot's knowledge base against your current website content. See How to Keep Your AI Training Data Up to Date for a maintenance routine.
Turn your existing website into a chatbot knowledge base. Crawl your pages and have a trained chatbot answering questions in minutes.