
How to Train AI on Your Website Content

You can train your AI chatbot on your existing website content by crawling your pages or by copying and pasting text from key pages. Crawling automatically extracts the text from each page, strips out navigation and layout elements, and processes the content into searchable embeddings. Of the two methods, crawling is the fastest way to give your chatbot comprehensive knowledge about everything already published on your site.

Why Use Your Website as Training Data

Your website already contains the information your customers are looking for. Product pages, service descriptions, FAQ sections, about pages, pricing pages, and blog posts are all written specifically to answer customer questions. Using this existing content means you do not have to write new training material from scratch.

Website-based training is especially effective because the content is already written for your audience. It uses the same language and terminology your customers expect. The chatbot learns to answer questions using the same tone and detail level your website provides, creating a consistent experience between browsing your site and chatting with your bot.

Method 1: Website Crawling

The crawling approach is best when you have many pages to process. Instead of copying content page by page, you provide a URL and the system visits the page, extracts the text, and processes it automatically.

Step 1: Identify the pages to crawl.
Make a list of the important pages on your website. Focus on pages with substantive content: product pages, service descriptions, FAQ, pricing, about, and any knowledge base articles. Skip pages that are mostly images, forms, or navigation.
Step 2: Use the website crawl feature.
In the AI Chatbot app, open the knowledge base section for your chatbot. Use the website crawl option and enter the URL of the page you want to crawl. The system will fetch the page, extract the text content, and process it into embeddings.
Step 3: Review the extracted content.
After crawling, check the chunks that were created. Make sure the extracted text is clean and relevant. Sometimes headers, footers, or sidebar content gets included. You can delete individual chunks that contain irrelevant content.
Step 4: Repeat for additional pages.
Crawl each important page on your site. For a detailed walkthrough of the full crawling process, see How to Crawl and Index a Website for AI Training.
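
To make the extraction and review steps more concrete, here is a minimal Python sketch of what a crawler might do with a fetched page: drop layout elements, keep the visible text, and split it into chunks. It is an illustration only, not the AI Chatbot app's actual implementation. The SKIP_TAGS set, the function names, and the 200-word chunk size are all assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Tags treated as layout or non-content. This set is an assumption for
# the sketch, not the product's actual rule set.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the page's content text with whitespace collapsed."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def chunk_words(text: str, max_words: int = 200) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    sample = (
        "<html><body><nav><a href='/'>Home</a></nav>"
        "<main><h1>Pricing</h1><p>Plans start at ten dollars.</p></main>"
        "<footer>Copyright</footer></body></html>"
    )
    for chunk in chunk_words(extract_text(sample)):
        print(chunk)
```

A tag-based filter like this misses layout content that sits in generic div elements, which is one reason stray header, footer, or sidebar text can still show up in the chunks you review in Step 3.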

Method 2: Copy and Paste

For smaller websites or when you want precise control over what content is included, copying and pasting text directly is simpler and more reliable. Select the text content from a web page (skip navigation, headers, and footers), paste it into the text input area in the knowledge base section, and submit.

This method is better when your pages have complex layouts, lots of images, or dynamic content that a crawler might not extract cleanly. It also gives you the opportunity to edit the content before uploading, removing anything that would not be useful as chatbot knowledge.

Which Pages to Prioritize

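Not every page on your website is equally valuable for chatbot training. Prioritize pages with substantive content that answers customer questions directly: product pages, service descriptions, FAQ, pricing, about, and knowledge base articles.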

Skip pages that are primarily promotional (landing pages with little information), time-sensitive (event announcements), or redundant (pages that repeat content from other pages).
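
If you have already collected the text of your pages, a rough way to spot redundant ones is to compare their wording. The sketch below is one possible approach, not a feature of the AI Chatbot app: it measures the overlap of three-word sequences between pages (Jaccard similarity) and lists the pairs above a threshold. The function names and the 0.8 threshold are assumptions you would tune for your own site.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of consecutive word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str) -> float:
    """Share of word sequences the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 0.0  # two empty pages are not treated as duplicates
    return len(sa & sb) / len(sa | sb)


def find_redundant(pages: dict, threshold: float = 0.8) -> list:
    """Return (url, url) pairs whose text similarity meets the threshold.

    pages maps a URL to the text extracted from that page.
    """
    urls = sorted(pages)
    pairs = []
    for i, first in enumerate(urls):
        for second in urls[i + 1:]:
            if jaccard(pages[first], pages[second]) >= threshold:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    pages = {
        "/pricing": "our plans start at ten dollars per month",
        "/plans": "our plans start at ten dollars per month",
        "/contact": "contact us by email for support at any time",
    }
    print(find_redundant(pages))
```

For each flagged pair, keep whichever page is more complete and leave the other out of the knowledge base.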

Keeping Website Training Data Current

Your website changes over time, and your chatbot's knowledge should change with it. When you update product information, change pricing, or revise policies on your website, you need to update the corresponding training data. Delete the old embeddings for the changed content and re-crawl or re-paste the updated pages.
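
One way to find out which pages need re-crawling is to store a fingerprint of each page's text when you train and compare it against the live text later. The following Python sketch shows the idea. It is an assumption about how you might track changes yourself, not a built-in feature, and the function names and dictionary layout are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so spacing-only edits
    do not count as changes."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pages_to_refresh(stored: dict, current: dict) -> dict:
    """Compare fingerprints saved at training time with the current text.

    stored:  URL -> fingerprint recorded when the page was last trained
    current: URL -> text of the page as it is now
    """
    changed = [
        url for url in current
        if url in stored and fingerprint(current[url]) != stored[url]
    ]
    new = [url for url in current if url not in stored]
    removed = [url for url in stored if url not in current]
    return {
        "changed": sorted(changed),  # delete old embeddings, then re-crawl
        "new": sorted(new),          # crawl for the first time
        "removed": sorted(removed),  # delete embeddings only
    }


if __name__ == "__main__":
    stored = {
        "/pricing": fingerprint("Plans start at $10."),
        "/about": fingerprint("We are a small team."),
    }
    current = {
        "/pricing": "Plans start at $12.",
        "/about": "We are  a small team.",
        "/faq": "How do I cancel?",
    }
    print(pages_to_refresh(stored, current))
```

Running a comparison like this before each review narrows the work to the pages that actually changed.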

A good practice is to schedule a monthly review of your chatbot's knowledge base against your current website content. See How to Keep Your AI Training Data Up to Date for a maintenance routine.

Tip: If your website has a sitemap.xml file, use it as your crawling checklist. It lists every important page on your site and helps you avoid missing content.
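
As a sketch of how to turn a sitemap into that checklist, the Python below reads the page URLs out of a standard sitemap.xml using only the standard library. The sample URLs are placeholders, and the function name is an assumption. It handles a plain URL sitemap, not a sitemap index file that points to other sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL listed under <url> entries in the sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    # Placeholder sitemap content for illustration.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/pricing</loc></url>"
        "<url><loc>https://example.com/faq</loc></url>"
        "</urlset>"
    )
    for url in sitemap_urls(sample):
        print(url)
```

You can paste each printed URL into the website crawl option, skipping the pages you have decided not to include.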

Turn your existing website into a chatbot knowledge base. Crawl your pages and have a trained chatbot answering questions in minutes.
