What Are Vector Embeddings in Simple Terms

Vector embeddings are numerical representations of text that capture meaning. When you upload a document to train your AI, each chunk of text is converted into a list of numbers (a vector) that represents what that text is about. These vectors allow the system to find content by meaning rather than exact keywords, so a question about "return policy" can match a document titled "refund guidelines" even though the words are different.

The Simple Analogy

Imagine a library where books are not organized by title or author, but by what they are about. A book about dog training would sit right next to a book about puppy obedience, even though the titles share no words. A book about cooking Italian food would be near a book about pasta recipes. The "location" of each book represents its meaning.

That is what embeddings do with text. Each piece of text gets a location in a mathematical space where similar meanings are close together and unrelated meanings are far apart. When someone asks a question, the system converts the question into the same kind of location and finds the nearest stored content. This is called vector search or similarity search.

How Embeddings Are Created

An embedding model (a specialized AI designed for this specific task) reads a piece of text and outputs a vector, which is a list of numbers. Modern embedding models produce vectors with 1,536 or more dimensions. Each dimension captures some aspect of the text's meaning, though individual dimensions do not map to human-readable concepts like "topic" or "sentiment." The meaning emerges from the pattern of all the numbers together.

For example, the sentence "How do I return a product?" might produce a vector like [0.023, -0.156, 0.891, ...] with 1,536 numbers. The sentence "What is your refund policy?" would produce a different vector, but one that is mathematically close to the first because the meanings are related. The sentence "What color options are available?" would produce a vector that is farther away because the topic is different.
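The "mathematically close" idea above is usually measured with cosine similarity. The sketch below uses tiny made-up 4-dimensional vectors as stand-ins for real 1,536-dimensional embeddings (the numbers are purely illustrative, not output from any real model):

```python
import math

# Made-up 4-dimensional vectors standing in for real 1,536-dimensional
# embeddings; the values are illustrative only.
vectors = {
    "How do I return a product?":        [0.9, 0.8, 0.1, 0.0],
    "What is your refund policy?":       [0.8, 0.9, 0.2, 0.1],
    "What color options are available?": [0.1, 0.0, 0.9, 0.8],
}

def cosine_similarity(a, b):
    """Similarity of two vectors: close to 1.0 = related, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = vectors["How do I return a product?"]
for text, vec in vectors.items():
    print(f"{cosine_similarity(query, vec):.3f}  {text}")
```

Running this ranks the refund question far above the color question relative to the return question, which is exactly how vector search decides what content is relevant.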

Why Embeddings Matter for AI Training

Without embeddings, the only way to search your documents would be keyword matching. If a customer asks "Can I get my money back?" and your FAQ says "Our return policy allows refunds within 30 days," a keyword search might miss this entirely because none of the important words match. Embedding-based search finds it immediately because the meanings are similar.
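You can see that keyword-matching gap concretely. A minimal sketch of naive keyword overlap, using the question and FAQ sentence from the paragraph above:

```python
import string

def keywords(text):
    """Lowercase words with punctuation stripped."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

question = "Can I get my money back?"
answer = "Our return policy allows refunds within 30 days"

shared = keywords(question) & keywords(answer)
print(shared)  # set() -- no words in common, so a keyword search finds nothing
```

The two sentences share zero keywords, yet their embeddings would sit close together because both are about getting a refund.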

This semantic understanding is what makes RAG so effective. Your chatbot does not need customers to ask questions using the exact words in your documents. It understands intent and meaning, finding the right information even when the question is phrased differently from the source material.

Embeddings on This Platform

When you add training data to your chatbot through the AI Chatbot app, the platform handles embedding automatically. Your content is chunked into pieces of 250 to 2,000 characters, each chunk is sent to an embedding model, and the resulting vectors are stored in the embeddings database. The cost is 3 credits per chunk, which covers the embedding model API call and storage.
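In spirit, the chunking step might look like the sketch below. This is an illustrative approximation, not the platform's actual chunker: it packs paragraphs into chunks up to the 2,000-character ceiling mentioned above, and the `chunk_text` function name and paragraph-splitting strategy are assumptions.

```python
def chunk_text(text, max_size=2000):
    """Pack paragraphs into chunks of up to max_size characters.
    Illustrative only -- the platform's real chunker (which targets
    250 to 2,000 characters) is not public."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if len(current) + len(para) + 2 <= max_size:
            current = f"{current}\n\n{para}" if current else para
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

CREDITS_PER_CHUNK = 3  # per the pricing described above

doc = "First paragraph of a help article.\n\nSecond paragraph with more detail."
chunks = chunk_text(doc)
print(f"{len(chunks)} chunk(s), {len(chunks) * CREDITS_PER_CHUNK} credits")
```

A short document like this fits in a single chunk and costs 3 credits; a long documentation page split into ten chunks would cost 30.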

You never interact with the vectors directly. The system handles all the mathematical operations behind the scenes. You upload text, and the chatbot starts finding and using relevant information. If you want to understand the search process in more detail, see How Vector Search Finds the Right Information.

Technical detail: The platform uses OpenAI's text-embedding-3-small model for generating embeddings. This model produces 1,536-dimensional vectors that balance accuracy and cost effectively. The vectors are stored in DynamoDB with the original text, enabling fast retrieval during chatbot conversations.
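The storage-and-retrieval pattern described above can be sketched with a tiny in-memory store. This is a toy stand-in for the real embeddings database, and it assumes cosine similarity as the ranking measure (the article does not specify which measure the platform uses); the `VectorStore` class and its vectors are invented for illustration:

```python
import math

class VectorStore:
    """Minimal in-memory stand-in for the embeddings database:
    each entry keeps the vector alongside its original text."""

    def __init__(self):
        self.entries = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.entries.append((vector, text))

    def search(self, query_vector, top_k=1):
        """Return the top_k stored texts closest to the query vector."""
        def similarity(v):
            dot = sum(x * y for x, y in zip(query_vector, v))
            return dot / (math.sqrt(sum(x * x for x in query_vector))
                          * math.sqrt(sum(x * x for x in v)))
        ranked = sorted(self.entries, key=lambda e: similarity(e[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

# Made-up 3-dimensional vectors; real entries would be 1,536-dimensional.
store = VectorStore()
store.add([0.9, 0.1, 0.0], "Refunds are available within 30 days.")
store.add([0.1, 0.9, 0.1], "We ship worldwide in 5-7 business days.")
print(store.search([0.8, 0.2, 0.1]))  # nearest chunk is the refund text
```

During a conversation, the chatbot embeds the customer's question, runs this kind of search, and hands the nearest chunks of original text to the language model as context.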

Common Questions About Embeddings

Do embeddings expire or need refreshing?

No. Once created, embeddings do not expire or degrade. A vector keeps representing the meaning of its text indefinitely, as long as queries are embedded with the same model (vectors produced by different embedding models are not comparable). You only need to create new embeddings when you add new content or update existing content.

Can I use embeddings across different AI models?

Yes. The embeddings are created by one model (the embedding model) and used to retrieve content for any AI model (GPT, Claude, or others). Switching your chatbot from GPT to Claude does not require re-embedding your data.

How many embeddings can I store?

There is no hard limit. Businesses on this platform store anywhere from a few dozen chunks (a small FAQ) to tens of thousands of chunks (large documentation libraries). More chunks mean more knowledge available to your chatbot.

Start training your AI with embeddings today. Upload your content and the platform handles all the technical details automatically.

Get Started Free