How to Add Voice Input and Output to Your Chatbot

You can give your chatbot a real voice by connecting it to the Voices app. Visitors can speak their questions instead of typing, and the chatbot responds with natural-sounding audio synthesized by AWS Polly. Setup takes a few minutes and works with any chatbot type.

What Voice Features Are Available

The platform supports two voice capabilities that work independently or together. Voice input uses OpenAI Whisper to transcribe spoken questions into text, so visitors can talk to your chatbot hands-free. Voice output uses AWS Polly to synthesize the chatbot's text response into spoken audio, delivered as MP3 or OGG. You can enable one or both depending on your use case.

AWS Polly provides over 100 neural and standard voices across 30+ languages. Neural voices sound more natural and support lip-sync timing data (called tween data), which is useful if you are building an animated avatar chatbot. Standard voices cost less and work well for straightforward audio responses.

Before You Start

You need an existing chatbot configured in the AI Chatbot app. Voice features are added on top of a working text chatbot, so make sure your chatbot is responding correctly to typed messages first. You also need the Voices app installed on your account.

Step-by-Step Setup

Step 1: Create a voice character in the Voices app.
Open the Voices app in your admin panel and create a new character. Choose a voice from the AWS Polly library, pick your engine (neural recommended for natural sound), set the output format to MP3 or OGG, and select the language. This character profile stores all the voice settings your chatbot will use.

Step 2: Link the voice character to your chatbot.
Open your chatbot settings in the AI Chatbot app. Find the voice character field and select the character you just created. This tells the chatbot to synthesize every response into audio using that voice profile. Save your chatbot settings.

Step 3: Enable voice input on your chat widget.
In your chatbot embed settings, enable the audio input option. This adds a microphone button to the chat widget. When a visitor clicks it and speaks, their audio is sent to OpenAI Whisper for transcription, and the resulting text is processed by the chatbot like any typed message.

Step 4: Test voice input and output.
Open your chatbot on your website and click the microphone button to ask a question by voice. The chatbot should return both a text response and an audio player with the spoken version. Check that the voice sounds right and the transcription is accurate.

Step 5: Enable lip-sync tween data (optional).
If you are building an animated character, enable the tween option on your voice character. This generates viseme timing marks alongside the audio, which your frontend code can use to animate mouth movements in sync with the speech. Tween data is available only with the neural engine, and it doubles the synthesis cost because generating the timing marks takes a second processing pass.
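As a rough illustration of how frontend code might consume viseme timing marks, the sketch below parses a list of marks and looks up which mouth shape is active at a given playback time. It assumes the tween data arrives as newline-delimited JSON in the shape AWS Polly uses for speech marks (time in milliseconds, type, value); the platform's actual payload may differ, and the names parseVisemeMarks and visemeAt are hypothetical helpers, not part of any documented API.

```typescript
// One viseme timing mark. ASSUMPTION: field names follow AWS Polly's
// speech-mark output; the platform's tween payload may use another shape.
interface VisemeMark {
  time: number;  // milliseconds from the start of the audio
  value: string; // viseme code, e.g. "p", "a", "sil"
}

// Parse newline-delimited JSON speech marks, keeping only viseme entries.
function parseVisemeMarks(jsonLines: string): VisemeMark[] {
  const marks: VisemeMark[] = [];
  for (const line of jsonLines.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    const mark = JSON.parse(trimmed);
    if (mark.type === "viseme") {
      marks.push({ time: mark.time, value: mark.value });
    }
  }
  // Sort by time so the lookup below can binary-search.
  return marks.sort((a, b) => a.time - b.time);
}

// Return the viseme active at the given playback time, or "sil" (silence)
// if playback has not yet reached the first mark.
function visemeAt(marks: VisemeMark[], timeMs: number): string {
  let lo = 0;
  let hi = marks.length - 1;
  let found = -1;
  // Binary search for the last mark whose time is <= timeMs.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (marks[mid].time <= timeMs) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found === -1 ? "sil" : marks[found].value;
}
```

In a browser, one way to drive the animation is to call visemeAt(marks, audio.currentTime * 1000) on each animation frame and switch the avatar's mouth sprite whenever the returned code changes (currentTime is reported in seconds, hence the conversion).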

Cost breakdown: Voice transcription (speech-to-text) costs roughly 1 to 10 credits depending on audio length. Voice synthesis (text-to-speech) is priced by character count, with neural voices costing about 4x more than standard voices. Transcription and synthesis are each capped at 10 credits per request. For a typical chatbot response of 200 characters, neural synthesis costs around 2 credits.
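To budget ahead of time, the figures above can be turned into a rough estimator. The sketch below is only an approximation: it assumes synthesis pricing is linear in character count at the rate implied by "200 characters is about 2 credits", that standard voices cost exactly one quarter of neural, and that the 10-credit cap applies after the tween doubling. None of these details are confirmed pricing rules, and estimateSynthesisCredits is a hypothetical helper.

```typescript
type Engine = "neural" | "standard";

// ASSUMPTION: linear rate inferred from "200 characters is about 2 credits".
const NEURAL_CREDITS_PER_CHAR = 2 / 200;
// ASSUMPTION: "about 4x" is treated as exactly one quarter for standard voices.
const STANDARD_CREDITS_PER_CHAR = NEURAL_CREDITS_PER_CHAR / 4;
// Per-request cap stated in the cost breakdown.
const MAX_CREDITS_PER_REQUEST = 10;

// Estimate the credits for synthesizing one response.
function estimateSynthesisCredits(
  charCount: number,
  engine: Engine,
  withTween: boolean = false
): number {
  if (withTween && engine !== "neural") {
    // Tween data is described as requiring the neural engine.
    throw new Error("Tween data requires the neural engine");
  }
  const rate =
    engine === "neural" ? NEURAL_CREDITS_PER_CHAR : STANDARD_CREDITS_PER_CHAR;
  let credits = charCount * rate;
  // Tween data needs a second processing pass, so the cost doubles.
  if (withTween) credits *= 2;
  // ASSUMPTION: the cap is applied after doubling.
  return Math.min(credits, MAX_CREDITS_PER_REQUEST);
}
```

Under these assumptions, a 200-character neural response comes to about 2 credits, the same response with tween data to about 4, and very long responses level off at the 10-credit cap. Actual billing may round or tier differently.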

Choosing the Right Voice

For customer-facing chatbots, neural voices are usually worth the extra cost (about 4x the standard rate) because they sound significantly more natural. Standard voices work well for internal tools or high-volume applications where cost matters more than polish. You can preview voices in the Voices app before assigning one to your chatbot.

If your audience speaks multiple languages, create separate voice characters for each language and assign them to language-specific chatbots. AWS Polly supports languages including English, Spanish, French, German, Japanese, Portuguese, Italian, and many more.
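If you run several language-specific chatbots, your page needs some way to decide which one to load for each visitor. The sketch below shows one possible approach: match the visitor's preferred languages against a lookup table and fall back to a default. The chatbot IDs, the chatbotByLanguage table, and the pickChatbotId helper are all hypothetical and would need to be replaced with values from your own setup.

```typescript
// Hypothetical mapping from language code to a language-specific chatbot ID.
const chatbotByLanguage: Record<string, string> = {
  en: "support-bot-en",
  es: "support-bot-es",
  fr: "support-bot-fr",
};

// Pick the first chatbot whose language matches one of the visitor's
// preferred locales, in order of preference; otherwise use the fallback.
function pickChatbotId(
  locales: readonly string[],
  table: Record<string, string>,
  fallback: string
): string {
  for (const locale of locales) {
    // Reduce a locale such as "es-MX" to its primary language code "es".
    const language = locale.toLowerCase().split("-")[0];
    if (Object.prototype.hasOwnProperty.call(table, language)) {
      return table[language];
    }
  }
  return fallback;
}
```

In a browser, the locales argument could be navigator.languages, which lists the visitor's preferred languages in priority order, for example pickChatbotId(navigator.languages, chatbotByLanguage, "support-bot-en").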

Common Use Cases

Accessibility: Voice input and output make your chatbot usable for visitors who have difficulty typing or reading
Hands-free support: Warehouse workers, drivers, and field staff can interact with your chatbot by voice while working
Animated avatars: Combine voice output with tween data to create a talking character that greets website visitors
Phone-like experience: Give visitors a conversational, voice-first interaction that feels more personal than text chat

Add a voice to your AI chatbot today. Natural-sounding speech in 30+ languages.
