
Security and Privacy When Training AI on Business Data

Training AI on your business data means sending your documents to AI model providers for processing. Understanding how your data is handled, stored, and protected is essential before uploading sensitive business information. The good news is that with the right approach, you can train AI on your data without compromising security or violating privacy regulations.

How Your Data Flows Through the System

When you upload training data to the platform, here is what happens at each stage:

1. Upload: Your documents are received and associated with your account.
2. Processing: The text is converted into vector embeddings using the AI provider's embedding API.
3. Storage: The content and its embeddings are stored in DynamoDB, partitioned by your account ID.
4. Retrieval: When a visitor asks your chatbot a question, the most relevant content is retrieved and sent, along with the question, to the model API (OpenAI or Anthropic) to generate an answer.

At no point is your training data used to train the AI models themselves. Both OpenAI and Anthropic (Claude) have API data usage policies stating that they do not use API inputs or outputs to train their models. Your business data remains your business data.
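The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not the platform's actual code: the `embed` function here is a stub standing in for the provider's embedding API, and the names `retrieve` and `index` are invented for the example.

```python
import math

def embed(text: str) -> list[float]:
    # Stub embedding: a real system would call the provider's
    # embedding API here. This just hashes characters into a vector.
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Training chunks stored alongside their embeddings.
chunks = ["Our refund policy allows returns within 30 days.",
          "Support hours are 9am to 5pm Eastern, Monday to Friday."]
index = [(c, embed(c)) for c in chunks]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank stored chunks by similarity to the question's embedding.
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [c for c, _ in ranked[:k]]
```

Only the retrieved chunks, not your whole knowledge base, are sent to the model API for each question.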

What Data Should You Avoid Uploading?

Even with these protections, certain types of data should not be included in AI training:

- Personal data about customers or employees (names, contact details, account numbers)
- Protected health information (PHI)
- Credentials, API keys, and other secrets
- Confidential material you could not share with every authorized chatbot user

Practical approach: Train the AI on information you would be comfortable putting in a knowledge base article visible to authorized users. If you would not post it on an internal wiki, do not upload it as training data.
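One way to enforce that rule mechanically is to scan documents for obvious secrets before uploading them. A minimal sketch: the patterns below (AWS access key IDs, private key headers, password assignments) are illustrative rather than exhaustive, and the function name `find_secrets` is made up here. Dedicated scanners such as gitleaks ship far more rules.

```python
import re

# A few common secret formats; real scanners have hundreds of rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_password": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}

def find_secrets(text: str) -> list[str]:
    """Return the names of secret patterns found in `text`."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]
```

Run a check like this over each document and block the upload if it returns any matches.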

Data Isolation Between Accounts

On our platform, each account's training data is stored separately in DynamoDB with your account ID as the partition key. Your embeddings are only searchable by chatbots that belong to your account. No other user can access your training data, and your chatbots cannot accidentally retrieve another account's information.

Chatbot conversations are similarly isolated. Each conversation is stored with your account ID and a unique conversation ID. Only you can view the conversation history through your admin panel.
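The account-scoped partition key described above can be illustrated with an in-memory stand-in for the DynamoDB table. This is a hedged sketch of the isolation pattern, not the platform's actual schema; the helper names are invented for the example.

```python
# In-memory stand-in for a DynamoDB table whose partition key is
# account_id and whose sort key is item_id. Every read is scoped to
# one partition, so one account can never see another's items.
table: dict[tuple[str, str], dict] = {}

def put_item(account_id: str, item_id: str, item: dict) -> None:
    table[(account_id, item_id)] = item

def query_account(account_id: str) -> list[dict]:
    # Equivalent in spirit to a DynamoDB Query with
    # KeyConditionExpression = Key("account_id").eq(account_id)
    return [item for (acct, _), item in table.items() if acct == account_id]

put_item("acct-1", "doc-1", {"text": "pricing FAQ"})
put_item("acct-2", "doc-1", {"text": "internal policy"})
```

Because every query must name a partition, there is no code path that returns items across accounts.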

Using Your Own API Keys

For maximum control over data flow, you can use your own API keys for OpenAI and Anthropic instead of the platform's shared keys. When you use your own keys:

- Requests to OpenAI and Anthropic are made under your own provider account, so data handling is governed by your direct agreement with the provider
- You can monitor usage, set spending limits, and revoke keys from the provider's dashboard at any time
- You are billed directly by the provider for model usage

Compliance Considerations

GDPR

If you serve European customers, avoid including personal data in training content. Use anonymized or aggregated data for training. Your chatbot's responses should not reveal personal information about specific individuals. If your use case requires processing personal data, review the AI provider's data processing agreements.
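Anonymizing training content before upload can start with simple pattern-based redaction. This is a minimal sketch covering only email addresses and phone-like numbers; real GDPR anonymization usually needs more than regexes (named-entity recognition, manual review), and `redact_pii` is a name invented here.

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE = re.compile(r"\b(?:\+?\d[\s-]?){7,15}\b")

def redact_pii(text: str) -> str:
    """Replace emails and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```

Running training documents through a step like this before upload keeps identifiable details out of your chatbot's knowledge base.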

HIPAA

Healthcare organizations should not upload PHI into AI training data unless they have verified HIPAA compliance with every system in the data chain. For most healthcare chatbot use cases, you can train on general medical information, practice policies, and procedures without including any patient data.

SOC 2 and Enterprise Requirements

For organizations with strict compliance requirements, using your own API keys gives you a direct contractual relationship with the AI provider. The platform itself uses AWS infrastructure (DynamoDB, Lambda, EC2) which provides the underlying security certifications.

Best Practices for Secure AI Training

- Treat training content like an internal knowledge base: upload only what every authorized chatbot user may see
- Anonymize or aggregate any data that could identify individuals before uploading it
- Keep credentials, PHI, and other regulated or secret data out of training content entirely
- Use your own API keys when you need a direct contractual relationship with the AI provider
- Review your training content periodically and remove anything that no longer belongs

Train AI on your business data with confidence. Your data stays yours.

Get Started Free