Voice Cloning

Create custom AI voices and generate speech through the ElevenLabs integration.

Beta

Overview

Voice cloning on elizaOS Cloud enables you to:

Clone voices: Create AI replicas of voices you have the rights to use
Generate speech: Convert text to natural-sounding audio
Custom voices: Use cloned voices in your agents
Multi-language: Support for 29 languages with `eleven_multilingual_v2`

Quick Start

Dashboard

Navigate to Dashboard → Voices for the visual interface.

API

Clone Voice


# Clone a voice from audio samples
curl -X POST "https://cloud.milady.ai/api/elevenlabs/voices/clone" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "name=My Voice Clone" \
  -F "files=@sample1.mp3" \
  -F "files=@sample2.mp3"
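The same multipart request can be sketched in JavaScript. This is a minimal sketch, assuming a runtime with global `fetch`, `FormData`, and `Blob` (modern browsers or Node.js 18+). `buildCloneForm` and `cloneVoice` are hypothetical helper names, and the assumption that the endpoint answers with JSON is not confirmed by this page.

```javascript
// Build the multipart body used by the clone endpoint shown above.
// `samples` is an array of { name, data }, where data is a Blob (assumed shape).
function buildCloneForm(voiceName, samples) {
  const form = new FormData();
  form.append("name", voiceName);
  for (const sample of samples) {
    // Repeat the "files" field once per sample, as the curl example does
    form.append("files", sample.data, sample.name);
  }
  return form;
}

// Hypothetical wrapper: sends the form and returns the parsed response body.
async function cloneVoice(apiKey, voiceName, samples) {
  const response = await fetch("https://cloud.milady.ai/api/elevenlabs/voices/clone", {
    method: "POST",
    // Content-Type is left unset so fetch can add the multipart boundary itself
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildCloneForm(voiceName, samples),
  });
  if (!response.ok) {
    throw new Error(`Clone request failed with status ${response.status}`);
  }
  // Assumption: the endpoint responds with JSON
  return response.json();
}
```

Keeping the form construction separate from the network call makes it easy to inspect the request body before sending anything.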

Generate Speech


# Generate speech
curl -X POST "https://cloud.milady.ai/api/elevenlabs/tts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of voice synthesis.",
    "voice_id": "voice_abc123",
    "model_id": "eleven_multilingual_v2"
  }' \
  --output speech.mp3

List Voices


# List your voices
curl -X GET "https://cloud.milady.ai/api/elevenlabs/voices/user" \
  -H "Authorization: Bearer YOUR_API_KEY"

Voice Cloning

Prepare Audio Samples

Gather 1-3 minutes of clean audio from the target voice.

Requirements:

Clear speech without background noise
Single speaker only
High quality (WAV, MP3, or M4A; 44.1kHz or higher)

Upload Samples

Upload audio files via the dashboard or the API.

Create Clone

Submit the cloning request and wait for processing. Progress can be checked through the `/api/elevenlabs/voices/jobs` endpoint.

Verify Quality

Test the cloned voice with sample text.

Sample Requirements

Requirement	Recommendation
Duration	1-3 minutes total
Format	WAV, MP3, M4A
Quality	44.1kHz, 16-bit minimum
Content	Natural speech, varied intonation
Noise	Minimal background noise
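A pre-upload check against the table above might look like the following. This is a sketch only: the thresholds are copied from the table, the sample metadata shape (`fileName`, `durationSeconds`, `sampleRateHz`, `bitDepth`) is hypothetical, and this page does not say whether the API itself enforces these recommendations.

```javascript
// Thresholds transcribed from the table above (recommendations, not confirmed limits)
const ALLOWED_FORMATS = ["wav", "mp3", "m4a"];
const MIN_TOTAL_SECONDS = 60;
const MAX_TOTAL_SECONDS = 180;
const MIN_SAMPLE_RATE_HZ = 44100;
const MIN_BIT_DEPTH = 16;

// Each sample: { fileName, durationSeconds, sampleRateHz, bitDepth? } (hypothetical shape).
// Returns a list of problems; an empty list means the samples fit the recommendations.
function checkSamples(samples) {
  const problems = [];
  let totalSeconds = 0;
  for (const sample of samples) {
    totalSeconds += sample.durationSeconds;
    const extension = sample.fileName.split(".").pop().toLowerCase();
    if (!ALLOWED_FORMATS.includes(extension)) {
      problems.push(`${sample.fileName}: format should be WAV, MP3, or M4A`);
    }
    if (sample.sampleRateHz < MIN_SAMPLE_RATE_HZ) {
      problems.push(`${sample.fileName}: sample rate below 44.1 kHz`);
    }
    // Bit depth may be unknown for compressed formats, so it is optional here
    if (sample.bitDepth !== undefined && sample.bitDepth < MIN_BIT_DEPTH) {
      problems.push(`${sample.fileName}: bit depth below 16-bit`);
    }
  }
  if (totalSeconds < MIN_TOTAL_SECONDS || totalSeconds > MAX_TOTAL_SECONDS) {
    problems.push(`total duration ${totalSeconds}s is outside the 1-3 minute range`);
  }
  return problems;
}
```

Noise level and content variety cannot be judged from metadata alone, so those two rows still need a listening check.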

Using someone’s voice without permission may violate their rights. Only clone voices you have rights to use.

Text-to-Speech

Generate Speech


const response = await fetch("https://cloud.milady.ai/api/elevenlabs/tts", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Welcome to elizaOS Cloud!",
    voice_id: "voice_abc123",
    model_id: "eleven_multilingual_v2",
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.75,
    },
  }),
});
 
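// Fail early on HTTP errors rather than treating an error body as audio
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

// Wrap the returned audio in an object URL (browser) so it can be played
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);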

Voice Settings

Setting	Range	Description
`stability`	0-1	Higher = more consistent, lower = more expressive
`similarity_boost`	0-1	How closely to match the original voice
`style`	0-1	Style exaggeration (v2 models only)
`use_speaker_boost`	bool	Enhance speaker similarity
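To keep settings inside the documented ranges before sending them, a client could normalize them first. The helper below is a hypothetical sketch: `normalizeVoiceSettings` is not part of any SDK, and clamping out-of-range values (rather than rejecting them) is a design choice made here, not documented API behavior.

```javascript
// Clamp a number into the 0-1 range listed in the table above
function clamp01(value) {
  return Math.min(1, Math.max(0, value));
}

// Returns a copy of `settings` containing only the documented keys,
// with numeric values clamped to 0-1 and use_speaker_boost coerced to a boolean.
function normalizeVoiceSettings(settings) {
  const normalized = {};
  for (const key of ["stability", "similarity_boost", "style"]) {
    const value = settings[key];
    if (value === undefined) continue;
    if (typeof value !== "number" || Number.isNaN(value)) {
      throw new TypeError(`${key} must be a number between 0 and 1`);
    }
    normalized[key] = clamp01(value);
  }
  if (settings.use_speaker_boost !== undefined) {
    normalized.use_speaker_boost = Boolean(settings.use_speaker_boost);
  }
  return normalized;
}
```

The result can be passed as the `voice_settings` field of the TTS request body.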

Available Models

Model	Languages	Quality	Speed
`eleven_multilingual_v2`	29	Highest	Medium
`eleven_monolingual_v1`	English	High	Fast
`eleven_turbo_v2`	English	Good	Fastest
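The trade-offs in the table can be turned into a small selection helper. This is an illustrative sketch: `pickModel`, its `language` and `priority` options, and the use of `"en"` as the English language code are assumptions, and it assumes English is among the 29 languages of the multilingual model.

```javascript
// Data transcribed from the table above
const MODELS = [
  { id: "eleven_multilingual_v2", multilingual: true, quality: "Highest", speed: "Medium" },
  { id: "eleven_monolingual_v1", multilingual: false, quality: "High", speed: "Fast" },
  { id: "eleven_turbo_v2", multilingual: false, quality: "Good", speed: "Fastest" },
];

// priority: "quality" (default) or "speed"
function pickModel({ language = "en", priority = "quality" } = {}) {
  // Only the multilingual model covers non-English text
  const candidates = MODELS.filter((model) => language === "en" || model.multilingual);
  // Rank candidates by the chosen column; a lower index in `ranking` is better
  const field = priority === "speed" ? "speed" : "quality";
  const ranking = priority === "speed"
    ? ["Fastest", "Fast", "Medium"]
    : ["Highest", "High", "Good"];
  candidates.sort((a, b) => ranking.indexOf(a[field]) - ranking.indexOf(b[field]));
  return candidates[0].id;
}
```

For example, `pickModel({ language: "en", priority: "speed" })` selects `eleven_turbo_v2`, while any non-English language falls back to `eleven_multilingual_v2` regardless of priority.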

Pre-built Voices

elizaOS Cloud provides pre-built voices. The request below lists them, followed by an example response:


curl -X GET "https://cloud.milady.ai/api/elevenlabs/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"


{
  "voices": [
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "name": "Rachel",
      "labels": { "accent": "american", "age": "young" },
      "preview_url": "https://..."
    },
    {
      "voice_id": "AZnzlk1XvdvUeBnXmlld",
      "name": "Domi",
      "labels": { "accent": "american", "age": "young" }
    }
  ]
}
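Once the list is fetched, voices can be filtered by their `labels`. The sketch below works on the response shape shown above; `findVoices` is a hypothetical helper, and label keys other than `accent` and `age` are not documented on this page.

```javascript
// Return the voices whose labels match every key/value pair in `wantedLabels`.
// Voices without a `labels` object never match a non-empty filter.
function findVoices(voices, wantedLabels) {
  return voices.filter((voice) =>
    Object.entries(wantedLabels).every(
      ([key, value]) => voice.labels?.[key] === value
    )
  );
}

// Example data copied from the response above
const exampleResponse = {
  voices: [
    { voice_id: "21m00Tcm4TlvDq8ikWAM", name: "Rachel", labels: { accent: "american", age: "young" } },
    { voice_id: "AZnzlk1XvdvUeBnXmlld", name: "Domi", labels: { accent: "american", age: "young" } },
  ],
};

// Collect the IDs of all voices labeled with an American accent
const americanVoiceIds = findVoices(exampleResponse.voices, { accent: "american" })
  .map((voice) => voice.voice_id);
```

The resulting IDs can be used as the `voice_id` in a TTS request.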

Voice Management

Get Voice Details


curl -X GET "https://cloud.milady.ai/api/elevenlabs/voices/voice_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

Delete Voice


curl -X DELETE "https://cloud.milady.ai/api/elevenlabs/voices/voice_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

Check Clone Status


curl -X GET "https://cloud.milady.ai/api/elevenlabs/voices/jobs" \
  -H "Authorization: Bearer YOUR_API_KEY"
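Because cloning is asynchronous, a client would typically poll this endpoint until the clone is ready. The response format of `/voices/jobs` is not shown on this page, so the sketch below leaves the readiness test to a caller-supplied `isDone` predicate. `pollUntil` and `makeJobsFetcher` are hypothetical helpers, and the default interval and attempt count are arbitrary.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `fetchJobs` repeatedly until `isDone(jobs)` returns true or attempts run out.
async function pollUntil(fetchJobs, isDone, { intervalMs = 5000, maxAttempts = 24 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const jobs = await fetchJobs();
    if (isDone(jobs)) return jobs;
    // Wait between attempts, but not after the last one
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Clone still not ready after ${maxAttempts} attempts`);
}

// Thin wrapper around the status endpoint shown above.
// Assumption: the endpoint responds with JSON.
function makeJobsFetcher(apiKey) {
  return async () => {
    const response = await fetch("https://cloud.milady.ai/api/elevenlabs/voices/jobs", {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!response.ok) {
      throw new Error(`Status check failed with status ${response.status}`);
    }
    return response.json();
  };
}
```

A caller would pass `makeJobsFetcher("YOUR_API_KEY")` as `fetchJobs` and write `isDone` against whatever status field the actual response contains.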

Agent Integration

Use a cloned voice with an agent by setting `settings.voice` in the agent's configuration:


{
  "name": "Voice Assistant",
  "bio": ["Helpful AI assistant with custom voice"],
  "settings": {
    "voice": {
      "provider": "elevenlabs",
      "voiceId": "voice_abc123",
      "model": "eleven_multilingual_v2"
    }
  }
}
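When agent configurations are built in code, the voice block can be merged in without disturbing other settings. This is a minimal sketch based only on the JSON shape above; `withClonedVoice` is a hypothetical helper, not an elizaOS API.

```javascript
// Return a copy of `character` whose settings.voice points at the given voice.
// Existing settings are preserved; the input object is not mutated.
function withClonedVoice(character, voiceId, model = "eleven_multilingual_v2") {
  return {
    ...character,
    settings: {
      ...character.settings,
      voice: { provider: "elevenlabs", voiceId, model },
    },
  };
}

const assistant = withClonedVoice(
  { name: "Voice Assistant", bio: ["Helpful AI assistant with custom voice"] },
  "voice_abc123"
);
```

Serializing `assistant` with `JSON.stringify` yields the same structure as the configuration shown above.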

Speech-to-Text

Convert audio to text:


curl -X POST "https://cloud.milady.ai/api/elevenlabs/stt" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@audio.mp3"


{
  "text": "This is the transcribed text from the audio.",
  "confidence": 0.95,
  "words": [
    { "word": "This", "start": 0.0, "end": 0.2, "confidence": 0.98 },
    { "word": "is", "start": 0.2, "end": 0.3, "confidence": 0.99 }
  ]
}
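The per-word `confidence` values make it possible to flag uncertain parts of a transcript for review. The sketch below assumes the response shape shown above; `lowConfidenceWords` and its default threshold of 0.9 are illustrative choices, not part of the API.

```javascript
// Return the words whose confidence falls below `threshold`,
// keeping their timestamps so they can be located in the audio.
function lowConfidenceWords(transcript, threshold = 0.9) {
  return transcript.words
    .filter((entry) => entry.confidence < threshold)
    .map(({ word, start, end }) => ({ word, start, end }));
}

// Example data copied from the response above
const exampleTranscript = {
  text: "This is the transcribed text from the audio.",
  confidence: 0.95,
  words: [
    { word: "This", start: 0.0, end: 0.2, confidence: 0.98 },
    { word: "is", start: 0.2, end: 0.3, confidence: 0.99 },
  ],
};

// With the default threshold, neither example word is flagged
const flagged = lowConfidenceWords(exampleTranscript);
```

Raising the threshold makes the check stricter. A suitable value depends on the audio quality and is best tuned on real recordings.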

Pricing

See Billing & Credits for current pricing. Voice cloning costs 5 credits; TTS and STT are billed by usage.

Monitor your voice usage in the billing dashboard.

Best Practices

Quality Samples: Use high-quality audio (44.1kHz+) with minimal background noise
Natural Speech: Include varied intonation, pacing, and emotional range in samples
Sufficient Length: Provide 1-3 minutes of audio for best clone quality
Test Thoroughly: Verify clone quality with diverse text before production use

Next Steps

AI Agents: Add voice to your agents
Video Generation: Add voiceovers to videos
API Reference: Complete API documentation