Voice & TTS
Text-to-speech generation and voice cloning capabilities.
API Key Support: /api/v1/voice/* endpoints support both session-based authentication and API key authentication.
Legacy /api/elevenlabs/* endpoints are session-based only.
Endpoint Patterns
Voice APIs are available at two paths:
| Pattern | Description | Auth | Use Case |
|---|---|---|---|
| /api/v1/voice/* | Recommended. Generic, provider-agnostic endpoints | Session or API key | New integrations, programmatic access |
| /api/elevenlabs/* | Legacy endpoints (still supported) | Session only | Existing integrations, backwards compatibility |
API key authentication is available only on /api/v1/voice/*. Legacy /api/elevenlabs/* endpoints require session-based auth and do not accept API keys.
Text to Speech
Legacy path: /api/elevenlabs/tts (still supported for backwards compatibility)
Convert text to speech audio using premium AI voices.
cURL
curl -X POST "https://cloud.milady.ai/api/v1/voice/tts" \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, this is a test of the text to speech system.",
"voiceId": "21m00Tcm4TlvDq8ikWAM",
"modelId": "eleven_multilingual_v2"
}' \
--output speech.mp3
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | ✓ | Text to convert to speech (max 5000 chars) |
| voiceId | string | ✓ | Voice ID to use (see List Voices) |
| modelId | string | | Model ID. Default: eleven_multilingual_v2 |
| stability | number | | Voice stability (0-1). Default: 0.5 |
| similarity_boost | number | | Voice similarity (0-1). Default: 0.75 |
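Because `text` is capped at 5000 characters, longer input has to be split on the client before it is sent. The following is a minimal sketch of one way to do that, assuming sentence boundaries are acceptable split points. `split_text` is a hypothetical helper and not part of the API.

```python
import re

MAX_TTS_CHARS = 5000  # limit stated in the Parameters table


def split_text(text: str, limit: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries and falls back to a hard cut when a
    single sentence is longer than the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit into one chunk by itself.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own TTS request and the resulting audio files concatenated by the caller.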
Available Models
| Model | Languages | Quality | Speed |
|---|---|---|---|
| eleven_multilingual_v2 | 29 | Highest | Medium |
| eleven_turbo_v2_5 | 32 | High | Fast |
| eleven_flash_v2_5 | 32 | High | Fast |
| eleven_v3 | Multi | Highest | Medium |
Voice pricing is refreshed separately from the text/image/video catalogs. TTS is billed per character, STT per decoded audio duration, and voice cloning by clone tier.
Response
Returns the audio as an audio/mpeg stream. The Content-Length header gives the file size in bytes.
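The same request can be made from Python with only the standard library. This is a sketch rather than an official client: the URL, header, and JSON field names are copied from the cURL example, while `build_tts_request` and `synthesize_to_file` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://cloud.milady.ai/api/v1/voice"


def build_tts_request(api_key: str, text: str, voice_id: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) a POST request for the TTS endpoint."""
    payload = {"text": text, "voiceId": voice_id, "modelId": model_id}
    return urllib.request.Request(
        f"{BASE_URL}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str) -> int:
    """Send the request and write the audio/mpeg body to `path`.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

`urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so callers should expect that exception for the error codes listed under Error Handling.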
Speech to Text
Legacy path: /api/elevenlabs/stt (still supported for backwards compatibility)
Transcribe audio to text.
Request
Upload audio as multipart/form-data:
curl -X POST "https://cloud.milady.ai/api/v1/voice/stt" \
-H "X-API-Key: YOUR_API_KEY" \
-F "audio=@recording.mp3"
Response
Returns JSON with transcript and duration_ms.
| Field | Type | Description |
|---|---|---|
| transcript | string | Transcribed text |
| duration_ms | number | Audio duration in milliseconds |
{
"transcript": "Hello, this is a transcription test.",
"duration_ms": 3245
}
List Voices
Legacy path: /api/elevenlabs/voices/user (still supported for backwards compatibility)
Get your cloned voices with pagination and filtering.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| includeInactive | boolean | Include inactive voices (default: false) |
| cloneType | string | Filter by instant or professional |
| limit | number | Results per page (default: 50, max: 100) |
| offset | number | Pagination offset |
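A paging loop over these parameters might look like the sketch below. The `/api/v1/voice/*` path for this endpoint is not spelled out in this section, so the HTTP call is left to a caller-supplied `fetch_page` function (hypothetical) that takes a query string and returns the decoded JSON response. The loop relies on the `voices` and `hasMore` fields of the response, and sending the boolean as the string `true` is an assumption.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def voices_query(limit: int = 50, offset: int = 0,
                 clone_type: Optional[str] = None,
                 include_inactive: bool = False) -> str:
    """Build the query string for the List Voices endpoint."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"limit": limit, "offset": offset}
    if clone_type is not None:
        params["cloneType"] = clone_type
    if include_inactive:
        # Assumption: the API accepts the literal string "true".
        params["includeInactive"] = "true"
    return urlencode(params)


def iter_voices(fetch_page: Callable[[str], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every voice, requesting pages until `hasMore` is false."""
    offset = 0
    while True:
        page = fetch_page(voices_query(limit=limit, offset=offset))
        voices = page.get("voices", [])
        yield from voices
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("hasMore") or not voices:
            break
        offset += len(voices)
```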
Response
{
"success": true,
"voices": [
{
"id": "123e4567-e89b-12d3-a456-426614174000",
"elevenlabsVoiceId": "xyz789",
"name": "My Custom Voice",
"description": "A professional voice clone",
"cloneType": "instant",
"sampleCount": 3,
"usageCount": 150,
"isActive": true,
"createdAt": "2024-01-15T10:30:00Z"
}
],
"total": 5,
"limit": 50,
"offset": 0,
"hasMore": false
}
Clone Voice
Legacy path: /api/elevenlabs/voices/clone (still supported for backwards compatibility)
Create a voice clone from audio samples.
Request
Upload audio samples as multipart/form-data:
curl -X POST "https://cloud.milady.ai/api/v1/voice/clone" \
-H "X-API-Key: YOUR_API_KEY" \
-F "name=My Voice" \
-F "cloneType=instant" \
-F "file0=@sample1.mp3" \
-F "file1=@sample2.mp3"
Response
{
"id": "123e4567-e89b-12d3-a456-426614174000",
"name": "My Voice",
"status": "processing"
}
Voice Cloning Tips:
- Provide 1-5 minutes of clear audio for best results
- Use high-quality recordings with minimal background noise
- Speaking clearly and at a natural pace produces better clones
- Multiple samples in different contexts improve voice quality
Get Voice
Legacy path: /api/elevenlabs/voices/{id} (still supported for backwards compatibility)
Get details for a specific voice by its internal UUID.
curl "https://cloud.milady.ai/api/v1/voice/123e4567-e89b-12d3-a456-426614174000" \
-H "X-API-Key: YOUR_API_KEY"
Update Voice
Update a voice’s metadata.
curl -X PATCH "https://cloud.milady.ai/api/v1/voice/123e4567-e89b-12d3-a456-426614174000" \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"name": "Updated Voice Name", "isActive": true}'
Delete Voice
Legacy path: /api/elevenlabs/voices/{id} (still supported for backwards compatibility)
Delete a cloned voice from your account.
curl -X DELETE "https://cloud.milady.ai/api/v1/voice/123e4567-e89b-12d3-a456-426614174000" \
-H "X-API-Key: YOUR_API_KEY"
Response
{
"success": true,
"message": "Voice deleted successfully"
}
Voice Cloning Jobs
Legacy path: /api/elevenlabs/voices/jobs (still supported for backwards compatibility)
Check status of active voice cloning jobs.
Response
{
"success": true,
"jobs": [
{
"id": "job_xyz789",
"voiceName": "My Voice",
"jobType": "instant",
"status": "processing",
"progress": 50,
"createdAt": "2024-01-15T10:30:00Z"
}
],
"total": 1
}
Status Values
| Status | Description |
|---|---|
| pending | Job is queued |
| processing | Voice clone is being generated |
| completed | Voice is ready to use |
| failed | Cloning failed (check audio quality) |
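A client that needs to wait for a clone can poll the jobs endpoint until the job reaches `completed` or `failed`. The sketch below assumes that finished jobs still appear in the response. Since the endpoint is described as listing active jobs, that may not hold, in which case a missing job would need separate handling. `fetch_jobs` is a hypothetical caller-supplied function that returns the decoded JSON response.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_jobs: Callable[[], dict], job_id: str,
                 interval: float = 5.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job with `job_id` reaches a terminal status.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        jobs = fetch_jobs().get("jobs", [])
        job = next((j for j in jobs if j.get("id") == job_id), None)
        if job is not None and job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)
```

The `sleep` parameter exists so that tests can substitute a no-op instead of waiting.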
Pricing
| Operation | Credits |
|---|---|
| Text-to-Speech (per 1K chars) | 1 |
| Speech-to-Text (per minute) | 2 |
| Voice Clone (one-time) | 10 |
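For a rough client-side estimate before sending a request, the table rates can be applied proportionally. This is only a sketch: it assumes usage is prorated without rounding, which this page does not confirm, and the function names are hypothetical.

```python
TTS_CREDITS_PER_1K_CHARS = 1  # from the Pricing table
STT_CREDITS_PER_MINUTE = 2    # from the Pricing table


def estimate_tts_credits(text: str) -> float:
    """Estimate credits for a TTS request, prorated per character."""
    return len(text) / 1000 * TTS_CREDITS_PER_1K_CHARS


def estimate_stt_credits(duration_ms: int) -> float:
    """Estimate credits for a transcription, prorated by audio duration."""
    return duration_ms / 60_000 * STT_CREDITS_PER_MINUTE


print(estimate_tts_credits("a" * 2500))  # 2.5
print(estimate_stt_credits(90_000))      # 3.0
```

The `duration_ms` value returned by Speech to Text can be passed straight to `estimate_stt_credits` to reconcile usage after the fact.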
Error Handling
| Code | Error | Solution |
|---|---|---|
| 400 | Invalid voice ID | Use a valid voice from List Voices |
| 400 | Text too long | Split text into chunks under 5000 chars |
| 402 | Insufficient credits | Add credits to your account |
| 404 | Voice not found | Voice may have been deleted |
| 429 | Rate limited | Wait and retry |
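Of these errors, only 429 is worth retrying automatically. The others need a change to the request or the account. A minimal sketch of retry with exponential backoff follows. The delay schedule is an assumption, since this page does not document rate-limit windows or a Retry-After header, and `send` is any caller-supplied function that performs the HTTP call with `urllib`.

```python
import time
import urllib.error
from typing import Callable


def call_with_retry(send: Callable[[], bytes], max_attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call `send`, retrying on HTTP 429 with exponential backoff.

    Any other HTTP error, or a 429 on the final attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return send()
        except urllib.error.HTTPError as err:
            attempt += 1
            if err.code != 429 or attempt >= max_attempts:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
```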