Voice & TTS

Stable

Text-to-speech generation and voice cloning capabilities.

API Key Support: /api/v1/voice/* endpoints support both session-based authentication and API key authentication. Legacy /api/elevenlabs/* endpoints are session-based only.

Endpoint Patterns

Voice APIs are available at two paths:

| Pattern | Description | Auth | Use Case |
| --- | --- | --- | --- |
| /api/v1/voice/* | Recommended: generic, provider-agnostic endpoints | Session or API key | New integrations, programmatic access |
| /api/elevenlabs/* | Legacy endpoints (still supported) | Session only | Existing integrations, backwards compatibility |

API key authentication is available only on /api/v1/voice/*. Legacy /api/elevenlabs/* endpoints require session-based auth and do not accept API keys.


Text to Speech

POST /api/v1/voice/tts

Legacy path: /api/elevenlabs/tts (still supported for backwards compatibility)

Convert text to speech audio using premium AI voices.

    curl -X POST "https://cloud.milady.ai/api/v1/voice/tts" \
      -H "X-API-Key: YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Hello, this is a test of the text to speech system.",
        "voiceId": "21m00Tcm4TlvDq8ikWAM",
        "modelId": "eleven_multilingual_v2"
      }' \
      --output speech.mp3

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | | Text to convert to speech (max 5000 chars) |
| voiceId | string | | Voice ID to use (see List Voices) |
| modelId | string | | Model ID. Default: eleven_multilingual_v2 |
| stability | number | | Voice stability (0-1). Default: 0.5 |
| similarity_boost | number | | Voice similarity (0-1). Default: 0.75 |
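
The limits in the parameter list can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API: `build_tts_payload` is a hypothetical helper, and rejecting empty text or treating `voiceId` as omittable are assumptions this page does not confirm.

```python
from typing import Any, Dict, Optional

MAX_TTS_CHARS = 5000  # documented limit for `text`


def build_tts_payload(
    text: str,
    voice_id: Optional[str] = None,
    model_id: Optional[str] = None,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a JSON body for POST /api/v1/voice/tts (hypothetical helper)."""
    if not text:
        # Assumption: an empty string is not a useful request.
        raise ValueError("text must not be empty")
    if len(text) > MAX_TTS_CHARS:
        raise ValueError(f"text exceeds {MAX_TTS_CHARS} characters")
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost)):
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")

    payload: Dict[str, Any] = {"text": text}
    optional = {
        "voiceId": voice_id,
        "modelId": model_id,
        "stability": stability,
        "similarity_boost": similarity_boost,
    }
    # Leave unset fields out so the documented server defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Omitting unset fields, rather than sending the defaults explicitly, keeps the client from drifting if the server-side defaults change.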

Available Models

| Model | Languages | Quality | Speed |
| --- | --- | --- | --- |
| eleven_multilingual_v2 | 29 | Highest | Medium |
| eleven_turbo_v2_5 | 32 | High | Fast |
| eleven_flash_v2_5 | 32 | High | Fast |
| eleven_v3 | Multi | Highest | Medium |

Voice pricing is refreshed separately from the text/image/video catalogs. TTS is billed per character, STT per decoded audio duration, and voice cloning by clone tier.

Response

Returns the audio as an audio/mpeg stream. The Content-Length header gives the size of the audio in bytes.
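
The curl call above can be mirrored with the Python standard library. This is a sketch under the assumptions already shown on this page (the `X-API-Key` header and the `https://cloud.milady.ai` host); `make_tts_request` and `save_speech` are hypothetical names, and error handling is left out.

```python
import json
import urllib.request

BASE_URL = "https://cloud.milady.ai"


def make_tts_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /api/v1/voice/tts."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/voice/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str) -> int:
    """Send the request and write the audio/mpeg body to `path`.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        data = response.read()
        out.write(data)
        return len(data)
```

A call such as `save_speech(make_tts_request("YOUR_API_KEY", {"text": "Hello"}), "speech.mp3")` would write the returned audio to disk.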


Speech to Text

POST /api/v1/voice/stt

Legacy path: /api/elevenlabs/stt (still supported for backwards compatibility)

Transcribe audio to text.

Request

Upload audio as multipart/form-data:

    curl -X POST "https://cloud.milady.ai/api/v1/voice/stt" \
      -H "X-API-Key: YOUR_API_KEY" \
      -F "audio=@recording.mp3"

Response

Returns JSON with transcript and duration_ms.

| Field | Type | Description |
| --- | --- | --- |
| transcript | string | Transcribed text |
| duration_ms | number | Audio duration in milliseconds |

    {
      "transcript": "Hello, this is a transcription test.",
      "duration_ms": 3245
    }
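
A small sketch of decoding this response into a typed value. The `Transcription` class and `parse_stt_response` function are hypothetical names; only the two documented fields are read.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    transcript: str
    duration_ms: int

    @property
    def duration_seconds(self) -> float:
        """Audio duration converted from milliseconds to seconds."""
        return self.duration_ms / 1000


def parse_stt_response(raw: str) -> Transcription:
    """Decode the JSON body returned by POST /api/v1/voice/stt."""
    data = json.loads(raw)
    return Transcription(
        transcript=data["transcript"],
        duration_ms=data["duration_ms"],
    )
```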

List Voices

GET /api/v1/voice/list

Legacy path: /api/elevenlabs/voices/user (still supported for backwards compatibility)

Get your cloned voices with pagination and filtering.

Query Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| includeInactive | boolean | Include inactive voices (default: false) |
| cloneType | string | Filter by instant or professional |
| limit | number | Results per page (default: 50, max: 100) |
| offset | number | Pagination offset |

Response

    {
      "success": true,
      "voices": [
        {
          "id": "123e4567-e89b-12d3-a456-426614174000",
          "elevenlabsVoiceId": "xyz789",
          "name": "My Custom Voice",
          "description": "A professional voice clone",
          "cloneType": "instant",
          "sampleCount": 3,
          "usageCount": 150,
          "isActive": true,
          "createdAt": "2024-01-15T10:30:00Z"
        }
      ],
      "total": 5,
      "limit": 50,
      "offset": 0,
      "hasMore": false
    }
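
The `limit`, `offset`, and `hasMore` fields suggest a simple paging loop. The sketch below assumes a caller-supplied `fetch_page(limit, offset)` callable (hypothetical) that performs the GET request and returns the decoded JSON body, so the loop itself needs no network access.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_voices(
    fetch_page: Callable[[int, int], Page],
    limit: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield every voice, advancing `offset` until `hasMore` is false."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        voices = page.get("voices", [])
        yield from voices
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("hasMore") or not voices:
            break
        offset += len(voices)
```

Advancing by the number of voices actually returned, rather than by `limit`, keeps the loop correct if the server returns a short page.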

Clone Voice

POST /api/v1/voice/clone

Legacy path: /api/elevenlabs/voices/clone (still supported for backwards compatibility)

Create a voice clone from audio samples.

Request

Upload audio samples as multipart/form-data:

    curl -X POST "https://cloud.milady.ai/api/v1/voice/clone" \
      -H "X-API-Key: YOUR_API_KEY" \
      -F "name=My Voice" \
      -F "cloneType=instant" \
      -F "file0=@sample1.mp3" \
      -F "file1=@sample2.mp3"

Response

    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "name": "My Voice",
      "status": "processing"
    }

Voice Cloning Tips:

- Provide 1-5 minutes of clear audio for best results
- Use high-quality recordings with minimal background noise
- Speaking clearly and at a natural pace produces better clones
- Multiple samples in different contexts improve voice quality
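
Outside of curl, the multipart/form-data body for a clone request has to be assembled by the client or an HTTP library. The following is a rough standard-library sketch. `encode_clone_form` is a hypothetical helper, the `file0`, `file1`, ... field names mirror the curl example, and the `audio/mpeg` part type is an assumption that fits MP3 samples only.

```python
import uuid
from typing import Dict, List, Tuple


def encode_clone_form(
    fields: Dict[str, str],
    samples: List[Tuple[str, bytes]],
) -> Tuple[bytes, str]:
    """Encode text fields and audio samples as multipart/form-data.

    `samples` is a list of (filename, content) pairs.
    Returns (body, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts: List[bytes] = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    for index, (filename, content) in enumerate(samples):
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file{index}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: audio/mpeg\r\n\r\n"  # assumption: MP3 samples
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type would be sent as the `Content-Type` header alongside `X-API-Key`. Third-party HTTP clients usually handle this encoding themselves.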

Get Voice

GET /api/v1/voice/{id}

Legacy path: /api/elevenlabs/voices/{id} (still supported for backwards compatibility)

Get details for a specific voice by its internal UUID.

    curl "https://cloud.milady.ai/api/v1/voice/123e4567-e89b-12d3-a456-426614174000" \
      -H "X-API-Key: YOUR_API_KEY"

Update Voice

PATCH /api/v1/voice/{id}

Update a voice’s metadata.

    curl -X PATCH "https://cloud.milady.ai/api/v1/voice/123e4567-e89b-12d3-a456-426614174000" \
      -H "X-API-Key: YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"name": "Updated Voice Name", "isActive": true}'

Delete Voice

DELETE /api/v1/voice/{id}

Legacy path: /api/elevenlabs/voices/{id} (still supported for backwards compatibility)

Delete a cloned voice from your account.

    curl -X DELETE "https://cloud.milady.ai/api/v1/voice/123e4567-e89b-12d3-a456-426614174000" \
      -H "X-API-Key: YOUR_API_KEY"

Response

    {
      "success": true,
      "message": "Voice deleted successfully"
    }

Voice Cloning Jobs

GET /api/v1/voice/jobs

Legacy path: /api/elevenlabs/voices/jobs (still supported for backwards compatibility)

Check status of active voice cloning jobs.

Response

    {
      "success": true,
      "jobs": [
        {
          "id": "job_xyz789",
          "voiceName": "My Voice",
          "jobType": "instant",
          "status": "processing",
          "progress": 50,
          "createdAt": "2024-01-15T10:30:00Z"
        }
      ],
      "total": 1
    }

Status Values

| Status | Description |
| --- | --- |
| pending | Job is queued |
| processing | Voice clone is being generated |
| completed | Voice is ready to use |
| failed | Cloning failed (check audio quality) |
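
Since cloning is asynchronous, a client will typically poll the jobs endpoint until a job reaches `completed` or `failed`. The sketch below assumes a caller-supplied `fetch_jobs()` callable (hypothetical) that returns the `jobs` array from the response. This page does not say whether finished jobs stay in the list, so the loop keeps polling while the job is absent or still in progress, and gives up after `max_attempts`. The interval and attempt count are illustrative.

```python
import time
from typing import Any, Callable, Dict, List

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_jobs: Callable[[], List[Dict[str, Any]]],
    job_id: str,
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job is `completed` or `failed`; return that status."""
    for attempt in range(max_attempts):
        job = next((j for j in fetch_jobs() if j.get("id") == job_id), None)
        if job is not None and job.get("status") in TERMINAL_STATUSES:
            return job["status"]
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish in {max_attempts} polls")
```

Passing `sleep` as a parameter makes the loop easy to exercise in tests without real delays.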

Pricing

| Operation | Credits |
| --- | --- |
| Text-to-Speech (per 1K chars) | 1 |
| Speech-to-Text (per minute) | 2 |
| Voice Clone (one-time) | 10 |
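
As a worked example of these rates, the sketch below estimates credits for a mixed workload. It assumes usage is billed proportionally with no rounding or minimum charge, which this page does not specify, so treat the result as a rough estimate. `estimate_credits` is a hypothetical helper.

```python
TTS_CREDITS_PER_1K_CHARS = 1
STT_CREDITS_PER_MINUTE = 2
CLONE_CREDITS = 10


def estimate_credits(tts_chars: int = 0,
                     stt_seconds: float = 0.0,
                     clones: int = 0) -> float:
    """Estimate credits, assuming proportional billing without rounding."""
    tts = tts_chars / 1000 * TTS_CREDITS_PER_1K_CHARS
    stt = stt_seconds / 60 * STT_CREDITS_PER_MINUTE
    return tts + stt + clones * CLONE_CREDITS


# 2,500 characters of TTS, 90 seconds of STT, and one voice clone:
# 2.5 + 3.0 + 10 = 15.5 credits under the stated assumption.
print(estimate_credits(tts_chars=2500, stt_seconds=90, clones=1))
```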

Error Handling

| Code | Error | Solution |
| --- | --- | --- |
| 400 | Invalid voice ID | Use a valid voice from List Voices |
| 400 | Text too long | Split text into chunks under 5000 chars |
| 402 | Insufficient credits | Add credits to your account |
| 404 | Voice not found | Voice may have been deleted |
| 429 | Rate limited | Wait and retry |
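
Two of these remedies lend themselves to small client-side helpers: splitting long text and retrying after a 429. The sketch below is one possible approach. Both function names are hypothetical, the exponential backoff schedule is an assumption (this page only says to wait and retry), and `send` stands for any callable that performs the request and returns `(status_code, body)`.

```python
import time
from typing import Callable, List, Tuple


def split_text(text: str, max_chars: int = 5000) -> List[str]:
    """Split text into chunks of at most `max_chars`, preferring spaces."""
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        # Look for the last space that keeps the chunk within the limit.
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars  # no space available: hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send`, retrying on HTTP 429 with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

Each chunk from `split_text` would be sent as its own TTS request. Joining the resulting audio segments is left to the caller.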