Voice & TTS

Stable

Text-to-speech generation and voice cloning capabilities.

API Key Support: /api/v1/voice/* endpoints support both session-based authentication and API key authentication. Legacy /api/elevenlabs/* endpoints are session-based only.

Endpoint Patterns

Voice APIs are available at two paths:

Pattern	Description	Auth	Use Case
`/api/v1/voice/*`	Recommended: generic, provider-agnostic endpoints	Session or API key	New integrations, programmatic access
`/api/elevenlabs/*`	Legacy endpoints (still supported)	Session only	Existing integrations, backwards compatibility

API key authentication is available only on /api/v1/voice/*. Legacy /api/elevenlabs/* endpoints require session-based auth and do not accept API keys.
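The auth rule above can also be enforced in client code. The following is a minimal sketch that uses only the Python standard library. `build_voice_request` is a hypothetical helper, not part of any official SDK. It only constructs a `urllib.request.Request`, and it refuses to attach an API key to a legacy path because those endpoints do not accept one.

```python
import urllib.request

BASE_URL = "https://cloud.milady.ai"
V1_PREFIX = "/api/v1/voice/"
LEGACY_PREFIX = "/api/elevenlabs/"


def build_voice_request(path, api_key, method="GET", data=None, headers=None):
    """Build (but do not send) an API-key-authenticated request to a voice endpoint.

    Hypothetical helper: legacy /api/elevenlabs/* paths are rejected up front
    because they only accept session-based auth.
    """
    if path.startswith(LEGACY_PREFIX):
        raise ValueError("Legacy /api/elevenlabs/* endpoints require session auth, not an API key")
    if not path.startswith(V1_PREFIX):
        raise ValueError(f"Not a voice endpoint: {path}")
    all_headers = {"X-API-Key": api_key}
    if headers:
        all_headers.update(headers)
    return urllib.request.Request(BASE_URL + path, data=data, headers=all_headers, method=method)
```

To send the request, pass the result to `urllib.request.urlopen(req)`.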

Text to Speech

POST /api/v1/voice/tts

Legacy path: /api/elevenlabs/tts (still supported for backwards compatibility)

Convert text to speech audio using premium AI voices.

cURL


curl -X POST "https://cloud.milady.ai/api/v1/voice/tts" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the text to speech system.",
    "voiceId": "21m00Tcm4TlvDq8ikWAM",
    "modelId": "eleven_multilingual_v2"
  }' \
  --output speech.mp3

JavaScript


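const response = await fetch('https://cloud.milady.ai/api/v1/voice/tts', {
  method: 'POST',
  headers: {
    'X-API-Key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    text: 'Hello, this is a test.',
    voiceId: '21m00Tcm4TlvDq8ikWAM',
    modelId: 'eleven_multilingual_v2',
  }),
});

// Error responses (e.g. 400, 402, 429) are not audio, so check before reading the body.
if (!response.ok) {
  throw new Error(`TTS request failed: ${response.status} ${await response.text()}`);
}

// In a browser, turn the MP3 bytes into a URL usable by an <audio> element.
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);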

Python


import requests

response = requests.post(
    'https://cloud.milady.ai/api/v1/voice/tts',
    headers={
        'X-API-Key': 'YOUR_API_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'text': 'Hello, this is a test.',
        'voiceId': '21m00Tcm4TlvDq8ikWAM',
        'modelId': 'eleven_multilingual_v2',
    },
    timeout=60,
)

# Raise on 4xx/5xx (e.g. 402 Insufficient credits) so an error body is never saved as audio.
response.raise_for_status()

with open('speech.mp3', 'wb') as f:
    f.write(response.content)

Parameters

Parameter	Type	Required	Description
`text`	string	✓	Text to convert to speech (max 5000 chars)
`voiceId`	string	✓	Voice ID to use (see List Voices)
`modelId`	string		Model ID. Default: `eleven_multilingual_v2`
`stability`	number		Voice stability (0-1). Default: 0.5
`similarity_boost`	number		Voice similarity (0-1). Default: 0.75
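The limits in the table above can be checked before a request is sent. The following is a minimal sketch that assumes the server applies the documented defaults whenever an optional field is omitted. `build_tts_request` is a hypothetical name. Python's `len()` counts code points, which may not match how the service counts characters.

```python
import json
import urllib.request

TTS_URL = "https://cloud.milady.ai/api/v1/voice/tts"
MAX_TTS_CHARS = 5000


def build_tts_request(api_key, text, voice_id, model_id=None, stability=None, similarity_boost=None):
    """Validate TTS parameters against the documented limits and build a POST request.

    Optional fields are left out when None (assumption: the server then applies
    its defaults of eleven_multilingual_v2, 0.5 and 0.75).
    """
    if not text:
        raise ValueError("text is required")
    # len() counts code points; the service's character count may differ.
    if len(text) > MAX_TTS_CHARS:
        raise ValueError(f"text exceeds {MAX_TTS_CHARS} characters; split it into chunks")
    if not voice_id:
        raise ValueError("voiceId is required")
    body = {"text": text, "voiceId": voice_id}
    if model_id is not None:
        body["modelId"] = model_id
    # Both tuning parameters are documented as numbers in the range 0-1.
    for name, value in (("stability", stability), ("similarity_boost", similarity_boost)):
        if value is not None:
            if not 0 <= value <= 1:
                raise ValueError(f"{name} must be between 0 and 1")
            body[name] = value
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```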

Available Models

Model	Languages	Quality	Speed
`eleven_multilingual_v2`	29	Highest	Medium
`eleven_turbo_v2_5`	32	High	Fast
`eleven_flash_v2_5`	32	High	Fast
`eleven_v3`	Multi	Highest	Medium

Voice pricing is refreshed separately from the text/image/video catalogs. TTS is billed per character, STT per decoded audio duration, and voice cloning by clone tier.

Response

Returns audio data as an audio/mpeg stream. The Content-Length header gives the file size in bytes.
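Because the body is an audio/mpeg stream, it can be written to disk as it arrives instead of being held in memory. The following is a minimal standard-library sketch. `save_tts_audio` is a hypothetical helper, and `response` is assumed to be the object that `urllib.request.urlopen()` returns for a TTS request.

```python
import shutil


def save_tts_audio(response, path, chunk_size=64 * 1024):
    """Copy a TTS response body to `path` in chunks; return the number of bytes written.

    `response` is any file-like object with .read() and a `headers` mapping
    (assumed: the result of urllib.request.urlopen()).
    """
    content_type = response.headers.get("Content-Type", "")
    # Error responses are JSON, not audio, so refuse to save them as .mp3.
    if not content_type.startswith("audio/mpeg"):
        raise ValueError(f"Expected audio/mpeg, got {content_type!r}")
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()
```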

Speech to Text

POST /api/v1/voice/stt

Legacy path: /api/elevenlabs/stt (still supported for backwards compatibility)

Transcribe audio to text.

Request

Upload audio as multipart/form-data:


curl -X POST "https://cloud.milady.ai/api/v1/voice/stt" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "audio=@recording.mp3"
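The cURL `-F` flag builds the multipart/form-data body for you. Without a third-party HTTP library, you can assemble the same body by hand. The following is a minimal sketch of RFC 7578 encoding. `encode_multipart` is a hypothetical helper. Guessing each file's MIME type from its extension is an assumption about what the endpoint expects.

```python
import mimetypes
import uuid


def encode_multipart(fields, files):
    """Encode form fields and files as multipart/form-data.

    fields: dict of field name -> str value.
    files: dict of field name -> (filename, bytes).
    Returns (body_bytes, content_type_header_value).
    Filenames are not escaped, so keep them to simple ASCII names.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    for name, (filename, content) in files.items():
        # Assumption: a MIME type guessed from the extension is acceptable to the server.
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # Closing delimiter marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

For this endpoint, call it with `files={"audio": ("recording.mp3", audio_bytes)}`. Send the returned body with `Content-Type` set to the returned value and `X-API-Key` set as in the cURL example. The same helper can encode the Clone Voice form (`name`, `cloneType`, `file0`, `file1`, ...).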

Response

Returns JSON with transcript and duration_ms.

Field	Type	Description
`transcript`	string	Transcribed text
`duration_ms`	number	Audio duration in milliseconds


{
  "transcript": "Hello, this is a transcription test.",
  "duration_ms": 3245
}
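STT usage is billed by decoded audio duration, so `duration_ms` is worth carrying around with an explicit unit. The following is a minimal sketch of a typed wrapper for the documented response. `Transcription` and `parse_stt_response` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    transcript: str
    duration_ms: int

    @property
    def duration_seconds(self) -> float:
        # duration_ms is documented as milliseconds.
        return self.duration_ms / 1000


def parse_stt_response(raw: bytes) -> Transcription:
    """Parse the documented STT JSON body with `transcript` and `duration_ms` fields."""
    data = json.loads(raw)
    return Transcription(transcript=data["transcript"], duration_ms=data["duration_ms"])
```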

List Voices

GET /api/v1/voice/list

Legacy path: /api/elevenlabs/voices/user (still supported for backwards compatibility)

List your cloned voices, with pagination and filtering.

Query Parameters

Parameter	Type	Description
`includeInactive`	boolean	Include inactive voices (default: false)
`cloneType`	string	Filter by `instant` or `professional`
`limit`	number	Results per page (default: 50, max: 100)
`offset`	number	Pagination offset

Response


{
  "success": true,
  "voices": [
    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "elevenlabsVoiceId": "xyz789",
      "name": "My Custom Voice",
      "description": "A professional voice clone",
      "cloneType": "instant",
      "sampleCount": 3,
      "usageCount": 150,
      "isActive": true,
      "createdAt": "2024-01-15T10:30:00Z"
    }
  ],
  "total": 5,
  "limit": 50,
  "offset": 0,
  "hasMore": false
}
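Each page reports `hasMore`, so a client can walk the full list by advancing `offset`. The following is a minimal standard-library sketch, and the helper names are hypothetical. Sending `includeInactive` as the strings `true`/`false` is an assumption about how the endpoint parses booleans. The page fetcher is passed in as a function so the loop can run without network access.

```python
import json
import urllib.parse
import urllib.request

LIST_URL = "https://cloud.milady.ai/api/v1/voice/list"


def list_voices_page(api_key, limit=50, offset=0, include_inactive=False, clone_type=None):
    """Fetch one page from GET /api/v1/voice/list and return the decoded JSON."""
    # Assumption: booleans are sent as lowercase "true"/"false".
    params = {"limit": limit, "offset": offset, "includeInactive": str(include_inactive).lower()}
    if clone_type is not None:
        params["cloneType"] = clone_type  # "instant" or "professional"
    req = urllib.request.Request(
        f"{LIST_URL}?{urllib.parse.urlencode(params)}", headers={"X-API-Key": api_key}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_all_voices(fetch_page, limit=100):
    """Yield every voice, advancing offset until a page reports hasMore=false.

    fetch_page(limit=..., offset=...) returns one decoded page.
    """
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        voices = page["voices"]
        yield from voices
        # Stop on hasMore=false, or on an empty page to avoid looping forever.
        if not page.get("hasMore") or not voices:
            break
        offset += len(voices)
```

Usage: `for voice in iter_all_voices(lambda **kw: list_voices_page("YOUR_API_KEY", **kw)): print(voice["id"], voice["name"])`.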

Clone Voice

POST /api/v1/voice/clone

Legacy path: /api/elevenlabs/voices/clone (still supported for backwards compatibility)

Create a voice clone from audio samples.

Request

Upload audio samples as multipart/form-data:


curl -X POST "https://cloud.milady.ai/api/v1/voice/clone" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "name=My Voice" \
  -F "cloneType=instant" \
  -F "file0=@sample1.mp3" \
  -F "file1=@sample2.mp3"
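The form in the cURL example has two text fields plus one numbered file field per sample. The following is a minimal sketch that validates the inputs and builds that field list. `clone_form_fields` is hypothetical. Accepting `professional` as a `cloneType` here is an assumption based on the List Voices filter values.

```python
from pathlib import Path

# Assumption: the same values accepted by the List Voices cloneType filter.
CLONE_TYPES = {"instant", "professional"}


def clone_form_fields(name, clone_type, sample_paths):
    """Return (field name, value) pairs for POST /api/v1/voice/clone.

    Text fields come first; samples are numbered file0, file1, ... in order,
    matching the cURL example. Sample values are Path objects to upload.
    """
    if not name:
        raise ValueError("name is required")
    if clone_type not in CLONE_TYPES:
        raise ValueError(f"cloneType must be one of {sorted(CLONE_TYPES)}")
    if not sample_paths:
        raise ValueError("at least one audio sample is required")
    fields = [("name", name), ("cloneType", clone_type)]
    fields += [(f"file{i}", Path(p)) for i, p in enumerate(sample_paths)]
    return fields
```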

Response


{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "name": "My Voice",
  "status": "processing"
}

Voice Cloning Tips:
- Provide 1-5 minutes of clear audio for best results.
- Use high-quality recordings with minimal background noise.
- Speaking clearly and at a natural pace produces better clones.
- Multiple samples in different contexts improve voice quality.
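You can check the first tip before uploading. The following is a minimal sketch that assumes the 1-5 minute guidance applies to the total across all samples. It uses the standard-library `wave` module, so it only reads uncompressed WAV files; MP3 samples would need a separate decoder. The helper names are hypothetical.

```python
import wave

# Assumption: the 1-5 minute recommendation refers to total audio across samples.
MIN_TOTAL_SECONDS = 60
MAX_TOTAL_SECONDS = 300


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def check_sample_durations(paths):
    """Sum WAV sample durations and compare against the recommended 1-5 minute range.

    Returns (total_seconds, advice); advice is None when the total is in range.
    """
    total = sum(wav_duration_seconds(p) for p in paths)
    if total < MIN_TOTAL_SECONDS:
        return total, "Below the recommended 1-5 minutes; add more audio"
    if total > MAX_TOTAL_SECONDS:
        return total, "Above the recommended 1-5 minutes"
    return total, None
```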

Get Voice

GET /api/v1/voice/{id}

Legacy path: /api/elevenlabs/voices/{id} (still supported for backwards compatibility)

Get details for a specific voice by its internal UUID.


curl "https://cloud.milady.ai/api/v1/voice/123e4567-e89b-12d3-a456-426614174000" \
  -H "X-API-Key: YOUR_API_KEY"
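Get Voice takes the internal UUID, which is the `id` field from List Voices and not `elevenlabsVoiceId`. The Update and Delete examples use the same value. The following is a minimal sketch that checks the format before building the URL. `voice_url` is a hypothetical helper.

```python
import uuid

VOICE_URL = "https://cloud.milady.ai/api/v1/voice/{id}"


def voice_url(voice_id: str) -> str:
    """Build the URL for /api/v1/voice/{id} after checking that the id is a UUID.

    Catches the common mix-up of passing elevenlabsVoiceId instead of id.
    The UUID is normalized to its canonical lowercase, hyphenated form.
    """
    try:
        parsed = uuid.UUID(voice_id)
    except ValueError:
        raise ValueError(f"{voice_id!r} is not a UUID; use the `id` field from List Voices") from None
    return VOICE_URL.format(id=parsed)
```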

Update Voice

PATCH /api/v1/voice/{id}

Update a voice’s metadata.


curl -X PATCH "https://cloud.milady.ai/api/v1/voice/123e4567-e89b-12d3-a456-426614174000" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Updated Voice Name", "isActive": true}'
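The following is a minimal sketch that sends only the fields being changed. It assumes PATCH leaves omitted fields untouched, which is typical for PATCH but not stated above. It covers only the two fields shown in the example. `build_update_voice_request` is hypothetical.

```python
import json
import urllib.request


def build_update_voice_request(api_key, voice_id, name=None, is_active=None):
    """Build a PATCH /api/v1/voice/{id} request containing only the changed fields.

    Only `name` and `isActive` appear in the documented example; other
    updatable fields, if any, are not handled here.
    """
    changes = {}
    if name is not None:
        changes["name"] = name
    if is_active is not None:
        changes["isActive"] = bool(is_active)
    if not changes:
        raise ValueError("nothing to update")
    return urllib.request.Request(
        f"https://cloud.milady.ai/api/v1/voice/{voice_id}",
        data=json.dumps(changes).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="PATCH",
    )
```

Setting `is_active=False` pairs with the `includeInactive` filter on List Voices.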

Delete Voice

DELETE /api/v1/voice/{id}

Legacy path: /api/elevenlabs/voices/{id} (still supported for backwards compatibility)

Delete a cloned voice from your account.


curl -X DELETE "https://cloud.milady.ai/api/v1/voice/123e4567-e89b-12d3-a456-426614174000" \
  -H "X-API-Key: YOUR_API_KEY"

Response


{
  "success": true,
  "message": "Voice deleted successfully"
}

Voice Cloning Jobs

GET /api/v1/voice/jobs

Legacy path: /api/elevenlabs/voices/jobs (still supported for backwards compatibility)

Check the status of active voice cloning jobs.

Response


{
  "success": true,
  "jobs": [
    {
      "id": "job_xyz789",
      "voiceName": "My Voice",
      "jobType": "instant",
      "status": "processing",
      "progress": 50,
      "createdAt": "2024-01-15T10:30:00Z"
    }
  ],
  "total": 1
}

Status Values

Status	Description
`pending`	Job is queued
`processing`	Voice clone is being generated
`completed`	Voice is ready to use
`failed`	Cloning failed (check audio quality)
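A client can poll the jobs endpoint until a clone reaches `completed` or `failed`. The following is a minimal sketch built on two assumptions. First, jobs are matched by `voiceName`, and names are not guaranteed to be unique. Second, the endpoint lists active jobs, so a job missing from the list is treated as finished, and the caller should confirm the result through List Voices. `fetch_jobs` is passed in so the loop can be tested offline, and `wait_for_clone_job` is a hypothetical name.

```python
import time

# Statuses after which a job will not change again (see the table above).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_clone_job(fetch_jobs, voice_name, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job for `voice_name` reaches a terminal status.

    fetch_jobs() returns the decoded JSON body of GET /api/v1/voice/jobs.
    Returns the final job dict, or None if no active job matches (assumed to
    mean it already finished; confirm via List Voices).
    """
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: match by voiceName, since job ids differ from voice ids.
        matches = [j for j in fetch_jobs()["jobs"] if j.get("voiceName") == voice_name]
        if not matches:
            return None
        job = matches[0]
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Clone job for {voice_name!r} still {job['status']} after {timeout}s")
        sleep(interval)
```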

Pricing

Operation	Credits
Text-to-Speech (per 1K chars)	1
Speech-to-Text (per minute)	2
Voice Clone (one-time)	10
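The table above translates into a simple estimate. The page does not say whether usage is prorated or rounded up to whole units, so the following minimal sketch offers both through a hypothetical `round_up` flag. The flat 10 credits per clone comes from the table, even though clone pricing is also described as depending on clone tier.

```python
import math

# Rates from the pricing table above.
TTS_CREDITS_PER_1K_CHARS = 1
STT_CREDITS_PER_MINUTE = 2
CLONE_CREDITS = 10


def estimate_credits(tts_chars=0, stt_duration_ms=0, clones=0, round_up=False):
    """Estimate credits for a mix of voice operations.

    Assumption: usage is prorated unless round_up=True, in which case TTS is
    charged per started 1K characters and STT per started minute.
    """
    tts_units = tts_chars / 1000
    stt_minutes = stt_duration_ms / 60_000
    if round_up:
        tts_units = math.ceil(tts_units)
        stt_minutes = math.ceil(stt_minutes)
    return (
        tts_units * TTS_CREDITS_PER_1K_CHARS
        + stt_minutes * STT_CREDITS_PER_MINUTE
        + clones * CLONE_CREDITS
    )
```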

Error Handling

Code	Error	Solution
400	Invalid voice ID	Use a valid voice from List Voices
400	Text too long	Split text into chunks under 5000 chars
402	Insufficient credits	Add credits to your account
404	Voice not found	Voice may have been deleted
429	Rate limited	Wait and retry
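Two rows of the table above have mechanical fixes: splitting long text for TTS, and backing off on 429. The following is a minimal sketch of both. Sentence splitting with a regular expression is a heuristic. The backoff schedule (exponential with jitter) is an illustrative choice, not a documented server requirement. If the service returns a `Retry-After` header, prefer it; this sketch does not assume one. The function names are hypothetical.

```python
import random
import re
import time

MAX_TTS_CHARS = 5000


def chunk_text(text, limit=MAX_TTS_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends.

    Over-long sentences are broken at the last space, or hard-cut if there is none.
    Chunks are rejoined with single spaces, so original line breaks are not kept.
    """
    pieces = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        while len(sentence) > limit:
            cut = sentence.rfind(" ", 0, limit + 1)
            if cut <= 0:
                cut = limit
            pieces.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        if sentence:
            pieces.append(sentence)
    # Greedily pack pieces into chunks that stay within the limit.
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, body), retrying only on 429 with exponential backoff.

    400, 402 and 404 will not succeed on retry, so they are returned immediately.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    return status, body
```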