Chat Completions

Stable

OpenAI-compatible chat completions endpoint for conversational AI.

Create Chat Completion

POST /api/v1/chat/completions

Generate a chat completion response from the model.

Request

```bash
curl -X POST "https://cloud.milady.ai/api/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7
  }'
```

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | Model ID or agent ID |
| `messages` | array | Yes | Array of message objects |
| `temperature` | number | No | Sampling temperature (0-2). Default: `1` |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `stream` | boolean | No | Stream response chunks. Default: `false` |
| `top_p` | number | No | Nucleus sampling parameter (0-1) |
| `frequency_penalty` | number | No | Frequency penalty (-2 to 2) |
| `presence_penalty` | number | No | Presence penalty (-2 to 2) |
| `stop` | string/array | No | Stop sequences |
| `user` | string | No | Unique user identifier |
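The required fields and value ranges above can be validated client-side before sending a request. Here is a minimal sketch; the `buildChatRequest` helper is hypothetical and not part of this API:

```javascript
// Hypothetical helper: builds a Chat Completions request body and
// validates the required fields and parameter ranges from the table above.
function buildChatRequest({ model, messages, ...options }) {
  if (!model) throw new Error("model is required");
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error("messages must be a non-empty array");
  }
  const { temperature, top_p } = options;
  if (temperature !== undefined && (temperature < 0 || temperature > 2)) {
    throw new Error("temperature must be between 0 and 2");
  }
  if (top_p !== undefined && (top_p < 0 || top_p > 1)) {
    throw new Error("top_p must be between 0 and 1");
  }
  return { model, messages, ...options };
}
```

Pass the result through `JSON.stringify` as the request body of a `fetch` or curl call to the endpoint above.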

Message Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `role` | string | Yes | One of `system`, `user`, `assistant`, or `tool` |
| `content` | string | Yes | Message content |
| `name` | string | No | Optional name for the participant |
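For example, a short conversation history combining these fields might look like this (illustrative values):

```json
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "name": "alice", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "The capital of France is Paris."}
]
```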

Response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705312800,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}
```

Streaming

Enable streaming for real-time responses by setting stream: true.

Streaming Request

```javascript
const response = await fetch("https://cloud.milady.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Tell me a story" }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value);
  const lines = chunk.split("\n").filter((line) => line.startsWith("data: "));
  for (const line of lines) {
    const data = line.slice(6);
    if (data === "[DONE]") continue;
    const parsed = JSON.parse(data);
    process.stdout.write(parsed.choices[0]?.delta?.content || "");
  }
}
```
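Each streamed event is a server-sent `data:` line carrying a JSON chunk whose `choices[].delta` holds the incremental content (instead of a full `message`), terminated by a `data: [DONE]` sentinel. The shape below reflects the usual OpenAI-compatible format; exact fields may vary:

```json
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

data: [DONE]
```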

Using with Agents

You can use an agent ID as the model to chat with your custom AI agents:

```json
{
  "model": "agent_abc123",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
```

The agent’s personality, system prompt, and configured model will be used automatically.


Available Models

| Model | Provider | Description |
| --- | --- | --- |
| `gpt-4o` | OpenAI | Most capable GPT-4 model |
| `gpt-4o-mini` | OpenAI | Fast and efficient |
| `claude-sonnet-4-6` | Anthropic | Latest Claude model |
| `gemini-1.5-pro` | Google | Gemini Pro model |

See the Models API for a complete list.