Rate Limits
Stable
elizaOS Cloud applies rate limits to ensure fair usage and platform stability.
Overview
Rate limits are applied per API key and vary by:
- Endpoint type: Different limits for different APIs
- Time window: Requests per minute or per 5 minutes
Default Limits
Rate limits are applied per API key. All users share the same default limits — credits and billing control cost.
| Endpoint | Rate Limit | Preset |
|---|---|---|
| Chat Completions | 200/min | RELAXED |
| Responses | 200/min | RELAXED |
| Embeddings | 60/min | STANDARD |
| Image Generation | 60/min | STANDARD |
| Video Generation | 5/5min | CRITICAL |
| Knowledge Query | 60/min | STANDARD |
Enterprise
Custom rate limits based on your needs. Contact sales.
Rate Limit Headers
Every response includes rate limit information:
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 185
X-RateLimit-Reset: 2026-01-15T12:01:00.000Z
X-RateLimit-Policy: redis

| Header | Description |
|---|---|
| X-RateLimit-Limit | Max requests in window |
| X-RateLimit-Remaining | Requests remaining |
| X-RateLimit-Reset | ISO timestamp when window resets |
| X-RateLimit-Policy | Backend: redis or in-memory |
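These headers are easy to read off a fetch Response. A minimal sketch, assuming only the header names documented above (the 10% warning threshold is an arbitrary illustrative choice, not part of the API):

```javascript
// Extract the documented rate-limit headers from a response.
// Works with any Headers-like object that exposes .get(name).
function parseRateLimitHeaders(headers) {
  const limit = Number(headers.get("X-RateLimit-Limit"));
  const remaining = Number(headers.get("X-RateLimit-Remaining"));
  const resetAt = new Date(headers.get("X-RateLimit-Reset"));
  return {
    limit,
    remaining,
    resetAt,
    // Illustrative threshold: flag when fewer than 10% of requests remain
    nearLimit: remaining < limit * 0.1,
  };
}
```

Calling this after each response lets you slow down before hitting a 429 rather than reacting to one.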
Handling Rate Limits
429 Response
When rate limited, you’ll receive:
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests",
    "retryAfter": 42
  }
}

Retry Strategy
Implement exponential backoff:
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.status === 429) {
      // Prefer the server's reset time; fall back to exponential backoff
      const reset = response.headers.get("X-RateLimit-Reset");
      const waitTime = reset
        ? Math.max(0, new Date(reset).getTime() - Date.now())
        : Math.pow(2, i) * 1000;
      console.log(`Rate limited. Waiting ${waitTime}ms...`);
      await new Promise((resolve) => setTimeout(resolve, waitTime));
      continue;
    }
    return response;
  }
  throw new Error("Max retries exceeded");
}

Best Practices
- Monitor headers: Track remaining requests
- Implement backoff: Wait before retrying
- Queue requests: Smooth out request spikes
- Cache responses: Reduce redundant calls
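The caching practice above can be sketched with a small in-memory cache keyed by request, with a time-to-live. The key scheme, TTL, and `fetchFn` signature here are illustrative choices, not part of the elizaOS Cloud API:

```javascript
// Minimal TTL cache: identical requests within the window reuse the
// previous result instead of spending a rate-limited API call.
class TTLCache {
  constructor(ttlMs = 60000) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  async get(key, fetchFn) {
    const hit = this.entries.get(key);
    if (hit && Date.now() - hit.storedAt < this.ttlMs) {
      return hit.value; // served from cache, no API call made
    }
    const value = await fetchFn();
    this.entries.set(key, { value, storedAt: Date.now() });
    return value;
  }
}
```

For endpoints like embeddings, where identical inputs produce identical outputs, even a short TTL can noticeably reduce consumption of the per-minute window.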
Request Queuing
For high-volume applications, implement a request queue:
class RequestQueue {
  constructor(rateLimit = 60, windowMs = 60000) {
    this.queue = [];
    this.rateLimit = rateLimit;
    this.windowMs = windowMs;
    this.requestTimes = [];
  }

  async add(request) {
    return new Promise((resolve, reject) => {
      this.queue.push({ request, resolve, reject });
      this.process();
    });
  }

  async process() {
    if (this.queue.length === 0) return;

    // Drop request timestamps that have aged out of the window
    const now = Date.now();
    this.requestTimes = this.requestTimes.filter(
      (t) => now - t < this.windowMs,
    );

    // Check if we can make a request
    if (this.requestTimes.length < this.rateLimit) {
      const { request, resolve, reject } = this.queue.shift();
      this.requestTimes.push(now);
      try {
        const result = await request();
        resolve(result);
      } catch (error) {
        reject(error);
      }
      this.process();
    } else {
      // Wait until the oldest request leaves the window
      const waitTime = this.windowMs - (now - this.requestTimes[0]);
      setTimeout(() => this.process(), waitTime);
    }
  }
}

Increasing Limits
For specific use cases, contact support to request a custom limit increase.
Burst Limits
The BURST preset (10 req/sec) is available for real-time features. Standard AI endpoints use per-minute windows only.
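A per-second limit like BURST is naturally modeled client-side as a token bucket. A minimal sketch, taking only the 10 req/sec figure from the text (capacity, refill strategy, and method names are illustrative):

```javascript
// Token bucket sized to the BURST preset: up to 10 requests/second,
// refilled continuously. tryAcquire() returns whether a request may
// be sent right now.
class TokenBucket {
  constructor(capacity = 10, refillPerSec = 10) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryAcquire() {
    // Refill proportionally to the time elapsed since the last check
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSec,
    );
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // send the request
    }
    return false; // back off locally instead of triggering a 429
  }
}
```

Shaping traffic locally this way keeps bursts under the server's ceiling, so real-time features degrade to brief local waits rather than 429 responses.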
Monitoring Usage
Track your API usage:
curl -X GET "https://cloud.milady.ai/api/quotas/usage" \
  -H "Authorization: Bearer YOUR_API_KEY"

{
  "period": "current_minute",
  "usage": {
    "chat": { "used": 45, "limit": 200 },
    "embeddings": { "used": 12, "limit": 60 }
  }
}