
Rate Limits

Stable

elizaOS Cloud applies rate limits to ensure fair usage and platform stability.

Overview

Rate limits are applied per API key and vary by endpoint.

Default Limits

Rate limits are applied per API key. All users share the same default limits — credits and billing control cost.

| Endpoint | Rate Limit | Preset |
|---|---|---|
| Chat Completions | 200/min | RELAXED |
| Responses | 200/min | RELAXED |
| Embeddings | 60/min | STANDARD |
| Image Generation | 60/min | STANDARD |
| Video Generation | 5/5min | CRITICAL |
| Knowledge Query | 60/min | STANDARD |

Enterprise

Custom rate limits based on your needs. Contact sales.

Rate Limit Headers

Every response includes rate limit information:

```
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 185
X-RateLimit-Reset: 2026-01-15T12:01:00.000Z
X-RateLimit-Policy: redis
```

| Header | Description |
|---|---|
| X-RateLimit-Limit | Max requests in the window |
| X-RateLimit-Remaining | Requests remaining |
| X-RateLimit-Reset | ISO timestamp when the window resets |
| X-RateLimit-Policy | Backend: redis or in-memory |
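A minimal sketch of reading these headers client-side, assuming a fetch-style object with a `.get(name)` method (the helper name is illustrative, not part of the API):

```javascript
// Parse the documented rate-limit headers into a plain object.
// `headers` is anything with a .get(name) method (e.g. fetch's Headers, or a Map).
function parseRateLimitHeaders(headers) {
  const limit = parseInt(headers.get("X-RateLimit-Limit"), 10);
  const remaining = parseInt(headers.get("X-RateLimit-Remaining"), 10);
  const reset = new Date(headers.get("X-RateLimit-Reset"));
  return {
    limit,
    remaining,
    // Milliseconds until the window resets (clamped to zero)
    resetInMs: Math.max(0, reset.getTime() - Date.now()),
  };
}
```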

Handling Rate Limits

429 Response

When rate limited, you’ll receive:

```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests",
    "retryAfter": 42
  }
}
```
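Turning the error body into a wait time might look like the sketch below; `retryAfter` is in seconds per the example above, and the exponential fallback values are an assumption, not a documented default:

```javascript
// Decide how long to wait before retrying, given a parsed 429 error body.
// Falls back to exponential backoff (1s, 2s, 4s, ...) when retryAfter is absent.
function backoffMs(errorBody, attempt) {
  const retryAfter = errorBody?.error?.retryAfter;
  if (typeof retryAfter === "number") {
    return retryAfter * 1000; // retryAfter is given in seconds
  }
  return Math.pow(2, attempt) * 1000;
}
```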

Retry Strategy

Implement exponential backoff:

```javascript
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.status === 429) {
      // Prefer the documented reset timestamp; fall back to
      // exponential backoff (1s, 2s, 4s, ...)
      const reset = response.headers.get("X-RateLimit-Reset");
      const waitTime = reset
        ? Math.max(0, new Date(reset).getTime() - Date.now())
        : Math.pow(2, i) * 1000;
      console.log(`Rate limited. Waiting ${waitTime}ms...`);
      await new Promise((resolve) => setTimeout(resolve, waitTime));
      continue;
    }
    return response;
  }
  throw new Error("Max retries exceeded");
}
```

Best Practices

  1. Monitor headers: Track remaining requests
  2. Implement backoff: Wait before retrying
  3. Queue requests: Smooth out request spikes
  4. Cache responses: Reduce redundant calls
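Best practice 4 can be as simple as an in-memory TTL cache in front of the API client. A minimal sketch (the class name and default TTL are illustrative):

```javascript
// Minimal in-memory TTL cache to avoid repeating identical calls.
class ResponseCache {
  constructor(ttlMs = 60000) {
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry || Date.now() >= entry.expiresAt) {
      this.entries.delete(key); // evict expired entries lazily
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

Check the cache before issuing a request, and store successful responses keyed by the request parameters.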

Request Queuing

For high-volume applications, implement a request queue:

```javascript
class RequestQueue {
  constructor(rateLimit = 60, windowMs = 60000) {
    this.queue = [];
    this.rateLimit = rateLimit;
    this.windowMs = windowMs;
    this.requestTimes = [];
  }

  async add(request) {
    return new Promise((resolve, reject) => {
      this.queue.push({ request, resolve, reject });
      this.process();
    });
  }

  async process() {
    if (this.queue.length === 0) return;

    // Clean old request times
    const now = Date.now();
    this.requestTimes = this.requestTimes.filter(
      (t) => now - t < this.windowMs,
    );

    // Check if we can make a request
    if (this.requestTimes.length < this.rateLimit) {
      const { request, resolve, reject } = this.queue.shift();
      this.requestTimes.push(now);
      try {
        const result = await request();
        resolve(result);
      } catch (error) {
        reject(error);
      }
      this.process();
    } else {
      // Wait until we can make another request
      const waitTime = this.windowMs - (now - this.requestTimes[0]);
      setTimeout(() => this.process(), waitTime);
    }
  }
}
```

Increasing Limits

For specific use cases, contact support to request a custom limit increase.

Burst Limits

The BURST preset (10 req/sec) is available for real-time features. Standard AI endpoints use per-minute windows only.
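A 10 req/sec allowance maps naturally onto a client-side token bucket. A sketch under that assumption (the class and its parameters are illustrative; the clock is injectable so the bucket is easy to test):

```javascript
// Client-side token bucket sized for a 10 req/sec burst allowance.
// `now` defaults to Date.now but can be injected for deterministic tests.
class TokenBucket {
  constructor(capacity = 10, refillPerSec = 10, now = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.now = now;
    this.lastRefill = now();
  }

  tryRemove() {
    // Refill proportionally to elapsed time, capped at capacity
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = t;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false; // over the burst limit; caller should wait
  }
}
```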

Monitoring Usage

Track your API usage:

```bash
curl -X GET "https://cloud.milady.ai/api/quotas/usage" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```json
{
  "period": "current_minute",
  "usage": {
    "chat": { "used": 45, "limit": 200 },
    "embeddings": { "used": 12, "limit": 60 }
  }
}
```
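One way to act on this response is to flag endpoints approaching their limit so the client can slow down before hitting 429s. A sketch, assuming the response shape above (the helper name and 80% threshold are illustrative):

```javascript
// List endpoints whose usage is at or above `threshold` (a fraction of the
// per-window limit), based on the /api/quotas/usage response shape.
function nearLimit(usageResponse, threshold = 0.8) {
  return Object.entries(usageResponse.usage)
    .filter(([, { used, limit }]) => used / limit >= threshold)
    .map(([endpoint]) => endpoint);
}
```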

Next Steps