Rate Limits
Stable
elizaOS Cloud applies rate limits to ensure fair usage and platform stability.
Overview
Rate limits are applied per API key and vary by:
- Endpoint type: Different limits for different APIs
- Time window: Requests per minute or per 5 minutes
Default Limits
Rate limits are applied per API key. All users share the same default limits — credits and billing control cost.
| Endpoint | Rate Limit | Preset |
|---|---|---|
| Chat Completions | 200/min | RELAXED |
| Responses | 200/min | RELAXED |
| Embeddings | 60/min | STANDARD |
| Image Generation | 60/min | STANDARD |
| Video Generation | 5/5min | CRITICAL |
| Knowledge Query | 60/min | STANDARD |
Enterprise
Custom rate limits based on your needs. Contact sales.
Rate Limit Headers
Every response includes rate limit information:
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 185
X-RateLimit-Reset: 2026-01-15T12:01:00.000Z
X-RateLimit-Policy: redis

| Header | Description |
|---|---|
| X-RateLimit-Limit | Max requests in window |
| X-RateLimit-Remaining | Requests remaining |
| X-RateLimit-Reset | ISO timestamp when window resets |
| X-RateLimit-Policy | Backend: redis or in-memory |
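These headers are easy to read off a fetch Response. A minimal sketch, assuming only the header names documented above (the 10% warning threshold is an arbitrary illustrative choice, not part of the API):

```javascript
// Extract the documented rate-limit headers from a response.
// Works with any Headers-like object that exposes .get(name).
function parseRateLimitHeaders(headers) {
  const limit = Number(headers.get("X-RateLimit-Limit"));
  const remaining = Number(headers.get("X-RateLimit-Remaining"));
  const resetAt = new Date(headers.get("X-RateLimit-Reset"));
  return {
    limit,
    remaining,
    resetAt,
    // Illustrative threshold: flag when fewer than 10% of requests remain
    nearLimit: remaining < limit * 0.1,
  };
}
```

Calling this after each response lets you slow down before hitting a 429 rather than reacting to one.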
Handling Rate Limits
429 Response
When rate limited, you’ll receive:
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests",
    "retryAfter": 42
  }
}

Retry Strategy
Implement exponential backoff:
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.status === 429) {
      // Prefer the server's reset time; fall back to exponential backoff
      const reset = response.headers.get("X-RateLimit-Reset");
      const waitTime = reset
        ? Math.max(0, new Date(reset).getTime() - Date.now())
        : Math.pow(2, i) * 1000;
      console.log(`Rate limited. Waiting ${waitTime}ms...`);
      await new Promise((resolve) => setTimeout(resolve, waitTime));
      continue;
    }
    return response;
  }
  throw new Error("Max retries exceeded");
}

Best Practices
- Monitor headers: Track remaining requests
- Implement backoff: Wait before retrying
- Queue requests: Smooth out request spikes
- Cache responses: Reduce redundant calls
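The caching practice above can be sketched with a small in-memory cache keyed by request, with a time-to-live. The key scheme, TTL, and `fetchFn` signature here are illustrative choices, not part of the elizaOS Cloud API:

```javascript
// Minimal TTL cache: identical requests within the window reuse the
// previous result instead of spending a rate-limited API call.
class TTLCache {
  constructor(ttlMs = 60000) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  async get(key, fetchFn) {
    const hit = this.entries.get(key);
    if (hit && Date.now() - hit.storedAt < this.ttlMs) {
      return hit.value; // served from cache, no API call made
    }
    const value = await fetchFn();
    this.entries.set(key, { value, storedAt: Date.now() });
    return value;
  }
}
```

For endpoints like embeddings, where identical inputs produce identical outputs, even a short TTL can noticeably reduce consumption of the per-minute window.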
Request Queuing
For high-volume applications, implement a request queue:
class RequestQueue {
  constructor(rateLimit = 60, windowMs = 60000) {
    this.queue = [];
    this.rateLimit = rateLimit;
    this.windowMs = windowMs;
    this.requestTimes = [];
  }

  async add(request) {
    return new Promise((resolve, reject) => {
      this.queue.push({ request, resolve, reject });
      this.process();
    });
  }

  async process() {
    if (this.queue.length === 0) return;

    // Drop request timestamps that have aged out of the window
    const now = Date.now();
    this.requestTimes = this.requestTimes.filter(
      (t) => now - t < this.windowMs,
    );

    // Check if we can make a request
    if (this.requestTimes.length < this.rateLimit) {
      const { request, resolve, reject } = this.queue.shift();
      this.requestTimes.push(now);
      try {
        const result = await request();
        resolve(result);
      } catch (error) {
        reject(error);
      }
      this.process();
    } else {
      // Wait until the oldest request leaves the window
      const waitTime = this.windowMs - (now - this.requestTimes[0]);
      setTimeout(() => this.process(), waitTime);
    }
  }
}

Increasing Limits
For specific use cases, contact support to request a custom limit increase.
Burst Limits
The BURST preset (10 req/sec) is available for real-time features. Standard AI endpoints use per-minute windows only.
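A per-second limit like BURST is naturally modeled client-side as a token bucket. A minimal sketch, taking only the 10 req/sec figure from the text (capacity, refill strategy, and method names are illustrative):

```javascript
// Token bucket sized to the BURST preset: up to 10 requests/second,
// refilled continuously. tryAcquire() returns whether a request may
// be sent right now.
class TokenBucket {
  constructor(capacity = 10, refillPerSec = 10) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryAcquire() {
    // Refill proportionally to the time elapsed since the last check
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSec,
    );
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // send the request
    }
    return false; // back off locally instead of triggering a 429
  }
}
```

Shaping traffic locally this way keeps bursts under the server's ceiling, so real-time features degrade to brief local waits rather than 429 responses.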
Monitoring Usage
Track your API usage:
curl -X GET "https://cloud.milady.ai/api/quotas/usage" \
  -H "Authorization: Bearer YOUR_API_KEY"

{
  "period": "current_minute",
  "usage": {
    "chat": { "used": 45, "limit": 200 },
    "embeddings": { "used": 12, "limit": 60 }
  }
}