Rate Limiting

Enforcing per-client request quotas to prevent abuse and manage costs.

Definition

Rate limiting restricts how many requests a client can make in a given time window. In API development, this is commonly enforced per API key, often with a sliding window counter held in a shared store such as Redis. A client that exceeds the limit receives HTTP 429 Too Many Requests, usually with a `Retry-After` header that says how many seconds to wait before retrying.
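
The sliding window counter mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any particular library: the class name `SlidingWindowCounter` and its `check` method are hypothetical, and a plain dictionary stands in for Redis so the example runs on its own. It estimates the request count over the last window by weighting the previous fixed window by how much of it still overlaps.

```python
import math
import time
from typing import Dict, Optional, Tuple


class SlidingWindowCounter:
    """In-memory sliding window counter (hypothetical stand-in for a Redis-backed one)."""

    def __init__(self, limit: int, window_seconds: int = 60) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # api_key -> {window_index: requests counted in that fixed window}
        self._counts: Dict[str, Dict[int, int]] = {}

    def check(self, api_key: str, now: Optional[float] = None) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) and count the request if allowed."""
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window_seconds) / self.window_seconds

        counts = self._counts.setdefault(api_key, {})
        # Only the current and previous windows matter; drop anything older.
        for old in [w for w in counts if w < window - 1]:
            del counts[old]

        previous = counts.get(window - 1, 0)
        current = counts.get(window, 0)
        # Weight the previous window by how much of it still overlaps the
        # sliding window that ends at `now`.
        estimated = previous * (1 - elapsed) + current

        if estimated + 1 > self.limit:
            # Coarse hint only: seconds until the current fixed window ends.
            retry_after = math.ceil(self.window_seconds - now % self.window_seconds)
            return False, retry_after

        counts[window] = current + 1
        return True, 0
```

A caller would turn a rejected check into a 429 response and copy `retry_after` into the `Retry-After` header. The value is a rough hint and not a guarantee, because the weighted estimate can still sit at the limit when the next window starts.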

Why it matters for AI APIs

Without rate limiting, a single misbehaving client can exhaust your LLM budget or overload your server and database. Rate limiting caps that exposure and keeps access fair across clients. For AI APIs specifically, a request typically triggers a paid LLM call, so a per-key limit puts an upper bound on what any one key can cost you.
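
To make the cost bound concrete, here is a small worked calculation. Everything in it is illustrative: the helper names are hypothetical, the limits of 60 per minute and 5000 per day are example values, and the price of 0.01 per request is an assumed figure, not a real rate from any provider.

```python
from decimal import Decimal


def max_daily_requests(per_minute: int, per_day: int) -> int:
    """Most requests one key can make in a day when both limits apply."""
    # The per-minute limit alone would allow per_minute * 1440 requests a day,
    # so whichever of the two limits is tighter is the one that binds.
    return min(per_day, per_minute * 24 * 60)


def max_daily_cost(per_minute: int, per_day: int, cost_per_request: Decimal) -> Decimal:
    """Worst-case daily spend for one key, given an assumed cost per request."""
    return max_daily_requests(per_minute, per_day) * cost_per_request


# Assumed, illustrative price per LLM-backed request.
worst_case = max_daily_cost(per_minute=60, per_day=5000, cost_per_request=Decimal("0.01"))
print(worst_case)
```

With these example numbers the daily limit is the binding one: 60 requests per minute would permit 86,400 requests a day, far more than 5000, so the daily cap is what bounds spend.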

In FastAPI AI Kit

The `@rate_limit(per_minute=60, per_day=5000)` decorator enforces limits via Redis. Counters are kept per API key. Each tier can have its own limits, configured in `settings.py`. The 429 response includes `Retry-After` and `X-RateLimit-Remaining` headers.
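
The sketch below shows the general shape of such a decorator. It is not the kit's implementation: this `rate_limit` is a hypothetical stand-in that uses simple fixed windows and an in-memory dictionary instead of Redis, adds an assumed `clock` parameter for testing, and returns a plain `(status, headers, body)` tuple so it runs without FastAPI installed.

```python
import functools
import math
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple

# (api_key, period_name, window_index) -> request count.
# In-memory stand-in for Redis; entries are never evicted in this sketch.
_COUNTS: Dict[Tuple[str, str, int], int] = defaultdict(int)


def rate_limit(per_minute: int, per_day: int, clock: Callable[[], float] = time.time):
    """Hypothetical decorator enforcing per-key minute and day quotas."""
    periods = (("minute", 60, per_minute), ("day", 86_400, per_day))

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(api_key: str, *args, **kwargs):
            now = clock()
            # Reject if any period's quota is already used up.
            for name, seconds, limit in periods:
                if _COUNTS[(api_key, name, int(now // seconds))] >= limit:
                    retry_after = math.ceil(seconds - now % seconds)
                    headers = {
                        "Retry-After": str(retry_after),
                        "X-RateLimit-Remaining": "0",
                    }
                    return 429, headers, {"detail": "Too Many Requests"}
            # Count the request against every period.
            remaining = []
            for name, seconds, limit in periods:
                key = (api_key, name, int(now // seconds))
                _COUNTS[key] += 1
                remaining.append(limit - _COUNTS[key])
            headers = {"X-RateLimit-Remaining": str(min(remaining))}
            return 200, headers, handler(api_key, *args, **kwargs)

        return wrapper

    return decorator


@rate_limit(per_minute=60, per_day=5000)
def generate(api_key: str, prompt: str) -> dict:
    # Placeholder for the LLM call that a real endpoint would make.
    return {"echo": prompt}
```

Here `X-RateLimit-Remaining` reports the tighter of the two remaining quotas, which is one reasonable choice among several. Per-tier limits could be modeled by looking up `per_minute` and `per_day` from configuration at decoration time.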

Related terms

API Key Management, Celery Worker