429 Too Many Requests

What is HTTP 429 Too Many Requests?

Explain Like I’m 3

Imagine you’re on a long car trip and you keep asking Mom “Are we there yet? Are we there yet? Are we there yet?” over and over really fast! Mom says “Sweetie, you’re asking too many times! Wait 5 minutes and then you can ask again.” That’s what 429 means - you asked the computer too many questions too fast, and it’s saying “Slow down! Wait a little bit before asking again!”

Example: Pressing the elevator button over and over doesn’t make the elevator come faster! The elevator is telling you “I heard you the first time, stop pressing!”

Explain Like I’m 5

Think about the playground slide with a rule: “Everyone can go down the slide 10 times every 10 minutes.” If you try to go down 15 times in 5 minutes, the playground monitor says “Whoa! Too many turns! You need to wait until the 10 minutes is up, then you can have 10 more turns.” This is fair because it gives everyone a chance to use the slide. That’s exactly what 429 Too Many Requests means - you’re using the service too much, too fast, and you need to wait before you can use it again!

Example: Like when you’re at the library and can only check out 5 books at a time. If you try to check out a 6th book, the librarian says “You’ve hit your limit! Return some books first, then you can check out more.”

Jr. Developer

429 Too Many Requests indicates you’ve exceeded the rate limit for an API or service. Rate limiting is a crucial technique to prevent abuse, ensure fair usage, and protect servers from being overwhelmed.

Why rate limiting exists:

  • Prevent abuse: Stop malicious actors from overwhelming the service (DoS protection)
  • Fair resource allocation: Ensure all users get equal access
  • Cost control: Limit expensive operations (database queries, external API calls)
  • Infrastructure protection: Prevent server overload and crashes
  • Business model: Enforce tiered pricing (free tier = 100 req/hour, paid = 10,000 req/hour)

Common rate limit schemes:

  • Per IP address: 100 requests per hour
  • Per API key: 1,000 requests per day
  • Per user account: 10,000 requests per month
  • Per endpoint: 10 requests per minute (for expensive operations)

Key headers in 429 responses:

  • Retry-After: How long to wait before retrying (seconds or HTTP date)
  • X-RateLimit-Limit: Maximum requests allowed in window
  • X-RateLimit-Remaining: Requests left in current window
  • X-RateLimit-Reset: When the limit resets (Unix timestamp)
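
For example, a rate-limited response carrying these headers might look like the following (all values here are illustrative):

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000060

{"error": "Too Many Requests", "message": "Rate limit exceeded. Try again in 60 seconds."}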

How to handle 429 as a client:

  1. Check for Retry-After header
  2. Wait the specified time before retrying
  3. Implement exponential backoff if no Retry-After header
  4. Monitor rate limit headers to avoid hitting limits
  5. Cache responses when possible to reduce requests

Code Example

// Express.js: Simple rate limiting middleware
const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

// Create rate limiter: 100 requests per 15 minutes
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // Max 100 requests per window
  standardHeaders: true, // Return RateLimit-* headers
  legacyHeaders: false, // Disable X-RateLimit-* headers
  handler: (req, res) => {
    res.status(429).json({
      error: 'Too Many Requests',
      message: 'Rate limit exceeded. Try again later.',
      // Unix timestamp when the window resets
      retry_after: Math.ceil(req.rateLimit.resetTime.getTime() / 1000)
    });
  }
});

// Apply to all requests
app.use(limiter);

// Different limit for expensive endpoint
const strictLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 5, // Max 5 requests per minute
  message: 'This endpoint is rate-limited to 5 requests per minute'
});

app.post('/api/search', strictLimiter, async (req, res) => {
  // Expensive search operation (performSearch is the application's own implementation)
  const results = await performSearch(req.body.query);
  res.json(results);
});

// Client-side: Handling 429 with retry
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    // Rate limited - check Retry-After header
    const retryAfter = response.headers.get('Retry-After');
    const delay = retryAfter
      ? parseInt(retryAfter, 10) * 1000 // Header value is in seconds
      : Math.pow(2, i) * 1000; // Exponential backoff
    console.log(`Rate limited. Retrying after ${delay}ms`);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  throw new Error('Max retries exceeded');
}

Crash Course

HTTP 429 Too Many Requests (RFC 6585 §4) signals that the client has exceeded the rate limit for requests. This status code is fundamental to API stability, security, and business model enforcement.

RFC 6585 Definition: “The 429 status code indicates that the user has sent too many requests in a given amount of time (‘rate limiting’). The response representations SHOULD include details explaining the condition, and MAY include a Retry-After header indicating how long to wait before making a new request.”

Rate Limiting Algorithms:

  1. Token Bucket

    • Tokens added to bucket at fixed rate (refill rate)
    • Each request consumes a token
    • Allows bursts up to bucket capacity
    • Commonly used (e.g., AWS API Gateway)
  2. Leaky Bucket

    • Processes requests at constant rate
    • Smooths bursts, enforces steady output
    • Requests overflow if bucket full
    • Good for strict rate enforcement
  3. Fixed Window

    • Counter resets at fixed intervals (every hour, etc.)
    • Simple but has edge case: 2x requests at window boundary
    • Example: 100 req/hour, user sends 100 at 12:59 and 100 at 13:01 = 200 requests in 2 minutes (see the sketch after this list)
  4. Sliding Window

    • Tracks requests in rolling time window
    • More accurate than fixed window
    • Higher memory/computation cost
    • Two variants: sliding window log (precise) and sliding window counter (approximate)
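
To make the fixed window boundary problem concrete, here is a minimal in-memory fixed window counter in JavaScript; the class and field names are illustrative rather than taken from any particular library:

// Fixed window counter: O(1) memory per key, but the count resets abruptly
class FixedWindowLimiter {
  constructor(max, windowMs) {
    this.max = max; // Max requests per window
    this.windowMs = windowMs; // Window length in milliseconds
    this.counters = new Map(); // key -> { windowStart, count }
  }

  allow(key) {
    const now = Date.now();
    // Align windows to fixed boundaries (e.g., the top of each hour)
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // A new window starts with a fresh counter - this reset is exactly
      // why a burst just before and just after a boundary can total 2x the limit
      this.counters.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count < this.max) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}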

Rate Limit Granularity:

  • IP-based: Simple but problematic (NAT, shared IPs, VPNs)
  • API key-based: Most common for authenticated APIs
  • User-based: For user accounts
  • Endpoint-based: Different limits per endpoint
  • Hybrid: Combination (e.g., per-user-per-endpoint)
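
In practice, granularity comes down to how the rate limit key is derived. A sketch of key construction for each option (the helper name and request fields are assumptions for illustration):

// Derive the rate limit key from the chosen granularity
function rateLimitKey(req, granularity) {
  switch (granularity) {
    case 'ip':
      return `ratelimit:ip:${req.ip}`;
    case 'api-key':
      return `ratelimit:key:${req.headers['x-api-key']}`;
    case 'user':
      return `ratelimit:user:${req.user.id}`;
    case 'endpoint':
      return `ratelimit:endpoint:${req.method}:${req.path}`;
    case 'user-endpoint': // Hybrid: per-user-per-endpoint
      return `ratelimit:${req.user.id}:${req.method}:${req.path}`;
    default:
      return `ratelimit:ip:${req.ip}`;
  }
}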

Response Headers:

Standardized headers (IETF Draft):

  • RateLimit-Limit: Request quota in time window
  • RateLimit-Remaining: Remaining requests
  • RateLimit-Reset: Seconds until quota resets

Legacy headers (still common):

  • X-RateLimit-Limit
  • X-RateLimit-Remaining
  • X-RateLimit-Reset
  • Retry-After: When the client may retry (delay in seconds or an HTTP date)

Code Example

// Token bucket rate limiter with Redis
const Redis = require('ioredis');
const redis = new Redis();

class TokenBucketLimiter {
  constructor(options) {
    this.capacity = options.capacity; // Max tokens
    this.refillRate = options.refillRate; // Tokens per second
    this.redis = options.redis;
  }

  async consume(key, tokens = 1) {
    const now = Date.now() / 1000; // Current time in seconds
    const bucketKey = `ratelimit:${key}`;
    // Lua script for atomic token bucket operation
    const script = `
      local capacity = tonumber(ARGV[1])
      local refill_rate = tonumber(ARGV[2])
      local tokens_requested = tonumber(ARGV[3])
      local now = tonumber(ARGV[4])

      local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill')
      local tokens = tonumber(bucket[1]) or capacity
      local last_refill = tonumber(bucket[2]) or now

      -- Calculate tokens to add based on time elapsed
      local elapsed = now - last_refill
      local tokens_to_add = elapsed * refill_rate
      tokens = math.min(capacity, tokens + tokens_to_add)

      local allowed = 0
      local retry_after = 0
      if tokens >= tokens_requested then
        tokens = tokens - tokens_requested
        allowed = 1
      else
        -- Calculate when enough tokens will be available
        retry_after = math.ceil((tokens_requested - tokens) / refill_rate)
      end

      -- Update bucket state
      redis.call('HMSET', KEYS[1], 'tokens', tokens, 'last_refill', now)
      redis.call('EXPIRE', KEYS[1], 3600) -- Expire after 1 hour of inactivity
      return {allowed, tokens, retry_after}
    `;
    const result = await this.redis.eval(
      script,
      1,
      bucketKey,
      this.capacity,
      this.refillRate,
      tokens,
      now
    );
    return {
      allowed: result[0] === 1,
      remaining: Math.floor(result[1]),
      retryAfter: result[2]
    };
  }
}

const limiter = new TokenBucketLimiter({
  capacity: 100, // 100 tokens
  refillRate: 10, // 10 tokens per second
  redis
});

// Express middleware
app.use(async (req, res, next) => {
  const key = req.user?.id || req.ip;
  const result = await limiter.consume(key);
  // Add rate limit headers to all responses
  res.set('RateLimit-Limit', String(limiter.capacity));
  res.set('RateLimit-Remaining', String(result.remaining));
  // Seconds until the bucket refills to capacity
  res.set('RateLimit-Reset', String(Math.ceil((limiter.capacity - result.remaining) / limiter.refillRate)));
  if (!result.allowed) {
    return res.status(429)
      .set('Retry-After', String(result.retryAfter))
      .json({
        error: 'Too Many Requests',
        message: 'Rate limit exceeded. Try again later.',
        retry_after: result.retryAfter
      });
  }
  next();
});

Deep Dive

HTTP 429 Too Many Requests represents a critical control mechanism in modern API design, balancing resource protection, fair usage, and business model enforcement. Effective rate limiting requires careful consideration of algorithm choice, distributed consistency, and client interaction patterns.

Algorithm Comparison and Trade-offs:

Token Bucket:

  • Advantages: Allows controlled bursts, flexible (separate token rate and bucket size), widely understood
  • Disadvantages: Can allow sudden traffic spikes, requires state storage
  • Use case: APIs with variable workloads, need to allow bursts
  • Memory: O(1) per key (just token count and timestamp)
  • Accuracy: Exact

Leaky Bucket:

  • Advantages: Smooth rate enforcement, prevents bursts, simple conceptual model
  • Disadvantages: Doesn’t allow legitimate bursts, can increase latency (queue delay)
  • Use case: Services requiring smooth, predictable load
  • Memory: O(n) if queuing requests
  • Accuracy: Exact
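
A minimal in-memory sketch of the leaky bucket used as a meter: the "water level" drains at a constant rate, and a request is admitted only if it fits in the bucket (class and field names are illustrative):

// Leaky bucket meter: admits requests only while the bucket has room,
// draining at a constant rate to enforce smooth output
class LeakyBucketLimiter {
  constructor(capacity, leakRatePerSec) {
    this.capacity = capacity; // Max outstanding requests
    this.leakRatePerSec = leakRatePerSec; // Constant drain rate
    this.level = 0; // Current water level
    this.lastLeak = Date.now();
  }

  allow() {
    const now = Date.now();
    // Drain the bucket for the time elapsed since the last check
    const leaked = ((now - this.lastLeak) / 1000) * this.leakRatePerSec;
    this.level = Math.max(0, this.level - leaked);
    this.lastLeak = now;
    if (this.level + 1 <= this.capacity) {
      this.level += 1; // Admit the request
      return true;
    }
    return false; // Bucket full - the request overflows
  }
}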

Fixed Window:

  • Advantages: Extremely simple, minimal memory, easy to implement
  • Disadvantages: Boundary problem (2x requests at window edge), less accurate
  • Use case: Simple rate limiting where precision not critical
  • Memory: O(1) per key (counter + window start time)
  • Accuracy: Approximate (worst case 2x limit at boundaries)

Sliding Window Log:

  • Advantages: Precise, no boundary issues, accurate enforcement
  • Disadvantages: High memory cost (stores timestamp per request), expensive to compute
  • Use case: Critical systems requiring precise rate limiting
  • Memory: O(limit) per key (stores all recent request timestamps)
  • Accuracy: Exact

Sliding Window Counter:

  • Advantages: Balance of accuracy and efficiency, smooth rolling window
  • Disadvantages: Slightly approximate, more complex than fixed window
  • Use case: Production APIs balancing accuracy and performance
  • Memory: O(1) per key (current + previous window counters)
  • Accuracy: Weighted approximation (typically within 0.003% of exact)
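
The weighted approximation blends the previous and current fixed window counts. A minimal sketch of the estimate (function and variable names are illustrative):

// Sliding window counter: weight the previous window's count by how much
// of it still overlaps the rolling window
function estimatedCount(prevCount, currCount, windowMs, now = Date.now()) {
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const elapsedFraction = (now - windowStart) / windowMs; // 0..1 into current window
  return prevCount * (1 - elapsedFraction) + currCount;
}

// With a limit of 100, prev = 80, curr = 40:
// 30% into the window: 80 * 0.7 + 40 = 96  -> allowed
// 10% into the window: 80 * 0.9 + 40 = 112 -> rate limited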

Distributed Rate Limiting Challenges:

In distributed systems with multiple servers, rate limiting faces consistency challenges:

  1. Race Conditions: Multiple servers checking limit simultaneously
  2. Synchronization Overhead: Coordinating across servers adds latency
  3. Network Partitions: Split-brain scenarios may allow limit exceedance
  4. Clock Skew: Different server times affect window boundary calculations

Code Example

// Production-grade distributed rate limiting system
const Redis = require('ioredis');

// Distributed sliding-window limiter with Redis and Lua
class DistributedRateLimiter {
  constructor(options) {
    this.redis = options.redis;
    this.fallbackToLocal = options.fallbackToLocal !== false;
    this.localCache = new Map();
    this.cacheTTL = options.cacheTTL || 1000; // 1 second local cache
  }

  // Multi-tier rate limiting: different limits for different user tiers
  async checkLimit(userId, tier = 'free') {
    const limits = this.getTierLimits(tier);
    // Check each limit type (per-second, per-minute, per-hour)
    for (const limit of limits) {
      const result = await this.checkSingleLimit(
        userId,
        limit.window,
        limit.max,
        limit.name
      );
      if (!result.allowed) {
        return result;
      }
    }
    return { allowed: true, limits };
  }

  getTierLimits(tier) {
    const tiers = {
      free: [
        { name: 'per_second', window: 1, max: 2 },
        { name: 'per_minute', window: 60, max: 100 },
        { name: 'per_hour', window: 3600, max: 1000 }
      ],
      basic: [
        { name: 'per_second', window: 1, max: 10 },
        { name: 'per_minute', window: 60, max: 500 },
        { name: 'per_hour', window: 3600, max: 10000 }
      ],
      pro: [
        { name: 'per_second', window: 1, max: 100 },
        { name: 'per_minute', window: 60, max: 5000 },
        { name: 'per_hour', window: 3600, max: 100000 }
      ]
    };
    return tiers[tier] || tiers.free;
  }

  async checkSingleLimit(userId, window, max, limitName) {
    const key = `ratelimit:${userId}:${limitName}`;
    // Try local cache first (avoids Redis roundtrip)
    const cached = this.getFromLocalCache(key);
    if (cached && cached.allowed === false) {
      return cached; // Definitely rate limited
    }
    try {
      // Sliding window log algorithm with a Redis sorted set
      const now = Date.now();
      const windowStart = now - (window * 1000);
      // Lua script for atomic sliding window check
      const script = `
        local key = KEYS[1]
        local window_start = tonumber(ARGV[1])
        local now = tonumber(ARGV[2])
        local max_requests = tonumber(ARGV[3])
        local window_size = tonumber(ARGV[4])
        -- Remove old entries
        redis.call('ZREMRANGEBYSCORE', key, 0, window_start)
        -- Count current requests
        local current = redis.call('ZCARD', key)
        if current < max_requests then
          -- Add this request
          redis.call('ZADD', key, now, now .. '-' .. math.random())
          redis.call('EXPIRE', key, window_size * 2)
          return {1, current + 1, max_requests - current - 1, 0}
        else
          -- Rate limited - retry once the oldest entry ages out of the window
          local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
          local retry_after = math.ceil((tonumber(oldest[2]) + window_size * 1000 - now) / 1000)
          return {0, current, 0, retry_after}
        end
      `;
      const result = await this.redis.eval(
        script, 1, key, windowStart, now, max, window
      );
      const status = {
        allowed: result[0] === 1,
        remaining: result[2],
        retryAfter: result[3],
        limitName
      };
      this.setLocalCache(key, status);
      return status;
    } catch (err) {
      // Redis unavailable - fail open rather than reject all traffic
      if (this.fallbackToLocal) {
        return { allowed: true, degraded: true, limitName };
      }
      throw err;
    }
  }

  getFromLocalCache(key) {
    const entry = this.localCache.get(key);
    if (!entry || Date.now() - entry.storedAt > this.cacheTTL) {
      return null;
    }
    return entry.status;
  }

  setLocalCache(key, status) {
    this.localCache.set(key, { status, storedAt: Date.now() });
  }
}

Frequently Asked Questions

What's the difference between 429 Too Many Requests and 503 Service Unavailable?

429 indicates the client has exceeded their rate limit quota - it's the client's fault, and they should slow down and retry after the specified delay. 503 indicates the server is temporarily overloaded or under maintenance - it's the server's fault, not related to any specific user's quota. Use 429 for quota enforcement, 503 for server capacity issues.

What should I include in 429 responses?

Include: (1) Retry-After header (seconds or HTTP date), (2) Rate limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset), (3) Clear error message explaining the limit, (4) Link to rate limit documentation, (5) For freemium APIs: link to pricing/upgrade page. The Retry-After header is most critical for client retry logic.

How should clients handle 429 responses?

Best practices: (1) Check Retry-After header and wait that long before retrying, (2) If no Retry-After, use exponential backoff with jitter, (3) Monitor RateLimit-Remaining header proactively to slow down before hitting limit, (4) Implement client-side request queuing, (5) Cache responses aggressively to reduce requests, (6) Don't retry immediately - that makes the problem worse.

What rate limiting algorithm should I use?

Depends on your needs: (1) Token bucket: Best for APIs needing burst tolerance (most common choice), (2) Sliding window counter: Best balance of accuracy and performance for production, (3) Fixed window: Simplest but has boundary issues, (4) Leaky bucket: Best for strictly smooth rate enforcement. For most APIs, start with token bucket or sliding window counter.

How do I implement rate limiting in distributed systems?

Options: (1) Centralized state with Redis (most accurate but adds latency), (2) Local limits on each server = total_limit / server_count (simple but less accurate), (3) Sticky sessions (route same user to same server), (4) Eventual consistency with gossip protocols (complex). Most production systems use Redis with fallback to local limiting if Redis fails.

Should rate limits be per IP, per API key, or per user?

Depends on authentication: (1) Public APIs without auth: per IP (watch for NAT, shared IPs), (2) Authenticated APIs: per API key or per user (more accurate), (3) Hybrid: IP limits for unauthenticated endpoints, user limits for authenticated, (4) Per-endpoint limits for expensive operations. Consider multiple layers (per-second, per-minute, per-hour) for better protection.

How do I choose appropriate rate limits?

Considerations: (1) Server capacity: what load can you handle?, (2) Business model: freemium tiers with different limits, (3) Abuse prevention: low enough to prevent DoS, (4) User experience: high enough for legitimate use, (5) Per-endpoint costs: expensive operations get stricter limits. Start conservative, monitor usage patterns, adjust based on data. Common pattern: 100 req/hour free, 10,000 req/hour paid.

Common Causes

  • Sending too many API requests in short time period
  • Client not implementing rate limit backoff or retry logic
  • Automated scripts or bots making rapid requests
  • Mobile app making requests in tight loop
  • Multiple browser tabs/windows making concurrent requests
  • Exceeding free tier quota on freemium API
  • Shared IP address (NAT, VPN) with multiple users hitting same limit
  • Web scraping or data extraction exceeding limits
  • DDoS attack or malicious traffic
  • Misconfigured client retrying failed requests too aggressively
  • Legitimate traffic spike beyond purchased tier limits
  • Polling an endpoint too frequently instead of using webhooks/SSE

Implementation Guidance

  • Check Retry-After header and wait the specified time before retrying
  • Implement exponential backoff: wait 1s, 2s, 4s, 8s, etc. between retries
  • Add random jitter to retry delays to prevent thundering herd (see the sketch after this list)
  • Monitor RateLimit-Remaining header proactively and slow down before hitting limit
  • Cache API responses to reduce number of requests
  • Batch multiple operations into single requests when API supports it
  • Implement client-side request queue/throttling
  • Use webhooks or Server-Sent Events instead of polling
  • Reduce polling frequency (e.g., poll every 60s instead of 5s)
  • Upgrade to higher tier/paid plan for increased rate limits
  • Distribute requests across multiple API keys if allowed
  • Review code for request loops or unintentional rapid requests
  • Consider using CDN or caching layer for frequently accessed data
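
A small sketch of exponential backoff with full jitter, combining the first three points above (the function name and cap are illustrative):

// Exponential backoff with full jitter: delay is uniform in [0, min(cap, base * 2^attempt)]
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.random() * exponential; // Jitter spreads out synchronized retries
}

// attempt 0 -> up to 1s, attempt 1 -> up to 2s, attempt 2 -> up to 4s, ...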
