429 Too Many Requests

What is HTTP 429 Too Many Requests?

Explain Like I’m 3

Imagine you’re on a long car trip and you keep asking Mom “Are we there yet? Are we there yet? Are we there yet?” over and over really fast! Mom says “Sweetie, you’re asking too many times! Wait 5 minutes and then you can ask again.” That’s what 429 means - you asked the computer too many questions too fast, and it’s saying “Slow down! Wait a little bit before asking again!”

Example: Pressing the elevator button over and over doesn’t make the elevator come faster! The elevator is telling you “I heard you the first time, stop pressing!”

Explain Like I’m 5

Think about the playground slide with a rule: “Everyone can go down the slide 10 times every 10 minutes.” If you try to go down 15 times in 5 minutes, the playground monitor says “Whoa! Too many turns! You need to wait until the 10 minutes is up, then you can have 10 more turns.” This is fair because it gives everyone a chance to use the slide. That’s exactly what 429 Too Many Requests means - you’re using the service too much, too fast, and you need to wait before you can use it again!

Example: Like when you’re at the library and can only check out 5 books at a time. If you try to check out a 6th book, the librarian says “You’ve hit your limit! Return some books first, then you can check out more.”

Jr. Developer

429 Too Many Requests indicates you’ve exceeded the rate limit for an API or service. Rate limiting is a crucial technique to prevent abuse, ensure fair usage, and protect servers from being overwhelmed.

Why rate limiting exists:

  • Prevent abuse: Stop malicious actors from overwhelming the service (DoS protection)
  • Fair resource allocation: Ensure all users get equal access
  • Cost control: Limit expensive operations (database queries, external API calls)
  • Infrastructure protection: Prevent server overload and crashes
  • Business model: Enforce tiered pricing (free tier = 100 req/hour, paid = 10,000 req/hour)

Common rate limit schemes:

  • Per IP address: 100 requests per hour
  • Per API key: 1,000 requests per day
  • Per user account: 10,000 requests per month
  • Per endpoint: 10 requests per minute (for expensive operations)

Key headers in 429 responses:

  • Retry-After: How long to wait before retrying (seconds or HTTP date)
  • X-RateLimit-Limit: Maximum requests allowed in window
  • X-RateLimit-Remaining: Requests left in current window
  • X-RateLimit-Reset: When the limit resets (Unix timestamp)
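
For example, a rate-limited response carrying these headers might look like the following (all values here are illustrative):

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000060

{"error": "Too Many Requests", "message": "Rate limit exceeded. Try again in 60 seconds."}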

How to handle 429 as a client:

  1. Check for Retry-After header
  2. Wait the specified time before retrying
  3. Implement exponential backoff if no Retry-After header
  4. Monitor rate limit headers to avoid hitting limits
  5. Cache responses when possible to reduce requests

Code Example

// Express.js: Simple rate limiting middleware
const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

// Create rate limiter: 100 requests per 15 minutes
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // Max 100 requests per window
  standardHeaders: true, // Return RateLimit-* headers
  legacyHeaders: false, // Disable X-RateLimit-* headers
  handler: (req, res) => {
    res.status(429).json({
      error: 'Too Many Requests',
      message: 'Rate limit exceeded. Try again later.',
      // Unix timestamp when the window resets
      retry_after: Math.ceil(req.rateLimit.resetTime.getTime() / 1000)
    });
  }
});

// Apply to all requests
app.use(limiter);

// Different limit for expensive endpoint
const strictLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 5, // Max 5 requests per minute
  message: 'This endpoint is rate-limited to 5 requests per minute'
});

app.post('/api/search', strictLimiter, async (req, res) => {
  // Expensive search operation (performSearch is the application's own implementation)
  const results = await performSearch(req.body.query);
  res.json(results);
});

// Client-side: Handling 429 with retry
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    // Rate limited - check Retry-After header
    const retryAfter = response.headers.get('Retry-After');
    const delay = retryAfter
      ? parseInt(retryAfter, 10) * 1000 // Header value is in seconds
      : Math.pow(2, i) * 1000; // Exponential backoff
    console.log(`Rate limited. Retrying after ${delay}ms`);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  throw new Error('Max retries exceeded');
}

Crash Course

HTTP 429 Too Many Requests (RFC 6585 §4) signals that the client has exceeded the rate limit for requests. This status code is fundamental to API stability, security, and business model enforcement.

RFC 6585 Definition: “The 429 status code indicates that the user has sent too many requests in a given amount of time (‘rate limiting’). The response representations SHOULD include details explaining the condition, and MAY include a Retry-After header indicating how long to wait before making a new request.”

Rate Limiting Algorithms:

  1. Token Bucket

    • Tokens added to bucket at fixed rate (refill rate)
    • Each request consumes a token
    • Allows bursts up to bucket capacity
    • Commonly used (e.g., AWS API Gateway)
  2. Leaky Bucket

    • Processes requests at constant rate
    • Smooths bursts, enforces steady output
    • Requests overflow if bucket full
    • Good for strict rate enforcement
  3. Fixed Window

    • Counter resets at fixed intervals (every hour, etc.)
    • Simple but has edge case: 2x requests at window boundary
    • Example: 100 req/hour, user sends 100 at 12:59 and 100 at 13:01 = 200 requests in 2 minutes (see the sketch after this list)
  4. Sliding Window

    • Tracks requests in rolling time window
    • More accurate than fixed window
    • Higher memory/computation cost
    • Two variants: sliding window log (precise) and sliding window counter (approximate)
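
To make the fixed window boundary problem concrete, here is a minimal in-memory fixed window counter in JavaScript; the class and field names are illustrative rather than taken from any particular library:

// Fixed window counter: O(1) memory per key, but the count resets abruptly
class FixedWindowLimiter {
  constructor(max, windowMs) {
    this.max = max; // Max requests per window
    this.windowMs = windowMs; // Window length in milliseconds
    this.counters = new Map(); // key -> { windowStart, count }
  }

  allow(key) {
    const now = Date.now();
    // Align windows to fixed boundaries (e.g., the top of each hour)
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // A new window starts with a fresh counter - this reset is exactly
      // why a burst just before and just after a boundary can total 2x the limit
      this.counters.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count < this.max) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}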

Rate Limit Granularity:

  • IP-based: Simple but problematic (NAT, shared IPs, VPNs)
  • API key-based: Most common for authenticated APIs
  • User-based: For user accounts
  • Endpoint-based: Different limits per endpoint
  • Hybrid: Combination (e.g., per-user-per-endpoint)
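
In practice, granularity comes down to how the rate limit key is derived. A sketch of key construction for each option (the helper name and request fields are assumptions for illustration):

// Derive the rate limit key from the chosen granularity
function rateLimitKey(req, granularity) {
  switch (granularity) {
    case 'ip':
      return `ratelimit:ip:${req.ip}`;
    case 'api-key':
      return `ratelimit:key:${req.headers['x-api-key']}`;
    case 'user':
      return `ratelimit:user:${req.user.id}`;
    case 'endpoint':
      return `ratelimit:endpoint:${req.method}:${req.path}`;
    case 'user-endpoint': // Hybrid: per-user-per-endpoint
      return `ratelimit:${req.user.id}:${req.method}:${req.path}`;
    default:
      return `ratelimit:ip:${req.ip}`;
  }
}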

Response Headers:

Standardized headers (IETF Draft):

  • RateLimit-Limit: Request quota in time window
  • RateLimit-Remaining: Remaining requests
  • RateLimit-Reset: Seconds until quota resets

Legacy headers (still common):

  • X-RateLimit-Limit
  • X-RateLimit-Remaining
  • X-RateLimit-Reset
  • Retry-After: When the client may retry (delay in seconds or an HTTP date)

Code Example

// Token bucket rate limiter with Redis
const Redis = require('ioredis');
const redis = new Redis();

class TokenBucketLimiter {
  constructor(options) {
    this.capacity = options.capacity; // Max tokens
    this.refillRate = options.refillRate; // Tokens per second
    this.redis = options.redis;
  }

  async consume(key, tokens = 1) {
    const now = Date.now() / 1000; // Current time in seconds
    const bucketKey = `ratelimit:${key}`;
    // Lua script for atomic token bucket operation
    const script = `
      local capacity = tonumber(ARGV[1])
      local refill_rate = tonumber(ARGV[2])
      local tokens_requested = tonumber(ARGV[3])
      local now = tonumber(ARGV[4])

      local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill')
      local tokens = tonumber(bucket[1]) or capacity
      local last_refill = tonumber(bucket[2]) or now

      -- Calculate tokens to add based on time elapsed
      local elapsed = now - last_refill
      local tokens_to_add = elapsed * refill_rate
      tokens = math.min(capacity, tokens + tokens_to_add)

      local allowed = 0
      local retry_after = 0
      if tokens >= tokens_requested then
        tokens = tokens - tokens_requested
        allowed = 1
      else
        -- Calculate when enough tokens will be available
        retry_after = math.ceil((tokens_requested - tokens) / refill_rate)
      end

      -- Update bucket state
      redis.call('HMSET', KEYS[1], 'tokens', tokens, 'last_refill', now)
      redis.call('EXPIRE', KEYS[1], 3600) -- Expire after 1 hour of inactivity
      return {allowed, tokens, retry_after}
    `;
    const result = await this.redis.eval(
      script,
      1,
      bucketKey,
      this.capacity,
      this.refillRate,
      tokens,
      now
    );
    return {
      allowed: result[0] === 1,
      remaining: Math.floor(result[1]),
      retryAfter: result[2]
    };
  }
}

const limiter = new TokenBucketLimiter({
  capacity: 100, // 100 tokens
  refillRate: 10, // 10 tokens per second
  redis
});

// Express middleware
app.use(async (req, res, next) => {
  const key = req.user?.id || req.ip;
  const result = await limiter.consume(key);
  // Add rate limit headers to all responses
  res.set('RateLimit-Limit', String(limiter.capacity));
  res.set('RateLimit-Remaining', String(result.remaining));
  // Seconds until the bucket refills to capacity
  res.set('RateLimit-Reset', String(Math.ceil((limiter.capacity - result.remaining) / limiter.refillRate)));
  if (!result.allowed) {
    return res.status(429)
      .set('Retry-After', String(result.retryAfter))
      .json({
        error: 'Too Many Requests',
        message: 'Rate limit exceeded. Try again later.',
        retry_after: result.retryAfter
      });
  }
  next();
});

Deep Dive

HTTP 429 Too Many Requests represents a critical control mechanism in modern API design, balancing resource protection, fair usage, and business model enforcement. Effective rate limiting requires careful consideration of algorithm choice, distributed consistency, and client interaction patterns.

Algorithm Comparison and Trade-offs:

Token Bucket:

  • Advantages: Allows controlled bursts, flexible (separate token rate and bucket size), widely understood
  • Disadvantages: Can allow sudden traffic spikes, requires state storage
  • Use case: APIs with variable workloads, need to allow bursts
  • Memory: O(1) per key (just token count and timestamp)
  • Accuracy: Exact

Leaky Bucket:

  • Advantages: Smooth rate enforcement, prevents bursts, simple conceptual model
  • Disadvantages: Doesn’t allow legitimate bursts, can increase latency (queue delay)
  • Use case: Services requiring smooth, predictable load
  • Memory: O(n) if queuing requests
  • Accuracy: Exact
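
A minimal in-memory sketch of the leaky bucket used as a meter: the "water level" drains at a constant rate, and a request is admitted only if it fits in the bucket (class and field names are illustrative):

// Leaky bucket meter: admits requests only while the bucket has room,
// draining at a constant rate to enforce smooth output
class LeakyBucketLimiter {
  constructor(capacity, leakRatePerSec) {
    this.capacity = capacity; // Max outstanding requests
    this.leakRatePerSec = leakRatePerSec; // Constant drain rate
    this.level = 0; // Current water level
    this.lastLeak = Date.now();
  }

  allow() {
    const now = Date.now();
    // Drain the bucket for the time elapsed since the last check
    const leaked = ((now - this.lastLeak) / 1000) * this.leakRatePerSec;
    this.level = Math.max(0, this.level - leaked);
    this.lastLeak = now;
    if (this.level + 1 <= this.capacity) {
      this.level += 1; // Admit the request
      return true;
    }
    return false; // Bucket full - the request overflows
  }
}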

Fixed Window:

  • Advantages: Extremely simple, minimal memory, easy to implement
  • Disadvantages: Boundary problem (2x requests at window edge), less accurate
  • Use case: Simple rate limiting where precision not critical
  • Memory: O(1) per key (counter + window start time)
  • Accuracy: Approximate (worst case 2x limit at boundaries)

Sliding Window Log:

  • Advantages: Precise, no boundary issues, accurate enforcement
  • Disadvantages: High memory cost (stores timestamp per request), expensive to compute
  • Use case: Critical systems requiring precise rate limiting
  • Memory: O(limit) per key (stores all recent request timestamps)
  • Accuracy: Exact

Sliding Window Counter:

  • Advantages: Balance of accuracy and efficiency, smooth rolling window
  • Disadvantages: Slightly approximate, more complex than fixed window
  • Use case: Production APIs balancing accuracy and performance
  • Memory: O(1) per key (current + previous window counters)
  • Accuracy: Weighted approximation (typically within 0.003% of exact)
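
The weighted approximation blends the previous and current fixed window counts. A minimal sketch of the estimate (function and variable names are illustrative):

// Sliding window counter: weight the previous window's count by how much
// of it still overlaps the rolling window
function estimatedCount(prevCount, currCount, windowMs, now = Date.now()) {
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const elapsedFraction = (now - windowStart) / windowMs; // 0..1 into current window
  return prevCount * (1 - elapsedFraction) + currCount;
}

// With a limit of 100, prev = 80, curr = 40:
// 30% into the window: 80 * 0.7 + 40 = 96  -> allowed
// 10% into the window: 80 * 0.9 + 40 = 112 -> rate limited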

Distributed Rate Limiting Challenges:

In distributed systems with multiple servers, rate limiting faces consistency challenges:

  1. Race Conditions: Multiple servers checking limit simultaneously
  2. Synchronization Overhead: Coordinating across servers adds latency
  3. Network Partitions: Split-brain scenarios may allow limit exceedance
  4. Clock Skew: Different server times affect window boundary calculations

Code Example

// Production-grade distributed rate limiting system
const Redis = require('ioredis');

// Distributed sliding-window limiter with Redis and Lua
class DistributedRateLimiter {
  constructor(options) {
    this.redis = options.redis;
    this.fallbackToLocal = options.fallbackToLocal !== false;
    this.localCache = new Map();
    this.cacheTTL = options.cacheTTL || 1000; // 1 second local cache
  }

  // Multi-tier rate limiting: different limits for different user tiers
  async checkLimit(userId, tier = 'free') {
    const limits = this.getTierLimits(tier);
    // Check each limit type (per-second, per-minute, per-hour)
    for (const limit of limits) {
      const result = await this.checkSingleLimit(
        userId,
        limit.window,
        limit.max,
        limit.name
      );
      if (!result.allowed) {
        return result;
      }
    }
    return { allowed: true, limits };
  }

  getTierLimits(tier) {
    const tiers = {
      free: [
        { name: 'per_second', window: 1, max: 2 },
        { name: 'per_minute', window: 60, max: 100 },
        { name: 'per_hour', window: 3600, max: 1000 }
      ],
      basic: [
        { name: 'per_second', window: 1, max: 10 },
        { name: 'per_minute', window: 60, max: 500 },
        { name: 'per_hour', window: 3600, max: 10000 }
      ],
      pro: [
        { name: 'per_second', window: 1, max: 100 },
        { name: 'per_minute', window: 60, max: 5000 },
        { name: 'per_hour', window: 3600, max: 100000 }
      ]
    };
    return tiers[tier] || tiers.free;
  }

  async checkSingleLimit(userId, window, max, limitName) {
    const key = `ratelimit:${userId}:${limitName}`;
    // Try local cache first (avoids Redis roundtrip)
    const cached = this.getFromLocalCache(key);
    if (cached && cached.allowed === false) {
      return cached; // Definitely rate limited
    }
    try {
      // Sliding window log algorithm with a Redis sorted set
      const now = Date.now();
      const windowStart = now - (window * 1000);
      // Lua script for atomic sliding window check
      const script = `
        local key = KEYS[1]
        local window_start = tonumber(ARGV[1])
        local now = tonumber(ARGV[2])
        local max_requests = tonumber(ARGV[3])
        local window_size = tonumber(ARGV[4])
        -- Remove old entries
        redis.call('ZREMRANGEBYSCORE', key, 0, window_start)
        -- Count current requests
        local current = redis.call('ZCARD', key)
        if current < max_requests then
          -- Add this request
          redis.call('ZADD', key, now, now .. '-' .. math.random())
          redis.call('EXPIRE', key, window_size * 2)
          return {1, current + 1, max_requests - current - 1, 0}
        else
          -- Rate limited - retry once the oldest entry ages out of the window
          local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
          local retry_after = math.ceil((tonumber(oldest[2]) + window_size * 1000 - now) / 1000)
          return {0, current, 0, retry_after}
        end
      `;
      const result = await this.redis.eval(
        script, 1, key, windowStart, now, max, window
      );
      const status = {
        allowed: result[0] === 1,
        remaining: result[2],
        retryAfter: result[3],
        limitName
      };
      this.setLocalCache(key, status);
      return status;
    } catch (err) {
      // Redis unavailable - fail open rather than reject all traffic
      if (this.fallbackToLocal) {
        return { allowed: true, degraded: true, limitName };
      }
      throw err;
    }
  }

  getFromLocalCache(key) {
    const entry = this.localCache.get(key);
    if (!entry || Date.now() - entry.storedAt > this.cacheTTL) {
      return null;
    }
    return entry.status;
  }

  setLocalCache(key, status) {
    this.localCache.set(key, { status, storedAt: Date.now() });
  }
}

Frequently Asked Questions

What's the difference between 429 Too Many Requests and 503 Service Unavailable?

429 indicates the client has exceeded their rate limit quota - it's the client's fault, and they should slow down and retry after the specified delay. 503 indicates the server is temporarily overloaded or under maintenance - it's the server's fault, not related to any specific user's quota. Use 429 for quota enforcement, 503 for server capacity issues.

What should I include in 429 responses?

Include: (1) Retry-After header (seconds or HTTP date), (2) Rate limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset), (3) Clear error message explaining the limit, (4) Link to rate limit documentation, (5) For freemium APIs: link to pricing/upgrade page. The Retry-After header is most critical for client retry logic.

How should clients handle 429 responses?

Best practices: (1) Check Retry-After header and wait that long before retrying, (2) If no Retry-After, use exponential backoff with jitter, (3) Monitor RateLimit-Remaining header proactively to slow down before hitting limit, (4) Implement client-side request queuing, (5) Cache responses aggressively to reduce requests, (6) Don't retry immediately - that makes the problem worse.

What rate limiting algorithm should I use?

Depends on your needs: (1) Token bucket: Best for APIs needing burst tolerance (most common choice), (2) Sliding window counter: Best balance of accuracy and performance for production, (3) Fixed window: Simplest but has boundary issues, (4) Leaky bucket: Best for strictly smooth rate enforcement. For most APIs, start with token bucket or sliding window counter.

How do I implement rate limiting in distributed systems?

Options: (1) Centralized state with Redis (most accurate but adds latency), (2) Local limits on each server = total_limit / server_count (simple but less accurate), (3) Sticky sessions (route same user to same server), (4) Eventual consistency with gossip protocols (complex). Most production systems use Redis with fallback to local limiting if Redis fails.

Should rate limits be per IP, per API key, or per user?

Depends on authentication: (1) Public APIs without auth: per IP (watch for NAT, shared IPs), (2) Authenticated APIs: per API key or per user (more accurate), (3) Hybrid: IP limits for unauthenticated endpoints, user limits for authenticated, (4) Per-endpoint limits for expensive operations. Consider multiple layers (per-second, per-minute, per-hour) for better protection.

How do I choose appropriate rate limits?

Considerations: (1) Server capacity: what load can you handle?, (2) Business model: freemium tiers with different limits, (3) Abuse prevention: low enough to prevent DoS, (4) User experience: high enough for legitimate use, (5) Per-endpoint costs: expensive operations get stricter limits. Start conservative, monitor usage patterns, adjust based on data. Common pattern: 100 req/hour free, 10,000 req/hour paid.

Common Causes

  • Sending too many API requests in short time period
  • Client not implementing rate limit backoff or retry logic
  • Automated scripts or bots making rapid requests
  • Mobile app making requests in tight loop
  • Multiple browser tabs/windows making concurrent requests
  • Exceeding free tier quota on freemium API
  • Shared IP address (NAT, VPN) with multiple users hitting same limit
  • Web scraping or data extraction exceeding limits
  • DDoS attack or malicious traffic
  • Misconfigured client retrying failed requests too aggressively
  • Legitimate traffic spike beyond purchased tier limits
  • Polling an endpoint too frequently instead of using webhooks/SSE

Implementation Guidance

  • Check Retry-After header and wait the specified time before retrying
  • Implement exponential backoff: wait 1s, 2s, 4s, 8s, etc. between retries
  • Add random jitter to retry delays to prevent thundering herd (see the sketch after this list)
  • Monitor RateLimit-Remaining header proactively and slow down before hitting limit
  • Cache API responses to reduce number of requests
  • Batch multiple operations into single requests when API supports it
  • Implement client-side request queue/throttling
  • Use webhooks or Server-Sent Events instead of polling
  • Reduce polling frequency (e.g., poll every 60s instead of 5s)
  • Upgrade to higher tier/paid plan for increased rate limits
  • Distribute requests across multiple API keys if allowed
  • Review code for request loops or unintentional rapid requests
  • Consider using CDN or caching layer for frequently accessed data
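
A small sketch of exponential backoff with full jitter, combining the first three points above (the function name and cap are illustrative):

// Exponential backoff with full jitter: delay is uniform in [0, min(cap, base * 2^attempt)]
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.random() * exponential; // Jitter spreads out synchronized retries
}

// attempt 0 -> up to 1s, attempt 1 -> up to 2s, attempt 2 -> up to 4s, ...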
