504 Gateway Timeout
What is HTTP 504 Gateway Timeout?
Explain Like I’m 3
Imagine you ask your friend for a cookie, but your friend has to ask their parent first. If their parent takes too long to answer and your friend gives up waiting, that’s like a 504 error - the middle person (your friend) got tired of waiting for the answer from the person who actually has what you need.
Explain Like I’m 5
Think of it like this: You want to know the weather, so you ask your teacher. Your teacher doesn’t know, so they call the weather station to find out. But the weather station takes SO long to answer that your teacher eventually hangs up and says ‘Sorry, I couldn’t get the answer in time!’ That’s a 504 error - when the middle helper (like your teacher or a computer) waits too long for information from somewhere else and eventually gives up.
Example: When you try to watch a video on a website and see ‘Gateway Timeout,’ it means the website asked the video storage computer for your video, but that computer took too long to respond, so the website gave up waiting.
Jr. Developer
A 504 Gateway Timeout occurs when a server acting as a gateway or proxy doesn’t receive a timely response from an upstream server. This is different from a 503 (Service Unavailable) - the gateway itself is working fine, but it’s waiting on another server that’s not responding quickly enough.
Common scenarios:
- A reverse proxy (like Nginx) waiting for your application server (like Node.js/Python)
- A CDN waiting for your origin server
- An API gateway waiting for a backend microservice
- A load balancer waiting for one of your app servers
The key distinction: The gateway gave up waiting because of a configured timeout, not because it received a bad response (that would be 502).
Code Example
```javascript
// Express app behind Nginx reverse proxy
app.get('/slow-operation', async (req, res) => {
  // If this takes longer than nginx's proxy_read_timeout,
  // nginx will return 504 to the client
  const result = await database.complexQuery(); // Takes 65 seconds
  res.json(result);
});

// Nginx config (default proxy_read_timeout is 60s)
// location / {
//   proxy_pass http://localhost:3000;
//   proxy_read_timeout 60s; // If backend takes >60s, return 504
// }
```
Crash Course
A 504 Gateway Timeout indicates a timing issue in a multi-tier architecture. Per RFC 9110, this status code is returned when ‘the server, while acting as a gateway or proxy, did not receive a timely response from an upstream server.’
Critical distinctions:
- 502 Bad Gateway: Received a response, but it was invalid/malformed
- 503 Service Unavailable: The service itself is down/overloaded
- 504 Gateway Timeout: No response received within the timeout window
Common timeout chains:
```
Client → CDN (CloudFront) → Load Balancer (ALB) → Reverse Proxy (Nginx) → App Server (Node.js) → Database
         ↑ 60s timeout      ↑ 60s timeout         ↑ 60s timeout           ↑ 30s timeout          ↑ 5s timeout
```
If the database query takes 6 seconds, the app server times out. If the app server takes 70 seconds total, Nginx times out. Each layer must have a timeout longer than its downstream dependencies.
Nginx-specific timeout directives:
- proxy_connect_timeout: Time to establish a connection to the upstream (default: 60s)
- proxy_send_timeout: Time to send the request to the upstream (default: 60s)
- proxy_read_timeout: Time to read the response from the upstream (default: 60s)
- fastcgi_read_timeout: Time to read FastCGI responses, e.g. from PHP-FPM (default: 60s)
Common error log patterns:
- upstream timed out (110: Connection timed out) while reading response header from upstream → the backend didn't send response headers in time
- connect() timed out (110: Connection timed out) while connecting to upstream → couldn't establish a connection to the backend in time

Code Example
```javascript
// API Gateway with intelligent timeout handling
const axios = require('axios');

app.get('/api/data', async (req, res) => {
  // Gateway timeout is 60s, set client timeout to 55s
  // to handle gracefully before nginx/ALB times out
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 55000);

  try {
    const response = await axios.get('http://upstream-service/data', {
      signal: controller.signal,
      timeout: 55000 // Client-side timeout
    });

    clearTimeout(timeoutId);
    res.json(response.data);
  } catch (error) {
    clearTimeout(timeoutId);

    if (error.code === 'ECONNABORTED' || error.name === 'AbortError') {
      // Log the timeout for monitoring
      logger.error('Upstream timeout', {
        service: 'upstream-service',
        endpoint: '/data',
        timeout_ms: 55000
      });

      // Return 504 with a helpful error
      return res.status(504).json({
        error: 'Gateway Timeout',
        message: 'The upstream service did not respond in time',
        retry_after: 60 // Suggest retry in 60 seconds
      });
    }

    throw error; // Re-throw non-timeout errors
  }
});

// Nginx configuration
/*
location /api/ {
  proxy_pass http://node-app:3000;

  # Connection timeouts
  proxy_connect_timeout 10s;

  # Read/write timeouts (must be > app's 55s timeout)
  proxy_read_timeout 60s;
  proxy_send_timeout 60s;

  # Buffer settings can affect timeouts
  proxy_buffering on;
  proxy_buffer_size 4k;
  proxy_buffers 8 4k;
}
*/
```
Deep Dive
RFC 9110 Specification
Per RFC 9110 Section 15.6.5, the 504 (Gateway Timeout) status code indicates that the server, while acting as a gateway or proxy, did not receive a timely response from an upstream server. Unlike 503, the gateway itself is operational; the issue is with upstream timing.
Timeout Hierarchy in Distributed Systems
In a production system, timeouts must be carefully orchestrated across layers:
```
Client Request (120s total budget)
  ↓
CDN/Edge (CloudFront: 120s)
  ↓
Load Balancer (ALB: 90s idle timeout)
  ↓
API Gateway (Nginx: 85s proxy_read_timeout)
  ↓
Application Server (Node.js: 80s)
  ↓
Service Mesh (Istio: 75s)
  ↓
Backend Service (70s)
  ↓
Database Pool (30s query timeout)
  ↓
Database Query (25s statement timeout)
```
Critical principle: Each layer's timeout must exceed the sum of all downstream timeouts plus overhead (typically 10-20%). If the database timeout is 25s and query processing adds 5s, the service timeout should be ≥35s.
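To make the budgeting rule concrete, here is a minimal sketch that walks a configured chain, outermost layer first, and flags any layer that would fire a 504 before its downstream dependency can finish. The names and values mirror the hypothetical hierarchy above, and the 5-second headroom requirement is an assumption, not a standard.

```javascript
// Minimal sketch: sanity-check a timeout chain, outermost layer first.
// Names/values mirror the hypothetical hierarchy above; MIN_GAP_MS is an assumption.
const chain = [
  { layer: 'ALB idle timeout',         ms: 90_000 },
  { layer: 'Nginx proxy_read_timeout', ms: 85_000 },
  { layer: 'Node.js request timeout',  ms: 80_000 },
  { layer: 'Istio route timeout',      ms: 75_000 },
  { layer: 'Backend service timeout',  ms: 70_000 },
  { layer: 'DB pool checkout timeout', ms: 30_000 },
  { layer: 'DB statement timeout',     ms: 25_000 },
];

const MIN_GAP_MS = 5_000; // headroom each layer needs over the layer below it

for (let i = 0; i < chain.length - 1; i++) {
  const outer = chain[i];
  const inner = chain[i + 1];
  if (outer.ms < inner.ms + MIN_GAP_MS) {
    console.warn(
      `${outer.layer} (${outer.ms} ms) leaves too little headroom over ` +
      `${inner.layer} (${inner.ms} ms); expect 504s at this layer.`
    );
  }
}
```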
Gateway-Specific Timeout Configurations
Nginx Reverse Proxy
```nginx
http {
  # Connection establishment timeout
  proxy_connect_timeout 10s;

  # Reading response from upstream
  proxy_read_timeout 60s;

  # Sending request to upstream
  proxy_send_timeout 60s;

  # For FastCGI backends (PHP-FPM)
  fastcgi_connect_timeout 10s;
  fastcgi_send_timeout 60s;
  fastcgi_read_timeout 60s;

  # Keep-alive to upstream
  keepalive_timeout 65s;

  upstream backend {
    server app1:3000 max_fails=3 fail_timeout=30s;
    server app2:3000 max_fails=3 fail_timeout=30s;

    # Connection pool to upstream
    keepalive 32;
    keepalive_requests 100;
    keepalive_timeout 60s;
  }
}
```
HAProxy Load Balancer
```haproxy
defaults
  timeout connect 10s       # TCP connection to server
  timeout client 90s        # Client inactivity
  timeout server 85s        # Server response
  timeout tunnel 1h         # WebSocket/long-polling
  timeout http-request 10s  # Complete HTTP request

backend api_servers
  balance roundrobin
  option httpchk GET /health
  http-check expect status 200

  server api1 10.0.1.10:3000 check inter 5s fall 3 rise 2
  server api2 10.0.1.11:3000 check inter 5s fall 3 rise 2
```
AWS Application Load Balancer
- Idle timeout: 60s (default), configurable up to 4000s
- Connection timeout: 350s (fixed, not configurable)
- Sends 504 if the target doesn't respond within the idle timeout
Code Example
```javascript
// Production gateway with comprehensive timeout handling
const express = require('express');
const axios = require('axios');
const promClient = require('prom-client');
const { trace } = require('@opentelemetry/api');

const app = express();

// Metrics
const timeoutCounter = new promClient.Counter({
  name: 'gateway_timeouts_total',
  labelNames: ['service', 'endpoint']
});

const latencyHistogram = new promClient.Histogram({
  name: 'upstream_duration_seconds',
  labelNames: ['service', 'status'],
  buckets: [0.1, 0.5, 1, 2, 5, 10, 30, 60]
});

// Adaptive timeout manager
class AdaptiveTimeoutManager {
  constructor(serviceName, baseTimeout = 30000) {
    this.serviceName = serviceName;
    this.baseTimeout = baseTimeout;
    this.recentLatencies = [];
  }

  recordLatency(latencyMs) {
    this.recentLatencies.push(latencyMs);
    if (this.recentLatencies.length > 1000) {
      this.recentLatencies.shift();
    }
  }

  getTimeout() {
    if (this.recentLatencies.length < 100) return this.baseTimeout;

    const sorted = [...this.recentLatencies].sort((a, b) => a - b);
    const p99 = sorted[Math.floor(sorted.length * 0.99)];
    return Math.max(this.baseTimeout, Math.min(p99 * 2.5, 120000));
  }
}

const dataServiceTimeout = new AdaptiveTimeoutManager('data-service', 30000);

// Circuit breaker
class CircuitBreaker {
  constructor(name, threshold = 5, timeout = 60000) {
    this.name = name;
    this.state = 'CLOSED';
    this.failures = 0;
    this.threshold = threshold;
    this.timeout = timeout;
    this.lastFailure = null;
  }

  async execute(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailure >= this.timeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error(`Circuit breaker OPEN for ${this.name}`);
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failures = 0;
    if (this.state === 'HALF_OPEN') this.state = 'CLOSED';
  }

  onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) this.state = 'OPEN';
  }
}

const dataCircuit = new CircuitBreaker('data-service', 5, 60000);

// Gateway endpoint with full timeout handling
app.get('/api/data', async (req, res) => {
  const span = trace.getTracer('gateway').startSpan('proxy_request');
  const startTime = Date.now();
  const timeout = dataServiceTimeout.getTimeout();

  try {
    const result = await dataCircuit.execute(async () => {
      const controller = new AbortController();
      const timeoutId = setTimeout(() => controller.abort(), timeout);

      try {
        const response = await axios.get('http://data-service/data', {
          signal: controller.signal,
          timeout,
          headers: {
            'X-Request-ID': req.id,
            'X-Timeout-Budget': timeout.toString()
          }
        ...
```
Frequently Asked Questions
What's the difference between 502 Bad Gateway and 504 Gateway Timeout?
502 means the gateway received an invalid/malformed response from the upstream server, while 504 means the gateway didn't receive ANY response within the timeout period. With 502, the upstream sent something back (but it was wrong), while with 504, the upstream never responded in time.
Should my application timeout be longer or shorter than the gateway timeout?
Your application timeout should be SHORTER than the gateway timeout. This allows your app to handle the timeout gracefully, log it properly, and return a controlled 504 response. If the gateway times out first, it forcefully terminates the connection, preventing you from cleaning up or logging properly. Typical pattern: if nginx has 60s timeout, set app timeout to 55s.
How do I fix 504 errors in production?
First, check if it's a timeout configuration issue (timeouts too low) or a performance issue (upstream actually slow). Check error logs to see which layer timed out. If legitimate slowness: optimize the slow operation, implement caching, move to async processing, or add more resources. If timeout too aggressive: increase timeouts at each layer, ensuring each layer's timeout > sum of downstream timeouts.
Should I retry requests that return 504?
For idempotent operations (GET, PUT, DELETE): Yes, retry with exponential backoff. For non-idempotent operations (POST): Be very careful - the upstream might have processed the request but just didn't respond in time. Implement idempotency keys or check if the operation completed before retrying. Always respect Retry-After headers if present.
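As a rough illustration of that retry policy (not a drop-in client), here is a hedged sketch using axios: 504s are retried with exponential backoff plus jitter, a Retry-After header is honored when present, and a caller-supplied idempotency key is attached so a repeated POST can be deduplicated server-side. The Idempotency-Key header name, retry limits, and base delay are assumptions; the upstream must actually support idempotency keys for POST retries to be safe.

```javascript
const axios = require('axios');

// Sketch: retry 504/503 with exponential backoff + jitter.
// Retry limits, base delay, and the Idempotency-Key header are assumptions.
async function requestWithRetry(config, { retries = 3, baseDelayMs = 1000, idempotencyKey } = {}) {
  const headers = { ...(config.headers || {}) };
  if (idempotencyKey) headers['Idempotency-Key'] = idempotencyKey;

  for (let attempt = 0; ; attempt++) {
    try {
      return await axios({ ...config, headers });
    } catch (error) {
      const status = error.response && error.response.status;
      const retriable = status === 504 || status === 503;
      if (!retriable || attempt >= retries) throw error;

      // Respect Retry-After (in seconds) if the gateway provided one
      const retryAfter = error.response && error.response.headers['retry-after'];
      const backoff = retryAfter
        ? Number(retryAfter) * 1000
        : baseDelayMs * 2 ** attempt + Math.random() * 250; // exponential + jitter

      await new Promise((resolve) => setTimeout(resolve, backoff));
    }
  }
}

// Usage: safe for GET; for POST only with an idempotency key the upstream honors
// requestWithRetry({ method: 'get', url: 'https://api.example.com/data' });
```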
What timeout values should I use in Nginx?
Common production values: proxy_connect_timeout: 10s (connection establishment), proxy_read_timeout: 60s (reading response), proxy_send_timeout: 60s (sending request). These depend on your use case - APIs serving fast queries can use 30s, while video uploads might need 300s+. Key rule: gateway timeout must exceed the longest expected upstream response time plus buffer.
Can 504 errors affect SEO?
Yes, but less severely than 503. Search engines understand that 504s are often temporary network issues. Occasional 504s won't hurt rankings, but consistent 504s signal infrastructure problems and will impact SEO. Unlike 503, there's no Retry-After header that tells crawlers when to come back (though you can include it as a courtesy). Fix persistent 504s quickly to avoid ranking penalties.
Why am I getting 504 only for some requests but not others?
This indicates variable upstream performance. Likely causes: (1) some requests hit slow database queries, (2) cold start issues in serverless/containers, (3) resource contention under load, (4) certain endpoints calling slow external APIs. Use distributed tracing (OpenTelemetry, Jaeger) to identify which specific operations are slow. Implement per-endpoint timeout budgets rather than global timeouts.
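One way to get per-endpoint budgets in Express is a small middleware that picks a route-specific deadline instead of one global number. This is a minimal sketch building on the Express app and axios used in the earlier examples; the route names and budget values are hypothetical.

```javascript
// Sketch: per-endpoint timeout budgets instead of one global timeout.
// Route names and budget values are hypothetical.
const TIMEOUT_BUDGETS_MS = {
  '/api/search': 5_000,   // fast query path
  '/api/reports': 45_000, // known-slow aggregation
};
const DEFAULT_BUDGET_MS = 15_000;

app.use((req, res, next) => {
  req.timeoutBudgetMs = TIMEOUT_BUDGETS_MS[req.path] || DEFAULT_BUDGET_MS;
  next();
});

app.get('/api/reports', async (req, res) => {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), req.timeoutBudgetMs);
  try {
    const response = await axios.get('http://report-service/build', {
      signal: controller.signal,
    });
    res.json(response.data);
  } catch (err) {
    // Abort surfaces as AbortError or ERR_CANCELED depending on axios version
    if (err.name === 'AbortError' || err.code === 'ERR_CANCELED') {
      return res.status(504).json({ error: 'Gateway Timeout' });
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
});
```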
What's the relationship between 504 and load balancer health checks?
If health checks are failing due to slow responses, the load balancer might return 504 to clients while simultaneously marking the backend unhealthy. Check: (1) health check timeout < response timeout, (2) health check endpoint is fast (<1s), (3) not checking expensive operations in health checks. Separate concerns: use /health for quick status checks, not for testing full request flow.
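A minimal sketch of that separation, assuming the Express app from the earlier examples and a generic database client: the /health endpoint answers instantly from in-memory state, so the load balancer's short health-check timeout never depends on the database or other upstreams. Endpoint names are illustrative.

```javascript
// Sketch: keep /health cheap so LB health checks never hit slow dependencies.
let lastDbCheckOk = true;

// Refresh dependency status in the background, not per health-check request
setInterval(async () => {
  try {
    await db.query('SELECT 1'); // assumed db client with a query() method
    lastDbCheckOk = true;
  } catch {
    lastDbCheckOk = false;
  }
}, 10_000);

// Liveness: answers instantly from memory; used by the load balancer
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok' });
});

// Readiness: reports cached dependency state without blocking on it
app.get('/ready', (req, res) => {
  res.status(lastDbCheckOk ? 200 : 503).json({ db: lastDbCheckOk ? 'up' : 'down' });
});
```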
Common Causes
- Upstream server is processing a slow operation (complex database query, external API call)
- Upstream server is overloaded and can’t respond quickly enough
- Network issues between gateway and upstream server
- Timeout configured too aggressively for the workload
- Upstream server crashed or is unresponsive (though 502 is more common in this case)
- Database connection pool exhausted, requests waiting for available connections
- Cold start delays in serverless/container environments
- Cascading timeouts in microservice chains (each service times out waiting for the next)
Implementation Guidance
- Increase gateway timeout configuration: In Nginx: increase proxy_read_timeout, proxy_connect_timeout, and proxy_send_timeout. In HAProxy: increase timeout server. In ALB: increase idle timeout. Ensure each layer’s timeout exceeds downstream timeouts.
- Optimize slow upstream operations: Profile your application to find slow operations. Add database indexes, optimize queries, implement caching (Redis/Memcached), use connection pooling, parallelize independent operations. Consider async processing for operations >30s.
- Scale upstream infrastructure: Add more application servers, increase container/VM resources (CPU/RAM), scale database (read replicas, sharding), use auto-scaling to handle traffic spikes. Monitor resource utilization (CPU, memory, connections).
- Implement request timeout in application: Set application timeouts shorter than gateway timeouts (e.g., 55s app timeout for 60s gateway timeout). This allows graceful handling with proper logging and user feedback before the gateway forcefully terminates the connection.
- Add circuit breakers to prevent cascading failures: Use circuit breaker pattern to fail fast when upstream is consistently slow. After N failures, open circuit and return errors immediately instead of waiting for timeout. Transition to half-open after cooldown period to test recovery.
- Move long operations to background processing: For operations legitimately taking >30 seconds, use job queues (Bull, Celery, SQS) for async processing. Return 202 Accepted immediately with a job ID, let the client poll for completion, and send a webhook when done (a minimal sketch of this pattern follows this list).
- Implement client-side retry with exponential backoff: For idempotent requests, retry with exponential backoff (1s, 2s, 4s, 8s). Respect Retry-After headers. Add jitter to prevent thundering herd. Use circuit breakers on client side to stop retrying consistently failing services.
- Use distributed tracing to identify bottlenecks: Implement OpenTelemetry or similar to trace requests across services. Identify which specific service/operation is slow. Create timeout budget spans showing how time is allocated across the request chain.
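Following the async-processing recommendation above, here is a hedged sketch of the 202 Accepted pattern. The in-memory Map stands in for a real queue and job store (Bull, SQS, Redis, etc.), and the endpoint names and runReport helper are illustrative, not part of any particular framework.

```javascript
const crypto = require('crypto');

// Sketch: move long work off the request path and return 202 immediately.
// The in-memory Map stands in for a real queue/store (Bull, SQS, Redis, ...).
const jobs = new Map();

app.post('/api/reports', (req, res) => {
  const jobId = crypto.randomUUID();
  jobs.set(jobId, { status: 'pending', result: null });

  // Kick off the slow work without awaiting it in the request handler
  runReport(req.body)
    .then((result) => jobs.set(jobId, { status: 'done', result }))
    .catch((err) => jobs.set(jobId, { status: 'failed', error: err.message }));

  // Tell the client where to poll instead of holding the connection open
  res.status(202).json({ jobId, statusUrl: `/api/reports/${jobId}` });
});

app.get('/api/reports/:jobId', (req, res) => {
  const job = jobs.get(req.params.jobId);
  if (!job) return res.status(404).json({ error: 'Unknown job' });
  res.json(job);
});

// Placeholder for the actual long-running operation
async function runReport(params) {
  /* ... minutes of work ... */
  return { ok: true, params };
}
```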