# API Gateway Patterns for Microservices at Scale
API gateway patterns for microservices determine whether your architecture scales gracefully or collapses under load. The gateway is the front door to every service. Every client request passes through it. Every authentication decision, rate limit, and transformation happens there. Get it wrong and you've built a bottleneck. Get it right and you've built a control plane that makes every downstream service simpler.
We've built API gateways for production systems that handle millions of requests — from serverless APIs backed by Lambda and DynamoDB to high-traffic government platforms behind CloudFront. These patterns come from production experience, not whiteboards.
## Why API Gateways Exist
Without an API gateway, every microservice handles its own authentication, rate limiting, request validation, CORS, logging, and protocol translation. That's not microservices — that's distributed duplication.
An API gateway centralizes cross-cutting concerns:
- Authentication and authorization: Verify identity once at the edge
- Rate limiting: Protect backend services from abuse and overload
- Request routing: Direct traffic to the correct service version
- Protocol translation: Accept REST, serve GraphQL, translate to gRPC
- Response caching: Reduce backend load for cacheable responses
- Observability: Centralized access logging and metrics
The gateway pattern doesn't replace service-level security or validation. It provides the first layer — the coarse-grained checks that reject bad requests before they consume backend resources.
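A minimal sketch of that first layer, assuming a request is a plain dictionary and each check either passes or returns a rejection. The names `require_credentials`, `require_known_route`, and `run_edge_checks` are hypothetical and only illustrate the early-rejection idea; a managed gateway performs these steps through configuration rather than code like this.

```python
from typing import Callable, Dict, List, Optional, Tuple

Request = Dict[str, str]
Rejection = Tuple[int, str]  # (HTTP status, reason)
Check = Callable[[Request], Optional[Rejection]]


def require_credentials(request: Request) -> Optional[Rejection]:
    """Reject requests that carry no Authorization value at all."""
    return None if request.get("authorization") else (401, "missing credentials")


def require_known_route(request: Request) -> Optional[Rejection]:
    """Reject paths outside a hypothetical route table."""
    known_prefixes = ("/v1/", "/v2/")
    if request.get("path", "").startswith(known_prefixes):
        return None
    return (404, "unknown route")


def run_edge_checks(request: Request, checks: List[Check]) -> Tuple[int, str]:
    """Run checks in order and stop at the first rejection."""
    for check in checks:
        rejection = check(request)
        if rejection is not None:
            return rejection
    # Only requests that pass every coarse check reach a backend service.
    return (200, "forwarded to backend")


EDGE_CHECKS: List[Check] = [require_credentials, require_known_route]
```

Because the loop returns on the first rejection, a request without credentials never reaches routing, let alone a backend. Fine-grained authorization still belongs in the services themselves.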
## AWS API Gateway: Choosing Your Type
AWS offers three API Gateway types, each optimized for different patterns:
### REST API (v1)
Full-featured gateway with request/response transformation, usage plans, API keys, caching, WAF integration, and request validation. Higher per-request cost, more capabilities.
Use when: You need request transformation, caching at the gateway layer, or usage plan management for external API consumers.
### HTTP API (v2)
Lightweight gateway with lower latency and lower cost. Supports JWT authorizers, Lambda integrations, and HTTP proxy. Fewer features, significantly cheaper.
Use when: You're building internal APIs, using JWT-based auth, and don't need gateway-level caching or transformation.
### WebSocket API
Persistent connections for real-time communication. Routes messages based on message content to Lambda functions. Manages connection state.
Use when: You need real-time bidirectional communication — chat, live dashboards, streaming updates.
For most production microservice architectures, HTTP API covers the large majority of use cases at a significantly lower per-request cost than REST API. Reserve REST API for endpoints that specifically need its additional capabilities.
## Rate Limiting Patterns
Rate limiting is the most important gateway function most teams implement last. Without it, a single misbehaving client — or an attacker — can overwhelm your entire backend.
### Token Bucket at the Gateway
API Gateway's usage plans implement token bucket rate limiting: a bucket fills with tokens at a steady rate, and each request consumes one token. When the bucket is empty, requests are throttled.
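To make the mechanics concrete, here is a minimal sketch of a token bucket in Python. It is an illustration of the algorithm, not API Gateway's internal implementation; the `TokenBucket` class and the optional `now` argument (used to make the behavior easy to test) are assumptions of this example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `burst` tokens."""

    rate: float   # steady-state refill rate, in tokens per second
    burst: int    # bucket capacity, which is the largest allowed burst
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # start with a full bucket
        self.updated_at = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; False means the request is throttled."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        # Add the tokens earned since the last call, capped at the capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate=50` and `burst=100`, a client can send 100 requests at once, after which it is limited to roughly 50 requests per second until the bucket refills.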
```yaml
# SAM template: API with usage plan rate limiting
ApiUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    UsagePlanName: standard-tier
    Throttle:
      BurstLimit: 100
      RateLimit: 50
    Quota:
      Limit: 10000
      Period: DAY
    ApiStages:
      - ApiId: !Ref RestApi
        Stage: production

PremiumUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    UsagePlanName: premium-tier
    Throttle:
      BurstLimit: 500
      RateLimit: 200
    Quota:
      Limit: 100000
      Period: DAY
    ApiStages:
      - ApiId: !Ref RestApi
        Stage: production
```

### Tiered Rate Limiting
Different consumers need different limits. Internal services need higher throughput than external partners. Authenticated users need more capacity than anonymous users.
Implement tiered rate limiting by mapping API keys to usage plans:
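```python
def determine_rate_tier(event):
    """Map an authorized request to a rate-limit tier name."""
    request_context = event.get('requestContext', {})
    api_key = request_context.get('identity', {}).get('apiKey')
    user_claims = request_context.get('authorizer', {}).get('claims', {})

    # OAuth scopes are usually a space-delimited string, so test membership
    # instead of comparing the whole claim value.
    scopes = user_claims.get('scope', '').split()
    if 'internal-service' in scopes:
        return 'unlimited'
    elif user_claims.get('subscription') == 'premium':
        return 'premium'
    elif api_key:
        return 'standard'
    else:
        return 'anonymous'
```

### Circuit Breaking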
Rate limiting protects against external overload. Circuit breaking protects against internal failure. When a downstream service starts failing, the gateway should stop sending it traffic rather than queuing up requests that will time out.
Implement circuit breaking with Lambda@Edge or a lightweight middleware:
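```typescript
interface CircuitState {
  failures: number;
  lastFailure: number;
  state: 'CLOSED' | 'OPEN' | 'HALF_OPEN';
}

const FAILURE_THRESHOLD = 5;
const RECOVERY_TIMEOUT_MS = 30000;

// Returns true when a request may be sent to the service.
function checkCircuit(serviceId: string, circuits: Map<string, CircuitState>): boolean {
  const circuit: CircuitState = circuits.get(serviceId) ?? {
    failures: 0, lastFailure: 0, state: 'CLOSED'
  };
  if (circuit.state === 'OPEN') {
    // After the recovery timeout, let traffic through again to probe the service.
    if (Date.now() - circuit.lastFailure > RECOVERY_TIMEOUT_MS) {
      circuit.state = 'HALF_OPEN';
      return true;
    }
    return false;
  }
  return true;
}

function recordFailure(serviceId: string, circuits: Map<string, CircuitState>): void {
  const circuit: CircuitState = circuits.get(serviceId) ?? {
    failures: 0, lastFailure: 0, state: 'CLOSED'
  };
  circuit.failures++;
  circuit.lastFailure = Date.now();
  if (circuit.failures >= FAILURE_THRESHOLD) {
    circuit.state = 'OPEN';
  }
  circuits.set(serviceId, circuit);
}

// Reset the circuit after a successful call so a recovered service
// does not reopen on its next isolated failure.
function recordSuccess(serviceId: string, circuits: Map<string, CircuitState>): void {
  circuits.set(serviceId, { failures: 0, lastFailure: 0, state: 'CLOSED' });
}
```

## Authentication at the Gateway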
Authentication at the gateway eliminates the need for every downstream service to validate tokens independently. The gateway verifies the token and passes verified claims downstream.
### JWT Authorizers
HTTP API's built-in JWT authorizer validates tokens against any OIDC-compliant identity provider:
```yaml
HttpApi:
  Type: AWS::ApiGatewayV2::Api
  Properties:
    Name: production-api
    ProtocolType: HTTP

JWTAuthorizer:
  Type: AWS::ApiGatewayV2::Authorizer
  Properties:
    ApiId: !Ref HttpApi
    AuthorizerType: JWT
    # IdentitySource is a list of selection expressions.
    IdentitySource:
      - "$request.header.Authorization"
    Name: cognito-jwt
    JwtConfiguration:
      Audience:
        - !Ref UserPoolClient
      Issuer: !Sub "https://cognito-idp.${AWS::Region}.amazonaws.com/${UserPool}"
```

### Lambda Authorizers for Complex Logic
When authentication requires more than token validation — checking IP allowlists, verifying custom headers, querying a permissions database — use a Lambda authorizer:
```python
import json

# verify_jwt, load_user_permissions, InvalidTokenError, and ExpiredTokenError
# are application helpers assumed to be defined elsewhere.


def authorizer_handler(event, context):
    token = event.get('authorizationToken', '').replace('Bearer ', '')
    try:
        claims = verify_jwt(token)
        permissions = load_user_permissions(claims['sub'])
        return generate_policy(
            principal_id=claims['sub'],
            effect='Allow',
            resource=event['methodArn'],
            # Context values must be simple types, so serialize the permissions.
            context={
                'userId': claims['sub'],
                'permissions': json.dumps(permissions),
                'organizationId': claims.get('org_id', ''),
            }
        )
    except (InvalidTokenError, ExpiredTokenError):
        # API Gateway turns this exact message into a 401 response.
        raise Exception('Unauthorized')


def generate_policy(principal_id, effect, resource, context):
    return {
        'principalId': principal_id,
        'policyDocument': {
            'Version': '2012-10-17',
            'Statement': [{
                'Action': 'execute-api:Invoke',
                'Effect': effect,
                'Resource': resource,
            }]
        },
        'context': context,
    }
```

The authorizer response is cached by API Gateway (configurable TTL), so subsequent requests with the same token don't invoke the Lambda function again.
## Request Transformation
The gateway translates between what clients send and what services expect. This decouples your public API contract from internal service interfaces.
### Mapping Templates (REST API)
REST API's Velocity Template Language transforms requests and responses without code:
```velocity
## Request transformation: flatten nested client payload for backend
#set($body = $input.path('$'))
{
  "userId": "$context.authorizer.userId",
  "itemId": "$body.data.item.id",
  "quantity": $body.data.item.quantity,
  "requestId": "$context.requestId",
  "timestamp": "$context.requestTime"
}
```

### Header Enrichment
Add context headers that backend services need but clients shouldn't provide:
```yaml
IntegrationRequest:
  Type: AWS::ApiGatewayV2::Integration
  Properties:
    ApiId: !Ref HttpApi
    IntegrationType: HTTP_PROXY
    # HTTP proxy integrations need an explicit method and payload format version.
    IntegrationMethod: ANY
    PayloadFormatVersion: "1.0"
    IntegrationUri: !Sub "https://internal.${DomainName}/api/v2"
    RequestParameters:
      "append:header.X-Request-Id": "$context.requestId"
      "append:header.X-User-Id": "$context.authorizer.claims.sub"
      "append:header.X-Org-Id": "$context.authorizer.claims.org_id"
      "overwrite:header.Host": !Sub "internal.${DomainName}"
```

## Caching Strategies
Gateway-level caching reduces backend load for read-heavy APIs. API Gateway REST API provides built-in caching with configurable TTL per endpoint.
### Cache Key Design
The cache key determines what responses are shared. A poorly designed cache key either serves stale data to the wrong user or defeats caching entirely.
```yaml
CacheSettings:
  CachingEnabled: true
  CacheClusterEnabled: true
  CacheClusterSize: "1.6"
  CacheDataEncrypted: true
  CacheTtlInSeconds: 300
  CacheKeyParameters:
    - method.request.path.resourceId
    - method.request.querystring.version
    # Do NOT include Authorization header - responses are per-resource, not per-user
```

For user-specific responses, skip gateway caching and cache at the service layer instead. Gateway caching works best for public or semi-public data: product catalogs, configuration, reference data.
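The effect of the key design can be sketched in a few lines of Python. This is an illustration only, not how API Gateway computes its keys internally; `build_cache_key` and the shortened parameter names are hypothetical, chosen to mirror the `CacheKeyParameters` above.

```python
import hashlib
import json
from typing import Mapping

# Only these parameters take part in the key. Anything else, including the
# Authorization header, is ignored, so matching requests share one entry.
CACHE_KEY_PARAMETERS = ("path.resourceId", "querystring.version")


def build_cache_key(request_params: Mapping[str, str]) -> str:
    """Derive a deterministic cache key from the selected parameters."""
    selected = {name: request_params.get(name, "") for name in CACHE_KEY_PARAMETERS}
    canonical = json.dumps(selected, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


alice = {
    "path.resourceId": "42",
    "querystring.version": "2",
    "header.Authorization": "Bearer alice-token",
}
bob = {**alice, "header.Authorization": "Bearer bob-token"}
other_resource = {**alice, "path.resourceId": "43"}
```

Here `alice` and `bob` produce the same key and would share a cached response, while `other_resource` gets its own entry. That sharing is exactly why this design is only safe when the response does not depend on who is asking.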
### Cache Invalidation
Cache invalidation is a hard problem. API Gateway supports manual cache flushing per stage, but that's a blunt instrument. For granular invalidation, use short TTLs combined with ETag-based conditional requests:
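```typescript
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';

// getResource and computeETag are application helpers assumed to be defined elsewhere.
export async function handleRequest(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
  const resource = await getResource(event.pathParameters!.id!);
  const etag = computeETag(resource);

  // Header casing varies by client and gateway type, so check both spellings.
  const ifNoneMatch = event.headers['If-None-Match'] ?? event.headers['if-none-match'];
  if (ifNoneMatch === etag) {
    // Not modified: return the validator again, with an empty body.
    return { statusCode: 304, headers: { 'ETag': etag }, body: '' };
  }

  return {
    statusCode: 200,
    headers: { 'ETag': etag, 'Cache-Control': 'max-age=60' },
    body: JSON.stringify(resource),
  };
}
```

## API Versioning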
APIs evolve. Clients can't all upgrade simultaneously. Versioning strategies determine how gracefully you can evolve your API without breaking existing consumers.
### URL Path Versioning
The simplest and most explicit approach. Different versions route to different integrations:
```yaml
Routes:
  - path: /v1/orders
    integration: OrderServiceV1
  - path: /v2/orders
    integration: OrderServiceV2
```

### Header-Based Versioning
Cleaner URLs, more complex routing. Use a custom header or Accept header for version selection:
```http
Accept: application/vnd.api.v2+json
```

We prefer URL path versioning for external APIs (explicit, easy to document, easy to test) and header-based versioning for internal service-to-service communication where URL stability matters.
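Header-based selection needs a small piece of routing logic somewhere, for example in a Lambda function in front of the services. The following is a minimal sketch under stated assumptions: the `select_version` and `select_integration` functions are hypothetical, the integration names are borrowed from the path-versioning routes, and defaulting to version 1 when no version is requested is a choice made for this example.

```python
import re
from typing import Mapping, Optional

# Matches vendor media types such as "application/vnd.api.v2+json".
_VERSION_PATTERN = re.compile(r"application/vnd\.api\.v(\d+)\+json")

INTEGRATIONS = {1: "OrderServiceV1", 2: "OrderServiceV2"}
DEFAULT_VERSION = 1  # assumption: unversioned requests get the oldest contract


def select_version(headers: Mapping[str, str]) -> int:
    """Extract the requested API version from the Accept header."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    accept = next((v for k, v in headers.items() if k.lower() == "accept"), "")
    match = _VERSION_PATTERN.search(accept)
    return int(match.group(1)) if match else DEFAULT_VERSION


def select_integration(headers: Mapping[str, str]) -> Optional[str]:
    """Return the integration for the requested version, or None if unsupported."""
    return INTEGRATIONS.get(select_version(headers))
```

A `None` result signals a version the gateway does not serve; responding with `406 Not Acceptable` is one reasonable way to handle that case.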
Building React and TypeScript production applications on top of well-designed APIs means the frontend team has a stable contract to build against — versioning at the gateway gives them that stability.
Our full-stack development practice treats the API gateway as a first-class architectural component, not infrastructure plumbing. The decisions you make at the gateway — caching, rate limiting, transformation, versioning — ripple through every service and every client.
## Frequently Asked Questions

### Should I use one API gateway or one per service?
One gateway per domain boundary. For most organizations, that's one gateway for external-facing APIs and one for internal service-to-service communication. A gateway per microservice is over-engineering — it reintroduces the distributed cross-cutting concern problem that gateways solve. For very large organizations with independent teams, a gateway per team or bounded context can work.
### How do you handle API gateway failures?
API Gateway is a managed service with built-in redundancy across multiple AZs. For additional resilience, deploy across multiple regions with Route 53 health checks and failover routing. For self-managed gateways (Kong, Envoy), run multiple instances behind a load balancer with health checks and automatic replacement.
### What's the performance overhead of API Gateway?

As a rough guide, HTTP API adds 5-15ms of latency and REST API adds 15-30ms; the actual overhead depends on the authorizer, the integration type, and the payload size. For most applications, this is negligible compared to backend processing time. If even that overhead is too much, consider Application Load Balancer with Lambda targets or direct service-to-service communication for internal calls where gateway features aren't needed.
### How do you test API gateway configurations?
Test at three levels: unit test request/response transformations with mock events, integration test the full gateway-to-service flow in a staging environment, and load test to validate rate limiting and caching behavior under realistic traffic patterns. API Gateway's stage deployment model lets you maintain a persistent staging environment that mirrors production configuration.
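As an illustration of the first level, the sketch below re-expresses the flattening from the mapping template shown earlier as a plain Python function and checks it against a mock event. The function name `flatten_order_request` and the mock values are hypothetical; testing the actual Velocity template would require rendering it with a VTL engine or against a deployed stage.

```python
from typing import Any, Dict


def flatten_order_request(body: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the nested client payload into the shape the backend expects."""
    item = body["data"]["item"]
    return {
        "userId": context["authorizer"]["userId"],
        "itemId": item["id"],
        "quantity": item["quantity"],
        "requestId": context["requestId"],
        "timestamp": context["requestTime"],
    }


def test_flatten_order_request() -> None:
    # Mock event pieces standing in for what the gateway would supply.
    mock_body = {"data": {"item": {"id": "sku-123", "quantity": 2}}}
    mock_context = {
        "authorizer": {"userId": "user-1"},
        "requestId": "req-abc",
        "requestTime": "01/Jan/2024:00:00:00 +0000",
    }
    assert flatten_order_request(mock_body, mock_context) == {
        "userId": "user-1",
        "itemId": "sku-123",
        "quantity": 2,
        "requestId": "req-abc",
        "timestamp": "01/Jan/2024:00:00:00 +0000",
    }
```

A test runner such as pytest would pick up `test_flatten_order_request` automatically; the same mock-event approach extends to response transformations.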
### When should you not use an API gateway?
Skip the gateway for service-to-service communication within a VPC where both services trust each other and share the same security context. Use service mesh (Istio, App Mesh) for internal traffic management instead. Also skip for ultra-low-latency requirements where even 10ms of overhead is unacceptable — direct TCP connections with mutual TLS are appropriate there.
---
API gateway architecture determines how your microservices scale, secure, and evolve. Talk to Rutagon about building production API infrastructure that grows with your platform.