
Updated March 2026 · 11 min read

# API Gateway Patterns for Microservices at Scale

API gateway patterns for microservices determine whether your architecture scales gracefully or collapses under load. The gateway is the front door to every service. Every client request passes through it. Every authentication decision, rate limit, and transformation happens there. Get it wrong and you've built a bottleneck. Get it right and you've built a control plane that makes every downstream service simpler.

We've built API gateways for production systems that handle millions of requests — from serverless APIs backed by Lambda and DynamoDB to high-traffic government platforms behind CloudFront. These patterns come from production experience, not whiteboards.

Why API Gateways Exist

Without an API gateway, every microservice handles its own authentication, rate limiting, request validation, CORS, logging, and protocol translation. That's not microservices — that's distributed duplication.

An API gateway centralizes cross-cutting concerns:

  • Authentication and authorization: Verify identity once at the edge
  • Rate limiting: Protect backend services from abuse and overload
  • Request routing: Direct traffic to the correct service version
  • Protocol translation: Accept REST, serve GraphQL, translate to gRPC
  • Response caching: Reduce backend load for cacheable responses
  • Observability: Centralized access logging and metrics

The gateway pattern doesn't replace service-level security or validation. It provides the first layer — the coarse-grained checks that reject bad requests before they consume backend resources.

AWS API Gateway: Choosing Your Type

AWS offers three API Gateway types, each optimized for different patterns:

REST API (v1)

Full-featured gateway with request/response transformation, usage plans, API keys, caching, WAF integration, and request validation. Higher per-request cost, more capabilities.

Use when: You need request transformation, caching at the gateway layer, or usage plan management for external API consumers.

HTTP API (v2)

Lightweight gateway with lower latency and lower cost. Supports JWT authorizers, Lambda integrations, and HTTP proxy. Fewer features, significantly cheaper.

Use when: You're building internal APIs, using JWT-based auth, and don't need gateway-level caching or transformation.

WebSocket API

Persistent connections for real-time communication. Routes messages to Lambda integrations based on message content and manages connection state on your behalf.

Use when: You need real-time bidirectional communication — chat, live dashboards, streaming updates.

For most production microservice architectures, HTTP API covers 80% of use cases at roughly a third of REST API's per-request price. Reserve REST API for endpoints that specifically need its additional capabilities.

Rate Limiting Patterns

Rate limiting is the most important gateway function most teams implement last. Without it, a single misbehaving client — or an attacker — can overwhelm your entire backend.

Token Bucket at the Gateway

API Gateway's usage plans implement token bucket rate limiting: a bucket fills with tokens at a steady rate, and each request consumes one token. When the bucket is empty, requests are throttled.

# SAM template: API with usage plan rate limiting
ApiUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    UsagePlanName: standard-tier
    Throttle:
      BurstLimit: 100
      RateLimit: 50
    Quota:
      Limit: 10000
      Period: DAY
    ApiStages:
      - ApiId: !Ref RestApi
        Stage: production

PremiumUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    UsagePlanName: premium-tier
    Throttle:
      BurstLimit: 500
      RateLimit: 200
    Quota:
      Limit: 100000
      Period: DAY
    ApiStages:
      - ApiId: !Ref RestApi
        Stage: production
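The algorithm behind those usage plans can be sketched in a few lines. This is an illustrative model of token-bucket throttling, not the AWS implementation; `rate` and `burst` correspond to the `RateLimit` and `BurstLimit` properties above:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate           # tokens added per second (RateLimit)
        self.capacity = burst      # maximum bucket size (BurstLimit)
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The burst limit sets how many requests can arrive back-to-back; the rate limit sets the sustained throughput once the initial burst is spent.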

Tiered Rate Limiting

Different consumers need different limits. Internal services need higher throughput than external partners. Authenticated users need more capacity than anonymous users.

Implement tiered rate limiting by mapping API keys to usage plans:

def determine_rate_tier(event):
    # REST API proxy event: the API key and JWT claims arrive via requestContext
    api_key = event.get('requestContext', {}).get('identity', {}).get('apiKey')
    user_claims = event.get('requestContext', {}).get('authorizer', {}).get('claims', {})

    if user_claims.get('scope') == 'internal-service':
        return 'unlimited'
    elif user_claims.get('subscription') == 'premium':
        return 'premium'
    elif api_key:
        return 'standard'
    else:
        return 'anonymous'
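Each tier then resolves to concrete throttle settings. A minimal lookup might look like this — the numbers for the named tiers mirror the usage plans above, while the `anonymous` limits are illustrative assumptions:

```python
# Throttle settings per tier; None means no gateway-enforced limit.
TIER_LIMITS = {
    "unlimited": {"rate": None, "burst": None},
    "premium":   {"rate": 200, "burst": 500},
    "standard":  {"rate": 50, "burst": 100},
    "anonymous": {"rate": 10, "burst": 20},
}

def limits_for_tier(tier: str) -> dict:
    # Unknown tiers fall back to the most restrictive limits
    return TIER_LIMITS.get(tier, TIER_LIMITS["anonymous"])
```

Failing closed — unknown tiers get the tightest limits — means a misconfigured claim degrades service rather than opening the floodgates.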

Circuit Breaking

Rate limiting protects against external overload. Circuit breaking protects against internal failure. When a downstream service starts failing, the gateway should stop sending it traffic rather than queuing up requests that will time out.

Implement circuit breaking with Lambda@Edge or a lightweight middleware:

interface CircuitState {
  failures: number;
  lastFailure: number;
  state: 'CLOSED' | 'OPEN' | 'HALF_OPEN';
}

const FAILURE_THRESHOLD = 5;
const RECOVERY_TIMEOUT_MS = 30000;

// Returns true if a request may be sent to the service.
function checkCircuit(serviceId: string, circuits: Map<string, CircuitState>): boolean {
  const circuit = circuits.get(serviceId);
  if (!circuit || circuit.state !== 'OPEN') {
    return true;
  }

  // After the recovery timeout, let a single probe request through
  if (Date.now() - circuit.lastFailure > RECOVERY_TIMEOUT_MS) {
    circuit.state = 'HALF_OPEN';
    return true;
  }
  return false;
}

function recordFailure(serviceId: string, circuits: Map<string, CircuitState>): void {
  const circuit: CircuitState = circuits.get(serviceId) ?? {
    failures: 0, lastFailure: 0, state: 'CLOSED'
  };

  circuit.failures++;
  circuit.lastFailure = Date.now();

  // A failed probe in HALF_OPEN, or too many failures in CLOSED, opens the circuit
  if (circuit.state === 'HALF_OPEN' || circuit.failures >= FAILURE_THRESHOLD) {
    circuit.state = 'OPEN';
  }

  circuits.set(serviceId, circuit);
}

// Call on every successful response so a HALF_OPEN probe closes the circuit.
function recordSuccess(serviceId: string, circuits: Map<string, CircuitState>): void {
  circuits.set(serviceId, { failures: 0, lastFailure: 0, state: 'CLOSED' });
}

Authentication at the Gateway

Authentication at the gateway eliminates the need for every downstream service to validate tokens independently. The gateway verifies the token and passes verified claims downstream.

JWT Authorizers

HTTP API's built-in JWT authorizer validates tokens against any OIDC-compliant identity provider:

HttpApi:
  Type: AWS::ApiGatewayV2::Api
  Properties:
    Name: production-api
    ProtocolType: HTTP

JWTAuthorizer:
  Type: AWS::ApiGatewayV2::Authorizer
  Properties:
    ApiId: !Ref HttpApi
    AuthorizerType: JWT
    IdentitySource:
      - "$request.header.Authorization"
    Name: cognito-jwt
    JwtConfiguration:
      Audience:
        - !Ref UserPoolClient
      Issuer: !Sub "https://cognito-idp.${AWS::Region}.amazonaws.com/${UserPool}"

Lambda Authorizers for Complex Logic

When authentication requires more than token validation — checking IP allowlists, verifying custom headers, querying a permissions database — use a Lambda authorizer:

import json

def authorizer_handler(event, context):
    # TOKEN-type authorizer: the raw Authorization header arrives in authorizationToken
    token = event.get('authorizationToken', '').replace('Bearer ', '')

    try:
        claims = verify_jwt(token)
        permissions = load_user_permissions(claims['sub'])

        return generate_policy(
            principal_id=claims['sub'],
            effect='Allow',
            resource=event['methodArn'],
            context={
                'userId': claims['sub'],
                'permissions': json.dumps(permissions),
                'organizationId': claims.get('org_id', ''),
            }
        )
    except (InvalidTokenError, ExpiredTokenError):
        raise Exception('Unauthorized')

def generate_policy(principal_id, effect, resource, context):
    return {
        'principalId': principal_id,
        'policyDocument': {
            'Version': '2012-10-17',
            'Statement': [{
                'Action': 'execute-api:Invoke',
                'Effect': effect,
                'Resource': resource,
            }]
        },
        'context': context,
    }

The authorizer response is cached by API Gateway (configurable TTL), so subsequent requests with the same token don't invoke the Lambda function again.

Request Transformation

The gateway translates between what clients send and what services expect. This decouples your public API contract from internal service interfaces.

Mapping Templates (REST API)

REST API's Velocity Template Language transforms requests and responses without code:

## Request transformation: flatten nested client payload for backend
#set($body = $input.path('$'))
{
  "userId": "$context.authorizer.userId",
  "itemId": "$body.data.item.id",
  "quantity": $body.data.item.quantity,
  "requestId": "$context.requestId",
  "timestamp": "$context.requestTime"
}
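For unit testing, it helps to express the same transformation in ordinary code. This Python sketch mirrors the mapping template above (the field names and event shape are the illustrative ones from that template, not a fixed contract):

```python
def flatten_order_payload(body: dict, context: dict) -> dict:
    """Python equivalent of the VTL mapping template, for unit tests."""
    return {
        "userId": context["authorizer"]["userId"],
        "itemId": body["data"]["item"]["id"],
        "quantity": body["data"]["item"]["quantity"],
        "requestId": context["requestId"],
        "timestamp": context["requestTime"],
    }
```

Keeping a code twin of each mapping template lets you assert on the transformation in CI, where VTL itself is awkward to exercise.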

Header Enrichment

Add context headers that backend services need but clients shouldn't provide:

IntegrationRequest:
  Type: AWS::ApiGatewayV2::Integration
  Properties:
    ApiId: !Ref HttpApi
    IntegrationType: HTTP_PROXY
    IntegrationUri: !Sub "https://internal.${DomainName}/api/v2"
    RequestParameters:
      "append:header.X-Request-Id": "$context.requestId"
      "append:header.X-User-Id": "$context.authorizer.claims.sub"
      "append:header.X-Org-Id": "$context.authorizer.claims.org_id"
      "overwrite:header.Host": !Sub "internal.${DomainName}"

Caching Strategies

Gateway-level caching reduces backend load for read-heavy APIs. API Gateway REST API provides built-in caching with configurable TTL per endpoint.

Cache Key Design

The cache key determines what responses are shared. A poorly designed cache key either serves stale data to the wrong user or defeats caching entirely.

CacheSettings:
  CachingEnabled: true
  CacheClusterEnabled: true
  CacheClusterSize: "1.6"
  CacheDataEncrypted: true
  CacheTtlInSeconds: 300
  CacheKeyParameters:
    - method.request.path.resourceId
    - method.request.querystring.version
    # Do NOT include Authorization header - responses are per-resource, not per-user

For user-specific responses, skip gateway caching and cache at the service layer instead. Gateway caching works best for public or semi-public data: product catalogs, configuration, reference data.

Cache Invalidation

Cache invalidation is a hard problem. API Gateway supports manual cache flushing per stage, but that's a blunt instrument. For granular invalidation, use short TTLs combined with ETag-based conditional requests:

export async function handleRequest(event: APIGatewayProxyEvent) {
  const resource = await getResource(event.pathParameters!.id!);
  const etag = computeETag(resource);

  // Header names may arrive lowercased depending on payload format; check both
  const ifNoneMatch = event.headers['if-none-match'] ?? event.headers['If-None-Match'];
  if (ifNoneMatch === etag) {
    return { statusCode: 304, body: '' };
  }

  return {
    statusCode: 200,
    headers: { 'ETag': etag, 'Cache-Control': 'max-age=60' },
    body: JSON.stringify(resource),
  };
}
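The `computeETag` helper above is left abstract; one reasonable scheme is to hash a canonical JSON serialization of the resource, so the same logical object always yields the same tag regardless of key order. A Python sketch of that approach:

```python
import hashlib
import json

def compute_etag(resource: dict) -> str:
    """Strong ETag from a canonical JSON serialization (one possible scheme)."""
    # sort_keys makes the serialization independent of dict insertion order
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f'"{digest[:16]}"'
```

The surrounding quotes matter: HTTP ETags are quoted strings, and clients echo them back verbatim in If-None-Match.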

API Versioning

APIs evolve. Clients can't all upgrade simultaneously. Versioning strategies determine how gracefully you can evolve your API without breaking existing consumers.

URL Path Versioning

The simplest and most explicit approach. Different versions route to different integrations:

Routes:
  - path: /v1/orders
    integration: OrderServiceV1
  - path: /v2/orders
    integration: OrderServiceV2

Header-Based Versioning

Cleaner URLs, more complex routing. Use a custom header or Accept header for version selection:

Accept: application/vnd.api.v2+json
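Routing on that header means parsing the version out of the media type. A minimal parser, assuming the `application/vnd.api.vN+json` convention shown above:

```python
import re

def parse_api_version(accept_header: str, default: str = "v1") -> str:
    """Extract the version from a vendor media type like application/vnd.api.v2+json."""
    match = re.match(r"application/vnd\.api\.(v\d+)\+json", accept_header.strip())
    # Fall back to the default version for plain or unrecognized media types
    return match.group(1) if match else default
```

Defaulting to the oldest supported version keeps existing clients working when they send a plain `Accept: application/json`.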

We prefer URL path versioning for external APIs (explicit, easy to document, easy to test) and header-based versioning for internal service-to-service communication where URL stability matters.

Building React and TypeScript production applications on top of well-designed APIs means the frontend team has a stable contract to build against — versioning at the gateway gives them that stability.

Our full-stack development practice treats the API gateway as a first-class architectural component, not infrastructure plumbing. The decisions you make at the gateway — caching, rate limiting, transformation, versioning — ripple through every service and every client.

Frequently Asked Questions

Should I use one API gateway or one per service?

One gateway per domain boundary. For most organizations, that's one gateway for external-facing APIs and one for internal service-to-service communication. A gateway per microservice is over-engineering — it reintroduces the distributed cross-cutting concern problem that gateways solve. For very large organizations with independent teams, a gateway per team or bounded context can work.

How do you handle API gateway failures?

API Gateway is a managed service with built-in redundancy across multiple AZs. For additional resilience, deploy across multiple regions with Route 53 health checks and failover routing. For self-managed gateways (Kong, Envoy), run multiple instances behind a load balancer with health checks and automatic replacement.

What's the performance overhead of API Gateway?

HTTP API adds 5-15ms of latency. REST API adds 15-30ms. For most applications, this is negligible compared to backend processing time. If sub-millisecond gateway latency is required, consider Application Load Balancer with Lambda targets or direct service-to-service communication for internal calls where gateway features aren't needed.

How do you test API gateway configurations?

Test at three levels: unit test request/response transformations with mock events, integration test the full gateway-to-service flow in a staging environment, and load test to validate rate limiting and caching behavior under realistic traffic patterns. API Gateway's stage deployment model lets you maintain a persistent staging environment that mirrors production configuration.

When should you not use an API gateway?

Skip the gateway for service-to-service communication within a VPC where both services trust each other and share the same security context. Use service mesh (Istio, App Mesh) for internal traffic management instead. Also skip for ultra-low-latency requirements where even 10ms of overhead is unacceptable — direct TCP connections with mutual TLS are appropriate there.

---

API gateway architecture determines how your microservices scale, secure, and evolve. Talk to Rutagon about building production API infrastructure that grows with your platform.
