
Container Security in Production CI/CD

Updated March 2026 · 10 min read

Shipping containers to production without scanning them is like deploying code without tests — it works until it doesn't, and the failure mode is catastrophic. In regulated environments, it's also a compliance violation.

We've built container security into CI/CD pipelines across Kubernetes platforms in regulated environments and security-focused deployment systems where a vulnerable base image isn't just a risk — it's a showstopper. This article covers the production patterns we use: automated vulnerability scanning, policy-as-code enforcement, image signing, and CVE lifecycle management.

The Container Security Pipeline

Container security isn't a single tool — it's a pipeline stage with multiple enforcement points. A production-grade container security pipeline includes:

  1. Base image selection and management — Curated, hardened base images
  2. Build-time scanning — Vulnerability detection before images leave CI
  3. Policy enforcement — Automated gates that block non-compliant images
  4. Image signing — Cryptographic proof of provenance and integrity
  5. Runtime scanning — Continuous monitoring of deployed images
  6. CVE lifecycle management — Tracking, triaging, and remediating vulnerabilities

Each layer catches what the previous layer missed. Defense in depth isn't just a network security concept — it applies to your build pipeline.
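
The enforcement points above behave as sequential gates: an image must pass each one before the next runs, and a failure stops the image from advancing. A minimal sketch of that fail-fast structure (the gate checks here are illustrative stand-ins, not real scanner calls):

```python
# Sketch of sequential security gates; each must pass before the image
# advances. The checks are illustrative stand-ins for trivy/OPA/cosign.
from typing import Callable, NamedTuple

class GateResult(NamedTuple):
    gate: str
    passed: bool
    reason: str

def run_gates(image: str,
              gates: list[tuple[str, Callable[[str], bool]]]) -> list[GateResult]:
    results = []
    for name, check in gates:
        ok = check(image)
        results.append(GateResult(name, ok, "" if ok else f"{name} failed for {image}"))
        if not ok:
            break  # fail fast: later gates never see a non-compliant image
    return results

# Hypothetical checks; a real pipeline shells out to trivy, opa, cosign, etc.
gates = [
    ("scan", lambda img: ":latest" not in img),   # stand-in for a vuln scan
    ("policy", lambda img: img.startswith("registry.example.com/")),
    ("sign", lambda img: True),                   # signing a compliant image
]

results = run_gates("registry.example.com/app:1.4.2", gates)
print(all(r.passed for r in results))  # True: every gate passed
```

The ordering matters: signing only happens after scanning and policy checks succeed, so a signature doubles as proof that the earlier gates passed.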

Trivy Integration in GitLab CI

Trivy is our primary scanning tool for container images. It's fast, has comprehensive vulnerability databases, and integrates cleanly into CI/CD pipelines. Here's a production GitLab CI configuration:

stages:
  - build
  - scan
  - sign
  - deploy

variables:
  TRIVY_SEVERITY: "CRITICAL,HIGH"
  TRIVY_EXIT_CODE: "1"
  IMAGE_TAG: "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHA}"

build_image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.19.2-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${IMAGE_TAG}"
      --cache=true
      --cache-repo="${CI_REGISTRY_IMAGE}/cache"

trivy_scan:
  stage: scan
  image:
    name: aquasec/trivy:0.49.0
    entrypoint: [""]
  script:
    - trivy image
        --exit-code ${TRIVY_EXIT_CODE}
        --severity ${TRIVY_SEVERITY}
        --ignore-unfixed
        --no-progress
        --format table
        ${IMAGE_TAG}
    - trivy image
        --exit-code 0
        --severity ${TRIVY_SEVERITY}
        --ignore-unfixed
        --no-progress
        --format template
        --template "@/contrib/gitlab.tpl"
        --output gl-container-scanning-report.json
        ${IMAGE_TAG}
    - trivy image
        --exit-code 0
        --severity ${TRIVY_SEVERITY}
        --ignore-unfixed
        --no-progress
        --format sarif
        --output trivy-report.sarif
        ${IMAGE_TAG}
  artifacts:
    paths:
      - gl-container-scanning-report.json
      - trivy-report.sarif
    reports:
      container_scanning: gl-container-scanning-report.json
  allow_failure: false

Key decisions in this configuration:

  • --ignore-unfixed — Only flag vulnerabilities that have available patches. Blocking a pipeline for a CVE with no fix creates noise without improving security.
  • --exit-code 1 — The pipeline fails hard on CRITICAL and HIGH vulnerabilities. No manual review, no exceptions. Fix it or don't ship.
  • Multiple output formats — Table for human review in pipeline logs, JSON for GitLab's security dashboard integration, SARIF for centralized vulnerability management platforms.
  • Kaniko for builds — No Docker socket exposure. Kaniko builds images in userspace without privileged access, which is a security requirement in shared CI runners.
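
The report artifact is also useful beyond GitLab's dashboard — for example, posting a per-severity summary to a chat channel or metrics system. A small sketch, assuming Trivy's standard JSON layout (`Results[].Vulnerabilities[].Severity`; `Vulnerabilities` can be null for clean targets):

```python
# Summarize a Trivy JSON report into per-severity counts.
# Assumes Trivy's standard JSON schema: Results[].Vulnerabilities[].Severity.
from collections import Counter

def severity_counts(report: dict) -> Counter:
    counts = Counter()
    for result in report.get("Results", []):
        # Trivy emits null (not []) for targets with no findings
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln["Severity"]] += 1
    return counts

# Inline sample mimicking a Trivy report with one clean and one dirty target
sample = {"Results": [
    {"Vulnerabilities": [{"Severity": "HIGH"}, {"Severity": "CRITICAL"}]},
    {"Vulnerabilities": None},
]}
print(dict(severity_counts(sample)))  # {'HIGH': 1, 'CRITICAL': 1}
```

In the pipeline this would run against the report file via `json.load` instead of the inline sample.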

Policy-as-Code with OPA

Vulnerability scanning catches known CVEs, but policy enforcement catches architectural violations. Open Policy Agent (OPA) lets us define and enforce container policies as code:

package container.policy

import future.keywords.in

default allow = false

allow {
    not has_critical_vulns
    not uses_root_user
    not uses_latest_tag
    approved_base_image
    has_health_check
}

has_critical_vulns {
    input.vulnerabilities[_].severity == "CRITICAL"
}

uses_root_user {
    input.config.User == ""
}

uses_root_user {
    input.config.User == "root"
}

uses_latest_tag {
    endswith(input.image, ":latest")
}

approved_base_images := {
    "alpine:3.19",
    "node:20-alpine",
    "python:3.12-slim",
    "golang:1.22-alpine",
    "nginx:1.25-alpine"
}

approved_base_image {
    some base in approved_base_images
    startswith(input.base_image, base)
}

has_health_check {
    count(input.config.Healthcheck) > 0
}

This policy enforces five rules simultaneously:

  1. No critical vulnerabilities (redundant with Trivy, but defense in depth)
  2. Container must not run as root
  3. No :latest tags — every image must be pinned to a specific version
  4. Only approved, hardened base images are allowed
  5. Every container must define a health check

The policy file lives in the same repository as the application code. Changes to security policy go through the same code review process as application changes. Security Is Architecture — it's versioned, reviewed, and tested like everything else.
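
To make the rules concrete, here is the same logic mirrored in plain Python against the input document shape the policy expects — an illustration of the evaluation, not a replacement for OPA:

```python
# Plain-Python mirror of the Rego rules above, for illustration only.
# `doc` mimics the input document the OPA policy evaluates.
APPROVED_BASES = {"alpine:3.19", "node:20-alpine", "python:3.12-slim",
                  "golang:1.22-alpine", "nginx:1.25-alpine"}

def allow(doc: dict) -> bool:
    has_critical = any(v["severity"] == "CRITICAL" for v in doc["vulnerabilities"])
    user = doc["config"].get("User", "")
    runs_as_root = user in ("", "root")          # empty User defaults to root
    uses_latest = doc["image"].endswith(":latest")
    approved_base = any(doc["base_image"].startswith(b) for b in APPROVED_BASES)
    has_healthcheck = bool(doc["config"].get("Healthcheck"))
    return (not has_critical and not runs_as_root and not uses_latest
            and approved_base and has_healthcheck)

doc = {
    "image": "registry.example.com/app:1.4.2",
    "base_image": "node:20-alpine",
    "vulnerabilities": [{"severity": "MEDIUM"}],
    "config": {"User": "appuser", "Healthcheck": {"Test": ["CMD", "wget"]}},
}
print(allow(doc))  # True: all five rules satisfied
```

Flip any single field — set `User` to `""`, or tag the image `:latest` — and `allow` returns False, which is exactly the behavior the pipeline gate relies on.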

Image Signing with Cosign

After an image passes scanning and policy checks, we sign it with Cosign. Signed images provide cryptographic proof that an image was built by a trusted pipeline and passed all security gates:

sign_image:
  stage: sign
  image: bitnami/cosign:2.2.3
  id_tokens:
    SIGSTORE_ID_TOKEN:
      aud: sigstore
  script:
    - cosign sign
        --yes
        --recursive
        --oidc-issuer=https://gitlab.com
        --identity-token=${SIGSTORE_ID_TOKEN}
        ${IMAGE_TAG}
  needs:
    - build_image
    - trivy_scan

Notice the keyless signing with OIDC — no signing keys to manage, rotate, or protect. The signature is tied to the CI/CD pipeline's identity through the same OIDC federation pattern we use for credential-free deployments. Sigstore's transparency log provides an immutable record of every signing event.

Admission Control with Cosign Verification

Signing images is only half the equation. The other half is enforcing that only signed images can be deployed. Kubernetes admission controllers verify signatures at deploy time:

apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: require-signed-images
spec:
  images:
    - glob: "registry.example.com/**"
  authorities:
    - keyless:
        identities:
          - issuer: https://gitlab.com
            subjectRegExp: "https://gitlab\\.com/rutagon/.+//\\.gitlab-ci\\.yml@refs/heads/main"
        url: https://fulcio.sigstore.dev
      ctlog:
        url: https://rekor.sigstore.dev

Any attempt to deploy an unsigned image — or an image signed by an unauthorized pipeline — is rejected before it reaches a cluster node. The feedback loop is immediate: the deployment fails with a clear error message explaining which policy was violated.

Base Image Management

Base images are the foundation of container security. A vulnerability in your base image affects every container built on it. We maintain a curated base image registry with automated rebuilds:

rebuild_base_images:
  stage: build
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  parallel:
    matrix:
      - BASE: ["alpine:3.19", "node:20-alpine", "python:3.12-slim"]
  script:
    - docker build
        --build-arg BASE_IMAGE=${BASE}
        --tag ${CI_REGISTRY_IMAGE}/base/${BASE}
        --file Dockerfile.base .
    - trivy image --exit-code 1 --severity CRITICAL,HIGH
        ${CI_REGISTRY_IMAGE}/base/${BASE}
    - docker push ${CI_REGISTRY_IMAGE}/base/${BASE}

This scheduled pipeline rebuilds base images nightly, pulling the latest security patches from upstream. If a rebuilt base image fails the Trivy scan, the pipeline fails and the team is notified — we don't push vulnerable base images even if upstream released them that way.

CVE Lifecycle Management

Finding vulnerabilities is the easy part. Managing them at scale is the real challenge. In production systems we've built, CVE management follows a structured lifecycle:

Automated Triage

Not every CVE is exploitable in your context. A vulnerability in libcurl doesn't matter if your container never makes outbound HTTP requests. Automated triage filters vulnerabilities based on:

  • Reachability analysis — Is the vulnerable code path actually invoked?
  • Network exposure — Is the vulnerable component accessible from untrusted networks?
  • Data sensitivity — Does the affected service handle sensitive data?
  • Exploit availability — Is there a known exploit in the wild?
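
These four signals can be combined into a simple priority score that drives the triage queue. A sketch with hypothetical weights — tune them to your environment:

```python
# Combine triage signals into a priority score. The weights are
# hypothetical and should be calibrated per environment.
def triage_score(reachable: bool, network_exposed: bool,
                 sensitive_data: bool, exploit_available: bool) -> int:
    score = 0
    score += 4 if exploit_available else 0  # known exploits dominate
    score += 3 if reachable else 0          # unreachable code paths matter less
    score += 2 if network_exposed else 0
    score += 1 if sensitive_data else 0
    return score

# A reachable vuln with a public exploit on an internet-facing service:
print(triage_score(reachable=True, network_exposed=True,
                   sensitive_data=False, exploit_available=True))  # 9
```

A threshold on the score then decides whether a finding goes straight to the on-call queue or into the normal backlog.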

SLA-Based Remediation

Triaged vulnerabilities get remediation SLAs based on severity and context:

Severity   Exploitable   SLA
--------   -----------   -----------
Critical   Yes           24 hours
Critical   No            7 days
High       Yes           7 days
High       No            30 days
Medium     Any           Next sprint

These SLAs are encoded in the CI/CD pipeline. A vulnerability that exceeds its SLA blocks the next deployment — not as a punishment, but as an incentive to stay current.
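
Encoding the SLA table in the pipeline can be as simple as a lookup plus a deadline comparison. A sketch — the windows come straight from the table above, while the gating function itself is illustrative:

```python
# SLA lookup from the table above; a finding past its deadline blocks deploys.
from datetime import datetime, timedelta, timezone

SLA = {  # (severity, exploitable) -> remediation window
    ("CRITICAL", True): timedelta(hours=24),
    ("CRITICAL", False): timedelta(days=7),
    ("HIGH", True): timedelta(days=7),
    ("HIGH", False): timedelta(days=30),
}

def blocks_deploy(severity: str, exploitable: bool,
                  found_at: datetime, now: datetime) -> bool:
    window = SLA.get((severity, exploitable))
    if window is None:
        return False  # MEDIUM and below: scheduled into the next sprint instead
    return now - found_at > window

found = datetime(2026, 3, 1, tzinfo=timezone.utc)
now = datetime(2026, 3, 3, tzinfo=timezone.utc)
print(blocks_deploy("CRITICAL", True, found, now))  # True: 48h exceeds 24h SLA
print(blocks_deploy("HIGH", False, found, now))     # False: within 30 days
```

Run as a pipeline step over the tracked findings, a single overdue CRITICAL is enough to fail the deploy stage.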

AquaSec for Runtime Protection

Build-time scanning catches known vulnerabilities, but runtime protection catches zero-days and behavioral anomalies. AquaSec (or similar runtime security platforms) provides:

  • Runtime vulnerability scanning — Continuous scanning of running containers against updated CVE databases
  • Drift detection — Alerts when a container's filesystem changes after deployment (a strong indicator of compromise)
  • Network policy enforcement — Microsegmentation that limits container-to-container communication to explicitly allowed paths
  • Behavioral profiling — Machine learning models that detect anomalous process execution, network connections, and file access patterns

The integration between build-time and runtime scanning creates a feedback loop. Runtime findings feed back into build-time policies, and build-time scanning results inform runtime monitoring priorities.

Dockerfile Best Practices for Security

The most effective container security starts in the Dockerfile itself:

FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Full install: build tooling typically lives in devDependencies
RUN npm ci
COPY . .
# Build, then strip dev dependencies before they're copied to production
RUN npm run build \
    && npm prune --omit=dev

FROM node:20-alpine AS production
RUN apk --no-cache add dumb-init \
    && addgroup -g 1001 appgroup \
    && adduser -u 1001 -G appgroup -s /bin/sh -D appuser
WORKDIR /app
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/package.json ./
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/server.js"]

This Dockerfile embodies several security principles:

  • Multi-stage build — Build dependencies don't ship to production
  • Non-root user — The container runs as appuser, not root
  • Minimal base image — Alpine reduces the attack surface dramatically
  • Health check — Required by our OPA policy
  • dumb-init — Proper signal handling prevents zombie processes
  • No shell access — Production containers don't need interactive shells

Ship, Don't Slide

Container security is a continuous practice, not a one-time implementation. New CVEs are published daily. Base images are updated weekly. Attack techniques evolve constantly. The pipeline must evolve with them.

The patterns in this article aren't theoretical — they're running in production systems that serve regulated workloads across defense and federal environments. Every container that reaches production has been scanned, evaluated against policy, and cryptographically signed. Every vulnerability has been triaged, tracked, and remediated within SLA.

That's what Ship, Don't Slide means in practice. Ship secure containers, don't slide on security standards.

What is the difference between build-time and runtime container scanning?

Build-time scanning analyzes container images during the CI/CD pipeline before deployment, catching known vulnerabilities in packages and libraries. Runtime scanning continuously monitors running containers for new CVEs, behavioral anomalies, filesystem drift, and unauthorized network connections. Both are necessary — build-time catches what's known, runtime catches what emerges after deployment.

Why use keyless signing with Cosign instead of traditional GPG keys?

Keyless signing eliminates the need to manage, rotate, and protect signing keys — which are themselves long-lived credentials. Cosign with Sigstore uses OIDC federation to tie signatures to the identity of the CI/CD pipeline that produced the image, creating an auditable chain of custody without any key material to compromise.

How do you handle vulnerabilities with no available fix?

We use Trivy's --ignore-unfixed flag to prevent blocking pipelines on vulnerabilities that have no available patch. These unfixed CVEs are still tracked and triaged — we monitor upstream projects for fixes and apply them as soon as they're available. If a critical unfixed CVE is exploitable in our context, we implement compensating controls at the network or runtime layer.

What base images does Rutagon recommend for production containers?

We recommend minimal, hardened base images: Alpine-based variants for most workloads, distroless images for compiled languages like Go and Rust, and slim variants for interpreted languages. The key principle is reducing attack surface — fewer packages means fewer potential vulnerabilities. Every base image should be rebuilt and re-scanned on a regular schedule.

How does policy-as-code prevent insecure containers from deploying?

Open Policy Agent (OPA) evaluates container configurations against a set of rules defined in Rego policy files. These rules enforce requirements like non-root execution, pinned image tags, approved base images, and health check definitions. The policy evaluation runs as a CI/CD pipeline stage — if any rule fails, the pipeline is blocked and the image never reaches a container registry.
