# Helm Charts for Production Kubernetes Deployments
Production Helm charts for Kubernetes are where many teams struggle — not because Helm is complicated, but because the gap between a working chart and a production-grade chart is filled with edge cases, security requirements, and operational patterns that only emerge under real workloads. A chart that deploys successfully is table stakes. A chart that rolls back cleanly, manages secrets safely, passes security scanning, and templates correctly across environments — that's production-grade.
We deploy to Kubernetes in regulated environments where a bad rollout isn't just an inconvenience — it's a compliance incident. These Helm patterns come from that production experience.
## Chart Structure That Scales
The standard Helm chart structure works for simple applications. Production charts need more:
```text
my-service/
├── Chart.yaml
├── Chart.lock
├── values.yaml
├── values-staging.yaml
├── values-production.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml
│   ├── pdb.yaml
│   ├── networkpolicy.yaml
│   ├── serviceaccount.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   ├── servicemonitor.yaml
│   └── tests/
│       ├── test-connection.yaml
│       └── test-health.yaml
├── ci/
│   ├── staging-values.yaml
│   └── production-values.yaml
└── README.md
```

The `_helpers.tpl` file is where reusable template functions live. Invest time here — good helpers eliminate duplication and enforce consistency:
```yaml
{{/* templates/_helpers.tpl */}}
{{- define "app.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{- define "app.labels" -}}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/component: {{ .Values.component | default "backend" }}
{{- end -}}

{{- define "app.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}

{{- define "app.securityContext" -}}
runAsNonRoot: true
runAsUser: {{ .Values.securityContext.runAsUser | default 1000 }}
runAsGroup: {{ .Values.securityContext.runAsGroup | default 1000 }}
fsGroup: {{ .Values.securityContext.fsGroup | default 1000 }}
seccompProfile:
  type: RuntimeDefault
{{- end -}}
```

## Values Templating for Multiple Environments
The values.yaml file defines defaults. Environment-specific files override them. The key principle: defaults should be safe for production. Development overrides should loosen constraints, not the other way around.
```yaml
# values.yaml — production-safe defaults
replicaCount: 3

image:
  repository: registry.internal/my-service
  tag: ""  # Set by CI/CD pipeline
  pullPolicy: IfNotPresent

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

podDisruptionBudget:
  enabled: true
  minAvailable: 2

networkPolicy:
  enabled: true
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              purpose: ingress
      ports:
        - protocol: TCP
          port: 8080

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000

serviceAccount:
  create: true
  annotations: {}

probes:
  liveness:
    path: /health/live
    initialDelaySeconds: 10
    periodSeconds: 15
    failureThreshold: 3
  readiness:
    path: /health/ready
    initialDelaySeconds: 5
    periodSeconds: 10
    failureThreshold: 3
  startup:
    path: /health/live
    initialDelaySeconds: 0
    periodSeconds: 5
    failureThreshold: 30
```

```yaml
# values-staging.yaml — overrides for staging
replicaCount: 1

autoscaling:
  enabled: false

podDisruptionBudget:
  enabled: false

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi
```

Notice that staging overrides reduce resources and disable HA features. Production defaults are the baseline. This prevents the common mistake of building charts that work in staging but fail in production because someone forgot to configure replicas, PDBs, or resource limits.
## The Deployment Template
The deployment template is the heart of the chart. A production deployment template handles probes, security contexts, resource management, and graceful shutdown:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "app.selectorLabels" . | nindent 6 }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        {{- include "app.selectorLabels" . | nindent 8 }}
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
    spec:
      serviceAccountName: {{ include "app.fullname" . }}
      securityContext:
        {{- include "app.securityContext" . | nindent 8 }}
      terminationGracePeriodSeconds: 60
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          livenessProbe:
            httpGet:
              path: {{ .Values.probes.liveness.path }}
              port: http
            initialDelaySeconds: {{ .Values.probes.liveness.initialDelaySeconds }}
            periodSeconds: {{ .Values.probes.liveness.periodSeconds }}
            failureThreshold: {{ .Values.probes.liveness.failureThreshold }}
          readinessProbe:
            httpGet:
              path: {{ .Values.probes.readiness.path }}
              port: http
            initialDelaySeconds: {{ .Values.probes.readiness.initialDelaySeconds }}
            periodSeconds: {{ .Values.probes.readiness.periodSeconds }}
            failureThreshold: {{ .Values.probes.readiness.failureThreshold }}
          startupProbe:
            httpGet:
              path: {{ .Values.probes.startup.path }}
              port: http
            initialDelaySeconds: {{ .Values.probes.startup.initialDelaySeconds }}
            periodSeconds: {{ .Values.probes.startup.periodSeconds }}
            failureThreshold: {{ .Values.probes.startup.failureThreshold }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              {{- include "app.selectorLabels" . | nindent 14 }}
```

Key production details: `maxUnavailable: 0` ensures zero-downtime deploys. The configmap checksum annotation triggers a rollout when configuration changes. `readOnlyRootFilesystem: true` with a writable `/tmp` volume prevents filesystem-based attacks. Topology spread constraints distribute pods across availability zones.
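The checksum mechanism is worth internalizing: the annotation hashes the rendered ConfigMap, so any config change alters the pod template and forces a rolling update. A minimal sketch with plain `sha256sum` (the `LOG_LEVEL` content is illustrative, not from the chart):

```shell
# Hash two versions of the same ConfigMap content, as the
# checksum/config annotation does with the rendered template
old=$(printf 'LOG_LEVEL: info\n' | sha256sum | cut -d ' ' -f 1)
new=$(printf 'LOG_LEVEL: debug\n' | sha256sum | cut -d ' ' -f 1)

# Different content yields a different hash, so the annotation value
# changes, the pod template changes, and a RollingUpdate begins
if [ "$old" != "$new" ]; then
  echo "config change detected"
fi
```

Identical content hashes identically, so unchanged configs never cause spurious rollouts.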
## Rollback Strategies
Helm maintains release history. When a deployment goes wrong, rollback is straightforward:
```shell
# View release history
helm history my-service -n production

# Rollback to previous release (revision 0 means "previous")
helm rollback my-service 0 -n production

# Rollback to specific revision
helm rollback my-service 5 -n production --wait --timeout 5m
```

But rollback is the recovery mechanism. Prevention is better. Implement deployment gates:
### Pre-Upgrade Hooks
Run validation before the upgrade proceeds:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "app.fullname" . }}-pre-upgrade
  annotations:
    helm.sh/hook: pre-upgrade
    helm.sh/hook-weight: "-5"
    helm.sh/hook-delete-policy: before-hook-creation
spec:
  template:
    spec:
      containers:
        - name: db-migration-check
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["./migrate", "--dry-run"]
      restartPolicy: Never
  backoffLimit: 0
```

If the migration dry-run fails, the upgrade is aborted before any pods are replaced.
### Post-Upgrade Validation
After deployment, validate that the new version is healthy before considering the release successful:
```shell
helm upgrade my-service ./my-service \
  -n production \
  -f values-production.yaml \
  --set image.tag=$IMAGE_TAG \
  --wait \
  --timeout 10m \
  --atomic
```

The `--atomic` flag automatically rolls back if the deployment doesn't become healthy within the timeout. This is essential for CI/CD pipelines where human intervention may not be immediate.
## Secrets Management
Secrets in Helm charts are the most common security mistake. Putting secrets in values.yaml, committing them to Git, or base64-encoding them in templates (which is encoding, not encryption) are all violations.
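Two shell commands make the encoding-vs-encryption point concrete; anyone who can read the manifest can recover the value (the password below is a made-up example):

```shell
# base64 "protects" nothing: it is reversible by design
encoded=$(printf 'db-password-123' | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "encoded: $encoded"
echo "decoded: $decoded"
```

A Kubernetes Secret's base64 data field exists for binary-safety, not confidentiality; treat it as plaintext in every threat model.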
### External Secrets Operator
The cleanest pattern: secrets live in AWS Secrets Manager or SSM Parameter Store. The External Secrets Operator syncs them into Kubernetes secrets:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: {{ include "app.fullname" . }}
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: {{ include "app.fullname" . }}-secrets
    creationPolicy: Owner
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: /production/my-service/database-url
    - secretKey: API_KEY
      remoteRef:
        key: /production/my-service/api-key
```

Secrets never appear in Helm values, Git repositories, or CI/CD logs. They're fetched at runtime from the secrets manager by the operator running in-cluster.
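The `aws-secrets-manager` store referenced above is defined once per cluster, outside the application chart. A sketch of what that ClusterSecretStore might look like; the region and service-account details are assumptions for illustration, not part of the chart:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1  # assumed region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets       # assumed IRSA-enabled SA
            namespace: external-secrets
```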
### Sealed Secrets as an Alternative
For environments without access to external secrets managers, Sealed Secrets encrypt secrets with a cluster-specific key. The encrypted form is safe to commit to Git:
```shell
# Encrypt a secret for the cluster
kubeseal --format yaml < secret.yaml > sealed-secret.yaml

# The sealed secret can be committed to Git safely
# Only the target cluster can decrypt it
```

Proper secrets management is a core part of container security in production CI/CD — if secrets leak through your Helm charts, all your other security controls are undermined.
## Chart Testing
Untested Helm charts are a deployment risk. Test at multiple levels:
### Template Testing with helm-unittest
Validate that templates render correctly for all value combinations:
```yaml
# tests/deployment_test.yaml
suite: deployment tests
templates:
  - deployment.yaml
tests:
  - it: should set correct replica count
    set:
      replicaCount: 5
      autoscaling.enabled: false
    asserts:
      - equal:
          path: spec.replicas
          value: 5
  - it: should not set replicas when autoscaling is enabled
    set:
      autoscaling.enabled: true
    asserts:
      - isNull:
          path: spec.replicas
  - it: should enforce security context
    asserts:
      - equal:
          path: spec.template.spec.containers[0].securityContext.allowPrivilegeEscalation
          value: false
      - equal:
          path: spec.template.spec.containers[0].securityContext.readOnlyRootFilesystem
          value: true
  - it: should set resource limits
    asserts:
      - isNotNull:
          path: spec.template.spec.containers[0].resources.limits
      - isNotNull:
          path: spec.template.spec.containers[0].resources.requests
```

### Linting and Schema Validation
```shell
# Lint the chart
helm lint ./my-service -f values-production.yaml

# Render templates and validate against Kubernetes schemas
helm template my-service ./my-service -f values-production.yaml | \
  kubeval --strict --kubernetes-version 1.28.0

# Security scanning with Checkov (render to a file first;
# Checkov reads manifests via -f rather than stdin)
helm template my-service ./my-service -f values-production.yaml > rendered.yaml
checkov -f rendered.yaml --framework kubernetes
```

### Integration Testing with ct
Chart Testing (ct) validates charts in a real cluster:
```yaml
# ct.yaml
target-branch: main
chart-dirs:
  - charts
helm-extra-args: --timeout 120s
check-version-increment: true
validate-maintainers: false
```

```shell
ct lint-and-install --config ct.yaml --charts ./my-service
```

This spins up the chart in a test cluster, runs the test hooks, verifies health, and tears down — giving you confidence that the chart actually works, not just that it renders valid YAML.
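The test hooks that ct exercises live in the `templates/tests/` directory shown in the chart layout. A minimal sketch of what `test-connection.yaml` could contain, assuming a Service named after `app.fullname` exposing port 8080 and the readiness path from the values file:

```yaml
# templates/tests/test-connection.yaml — runs on `helm test`
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "app.fullname" . }}-test-connection
  annotations:
    helm.sh/hook: test
spec:
  containers:
    - name: probe
      image: busybox:1.36
      command: ['wget']
      args: ['-qO-', '{{ include "app.fullname" . }}:8080/health/ready']
  restartPolicy: Never
```

The pod succeeds only if the readiness endpoint is reachable through the Service, which catches broken selectors and port mismatches that template tests cannot.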
Our Kubernetes containerization capabilities include comprehensive Helm chart development with testing, security scanning, and multi-environment deployment patterns built in from the start.
## Frequently Asked Questions
### Should I use Helm or Kustomize for production Kubernetes?
Both are viable. Helm excels when you need templating across multiple environments with significantly different configurations, package management with versioned releases, and rollback capabilities. Kustomize excels for simpler overlay-based customization without the complexity of Go templating. Many teams use both: Helm for application charts and Kustomize for environment-specific overlays on top of rendered Helm output.
### How do you handle Helm chart versioning?
Follow semantic versioning. Bump the chart version on every change. Bump appVersion when the application version changes. Use Chart.lock for dependency pinning. In CI/CD, automate version bumping based on commit type: patch for fixes, minor for features, major for breaking changes to values schema.
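Automated version bumping need not involve heavy tooling. A sketch of a CI patch-bump step, assuming a conventional single `version:` line in `Chart.yaml` (the sample file here is a stand-in; real pipelines often use chart-releaser instead):

```shell
# Create a sample Chart.yaml to operate on (stand-in for the real chart)
cat > Chart.yaml <<'EOF'
apiVersion: v2
name: my-service
version: 1.4.2
appVersion: "2.3.1"
EOF

# Read the current chart version and bump the patch component
current=$(awk '/^version:/ {print $2}' Chart.yaml)
next=$(echo "$current" | awk -F. '{printf "%d.%d.%d", $1, $2, $3 + 1}')

# Write it back (GNU sed; macOS needs `sed -i ''`)
sed -i "s/^version: .*/version: $next/" Chart.yaml
grep '^version:' Chart.yaml
```

A minor or major bump changes the second or first component instead; the commit type decides which component the pipeline increments.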
### What's the maximum number of Helm release revisions to keep?
Set --history-max to 10-20 revisions. Each revision stores the complete release manifest, which consumes etcd storage. Keeping too many revisions bloats the cluster. Keeping too few limits your rollback options. Ten revisions gives you a reasonable rollback window without storage concerns.
### How do you manage Helm charts across multiple clusters?
Use a chart repository (ChartMuseum, OCI registry, or S3) to publish versioned charts. Each cluster's deployment configuration references the chart version and provides environment-specific values. GitOps tools like ArgoCD or Flux pull charts from the repository and apply environment values automatically.
### How do you debug a Helm template that isn't rendering correctly?
Use helm template with --debug to see the rendered output. Add --show-only templates/deployment.yaml to isolate a specific template. For complex logic, temporarily add {{- fail (printf "Debug: %v" .Values.someValue) -}} to inspect values at specific points in the template. Remove debug statements before committing.
---
Production Kubernetes deployments deserve production-grade Helm charts. Contact Rutagon about building deployment infrastructure that's tested, secure, and reliable across environments.