10 Kubernetes Best Practices We Follow

Kubernetes provides immense power, but defaults are rarely production-ready. Without clear standards, resource starvation, scheduling anomalies, and security gaps will degrade your platform. Here are ten critical best practices we follow at Rescape when engineering containerized workloads.

1. Enforce Resource Requests & Limits

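The scheduler places pods according to their resource requests, and limits cap what a container can consume at runtime. Without requests, nodes get overcommitted; without memory limits, one leaking container can exhaust a node and trigger evictions or Out-Of-Memory (OOM) kills of neighboring pods and system services. CPU limits behave differently: a container that hits its CPU limit is throttled rather than killed.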

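resources:
  requests:          # what the scheduler reserves on the node
    memory: "256Mi"
    cpu: "200m"      # 0.2 of a core
  limits:            # hard caps enforced at runtime
    memory: "512Mi"  # exceeding this gets the container OOM-killed
    cpu: "1000m"     # exceeding this gets the container throttled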

2. Define Health Probes Correctly

Liveness probes drive self-healing (a failing container is restarted), while readiness probes drive service routing (a failing pod is removed from Service endpoints). Do not point liveness probes at heavy DB queries or external APIs; keep them cheap (e.g., a local, in-process health endpoint) so that a network brownout does not cascade into mass container restarts.
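
As a rough sketch of that split, the container-level fragment below keeps the liveness check local and lets only the readiness check reflect dependency state. The paths, port, and timings are hypothetical and should be tuned per service.

```yaml
# Container-level fragment (sits alongside `image:` in a pod spec).
# /healthz, /ready, port 8080, and all timings are illustrative assumptions.
livenessProbe:
  httpGet:
    path: /healthz        # cheap in-process check; no DB or external calls
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3     # restart only after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /ready          # may reflect dependency state
    port: 8080
  periodSeconds: 5
  failureThreshold: 2     # failing here removes the pod from routing, without a restart
```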

3. Restrict Container Privileges

Enforce the Principle of Least Privilege at the container level by configuring the security context. Run containers as non-root users, disable privilege escalation, and set the root filesystem to read-only.

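securityContext:
  runAsNonRoot: true              # refuse to start if the image would run as UID 0
  runAsUser: 10001
  allowPrivilegeEscalation: false # processes cannot gain more privileges than their parent
  readOnlyRootFilesystem: true    # writable paths must be mounted as volumes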

4. Design Stateless Applications for Graceful Shutdown

Pods can be terminated at any time due to scaling, deployments, or rescheduling. Ensure containers handle the SIGTERM signal gracefully, finish in-flight requests, and close connections before the termination grace period expires (30 seconds by default), after which the container is killed with SIGKILL.
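
A minimal sketch of the pod-side settings, assuming a hypothetical `api` container whose image ships a `sleep` binary. The short preStop pause gives endpoint removal time to propagate before SIGTERM arrives; the application still has to handle SIGTERM itself.

```yaml
# Pod spec fragment; the container name, image, and durations are hypothetical.
spec:
  terminationGracePeriodSeconds: 45   # total budget, including preStop, before SIGKILL
  containers:
    - name: api
      image: registry.example.com/api:1.0.0
      lifecycle:
        preStop:
          exec:
            # Runs before SIGTERM is delivered to the container
            command: ["sleep", "5"]
```

The grace period should comfortably exceed the preStop delay plus the longest request the service is expected to finish.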

5. Isolate Workloads Using Namespaces

Create clear logical boundaries between teams, environments, or application domains. Namespaces on their own neither block traffic nor grant permissions, so combine them with Network Policies to restrict cross-namespace traffic and with RBAC to scope user permissions.
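
One way this can look, as a sketch: a namespace per team and environment, plus a RoleBinding that grants a group the built-in `edit` ClusterRole inside that namespace only. The namespace, label, and group names are hypothetical.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod          # hypothetical team/environment namespace
  labels:
    team: payments
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payments-developers-edit
  namespace: payments-prod     # permissions apply in this namespace only
subjects:
  - kind: Group
    name: payments-developers  # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                   # built-in user-facing role
  apiGroup: rbac.authorization.k8s.io
```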

6. Enforce Network Policies by Default

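By default, all pods in Kubernetes can talk to all other pods, including across namespaces. Lock this down by defining a default-deny ingress policy in each namespace and explicitly allowing only approved communication paths. Note that policies are enforced only if the cluster's network plugin supports NetworkPolicy.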
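
A sketch of the pattern: the first policy selects every pod in the namespace and allows no ingress, and the second opens a single approved path. The namespace, labels, and port are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments-prod     # hypothetical namespace
spec:
  podSelector: {}              # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress                  # no ingress rules listed, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: payments-prod
spec:
  podSelector:
    matchLabels:
      app: api                 # policy applies to the api pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend    # only frontend pods in the same namespace
      ports:
        - protocol: TCP
          port: 8080
```

Network policies are additive, so the allow rule takes effect alongside the default deny rather than being overridden by it.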

7. Externalize Configurations and Secrets

Never bake API keys or configuration files into container images. Use ConfigMaps for environment-specific configuration, and use the External Secrets Operator to sync credentials from secure vaults (e.g., AWS Secrets Manager or HashiCorp Vault) into the cluster.
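
On the consuming side, a minimal sketch might look like the following. All names, keys, and values are hypothetical, and the `api-credentials` Secret is assumed to already exist in the namespace (for example, synced there by a secrets operator).

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config                 # hypothetical
data:
  LOG_LEVEL: "info"
  FEATURE_FLAGS_URL: "https://flags.example.com"
---
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0.0   # image carries no config or keys
      envFrom:
        - configMapRef:
            name: api-config       # non-sensitive settings
        - secretRef:
            name: api-credentials  # assumed to be provisioned outside this manifest
```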

8. Standardize Pod Disruption Budgets (PDBs)

PDBs protect highly available workloads during voluntary disruptions such as node drains and upgrades. They cap how many replicas can be evicted at once, so a minimum number of healthy pods stays up during maintenance windows. They do not guard against involuntary failures such as a node crash.
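
A minimal sketch, assuming a hypothetical Deployment labeled `app: api` that runs at least three replicas:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb        # hypothetical
spec:
  minAvailable: 2      # evictions are refused if they would leave fewer than 2 healthy pods
  selector:
    matchLabels:
      app: api
```

Setting `minAvailable` equal to the replica count would block node drains entirely, so leave room for at least one eviction.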

9. Adopt Horizontal Pod Autoscaling (HPA)

Let workloads scale dynamically based on CPU/memory usage or custom metrics (e.g., HTTP request queue depth). HPA adds pods and Cluster Autoscaler adds nodes when those pods no longer fit, so the two together protect application performance under sudden traffic spikes.
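
As a sketch, a CPU-based autoscaler for a hypothetical `api` Deployment could look like this. The replica bounds and the 70% target are illustrative; utilization is measured against each pod's CPU request, so the target pods need requests set and the cluster needs a metrics source such as metrics-server.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa                    # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU request
```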

10. Stream Logs and Standardize Telemetry

Direct all application logs to standard output (stdout/stderr). Deploy a node-level agent such as Fluent Bit, typically as a DaemonSet, to collect and ship logs to a central indexing system, and instrument applications with OpenTelemetry to collect distributed traces.