Optimizing Kubernetes in Production
Most Kubernetes problems are configuration decisions made early and never revisited. Here is how to improve what you have without a rebuild.
Most clusters were not designed — they evolved
A Kubernetes cluster that works for five services often does not work for thirty. What starts as a reasonable setup — a few node pools, basic namespaces, default resource limits — quietly accumulates operational debt as the system grows.
The architecture does not break all at once. It degrades. Deployments get slower. Resource behavior becomes unpredictable. Teams start working around the cluster instead of with it.
By the time the problems are obvious, they have usually been present for months. The good news: the cluster itself is rarely the problem. The way it has been configured over time is — and that is fixable without starting over.
Slow or failing deployments are the clearest signal
If your team has accepted slow deployments as normal, that acceptance is worth questioning. A deployment that takes fifteen or twenty minutes, requires manual monitoring, or fails frequently enough that rollback has become a routine step is not a Kubernetes limitation — it is a configuration that has not kept up with how the system has changed.
The root causes are usually structural:
- Readiness and liveness probes that are misconfigured or absent
- Rollout parameters that do not match actual traffic patterns
- Resource limits that do not give pods enough headroom to start cleanly
- No pod disruption budgets, so node drains and upgrades can evict too many replicas at once (the rolling update itself is bounded by maxSurge and maxUnavailable, not by a PDB)
These are not bugs. They are settings that were correct early on and were never revisited as the system grew.
The fix is incremental: understand what the cluster needs to deploy reliably under today's load, then tune the rollout strategy to match. No teardown needed.
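As a starting point for that review, the structural causes listed above can be checked mechanically. The following is a minimal sketch, assuming a Deployment manifest has already been loaded into a Python dict (for example from YAML); the function name audit_deployment and the wording of the findings are hypothetical, and a finding is a prompt to look closer, not proof of a misconfiguration.

```python
from typing import Any


def audit_deployment(manifest: dict[str, Any]) -> list[str]:
    """Return review findings for a Deployment manifest given as a dict."""
    findings: list[str] = []
    spec = manifest.get("spec", {})
    strategy = spec.get("strategy", {})

    # Without an explicit rollingUpdate block, a Deployment falls back to the
    # defaults (25% maxSurge, 25% maxUnavailable), which may not fit the traffic.
    is_rolling = strategy.get("type", "RollingUpdate") == "RollingUpdate"
    if is_rolling and "rollingUpdate" not in strategy:
        findings.append("rollout parameters left at defaults")

    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        if "readinessProbe" not in container:
            findings.append(f"{name}: no readinessProbe")
        if "livenessProbe" not in container:
            findings.append(f"{name}: no livenessProbe")
        if "requests" not in container.get("resources", {}):
            findings.append(f"{name}: no resource requests")
    return findings


# Hypothetical manifest used only for illustration.
example = {
    "kind": "Deployment",
    "spec": {
        "strategy": {"type": "RollingUpdate"},
        "template": {"spec": {"containers": [
            {"name": "api",
             "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}}},
        ]}},
    },
}
print(audit_deployment(example))
```

This only detects absent settings. Whether a probe that is present is tuned sensibly still has to be judged against how long the service actually takes to start.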
Over-provisioning and under-provisioning are the same problem
Clusters running at fifteen percent utilization and clusters where workloads are routinely OOMKilled share the same root cause: nobody has done the work to understand what the workloads actually need.
Over-provisioning looks like a cost problem. It is actually a capacity-planning problem. When teams do not know what their services consume under real conditions, they pad resource requests aggressively and hope.
Under-provisioning looks like a stability problem. It is the same failure in the other direction: limits set too low, autoscaling that cannot respond quickly enough, memory pressure that causes cascading restarts.
Both patterns indicate the same underlying gap:
- Resource requests and limits were set once and never measured against reality
- No systematic process for reviewing actual consumption versus configured values
- Autoscaling configured in isolation from how traffic actually behaves
The improvement here is not a rewrite of your workload manifests — it is an iterative right-sizing pass driven by real observability data. Measure, adjust, re-measure.
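One way to make "measure, adjust, re-measure" concrete is to derive a candidate request from observed usage. This is a sketch under stated assumptions: the usage samples are already exported from your metrics system, the 95th percentile and 20 percent headroom are illustrative defaults, not recommendations, and the names percentile and suggest_request are hypothetical.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    # Clamp so floating-point rounding can never index outside the list.
    rank = min(max(rank, 1), len(ordered))
    return ordered[rank - 1]


def suggest_request(samples: list[float], pct: float = 95.0,
                    headroom: float = 0.2) -> float:
    """Candidate request: a high percentile of observed usage plus headroom."""
    return percentile(samples, pct) * (1 + headroom)


# Hypothetical CPU usage samples in millicores for one container.
cpu_millicores = [100, 120, 110, 130, 300, 115, 125, 105, 118, 122]
print(percentile(cpu_millicores, 50))   # typical usage
print(suggest_request(cpu_millicores))  # candidate request including the burst
```

The gap between the median and the high percentile is itself useful: a large gap suggests a bursty workload, where the request and the limit probably deserve different values.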
The scaling misdiagnosis
One of the most common and costly mistakes we see: a team notices degraded performance, pulls up their metrics, and reaches for the wrong solution because they misread what the numbers are telling them.
Horizontal scaling adds more pod replicas. Vertical scaling gives each pod more CPU or memory. They solve different problems. Applying the wrong one does not just fail to fix the issue — it actively obscures it.
The misdiagnosis usually goes one of two ways:
- High CPU or memory usage triggers a scale-up (vertical) response — bigger nodes, higher limits — when the actual problem is too few replicas handling too much concurrent load. The pods do not need more resources per instance; they need more instances.
- Slow or degraded performance triggers a scale-out (horizontal) response — more replicas — when the actual problem is resource starvation. Each pod is hitting its CPU or memory limit. Spinning up more replicas just gives you more starving pods.
Metrics make this worse before they make it better. A CPU spike looks the same whether it is caused by a workload that needs more replicas or one that needs a higher limit. Without understanding the actual consumption pattern — steady-state vs. burst, per-request vs. per-connection — the numbers alone do not tell you which lever to pull.
The right starting point is not the metric. It is the question: is this workload being throttled, or is it being overwhelmed? Throttling is a vertical problem. Being overwhelmed is a horizontal one.
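The throttled-versus-overwhelmed question can be written down as a small decision rule. The sketch below is illustrative only: the field names, the thresholds (25 percent throttled periods, 90 percent of the memory limit), and the idea of a per-pod "comfortable load" taken from a load test are all assumptions, and real diagnosis needs the consumption pattern as well as the snapshot.

```python
from dataclasses import dataclass


@dataclass
class WorkloadSnapshot:
    throttled_ratio: float    # share of CPU periods that were throttled (0..1)
    memory_used_ratio: float  # memory working set divided by memory limit (0..1)
    load_per_replica: float   # e.g. concurrent requests each pod is handling
    comfortable_load: float   # load one pod handled well in a load test


def suggest_direction(s: WorkloadSnapshot,
                      throttle_threshold: float = 0.25,
                      memory_threshold: float = 0.9) -> str:
    """Map a snapshot to 'vertical', 'horizontal', 'both', or 'neither'."""
    # Throttled: each pod is pressing against its own CPU or memory limit.
    throttled = (s.throttled_ratio >= throttle_threshold
                 or s.memory_used_ratio >= memory_threshold)
    # Overwhelmed: each pod is asked to carry more load than it can handle well.
    overwhelmed = s.load_per_replica > s.comfortable_load
    if throttled and overwhelmed:
        return "both"
    if throttled:
        return "vertical"
    if overwhelmed:
        return "horizontal"
    return "neither"


print(suggest_direction(WorkloadSnapshot(0.40, 0.50, 20, 50)))
print(suggest_direction(WorkloadSnapshot(0.02, 0.40, 120, 50)))
```

When the answer is "both", raising limits first and re-measuring usually makes the remaining horizontal question easier to read, because throttled pods distort per-replica load figures.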
The deeper problem: capturing the wrong metrics
The scaling misdiagnosis is usually downstream of a more fundamental gap: teams are not capturing the metrics that actually explain Kubernetes behavior.
Most teams start with what is easy to see — node-level CPU and memory from their cloud provider dashboard. These numbers are real, but they are too coarse to diagnose container-level problems. A node running at sixty percent CPU tells you almost nothing about which workloads are being throttled, which have headroom, or whether any pod is actually hitting its configured limit.
The metrics that matter for Kubernetes diagnosis are container-level, not node-level:
- CPU throttling (container_cpu_cfs_throttled_seconds_total, or the ratio of throttled periods to total periods): tells you whether a container is hitting its CPU limit, which can happen even while average utilization looks comfortable
- Memory working set vs. memory limit: reclaimable page cache inflates raw usage numbers; the working set is the better indicator of how close a container is to being OOMKilled
- Request latency at p95 and p99: averages hide the tail behavior that users actually experience
- Pod restarts and OOMKill events: these are symptoms that often go untracked until they cause an incident
- HPA scaling events and VPA recommendations: what is triggering autoscaling, and whether it is responding to the right signals
Without these signals, teams are making configuration decisions on incomplete information. The cloud provider dashboard says the cluster is healthy. The application is slow. Nobody can explain the gap because the instrumentation does not exist to close it.
Getting container-level observability in place is the prerequisite for every other improvement. Every other lever depends on it.
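Two of the container-level signals above reduce to simple arithmetic once the raw numbers are available. This sketch assumes you can read two samples of the cAdvisor counters container_cpu_cfs_periods_total and container_cpu_cfs_throttled_periods_total, plus the working set and the configured memory limit in bytes; the function names are hypothetical, and in practice the same ratios are usually computed in the metrics system's own query language.

```python
def throttled_ratio(periods_start: float, periods_end: float,
                    throttled_start: float, throttled_end: float) -> float:
    """Share of CPU scheduling periods throttled between two counter samples."""
    total = periods_end - periods_start
    if total <= 0:
        # No periods elapsed, or the counter reset: nothing meaningful to report.
        return 0.0
    return (throttled_end - throttled_start) / total


def memory_headroom(working_set_bytes: int, limit_bytes: int) -> float:
    """Fraction of the memory limit still free, based on the working set."""
    if limit_bytes <= 0:
        raise ValueError("container has no memory limit")
    return 1 - working_set_bytes / limit_bytes


MIB = 1024 * 1024
print(throttled_ratio(1000, 1600, 50, 200))   # 150 of 600 periods throttled
print(memory_headroom(768 * MIB, 1024 * MIB)) # a quarter of the limit remains
```

A container can show a high throttled ratio while its node reports moderate CPU, which is exactly the gap a node-level dashboard cannot explain.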
The common thread
Slow deployments and resource mismanagement share an origin: the cluster was configured for what the system looked like at one point in time, and was never revisited as the system grew.
This is not a failure of engineering judgment. It is what happens when infrastructure is treated as something you set up once rather than something you maintain. The fix is also not a one-time event — it is bringing the same ongoing architectural attention to the cluster that you already bring to the applications running on it.
What incremental improvement looks like
You do not need to start over. In most cases, the cluster itself is fine — the configuration around it has drifted, and that is recoverable through targeted, iterative work.
A typical improvement engagement looks like this:
- Auditing actual resource consumption and right-sizing requests and limits across workloads
- Reviewing rollout strategy — replica counts, readiness probes, pod disruption budgets, rollout parameters
- Adjusting node pool structure to match actual workload profiles (when warranted — often not)
- Tuning autoscaling so it responds predictably under real traffic conditions
- Cleaning up namespace and RBAC structure that has drifted from its original intent
In practice, roughly eighty percent of the operational improvement comes from resource configuration and deployment-strategy tuning, not from rebuilding infrastructure.
Each of these can be done independently, in low-risk increments, while the cluster keeps serving traffic. There is no teardown, no migration window, no cutover risk.
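For the autoscaling item in the list above, it helps to know the core calculation the Horizontal Pod Autoscaler performs, since "predictable" scaling mostly means predictable inputs to it. The sketch below reproduces the documented formula, desired = ceil(current replicas x current metric / target metric), with the default 10 percent tolerance; it is a simplification that ignores stabilization windows, missing metrics, and pods that are not yet ready, and the function name and bounds are illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Simplified HPA calculation for a single metric."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 90, 60))   # 50% over target: scale out
print(desired_replicas(4, 63, 60))   # within tolerance: no change
print(desired_replicas(8, 120, 60))  # would double, but capped at max_replicas
```

Because the ratio is taken against the target, a target set from guessed rather than measured utilization makes every scaling decision inherit that guess.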
When to invest
Not every cluster issue requires a sustained improvement effort. But two indicators reliably point to one:
- Deployments your team does not trust — slow, unreliable, or requiring manual intervention to recover
- Resource behavior nobody can explain — utilization patterns that do not match expectations, scaling that surprises people, OOMKills that feel random
If either of these is true, the architectural debt is already costing you. The question is whether to invest in improving the cluster deliberately or wait until it forces the issue.
Final thought
Kubernetes is supposed to make infrastructure easier to reason about, not harder. If the cluster has become something your team manages around rather than with, the configuration has drifted from its purpose — but the path back is not a rebuild.
It is a maintenance practice: measure, tune, re-measure. Most of what looks broken is recoverable without starting over. The starting point is an honest look at what the cluster is actually doing versus what it was configured to do, and a willingness to improve it one decision at a time.