Why Kubernetes Teams Trust CI/CD But Not Resource Automation
A survey of 321 practitioners reveals the trust gap holding back automated rightsizing as AI inference workloads arrive.

Kubernetes teams deploy code to production dozens of times daily without hesitation. Yet those same engineers pause when automation suggests adjusting CPU and memory requests on running workloads. A survey of 321 enterprise Kubernetes practitioners reveals the scale of this trust asymmetry—and why it's becoming expensive as AI inference lands on Kubernetes clusters.
The deployment versus rightsizing divide
According to the survey conducted earlier this year, 82% of practitioners report high or complete trust in automated delivery controls. But 71% still require human review before applying resource optimization recommendations. Only 27% allow CPU and memory changes to be auto-applied, even with guardrails in place.
The difference comes down to perceived risk. Code deployments feel additive—teams ship new value, rollback paths are well-understood, and failures surface quickly. Resource rightsizing feels subtractive because it removes safety margin from running services. The failure mode is fundamentally different.
When resource requests change, Kubernetes scheduling, prioritization, and allocation all shift in ways that aren't immediately visible. A problem might not surface until weeks later during a traffic spike, long after other changes have been made. Proving causation becomes nearly impossible, and the engineers responsible are the ones getting paged at 2 a.m.
Why it matters
GPU-accelerated inference workloads are forcing this issue. GPU compute costs significantly more per hour than CPU, turning over-provisioning from an acceptable buffer into a material expense. Meanwhile, inference jobs exhibit bursty patterns teams haven't built intuition for, and resource tuning involves at least four dimensions per workload across potentially thousands of workloads per cluster. The survey indicates manual optimization breaks down around 250 changes daily—a threshold inference workloads push teams past faster than traditional services ever did.
What practitioners actually want
When asked what would increase trust in optimization automation, 48% of respondents cited visibility and transparency into how decisions are made. Another 25% wanted proven guardrails, and 23% needed instant rollback capability.
Notably, practitioners didn't ask for full manual control or blind autonomy. They described automation that earns trust in stages—the same path CI/CD followed over years before teams trusted it with production deploys on every commit.
The teams furthest along started with single namespaces in development environments, observed system behavior, compared recommendations with outcomes, and gradually expanded scope. Different environments maintained different automation maturity levels simultaneously, and that was intentional.
Designing for gradual adoption
Some automation systems deliver value only with full delegation, which creates an adoption problem: they demand trust that organizations haven't built yet. The alternative is adaptive autonomy, meaning systems designed to function at every stage of the trust curve.
Teams still evaluating get useful recommendations in read-only mode. Teams ready to act within boundaries can run guardrailed execution with limits they define. As confidence grows, the system handles more autonomous decisions while humans manage exceptions. Eventually, for environments where track records support it, closed-loop optimization runs in the background and becomes routine.
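As a rough illustration of staged autonomy, the Python sketch below gives each environment its own mode and change limit. Everything in it is hypothetical: the `Mode` names, the `POLICY` table, the percentages, and the `decide` function are assumptions made for illustration, not taken from the survey or from any particular tool. For brevity it folds the intermediate "humans manage exceptions" stage into the guardrailed mode.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical stages on the trust curve."""
    READ_ONLY = "read_only"      # recommendations only, nothing is applied
    GUARDRAILED = "guardrailed"  # auto-apply only within a team-defined limit
    CLOSED_LOOP = "closed_loop"  # auto-apply every recommendation


@dataclass(frozen=True)
class Recommendation:
    workload: str
    current_cpu_millicores: int
    proposed_cpu_millicores: int


# Assumed per-environment policy: (mode, max relative change per recommendation).
# Different environments sit at different maturity levels at the same time.
POLICY = {
    "dev": (Mode.CLOSED_LOOP, 1.00),
    "staging": (Mode.GUARDRAILED, 0.10),
    "prod": (Mode.READ_ONLY, 0.00),
}


def decide(rec: Recommendation, environment: str) -> str:
    """Return 'recommend', 'apply', or 'review' for one recommendation."""
    mode, limit = POLICY[environment]
    if mode is Mode.READ_ONLY:
        return "recommend"
    if mode is Mode.CLOSED_LOOP:
        return "apply"
    # Guardrailed: apply small changes, send larger ones to a human.
    change = abs(rec.proposed_cpu_millicores - rec.current_cpu_millicores)
    relative_change = change / rec.current_cpu_millicores
    return "apply" if relative_change <= limit else "review"
```

Under these assumed limits, a 5% CPU reduction in staging would be applied automatically, a 30% reduction would go to human review, and the same recommendation in production would only be reported.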
This design distinction matters more with AI workloads because trust-building starts from zero on exactly the workloads where the cost of errors is highest. Safe rollout practices support the same gradual path: start with the workloads showing the most headroom, keep each change small enough to contain a bad outcome, tie fast rollback to existing health signals, and make participation opt-in rather than opt-out.
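The rollout rules above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the workload fields (`request`, `peak_usage`, `opt_in`), the 10% step cap, and the `apply` and `healthy` callbacks are hypothetical stand-ins for whatever a real system would use to patch resource requests and read health signals.

```python
def pick_candidates(workloads):
    """Keep opted-in workloads only, ordered by headroom (largest first).

    Each workload is a dict with hypothetical keys:
    'name', 'request', 'peak_usage', 'opt_in'.
    """
    opted_in = [w for w in workloads if w["opt_in"]]
    return sorted(
        opted_in,
        key=lambda w: w["request"] - w["peak_usage"],
        reverse=True,
    )


def next_request(current, target, max_step_fraction=0.10):
    """Move `current` toward `target`, capped at a fraction of `current`."""
    max_step = current * max_step_fraction
    delta = target - current
    step = max(-max_step, min(max_step, delta))
    return round(current + step)


def apply_with_rollback(current, target, apply, healthy, max_step_fraction=0.10):
    """Apply one capped step; restore the previous value if health checks fail.

    `apply(value)` and `healthy()` are assumed callbacks supplied by the caller.
    Returns the request value that is in effect afterwards.
    """
    proposed = next_request(current, target, max_step_fraction)
    apply(proposed)
    if healthy():
        return proposed
    apply(current)  # roll back to the last known-good value
    return current
```

Capping each step means a recommendation to halve a request is reached over several small moves, each of which can be checked against health signals before the next one is attempted.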
The 71% figure isn't resistance to automation—it's how operational trust actually forms: conditional, earned over time, and moving at different speeds depending on stakes. As one survey participant noted, automated rightsizing carries unique risk because it directly impacts application runtime stability, altering the invisible contract between workload and scheduler.
These findings were first reported by The New Stack based on their survey of Kubernetes practitioners at enterprise organizations.
This is an original analysis by the Omega editorial team. Source reporting: Automation Watch.

