VPA Resource Optimization Guide
How to use VPA to right-size Kubernetes resource requests based on actual workload behavior.
TL;DR — Just Tell Me What To Do
Everything is automatic. A Kyverno ClusterPolicy auto-creates VPA resources for every workload in the cluster. Infrastructure/monitoring namespaces get updateMode: "Off" (recommend only). User app namespaces get updateMode: "InPlaceOrRecreate" (auto-tuning with in-place pod resize).
Step 1: Check recommendations
# Human-readable VPA report
./scripts/vpa-report.sh
# Filter to one namespace
./scripts/vpa-report.sh argocd
# Or raw kubectl one-liner
kubectl get vpa -A -o custom-columns=\
NS:.metadata.namespace,\
NAME:.metadata.name,\
CPU:.status.recommendation.containerRecommendations[0].target.cpu,\
MEM:.status.recommendation.containerRecommendations[0].target.memory
Step 2: Open Grafana VPA dashboard
Go to https://grafana.vanillax.me and search for "VPA". The dashboard shows time-series graphs of VPA recommendations with historical trends.
Step 3: Look for problems
Look for:
- Current request way below "Target" → pod is starved; increase it
- Current request way above "Target" → wasting resources; decrease it
- Current request below "Lower Bound" → pod is actively throttled; fix ASAP
Step 4: Apply changes (infrastructure only)
Infrastructure namespaces use updateMode: "Off" — edit the app's values.yaml in Git, update the resources: block, push, ArgoCD applies it. Add a comment explaining why:
# VPA-optimized (2026-02-28) — target was 2000m, previous 500m
resources:
  requests:
    cpu: 2000m
    memory: 1Gi
User app namespaces use updateMode: "InPlaceOrRecreate" — VPA automatically adjusts resources via in-place pod resize (K8s 1.35 GA). No manual intervention needed.
Step 5: Wait and re-check
VPA recommendations update continuously. Check back in a week to see if the new values are good. Don't change things daily.
Architecture
kubelet /metrics/resource
│
▼
metrics-server (provides metrics.k8s.io API)
│
▼
VPA Recommender (reads metrics, writes recommendations to VPA .status)
▲
│
Kyverno ClusterPolicy (vpa-auto-generate)
│ • watches Deployments, StatefulSets, DaemonSets
│ • auto-creates VPA per workload
│ • infra/monitoring namespaces → updateMode: "Off"
│ • user app namespaces → updateMode: "InPlaceOrRecreate"
│ • GPU workloads → updateMode: "Off"
▼
VPA resources (one per workload)
│
├─ Infra namespaces: recommend-only (manual review)
└─ App namespaces: auto-resize (InPlaceOrRecreate)
│
▼
Human reviews infra → updates values.yaml → Git push → ArgoCD applies
VPA Updater auto-resizes app pods → no human intervention needed
Kyverno is the sole VPA creator. The vpa-auto-generate ClusterPolicy watches all workloads and generates VPA resources automatically. No manual VPA manifests needed.
Components
| Component | Chart | Namespace | Location |
|---|---|---|---|
| metrics-server | metrics-server/metrics-server | kube-system | infrastructure/controllers/metrics-server/ |
| VPA | fairwinds-stable/vpa | vertical-pod-autoscaler | infrastructure/controllers/vertical-pod-autoscaler/ |
| Kyverno VPA policy | — | kyverno | infrastructure/controllers/kyverno/policies/vpa-auto-generate.yaml |
metrics-server and VPA are deployed via the Infrastructure ApplicationSet (Wave 4). The Kyverno policy is deployed as part of Kyverno (Wave 3).
VPA Sub-Components
| Component | Purpose |
|---|---|
| Recommender | Analyzes metrics, generates recommendations |
| Updater | Applies changes when mode is not Off (evicts or in-place resizes) |
| Admission Controller | Sets resources on new pods when mode is not Off |
Update Modes by Namespace
| Namespace Type | Update Mode | Behavior |
|---|---|---|
| Infrastructure (argocd, cilium, etc.) | Off | Recommend only — manual GitOps workflow |
| Monitoring (prometheus-stack, loki-stack, etc.) | Off | Recommend only — manual GitOps workflow |
| GPU workloads (runtimeClassName: nvidia) | Off | Recommend only — VPA can't manage GPU resources |
| User apps (everything else) | InPlaceOrRecreate | Auto-resize pods without restart when possible |
CLI Tools & Scripts
vpa-report.sh
The scripts/vpa-report.sh script provides a formatted table of all VPA recommendations with human-readable values.
Example output:
==========================================
VPA Resource Recommendations Report
==========================================
NAMESPACE WORKLOAD CONTAINER CPU TGT CPU RANGE MEM TGT MEM RANGE
-------------------------------------------------------------------------------------------------------------------------------------------------
argocd Deployment/argocd-server server 23m 12m-100m 175Mi 88Mi-700Mi
argocd Deployment/argocd-repo-server repo-server 2975m 1488m-11900m 523Mi 262Mi-2.0Gi
...
Total: 42 containers with VPA recommendations
Action needed if your current request is:
< lowerBound → INCREASE NOW (pod is being throttled)
< target → INCREASE (under-provisioned)
≈ target → KEEP (well-tuned)
> 2x target → DECREASE (over-provisioned)
kubectl One-Liners
# Quick overview: all VPA targets across the cluster
kubectl get vpa -A -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
CPU:.status.recommendation.containerRecommendations[0].target.cpu,\
MEM:.status.recommendation.containerRecommendations[0].target.memory
# Detailed view for a specific namespace
kubectl get vpa -n argocd -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{range .status.recommendation.containerRecommendations[*]}{" "}{.containerName}{": cpu="}{.target.cpu}{" mem="}{.target.memory}{"\n"}{end}{end}'
# Full detail for a specific VPA
kubectl describe vpa <name> -n <namespace>
# Current resource usage vs requests (side-by-side comparison)
kubectl top pods -n <namespace>
kubectl get deploy <name> -n <ns> -o jsonpath='{.spec.template.spec.containers[0].resources}'
kubectl get vpa <name> -n <ns> -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
Reading Recommendations
The Four VPA Values
VPA recommendations include four values per container:
| Value | Meaning | Use For |
|---|---|---|
| lowerBound | Minimum to avoid throttling/OOM | Red flag if current request is below this |
| target | Optimal request based on observed usage | Set requests: to this value |
| upperBound | Peak observed consumption | Informs limits: setting |
| uncappedTarget | Ideal ignoring any VPA min/max constraints | Same as target when no constraints are set |
Memory values are in bytes. Quick conversions:
- 104857600 = 100Mi
- 268435456 = 256Mi
- 536870912 = 512Mi
- 1073741824 = 1Gi
- 1610612736 = 1.5Gi
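Converting raw byte values by hand gets old quickly. A small helper sketch (not part of the repo's scripts — `bytes_to_human` is a hypothetical name) does the same conversions:

```shell
# Convert a raw VPA memory value (bytes) into Mi/Gi for readability.
# Hypothetical helper, not part of scripts/ in this repo.
bytes_to_human() {
  local b=$1
  if [ "$b" -ge 1073741824 ]; then
    # 1Gi = 1073741824 bytes
    awk -v b="$b" 'BEGIN { printf "%.1fGi\n", b / 1073741824 }'
  else
    # 1Mi = 1048576 bytes
    awk -v b="$b" 'BEGIN { printf "%.0fMi\n", b / 1048576 }'
  fi
}

bytes_to_human 104857600    # 100Mi
bytes_to_human 1610612736   # 1.5Gi
```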
Decision Matrix
| Situation | Action | Why |
|---|---|---|
| Current request < lowerBound | INCREASE NOW | Pod is being throttled or OOM-killed |
| Current request < target | INCREASE | Under-provisioned, degraded performance |
| Current request within 20% of target | KEEP | Already well-tuned |
| Current request > 1.5x target | DECREASE | Over-provisioned, wasting resources |
| Current request > 5x target | DECREASE | Heavily over-provisioned |
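The matrix reduces to a simple comparison. A sketch of that logic (a hypothetical `classify_request` helper, simplified to three thresholds; values in millicores, using the ArgoCD numbers from the example later in this doc):

```shell
# Classify a current CPU request against VPA lowerBound/target (millicores).
# Illustrative simplification of the decision matrix above.
classify_request() {
  local current=$1 lower=$2 target=$3
  if [ "$current" -lt "$lower" ]; then
    echo "INCREASE NOW"                      # below lowerBound: throttled/OOM risk
  elif [ "$current" -lt "$target" ]; then
    echo "INCREASE"                          # under-provisioned
  elif [ "$current" -gt $(( target * 3 / 2 )) ]; then
    echo "DECREASE"                          # more than 1.5x target: over-provisioned
  else
    echo "KEEP"                              # close enough to target
  fi
}

classify_request 1000 1021 2048   # INCREASE NOW (argocd controller, before)
classify_request 500  12   23     # DECREASE (argocd server, before)
```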
Timing
- Wait at least 7 days before trusting VPA numbers. Initial recommendations are noisy.
- Review weekly, not daily. Over-correcting defeats the purpose.
- Re-check after major changes (new features, traffic spikes, version upgrades). VPA is backward-looking.
- Upper bounds stabilize over ~14 days. They'll be very wide initially.
Applying Changes (GitOps Workflow)
For Infrastructure Namespaces (updateMode: Off)
- Read the VPA recommendation (`./scripts/vpa-report.sh` or the Grafana dashboard)
- Update the app's `values.yaml` with new resource requests
- Add a comment documenting the VPA data and reasoning:
# VPA-optimized (YYYY-MM-DD)
# VPA target: cpu Xm, memory Y
# Previous: cpu Am (reason for change)
resources:
  requests:
    cpu: Xm      # Match VPA target
    memory: Y    # Match VPA target + buffer
  limits:
    cpu: 2Xm     # 2x request for burst
    memory: 2Y   # 2x request for spikes
- Git commit and push — ArgoCD applies via GitOps
For User App Namespaces (updateMode: InPlaceOrRecreate)
No manual action needed. VPA automatically:
1. Watches pod resource usage
2. Calculates optimal requests
3. Patches pods in-place (K8s 1.35 GA feature)
4. Falls back to evict + recreate if in-place resize fails
Setting Requests vs Limits
| Field | Rule of Thumb |
|---|---|
| requests.cpu | VPA target (or 1.1-1.2x for buffer) |
| requests.memory | VPA target (or 1.2-1.5x — memory OOM is fatal, CPU throttling is not) |
| limits.cpu | 2-4x request (allows burst). Or omit entirely to let pods burst freely. |
| limits.memory | 2-4x request (or match VPA upperBound if spikes are expected) |
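These rules of thumb are easy to apply mechanically. A small sketch (hypothetical `suggest_resources` helper, not part of the repo's scripts; CPU in millicores, memory in Mi, using the 2x limit rule):

```shell
# Turn a VPA target into request/limit suggestions per the rules of thumb above.
# Hypothetical helper; requests = target, limits = 2x request.
suggest_resources() {
  local cpu_m=$1 mem_mi=$2
  echo "requests: cpu=${cpu_m}m memory=${mem_mi}Mi"
  echo "limits: cpu=$(( cpu_m * 2 ))m memory=$(( mem_mi * 2 ))Mi"
}

suggest_resources 2975 523   # argocd-repo-server's VPA target
# requests: cpu=2975m memory=523Mi
# limits: cpu=5950m memory=1046Mi
```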
Common Workload Patterns
CPU-Bound (Helm rendering, image processing)
High CPU target, low memory target. Increase CPU generously, keep memory modest.
Example: argocd-repo-server
VPA target: cpu 2975m, memory 523Mi
Action: cpu 3000m request, memory 768Mi request
Memory-Bound (Databases, caches)
Low CPU target, high memory target. Increase memory, keep CPU low.
Idle/Lightweight (UI servers, webhooks)
Both CPU and memory very low. Set modest requests with generous limits for occasional spikes.
Example: argocd-server
VPA target: cpu 23m, memory 175Mi
Action: cpu 50m request, memory 256Mi request
GPU Workloads
VPA only tracks CPU/memory, not GPU. Recommendations will show low CPU/memory because compute happens on GPU VRAM. Set CPU/memory based on data loading needs, not inference. GPU workloads automatically get updateMode: "Off" via the Kyverno policy.
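The GPU carve-out keys off the pod spec's runtime class. A minimal sketch of a workload that would fall under the Off rule (all names and the image are placeholders, not from this repo):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference          # placeholder name
  namespace: ai
spec:
  replicas: 1
  selector:
    matchLabels: { app: llm-inference }
  template:
    metadata:
      labels: { app: llm-inference }
    spec:
      runtimeClassName: nvidia          # what the Kyverno GPU rule matches on
      containers:
        - name: server
          image: ghcr.io/example/llm:latest   # placeholder image
          resources:
            requests:
              cpu: 500m                 # sized for data loading, not inference
              memory: 2Gi
            limits:
              nvidia.com/gpu: 1         # GPU count — VPA never touches this
```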
Real-World Example: ArgoCD Optimization
Before (manual guesswork)
controller: cpu: 1000m, memory: 1Gi # UNDER-PROVISIONED (below lowerBound!)
repo-server: cpu: 1000m, memory: 1Gi # UNDER-PROVISIONED 3x
server: cpu: 500m, memory: 512Mi # OVER-PROVISIONED 20x
applicationSet: cpu: 250m, memory: 256Mi # OVER-PROVISIONED 5x
redis: cpu: 100m, memory: 128Mi # OVER-PROVISIONED 4x
Total: 2.85 CPU, 2.9Gi memory
VPA Said
controller: target: 2048m CPU, 1.25Gi memory (lowerBound: 1021m > current 1000m!)
repo-server: target: 2975m CPU, 523Mi memory
server: target: 23m CPU, 175Mi memory
applicationSet: target: 49m CPU, 100Mi memory
redis: target: 23m CPU, 100Mi memory
After (VPA-optimized)
controller: cpu: 2000m, memory: 4Gi # DOUBLED CPU, quadrupled memory
repo-server: cpu: 3000m, memory: 768Mi # TRIPLED CPU, halved memory
server: cpu: 50m, memory: 512Mi # REDUCED 10x CPU
applicationSet: cpu: 100m, memory: 128Mi # REDUCED 2.5x
redis: cpu: 50m, memory: 128Mi # REDUCED 2x
Total: 5.2 CPU, 5.5Gi memory
Result: +2.35 CPU where it was needed (controller/repo-server), memory properly sized, no more throttling.
See infrastructure/controllers/argocd/values.yaml for the actual implementation with inline VPA documentation.
In-Place Pod Resize (K8s 1.35)
This cluster runs K8s v1.35.1 where In-Place Pod Resize is GA. VPA supports updateMode: "InPlaceOrRecreate" which resizes pods without restarting them when possible.
How It Works
- The VPA Updater watches pods running in `InPlaceOrRecreate` mode
- If the recommendation differs significantly from the current resources, it patches the pod spec
- The kubelet applies the new CPU/memory values without restarting the container (when supported)
- If the in-place resize fails, the pod is evicted and recreated with the new resources
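A rough manual equivalent of the Updater's patch, for illustration only: since K8s 1.33 resource changes go through the pod's `resize` subresource, which recent kubectl can target via `--subresource`. Pod name, namespace, and container name below are placeholders:

```shell
# Sketch of the patch a VPA in-place resize applies (not the Updater's actual code path).
# "web-0" / "myapp" / "web" are placeholder pod/namespace/container names.
PATCH='{"spec":{"containers":[{"name":"web","resources":{"requests":{"cpu":"150m","memory":"256Mi"}}}]}}'

# With kubectl >= 1.33 against this cluster, this would apply it:
# kubectl patch pod web-0 -n myapp --subresource resize --type merge -p "$PATCH"
echo "$PATCH"
```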
Namespace Strategy
The Kyverno vpa-auto-generate policy sets update modes automatically:
- Infrastructure/monitoring: Off — changes go through GitOps review
- User apps: InPlaceOrRecreate — automatic resource adjustment
- GPU workloads: Off — VPA can't manage GPU resources
Kyverno VPA Policy
How It Works
The vpa-auto-generate ClusterPolicy (infrastructure/controllers/kyverno/policies/vpa-auto-generate.yaml) watches for Deployment, StatefulSet, and DaemonSet resources and generates a matching VPA.
Three rules:
1. generate-vpa-infra-off: Infrastructure/monitoring namespaces get updateMode: "Off"
2. generate-vpa-gpu-off: GPU workloads (runtimeClassName: nvidia) get updateMode: "Off"
3. generate-vpa-apps-auto: Everything else gets updateMode: "InPlaceOrRecreate"
Generated VPAs have ownerReferences set to the parent workload, so they're automatically cleaned up when the workload is deleted.
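Under these rules, a Kyverno-generated VPA for an app workload looks roughly like this (a sketch of the expected shape, not copied from the policy; the `immich-server` name and the owner UID are placeholders):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: immich-server            # matches the workload name (placeholder)
  namespace: immich
  labels:
    app.kubernetes.io/managed-by: kyverno
  ownerReferences:               # garbage-collected with the parent workload
    - apiVersion: apps/v1
      kind: Deployment
      name: immich-server
      uid: "<workload uid>"      # placeholder — set by Kyverno at generation time
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: immich-server
  updatePolicy:
    updateMode: "InPlaceOrRecreate"   # "Off" in infra/monitoring/GPU cases
```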
Excluded Namespaces
- `kube-system` — excluded from all rules
- `kyverno` — excluded from all rules (prevents a circular dependency)
- `volsync-system` — excluded from all rules (transient mover jobs)
Checking Generated VPAs
# See all Kyverno-managed VPAs
kubectl get vpa -A -l app.kubernetes.io/managed-by=kyverno
# Check a specific VPA's update mode
kubectl get vpa -n immich -o jsonpath='{.items[0].spec.updatePolicy.updateMode}'
# Expected: InPlaceOrRecreate
kubectl get vpa -n argocd -o jsonpath='{.items[0].spec.updatePolicy.updateMode}'
# Expected: Off
Troubleshooting
No recommendations showing
- VPA needs ~5-10 minutes for initial data, 24+ hours for accuracy
- Check metrics-server: `kubectl top nodes` (should return data)
- Check the VPA recommender: `kubectl logs -n vertical-pod-autoscaler -l app.kubernetes.io/component=recommender`
VPAs not being created
- Check the Kyverno background controller: `kubectl get pods -n kyverno`
- Check Kyverno logs: `kubectl logs -n kyverno -l app.kubernetes.io/component=background-controller`
- Verify the policy is ready: `kubectl get clusterpolicy vpa-auto-generate`
- Check that the VPA CRDs are installed: `kubectl get crd verticalpodautoscalers.autoscaling.k8s.io`
VPA recommendations seem too high/low
- Not enough data — wait 7-14 days
- Workload changed recently — VPA is backward-looking
- Check `upperBound` for peak-usage context
- Batch/cron workloads have spiky usage — use `upperBound` for limits
Pods OOMKilled after applying VPA
- VPA target reflects steady-state, not initialization spikes
- Set `limits.memory` well above `requests.memory` (2-4x)
- Check startup memory with `kubectl top pod` during pod init
Duplicate VPA resources
- Kyverno is the sole VPA creator — if you see duplicates, check for manually created VPAs
- Remove any hand-crafted VPA manifests from Git and let Kyverno manage them
Grafana Dashboard
A community VPA dashboard is auto-provisioned in Grafana under the Infrastructure folder:
| Dashboard | Grafana.com ID | What It Shows |
|---|---|---|
| K8s Autoscaling VPA | 22168 | Cluster overview with drill-down to pod-level VPA details (target, lower/upper bounds) |
URL: https://grafana.vanillax.me → search for "VPA"
This dashboard reads VPA metrics exposed by kube-state-metrics Custom Resource State (kube_customresource_verticalpodautoscaler_*). Combined with vpa-report.sh, you have two ways to view VPA data:
- Grafana VPA dashboard — time-series graphs and historical trends
- CLI — `./scripts/vpa-report.sh` for quick terminal output
Quick Reference
# Human-readable VPA report
./scripts/vpa-report.sh
./scripts/vpa-report.sh <namespace>
# All VPA recommendations (cluster-wide)
kubectl get vpa -A -o custom-columns=\
NS:.metadata.namespace,\
NAME:.metadata.name,\
CPU:.status.recommendation.containerRecommendations[0].target.cpu,\
MEM:.status.recommendation.containerRecommendations[0].target.memory
# Current resource usage vs requests
kubectl top pods -n <namespace>
# Compare current requests vs VPA target
kubectl get deploy <name> -n <ns> -o jsonpath='{.spec.template.spec.containers[0].resources}'
kubectl get vpa <name> -n <ns> -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
# Check Kyverno VPA policy
kubectl get clusterpolicy vpa-auto-generate
kubectl describe clusterpolicy vpa-auto-generate
# Check VPA recommender
kubectl logs -n vertical-pod-autoscaler -l app.kubernetes.io/component=recommender
# List Kyverno-managed VPAs
kubectl get vpa -A -l app.kubernetes.io/managed-by=kyverno
# Monitor VPA auto-resize events
kubectl get events -A --field-selector reason=VpaUpdated
Related Docs
- Monitoring README — metrics-server vs Prometheus pipelines
- VPA component README
- Kyverno VPA policy
Last Updated: 2026-02-28 Cluster: talos-prod-cluster (K8s v1.35.1, Talos v1.12.4)