
VPA Resource Optimization Guide

How to use VPA to right-size Kubernetes resource requests based on actual workload behavior.

Everything is automatic. A Kyverno ClusterPolicy auto-creates VPA resources for every workload in the cluster. Infrastructure/monitoring namespaces get updateMode: "Off" (recommend only). User app namespaces get updateMode: "Initial" (sets optimal resources at pod creation).
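
For reference, a generated VPA looks roughly like the sketch below. Names (my-app, media) are illustrative; the authoritative manifest shape lives in the Kyverno policy.

# Sketch of a Kyverno-generated VPA for a user app workload (illustrative names)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
  namespace: media
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"   # "Off" for infra/monitoring/GPU workloads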

# Human-readable VPA report
./scripts/vpa-report.sh
# Filter to one namespace
./scripts/vpa-report.sh argocd
# Or raw kubectl one-liner
kubectl get vpa -A -o custom-columns=\
NS:.metadata.namespace,\
NAME:.metadata.name,\
CPU:.status.recommendation.containerRecommendations[0].target.cpu,\
MEM:.status.recommendation.containerRecommendations[0].target.memory

Go to https://grafana.vanillax.me and search for “VPA”. The dashboard shows time-series graphs of VPA recommendations with historical trends.

Look for:

  • Current request way below “Target” = pod is starved, increase it
  • Current request way above “Target” = wasting resources, decrease it
  • Current request below “Lower Bound” = pod is actively throttled, fix ASAP

Step 4: Apply changes (infrastructure only)


Infrastructure namespaces use updateMode: "Off" — edit the app’s values.yaml in Git, update the resources: block, push, ArgoCD applies it. Add a comment explaining why:

# VPA-optimized (2026-02-28) — target was 2000m, previous 500m
resources:
  requests:
    cpu: 2000m
    memory: 1Gi

User app namespaces use updateMode: "Initial" — VPA sets optimal resources when pods are created. No manual intervention needed.

VPA recommendations update continuously. Check back in a week to see if the new values are good. Don’t change things daily.


kubelet /metrics/resource
  ↓
metrics-server (provides metrics.k8s.io API)
  ↓
VPA Recommender (reads metrics, writes recommendations to VPA .status)
  ↓
Kyverno ClusterPolicy (vpa-auto-generate)
│ • watches Deployments, StatefulSets, DaemonSets
│ • auto-creates VPA per workload
│ • infra/monitoring namespaces → updateMode: "Off"
│ • user app namespaces → updateMode: "Initial"
│ • GPU workloads → updateMode: "Off"
  ↓
VPA resources (one per workload)
├─ Infra namespaces: recommend-only → human reviews → updates values.yaml → Git push → ArgoCD applies
└─ App namespaces: VPA Admission Controller sets resources at pod creation → no human intervention needed

Kyverno is the sole VPA creator. The vpa-auto-generate ClusterPolicy watches all workloads and generates VPA resources automatically. No manual VPA manifests needed.
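
To spot strays, list VPAs that lack the Kyverno management label (a != selector also matches resources missing the label entirely, so a correct setup returns nothing):

# Any VPA not labeled as Kyverno-managed is a stray manual manifest
kubectl get vpa -A -l 'app.kubernetes.io/managed-by!=kyverno'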


| Component | Chart | Namespace | Location |
| --- | --- | --- | --- |
| metrics-server | metrics-server/metrics-server | kube-system | infrastructure/controllers/metrics-server/ |
| VPA | kubernetes.github.io/autoscaler | vertical-pod-autoscaler | infrastructure/controllers/vertical-pod-autoscaler/ |
| Kyverno VPA policy | — | kyverno | infrastructure/controllers/kyverno/policies/vpa-auto-generate.yaml |

metrics-server and VPA are deployed via the Infrastructure ApplicationSet (Wave 4). The Kyverno policy is deployed as part of Kyverno (Wave 3).

| Component | Purpose |
| --- | --- |
| Recommender | Analyzes metrics, generates recommendations |
| Updater | Applies changes when mode is not Off (evicts or in-place resizes) |
| Admission Controller | Sets resources on new pods when mode is not Off |

| Namespace Type | Update Mode | Behavior |
| --- | --- | --- |
| Infrastructure (argocd, cilium, etc.) | Off | Recommend only — manual GitOps workflow |
| Monitoring (prometheus-stack, loki-stack, etc.) | Off | Recommend only — manual GitOps workflow |
| GPU workloads (runtimeClassName: nvidia) | Off | Recommend only — VPA can’t manage GPU resources |
| User apps (everything else) | Initial | Set optimal resources at pod creation |

The scripts/vpa-report.sh script provides a formatted table of all VPA recommendations with human-readable values.

# All namespaces
./scripts/vpa-report.sh
# Single namespace
./scripts/vpa-report.sh argocd

Example output:

==========================================
VPA Resource Recommendations Report
==========================================
NAMESPACE WORKLOAD CONTAINER CPU TGT CPU RANGE MEM TGT MEM RANGE
-------------------------------------------------------------------------------------------------------------------------------------------------
argocd Deployment/argocd-server server 23m 12m-100m 175Mi 88Mi-700Mi
argocd Deployment/argocd-repo-server repo-server 2975m 1488m-11900m 523Mi 262Mi-2.0Gi
...
Total: 42 containers with VPA recommendations
Action needed if your current request is:
< lowerBound → INCREASE NOW (pod is being throttled)
< target → INCREASE (under-provisioned)
≈ target → KEEP (well-tuned)
> 2x target → DECREASE (over-provisioned)
# Quick overview: all VPA targets across the cluster
kubectl get vpa -A -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
CPU:.status.recommendation.containerRecommendations[0].target.cpu,\
MEM:.status.recommendation.containerRecommendations[0].target.memory
# Detailed view for a specific namespace
kubectl get vpa -n argocd -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{range .status.recommendation.containerRecommendations[*]}{" "}{.containerName}{": cpu="}{.target.cpu}{" mem="}{.target.memory}{"\n"}{end}{end}'
# Full detail for a specific VPA
kubectl describe vpa <name> -n <namespace>
# Current resource usage vs requests (side-by-side comparison)
kubectl top pods -n <namespace>
kubectl get deploy <name> -n <ns> -o jsonpath='{.spec.template.spec.containers[0].resources}'
kubectl get vpa <name> -n <ns> -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'

VPA recommendations include four values per container:

| Value | Meaning | Use For |
| --- | --- | --- |
| lowerBound | Minimum to avoid throttling/OOM | Red flag if current request is below this |
| target | Optimal request based on observed usage | Set requests: to this value |
| upperBound | Peak observed consumption | Informs limits: setting |
| uncappedTarget | Ideal ignoring any VPA min/max constraints | Same as target when no constraints are set |
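
To dump all four values for one container in raw form (memory in bytes), a plain jsonpath query works:

# All four recommendation values for the first container of a VPA
kubectl get vpa <name> -n <namespace> -o jsonpath='{.status.recommendation.containerRecommendations[0]}'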

Memory values are in bytes. Quick conversions:

  • 104857600 = 100Mi
  • 268435456 = 256Mi
  • 536870912 = 512Mi
  • 1073741824 = 1Gi
  • 1610612736 = 1.5Gi
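
If you'd rather not memorize these, numfmt from GNU coreutils does the conversion:

# Convert raw bytes (as reported in VPA status) to IEC units
echo 1610612736 | numfmt --to=iec-i --suffix=B   # prints 1.5GiB
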
| Situation | Action | Why |
| --- | --- | --- |
| Current request < lowerBound | INCREASE NOW | Pod is being throttled or OOM-killed |
| Current request < target | INCREASE | Under-provisioned, degraded performance |
| Current request within 20% of target | KEEP | Already well-tuned |
| Current request > 1.5x target | DECREASE | Over-provisioned, wasting resources |
| Current request > 5x target | DECREASE | Heavily over-provisioned |
  • Wait at least 7 days before trusting VPA numbers. Initial recommendations are noisy.
  • Review weekly, not daily. Over-correcting defeats the purpose.
  • Re-check after major changes (new features, traffic spikes, version upgrades). VPA is backward-looking.
  • Upper bounds stabilize over ~14 days. They’ll be very wide initially.

For Infrastructure Namespaces (updateMode: Off)

  1. Read the VPA recommendation (./scripts/vpa-report.sh or Grafana dashboard)
  2. Update the app’s values.yaml with new resource requests
  3. Add a comment documenting the VPA data and reasoning:
# VPA-optimized (YYYY-MM-DD)
# VPA target: cpu Xm, memory Y
# Previous: cpu Am (reason for change)
resources:
  requests:
    cpu: Xm       # Match VPA target
    memory: Y     # Match VPA target + buffer
  limits:
    cpu: 2Xm      # 2x request for burst
    memory: 2Y    # 2x request for spikes
  4. Git commit and push — ArgoCD applies via GitOps

For User App Namespaces (updateMode: Initial)


No manual action needed. VPA automatically:

  1. Watches pod resource usage and calculates optimal requests
  2. Sets optimal resources via the admission controller when a pod is recreated (deploy, scale, rollout)
  3. Leaves running pods unmodified — changes take effect on the next restart
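
To make a workload pick up the latest recommendation immediately, trigger a recreation (illustrative names):

# Recreate pods so the admission controller applies the current recommendation
kubectl rollout restart deployment/my-app -n media
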
| Field | Rule of Thumb |
| --- | --- |
| requests.cpu | VPA target (or 1.1-1.2x for buffer) |
| requests.memory | VPA target (or 1.2-1.5x — memory OOM is fatal, CPU throttling is not) |
| limits.cpu | 2-4x request (allows burst). Or omit entirely to let pods burst freely. |
| limits.memory | 2-4x request (or match VPA upperBound if spikes are expected) |

CPU-Bound (Helm rendering, image processing)


High CPU target, low memory target. Increase CPU generously, keep memory modest.

Example: argocd-repo-server
VPA target: cpu 2975m, memory 523Mi
Action: cpu 3000m request, memory 768Mi request

Memory-Bound (caches like Redis)

Low CPU target, high memory target. Increase memory, keep CPU low.

Example: Redis
VPA target: cpu 23m, memory 100Mi
Action: cpu 50m request, memory 128Mi request

Mostly Idle (UIs, light API servers)

Both CPU and memory very low. Set modest requests with generous limits for occasional spikes.

Example: argocd-server
VPA target: cpu 23m, memory 175Mi
Action: cpu 50m request, memory 256Mi request

VPA only tracks CPU and memory, not GPU. Recommendations for GPU workloads will show low CPU/memory because the real work happens on the GPU and in VRAM, which VPA never sees. Set CPU/memory based on data-loading needs, not inference. GPU workloads automatically get updateMode: "Off" via the Kyverno policy.
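
To confirm a GPU workload ended up in recommend-only mode (illustrative VPA name and namespace):

# The Kyverno policy should have set this VPA to recommend-only
kubectl get vpa my-llm -n ai -o jsonpath='{.spec.updatePolicy.updateMode}'
# Expected: Off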


Before (hand-set requests):

controller: cpu: 1000m, memory: 1Gi # UNDER-PROVISIONED (below lowerBound!)
repo-server: cpu: 1000m, memory: 1Gi # UNDER-PROVISIONED 3x
server: cpu: 500m, memory: 512Mi # OVER-PROVISIONED 20x
applicationSet: cpu: 250m, memory: 256Mi # OVER-PROVISIONED 5x
redis: cpu: 100m, memory: 128Mi # OVER-PROVISIONED 4x
Total: 2.85 CPU, 2.9Gi memory

VPA targets:

controller: target: 2048m CPU, 1.25Gi memory (lowerBound: 1021m > current 1000m!)
repo-server: target: 2975m CPU, 523Mi memory
server: target: 23m CPU, 175Mi memory
applicationSet: target: 49m CPU, 100Mi memory
redis: target: 23m CPU, 100Mi memory

After (VPA-optimized):

controller: cpu: 2000m, memory: 4Gi # DOUBLED CPU, quadrupled memory
repo-server: cpu: 3000m, memory: 768Mi # TRIPLED CPU, reduced memory
server: cpu: 50m, memory: 512Mi # REDUCED 10x CPU
applicationSet: cpu: 100m, memory: 128Mi # REDUCED 2.5x
redis: cpu: 50m, memory: 128Mi # REDUCED 2x
Total: 5.2 CPU, 5.5Gi memory

Result: +2.35 CPU where it was needed (controller/repo-server), memory properly sized, no more throttling.

See infrastructure/controllers/argocd/values.yaml for the actual implementation with inline VPA documentation.


This cluster runs K8s v1.35.4 where In-Place Pod Resize is GA. VPA uses updateMode: "Initial" which sets optimal resources at pod creation time only — it does not resize or evict running pods.

  1. VPA Admission Controller intercepts new pod creation
  2. Sets resource requests based on current VPA recommendation
  3. Pod starts with right-sized resources from the beginning
  4. Running pods are NOT modified — changes apply on next pod restart/recreation
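
The admission controller records what it changed in a pod annotation, which makes its work auditable (annotation name per upstream VPA; the pod name is a placeholder):

# Show what the VPA admission controller set on a pod, if anything
kubectl get pod <pod> -n <namespace> -o jsonpath='{.metadata.annotations.vpaUpdates}'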

The Kyverno vpa-auto-generate policy sets update modes automatically:

  • Infrastructure/monitoring: Off — changes go through GitOps review
  • User apps: Initial — resources set at pod creation
  • GPU workloads: Off — VPA can’t manage GPU resources

The vpa-auto-generate ClusterPolicy (infrastructure/controllers/kyverno/policies/vpa-auto-generate.yaml) watches for Deployment, StatefulSet, and DaemonSet resources and generates a matching VPA.

Three rules:

  1. generate-vpa-infra-off: Infrastructure/monitoring namespaces get updateMode: "Off"
  2. generate-vpa-gpu-off: GPU workloads (runtimeClassName: nvidia) get updateMode: "Off"
  3. generate-vpa-auto-tune: Everything else gets updateMode: "Initial"
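
Structurally, each rule is a standard Kyverno generate rule. Here is a minimal sketch of the auto-tune rule, assuming the usual generate pattern (the authoritative source is vpa-auto-generate.yaml in the repo):

# Sketch of the auto-tune generate rule; see vpa-auto-generate.yaml for the real policy
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: vpa-auto-generate
spec:
  rules:
    - name: generate-vpa-auto-tune
      match:
        any:
          - resources:
              kinds: [Deployment, StatefulSet, DaemonSet]
      generate:
        apiVersion: autoscaling.k8s.io/v1
        kind: VerticalPodAutoscaler
        name: "{{request.object.metadata.name}}"
        namespace: "{{request.object.metadata.namespace}}"
        data:
          spec:
            targetRef:
              apiVersion: apps/v1
              kind: "{{request.object.kind}}"
              name: "{{request.object.metadata.name}}"
            updatePolicy:
              updateMode: "Initial"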

Generated VPAs have ownerReferences set to the parent workload, so they’re automatically cleaned up when the workload is deleted.
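
You can inspect that ownership link directly:

# Show which workload owns a generated VPA
kubectl get vpa <name> -n <namespace> -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'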

The policy excludes a few namespaces:

  • kube-system — excluded from all rules
  • kyverno — excluded from all rules (prevents circular dependency)
  • volsync-system — excluded from all rules (transient mover jobs)
# See all Kyverno-managed VPAs
kubectl get vpa -A -l app.kubernetes.io/managed-by=kyverno
# Check a specific VPA's update mode
kubectl get vpa -n immich -o jsonpath='{.items[0].spec.updatePolicy.updateMode}'
# Expected: Initial (resources set at pod creation)
kubectl get vpa -n argocd -o jsonpath='{.items[0].spec.updatePolicy.updateMode}'
# Expected: Off

No recommendations yet:

  • VPA needs ~5-10 minutes for initial data, 24+ hours for accuracy
  • Check metrics-server: kubectl top nodes (should return data)
  • Check VPA recommender: kubectl logs -n vertical-pod-autoscaler -l app.kubernetes.io/component=recommender

VPA resources not being created:

  • Check Kyverno background controller: kubectl get pods -n kyverno
  • Check Kyverno logs: kubectl logs -n kyverno -l app.kubernetes.io/component=background-controller
  • Verify the policy is ready: kubectl get clusterpolicy vpa-auto-generate
  • Check VPA CRDs are installed: kubectl get crd verticalpodautoscalers.autoscaling.k8s.io

Recommendations look wrong:

  • Not enough data — wait 7-14 days
  • Workload changed recently — VPA is backward-looking
  • Check upperBound for peak usage context
  • Batch/cron workloads have spiky usage — use upperBound for limits

Pods OOMKilled despite matching the target:

  • VPA target reflects steady-state, not initialization spikes
  • Set limits.memory well above requests.memory (2-4x)
  • Check startup memory with kubectl top pod during pod init

Duplicate VPAs:

  • Kyverno is the sole VPA creator — if you see duplicates, check for manually created VPAs
  • Remove any hand-crafted VPA manifests from Git and let Kyverno manage them

A community VPA dashboard is auto-provisioned in Grafana under the Infrastructure folder:

| Dashboard | Grafana.com ID | What It Shows |
| --- | --- | --- |
| K8s Autoscaling VPA | 22168 | Cluster overview with drill-down to pod-level VPA details (target, lower/upper bounds) |

URL: https://grafana.vanillax.me → search for “VPA”

This dashboard reads VPA metrics exposed by kube-state-metrics Custom Resource State (kube_customresource_verticalpodautoscaler_*). Combined with vpa-report.sh, you have two ways to view VPA data:

  1. Grafana VPA dashboard — time-series graphs and historical trends
  2. CLI — ./scripts/vpa-report.sh for quick terminal output
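
For ad-hoc PromQL against the same data, the metric family looks like the following; the exact name depends on the Custom Resource State config, so treat it as an assumption:

# VPA CPU target per container, as exposed by kube-state-metrics (name may vary)
kube_customresource_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}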

# Human-readable VPA report
./scripts/vpa-report.sh
./scripts/vpa-report.sh <namespace>
# All VPA recommendations (cluster-wide)
kubectl get vpa -A -o custom-columns=\
NS:.metadata.namespace,\
NAME:.metadata.name,\
CPU:.status.recommendation.containerRecommendations[0].target.cpu,\
MEM:.status.recommendation.containerRecommendations[0].target.memory
# Current resource usage vs requests
kubectl top pods -n <namespace>
# Compare current requests vs VPA target
kubectl get deploy <name> -n <ns> -o jsonpath='{.spec.template.spec.containers[0].resources}'
kubectl get vpa <name> -n <ns> -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
# Check Kyverno VPA policy
kubectl get clusterpolicy vpa-auto-generate
kubectl describe clusterpolicy vpa-auto-generate
# Check VPA recommender
kubectl logs -n vertical-pod-autoscaler -l app.kubernetes.io/component=recommender
# List Kyverno-managed VPAs
kubectl get vpa -A -l app.kubernetes.io/managed-by=kyverno
# Monitor VPA auto-resize events
kubectl get events -A --field-selector reason=VpaUpdated


Last Updated: 2026-04-17 Cluster: talos-prod-cluster (K8s v1.35.4, Talos v1.13.0-rc.0)