pvc-plumber: Cheat Sheet 🃏

One-page poster. Full intro: pvc-plumber-start-here.md.

📍 Current state (2026-06-01)

v4.0.1 permissive · 24 PVCs / 18 namespaces managed · 24/24 DR_COMPLETE · 4 restore drills passed · Kyverno removed · Longhorn 0 faulted/0 degraded/0 rebuilding.

🏷️ The 3 labels that matter

namespace:  pvc-plumber.io/managed-namespace: "true"     # write-gate
PVC:        pvc-plumber.io/enabled:           "true"     # opt-in
PVC:        pvc-plumber.io/tier:              "hourly"   # cadence  (+ manage-volsync: "true")

🧱 The 4 systems involved

| System | Job |
| --- | --- |
| Argo | desired state (PVC + labels) from Git |
| pvc-plumber | owns RS/RD wiring + /audit |
| VolSync + Kopia | moves bytes → RustFS S3 |
| Longhorn | live volume (CSI), snapshots for clones |

🔎 5 questions when debugging

  1. Is the namespace gated? (managed-namespace=true)
  2. Is the PVC opted in? (enabled + manage-volsync + tier)
  3. Do RS and RD exist and are both managed-by=pvc-plumber?
  4. Does the PVC have dataSourceRef → <pvc>-dst (else it recreates EMPTY)?
  5. Is the last backup Successful? → check /audit (already-matches / stale=false).
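
Questions 1 to 4 can be sketched as a pure check over manifests that have already been loaded as dicts (for example from `kubectl get ... -o json`). This is a minimal sketch, not pvc-plumber's own logic. The function names are hypothetical, the full key `pvc-plumber.io/manage-volsync` is assumed from the shorthand in the labels section, and the `managed_by_key` default is an assumption, since the sheet only says `managed-by=pvc-plumber`. Question 5 is left to /audit.

```python
PREFIX = "pvc-plumber.io/"


def labels(obj):
    """Return metadata.labels of a manifest dict, or {} if absent."""
    return (obj.get("metadata") or {}).get("labels") or {}


def debug_checklist(namespace, pvc, rs, rd,
                    managed_by_key="app.kubernetes.io/managed-by"):
    """Answer questions 1-4 for one PVC. Returns {check_name: bool}.

    rs / rd are the ReplicationSource / ReplicationDestination manifests,
    or None when the object does not exist.
    managed_by_key is an ASSUMED label key; adjust it to your cluster.
    """
    pvc_labels = labels(pvc)
    pvc_name = pvc["metadata"]["name"]
    dsr = (pvc.get("spec") or {}).get("dataSourceRef") or {}
    return {
        # 1. write-gate label on the namespace
        "namespace_gated":
            labels(namespace).get(PREFIX + "managed-namespace") == "true",
        # 2. opt-in labels on the PVC (manage-volsync key is assumed)
        "pvc_opted_in": (
            pvc_labels.get(PREFIX + "enabled") == "true"
            and pvc_labels.get(PREFIX + "manage-volsync") == "true"
            and bool(pvc_labels.get(PREFIX + "tier"))
        ),
        # 3. both RS and RD exist and carry managed-by=pvc-plumber
        "rs_rd_managed": all(
            obj is not None and labels(obj).get(managed_by_key) == "pvc-plumber"
            for obj in (rs, rd)
        ),
        # 4. dataSourceRef points at <pvc>-dst
        "has_data_source_ref": dsr.get("name") == f"{pvc_name}-dst",
    }
```

Any `False` value names the question to look at first; passing `None` for `rs` or `rd` fails check 3.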

🧪 Restore drill quick path

sentinel (embed OLD uid + sha256) → manual RS backup → RD refresh (new latestImage)
→ scale app to 0 → delete PVC → recreate (with dsr!) → verify sentinel byte-identical
→ scale up → restore RS schedule + RD restore-once trigger
⚠️ Wait for application.status.sync.revision == dsr (dataSourceRef) commit before deleting, or a stale render recreates the PVC empty. pvc-plumber does not revert your manual trigger patches; restore them yourself.
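
The wait in the warning above can be sketched as a small polling gate. This is a minimal sketch under assumptions: `fetch_app` is a hypothetical callable you supply that returns the Argo Application as a dict (for example parsed from `kubectl get application <name> -o json`), and the timeout and interval defaults are arbitrary.

```python
import time


def synced_to(app, dsr_commit):
    """True once status.sync.revision equals the commit that added dataSourceRef."""
    revision = ((app.get("status") or {}).get("sync") or {}).get("revision")
    return revision == dsr_commit


def wait_for_revision(fetch_app, dsr_commit,
                      timeout_s=300.0, interval_s=5.0, sleep=time.sleep):
    """Poll fetch_app() until the Application reports dsr_commit as synced.

    Returns True when it is safe to delete the PVC, False on timeout.
    fetch_app is a hypothetical caller-supplied function returning a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if synced_to(fetch_app(), dsr_commit):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Only delete the PVC when this returns `True`; on `False`, hard-refresh the Application and retry rather than deleting against a stale render.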

💥 Common failure modes

| Symptom | Cause / fix |
| --- | --- |
| PVC recreates empty | no dataSourceRef → add it (or mark EMPTY_BY_DESIGN) |
| ComparisonError ... PVC is invalid: Forbidden | added dsr to a Bound PVC (immutable); clears on delete |
| scale-up sync "Succeeded" but replicas stay 0 | Argo stale cluster cache → hard-refresh, wait OutOfSync, re-sync |
| double-recreate needed | deleted before render cache caught up → wait for reconciled rev |
| scheduled backups stopped after a drill | RS left on manual trigger → restore schedule |
| restored volume degraded briefly | Longhorn replica rebuild; wait, don't touch replicas |

🚫 Never-migrate list

  • CNPG databases → Barman → S3 (native).
  • PostHog PVCs → backup-exempt (disposable).
  • redis-instance/redis-master-0 → backup-exempt (disposable).

โญ๏ธ Next ops tasks

  1. Kopia maintenance: healthy; full not needed (docs/domains/storage/kopia-maintenance-plan.md).
  2. Rollback PV cleanup: 7 retained; reclaim reset-batch first, per-PV approval.
  3. Longhorn replica/storage policy review (docs/domains/storage/architecture-future.md).
  4. (future) pvc-plumber v5 strict-mode plan: not shipped.