
Zero-Touch PVC Backup and Restore

This document describes the automated backup and restore system for Kubernetes PersistentVolumeClaims (PVCs).

The system automatically backs up PVCs to NFS storage on TrueNAS using Kopia and restores them on disaster recovery or app re-deployment. Simply add a label to your PVC and backups happen automatically.

Why Kopia:

  • Fast: Parallel uploads, zstd compression
  • Efficient: Content-defined chunking with deduplication
  • Encrypted: All data encrypted with KOPIA_PASSWORD before storage
  • Maintained: Active development, used by VolSync maintainers

Why direct NFS instead of S3:

  • Simpler: No S3 credentials to manage per-namespace
  • Faster: Direct filesystem access, no HTTP overhead
  • Browsable: Kopia UI can directly browse backups
  • Consistent: Same approach as home-ops reference implementation
Architecture overview:

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│    1Password    │─────▶│ External Secrets│─────▶│     Secrets     │
│    (rustfs)     │      │    Operator     │      │ (KOPIA_PASSWORD)│
└─────────────────┘      └─────────────────┘      └─────────────────┘
┌─────────────────┐      ┌─────────────────┐               │
│  MutatingAdm-   │─────▶│     VolSync     │◀──────────────┘
│  issionPolicy   │      │   Mover Jobs    │
│  (NFS inject)   │      └────────┬────────┘
└─────────────────┘               │
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  TrueNAS NFS    │◀─────│     Kyverno     │◀─────│   pvc-plumber   │
│ volsync-kopia-  │      │  ClusterPolicy  │      │  (backup check) │
│      nfs        │      │   (generates    │      └─────────────────┘
└─────────────────┘      │    resources)   │
                         └─────────────────┘

When a PVC is created with a backup label, the system automatically checks if a backup exists and restores it:

PVC Created ──▶ Kyverno Rule 0 ──▶ Calls pvc-plumber ──▶ Backup exists?
                                                               │
                                  ┌────────────────────────────┴──────────┐
                                  │ YES                                   │ NO
                                  ▼                                       ▼
                         Add dataSourceRef                           No mutation
                         to PVC spec                                 (fresh PVC)
                                  │
                                  ▼
                         VolSync populates PVC
                         from Kopia backup
                                  │
                                  ▼
PVC becomes Bound ──▶ Kyverno Rule 3 ──▶ Creates ReplicationSource
                                         (backup schedule starts)

Key protections:

  • Fail-closed gate: PVC creation denied if PVC Plumber is unreachable (prevents empty PVCs during disaster recovery)
  • Backup ReplicationSource only created AFTER PVC is Bound (prevents backup/restore conflicts)
  • Restore uses VolumePopulator pattern (dataSourceRef) for atomic restore
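
The VolumePopulator pattern is just a dataSourceRef on the PVC. As a sketch, here is what a PVC could look like after the restore mutation fires; the names follow the <pvc-name>-restore convention used elsewhere in this document, but the exact mutated output is illustrative:

```yaml
# Sketch: a PVC after Kyverno injects dataSourceRef (VolumePopulator pattern).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
  namespace: my-app
  labels:
    backup: "hourly"
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
  dataSourceRef:                     # added by the mutate rule
    apiGroup: volsync.backube
    kind: ReplicationDestination
    name: my-data-restore            # <pvc-name>-restore convention
```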
1. TrueNAS NFS Share

  • Server: 192.168.10.133
  • Path: /mnt/BigTank/k8s/volsync-kopia-nfs
  • Encryption: Kopia encrypts all data with KOPIA_PASSWORD
2. NFS Mount Injection (MutatingAdmissionPolicy)

  • Automatically injects NFS mount into all VolSync mover jobs
  • Mounts /repository from TrueNAS NFS share
  • No per-app configuration needed
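
The injected mount is plain Kubernetes NFS volume plumbing. A minimal sketch of what ends up in each mover pod spec, assuming the server and path above (the container name is illustrative):

```yaml
# Sketch: NFS volume + mount injected into every VolSync mover pod.
volumes:
  - name: kopia-repository
    nfs:
      server: 192.168.10.133
      path: /mnt/BigTank/k8s/volsync-kopia-nfs
containers:
  - name: mover                      # illustrative container name
    volumeMounts:
      - name: kopia-repository
        mountPath: /repository
```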
3. PVC Plumber (Backup Existence Checker)

  • HTTP service that checks if a backup exists in the Kopia repository
  • Called by Kyverno before creating a PVC to determine if restore is needed
  • Image: ghcr.io/mitchross/pvc-plumber
  • Endpoint: GET /exists/{namespace}/{pvc} returns {"exists": true/false}
4. Kyverno ClusterPolicy (Backup/Restore)

  • Triggers on PVCs with label backup: hourly or backup: daily
  • Rule 0 (validate, FAIL-CLOSED): Calls pvc-plumber /readyz; if unreachable, denies PVC creation to prevent data loss during disaster recovery
  • Rule 1 (mutate): Calls pvc-plumber /exists; if backup exists, adds dataSourceRef to trigger restore
  • Rule 2 (generate): Creates ExternalSecret (fetches KOPIA_PASSWORD from 1Password)
  • Rule 3 (generate): Creates ReplicationSource (backup schedule) - only after PVC is Bound
  • Rule 4 (generate): Creates ReplicationDestination (restore capability)
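
To show how Rule 1 and pvc-plumber fit together, here is a hedged sketch of the mutate rule using Kyverno's service apiCall. The rule name, service URL, and precondition shape are assumptions, not the actual policy; see volsync-pvc-backup-restore.yaml for the real implementation:

```yaml
# Sketch of the restore mutate rule; URLs and field details are illustrative.
- name: restore-from-backup
  match:
    any:
      - resources:
          kinds: [PersistentVolumeClaim]
          selector:
            matchExpressions:
              - key: backup
                operator: In
                values: [hourly, daily]
  context:
    - name: backupCheck
      apiCall:
        method: GET
        service:
          # assumed in-cluster address for pvc-plumber
          url: "http://pvc-plumber.volsync-system.svc/exists/{{request.namespace}}/{{request.object.metadata.name}}"
  preconditions:
    all:
      - key: "{{ backupCheck.exists }}"
        operator: Equals
        value: true
  mutate:
    patchStrategicMerge:
      spec:
        dataSourceRef:
          apiGroup: volsync.backube
          kind: ReplicationDestination
          name: "{{request.object.metadata.name}}-restore"
```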

4a. Kyverno ClusterCleanupPolicy (Orphan Cleanup)

  • Runs every 15 minutes to clean up orphaned backup resources
  • Targets ReplicationSource, ReplicationDestination, and ExternalSecret with labels app.kubernetes.io/managed-by=kyverno and volsync.backup/pvc
  • Checks if the corresponding PVC still has backup: hourly or backup: daily label
  • If label was removed or PVC no longer exists, deletes the orphaned resources
  • Prevents stale backup/restore jobs from running after backups are disabled
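
A hedged sketch of how such a cleanup policy can be declared. The schedule and labels come from this document; the apiVersion and match syntax are assumptions and may differ from the real volsync-orphan-cleanup.yaml:

```yaml
# Sketch: ClusterCleanupPolicy targeting orphaned backup resources.
apiVersion: kyverno.io/v2
kind: ClusterCleanupPolicy
metadata:
  name: volsync-orphan-cleanup
spec:
  schedule: "*/15 * * * *"            # every 15 minutes
  match:
    any:
      - resources:
          kinds:
            - ReplicationSource
            - ReplicationDestination
            - ExternalSecret
          selector:
            matchLabels:
              app.kubernetes.io/managed-by: kyverno
```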
5. VolSync Mover Jobs

  • Performs actual backup/restore operations using Kopia
  • Uses Longhorn snapshots for consistent backups
  • Stores data on NFS with Kopia encryption and zstd compression
6. Kopia UI

  • Web interface to browse backups
  • Accessible at kopia-ui.{domain}
  • Mounts same NFS share as VolSync

To disable backups for a PVC, remove the backup label:

metadata:
  labels:
    # backup: "hourly"  # Comment out or remove to disable backup

The volsync-orphan-cleanup ClusterCleanupPolicy runs every 15 minutes and automatically deletes the orphaned ReplicationSource, ReplicationDestination, and ExternalSecret. No manual cleanup needed.

Note: Removing the label does NOT delete existing backups from NFS. The Kopia repository on TrueNAS retains all previous snapshots. To re-enable backups later, simply re-add the label.

To enable backups, add a backup label to your PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
  namespace: my-app
  labels:
    backup: "hourly"    # Backups every hour
    # OR
    # backup: "daily"   # Backups at 2am daily
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

That’s it! Kyverno automatically generates all required resources.

Label            Schedule                  Retention
backup: hourly   Every hour (0 * * * *)    24 hourly, 7 daily, 4 weekly, 2 monthly
backup: daily    2am daily (0 2 * * *)     24 hourly, 7 daily, 4 weekly, 2 monthly
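
Assuming the generated ReplicationSource follows the usual VolSync mover shape, a backup: "hourly" PVC would get something like the following. The schedule and retention values are from the table above; the kopia block's field names are illustrative:

```yaml
# Sketch: generated ReplicationSource for a backup: "hourly" PVC.
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-data-backup
  namespace: my-app
spec:
  sourcePVC: my-data
  trigger:
    schedule: "0 * * * *"            # every hour
  kopia:                             # mover-specific block; fields illustrative
    repository: volsync-my-data      # Secret holding the KOPIA_* keys
    copyMethod: Snapshot             # Longhorn snapshot for consistency
    retain:
      hourly: 24
      daily: 7
      weekly: 4
      monthly: 2
```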

The rustfs item in 1Password must contain:

Field            Purpose
kopia_password   Kopia repository encryption key

Kyverno generates a secret per-PVC with:

Key                Value                      Purpose
KOPIA_REPOSITORY   filesystem:///repository   Repository type and path
KOPIA_FS_PATH      /repository                Filesystem path
KOPIA_PASSWORD     (from 1Password)           Repository encryption
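
The per-PVC secret in the table is produced by the generated ExternalSecret. A sketch under assumed names; the store name and templating details are illustrative, while the 1Password item and keys are the documented ones:

```yaml
# Sketch: per-PVC ExternalSecret generated by Kyverno.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: volsync-my-data
  namespace: my-app
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword                # illustrative store name
  target:
    name: volsync-my-data
    template:
      data:
        KOPIA_REPOSITORY: "filesystem:///repository"
        KOPIA_FS_PATH: "/repository"
        KOPIA_PASSWORD: "{{ .kopia_password }}"
  data:
    - secretKey: kopia_password
      remoteRef:
        key: rustfs                  # 1Password item
        property: kopia_password
```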
The repository layout on the NFS share:

/mnt/BigTank/k8s/volsync-kopia-nfs/
├── kopia.repository # Kopia repository config
├── kopia.blobcfg # Blob storage config
├── p/ # Pack files (ALL deduplicated data from ALL PVCs)
├── q/ # Index blobs
├── n/ # Manifest blobs (snapshots tagged by namespace/pvc-name)
└── x/ # Session blobs

All PVC backups share the same Kopia repository, with snapshots tagged by namespace/pvc-name.

This shared repository design is a deliberate choice. Kopia uses content-defined chunking — files are split into variable-size chunks based on content boundaries, and each chunk is stored by its hash. If the same chunk exists anywhere in the repository (from any PVC, any namespace), it’s stored only once.

What this means in practice:

  • Delete and recreate an app → new PVC backs up → Kopia finds all chunks already exist → near-instant backup, almost zero new storage
  • Multiple apps with similar files (configs, timezone data, shared libraries) → one copy
  • Incremental backups only store changed chunks, not changed files
  • Storage grows by unique data, not by number of PVCs

Why not S3 + Restic? VolSync also supports Restic to S3, but each PVC gets its own separate Restic repository — zero cross-PVC deduplication. Delete and recreate an app = full backup from scratch. More storage, more bandwidth, slower.

Why Two Backup Systems (NFS for PVCs, S3 for Databases)


For CloudNativePG database recovery runbooks (including ArgoCD race-condition handling) see docs/cnpg-disaster-recovery.md. For AI-assisted incident guidance, use LLM Recovery Prompt Templates.

PVC backups → NFS + Kopia because:

  • VolSync’s Kopia mover needs filesystem access for content-defined chunking and dedup
  • Direct NFS gives 10Gbps to TrueNAS with no HTTP overhead
  • No per-namespace S3 credentials — Kyverno just injects the NFS mount
  • One shared repository = cross-PVC deduplication (see above)

Database backups → S3 + Barman because:

  • CNPG’s built-in backup only supports Barman, and Barman speaks S3 (not NFS)
  • Barman does SQL-aware backups (pg_basebackup + continuous WAL archiving) for point-in-time recovery
  • Filesystem-level snapshots of running Postgres can be inconsistent without the WAL stream
  • CNPG has no native NFS backup option

Each tool uses its native backup mechanism. Forcing either into the other’s model would mean worse backups.
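
For contrast, a hedged sketch of the database side: an hourly CNPG ScheduledBackup shipping to the cluster's configured Barman/S3 object store. The cluster and namespace names are illustrative; note CNPG's schedule field uses six-field cron (seconds first):

```yaml
# Sketch: hourly CNPG ScheduledBackup (Barman/S3 is configured on the Cluster).
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: my-db-hourly
  namespace: databases               # illustrative
spec:
  schedule: "0 0 * * * *"            # top of every hour (six-field cron)
  cluster:
    name: my-db                      # illustrative cluster name
  backupOwnerReference: self
```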

To manually trigger a restore:

  1. Delete the existing PVC
  2. Recreate the PVC with the same name and backup label
  3. VolSync will restore from the latest snapshot

Or use the ReplicationDestination directly:

# Trigger manual restore
# (double quotes so $(date +%s) expands; it would not inside single quotes)
kubectl patch replicationdestination <pvc-name>-restore -n <namespace> \
  --type merge -p "{\"spec\":{\"trigger\":{\"manual\":\"restore-$(date +%s)\"}}}"
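
The patch flips the manual trigger on the generated ReplicationDestination, which (assuming the usual VolSync shape; the Kopia-specific fields are illustrative) looks roughly like:

```yaml
# Sketch: generated ReplicationDestination used for restores.
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: my-data-restore
  namespace: my-app
spec:
  trigger:
    manual: restore-1700000000       # value set by the kubectl patch
  kopia:                             # mover-specific block; fields illustrative
    repository: volsync-my-data
    destinationPVC: my-data
    copyMethod: Direct
```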
# List all ReplicationSources
kubectl get replicationsource -A
# Check specific backup
kubectl get replicationsource <pvc-name>-backup -n <namespace> -o yaml
# Find mover pod
kubectl get pods -n <namespace> | grep volsync
# View logs
kubectl logs -n <namespace> <volsync-pod-name>
# Check if NFS is mounted in mover pod
kubectl exec -n <namespace> <volsync-pod-name> -- df -h /repository
# Check ExternalSecret status
kubectl get externalsecret -n <namespace>
# Check secret contents
kubectl get secret volsync-<pvc-name> -n <namespace> -o yaml
# Check policy status
kubectl get clusterpolicy volsync-pvc-backup-restore
# Check policy events
kubectl describe clusterpolicy volsync-pvc-backup-restore
# Trigger regeneration by updating PVC label
kubectl label pvc <pvc-name> -n <namespace> kyverno-trigger=$(date +%s) --overwrite

The following namespaces are excluded from automatic backup:

  • kube-system
  • volsync-system
  • kyverno
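
In Kyverno terms this is an exclude block on the ClusterPolicy. A minimal sketch; the real exclusion may be written differently in volsync-pvc-backup-restore.yaml:

```yaml
# Sketch: namespace exclusions in the backup/restore ClusterPolicy.
exclude:
  any:
    - resources:
        namespaces:
          - kube-system
          - volsync-system
          - kyverno
```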
File                                                                          Purpose
infrastructure/controllers/pvc-plumber/                                       Backup existence checker service
infrastructure/storage/volsync/                                               VolSync Helm chart
infrastructure/controllers/kyverno/policies/volsync-nfs-inject.yaml           Injects NFS mount into mover pods
infrastructure/storage/kopia-ui/                                              Kopia web UI for browsing backups
infrastructure/controllers/kyverno/policies/volsync-pvc-backup-restore.yaml   Kyverno backup/restore policy
infrastructure/controllers/kyverno/policies/volsync-orphan-cleanup.yaml       Cleanup orphaned backup resources
monitoring/prometheus-stack/volsync-alerts.yaml                               Prometheus alerting rules
infrastructure/database/cloudnative-pg/                                       CNPG database clusters (separate backup path)

Database Backups (CNPG — Separate System)


The PVC backup system above covers application data. Database backups use a completely separate path:

               PVC Backups                           Database Backups
Tool           VolSync + Kopia                       CNPG + Barman
Destination    TrueNAS NFS                           RustFS S3 (s3://postgres-backups/cnpg/)
Trigger        Kyverno auto-generates on PVC label   CNPG ScheduledBackup resource
Auto-restore   Yes (PVC Plumber + Kyverno)           No — manual recovery required
Schedule       Hourly or daily (per PVC label)       Hourly + continuous WAL archiving

Why databases don’t use the PVC backup system

  • Filesystem-level backup of a running Postgres database can be inconsistent
  • Barman uses pg_basebackup + WAL archiving for point-in-time recovery
  • CNPG manages its own PVCs (names are auto-generated, can’t add Kyverno labels)

After a cluster nuke, CNPG creates fresh empty databases — it does NOT auto-restore from Barman backups. Recovery requires manually bypassing ArgoCD (SSA + CNPG webhook conflict prevents recovery mode through GitOps).

Database Applications are managed by a separate ApplicationSet (database-appset.yaml) with selfHeal: false, so skip-reconcile annotations persist during recovery. This avoids the need to scale down ArgoCD controllers.
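
A minimal sketch of the relevant fragment of database-appset.yaml. Only selfHeal: false is stated by this document; the surrounding ApplicationSet fields are illustrative:

```yaml
# Sketch: ApplicationSet sync policy for database apps.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: databases                    # illustrative
spec:
  template:
    spec:
      syncPolicy:
        automated:
          selfHeal: false            # lets skip-reconcile annotations persist
```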

See docs/cnpg-disaster-recovery.md for full recovery procedures.