Zero-Touch PVC Backup and Restore
This document describes the automated backup and restore system for Kubernetes PersistentVolumeClaims (PVCs).
Overview
The system automatically backs up PVCs to NFS storage on TrueNAS using Kopia and restores them on disaster recovery or app re-deployment. Simply add a label to your PVC and backups happen automatically.
Why Kopia?
- Fast: Parallel uploads, zstd compression
- Efficient: Content-defined chunking with deduplication
- Encrypted: All data encrypted with KOPIA_PASSWORD before storage
- Maintained: Active development, used by VolSync maintainers
Why NFS over S3?
- Simpler: No S3 credentials to manage per-namespace
- Faster: Direct filesystem access, no HTTP overhead
- Browsable: Kopia UI can directly browse backups
- Consistent: Same approach as home-ops reference implementation
Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ 1Password │────▶│ External Secrets│────▶│ Secrets │
│ (rustfs) │ │ Operator │ │ (KOPIA_PASSWORD│
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
┌─────────────────┐ ┌─────────────────┐ │
│ MutatingAdm- │────▶│ VolSync │◀───────────┘
│ issionPolicy │ │ Mover Jobs │
│ (NFS inject) │ └────────┬────────┘
└─────────────────┘ │
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ TrueNAS NFS │◀────│ Kyverno │◀────│ pvc-plumber │
│ volsync-kopia- │ │ ClusterPolicy │ │ (backup check) │
│ nfs │ │ (generates │ └─────────────────┘
└─────────────────┘ │ resources) │
└─────────────────┘
Automatic Restore Flow
When a PVC is created with a backup label, the system automatically checks if a backup exists and restores it:
PVC Created ──▶ Kyverno Rule 0 ──▶ Calls pvc-plumber ──▶ Backup exists?
│
┌─────────────────────────────────────────┴─────────┐
│ YES │ NO
▼ ▼
Add dataSourceRef No mutation
to PVC spec (fresh PVC)
│
▼
VolSync populates PVC
from Kopia backup
│
▼
PVC becomes Bound ──▶ Kyverno Rule 2 ──▶ Creates ReplicationSource
(backup schedule starts)
Key protections: - Fail-closed gate: PVC creation denied if PVC Plumber is unreachable (prevents empty PVCs during disaster recovery) - Backup ReplicationSource only created AFTER PVC is Bound (prevents backup/restore conflicts) - Restore uses VolumePopulator pattern (dataSourceRef) for atomic restore
Components
1. TrueNAS NFS Storage
- Server:
192.168.10.133 - Path:
/mnt/BigTank/k8s/volsync-kopia-nfs - Encryption: Kopia encrypts all data with KOPIA_PASSWORD
2. Kyverno NFS Injection Policy
- Automatically injects NFS mount into all VolSync mover jobs
- Mounts
/repositoryfrom TrueNAS NFS share - No per-app configuration needed
3. pvc-plumber
- HTTP service that checks if a backup exists in Kopia repository
- Called by Kyverno before creating a PVC to determine if restore is needed
- Image:
ghcr.io/mitchross/pvc-plumber - Endpoint:
GET /exists/{namespace}/{pvc}returns{"exists": true/false}
4. Kyverno ClusterPolicy
- Triggers on PVCs with label
backup: hourlyorbackup: daily - Rule 0 (validate, FAIL-CLOSED): Calls pvc-plumber
/readyz; if unreachable, denies PVC creation to prevent data loss during disaster recovery - Rule 1 (mutate): Calls pvc-plumber
/exists; if backup exists, addsdataSourceRefto trigger restore - Rule 2 (generate): Creates ExternalSecret (fetches KOPIA_PASSWORD from 1Password)
- Rule 3 (generate): Creates ReplicationSource (backup schedule) - only after PVC is Bound
- Rule 4 (generate): Creates ReplicationDestination (restore capability)
4a. Kyverno ClusterCleanupPolicy (Orphan Cleanup)
- Runs every 15 minutes to clean up orphaned backup resources
- Targets ReplicationSource, ReplicationDestination, and ExternalSecret with labels
app.kubernetes.io/managed-by=kyvernoandvolsync.backup/pvc - Checks if the corresponding PVC still has
backup: hourlyorbackup: dailylabel - If label was removed or PVC no longer exists, deletes the orphaned resources
- Prevents stale backup/restore jobs from running after backups are disabled
5. VolSync
- Performs actual backup/restore operations using Kopia
- Uses Longhorn snapshots for consistent backups
- Stores data on NFS with Kopia encryption and zstd compression
6. Kopia UI (Optional)
- Web interface to browse backups
- Accessible at
kopia-ui.{domain} - Mounts same NFS share as VolSync
How to Disable Backup for a PVC
Remove the backup label from the PVC:
The volsync-orphan-cleanup ClusterCleanupPolicy runs every 15 minutes and automatically deletes the orphaned ReplicationSource, ReplicationDestination, and ExternalSecret. No manual cleanup needed.
Note: Removing the label does NOT delete existing backups from NFS. The Kopia repository on TrueNAS retains all previous snapshots. To re-enable backups later, simply re-add the label.
How to Enable Backup for a PVC
Add a backup label to your PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-data
namespace: my-app
labels:
backup: "hourly" # Backups every hour
# OR
backup: "daily" # Backups at 2am daily
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 10Gi
That's it! Kyverno automatically generates all required resources.
Backup Schedules
| Label | Schedule | Retention |
|---|---|---|
backup: hourly |
Every hour (0 * * * *) | 24 hourly, 7 daily, 4 weekly, 2 monthly |
backup: daily |
2am daily (0 2 * * *) | 24 hourly, 7 daily, 4 weekly, 2 monthly |
1Password Configuration
The rustfs item in 1Password must contain:
| Field | Purpose |
|---|---|
kopia_password |
Kopia repository encryption key |
Generated Secret Contents
Kyverno generates a secret per-PVC with:
| Key | Value | Purpose |
|---|---|---|
KOPIA_REPOSITORY |
filesystem:///repository |
Repository type and path |
KOPIA_FS_PATH |
/repository |
Filesystem path |
KOPIA_PASSWORD |
(from 1Password) | Repository encryption |
NFS Repository Structure
/mnt/BigTank/k8s/volsync-kopia-nfs/
├── kopia.repository # Kopia repository config
├── kopia.blobcfg # Blob storage config
├── p/ # Pack files (ALL deduplicated data from ALL PVCs)
├── q/ # Index blobs
├── n/ # Manifest blobs (snapshots tagged by namespace/pvc-name)
└── x/ # Session blobs
All PVC backups share the same Kopia repository, with snapshots tagged by namespace/pvc-name.
Cross-PVC Deduplication
This shared repository design is a deliberate choice. Kopia uses content-defined chunking — files are split into variable-size chunks based on content boundaries, and each chunk is stored by its hash. If the same chunk exists anywhere in the repository (from any PVC, any namespace), it's stored only once.
What this means in practice: - Delete and recreate an app → new PVC backs up → Kopia finds all chunks already exist → near-instant backup, almost zero new storage - Multiple apps with similar files (configs, timezone data, shared libraries) → one copy - Incremental backups only store changed chunks, not changed files - Storage grows by unique data, not by number of PVCs
Why not S3 + Restic? VolSync also supports Restic to S3, but each PVC gets its own separate Restic repository — zero cross-PVC deduplication. Delete and recreate an app = full backup from scratch. More storage, more bandwidth, slower.
Why Two Backup Systems (NFS for PVCs, S3 for Databases)
For CloudNativePG database recovery runbooks (including ArgoCD race-condition handling) see docs/cnpg-disaster-recovery.md. For AI-assisted incident guidance, use LLM Recovery Prompt Templates.
PVC backups → NFS + Kopia because: - VolSync's Kopia mover needs filesystem access for content-defined chunking and dedup - Direct NFS gives 10Gbps to TrueNAS with no HTTP overhead - No per-namespace S3 credentials — Kyverno just injects the NFS mount - One shared repository = cross-PVC deduplication (see above)
Database backups → S3 + Barman because:
- CNPG's built-in backup only supports Barman, and Barman speaks S3 (not NFS)
- Barman does SQL-aware backups (pg_basebackup + continuous WAL archiving) for point-in-time recovery
- Filesystem-level snapshots of running Postgres can be inconsistent without the WAL stream
- CNPG has no native NFS backup option
Each tool uses its native backup mechanism. Forcing either into the other's model would mean worse backups.
Manual Restore
To manually trigger a restore:
- Delete the existing PVC
- Recreate the PVC with the same name and backup label
- VolSync will restore from the latest snapshot
Or use the ReplicationDestination directly:
# Trigger manual restore
kubectl patch replicationdestination <pvc-name>-restore -n <namespace> \
--type merge -p '{"spec":{"trigger":{"manual":"restore-$(date +%s)"}}}'
Troubleshooting
Check Backup Status
# List all ReplicationSources
kubectl get replicationsource -A
# Check specific backup
kubectl get replicationsource <pvc-name>-backup -n <namespace> -o yaml
Check Mover Pod Logs
# Find mover pod
kubectl get pods -n <namespace> | grep volsync
# View logs
kubectl logs -n <namespace> <volsync-pod-name>
Verify NFS Mount
# Check if NFS is mounted in mover pod
kubectl exec -n <namespace> <volsync-pod-name> -- df -h /repository
Secret Missing or Invalid
# Check ExternalSecret status
kubectl get externalsecret -n <namespace>
# Check secret contents
kubectl get secret volsync-<pvc-name> -n <namespace> -o yaml
Kyverno Not Generating Resources
# Check policy status
kubectl get clusterpolicy volsync-pvc-backup-restore
# Check policy events
kubectl describe clusterpolicy volsync-pvc-backup-restore
# Trigger regeneration by updating PVC label
kubectl label pvc <pvc-name> -n <namespace> kyverno-trigger=$(date +%s) --overwrite
Excluded Namespaces
The following namespaces are excluded from automatic backup:
- kube-system
- volsync-system
- kyverno
Files
| File | Purpose |
|---|---|
infrastructure/controllers/pvc-plumber/ |
Backup existence checker service |
infrastructure/storage/volsync/ |
VolSync Helm chart |
infrastructure/controllers/kyverno/policies/volsync-nfs-inject.yaml |
Injects NFS mount into mover pods |
infrastructure/storage/kopia-ui/ |
Kopia web UI for browsing backups |
infrastructure/controllers/kyverno/policies/volsync-pvc-backup-restore.yaml |
Kyverno backup/restore policy |
infrastructure/controllers/kyverno/policies/volsync-orphan-cleanup.yaml |
Cleanup orphaned backup resources |
monitoring/prometheus-stack/volsync-alerts.yaml |
Prometheus alerting rules |
infrastructure/database/cloudnative-pg/ |
CNPG database clusters (separate backup path) |
Database Backups (CNPG — Separate System)
The PVC backup system above covers application data. Database backups use a completely separate path:
| PVC Backups | Database Backups | |
|---|---|---|
| Tool | VolSync + Kopia | CNPG + Barman |
| Destination | TrueNAS NFS | RustFS S3 (s3://postgres-backups/cnpg/) |
| Trigger | Kyverno auto-generates on PVC label | CNPG ScheduledBackup resource |
| Auto-restore | Yes (PVC Plumber + Kyverno) | No — manual recovery required |
| Schedule | Hourly or daily (per PVC label) | Hourly + continuous WAL archiving |
Why databases don't use the PVC backup system
- Filesystem-level backup of a running Postgres database can be inconsistent
- Barman uses
pg_basebackup+ WAL archiving for point-in-time recovery - CNPG manages its own PVCs (names are auto-generated, can't add Kyverno labels)
Database disaster recovery
After a cluster nuke, CNPG creates fresh empty databases — it does NOT auto-restore from Barman backups. Recovery requires manually bypassing ArgoCD (SSA + CNPG webhook conflict prevents recovery mode through GitOps).
Database Applications are managed by a separate ApplicationSet (database-appset.yaml) with selfHeal: false, so skip-reconcile annotations persist during recovery. This avoids the need to scale down ArgoCD controllers.
See docs/cnpg-disaster-recovery.md for full recovery procedures.