Backup Restore
Backing Up and Restoring Your Cluster State
Backup etcd
-
Identify etcd endpoints: Typically located at
https://<etcd-endpoint>:2379. -
Use etcdctl to back up:
ETCDCTL_API=3 etcdctl snapshot save /path/to/backup/snapshot.db \ --endpoints=https://<etcd-endpoint>:2379 \ --cacert=/etc/kubernetes/pki/etcd/ca.crt \ --cert=/etc/kubernetes/pki/etcd/server.crt \ --key=/etc/kubernetes/pki/etcd/server.key -
Store backup: Securely store the snapshot (e.g., in a cloud storage service, S3 bucket) with versioning enabled.
Automating Backups
- CronJob for automated backups: Set up a Kubernetes CronJob or an external script to automate
etcdctl snapshotcommands at regular intervals.
Persistent Volume Backups
- Cloud provider snapshots: If you’re using cloud providers like AWS, GCP, or Azure, you can configure automatic snapshots for persistent volumes (e.g., AWS EBS, GCP PD) using either the cloud console or tools like Velero.
- Velero for PV backups:
-
Install and configure Velero:
velero install --provider <cloud-provider> --bucket <bucket-name> --secret-file <credentials-file> -
Backup a namespace:
velero backup create my-backup --include-namespaces <namespace>
-
Restoring the Cluster from Backup
Restoring etcd Snapshot
-
Stop kube-apiserver: Stop the Kubernetes API server so that it does not interfere with the restore process.
systemctl stop kube-apiserver -
Restore etcd:
ETCDCTL_API=3 etcdctl snapshot restore /path/to/backup/snapshot.db \ --data-dir /var/lib/etcd-from-backup -
Update etcd data directory: Ensure the
etcdservice is configured to use the restored directory (/var/lib/etcd-from-backup).- Edit
/etc/kubernetes/manifests/etcd.yamland update the data directory to point to the restored location.
- Edit
-
Restart the kube-apiserver:
systemctl start kube-apiserver
Restoring Persistent Volumes (PV)
- Velero restore:
-
List available backups:
velero backup get -
Restore from backup:
velero restore create --from-backup <backup-name>
-
Key Considerations for Cluster Recovery
- Disaster recovery: Document all backup and recovery processes and periodically test them to ensure they work when needed.
- Regular audits: Ensure that the backup files and snapshots are regularly tested and validated.
- Security of backups: Ensure backups (etcd snapshots, persistent volume snapshots) are encrypted and access-controlled to avoid tampering or unauthorized access.