Initial: backup/restore scripts, README

- backup.sh: weekly cron script for PVE1+PVE2
- restore.sh: interactive disaster recovery wizard
- README.md: step-by-step recovery guide for both nodes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 03:42:39 +00:00
commit 04fc9941ff
4 changed files with 649 additions and 0 deletions
@@ -0,0 +1,176 @@
# Proxmox Config Backup & Restore
Automated weekly backup of PVE1 and PVE2 configuration to GitHub.
After a host failure, this gets you back up in 15 to 30 minutes.
---
## Repository Structure
```
proxmox-config/
  backup.sh            ← runs on each node weekly, commits changes
  restore.sh           ← interactive restore on a fresh Proxmox install
  README.md            ← this file
  shared/              ← cluster-wide configs (backed up from PVE1)
    storage.cfg        — storage definitions (local-lvm, GoFlex, ProxBackups)
    datacenter.cfg     — cluster settings
    user.cfg           — users and API tokens
    corosync.conf      — cluster membership
    jobs.cfg           — backup job schedules
    vzdump.cron        — vzdump cron config
    firewall/          — cluster firewall rules
    ha/                — HA group/resource configs
    nodes/
      pve/
        qemu-server/   — PVE1 VM .conf files (100-120)
        lxc/           — PVE1 LXC container configs (110)
      pve2/
        qemu-server/   — PVE2 VM .conf files (106, 302)
  pve/                 ← PVE1-specific (hostname: pve, 10.48.200.90)
    network/           — /etc/network/interfaces
    cron/              — root crontab
    scripts/           — custom shell/python scripts
    systemd/           — custom service units
    jarvis-agent/      — JARVIS monitoring agent config
    ssh/               — authorized_keys (no private keys)
  pve2/                ← PVE2-specific (hostname: pve2, 10.48.200.91)
    (same structure)
```
---
## Backup Schedule
Both nodes run `backup.sh` (invoked as `/usr/local/bin/proxmox-backup`) via cron every **Sunday at 3:00 AM**.
```
0 3 * * 0 /usr/local/bin/proxmox-backup >> /var/log/proxmox-backup.log 2>&1
```
The script:
- Pulls latest from GitHub (handles both nodes committing)
- Collects all config files listed above
- Skips large binaries (ollama, filebrowser — see reinstall notes below)
- Skips private keys (`/etc/pve/priv/`, SSH private keys)
- Commits with a timestamp and pushes
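The skip rules above can be pictured as a small path filter. This is a minimal sketch and not the code in `backup.sh`: the function name `should_skip` and the exact glob patterns are assumptions chosen to mirror the bullets.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not taken from backup.sh): decide whether a path
# must stay out of the git repo. Returns 0 to skip, 1 to keep.
should_skip() {
    local path="$1"
    case "$path" in
        /etc/pve/priv/*)         return 0 ;;  # cluster private keys
        */.ssh/id_*.pub)         return 1 ;;  # public keys are safe to keep
        */.ssh/id_*)             return 0 ;;  # SSH private keys
        */ollama|*/filebrowser)  return 0 ;;  # large binaries, reinstalled later
        *)                       return 1 ;;
    esac
}

# Example: classify a few candidate paths before copying them into the repo.
for f in /etc/pve/storage.cfg /etc/pve/priv/authkey.key /root/.ssh/id_rsa \
         /root/.ssh/authorized_keys /usr/local/bin/ollama; do
    if should_skip "$f"; then
        echo "skip $f"
    else
        echo "keep $f"
    fi
done
```

Ordering matters in the `case`: the `.pub` pattern comes before the broader `id_*` pattern so public keys are not dropped along with private ones.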
---
## Disaster Recovery — Host Failure
### Scenario A: PVE1 (primary) fails
PVE1 hosts most VMs (100-113, 118, 120). Their disks live either on `local-lvm` (PVE1's LVM thin pool) or on the `GoFlex` NAS. VMs on `local-lvm` **are at risk** if the disk fails; restore them from PBS.
**Steps:**
1. **Install fresh Proxmox** on replacement hardware.
- Set IP to `10.48.200.90` during install, or set after.
- Set hostname to `pve`.
2. **Clone this repo:**
```bash
apt install -y git
# Authenticate at the prompt or through a credential helper; never embed a token in the URL
git clone https://github.com/myronblair/proxmox-config.git /opt/proxmox-config
```
3. **Run the restore script:**
```bash
bash /opt/proxmox-config/restore.sh pve1
```
The script is interactive — confirm each section as it goes.
4. **Reboot** to apply network config:
```bash
reboot
```
5. **Restore VMs from PBS** (Proxmox Backup Server at `10.48.200.85`):
- Log in to PVE web UI: `https://10.48.200.90:8006`
- Datacenter → Storage → Add → Proxmox Backup Server
- Server: `10.48.200.85`
- Datastore: `PBSBackup`
- Username: `root@pam`
- Fingerprint: see `shared/pbs_fingerprint.txt`
- For each VM: select the backup → Restore
6. **Reinstall large binaries** (not in git — too large):
```bash
# Ollama (local LLM on VM 210 inside this host)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.2
# File Browser
curl -fsSL https://raw.githubusercontent.com/filebrowser/get/master/get.sh | bash
# Config is at /root/.filebrowser.json or wherever it was set
```
7. **Start services:**
```bash
systemctl start jarvis-agent filebrowser
```
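Once the PBS storage from step 5 is registered, the per-VM restore can also be driven from the shell instead of the web UI. The sketch below is a hedged illustration: it assumes the storage was added under the ID `PBSBackup`, that restored disks go to `local-lvm`, and the snapshot timestamp shown is made up. The `run` wrapper and `restore_vm` are hypothetical helpers; `run` only prints commands unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Sketch: list backups and restore one VM from PBS via the CLI.
# Assumptions (not from the repo): storage ID "PBSBackup", target storage
# "local-lvm", and the example snapshot timestamp used below.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

restore_vm() {
    local vmid="$1" snapshot="$2" target="${3:-local-lvm}"
    # PBS backups are addressed as <storage>:backup/vm/<vmid>/<snapshot time>
    run qmrestore "PBSBackup:backup/vm/${vmid}/${snapshot}" "$vmid" --storage "$target"
}

# Show available backups for VM 100, then restore a chosen snapshot.
run pvesm list PBSBackup --vmid 100
restore_vm 100 "2026-06-07T03:00:00Z"
```

Copy the real snapshot name from the `pvesm list` output before running with `DRY_RUN=0`. Containers use `pct restore` rather than `qmrestore`.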
---
### Scenario B: PVE2 (secondary) fails
PVE2 hosts VM 106 (unknown) and VM 302 (NetworkBackup). PVE2 uses `local-lvm` for storage.
**Steps:**
1. **Install fresh Proxmox**, set IP `10.48.200.91`, hostname `pve2`.
2. **Clone repo and run restore:**
```bash
apt install -y git
# Authenticate at the prompt or through a credential helper; never embed a token in the URL
git clone https://github.com/myronblair/proxmox-config.git /opt/proxmox-config
bash /opt/proxmox-config/restore.sh pve2
```
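3. **Rejoin the cluster.** `pvecm add` runs on the joining node and takes the address of an existing member:
```bash
# On PVE1 (surviving node): remove the stale entry if `pvecm nodes` still lists pve2
pvecm delnode pve2
# On the freshly installed PVE2: join through PVE1
pvecm add 10.48.200.90
# Cluster state under /etc/pve then syncs to PVE2
```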
4. **Restore VMs 106 and 302 from PBS.**
5. **Reboot PVE2**, then verify cluster: `pvecm status`
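The final `pvecm status` check can be scripted so it fails loudly when the cluster has no quorum. A minimal sketch, assuming the status text contains a `Quorate: Yes` line; the helper name `is_quorate` is hypothetical, and the sample text is abbreviated rather than captured from these nodes.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if `pvecm status` output (read from stdin) reports quorum.
is_quorate() {
    grep -Eq '^[[:space:]]*Quorate:[[:space:]]+Yes'
}

# On a node:  pvecm status | is_quorate && echo "cluster OK"
# Self-contained demo using abbreviated sample text instead of a live cluster:
printf 'Quorum information\nNodes:            2\nQuorate:          Yes\n' \
    | is_quorate && echo "cluster OK"
```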
---
## What Is NOT Backed Up Here
| Item | Where to find it |
|------|-----------------|
| VM disk images | Proxmox Backup Server (10.48.200.85) — runs nightly |
| /etc/pve/priv/ | Private CA key, auth keys — DO NOT store in git; regenerated by Proxmox on fresh install |
| SSH private keys | /root/.ssh/id_rsa — regenerate or restore from secure storage |
| ollama binary | Reinstall via install script (see above) |
| filebrowser binary | Reinstall via get.sh (see above) |
| blockalign, urbackupclientctl | Reinstall from their respective sources if needed |
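Because private keys and credentials must stay out of git, a scan of the working tree before each commit can catch accidental leaks. This is a hedged sketch, not part of `backup.sh`: the function name `scan_for_secrets` and the two patterns (PEM private-key headers and the `ghp_` prefix of classic GitHub tokens) are assumptions, and `--exclude-dir` requires GNU grep.

```bash
#!/usr/bin/env bash
# Hypothetical check: print files under a directory that look like they
# contain secrets. Returns 0 if at least one such file was found.
scan_for_secrets() {
    local dir="$1"
    grep -rlE --exclude-dir=.git \
        -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
        -e 'ghp_[A-Za-z0-9]{36}' \
        "$dir"
}

# Self-contained demo in a throwaway directory.
demo="$(mktemp -d)"
printf 'storage: local-lvm\n' > "$demo/storage.cfg"
printf -- '-----BEGIN OPENSSH PRIVATE KEY-----\n' > "$demo/id_ed25519"
if scan_for_secrets "$demo"; then
    echo "possible secrets found; do not commit"
fi
rm -rf "$demo"
```

In a backup run, the same call against the repository directory could abort before `git commit` whenever it returns 0.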
---
## Nodes Quick Reference
| Node | Hostname | IP | Port | Role |
|------|----------|----|------|------|
| PVE1 | pve | 10.48.200.90 | 8006 | Primary — most VMs |
| PVE2 | pve2 | 10.48.200.91 | 8006 | Secondary — VM 106, 302 |
| PBS | — | 10.48.200.85 | 8007 | Backup server |
Proxmox web login: user `root`. The password is deliberately not recorded in this repo; keep it in a password manager.
FortiGate DDNS (PVE1 from internet): `orbisne.fortiddns.com`
---
## Manual Backup Trigger
To run a backup immediately on either node:
```bash
/usr/local/bin/proxmox-backup
tail -f /var/log/proxmox-backup.log
```