---
name: project-jarvis-todo
description: "Master JARVIS TODO — agent deployment, self-healing, Windows service, HA integration, all outstanding work"
metadata:
  node_type: memory
  type: project
  originSessionId: e8442c3a-86d9-4b82-8f6d-071acd19159a
---
# JARVIS Master TODO
Last updated: 2026-06-29

---
## ✅ FIXED THIS SESSION (2026-06-29) — ROUND 2
- [x] **HA Poller deployed on VM211**: `jarvis-ha-poller.py` running as `jarvis-ha-poller.service`. Polls HA at `http://10.48.200.97:8123` every 30s. 241 entities now pushing (lights, switches). Token expires 2033.
- [x] **Missing DB tables created**: `tasks`, `appointments`, `usage_patterns` tables added. Extended the `registered_agents` `agent_type` enum with `windows`/`macos` and added the `version` column.
- [x] **schema.sql updated** — DB schema dumped from live VM211 to `db/schema.sql`, now includes all tables.
- [x] **ha.php domain filter** — Added `camera`, `siren`, `remote`, `todo`, `lawn_mower` to `$skipDomains`. Only `light` and `switch` (plugs) show in HOME tab.
- [x] **web.orbishosting.com fixed** — Root cause: Epson printer had ARP conflict at 10.48.200.200 (NPM's IP). FortiGate VIP for HTTP/HTTPS correctly forwards to 10.48.200.200. Fixed by: (1) bouncing NPM's eth0 to send gratuitous ARP, (2) setting permanent ARP entry on PVE1. https://web.orbishosting.com → NPM → NovaCPX (returns 401 Basic Auth — NovaCPX site auth).
- [x] **PVE1 static ARP**: `/etc/network/if-up.d/npm-static-arp` persists `10.48.200.200 → BC:24:11:67:1D:47` across reboots.
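A minimal sketch of what such an `if-up.d` hook could look like. The bridge name `vmbr0` is an assumption (the interface is not named in these notes), and the sketch relies on ifupdown exporting `IFACE` to hook scripts. The function only prints the command unless `APPLY=1` is set, so it can be dry-run first.

```bash
#!/bin/bash
# Hypothetical sketch of /etc/network/if-up.d/npm-static-arp.
# Assumption: NPM is reached through bridge vmbr0; adjust to the real interface.
NPM_IP="10.48.200.200"
NPM_MAC="BC:24:11:67:1D:47"
BRIDGE="${BRIDGE:-vmbr0}"

pin_npm_arp() {
    local iface="$1"
    # ifupdown runs every hook for every interface; act only on the bridge.
    [ "$iface" = "$BRIDGE" ] || return 0
    local cmd="ip neigh replace $NPM_IP lladdr $NPM_MAC dev $iface nud permanent"
    if [ "${APPLY:-0}" = "1" ]; then
        $cmd            # really install the permanent neighbour entry
    else
        echo "$cmd"     # dry run: show what would be executed
    fi
}

# ifupdown exports IFACE to hook scripts; do nothing when it is unset.
if [ -n "${IFACE:-}" ]; then pin_npm_arp "$IFACE"; fi
```

Running `pin_npm_arp vmbr0` by hand prints the `ip neigh replace` command without touching the neighbour table.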
## ✅ FIXED THIS SESSION (2026-06-29) — ROUND 1
- [x] **JARVIS API not executing PHP** — nginx `location ^~ /api` (no trailing slash) was intercepting `/api.php` and serving it as static text. Fixed to `location ^~ /api/`. PHP-FPM at `/run/php/php8.3-fpm.sock` confirmed working. `/api/ping` now returns JSON.
- [x] **api.php backward-compat path normalization** — Added path rewrite so old `/api/endpoints/agent.php` format routes to the `agent` endpoint. Agents on old configs can now register.
- [x] **DB schema: version + windows/macos** — Added `version VARCHAR(32)` to `registered_agents`; expanded `agent_type` enum to `linux|homeassistant|proxmox|windows|macos`.
- [x] **Ollama models in config.php**: `OLLAMA_MODEL_PRIMARY` and `OLLAMA_MODEL_HEAVY` both set to `llama3.1:8b`. VM106 Ollama upgraded to 32GB RAM, 8 cores.
- [x] **Windows agent installer** — Created `install-windows.ps1`: one PowerShell command installs Python, pywin32, downloads agent, creates config, registers+starts Windows Service. No open PowerShell needed after install.
- [x] **Linux installer URL**: `install.sh` was hardcoded to `http://10.48.200.211` (LAN only). Fixed to default `https://jarvis.orbishosting.com`. LAN installs override with `JARVIS_URL=http://10.48.200.211`.
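The nginx fix in this round comes down to prefix matching: `location ^~ /api` matches any URI that begins with `/api`, which includes `/api.php`, and the `^~` modifier stops nginx from going on to check regex locations (such as a typical `\.php$` handler, assumed here since the actual config is not shown in these notes). The shell sketch below illustrates only the prefix rule, not nginx's full location selection.

```bash
#!/bin/bash
# Plain prefix matching, the rule nginx applies to `location ^~ <prefix>`.
# This is an illustration, not nginx's full location-selection algorithm.
prefix_matches() {
    local prefix="$1" uri="$2"
    case "$uri" in
        "$prefix"*) echo "match" ;;
        *)          echo "no match" ;;
    esac
}

prefix_matches "/api"  "/api.php"    # match: the old block captured the PHP entry point
prefix_matches "/api/" "/api.php"    # no match: /api.php is left for other locations
prefix_matches "/api/" "/api/ping"   # match
```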
---
## 🔴 CRITICAL — Outstanding
### A. Epson Printer IP Conflict — NEEDS PERMANENT FIX
Epson printer keeps taking 10.48.200.200 (NPM's static IP). Temporary fix: PVE1 has static ARP + gratuitous ARP from NPM. **Real fix**: assign Epson printer a different static IP in its web admin (find printer IP when it comes back up, log in to its config page, set static IP ≠ 10.48.200.200). DHCP reservation in FortiGate DHCP server for printer's MAC also works.
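To confirm which device currently answers for NPM's address, the neighbour table entry can be compared against NPM's MAC. A minimal sketch, assuming it runs on a Linux host on the same LAN (for example PVE1) and that `ip neigh show` prints its usual `... lladdr <mac> ...` format; the function names are made up for illustration.

```bash
#!/bin/bash
NPM_IP="10.48.200.200"
NPM_MAC="BC:24:11:67:1D:47"

# Pull the MAC out of one `ip neigh show` line and upper-case it.
mac_from_neigh() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") print toupper($(i + 1)) }' <<<"$1"
}

# Report whether the neighbour line belongs to NPM or to some other device.
arp_owner() {
    local mac
    mac="$(mac_from_neigh "$1")"
    if [ "$mac" = "$NPM_MAC" ]; then
        echo "ok: $NPM_IP is NPM"
    else
        echo "conflict: $NPM_IP answered by ${mac:-unknown}"
    fi
}

# Usage on the host: arp_owner "$(ip neigh show "$NPM_IP")"
```

A `conflict` line with an unfamiliar MAC gives the address to look up in the FortiGate DHCP lease list when setting the printer's reservation.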
### B. Windows Agent on Myron's Desktop — NEEDS ADMIN POWERSHELL
Run this in an **Admin PowerShell** (not Claude Code terminal):
```powershell
$env:JARVIS_REG_KEY='f846a9aaf7ce9a61742c63c87c4186052a71d2a580c65518'
& 'C:\Users\myron\repos\jarvis\public_html\agent\install-windows.ps1'
```
After install: `Get-Service JARVISAgent` should show Running.
### C. web.orbishosting.com NovaCPX 401
Site routes correctly to NovaCPX but returns `401 Basic realm="Blair HQ"`. Need to check CyberPanel on NovaCPX (10.48.200.110) — either the website isn't created for web.orbishosting.com, or HTTP auth is enabled on the default site. Access CyberPanel at https://10.48.200.110:8090 to check.
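Before opening CyberPanel, the auth realm can be read straight from the response headers. A small sketch, with `auth_realm` as a made-up helper name; it parses raw headers from stdin, so it works with any `curl -I` output.

```bash
#!/bin/bash
# Extract the Basic-auth realm from raw HTTP response headers on stdin.
# Prints nothing when no Basic WWW-Authenticate header is present.
auth_realm() {
    tr -d '\r' | sed -n 's/^[Ww][Ww][Ww]-[Aa]uthenticate: *Basic realm="\([^"]*\)".*/\1/p'
}

# Usage: curl -skI https://web.orbishosting.com | auth_realm
```

If the printed realm is `Blair HQ`, the request is reaching the same auth-protected site described above.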
### D. Re-install JARVIS HA Custom Component (VM109 rebuilt)
```bash
# From PVE1, get HA terminal or use Proxmox console for VM109:
# Copy from JARVIS server to HA config:
ssh root@10.48.200.211 'tar czf /tmp/ha-component.tgz -C /var/www/jarvis ha-component'
# Then on HA VM or via PVE1 -> VM109 console:
# mkdir -p /config/custom_components
# tar xzf ha-component.tgz -C /config/
# Restart HA
```
After restart: `ha_entities` should fill. Also restore HA backup file ID `1mLE1S9dSvxl0RYQnCt020WT-UZnQuxqP` from Google Drive via HA UI.
### E. Push to GitHub + Verify Auto-Deploy
The fixes in this session need to be committed and pushed so VM211 picks them up (webhook deploy):
```bash
cd "C:/Users/myron/repos/jarvis"   # forward slashes: unquoted backslashes are escape characters in bash
git add -A && git commit -m "Fix nginx PHP, API paths, Windows installer, install.sh URL"
git push
```
After push: verify VM211 auto-deployed (`journalctl -u jarvis-deploy -n 20` on VM211).
### F. Install Agent on Ollama VM106 (10.48.200.210): new VM, no agent
```bash
# From PVE1:
ssh root@10.48.200.210 "curl -sk https://jarvis.orbishosting.com/agent/install.sh | bash -s ollama106 linux"
```
---
## 🟠 HIGH — Deploy Agents to All Hosts
**Linux/Proxmox (LAN) one-liner:**
```bash
curl -sk https://jarvis.orbishosting.com/agent/install.sh | JARVIS_URL=http://10.48.200.211 bash -s <hostname> <linux|proxmox>
```
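One shell detail matters for the LAN one-liner: a `VAR=value` prefix applies only to the command it directly precedes, not to later stages of a pipeline, so the variable has to sit in front of `bash` for the install script to see it. The sketch below demonstrates this with `echo` and `sh` standing in for `curl` and the installer; `DEMO_URL` is a made-up variable name.

```bash
#!/bin/bash
unset DEMO_URL

# Prefix on the first stage: the second stage does not see the variable.
first="$(DEMO_URL=http://lan.example echo ignored | sh -c 'echo "${DEMO_URL:-unset}"')"

# Prefix on the second stage: the consuming command sees it.
second="$(echo ignored | DEMO_URL=http://lan.example sh -c 'echo "${DEMO_URL:-unset}"')"

echo "prefix on producer: $first"
echo "prefix on consumer: $second"
```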
**Linux (External/DO):**
```bash
curl -sk https://jarvis.orbishosting.com/agent/install.sh | bash -s <hostname> linux
```
**Windows (Admin PowerShell):**
```powershell
$env:JARVIS_REG_KEY='f846a9aaf7ce9a61742c63c87c4186052a71d2a580c65518'
irm https://jarvis.orbishosting.com/agent/install-windows.ps1 | iex
```
**Mac:**
```bash
curl -sk https://jarvis.orbishosting.com/agent/install-mac.sh | bash -s -- --key f846a9aaf7ce9a61742c63c87c4186052a71d2a580c65518
```
**Deployment status table:**
| Host | IP | Type | Status | Action Needed |
|------|-----|------|--------|--------|
| PVE1 | 10.48.200.90 | proxmox | ❓ check | Verify in Workers tab |
| PVE2 | 10.48.200.91 | proxmox | ❓ check | Verify in Workers tab |
| JARVIS VM211 | 10.48.200.211 | linux | ❓ check | Self-monitors |
| Ollama VM106 | 10.48.200.210 | linux | ❌ missing | Install now (see Critical: Install Agent on Ollama VM106) |
| Jellyfin VM112 | 10.48.200.33 | linux | ❓ check | Verify in Workers tab |
| MediaStack VM103 | 10.48.200.35 | linux | ❓ check | Verify in Workers tab |
| HomeBridge VM118 | 10.48.200.18 | linux | ❓ check | Verify in Workers tab |
| NovaCPX VM120 | 10.48.200.110 | linux | ❓ check | Verify in Workers tab |
| SynchroNet VM100 | 10.48.200.50 | linux | ❌ missing | Install if SSH accessible |
| NetworkBackup | 10.48.200.99 | linux | ❓ check | Verify in Workers tab |
| HA VM109 | 10.48.200.97 | homeassistant | ❌ missing | See Critical D |
| CT110 WireGuard | 10.48.200.67 | linux | ❌ missing | apk add python3, install agent |
| DO Server | 165.22.1.228 | linux | ❓ check | Verify in Workers tab |
| Myron's Desktop (this PC) | DHCP | windows | ❌ missing | Run install-windows.ps1 |
| Windows VM104 | DHCP | windows | ❌ missing | Run install-windows.ps1 |
| mini_it12 | 10.48.200.87 | windows | ❌ offline | Last seen 2026-06-12, re-install |
| Mac machines | any | macos | ❌ missing | Run install-mac.sh |
---
## 🟠 HIGH — Self-Healing Details
**Linux** — systemd `Restart=always` + `RestartSec=10`. Already built into install.sh service file.
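For reference, a unit with that restart policy might look like the sketch below. The description and `ExecStart` path are assumptions (the path mirrors the one used in the Alpine snippet in these notes); the actual file written by `install.sh` may differ. The helper writes the unit to whatever path it is given, so it can be inspected before anything is copied into `/etc/systemd/system/`.

```bash
#!/bin/bash
# Write a hypothetical jarvis-agent unit file to the path given as $1.
write_agent_unit() {
    cat >"$1" <<'EOF'
[Unit]
Description=JARVIS agent (sketch)
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/bin/python3 /opt/jarvis-agent/jarvis-agent.py
# Restart the agent whenever it exits, after a 10 second pause.
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}

# Usage: write_agent_unit /tmp/jarvis-agent.service && cat /tmp/jarvis-agent.service
```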
**Windows** — After install, run once to add recovery actions:
```powershell
sc.exe failure JARVISAgent reset= 86400 actions= restart/10000/restart/30000/restart/60000
```
**Self-update** — All agents auto-update every 24h. Push new agent code to GitHub → VM211 deploys → all agents pick up update within 24h. For urgent: JARVIS Admin → Workers → UPDATE button per agent.
**Alpine/OpenRC** (CT110):
```sh
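#!/bin/sh
# /etc/local.d/jarvis-agent.start (must be executable; needs `rc-update add local default`)
# Restart loop: relaunch the agent 10 seconds after it exits.
nohup sh -c 'while true; do python3 /opt/jarvis-agent/jarvis-agent.py; sleep 10; done' &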
```
---
## 🟠 HIGH — Windows Agent Compiled Executable
Current installer works (installs Python silently) but a standalone `.exe` is cleaner for restricted machines.
Build steps (run on any Windows machine with Python):
```powershell
pip install pyinstaller pywin32
pyinstaller --onefile --hidden-import=win32timezone --hidden-import=win32security `
C:\Users\myron\repos\jarvis\agent\jarvis-agent-windows.py -n jarvis-agent
# Upload dist/jarvis-agent.exe to VM211:/var/www/jarvis/public_html/agent/
```
Then update `install-windows.ps1` to download the exe and use `.\jarvis-agent.exe --startup auto install`.

---
## 🟡 MEDIUM — HA Integration (VM109)
Full post-rebuild checklist:
- [ ] Restore Google Drive backup (`1mLE1S9dSvxl0RYQnCt020WT-UZnQuxqP`)
- [ ] Re-install JARVIS HA component (see Critical D)
- [ ] Install Tailscale addon (Settings → Add-ons → Search Tailscale)
- [ ] Resize disk: power off → `qm resize 109 sata0 +118G` → boot → `ha os datadisk resize`
- [ ] Verify `HA_URL=http://orbisne.fortiddns.com:8123` in config.php
- [ ] After component reinstall: verify `ha_entities` fills (~2587 rows)
**HA HOME tab filter**: after the backup restore, audit `ha.php` `$skipKeywords` to ensure Tuya/TP-Link/Z-Wave switches aren't being filtered out.

---
## 🟡 MEDIUM — JARVIS Server Health Checks
```bash
# SSH to VM211:
# Verify cron jobs
crontab -l
# Should have: */5 * * * * php /var/www/jarvis/api/endpoints/stats_cache.php
# */3 * * * * php /var/www/jarvis/api/endpoints/facts_collector.php
# Arc Reactor status
systemctl status jarvis-arc
curl -s http://10.48.200.211:7474/health
# Backups
ls -lh /var/backups/jarvis/
# JARVIS ping
curl -s https://jarvis.orbishosting.com/api/ping
```
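The crontab check can be scripted so that missing entries stand out. A minimal sketch, assuming the two collector jobs named in the comments above are the ones that must exist; it reads crontab text on stdin, so it can be tried against sample input before being run on VM211. The function name is made up for illustration.

```bash
#!/bin/bash
# Script names expected somewhere in the crontab (taken from the notes above).
EXPECTED_JOBS="stats_cache.php facts_collector.php"

# Read crontab text on stdin; print one status line per expected job.
check_cron_jobs() {
    local text job
    text="$(grep -v '^[[:space:]]*#' || true)"   # ignore commented-out lines
    for job in $EXPECTED_JOBS; do
        if grep -qF "$job" <<<"$text"; then
            echo "ok $job"
        else
            echo "MISSING $job"
        fi
    done
}

# Usage on VM211: crontab -l | check_cron_jobs
```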
---
## 🟢 LOW — FortiGate DNS + Synology Reverse Proxy
Adds `.lan` domain access. Full instructions in `project_infra_todo.md`.
Key entries: jarvis.lan → VM211:80, proxmox.lan → 10.48.200.90:8006, hoa.lan → VM109:8123.

---
## 🔑 Key Constants
| Item | Value |
|------|-------|
| Registration key | `f846a9aaf7ce9a61742c63c87c4186052a71d2a580c65518` |
| JARVIS URL (external) | `https://jarvis.orbishosting.com` |
| JARVIS LAN | `http://10.48.200.211` |
| HA Token | `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...` (in config.php, expires 2026) |
| Proxmox API token | `root@pam!jarvis=c45b5feb-f9a9-445d-a626-14fbb959f78b` |
---
## 🔴 OPEN (pre-existing)
- [ ] **HA HOME tab — show all lights, plugs, switches** — currently 17 lights show but only 3 useful switches (Sirens, CEC Scanner, ESPHome) pass the filter. Need to audit what real smart plug/switch entities exist in HA (Tuya, TP-Link Tapo, Z-Wave, etc.) and ensure they appear in the JARVIS HOME tab. The `$skipKeywords` filter in `api/endpoints/ha.php` strips camera/HACS junk — verify real device switches aren't accidentally filtered. Also check if HA custom component is syncing all entity domains.
- [ ] **Claude API credits** — image-based Vision Protocol (`handle_screenshot` with actual PNG) uses `claude-opus-4-8`. Top up at console.anthropic.com if vision fails. Guardian/SITREP now on Groq so those no longer drain credits.
- [x] **HA agent persistence** — SOLVED. The HA custom component (`jarvis_agent`) runs inside HA's Docker container (IP 172.30.32.1) and auto-starts with HA. Online and healthy. The old standalone Python script at 10.48.200.97 is obsolete.
---
## ✅ COMPLETED (2026-06-18 session)
- [x] JARVIS fully migrated DO → PVE1 VM 211 (8c/16GB, nginx/PHP8.3/MariaDB/Redis/Arc Reactor)
- [x] All 3 new VMs on PVE1: JARVIS-211, NPM-200, Ollama-210 (IP changed from .95 → .210, Reolink owns .95)
- [x] Tailscale on all key hosts — full mesh, all 13+ nodes documented
- [x] FortiGate VIPs updated to 97.247.237.97; JARVIS on port 1972, HOA on 8123
- [x] NPM running at http://10.48.200.200:81 (Docker, proxy hosts for hoa/novacpx)
- [x] JARVIS backups — /var/backups/jarvis/, daily cron 7am UTC (2am CDT), 7-day retention
- [x] Jellyfin agent — v3.1 online at 10.200.0.3 via PVE1 password SSH
- [x] HA agent — v3.1 running on HA VM at 10.48.200.97 via HA web terminal
- [x] PVE1 ImageMagick — installed
- [x] Vision screenshots tab — already existed in admin under Arc Reactor → Vision Protocol
- [x] MediaStack SSH key — DO key generated and copied to MediaStack via PVE1 hop
- [x] Workers terminology — already "JARVIS AGENT WORKERS" in admin
- [x] install-agent.sh — default URL updated to http://10.48.200.211
- [x] Website webhooks — all 7 repos → tomtomgames.com/webhook.php (DO); JARVIS repo → port 1972
- [x] webhook.php fixed (broken define from bad sed), synced to DO
- [x] Service monitor updated — nginx/php-fpm/mariadb/redis/arc/agent (not OLS-era services)
- [x] DO server WEB HOST block on front page — shows DO CPU/RAM/DISK via Tailscale agent
- [x] Network devices cleaned up — 23 named devices, all infrastructure labeled
- [x] Agent configs fixed — all 8+ agents pointing to new JARVIS VM, correct watch_services per host
- [x] Facts collector fixed — external site checks, correct column names, proper timeouts
- [x] CLAUDE.md + INFRASTRUCTURE-REFERENCE.md fully updated and pushed
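The 7-day retention on `/var/backups/jarvis/` listed above can be expressed with `find`. This is a sketch of one way to do it, not the script that actually runs on VM211; the function name is made up, and `-mtime +7` only approximates the policy (it selects files whose age in whole days exceeds 7). It assumes GNU `find`.

```bash
#!/bin/bash
# Delete regular files directly inside directory $1 that are older than $2 days (default 7).
# Prints each file it removes.
prune_backups() {
    local dir="$1" days="${2:-7}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    find "$dir" -maxdepth 1 -type f -mtime +"$days" -print -delete
}

# Usage (for example from the daily backup cron job): prune_backups /var/backups/jarvis 7
```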
---
## ✅ COMPLETED (2026-06-17 session)
- [x] Phase 3 JS modularization — jarvis-protocols.js split into 3 panel files
- [x] Guardian/SITREP/Vision text → Groq (image analysis stays Claude)
- [x] kb_facts freshness fix (storeFact updated_at, $fresh() column name)
- [x] Site health local loopback fix
- [x] Cloudflare Rocket Loader fix (face-api.js data-cfasync)
- [x] Session fix (api.php skip logic too broad)
- [x] JS syntax fix (apostrophe in face-api label)
- [x] All 8 online Linux/Proxmox agents → v3.1 with version in heartbeat
---
## ✅ COMPLETED (2026-06-12 session)
- [x] Agent v3.1 Linux, v3.0 Windows, v3.0 macOS
- [x] Workers tab VERSION column + agentUpdateFlow
- [x] Arc Reactor systemd active, tracked in git
- [x] All 13 "make it stand out" improvements
- [x] MediaStack NAS migration