Three issues caused periodic worker saturation:
1. Network section pinged 5 private LAN IPs (10.48.200.x) unreachable
from DO — each failed after 1s timeout = 5s wasted per run.
Replaced with a fast DB query on registered_agents.
2. pve_api_get() had no CURLOPT_CONNECTTIMEOUT — added 3s limit so
unreachable Proxmox fails fast instead of blocking the full 8s.
3. Ollama curl timeout reduced from 5s→3s total, added 2s connect limit.
Cron interpreter also changed from lsphp85 to php8.3 in crontab
(done directly on server) — lsphp85 adds ~8s LSAPI startup overhead
and consumes a PHP worker slot; php8.3 runs standalone.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
facts_collector was checking https://jarvis.orbishosting.com from the
DO server itself — traffic routes through Cloudflare CDN which can
return 524 timeouts. All sites are hosted on this same OLS instance,
so check via http://127.0.0.1 with a Host header instead. This gives
direct OLS response without CDN overhead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ON DUPLICATE KEY UPDATE was not touching updated_at, so if a site's
status didn't change MySQL never fired the ON UPDATE trigger and the
row timestamp stayed 6 days stale. do_server.php's 15-min freshness
window then returned empty sites.
Also fixes $fresh() querying WHERE fact_category= (non-existent column)
instead of WHERE category=, which always returned no rows.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Proxmox: skip if data < 10 min old (was: every 3 min unconditionally)
- Ollama: skip if data < 15 min old (model list rarely changes)
- Site health: skip if data < 5 min old (was: 7 HTTP calls every 3 min)
- Home Assistant: removed entirely (HA agent pushes 212 entities every 30s)
Fast local reads (CPU/mem/network pings) still run every 3 min. External
HTTP calls now fire only when data is actually stale. Saves ~140 site-check
HTTP calls/hour and ~60 Proxmox API calls/hour in steady state.
- deploy/jarvis-watchdog.sh: self-healing watchdog (every 5 min)
* monitors lsws/mysql/redis, restarts on failure
* JARVIS HTTP self-check, restarts OLS on 5xx
* disk/memory alerts inserted to DB
* offline Proxmox VM agents restarted via qm guest exec
* log rotation (1000 line cap)
- deploy/jarvis-deploy.sh: smart deploy with PHP validation
* php8.3 syntax check on every changed .php file
* auto-reverts git commit + inserts critical alert on syntax error
* reloads OLS after JARVIS deploys
- api/endpoints/facts_collector.php: site health monitoring
* curls all 7 managed sites every 3 min
* stores up/down status in kb_facts
- api/endpoints/alerts.php: auto-heal + site alerts
* dispatches restart_service commands when services down on agents
* generates alerts from kb_facts site health data
- public_html/install-agent.sh: one-liner Linux agent installer
* installs deps, downloads agent, registers with JARVIS, sets up systemd
- public_html/webhook.php: fixed infra deploy path to /opt/infra
- 4-tier chat: HA control → Ollama → Groq → Claude
- Push-based agent system with heartbeat/metrics
- Network monitoring, alerts, Proxmox, Home Assistant
- Windows + Linux agent installers
- Stats cache cron, facts collector, KB engine