How I Built a Full Status Alert System
Website uptime, Supabase health, API monitoring, Garmin failure alerts, Fail2ban reports, personal changelog, and a weekly summary email — all automated from a headless home server
Ingredients
- Headless Linux server — the old laptop from the previous two posts, running 24/7 (already set up)
- cron — Linux task scheduler, built-in (free)
- Resend — email API for all alert delivery (free tier: 3,000 emails/month)
- Claude Code — terminal AI for writing every script ($200/yr)
- healthchecks.io — dead-man’s switch that alerts when a scheduled job stops running (free tier)
- Supabase project — for the Numerator game database health check (already set up)
The Problem: A Server You Can’t See
A headless server is quiet by design. That’s the point — it runs in the background, lid closed, in another room. But quiet also means invisible. If joseandgoose.com goes down at 2am, I won’t know until someone tells me. If Supabase has an outage and my contact form stops saving submissions, I’ll find out when I check the database manually (which I never do). If the Garmin recap cron job silently fails, I get no email and no clue.
The solution isn’t to check things manually — that defeats the purpose of automation. The solution is to make the system tell you when something is wrong. Every important job should either succeed quietly or fail loudly. Here’s the full stack:
- Alert 1: Website Uptime Monitor — checks joseandgoose.com every 5 minutes
- Alert 2: Garmin Recap Failure Check — dead-man’s switch if the 7am recap doesn’t run
- Alert 3: Nightly Fail2ban Ban Report — daily delta of new SSH attack IPs blocked
- Alert 4: Supabase Health + GitHub Activity — Sunday database ping and weekly code activity
- Alert 5: Personal Server Changelog — Claude writes a plain-English weekly standup
- Alert 6: Weekly Status Report Email — everything in one Sunday morning digest
- The Meta-Alert: healthchecks.io — alerts if the server itself goes offline
Alert 1: Website Uptime Monitor
The most basic question: is joseandgoose.com responding? A curl request every 5 minutes, checked against an expected HTTP status code. If it returns anything other than 200, send an alert.
🔧 Developer section: Uptime monitor script
- `curl -s -o /dev/null -w "%{http_code}" https://joseandgoose.com` — gets the HTTP status code silently
- If status ≠ 200: sends a Resend email with the status code, timestamp, and a note to check Vercel logs
- If status = 200: logs the timestamp quietly to `~/.system-reports/uptime.log` with no email
- Cron: `*/5 * * * *` — runs every 5 minutes, 288 checks per day
- Also logs to a daily CSV so I can spot patterns (slow responses during peak hours, etc.)
In the first month of running: two downtime events. One was a Vercel deployment that briefly returned a 503 during a cold start. One was my own fault — a broken build that I caught within 5 minutes because the alert email beat me to it.
Add a cooldown: only alert once per hour per incident. If the site is still down an hour later, send another. One alert per incident is actionable; a flood of them is just noise.
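As a rough sketch, the check-plus-cooldown logic can be structured like this (the state-file path and the `send_alert` helper are illustrative placeholders, not the actual script):

```shell
#!/usr/bin/env bash
# Uptime check with a one-hour alert cooldown (sketch).
URL="https://joseandgoose.com"
STATE="$HOME/.system-reports/uptime-last-alert"   # epoch of the last alert sent
LOG="$HOME/.system-reports/uptime.log"
COOLDOWN=3600                                     # one alert per hour per incident

# True (exit 0) when enough time has passed since the last alert.
should_alert() {  # args: last_alert_epoch now_epoch cooldown_secs
    [ $(( $2 - $1 )) -ge "$3" ]
}

check_site() {
    local status now last
    status=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$URL")
    if [ "$status" = "200" ]; then
        echo "$(date -Is) OK" >> "$LOG"           # silent pass
        return 0
    fi
    now=$(date +%s)
    last=$(cat "$STATE" 2>/dev/null || echo 0)
    if should_alert "$last" "$now" "$COOLDOWN"; then
        echo "$now" > "$STATE"
        send_alert "Site down: HTTP $status"      # hypothetical Resend wrapper
    fi
}
```

The state file is what prevents a long outage from generating 288 duplicate emails: the timestamp of the last alert is the only memory the script needs.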
Alert 2: Garmin Recap Failure Check
The Garmin recap runs at 7am. By 8am, a recap file should exist for today. If it doesn’t, something broke overnight — and I should know before I’ve been waiting all day for a recap email that’s never coming.
🔧 Developer section: Garmin failure check script
- Runs at 8:00am daily: `0 8 * * *`
- Checks if `~/.garmin-recap/recaps/garmin-recap-YYYY-MM-DD.md` exists for today
- If it exists: silent pass, no email
- If it’s missing: sends an alert email with the last 20 lines of the recap log
- Alert subject: "⚠️ Garmin Recap Failed — [DATE]"
This is a dead-man’s switch pattern: instead of the job alerting on success, a second job alerts on missing success. It catches silent failures — crashes, auth errors, network timeouts — that don’t generate their own error output.
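A sketch of the file-existence check (the recap directory is from the post; the log path and the `send_alert` helper are assumptions):

```shell
#!/usr/bin/env bash
# Dead-man's switch: alert only when today's recap file is missing (sketch).
RECAP_DIR="$HOME/.garmin-recap/recaps"
LOG="$HOME/.garmin-recap/recap.log"               # assumed log location

check_recap() {
    local f="$RECAP_DIR/garmin-recap-$(date +%F).md"
    if [ -f "$f" ]; then
        return 0                                  # silent pass, no email
    fi
    # Missing: include the tail of the recap log so the email is diagnosable.
    send_alert "⚠️ Garmin Recap Failed — $(date +%F)" \
        "$(tail -n 20 "$LOG" 2>/dev/null)"        # hypothetical Resend wrapper
}
```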
Alert 3: Nightly Fail2ban Ban Report
Fail2ban bans IPs automatically, but I wanted a daily snapshot: how many new IPs got banned today? Is that number trending up (could indicate a targeted scan) or holding steady (normal background noise)?
🔧 Developer section: Fail2ban report script
- `sudo fail2ban-client status sshd` — outputs total banned count
- Script reads the current total, compares to yesterday’s saved count
- Calculates: new bans = current total − previous total
- Saves today’s count to `~/.system-reports/fail2ban-lastcount.txt` for tomorrow
- Sends an email: "🛡️ Fail2ban Daily Report — X new bans today (Y total)"
- Cron: `0 19 * * *`
The delta matters more than the total. A large cumulative count after weeks of running is expected. An unusual spike in a single day is worth investigating.
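The delta logic is a few lines of shell. A sketch (parsing assumes fail2ban's standard "Total banned:" status line; `send_alert` is a hypothetical Resend wrapper):

```shell
#!/usr/bin/env bash
# Nightly ban delta: today's total minus yesterday's saved total (sketch).
COUNT_FILE="$HOME/.system-reports/fail2ban-lastcount.txt"

# Parse the running total out of fail2ban's status output.
current_total() {
    sudo fail2ban-client status sshd | awk '/Total banned/ {print $NF}'
}

ban_delta() {  # args: current_total previous_total
    echo $(( $1 - $2 ))
}

report() {
    local total prev new
    total=$(current_total)
    prev=$(cat "$COUNT_FILE" 2>/dev/null || echo 0)
    new=$(ban_delta "$total" "$prev")
    echo "$total" > "$COUNT_FILE"   # save for tomorrow's delta
    send_alert "🛡️ Fail2ban Daily Report — $new new bans today ($total total)"
}
```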
Alert 4: Supabase Health + GitHub Activity (Sunday)
Two separate checks that share a Sunday timeslot because they’re both weekly sanity checks rather than urgent alerts:
🔧 Developer section: Supabase health check
- Runs a simple query against the Supabase REST API: count rows in the `contacts` table
- If the API returns a valid response: logs the count, no email
- If it errors (503, timeout, auth failure): sends an alert with the error body
- Also checks the `numerator_rounds` table — confirms the game database is live
- Uses the Supabase service role key from `.env.local`
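A minimal sketch of the ping, assuming the standard Supabase REST conventions (`SUPABASE_URL` and the key variable name are placeholders loaded from `.env.local`; `select=id` assumes an `id` column; `send_alert` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Supabase REST health ping (sketch). Table names are from the post.

check_table() {  # arg: table name; prints the HTTP status code
    curl -s -o /dev/null -w "%{http_code}" --max-time 10 \
        -H "apikey: $SUPABASE_SERVICE_ROLE_KEY" \
        -H "Authorization: Bearer $SUPABASE_SERVICE_ROLE_KEY" \
        "$SUPABASE_URL/rest/v1/$1?select=id&limit=1"
}

check_supabase() {
    local table code
    for table in contacts numerator_rounds; do
        code=$(check_table "$table")
        if [ "$code" != "200" ]; then
            send_alert "Supabase check failed: $table returned HTTP $code"
        fi
    done
}
```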
🔧 Developer section: GitHub activity report
- Calls the GitHub API: `GET /users/joseandgoose/events`
- Filters for the past 7 days of events: pushes, PRs, issues, stars
- Formats into a short summary and includes in the Sunday report email
- Formats into a short summary and includes in the Sunday report email
- Uses a personal access token from the env file (read-only, public repo scope)
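The fetch-and-summarize step might look like this sketch (the endpoint is from the post; the token variable name and the `jq` dependency are assumptions):

```shell
#!/usr/bin/env bash
# Weekly GitHub activity summary (sketch). Requires jq.

fetch_events() {
    curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
        "https://api.github.com/users/joseandgoose/events?per_page=100"
}

summarize_events() {  # arg: ISO-8601 cutoff; reads an events JSON array on stdin
    jq -r --arg since "$1" '
        [.[] | select(.created_at > $since)]   # keep events in the window
        | group_by(.type)
        | map("\(.[0].type): \(length)")
        | .[]'
}

github_summary() {
    local since
    since=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
            || date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)   # GNU or BSD date
    fetch_events | summarize_events "$since"
}
```

ISO-8601 timestamps sort lexicographically, which is why a plain string comparison against the cutoff works.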
Alert 5: Personal Server Changelog (Sunday)
Every Sunday morning, Claude writes a short narrative of what the server did that week. It’s not a metrics dump — it’s a 3–5 sentence changelog in plain English, like a standup from the server to me.
🔧 Developer section: Claude-generated changelog
- Script collects stats: Garmin recaps generated this week, new Fail2ban bans, AI jobs completed, uptime log entries, disk usage, site downtime events
- Passes stats to `claude -p "..."` with a prompt asking for a casual, 3–5 sentence changelog
- Claude output is saved to a temp file and folded into the Sunday weekly report
- Cron: `0 7 * * 0` — 7am Sunday, runs before the 9am weekly report email so it’s ready
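The generation step is tiny. A sketch (`claude -p` is from the post; the stats file, prompt wording, and output path are illustrative assumptions):

```shell
#!/usr/bin/env bash
# Changelog generation step (sketch).
STATS_FILE="$HOME/.system-reports/week-stats.txt"   # assumed location
OUT="/tmp/server-changelog.txt"

build_prompt() {  # arg: collected stats as text
    printf 'You are my home server. Based on these stats from the past week, write a casual changelog in plain English, 3-5 sentences, first person:\n\n%s\n' "$1"
}

generate_changelog() {
    claude -p "$(build_prompt "$(cat "$STATS_FILE")")" > "$OUT"
}
```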
Claude writes the server’s weekly standup. No log-diving required.
Alert 6: Weekly Status Report Email
All the pieces come together in one Sunday email: changelog, Supabase health, GitHub activity, Fail2ban weekly total, disk space, and a resource summary. It’s the one email that tells me everything about the past week without opening a terminal.
🔧 Developer section: Weekly report assembly
- Runs at 9am Sunday — after all the 7am and 8am jobs have completed
- Reads the Claude changelog from the temp file generated at 7am
- Reads the Supabase and GitHub outputs from the 8am jobs
- Pulls resource stats: `df -h` for disk, last CSV entry from `resources-YYYY-MM-DD.csv`
- Assembles into an HTML email via Resend API
- Subject: "🖥️ Server Weekly — [WEEK OF DATE]"
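The assembly step can be sketched like this (sender/recipient addresses and temp-file paths are placeholders; the `/emails` endpoint and JSON shape follow Resend's API; `jq` is used for safe JSON escaping):

```shell
#!/usr/bin/env bash
# Weekly report assembly + send via Resend (sketch).

build_report() {  # args: changelog, health, disk summaries; prints HTML
    printf '<h2>Server Weekly</h2><h3>Changelog</h3><p>%s</p><h3>Health</h3><p>%s</p><h3>Disk</h3><pre>%s</pre>' \
        "$1" "$2" "$3"
}

send_report() {
    local html
    # "data unavailable" fallbacks keep the report sending even if an
    # upstream Sunday job failed to write its file.
    html=$(build_report \
        "$(cat /tmp/server-changelog.txt 2>/dev/null || echo 'data unavailable')" \
        "$(cat /tmp/supabase-health.txt  2>/dev/null || echo 'data unavailable')" \
        "$(df -h / | tail -1)")
    curl -s -X POST https://api.resend.com/emails \
        -H "Authorization: Bearer $RESEND_API_KEY" \
        -H "Content-Type: application/json" \
        -d "$(jq -n --arg html "$html" \
              '{from:"server@example.com", to:["me@example.com"],
                subject:("🖥️ Server Weekly — " + (now|strftime("%Y-%m-%d"))),
                html:$html}')"
}
```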
The Meta-Alert: healthchecks.io
There’s one failure mode none of the above covers: what if the server itself goes down? If the machine crashes, no cron runs, no emails send, and I notice nothing until I happen to SSH in. The solution is a dead-man’s switch hosted externally.
🔧 Developer section: healthchecks.io heartbeat
- Free account at healthchecks.io creates a unique URL — the "check"
- If the URL isn’t pinged within a set interval, healthchecks.io sends a failure alert
- Added to cron: every 30 minutes, `curl -s https://hc-ping.com/[uuid]` sends a heartbeat
- If the server goes offline for 30+ minutes, I get an email from healthchecks.io
- Cron: `*/30 * * * *`
Everything else I built monitors from the server. healthchecks.io monitors the server — from outside. It’s the only alert that can fire when the machine itself is unreachable. Without it, a power outage or crash is invisible until you notice the silence.
The Full Cron Schedule
Everything running on a single crontab:
🔧 Developer section: Complete cron schedule
- `* * * * *` — AI job queue worker (processes inbox/*.txt files)
- `*/5 * * * *` — site uptime monitor (joseandgoose.com)
- `*/10 * * * *` — resource sampler (CPU, memory → CSV)
- `*/30 * * * *` — healthchecks.io heartbeat
- `0 7 * * *` — Garmin recap generation
- `0 8 * * *` — Garmin failure check (alerts if no recap file)
- `0 8 * * 1-5` — daily market briefing email
- `0 19 * * *` — Fail2ban nightly ban report
- `0 7 * * 0` — Claude personal changelog generation
- `0 8 * * 0` — Supabase health + GitHub activity
- `0 9 * * 0` — weekly status report email
- `0 23 * * 6` — Lynis security audit (Saturday night)
- `0 8 * * 1` — disk snapshot
- `0 1 * * 0` — log archiver (weekly)
- `0 2 * * 0` — security apt upgrades + journal vacuum
- `0 2 1 * *` — apt autoremove/autoclean (monthly)
- `0 4 1 * *` — scheduled monthly reboot
Final Output
The server now manages itself. I never log in to check if things are running. I get emails when something is wrong, and I get a weekly report that tells me everything is fine. The no-email state is the good state.
- Site downtime → email within 5 minutes
- Garmin recap failure → email by 8am
- Claude API credits exhausted → email immediately on failure
- SSH login from outside LAN → email within 3 seconds
- Server offline → healthchecks.io alert within 30 minutes
- Everything working fine → one Sunday morning summary email
What went fast
- Each individual script — Claude Code wrote every bash script from a plain-English description. Uptime monitor: 15 minutes. Fail2ban report: 20 minutes. Each one is simple; the value comes from having all of them running together.
- Resend API reuse — same API key, same sender, same pattern for every email. Once the first alert email worked, every subsequent one took 5 minutes to wire up.
- Cron scheduling — `crontab -e`, paste a line, save. Linux cron is reliable and dead-simple for time-based jobs.
What needed patience
- Alert fatigue tuning — initial versions sent too many emails. Had to add cooldowns, state files (to remember last-alerted time), and delta calculations (ban count delta, not total). Getting the signal-to-noise ratio right took iteration.
- Cron environment — cron jobs run in a minimal shell environment without your normal PATH. Scripts that work fine in a terminal can silently fail in cron because `python3`, `node`, or `claude` can’t be found. Fix: explicitly set `PATH` at the top of every cron command or in the crontab header.
- Sunday job ordering — the weekly report at 9am depends on outputs from the 7am and 8am jobs. If any upstream job is slow, the 9am job reads an empty file. Added fallback messages ("data unavailable") for each section so the report always sends even if one input is missing.
- healthchecks.io setup — the concept clicked immediately; finding the right "grace period" setting (how long to wait before alerting) took some tuning. Too short (5 minutes) and a brief network hiccup triggers a false alarm. 35 minutes works well for a 30-minute heartbeat interval.
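Concretely, the cron-environment fix is a couple of lines at the top of the crontab. A sketch with placeholder paths and username:

```
# Top of crontab (edit with `crontab -e`). Give cron the PATH your login
# shell has so python3/node/claude resolve; paths and "you" are placeholders.
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin:/home/you/.local/bin

# Example entry: redirect output to a log so failures leave a trace.
*/5 * * * * /home/you/scripts/uptime-check.sh >> /home/you/.system-reports/cron.log 2>&1
```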
The hardest part of a home server isn’t setting it up — it’s knowing what’s happening on it without babysitting it. This alert stack is the answer. Every meaningful event surfaces as an email. Everything else is silence, which is good news.