2026-04-11 · 11 min

Why my production agent runs on a Mac mini

$599 of aluminum has outrun three cloud VM vendors for fifteen months. Here's the math, the tradeoffs, and the failure mode nobody talks about.

infra · agents · indie

It was 3:17 AM on a Tuesday in October 2024. A billing alert from Fly.io woke me up. My main production service, the Conway Automaton, had scaled to six instances and burned through half its monthly budget in four hours. A traffic spike, probably a bot, had triggered the autoscaler. I killed the extra machines, banned the IP block, and went back to bed, but not to sleep. The incident wasn't catastrophic, but it was a sharp reminder that I was renting a system whose failure modes included draining my bank account.

The next morning, I drove to the Apple Store in Xinyi A13 and bought the base model Mac mini. By evening, the Conway Automaton was running in a tmux session on a machine I could see and touch. The total cost was $599, plus another $20 for a USB-C to Ethernet adapter. For the last fifteen months, that single piece of aluminum has served all production traffic without a single billing alert. It's faster, more capable, and paradoxically, more reliable for my use case than any VM I've ever rented. This isn't a Luddite's rant against the cloud; it's a pragmatic look at the numbers and the physics of running a small, stateful service.

The boring numbers

Before we get into the technical arguments, let's look at the monthly cost. I ran this agent on three different cloud providers before settling on the Mac mini. The workload is a single, stateful Go binary that needs about 4GB of RAM to stay warm and serves a few hundred requests per minute, with occasional CPU-intensive bursts.

Here are the real, inflation-adjusted monthly costs I was paying or would pay today:

| Provider | Configuration | Monthly Cost (USD) | Notes |
| --- | --- | --- | --- |
| Fly.io | 2x shared-cpu-1x, 4GB RAM | ~$47 | Excellent DX, but expensive for RAM |
| Hetzner Cloud | CPX31 (4 vCPU, 8GB RAM) | ~$12 | The best value in cloud VMs |
| Oracle Cloud Free | Ampere A1 (4 OCPU, 24GB RAM) | $0 | Unreliable, can be reclaimed anytime |
| Mac mini M4 | 10-core CPU, 16GB RAM, 256GB SSD | ~$17 | $599 purchase amortized over 36mo |
| ...plus power | Mac mini M4 Idle/Load | ~$3 | Power cost in Taipei, Taiwan |
| Total On-Prem | | ~$20 | Cheaper than Fly, comparable to Hetzner |

The Mac mini, amortized over a reasonable three-year lifespan, is competitive with even the cheapest budget VPS providers. When you factor in the performance differential, it's not even a contest. The cloud providers are charging me for RAM, CPU, and egress. I paid for the Mac's RAM and CPU once, and my residential internet has no egress fees.
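The amortization behind the Mac mini rows is simple enough to check by hand. Here is a minimal sketch of it, using the $599 price, the 36-month lifespan, and the ~$3 power estimate from the table; the function name is only illustrative.

```go
package main

import "fmt"

// amortizedMonthly spreads a one-time purchase over a lifespan in months
// and adds any recurring monthly cost (here, electricity).
func amortizedMonthly(purchaseUSD float64, months int, recurringUSD float64) float64 {
	return purchaseUSD/float64(months) + recurringUSD
}

func main() {
	// Figures from the table: $599 purchase, 36 months, ~$3/mo power.
	hardwareOnly := amortizedMonthly(599, 36, 0)
	withPower := amortizedMonthly(599, 36, 3)
	fmt.Printf("hardware: $%.2f/mo, with power: $%.2f/mo\n", hardwareOnly, withPower)
	// prints "hardware: $16.64/mo, with power: $19.64/mo"
}
```

Rounded, those are the ~$17 and ~$20 figures in the table. The $20 Ethernet adapter is left out, as it is in the table.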

What a Mac mini has that VMs don't

Cost isn't the whole story. If it were, I'd be fighting for scraps on Oracle's free tier. The M-series Mac mini has four distinct architectural advantages for my specific workload that are either unavailable or prohibitively expensive in the cloud.

Unified Memory (16GB): This is the killer feature. The agent keeps a large dataset in memory. On a cloud VM, if the process restarts or the machine reboots, it has to pull that data from disk, resulting in a slow, cold start. On the M4, the 16GB of unified memory is directly accessible by the CPU and GPU cores with massive bandwidth. With about 4GB of working set on a 16GB machine, nothing gets pushed to swap, and there are no NUMA nodes, just a flat, fast pool of memory. The agent's state stays hot. A process restart takes milliseconds because the OS keeps the necessary data in its file cache (a full reboot still starts cold). You can't buy this performance on a $12 Hetzner box.
The Neural Engine (ANE): My agent performs a simple triage step on incoming data. It used to be a complex set of regex and heuristics. Now, I run a quantized Phi-3-mini model locally on the ANE to classify requests. It's fast, free, and uses a dedicated piece of silicon, leaving the CPU cores free to do the actual work. Running even a small model like this on a cloud CPU would be slow, and renting a GPU instance for this task would cost hundreds of dollars a month. The ANE is a free, power-efficient inference machine.
APFS Snapshots: This is the most underrated feature of running on macOS. Before every agent deploy, a simple shell script takes a local snapshot with tmutil localsnapshot, the documented verb for local APFS snapshots. If the deploy goes bad (a memory leak, a corrupted state), rolling back is not a complex Git revert and redeploy. It's a single command: tmutil restore, given the snapshot's copy of the working directory as the source and the live path as the destination. I can revert the entire application's working directory to its exact state from five minutes ago. It's a more robust and faster rollback mechanism than anything I've built with Docker or CI/CD pipelines.
I Own It: The buck, and the bill, stops with me. If my agent enters an infinite loop and starts consuming 100% CPU, it doesn't trigger an autoscaler or rack up a huge bill. It just makes the fan spin a little louder. I can't accidentally DDoS myself into a four-figure invoice. This peace of mind is invaluable for a solo developer. The financial blast radius of any failure is capped at the hardware's purchase price.

What I gave up

This setup is not without its sharp edges. It would be dishonest to pretend this is a universally better solution. I made specific, conscious tradeoffs.

No Horizontal Scaling: There is one box. If it gets overwhelmed, the service slows down. I can't just spin up more instances. The entire architecture is designed around vertical scaling: making the most of one very powerful machine.
Asymmetrical Bandwidth: I have a gigabit fiber connection from Chunghwa Telecom. The download speed is fantastic. The upload speed is... not. It's sufficient for my workload, but if I were serving large files, this would be a non-starter.
Unreliable Power: I live in Taipei. The power grid is generally stable, but typhoons and occasional grid maintenance cause blips. A five-second power outage is a five-second service outage. I don't have a UPS. If uptime were measured in nines, this would be unacceptable. For me, a minute of downtime per month is fine.
Zero Compliance: There is no SOC 2, HIPAA, or any other certification. The server is in my apartment. This is fine for my personal projects, but it's a complete dealbreaker for anything handling sensitive customer data or selling to enterprise.
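To put the downtime tolerance above into the usual percentage terms, here is a quick back-of-the-envelope sketch. It assumes a 30-day month; the 180-minute case is the three-hour outage described later in this post.

```go
package main

import "fmt"

// availability returns uptime as a percentage of a 30-day month.
func availability(downtimeMinutes float64) float64 {
	const minutesPerMonth = 30 * 24 * 60 // 43,200
	return 100 * (1 - downtimeMinutes/minutesPerMonth)
}

func main() {
	fmt.Printf("1 min down:   %.4f%%\n", availability(1))   // 99.9977%
	fmt.Printf("180 min down: %.4f%%\n", availability(180)) // 99.5833%
}
```

A minute a month is a budget most small services would be happy with. A single multi-hour incident is what actually moves the number.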

The architecture

The setup is brutally simple. There are no container orchestrators, no service mesh, no serverless functions. It's 2008-era infrastructure with a 2026 twist.

Here's a diagram of the whole system:

                               +-----------------+
                               | The Internet    |
                               +-------+---------+
                                       |
                                       | (HTTPS/443)
                                       |
+--------------------------------------v---------------------------------------+
| Cloudflare                                                                   |
|  - DNS                                                                       |
|  - Caching                                                                   |
|  - WAF                                                                       |
|  - Argo Tunnel                                                               |
+--------------------------------------+---------------------------------------+
                                       |
                                       | (Cloudflare Tunnel)
                                       |
+------------------+                   |
| My Apartment     |                   v
|                  |    +-----------------------------+      +-----------------+
|                  |    | Mac mini M4 (macOS 15)      |      | Hetzner         |
|                  |    |  - Conway Agent (Go binary) |----->| Storage Box     |
|                  |    |  - cloudflared daemon       |      | (rsync backups) |
|                  |    |  - Umami Analytics          |      +-----------------+
|                  |    +-----------------------------+
|                  |                 ^
|                  |                 | (SSH/Admin)
|                  |                 |
|                  |    +-----------------------------+
|                  |    | My Laptop (Tailscale)       |
|                  |    +-----------------------------+
|                  |
+------------------+

Public ingress is handled entirely by a Cloudflare Tunnel. This means I don't need a static IP, and I don't have to open any ports on my router. The cloudflared daemon on the Mac mini establishes a persistent outbound connection to Cloudflare's edge, which then proxies public traffic to it. It's secure and solves the entire NAT traversal problem.

For my own administrative access, I use Tailscale. It creates a zero-config VPN that lets me SSH into the Mac mini from my laptop or phone, wherever I am, without exposing SSH to the internet.

Finally, a simple cron job runs every night, using rsync to back up the agent's state directory and my Umami analytics database to a cheap $4/month storage box at Hetzner. If the Mac mini dies, I can have the service running on a new machine from backup in under an hour.
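The nightly job amounts to one rsync invocation. As a rough sketch, this is how its argument list could be assembled. The paths and the remote host below are hypothetical placeholders, not the real ones, and the `mirror` switch is a design choice for illustration rather than something the actual cron job is known to use.

```go
package main

import (
	"fmt"
	"strings"
)

// rsyncArgs builds the argument list for copying local directories to a
// remote destination. -a preserves metadata and -z compresses in transit.
// With mirror set, --delete also removes remote files that are gone locally.
func rsyncArgs(sources []string, dest string, mirror bool) []string {
	args := []string{"-az"}
	if mirror {
		args = append(args, "--delete")
	}
	args = append(args, sources...)
	return append(args, dest)
}

func main() {
	// Hypothetical paths and host, for illustration only.
	args := rsyncArgs(
		[]string{"/Users/kieran/state", "/Users/kieran/backups/umami-dump"},
		"backup@storagebox.example:conway/",
		false,
	)
	fmt.Println("rsync " + strings.Join(args, " "))
}
```

Leaving `--delete` off keeps files on the backup even after they disappear locally, which is the safer default when the thing being protected against is corrupted or lost state.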

The failure mode nobody talks about

The M-series chips are incredibly efficient, but they are not magic. Under sustained load, they generate heat. The Mac mini has a fan, but it's designed for an office environment, not a subtropical apartment with no air conditioning.

In July, the ambient temperature in my apartment can reach 34°C (93°F). The first summer I ran the Mac mini, I noticed performance degradation during the afternoon. The agent was getting sluggish. A quick check with powermetrics confirmed it: the CPU was being thermally throttled. The aluminum chassis was hot to the touch. The elegant, silent box was cooking itself.

My solution was not elegant. I bought an $18 AC Infinity USB fan, the kind people use to cool their AV receivers, and placed it on top of the Mac mini. It's set to its lowest speed, and it's just enough to create constant airflow over the chassis. The CPU has not throttled since. It's an ugly hack that doubles the vertical height of the machine, but it works. Physics always wins.

What finally broke

For over a year, the system was flawless. Then, I got cocky. When the developer beta for macOS 15.3 came out, I installed it on the mini. I'm a developer, I told myself, I need the latest SDKs. This was a mistake. My production machine is not my development machine, even if they are the same physical hardware.

Two weeks later, the machine had a kernel panic at 4 AM. It rebooted itself, but the agent's launchd service failed to start because of a permissions change in the new beta. The site was down for three hours until I woke up and saw the alerts. I spent that morning wiping the machine and reinstalling the latest stable version of macOS. Lesson learned: production runs on stable. Always.

Would I do it again?

Yes. Without hesitation. But I'd buy a Mac Studio.

The Mac mini is a phenomenal machine, but the thermal issues in a warm climate are a real constraint. The Mac Studio has a much more substantial cooling system designed for sustained professional workloads. The extra thermal headroom would eliminate the need for my ugly USB fan hack and provide more performance for future growth. The entry-level Studio is more expensive, but the peace of mind would be worth it.

Code: The tiny `launchd` plist that keeps it all running

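There's no Docker, no Kubernetes, no systemd. Just a simple XML file in ~/Library/LaunchAgents. This launchd plist ensures the Conway Automaton binary starts at login and restarts automatically if it ever crashes. Because a LaunchAgent only loads once its user logs in, unattended recovery after a reboot depends on automatic login being enabled. It's the simplest process supervisor there is.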

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
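<dict>
    <!-- Unique job name; by convention it matches the plist filename. -->
    <key>Label</key>
    <string>win.kieran123.conway-automaton</string>
    <!-- Command line: the binary followed by its arguments, one per string. -->
    <key>ProgramArguments</key>
    <array>
        <string>/Users/kieran/bin/conway-agent</string>
        <string>--config</string>
        <string>/Users/kieran/config/agent.toml</string>
    </array>
    <!-- Start the job as soon as the plist is loaded (i.e. at login). -->
    <key>RunAtLoad</key>
    <true/>
    <!-- Restart the process whenever it exits; launchd throttles rapid respawns. -->
    <key>KeepAlive</key>
    <true/>
    <!-- stdout and stderr go to plain files; the logs directory must already exist. -->
    <key>StandardOutPath</key>
    <string>/Users/kieran/logs/agent.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/kieran/logs/agent_error.log</string>
</dict>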
</plist>

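To use it, I save this file as ~/Library/LaunchAgents/win.kieran123.conway-automaton.plist and run launchctl load ~/Library/LaunchAgents/win.kieran123.conway-automaton.plist. (launchctl load is the legacy spelling; on current macOS the equivalent is launchctl bootstrap gui/$(id -u) followed by the same path.) That's it. The agent is now a managed service.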

The real math: 3-Year TCO

Amortizing the hardware cost is one thing, but let's look at the total cost of ownership (TCO) over three years, which is a realistic lifespan for the hardware. This is for a workload that sits comfortably on one machine, serving maybe 400-500 requests per minute.

| Service | 3-Year Cost Calculation | 3-Year TCO (USD) |
| --- | --- | --- |
| Fly.io | $47/mo * 36 | $1,692 |
| Hetzner Cloud | $12/mo * 36 | $432 |
| Mac mini + Backup | $599 + ($3/mo * 36) + ($4/mo * 36) | $851 |

Hetzner is the clear winner on pure cost, but it doesn't offer the performance or the specific hardware features (like the ANE) of the Mac mini. For my workload, the Mac mini provides significantly more value and raw power than the Hetzner VM, for a TCO that's still half of a managed provider like Fly.io. Once you need more than a basic Linux box, the economics of owning the hardware become very compelling.
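The same figures also give a break-even point: the month in which the upfront purchase has paid for itself against a monthly rental. Here is a small sketch using only the numbers from the table ($599 upfront, $3 power plus $4 backup per month). The function names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// tco returns total cost over the given months for an upfront purchase
// plus a flat monthly charge.
func tco(upfrontUSD, monthlyUSD float64, months int) float64 {
	return upfrontUSD + monthlyUSD*float64(months)
}

// breakEvenMonths returns the first whole month in which owning is no more
// expensive than renting, or -1 if renting is never the costlier option.
func breakEvenMonths(upfrontUSD, ownMonthlyUSD, rentMonthlyUSD float64) int {
	if rentMonthlyUSD <= ownMonthlyUSD {
		return -1
	}
	return int(math.Ceil(upfrontUSD / (rentMonthlyUSD - ownMonthlyUSD)))
}

func main() {
	const upfront, ownMonthly = 599.0, 3.0 + 4.0 // power + backup, from the table
	fmt.Println("Mac mini 36-month TCO:", tco(upfront, ownMonthly, 36))                        // 851
	fmt.Println("break-even vs Fly.io ($47/mo):", breakEvenMonths(upfront, ownMonthly, 47))  // 15
	fmt.Println("break-even vs Hetzner ($12/mo):", breakEvenMonths(upfront, ownMonthly, 12)) // 120
}
```

Against Fly.io the hardware pays for itself in month 15. Against Hetzner it would take 120 months, well past the three-year lifespan assumed here, which is why the case for the Mac mini over Hetzner rests on capability rather than cost.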

This setup isn't for everyone. It requires a tolerance for hardware ownership and a workload that fits on a single, powerful node. But for a growing number of applications, especially in the agent and local-AI space, a dedicated machine with fast unified memory and specialized silicon is the right tool for the job. The cloud is an incredible tool for elastic, planet-scale computing. But for a small, stateful service, the cloud is someone else's Mac mini, marked up.
