Series
Long-form writing on what it takes to run AI agents in production — the same infrastructure problems I solved for cloud, happening again.
- Installment 1
I've Seen This Movie Before
Cloud management went through the same lifecycle gap that AI agents face now. The ops layer is inevitable — here's why.
April 12, 2026 · 7 min read
- Installment 2
The Agent Lifecycle Nobody's Managing
The full agent lifecycle from deployment to retirement. Which stages have tooling, which are completely unserved.
April 12, 2026 · 11 min read
- Installment 3
Agent Configuration as Code
~200 configurable fields, no versioning, no diffs. The case for config-as-code discipline for agents.
April 12, 2026 · 8 min read
- Installment 4
The Security Model Is Missing
Zero-trust for agents. Real CVEs, default-disabled auth, plaintext credentials — and a hardening checklist.
April 12, 2026 · 14 min read
- Installment 5
Capability-First, Not Persona-First
Built a 17-dimension persona schema. It was wrong. Capability layer drives behavior; persona is a thin modifier.
April 12, 2026 · 8 min read
- Installment 6
The Heartbeat Problem
Naive cron polling creates cascading problems. Event-driven heartbeats with Todoist as source of truth.
April 12, 2026 · 9 min read
- Installment 7
Your Agent's Memory Is a Liability
Unbounded state growth degrades everything. Retention policies, tiered storage, and git-backed snapshots.
April 12, 2026 · 9 min read
- Installment 8
Choosing Your Inference Stack
Not a benchmark comparison. A real decision tree: local vs API, fallback chains, cost/privacy/latency trade-offs.
April 12, 2026 · 9 min read
- Installment 9
Mission Profiles: Scoping What Your Agent Can Touch
Ten a-la-carte profiles with tool ownership boundaries. Predictable, debuggable agent behavior.
April 12, 2026 · 7 min read
- Installment 10
The Tool Sprawl Trap
Accumulating integrations without governance. Audit, ownership, and minimum viable toolset.
April 12, 2026 · 8 min read
- Installment 11
Agent Observability: What to Log and Why
Decision logging, tool tracing, cost accounting, error classification. Actionable, not comprehensive.
April 12, 2026 · 7 min read
- Installment 12
Sandbox Hardening for Agents That Touch Your Filesystem
Docker isolation, microVMs, and zero-trust operational philosophy for agent sandboxing.
April 12, 2026 · 8 min read
- Installment 13
The Human-Agent Interface
Separation of concerns: human tools, agent workspace files, and a thin sync layer between them.
April 12, 2026 · 8 min read
- Installment 14
Agent Evolution Without Regression
Config versioning, canary deployments, regression indicators. Agent updates as deployments, not experiments.
April 12, 2026 · 8 min read
- Installment 15
The Management Layer Market Map
Landscape of agent management tooling. How today's fragmentation phase parallels cloud infrastructure history.
April 12, 2026 · 9 min read