Series

Long-form writing on what it takes to run AI agents in production — the same infrastructure problems I solved for cloud, happening again.

Installment 1
I've Seen This Movie Before

Cloud management went through the same lifecycle gap that AI agents face now. The ops layer is inevitable — here's why.
April 12, 2026 · 7 min read
Installment 2
The Agent Lifecycle Nobody's Managing

The full agent lifecycle from deployment to retirement. Which stages have tooling, which are completely unserved.
April 12, 2026 · 11 min read
Installment 3
Agent Configuration as Code

~200 configurable fields, no versioning, no diffs. The case for config-as-code discipline for agents.
April 12, 2026 · 8 min read
Installment 4
The Security Model Is Missing

Zero-trust for agents. Real CVEs, default-disabled auth, plaintext credentials — and a hardening checklist.
April 12, 2026 · 14 min read
Installment 5
Capability-First, Not Persona-First

Built a 17-dimension persona schema. It was wrong. Capability layer drives behavior; persona is a thin modifier.
April 12, 2026 · 8 min read
Installment 6
The Heartbeat Problem

Naive cron polling creates cascading problems. Event-driven heartbeats with Todoist as source of truth.
April 12, 2026 · 9 min read
Installment 7
Your Agent's Memory Is a Liability

Unbounded state growth degrades everything. Retention policies, tiered storage, and git-backed snapshots.
April 12, 2026 · 9 min read
Installment 8
Choosing Your Inference Stack

Not a benchmark comparison. A real decision tree: local vs API, fallback chains, cost/privacy/latency trade-offs.
April 12, 2026 · 9 min read
Installment 9
Mission Profiles: Scoping What Your Agent Can Touch

Ten à la carte profiles with tool ownership boundaries. Predictable, debuggable agent behavior.
April 12, 2026 · 7 min read
Installment 10
The Tool Sprawl Trap

Accumulating integrations without governance. Audit, ownership, and minimum viable toolset.
April 12, 2026 · 8 min read
Installment 11
Agent Observability: What to Log and Why

Decision logging, tool tracing, cost accounting, error classification. Actionable, not comprehensive.
April 12, 2026 · 7 min read
Installment 12
Sandbox Hardening for Agents That Touch Your Filesystem

Docker isolation, microVMs, and zero-trust operational philosophy for agent sandboxing.
April 12, 2026 · 8 min read
Installment 13
The Human-Agent Interface

Separation of concerns: human tools, agent workspace files, and a thin sync layer between them.
April 12, 2026 · 8 min read
Installment 14
Agent Evolution Without Regression

Config versioning, canary deployments, regression indicators. Agent updates as deployments, not experiments.
April 12, 2026 · 8 min read
Installment 15
The Management Layer Market Map

A landscape of agent management tooling, and how its fragmentation phase parallels cloud infrastructure history.
April 12, 2026 · 9 min read