Series

Long-form writing on what it takes to run AI agents in production — the same infrastructure problems I solved for cloud, happening again.

  1. Installment 1

    I've Seen This Movie Before

    Cloud management went through the same lifecycle gap that AI agents face now. The ops layer is inevitable — here's why.

    April 12, 2026 · 7 min read
  2. Installment 2

    The Agent Lifecycle Nobody's Managing

    The full agent lifecycle from deployment to retirement. Which stages have tooling and which are completely unserved.

    April 12, 2026 · 11 min read
  3. Installment 3

    Agent Configuration as Code

    ~200 configurable fields, no versioning, no diffs. The case for config-as-code discipline for agents.

    April 12, 2026 · 8 min read
  4. Installment 4

    The Security Model Is Missing

    Zero-trust for agents. Real CVEs, default-disabled auth, plaintext credentials — and a hardening checklist.

    April 12, 2026 · 14 min read
  5. Installment 5

    Capability-First, Not Persona-First

    Built a 17-dimension persona schema. It was wrong. The capability layer drives behavior; persona is a thin modifier.

    April 12, 2026 · 8 min read
  6. Installment 6

    The Heartbeat Problem

    Naive cron polling creates cascading problems. Event-driven heartbeats with Todoist as the source of truth.

    April 12, 2026 · 9 min read
  7. Installment 7

    Your Agent's Memory Is a Liability

    Unbounded state growth degrades everything. Retention policies, tiered storage, and git-backed snapshots.

    April 12, 2026 · 9 min read
  8. Installment 8

    Choosing Your Inference Stack

    Not a benchmark comparison. A real decision tree: local vs API, fallback chains, cost/privacy/latency trade-offs.

    April 12, 2026 · 9 min read
  9. Installment 9

    Mission Profiles: Scoping What Your Agent Can Touch

    Ten à la carte profiles with tool ownership boundaries. Predictable, debuggable agent behavior.

    April 12, 2026 · 7 min read
  10. Installment 10

    The Tool Sprawl Trap

    Accumulating integrations without governance. Audit, ownership, and a minimum viable toolset.

    April 12, 2026 · 8 min read
  11. Installment 11

    Agent Observability: What to Log and Why

    Decision logging, tool tracing, cost accounting, error classification. Actionable, not comprehensive.

    April 12, 2026 · 7 min read
  12. Installment 12

    Sandbox Hardening for Agents That Touch Your Filesystem

    Docker isolation, microVMs, and zero-trust operational philosophy for agent sandboxing.

    April 12, 2026 · 8 min read
  13. Installment 13

    The Human-Agent Interface

    Separation of concerns: human tools, agent workspace files, and a thin sync layer between them.

    April 12, 2026 · 8 min read
  14. Installment 14

    Agent Evolution Without Regression

    Config versioning, canary deployments, regression indicators. Agent updates as deployments, not experiments.

    April 12, 2026 · 8 min read
  15. Installment 15

    The Management Layer Market Map

    The landscape of agent management tooling, and how its fragmentation phase parallels cloud infrastructure history.

    April 12, 2026 · 9 min read