// case study · tooling
Three coding agents, one DAG, zero rate-limit cliffs.
A VS Code extension that decomposes natural-language project requests into a typed DAG, scores each task against agent capabilities, and runs Claude, Copilot, and Codex in parallel.

- Period: Apr 2025 – Jun 2025
- Role: Designer & sole engineer
- Repo: GitHub
Any single AI coding agent is bottlenecked by two things: rate limits and capability mismatch. Send a task to the wrong model and you burn quota on a worse answer. PackAI, the extension described here, treats the three big assistants (Claude, Copilot, and Codex) as interchangeable workers behind a planner, and routes each unit of work to whichever one scores highest for it.
How it works
- Intent analysis. The user describes a project in natural language. An NLP layer extracts project type, target stack, feature list, and a complexity estimate.
- DAG planning. The planner expands the intent into a phased task graph (“scaffold → schema → routes → tests → review”), with explicit dependencies so independent tasks can run concurrently.
- Capability scoring. Each task is scored against each agent on signals like task type, language, complexity, and recent benchmark behavior. The highest score wins.
- Parallel execution. Independent tasks dispatch simultaneously. A rate-limit queue catches throttled agents and re-routes their work to a peer.
- Conflict resolution. When two agents touch the same file, the orchestrator stages outputs, runs validation gates, and merges with deterministic precedence rules.
- Validation gates. Every output passes through syntax, import-resolution, basic security, and style checks before it lands on disk.
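The capability-scoring and re-routing steps above can be sketched as a weighted sum plus a ranked fallback. This is a minimal illustration, not PackAI's actual code: every type, function name, weight, and affinity number below is a hypothetical placeholder, and the affinity values are not measurements of any real agent.

```typescript
// Hypothetical types: the real PackAI schema is not shown in this write-up.
type AgentId = "claude" | "copilot" | "codex";
type TaskType = "scaffold" | "schema" | "routes" | "tests" | "review";

interface Task {
  id: string;
  type: TaskType;
  language: string;
  complexity: number; // assumed scale: 0 (trivial) to 1 (hard)
}

interface AgentProfile {
  id: AgentId;
  taskAffinity: Partial<Record<TaskType, number>>; // 0..1
  languageAffinity: Record<string, number>; // 0..1
  maxComfortableComplexity: number; // 0..1
}

interface Weights {
  taskType: number;
  language: number;
  complexity: number;
}

// Assumed defaults; a user override would simply pass a different object.
const DEFAULT_WEIGHTS: Weights = { taskType: 0.5, language: 0.3, complexity: 0.2 };

// Weighted sum of three signals; unknown affinities fall back to a neutral 0.5.
function scoreAgent(task: Task, agent: AgentProfile, w: Weights = DEFAULT_WEIGHTS): number {
  const typeFit = agent.taskAffinity[task.type] ?? 0.5;
  const langFit = agent.languageAffinity[task.language] ?? 0.5;
  const complexityFit = task.complexity <= agent.maxComfortableComplexity ? 1 : 0;
  return w.taskType * typeFit + w.language * langFit + w.complexity * complexityFit;
}

// Rank agents best-first, then pick the first one that is not throttled.
// Returns undefined when every agent is rate-limited.
function routeTask(
  task: Task,
  agents: AgentProfile[],
  throttled: ReadonlySet<AgentId>,
  w: Weights = DEFAULT_WEIGHTS,
): AgentId | undefined {
  const ranked = [...agents].sort((a, b) => scoreAgent(task, b, w) - scoreAgent(task, a, w));
  return ranked.find((a) => !throttled.has(a.id))?.id;
}

// Placeholder profiles, invented purely to exercise the functions.
const exampleAgents: AgentProfile[] = [
  { id: "claude", taskAffinity: { review: 0.9 }, languageAffinity: { typescript: 0.9 }, maxComfortableComplexity: 0.9 },
  { id: "copilot", taskAffinity: { routes: 0.8 }, languageAffinity: { typescript: 0.8 }, maxComfortableComplexity: 0.6 },
  { id: "codex", taskAffinity: { tests: 0.9 }, languageAffinity: { typescript: 0.7 }, maxComfortableComplexity: 0.7 },
];
const exampleTask: Task = { id: "t1", type: "tests", language: "typescript", complexity: 0.5 };

const firstChoice = routeTask(exampleTask, exampleAgents, new Set<AgentId>()); // "codex"
const fallback = routeTask(exampleTask, exampleAgents, new Set<AgentId>(["codex"])); // "claude"
```

With these placeholder numbers the task scores 0.86 for codex, 0.72 for claude, and 0.69 for copilot, so throttling the top agent moves the work to the runner-up instead of failing.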
Why a DAG, not a chain
Many multi-agent systems run agents either sequentially (slow) or fully in parallel without coordination (chaotic). A typed DAG was the sweet spot: explicit about what depends on what, parallel by default for everything else, and trivial to visualize in the extension's live dashboard.
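To make "parallel by default" concrete, here is a small sketch that groups a task graph into waves using Kahn's topological-sort algorithm. The `TaskNode` shape and the `planWaves` name are assumptions for illustration; the real planner's types are not shown in this write-up.

```typescript
// Hypothetical node shape: an id plus the ids it depends on.
interface TaskNode {
  id: string;
  dependsOn: string[];
}

// Group tasks into waves. Every task in a wave has all of its dependencies
// in earlier waves, so tasks within one wave can be dispatched concurrently.
function planWaves(nodes: TaskNode[]): string[][] {
  const remainingDeps = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const n of nodes) {
    remainingDeps.set(n.id, n.dependsOn.length);
    dependents.set(n.id, []);
  }
  for (const n of nodes) {
    for (const dep of n.dependsOn) {
      const list = dependents.get(dep);
      if (!list) throw new Error(`Unknown dependency "${dep}" on task "${n.id}"`);
      list.push(n.id);
    }
  }

  const waves: string[][] = [];
  let ready = nodes.filter((n) => n.dependsOn.length === 0).map((n) => n.id);
  let scheduled = 0;
  while (ready.length > 0) {
    waves.push(ready);
    scheduled += ready.length;
    const next: string[] = [];
    for (const id of ready) {
      for (const child of dependents.get(id) ?? []) {
        const left = (remainingDeps.get(child) ?? 0) - 1;
        remainingDeps.set(child, left);
        if (left === 0) next.push(child); // last dependency just finished
      }
    }
    ready = next;
  }
  // Any node never scheduled is stuck behind a cycle.
  if (scheduled !== nodes.length) throw new Error("Cycle detected: not a DAG");
  return waves;
}

// Illustrative graph: "styles" is an invented task that only needs the scaffold.
const exampleGraph: TaskNode[] = [
  { id: "scaffold", dependsOn: [] },
  { id: "schema", dependsOn: ["scaffold"] },
  { id: "routes", dependsOn: ["schema"] },
  { id: "styles", dependsOn: ["scaffold"] },
  { id: "tests", dependsOn: ["routes"] },
  { id: "review", dependsOn: ["tests", "styles"] },
];

const exampleWaves = planWaves(exampleGraph);
// [["scaffold"], ["schema", "styles"], ["routes"], ["tests"], ["review"]]
```

A pure chain would produce one task per wave. The second wave here holds two tasks because `schema` and `styles` share a dependency but not each other.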
- 3 agents orchestrated: Claude · Copilot · Codex
- 741+ Vitest tests across orchestrator + plan layers
- 5 workflow templates: e-commerce, landing, dashboard, blog, generic
- 0 hard rate-limit failures, thanks to fallback queueing
Engineering decisions worth flagging
- State machine over framework. No LangGraph, no external agent library — the orchestrator is a plain TypeScript state machine. Easier to reason about, easier to test, and the 741+ tests reflect that.
- Capability scores are configurable. Each user can override default weights, so an iOS engineer can bias toward the agent that knows Swift best without forking the extension.
- Pause / resume / cancel at any node. Without granular control, long-running sessions failed too often. Because the work was already split into DAG nodes, adding per-node control was almost free.
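As a rough illustration of what a plain TypeScript state machine with per-node pause, resume, and cancel might look like, here is a transition-table sketch. The state names, event names, and allowed transitions are assumptions made for this example, not the orchestrator's actual definitions.

```typescript
// Hypothetical lifecycle for a single DAG node.
type NodeState = "pending" | "running" | "paused" | "done" | "failed" | "cancelled";
type NodeEvent = "start" | "pause" | "resume" | "cancel" | "succeed" | "fail";

// Each state lists only the events it accepts; terminal states accept none.
const TRANSITIONS: Record<NodeState, Partial<Record<NodeEvent, NodeState>>> = {
  pending: { start: "running", cancel: "cancelled" },
  running: { pause: "paused", cancel: "cancelled", succeed: "done", fail: "failed" },
  paused: { resume: "running", cancel: "cancelled" },
  done: {},
  failed: {},
  cancelled: {},
};

// Pure function: returns the next state or throws on an illegal event.
function transition(state: NodeState, event: NodeEvent): NodeState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new Error(`Illegal transition: "${event}" while ${state}`);
  }
  return next;
}

// Replay a sequence of events from the initial state.
function replay(events: NodeEvent[], initial: NodeState = "pending"): NodeState {
  return events.reduce<NodeState>((state, event) => transition(state, event), initial);
}

const finalState = replay(["start", "pause", "resume", "succeed"]); // "done"
```

Because `transition` is a pure function over a lookup table, each legal and illegal move can be covered by a one-line unit test, which is one plausible reason this style is easy to test.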
What surprised me
The capability-scoring layer mattered less than the conflict-resolution layer. The agents were “good enough” that most routing decisions barely moved quality, but two agents editing the same file concurrently broke things constantly until the staging-and-merge layer was in place. The reliability win was in coordination, not selection.
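A simplified sketch of the staging idea: outputs are held in memory, checked by a validation gate, and a fixed precedence order decides which agent's version of a contested file is kept. This version picks a whole-file winner rather than merging edits, which is a deliberate simplification. The names, the precedence order, and the single `validate` callback are all assumptions for illustration.

```typescript
// Hypothetical types for staged agent output.
type AgentId = "claude" | "copilot" | "codex";

interface StagedOutput {
  agent: AgentId;
  path: string;
  content: string;
}

// Pick one winner per file path. Outputs that fail validation are dropped
// before precedence is considered, so an invalid output can never win.
function resolveConflicts(
  staged: StagedOutput[],
  precedence: AgentId[],
  validate: (output: StagedOutput) => boolean,
): Map<string, StagedOutput> {
  // Lower rank wins; agents missing from the list rank last.
  const rank = (agent: AgentId): number => {
    const i = precedence.indexOf(agent);
    return i === -1 ? precedence.length : i;
  };

  const winners = new Map<string, StagedOutput>();
  for (const output of staged) {
    if (!validate(output)) continue;
    const current = winners.get(output.path);
    if (!current || rank(output.agent) < rank(current.agent)) {
      winners.set(output.path, output);
    }
  }
  return winners;
}

// Illustrative run: two agents touch src/app.ts, one output is empty.
const stagedExample: StagedOutput[] = [
  { agent: "claude", path: "src/app.ts", content: "export const a = 1;" },
  { agent: "codex", path: "src/app.ts", content: "export const a = 2;" },
  { agent: "copilot", path: "src/db.ts", content: "export const db = {};" },
  { agent: "codex", path: "src/bad.ts", content: "   " },
];

// Arbitrary precedence order chosen for the example.
const examplePrecedence: AgentId[] = ["codex", "claude", "copilot"];
const nonEmpty = (output: StagedOutput): boolean => output.content.trim().length > 0;

const resolved = resolveConflicts(stagedExample, examplePrecedence, nonEmpty);
// src/app.ts -> codex's version, src/db.ts -> copilot's, src/bad.ts dropped
```

The result depends only on the staged outputs and the precedence list, not on which agent finished first, which is what makes the outcome deterministic.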