Spec first. Code second.
Humans stay in control.

ae (Artifact Engine) turns backend development into formal, machine-validated YAML specs — then hands AI agents bounded, reviewable tasks instead of an open-ended request. Every module is specified, validated, and approved before a single line of code is written.

specs/modules/order-repository.yaml

kind: module_spec
metadata:
  id: module:order-repository@v1
  status: frozen
spec:
  identity:
    name: order-repository
    layer: infrastructure
  responsibilities:
    primary:
      - Persist order entities to PostgreSQL
    forbidden:
      - Business logic or state machine rules
  public_contract:
    operations:
      - name: SaveOrder
        errors: [ERR_PERSISTENCE_FAILURE]
      - name: GetOrder
        errors: [ERR_ORDER_NOT_FOUND]
  completion_criteria:
    - SaveOrder/GetOrder implemented against PostgreSQL

Outcome 01 · Models

You don’t need a frontier model. You need a better task.

Frontier models compensate for vague tasks. ae removes the vagueness instead: each agent gets one small, fully specified module contract — and suddenly a flash-class model performs like a senior engineer. Our entire measured pipeline, specs and implementation, runs on low-cost models.

specs · gpt-5.4-mini
code · qwen-class agents
flash pricing, end to end

Outcome 02 · Correctness

Hallucinations don’t get fixed. They get engineered out.

Models invent APIs when they’re forced to guess. ae leaves nothing to guess: every callable dependency is spelled out in the spec, and 40+ machine-checked rules reject incomplete specs before code generation starts. An agent that genuinely can’t proceed files a structured blockage report — it never improvises.

70 modules measured · 0 invented APIs
40+ machine-checked rules

Outcome 03 · Cost

Up to 50× fewer tokens than an agentic session.

A typical coding agent burns its budget re-reading the repo, retrying, and reasoning through ambiguity. An ae module task is a bounded contract executed once: ~13,701 tokens per implemented module, ~10,842 for a full service’s spec pipeline — measured, not estimated.

~13,701 tokens / module
~10,842 tokens · full spec pipeline

Outcome 04 · Scale

A context window that never grows with your codebase.

Agents never see your repository. Each one receives exactly one frozen module spec — scope, contracts, tests — so context stays small, attention stays sharp, and quality doesn’t degrade as the project scales from 3 modules to 70.

1 frozen spec per agent
same quality at module 3 and module 70

Foundation 05 · Method

Spec first. Code is the last step, not the first draft.

One sentence in; project, architecture, and module specs out — versioned YAML artifacts with a real lifecycle. Code is generated from frozen specs, never from chat history. Change the spec, not the prompt.

draft → in_review → approved → frozen
versioned YAML — diffable, reviewable

Foundation 06 · Validation

Validated by machines before a single line of code.

40+ deterministic rules check every artifact: structural completeness, contract coverage, dependency cycles, scope consistency. A validate-and-fix loop drives specs to PASS, and the freeze gate makes approved specs immutable — so what the agent implements is exactly what was reviewed.

R · DS · PDEP · ITASK rule families
validate → fix → PASS, then freeze
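To make "deterministic rule" concrete, here is a minimal Python sketch of one such check: detecting a dependency cycle. The function name and the dict-of-lists input shape are assumptions made for this example; this is not ae's actual rule code or schema.

```python
from typing import Dict, List, Optional


def find_dependency_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of module names, or None if the graph is acyclic.

    `deps` maps a module name to the modules it depends on (hypothetical shape).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for dep in deps.get(node, []):
            if state.get(dep, WHITE) == GREY:
                # Back edge: the cycle is the path from `dep` to here, closed.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for module in deps:
        if state.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None
```

The same input always yields the same verdict, which is the property that lets a fix loop feed the error back to a model and re-check the result.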

Foundation 07 · Decomposition

Decomposed until one cheap agent can own it.

ae splits the architecture into modules sized for a single low-cost agent, computes the dependency graph, and schedules work in waves — providers first, consumers after. Independent modules implement in parallel: ~2 minutes each in our measured runs.

providers before consumers — by graph
~2 min / module, in parallel
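One way to picture "providers first, consumers after" is a layered topological sort. The sketch below is an illustration under assumptions: the function name, the input shape, and the module name `second-provider` are hypothetical, and ae's real scheduler may work differently.

```python
from typing import Dict, List, Set


def schedule_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group modules into waves; each wave depends only on earlier waves.

    `deps` maps a module to the set of modules it depends on.
    """
    remaining = {name: set(d) for name, d in deps.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # A module is ready once every dependency has already been scheduled.
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError(
                "dependency cycle or unknown dependency among: "
                + ", ".join(sorted(remaining))
            )
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


# A consumer that depends on two providers ends up in the second wave.
order_service = {
    "order-repository": set(),
    "second-provider": set(),  # hypothetical name for the other provider
    "use-cases": {"order-repository", "second-provider"},
}
print(schedule_waves(order_service))
```

Modules inside one wave have no dependencies on each other, so they are the ones that can be implemented in parallel.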

Foundation 08 · The contract

Every module spec is a complete contract.

Purpose. Scope: in and out. Responsibilities. Public contract. Dependency contracts. Invariants. Error model. Test spec. Completion criteria. The implementing agent gets everything it needs and nothing it doesn’t — that’s what “bounded” means here.

every section mandatory — checked by rules
nothing left implicit
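A completeness rule of this kind can be sketched in a few lines of Python. Only `responsibilities`, `public_contract`, and `completion_criteria` appear as keys in the sample spec at the top of this page; the other key names below are guesses derived from the section list, and treating an empty section as missing is also an assumption.

```python
from typing import Any, Dict, List

# Hypothetical key names, one per section in the contract checklist above.
REQUIRED_SECTIONS = [
    "purpose",
    "scope",
    "responsibilities",
    "public_contract",
    "dependency_contracts",
    "invariants",
    "error_model",
    "test_spec",
    "completion_criteria",
]


def missing_sections(spec: Dict[str, Any]) -> List[str]:
    """Return the required sections that are absent or empty, in checklist order."""
    return [key for key in REQUIRED_SECTIONS if not spec.get(key)]


partial_spec = {
    "responsibilities": {"primary": ["Persist order entities to PostgreSQL"]},
    "public_contract": {"operations": [{"name": "SaveOrder"}]},
    "completion_criteria": [],  # present but empty, so still reported
}
print(missing_sections(partial_spec))
```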

Foundation 09 · Honesty

When an agent is blocked, you get a report — not a guess.

In our 70-module run, 68 succeeded and 2 stopped with structured cross-module blockage reports naming the exact missing contract. That honesty is a feature: you fix one spec line and rerun, instead of debugging confident fiction.

68/70 implemented clean
2 structured reports · 0 guesses

Spec-first · machine-validated · local-first

Bounded specs in. Hallucination‑free modules out.

ae turns every module into a machine-validated contract — allowed scope, forbidden scope, invariants, completion criteria. A contract that tight leaves a low-cost AI agent no room to hallucinate: it implements exactly what's written, and when it genuinely can't, it files a structured blockage report instead of inventing an API. The specs themselves? Also written by a cheap model — 40+ deterministic validation rules do the thinking.

up to 50×

lower token spend than an open-ended agentic session*

~13,701

tokens per implemented module, about a cent

0

hallucinated APIs: blockers surface as structured reports

Generate a spec in the playground · View on GitHub →

* Measured across 70 bounded module implementations in 30+ dogfooding projects (May–June 2026), mostly qwen-class coding models; spec-stage numbers from ae's built-in token accounting on gpt-5.4-mini.

ae — spec → frozen contract → implemented module

How it works

Spec first, code second. A module must be formally specified, validated, and frozen before any code is written.

1 · Describe

One sentence about your service. `ae ai breakdown` turns it into a structured module breakdown.

2 · Specify

`ae ai specs` writes project, architecture, and module specs: formal YAML with scope, invariants, contracts, and completion criteria.

3 · Validate & freeze

Every file passes schema + semantic validation. Failing specs are auto-repaired in a bounded fix loop. You approve; the engine freezes.

4 · Implement on a cheap agent

ae hands a bounded execution contract to a low-cost coding agent. Scope is closed, invariants are checks, dependencies are inlined: nothing left to guess, nothing to hallucinate.

Scene 1 · notification service — three independent modules, fully parallel

1 · Describe & spec
2 · Decompose
3 · Validate
4 · Frozen bundles
5 · Fan out to cheap agents
6 · Implement in parallel

Scene 2 · order service — the repo's integration-test fixture (testdata/integration/order-service); use-cases depends on both providers

1 · Describe & spec
2 · Decompose
3 · Validate
4 · Frozen bundles
5 · Wave 1: providers in parallel
6 · Wave 2: consumer unblocked

Why structured beats smart

AI agents hallucinate on open-ended requests because they don't know what they must not do. ae makes every boundary explicit — and machine-checkable. Measured result: 68 of 70 modules implemented clean by low-cost agents.

Lifecycle

Spec-first state machine

Every module moves through draft → in_review → approved → frozen → implementing → implemented. No implementation task exists until the spec is frozen, and freezing requires non-empty completion criteria. Every transition is an audited history event.
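The lifecycle reads naturally as a small state machine. The Python sketch below assumes strictly forward, one-step transitions and a plain list as the audit history; both are simplifications for illustration, and the class and field names are hypothetical rather than ae's own.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LIFECYCLE = ["draft", "in_review", "approved", "frozen", "implementing", "implemented"]


@dataclass
class ModuleLifecycle:
    status: str = "draft"
    completion_criteria: List[str] = field(default_factory=list)
    history: List[Tuple[str, str]] = field(default_factory=list)

    def advance(self) -> str:
        """Move to the next state and record the transition as a history event."""
        index = LIFECYCLE.index(self.status)
        if index == len(LIFECYCLE) - 1:
            raise ValueError("module is already implemented")
        target = LIFECYCLE[index + 1]
        # Freeze gate: a spec without completion criteria cannot be frozen.
        if target == "frozen" and not self.completion_criteria:
            raise ValueError("cannot freeze: completion criteria are empty")
        self.history.append((self.status, target))
        self.status = target
        return target
```

Because `implementing` can only be reached from `frozen`, the guard on the freeze step is enough to guarantee that no implementation starts from a spec with empty completion criteria.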

No hallucinations

Bounded contracts, not vibes

An implementation task is a closed world: allowed scope, forbidden scope, invariants turned into validation checks, dependency contracts inlined, completion criteria spelled out. The agent has nothing to guess — so a qwen-class model implements it as reliably as a frontier one. And when a task genuinely can't be done, the agent files a structured cross_module_blockage report naming the contract change it needs — it doesn't invent an API.

# implementation task — bounded execution contract
allowed_scope:       internal/deliverystatustracking/
forbidden_scope:     other modules, public contracts, deps' internals
invariants:          INV-001 only email|sms channels reach providers
validation_checks:   derived from invariants + public operations
completion_criteria: unit tests cover every public operation
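As an illustration of how an `allowed_scope` boundary could be checked mechanically after a run, here is a small Python sketch. The function name and the file names in the usage example are hypothetical, and the source does not say that ae performs exactly this check.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def out_of_scope(changed_files: Iterable[str], allowed_scope: str) -> List[str]:
    """Return the changed files that fall outside the allowed directory."""
    root = PurePosixPath(allowed_scope)
    violations = []
    for name in changed_files:
        path = PurePosixPath(name)
        # Reject ".." outright so a path cannot escape the scope by traversal.
        escapes = ".." in path.parts
        inside = path == root or root in path.parents
        if escapes or not inside:
            violations.append(name)
    return violations


changed = [
    "internal/deliverystatustracking/tracker.go",       # hypothetical file
    "internal/deliverystatustracking/tracker_test.go",  # hypothetical file
    "internal/notifier/send.go",                        # hypothetical file
]
print(out_of_scope(changed, "internal/deliverystatustracking/"))
```

Comparing path components rather than string prefixes keeps a sibling directory such as `internal/deliverystatustrackingv2/` from passing by accident.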

The core trick

Validate-and-fix loop: rules instead of model IQ

Each AI call runs generate → extract YAML → validate → fix-prompt retry until the artifact passes or the retry limit is reached. The validator enforces 40+ deterministic rules: unique operations, acyclic dependency graphs, invariant ownership, scope/forbidden overlap checks. The schema does the thinking, so neither the spec writer nor the implementer has to be expensive.

call AI → extract YAML → write file → ae validate
  PASS → done
  FAIL → build fix prompt (errors inlined) → retry
  repeat up to max_retries
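The loop above can be sketched in Python with the model call and the validator passed in as plain functions. Everything here is an assumption made for illustration: the function signature, the wording of the fix prompt, and the reading of `max_retries` as "retries after the first attempt". The YAML-extraction and file-writing steps are left out.

```python
from typing import Callable, List, Tuple


def validate_and_fix(
    generate: Callable[[str], str],
    validate: Callable[[str], List[str]],
    prompt: str,
    max_retries: int = 3,
) -> Tuple[str, int]:
    """Call `generate` until `validate` reports no errors.

    Returns (artifact, attempts). Raises RuntimeError when retries run out.
    """
    current_prompt = prompt
    errors: List[str] = []
    for attempt in range(1, max_retries + 2):  # first try + max_retries retries
        artifact = generate(current_prompt)
        errors = validate(artifact)
        if not errors:
            return artifact, attempt
        # Inline the validator's errors so the next call can repair them.
        current_prompt = (
            prompt
            + "\n\nThe previous output failed validation:\n"
            + "\n".join(f"- {e}" for e in errors)
            + "\nReturn the corrected YAML only."
        )
    raise RuntimeError(f"still failing after {max_retries} retries: {errors}")
```

In this shape the model never has to judge its own output; the validator's error list is the only signal that decides between PASS and another retry.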

Decomposition

Knows when to split

The decomposition engine scores each module and classifies it as DECOMPOSABLE, ATOMIC, or NOT_BENEFICIAL. It then generates child stubs with lineage and orders work in dependency-safe waves: providers before consumers.

Impact analysis

Blast radius without AI

When an implementation is blocked on another module's contract, `ae implementation impact` traverses the dependency graph and tells you exactly which modules need lifecycle actions, task regeneration, or a staleness check. Pure graph analysis, zero tokens.
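A blast-radius query of this kind comes down to walking the dependency graph in reverse. The Python sketch below shows the idea under assumptions: the function name, the input shape, and the module names `api` and `metrics` are hypothetical, and it does not model which follow-up action each module needs.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_modules(deps: Dict[str, Set[str]], changed: str) -> List[str]:
    """Return every module that depends on `changed`, directly or transitively."""
    # Invert the graph: provider -> modules that consume it.
    consumers: Dict[str, Set[str]] = {}
    for module, providers in deps.items():
        for provider in providers:
            consumers.setdefault(provider, set()).add(module)

    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in sorted(consumers.get(current, ())):
            if consumer not in seen and consumer != changed:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


graph = {
    "api": {"use-cases"},             # hypothetical module
    "use-cases": {"order-repository"},
    "order-repository": set(),
    "metrics": set(),                 # hypothetical module
}
print(impacted_modules(graph, "order-repository"))
```

No model call appears anywhere in the traversal, which is why an analysis like this costs no tokens.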

Portable

Works with your agent

Export self-contained implementation bundles — task, rendered prompt, frozen spec, dependency contracts, result template — and feed them to Claude Code, Codex, or any agent. Import the result report back for analysis and retry bundles.

Local-first

Your specs are files

All state is human-readable YAML on your filesystem, tracked in your repo. Bring any OpenAI-compatible endpoint: OpenRouter, Groq, OpenAI, or a local Ollama — per-profile, per-task-type.

Token economics, measured

ae logs every AI call and every implementation run — model, tokens, outcome. These are real averages from real dogfooding projects, not projections.

Specify — flash-class model

Stage	Command	Avg tokens / run
Module breakdown	`ae ai breakdown`	3,059
3 validated spec files	`ae ai specs`	4,910
Decomposition spec	`ae ai decomp`	2,873
Complete validated project specification		~10,842

Implement — low-cost coding model

Metric	Measured
Avg tokens per implemented module	~13,701
Typical wall-clock per module	~2 min
Hallucinated workarounds	0

Cheap twice — and honest

Implementation on low-cost agents. A bounded contract turns "write this module" from a frontier-agent job into a qwen-class job: ~13,701 tokens per module — about a cent at open-model pricing — with 68 of 70 modules landing clean. The 2 that couldn't proceed filed structured blockage reports naming the exact contract change they needed; none invented code around the gap.

Planning on flash models. Structured prompts with injected context replace meandering planning chats: ~10,842 tokens buys a full validated project spec — about $0.003 at $0.25 per million. An open-ended planning session routinely burns 10–50× that before anything is pinned down.

Prefer a frontier agent anyway? Fine — the contract is the same, and those tokens stop being wasted on re-deriving scope. But you'll rarely need to.
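The spec-stage arithmetic quoted in this section can be reproduced in a few lines of Python. The token counts and the $0.25 per million rate come from the figures above; the per-module rate is only what "about a cent" would imply, not a published price.

```python
# Average tokens per run, copied from the "Specify" table above.
SPEC_STAGE_TOKENS = {
    "ae ai breakdown": 3_059,
    "ae ai specs": 4_910,
    "ae ai decomp": 2_873,
}
MODULE_TOKENS = 13_701  # average tokens per implemented module


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of `tokens` at a flat price per million tokens."""
    return tokens * usd_per_million / 1_000_000


spec_tokens = sum(SPEC_STAGE_TOKENS.values())
spec_cost = cost_usd(spec_tokens, 0.25)

# Rate implied by "about a cent" per module; an inference, not a quoted price.
implied_rate = 0.01 * 1_000_000 / MODULE_TOKENS

print(f"spec pipeline: {spec_tokens} tokens, ${spec_cost:.4f}")
print(f"implied module rate: ${implied_rate:.2f} per million tokens")
```

The three stage averages add up to the 10,842 total in the table, and at $0.25 per million that rounds to the $0.003 quoted above.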

See it produce an agent-ready contract from one sentence.

The playground runs the real ae pipeline on a flash-class model and shows you every validation rule it passes — and what the run cost. The same specs are what low-cost agents implement without hallucinating.

Open the playground · Star on GitHub

FAQ

Does ae write the code too?

ae generates the specs and the bounded implementation tasks. Implementation runs through your preferred coding agent — ae can drive one directly (Claude Code, Codex, or any OpenAI-compatible endpoint) or export a portable bundle and import the result report afterward.

Which models can I use?

Any OpenAI-compatible endpoint, configured per profile: OpenRouter, OpenAI, Groq, local Ollama. A typical setup uses a flash-class model for specs and a coding model, such as the qwen-class agents used in the measured runs, or a local agent CLI for implementation.

Is my project sent to your servers?

No. ae is a local-first CLI — all artifacts are YAML files on your machine, and AI calls go directly from your machine to the endpoint you configure. This website's playground is the only hosted piece, and it only sees the demo descriptions you type into it.

How is this different from just prompting an agent well?

Prompts are advisory; ae's rules are enforced. A prompt can ask a model to keep scope tight — ae rejects the artifact if scope overlaps, dependencies cycle, invariants lack owners, or completion criteria are missing, and auto-retries with the exact errors inlined. The boundary lives in the validator, not in the model's goodwill.

Why don't low-cost agents hallucinate here?

Because the task is a closed world. The agent receives the exact packages it may touch, everything it must not touch, invariants restated as validation checks, every dependency's public contract inlined, and explicit completion criteria. Hallucination thrives on gaps — and a frozen ae spec doesn't have any. When a task truly can't be done, the contract gives the agent an honest exit: a structured cross_module_blockage report instead of an invented API. In our measured runs: 68 of 70 modules implemented clean, 2 honest blockage reports, 0 hallucinated workarounds.

What does the playground let me do?

Registered users can run the spec-generation pipeline on a limited quota: describe a service, get the module breakdown and three validated YAML spec files, inspect the validation report, and download the result. Implementation-task generation is CLI-only.
