Engineering ceremonies exist to solve a coordination problem: a group of people building one system has to stay aligned on what they're building, who's doing what, and whether it's working. Left to itself, that alignment decays. The rituals are the forcing functions that keep it from decaying. But which rituals a team needs isn't fixed; each dominant process model was a response to where the technology of its era put the cost of building software. Waterfall fit a world of mainframes and long, expensive release cycles, where compute was scarce and changing your mind late was ruinous, so you specified everything up front and moved in sequence. Agile arrived once iteration got cheap: faster hardware, version control, continuous integration, and frequent releases dropped the cost of change, so tight feedback loops beat big up-front plans, and the ceremonies reorganized around short cycles. Each regime, in other words, was tuned to a specific constraint, and they all shared one assumption underneath: that the expensive, error-prone step is humans writing the code.
That assumption is what's changing. When code generation gets cheap, the constraint moves off the keyboard and onto alignment and validation — and a ritual aimed at the old constraint quietly optimizes the wrong thing. The rest of this post walks ceremony by ceremony through what each becomes, which new ones earn a slot, and how the roles shift once code stops being the bottleneck.
When AI started writing real code, the obvious question for engineering leaders was: which tools should we adopt? The more useful question, a couple of cycles later, is: which of our ceremonies still earn their slot?
Code production has compressed by an order of magnitude. The rituals around it are still budgeted as if it hadn't. Standups still narrate yesterday's work. Grooming still hands fuzzy stories to engineers. Planning still estimates by author-time. Demos still show finished UIs. The ceremonies were built to coordinate humans typing — and most of them are now optimizing the wrong segment of the pipeline.
A fresh-look playbook: existing Agile ceremonies, the new ones that earn a place, the role shifts that come with them, and the bottlenecks that emerge once code stops being the constraint.
1. The unit of work has changed
The most consequential pattern in AI-assisted engineering, across the variants currently in play, is that the spec — not the ticket, not the PR — is becoming the unit of work. A few of the families to know:
- Structured Prompt-Driven Development (SPDD). Martin Fowler's writeup describes a six-step loop — story, requirements clarification, analysis context, a structured prompt artifact (the "REASONS canvas"), code generation, and tests. The prompt is versioned in Git alongside the code and stays synchronized with it.
- Kiro (Amazon). Three stages — requirements, design, tasks — using EARS (Easy Approach to Requirements Syntax) for the requirements phase. Specs are first-class artifacts the agent works against.
- GitHub Spec Kit. A slash-command workflow (`/specify`, `/plan`, `/tasks`) plus a "constitution" of immutable principles the agent must respect.
- Tessl. Treats the spec as the artifact; code is regenerable output of the spec.
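EARS constrains each requirement to one of a handful of sentence templates, which is part of what makes a spec checkable before anything is generated from it. A minimal sketch, assuming simplified regexes for the five basic EARS templates (ubiquitous, event-driven, state-driven, unwanted behavior, optional feature); this is not Kiro's implementation, the function names are hypothetical, and complex requirements that combine keywords are not handled specially:

```python
import re
from typing import Optional

# Simplified regexes for the five basic EARS templates. Order matters:
# keyword-led templates are tried before the bare "The ... shall ..." form.
EARS_PATTERNS = [
    ("event-driven", re.compile(r"^When .+, the .+ shall .+", re.IGNORECASE)),
    ("state-driven", re.compile(r"^While .+, the .+ shall .+", re.IGNORECASE)),
    ("unwanted-behavior", re.compile(r"^If .+, then the .+ shall .+", re.IGNORECASE)),
    ("optional-feature", re.compile(r"^Where .+, the .+ shall .+", re.IGNORECASE)),
    ("ubiquitous", re.compile(r"^The .+ shall .+", re.IGNORECASE)),
]


def classify_requirement(text: str) -> Optional[str]:
    """Return the name of the EARS template a requirement matches, or None."""
    sentence = text.strip()
    for name, pattern in EARS_PATTERNS:
        if pattern.match(sentence):
            return name
    return None


def lint_requirements(requirements: list[str]) -> list[str]:
    """Return the requirements that fit no EARS template and need rewriting."""
    return [r for r in requirements if classify_requirement(r) is None]


reqs = [
    "When the user submits the form, the system shall validate the email field.",
    "The service shall log every failed login.",
    "Make checkout feel snappier.",
]
print(lint_requirements(reqs))  # ['Make checkout feel snappier.']
```

The vague third requirement is exactly the kind that, handed to a generator, produces fluent code against an unstated target.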
Birgitta Böckeler, writing for Thoughtworks, organizes these into three levels of rigor: spec-first (write spec, then build), spec-anchored (keep the spec live and enforced as the system evolves), and spec-as-truth (the spec is the source; the code is generated). Most teams adopting this in earnest are operating somewhere between spec-first and spec-anchored.
The variants differ. The pattern doesn't: spec is durable, code is regenerable, validation is explicit.
2. What actually shifts underneath
The implications for ceremonies fall out of a handful of shifts:
- Pre-implementation expands; implementation compresses. The hard part of an engineer's day moved upstream of the keyboard, into the spec.
- Validation moves from gate to craft. When a model produces the artifact, fluency hides errors. Validation has to be planned and reviewed — not deferred to the end.
- Drift is silent. Without a sync ritual, the spec rots, the code wanders, and the audit trail stops being one.
- Auditability is a feature, not a side effect. Agent runs leave traces; ceremonies should consume them, not duplicate them.
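One way to make drift visible rather than silent is to stamp each generated file with a hash of the spec it came from and a hash of the code as generated, then have the sync ritual re-check both. A minimal sketch; the one-line header format and the `stamp` / `check_drift` helpers are hypothetical conventions, not features of any of the tools named earlier:

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical stamp written as the first line of every generated file.
STAMP = re.compile(r"^# spec-sha256: ([0-9a-f]{64}) code-sha256: ([0-9a-f]{64})$")


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stamp(code: str, spec_text: str) -> str:
    """Prefix freshly generated code with digests of its spec and of itself."""
    return f"# spec-sha256: {sha256(spec_text)} code-sha256: {sha256(code)}\n{code}"


@dataclass
class DriftFinding:
    path: str
    reason: str


def check_drift(spec_text: str, files: dict[str, str]) -> list[DriftFinding]:
    """Flag files whose spec moved on, or whose code was edited after generation."""
    findings = []
    for path, text in sorted(files.items()):
        header, _, body = text.partition("\n")
        match = STAMP.match(header)
        if match is None:
            findings.append(DriftFinding(path, "no spec stamp"))
            continue
        if match.group(1) != sha256(spec_text):  # the spec rotted or evolved
            findings.append(DriftFinding(path, "spec changed since generation"))
        if match.group(2) != sha256(body):  # the code wandered by hand
            findings.append(DriftFinding(path, "code edited since generation"))
    return findings
```

Checking both digests covers both directions of drift: a spec that changed without regeneration, and code that was hand-edited without a spec update.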
Ceremonies that used to allocate time-to-implement now need to allocate time-to-align and time-to-validate. That's the lens for everything that follows.
3. The playbook — ceremony by ceremony
The five ceremonies any team will recognize, in their pre-AI form and the form that earns its slot today.
| Ceremony | Pre-AI purpose | Pre-AI failure mode | AI-era shape | Now optimizes for |
|---|---|---|---|---|
| Standup | Sync work-in-flight, surface blockers | Status theater | Decision-and-blocker board, async-first | Alignment latency |
| Backlog refinement / grooming | Make stories sprint-ready | Under-refined stories pushed into sprint | Spec workshop — produces a versioned spec artifact | Spec quality at the point of generation |
| Sprint planning | Commit to a scope | Estimating in author-time only | Commit to a validation budget; smaller batches | Realistic throughput across the whole pipe |
| Demo / sprint review | Show what was built | Polished UI hides shaky decisions | Decision review — show spec, trade-offs, validation evidence | Stakeholder confidence in the decision |
| Retrospective | Improve process | Vague complaints, no follow-through | Adds AI-usage retro reading audit logs and eval trends | Trust calibration over time |
Standup — from status to decisions and blockers
Audit trails already say what was done. What they don't surface is where someone is stuck on a decision and which spec needs another pair of eyes. That's what the board should carry, and the synchronous slot earns its place only when there's something to decide together.
Backlog refinement — from grooming to spec workshop
"Ready" now means a structured spec — REASONS canvas, EARS, /specify, ADR — good enough to drive code generation. The session is a workshop; tech lead and senior engineer sit alongside the PM; the output is a versioned artifact, not a ticket comment.
Sprint planning — from author-budget to validator-budget
AI compressed author-hours; nobody compressed validator-hours, so capacity is validator-bound. Two-week boundaries get awkward both ways — too long for AI throughput, too short for validation depth. Plan in specs authored, generated against, validated, and shipped.
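The validator-bound argument is bottleneck arithmetic: a pipeline ships no faster than its slowest stage. A sketch with illustrative, made-up numbers, assuming each stage draws on its own pool of hours (in practice the same engineers often author and validate, which tightens the cap further); `Stage` and `sprint_throughput` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    hours_available: float  # hours this stage can absorb per sprint
    hours_per_spec: float   # average hours one spec consumes at this stage


def sprint_throughput(stages: list[Stage]) -> tuple[float, str]:
    """Specs per sprint is capped by the slowest stage; return (cap, stage name)."""
    rates = {s.name: s.hours_available / s.hours_per_spec for s in stages}
    bottleneck = min(rates, key=rates.get)
    return rates[bottleneck], bottleneck


# Illustrative numbers only: generation is cheap, validation is not.
pipeline = [
    Stage("author spec", hours_available=60, hours_per_spec=2),
    Stage("generate", hours_available=100, hours_per_spec=0.5),
    Stage("validate", hours_available=40, hours_per_spec=4),
]
print(sprint_throughput(pipeline))  # (10.0, 'validate')
```

With these numbers, adding generation capacity changes nothing; only more validator hours or cheaper validation per spec moves the cap.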
Demo / sprint review — from feature demo to decision review
AI makes everything look shippable, so demo-by-screenshot tells you less than it used to. The high-leverage demo is the decision review: spec, trade-offs taken, alternatives rejected, validation evidence. Stakeholders walk away knowing why this option, why these criteria, what we'd change if we did it again — the value a person stamps onto machine-assisted output.
Retrospective — same purpose, new evidence base
Many failure modes are now AI-specific: hallucinated APIs that passed review, prompt regressions, eval drift, validators rubber-stamping output. These leave evidence in audit logs, eval trends, and drift reports. Add an AI-usage retro that reads those artifacts and asks: Where did we trust output we shouldn't have? Where did we re-verify what didn't need it? The goal is trust calibration over time.
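Those two retro questions can be turned into rates, provided the audit log records how deeply each AI-assisted change was reviewed and whether a defect was later traced back to it. A minimal sketch; the record fields and metric names are hypothetical, not a standard audit-log schema:

```python
from dataclasses import dataclass


@dataclass
class ReviewedChange:
    review: str               # "light" or "deep" (hypothetical audit-log field)
    review_found_issue: bool  # did the review itself catch anything?
    later_defect: bool        # was a defect later traced back to this change?


def trust_calibration(changes: list[ReviewedChange]) -> dict[str, float]:
    light = [c for c in changes if c.review == "light"]
    deep = [c for c in changes if c.review == "deep"]
    # Trusted output we shouldn't have: light review, defect surfaced later.
    misplaced = sum(c.later_defect for c in light)
    # Re-verified what didn't need it: deep review found nothing, nothing surfaced later.
    redundant = sum(not c.review_found_issue and not c.later_defect for c in deep)
    return {
        "misplaced_trust_rate": misplaced / len(light) if light else 0.0,
        "redundant_review_rate": redundant / len(deep) if deep else 0.0,
    }
```

Tracked sprint over sprint, a rising first rate suggests drift toward rubber-stamping; a rising second suggests review depth is being spent where it isn't needed.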
4. New ceremonies that earn a slot
A handful of ceremonies have no good pre-AI analogue. They earn a slot because the bottleneck moved.
| New ceremony | Why it now earns a slot | Cadence | Bottleneck addressed |
|---|---|---|---|
| Spec review | Senior judgment now lives in the spec; bad specs hide as fluent code | Per spec, before generation | Alignment latency, spec quality |
| Eval & regression review | Without a periodic read, regressions go silent | Weekly or sprint-aligned | Eval coverage gaps, silent drift |
| Prompt-library / context tending | The agent's context is a product; it rots without an owner | Once per sprint, often folded into spec or eval review | Context / prompt-library decay |
| Validation triage (optional) | Choke point is human reviewer capacity, not author capacity | As needed; small standing slot | Validator capacity |
Spec review
Design review, reborn: lighter than a code review, heavier than story refinement, closest to an ADR review. A bad spec still generates fluent code, so its flaws slip through looking polished; a good spec reads as a system that holds together. Catch the difference before generation, not after.
Eval & regression review
Borrowed from MLOps and increasingly standard for AI-assisted codebases. New failure modes? Baseline regressions? Models silently degraded after an upgrade? If you don't have eval coverage, that's the project: start there.
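At its simplest, the periodic read is a diff of eval scores against a stored baseline. A sketch assuming per-suite pass rates between 0 and 1 and a hypothetical tolerance of two percentage points; the suite names and the `find_regressions` helper are illustrative:

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, str]:
    """Compare per-suite pass rates against a baseline and report what moved."""
    findings = {}
    for suite, base in baseline.items():
        if suite not in current:
            # A suite that silently stopped running is itself a regression.
            findings[suite] = "missing from current run"
        elif current[suite] < base - tolerance:
            findings[suite] = f"dropped {base:.2f} -> {current[suite]:.2f}"
    for suite in current.keys() - baseline.keys():
        findings[suite] = "new suite, no baseline yet"
    return findings


baseline = {"codegen": 0.90, "refactor": 0.92, "migration": 0.88}
current = {"codegen": 0.89, "refactor": 0.85, "security": 0.70}
report = find_regressions(baseline, current)
```

Here `codegen` moved within tolerance and is not reported; `refactor` dropped past it, `migration` vanished, and `security` has nothing to compare against yet.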
Prompt-library / context tending
`CLAUDE.md`, `AGENTS.md`, the spec library, and shared prompts together form a product the team consumes. Give it the hygiene of a dependency: an owner, a review cadence, and retirement of stale entries. The work is often folded into spec or eval review, but somebody owns it.
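Dependency-style hygiene for the context library can start as a simple audit: every entry needs an owner and a review within the team's cadence. A minimal sketch; the entry fields, the 90-day cadence, and the `audit_context` helper are assumptions for illustration:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ContextEntry:
    path: str              # e.g. "CLAUDE.md", "AGENTS.md", a shared prompt file
    owner: Optional[str]   # None means nobody is on the hook for it
    last_reviewed: date


def audit_context(
    entries: list[ContextEntry], today: date, max_age_days: int = 90
) -> list[tuple[str, str]]:
    """Flag entries with no owner or a review older than the team's cadence."""
    findings = []
    for entry in entries:
        if not entry.owner:
            findings.append((entry.path, "no owner"))
        if today - entry.last_reviewed > timedelta(days=max_age_days):
            findings.append((entry.path, "review overdue"))
    return findings
```

Overdue entries become the agenda for the tending slot: re-review, update, or retire.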
Validation triage (optional)
Borrowed from on-call triage: decide which outputs warrant deep human review and which can pass on cheap automated checks. Not every team needs this; the ones whose validation queue blows out a sprint know who they are.
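A triage rule can be as plain as a few risk signals deciding the route. The signals, the sensitive path prefixes, and the diff-size threshold below are hypothetical placeholders a team would tune to its own codebase; a minimal sketch:

```python
from dataclasses import dataclass

# Hypothetical high-risk areas; a real team would derive these from its own codebase.
SENSITIVE_PREFIXES = ("auth/", "billing/", "migrations/")


@dataclass
class GeneratedChange:
    paths: list[str]
    lines_changed: int
    tests_passed: bool
    spec_reviewed: bool


def triage(change: GeneratedChange, max_auto_lines: int = 200) -> tuple[str, list[str]]:
    """Route a generated change to deep human review or to automated checks only."""
    reasons = []
    if any(path.startswith(SENSITIVE_PREFIXES) for path in change.paths):
        reasons.append("touches sensitive path")
    if change.lines_changed > max_auto_lines:
        reasons.append("large diff")
    if not change.tests_passed:
        reasons.append("tests failing")
    if not change.spec_reviewed:
        reasons.append("spec not reviewed")
    route = "deep-review" if reasons else "automated-checks"
    return route, reasons
```

Returning the reasons alongside the route leaves a trail the AI-usage retro can later read when asking whether the rules sent the right changes to humans.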
5. How the roles shift
Ceremony change without role change is theater. The shifts line up across the org:
| Role | Pre-AI focus | AI-era focus | Where this role now leans in |
|---|---|---|---|
| Engineer (IC) | Implement to spec, write tests, peer-review code | Author specs, validate AI output, design eval cases, tend the prompt library | Spec workshops, spec review, AI-usage retros |
| Engineering manager | Capacity by author-hours, unblock, coach craft | Capacity by validator-hours, manage review queues, watch eval trends | Sprint planning, validation triage, retros |
| Director / VP engineering | Org design, headcount-to-throughput | Smaller teams, denser accountability; fund eval / audit / prompt-library infrastructure | Tooling and platform investment, cross-team eval reviews |
| Product manager / partner | Write stories, prioritize backlog, align stakeholders | Own the spec, not just the story; co-write criteria precise enough to drive generation | Spec workshops, spec review, decision-style demos |
| Stakeholder / business sponsor | Engage at demo and release | Engage at the spec stage, where input still shapes the outcome | Spec review, decision-style demos |
| QA / tester | Write and run tests against requirements; bug triage | Own the eval suite; design validation strategy; triage | Eval & regression review, validation triage, spec review |
| Platform / infrastructure | CI/CD, observability, on-call | Same plus agent infrastructure: audit, budget, eval pipelines, prompt-library tooling | Prompt-library tending, eval pipelines, audit guardrails |
| Operations / SRE | Uptime, incident response, capacity | Same plus agent ops: cost circuit breakers, runaway-loop detection, drift detection | AI-usage retros (incidents), eval review (silent degradation) |
Each role moves up a level of abstraction. The org chart compresses; the responsibility per name does not.
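The agent-ops duties in the SRE row (cost circuit breakers, runaway-loop detection) come down to a guard wrapped around each agent step. A minimal sketch, assuming the agent loop reports each action with its cost; the class names, the dollar budget, and the "same action repeated too many times in a row" heuristic for a runaway loop are all illustrative, not any particular agent framework's API:

```python
from dataclasses import dataclass, field
from typing import Optional


class BudgetExceeded(RuntimeError):
    """Raised when cumulative agent spend passes the run's budget."""


class RunawayLoop(RuntimeError):
    """Raised when the agent repeats the same action too many times in a row."""


@dataclass
class AgentCircuitBreaker:
    budget_usd: float
    max_repeats: int = 3
    spent_usd: float = 0.0
    _last_action: Optional[str] = field(default=None, init=False)
    _repeats: int = field(default=0, init=False)

    def record(self, action: str, cost_usd: float) -> None:
        """Call once per agent step; raises to halt the run when a limit trips."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f} of ${self.budget_usd:.2f}")
        if action == self._last_action:
            self._repeats += 1
        else:
            self._last_action, self._repeats = action, 1
        if self._repeats > self.max_repeats:
            raise RunawayLoop(f"{action!r} repeated {self._repeats} times in a row")
```

Raising rather than logging is the design choice here: a tripped breaker should stop the run and page a human, and the exception message becomes evidence for the next retro.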
6. The new bottlenecks (and where to put the touch points)
Reorganize ceremonies for yesterday's bottleneck and you'll regret it in six months. Today's bottlenecks, each paired with its touch point and owner:
- Alignment latency → spec workshop (PM + tech lead).
- Validator capacity → validation triage; plan by validator-hours (engineering manager).
- Spec/code drift → spec-sync ritual; drift audit in retro (the engineer who owns the spec).
- Eval coverage gaps → eval & regression review (QA).
- Trust calibration → AI-usage retro (engineering manager).
- Context decay → prompt-library tending (platform team).
These will move again. The discipline is asking each quarter: what's the choke point now, and which ceremony addresses it?
Closing thought
Ceremonies are forcing functions for alignment. AI didn't reduce the need for alignment — it concentrated the value of alignment into fewer, higher-stakes touch points. The teams that adopt these tools well aren't the ones with the shiniest agents. They're the ones who reorganized their rituals around the new bottleneck, lifted each role one level of abstraction to match, and put a ceremony exactly where it earned its slot.
Same lesson as before: leverage exposes the next constraint. Do the work to find it, and put a ceremony there.
Further reading
- Structured-Prompt-Driven Development — Martin Fowler. The primary source for the SPDD workflow and REASONS canvas.
- Spec-driven development — Thoughtworks Technology Radar. Birgitta Böckeler's three-level taxonomy.
- GitHub `spec-kit` — slash-command spec-driven workflow with a constitution.
- Software Engineering Practices for the AI Era — companion post on the discipline that underpins these rituals.
- Inference and Risk — companion post on validation, evals, and the design dimension behind the bottlenecks above.
- The Future of Knowledge Work with LLMs — companion post on accountability, judgement, and validation as the human contribution.