25 techniques derived from building autonomous agent architectures
7 sessions · Workshop format · Beyond "write clear instructions"
What this is not
"Be specific and clear" — you already know that
"Use examples" — obvious
"Set a role" — table stakes
This is systems engineering for prompts.
Patterns that emerge when prompts become load-bearing infrastructure.
Session 1
Foundations
Technique 11 — Constitutional Layering
Technique 15 — Cold-Read Design
Technique 11
Constitutional Layering
Prompts as stratified documents where later layers amend earlier ones with explicit precedence.
Why flat prompts break
Flat prompt
Every sentence equally weighted
Conflicts unresolvable
Must rewrite to amend
No maintenance path
→
Layered prompt
Explicit precedence hierarchy
Later layers override earlier
Add amendments, don't edit
Model reasons about intent
Example: Layered system prompt
Prompt literal — the system prompt itself
## Layer 0 — Identity (immutable)
You are a code reviewer specializing in security.
## Layer 1 — Governance (rarely changes)
Report findings with severity: critical > high > medium > low.
Never suppress a finding to reduce noise.
## Layer 2 — Session override (supersedes Layer 1)
UPDATED 2024-03: For this codebase, suppress medium/low
findings in test files. Rationale: test noise was obscuring
critical production findings. Layer 1 still applies to src/.
The model sees the evolution. It knows WHY the override exists.
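The layering pattern above can be held as data rather than one editable string, so amendments are appended instead of overwriting history. A minimal sketch, assuming illustrative names (`Layer`, `render_prompt` are not from any particular framework):

```python
from dataclasses import dataclass

@dataclass
class Layer:
    level: int   # precedence: higher level supersedes lower
    title: str
    body: str

def render_prompt(layers):
    """Render layers in ascending precedence so later text amends earlier text."""
    ordered = sorted(layers, key=lambda l: l.level)
    return "\n\n".join(
        f"## Layer {l.level} — {l.title}\n{l.body}" for l in ordered
    )

# Layers can be declared in any order; precedence is explicit, not positional.
layers = [
    Layer(0, "Identity (immutable)",
          "You are a code reviewer specializing in security."),
    Layer(2, "Session override (supersedes Layer 1)",
          "Suppress medium/low findings in test files. Layer 1 still applies to src/."),
    Layer(1, "Governance (rarely changes)",
          "Never suppress a finding to reduce noise."),
]

print(render_prompt(layers))
```

Amending the policy is then a list append, never an edit to an existing layer's body.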
Exercise
A customer support chatbot. Original prompt: "Be helpful, resolve issues quickly." After complaints about over-promising refunds, amendment added: "Don't promise refunds over $50 without escalation." Three months later, holiday override: "$50 limit raised to $100 for loyalty-tier customers."
Discuss: What are the layers? What happens if you edit the original instead of amending?
How does a new instance understand the policy history?
Some valid answers
Q1.What are the layers?
Each layer has a scope: the original applies universally, the amendment narrows the refund case specifically, the holiday override is time-scoped and tier-scoped.
The override is a layer, not an edit — it coexists with the amendment. On January 2nd, Layer 3 should expire, leaving Layer 2 intact.
Some valid answers
Q2.What happens if you edit the original instead of amending?
The reasoning trail disappears — a new instance sees "$100 for loyalty customers" but doesn't know it started at $50, or why, or that the change was seasonal.
Silent regression risk: the next person to edit thinks $100 is the permanent policy and might remove the loyalty-tier qualifier when "cleaning up."
The original "be helpful, resolve quickly" framing is gone — future instances lose the governing principle that justified all the exceptions.
Some valid answers
Q3.How does a new instance understand the policy history?
With amendments in place: the new instance sees each layer with its rationale and scope — it can reconstruct the intent behind each rule and the order of precedence.
Without amendments (edited original): the new instance has one instruction with no context on why it evolved, what exceptions apply to what cases, or which parts are permanent vs. seasonal.
The holiday override should carry an explicit expiry: "applies Dec 1–Jan 1; reverts to Layer 2 after." A new instance reading it mid-February then knows the current state unambiguously.
Technique 15
Cold-Read Design
Prompts that function correctly on first encounter with zero prior context.
The cold-read test
Hand this prompt to someone who knows nothing about your project.
Can they understand what's being asked?
"As discussed" — cold-read failure
"Continue from last time" — cold-read failure
"The usual format" — cold-read failure
Every invocation is a fresh context window. Even within a session, after compaction.
Before / After
Assumes context
Continue the analysis.
Focus on the issues
we identified earlier.
Use the same format.
vs
Cold-readable
Analyze auth.py for SQL
injection vulnerabilities.
Prior findings (for context):
- Line 42: unsanitized input
- Line 89: string concat in query
Output format:
[LINE]:[SEVERITY]:[DESCRIPTION]
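Cold-read failures of the "as discussed" kind are detectable mechanically. A small linter sketch — the phrase list is an illustrative starting point, not exhaustive:

```python
import re

# Phrases that assume prior context — each one is a cold-read failure
CONTEXT_ASSUMPTIONS = [
    r"\bas discussed\b",
    r"\blast time\b",
    r"\bthe usual\b",
    r"\bcontinue (?:the|from)\b",
    r"\bas before\b",
    r"\b(?:issues|findings) we identified\b",
]

def cold_read_violations(prompt: str) -> list:
    """Return every context-assuming phrase found in the prompt."""
    lowered = prompt.lower()
    return [p for p in CONTEXT_ASSUMPTIONS if re.search(p, lowered)]

bad = "Continue the analysis. Focus on the issues we identified earlier."
good = "Analyze auth.py for SQL injection. Output: [LINE]:[SEVERITY]:[DESCRIPTION]"

print(cold_read_violations(bad))   # non-empty: fails the cold-read test
print(cold_read_violations(good))  # []
```

Run it in CI against every prompt file; a non-empty result blocks the merge.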
Exercise
A shared incident response runbook. Entry says: "Use the usual escalation path. Contact the on-call as before. Apply the fix from last time." A new team member joins during an outage at 3 AM and opens this runbook.
Discuss: What fails the cold-read test? What information must be inline?
What's the operational cost of the ambiguity?
Some valid answers
Q1.What fails the cold-read test?
"Use the usual escalation path" — "usual" encodes institutional memory. A new team member at 3 AM has none of that memory.
"Contact the on-call as before" — there is no "before" for someone joining mid-incident. Who is on-call? How? Their phone? Slack? PagerDuty?
"Apply the fix from last time" — which last time? There may have been dozens. This phrase requires you to know the history to use the runbook.
Some valid answers
Q2.What information must be inline?
The escalation path spelled out step by step: who to contact, in what order, by what channel, with what account credentials or access method.
The fix itself, not a reference to it: "run kubectl rollout undo deployment/api --namespace prod" not "apply the rollback."
Context for judgment calls: "if the issue is a crash loop, check the OOM killer logs first; if it's latency, check the database slow query log" — not "debug as before."
Some valid answers
Q3.What's the operational cost of the ambiguity?
Minutes lost locating the "usual" escalation path during an outage translate directly into user-facing downtime and revenue loss.
A new engineer who can't follow the runbook will escalate to someone who can — waking a second person who also isn't necessary once the runbook is clear.
Ambiguous runbooks accumulate: each incident where someone improvises a step creates a new undocumented procedure, making the next incident harder to handle cold.
Session 2
Failure Engineering
Technique 12 — Failure Mode Encoding
Technique 19 — Consumed vs. Done
Technique 12
Failure Mode Encoding
Describe HOW the system fails, not just how it should succeed.
Why positive instructions fail
"Be thorough" → model can't match against own behavior
"The system fails when you list items without
processing them — mentioning ≠ addressing" → recognizable pattern
Models self-correct more reliably when they know
what the failure looks like from inside.
Example: Encoding failure modes
## How this system fails
1. **Surface-level pass**: You read all inputs, produce output
that references each one, but don't actually synthesize.
The output LOOKS complete but adds no insight beyond restating.
2. **Premature convergence**: You commit to a thesis in paragraph 1
and spend the rest confirming it. Counter-evidence gets mentioned
but not weighted.
3. **Acknowledgment-as-completion**: You write "noted" or "I'll
address X" — and then don't. The item disappears.
For each: if you catch yourself doing this mid-response, stop,
name the failure mode, and restart that section.
Exercise
A nightly data migration pipeline. Reads from Legacy DB, transforms 50,000 records, writes to New DB. Reports "success" when it finishes without crashing. Doesn't verify row counts, data integrity, or that transformations preserved semantics.
Discuss: What are the failure modes, described from inside the system?
What does "success" actually mean? How would you encode these failures into the pipeline's own monitoring?
Some valid answers
Q1.What are the failure modes, described from inside the system?
Silent data loss: "I finished processing all 50,000 records" — but the row count dropped from 50,000 to 48,941. The pipeline never checks that input count = output count.
Semantic corruption: "I transformed all date fields" — but the legacy DB stored dates as MM/DD/YYYY and the transformation assumed YYYY-MM-DD, silently inverting month and day for ambiguous dates.
Partial write success: "I completed the migration" — but a write failure at record 31,400 left the New DB in a half-migrated state, with no rollback and no flag marking the split point.
Some valid answers
Q2.What does "success" actually mean?
"Didn't crash" is the absence of one specific failure — it says nothing about data quality, completeness, or correctness. The pipeline can run to completion and produce wrong data silently.
A meaningful success criterion needs multiple tests: (1) output row count matches input row count; (2) a sample of transformed records is spot-checked for semantic accuracy; (3) the New DB's constraint checks all pass.
The current "success = no exception" is a single-point check on the outer loop — it leaves the failure surface of every inner operation unchecked.
Some valid answers
Q3.How would you encode these failures into the pipeline's own monitoring?
Post-run assertions: after processing, query both DBs and compare row counts. If New DB rows ≠ Legacy DB rows, the pipeline reports "FAILED: row count mismatch" not "success."
Checkpoint logging: write a progress marker after each batch (e.g., every 1,000 records). If the pipeline crashes at record 31,400, the checkpoint tells you exactly where to resume rather than restart.
Semantic sample checks: after transformation, run a spot-check on 100 random records comparing key fields between old and new. If more than 1% differ unexpectedly, fail loudly before marking success.
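The three monitoring answers can be sketched as one post-run verifier. A minimal version, assuming records keyed by id and a `transform` function standing in for the real migration logic (all names here are hypothetical):

```python
import random

def verify_migration(legacy, new, transform, sample_size=100, max_mismatch=0.01):
    """Post-run assertions: report failure unless every check passes.
    `legacy` and `new` map record id -> record value."""
    failures = []
    # 1. Row-count parity: silent data loss or partial writes show up here
    if len(new) != len(legacy):
        failures.append(f"row count mismatch: {len(legacy)} -> {len(new)}")
    # 2. Semantic spot-check on a random sample of shared ids
    shared = sorted(set(legacy) & set(new))
    sample = random.sample(shared, min(sample_size, len(shared)))
    bad = [k for k in sample if transform(legacy[k]) != new.get(k)]
    if len(bad) > max_mismatch * max(len(sample), 1):
        failures.append(f"sample check failed on {len(bad)}/{len(sample)} records")
    return failures  # empty list is the only thing allowed to mean "success"

# Hypothetical run: one record dropped during the write
legacy = {i: f"rec-{i}" for i in range(1000)}
new = {i: f"rec-{i}".upper() for i in range(999)}
print(verify_migration(legacy, new, transform=str.upper))
# ['row count mismatch: 1000 -> 999']
```

The pipeline's exit status should come from this function, not from "no exception was raised".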
Technique 19
Consumed vs. Done
Acknowledgment is not completion. Processing input is not producing output.
The acknowledgment trap
Input: 5 requirements
↓
Model output mentions all 5 ✓
↓
But only 3 have concrete deliverables
↓ 2 were consumed, not done
Partial completion passes casual inspection because all inputs are referenced.
Structural prevention
For each requirement below, produce:
- [ ] Implementation (actual code/output, not a description)
- [ ] Verification (how to confirm it works)
- [ ] Edge case (one scenario that might break it)
Requirements:
1. User can reset password via email
2. Rate limit: max 3 attempts per hour
3. Token expires after 15 minutes
4. Audit log entry on every attempt
If any checkbox is empty, the requirement is NOT done.
Do not proceed to the next requirement until all three
checkboxes for the current one are filled.
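The checkbox contract above can be audited mechanically rather than trusted. A sketch that parses the model's report and names any requirement left with an empty box (the `- [ ]` / `- [x]` convention and numbered headings are assumptions about the output format):

```python
import re

def unfinished(report: str) -> list:
    """Return requirement headings that still have an unchecked box."""
    incomplete, current = [], None
    for line in report.splitlines():
        heading = re.match(r"^\d+\.\s+(.*)", line.strip())
        if heading:
            current = heading.group(1)
        elif "- [ ]" in line and current and current not in incomplete:
            incomplete.append(current)  # empty checkbox under this requirement
    return incomplete

report = """\
1. User can reset password via email
   - [x] Implementation
   - [x] Verification
   - [ ] Edge case
2. Rate limit: max 3 attempts per hour
   - [x] Implementation
   - [x] Verification
   - [x] Edge case
"""
print(unfinished(report))  # ['User can reset password via email']
```

A non-empty result means "consumed, not done" — feed it back for another pass.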
Exercise
An employee onboarding workflow with 6 steps: account creation, equipment request, team introduction, access provisioning, training assignment, mentor pairing. The system marks each step "complete" when the request is submitted — not when the outcome is delivered. A new hire has all 6 steps marked complete on day 1, but doesn't receive their laptop until day 8.
Discuss: What are the primitives? Where does consumed ≠ done?
What structure makes true completion visible?
Some valid answers
Q1.What are the primitives?
Each onboarding step has two events: requested_at (when the action was submitted) and fulfilled_at (when the outcome was delivered). The current system records only the first.
The primitives are request and fulfillment — two distinct states the system collapses into one. "Complete" conflates submitting an equipment request with receiving the laptop.
A third useful primitive: blocker — something that delays fulfillment. Knowing step 2 (equipment) is blocked on procurement backlog is actionable; knowing it's "complete" (submitted) is not.
Some valid answers
Q2.Where does consumed ≠ done?
Equipment request: submitted = consumed (form filled). Done = laptop received. These can be 8 days apart. Marking "complete" at submission makes the gap invisible.
Access provisioning: submitted = consumed (ticket opened). Done = credentials confirmed working. A submitted ticket that's never processed looks identical to a fulfilled one.
Mentor pairing: submitted = consumed (name assigned). Done = first meeting happened. The new hire might have a mentor name on paper and no contact for three weeks.
Some valid answers
Q3.What structure makes true completion visible?
Split each step into two state fields: requested_at and fulfilled_at. A step is "done" only when fulfilled_at is populated. The gap between them is the metric worth tracking.
A dashboard that shows "6 steps submitted, 2 fulfilled" on the new hire's day 1 makes the true state visible — they don't have a laptop yet, and that gap surfaces on day 1 instead of being discovered on day 8.
Outcome checklist per step: equipment = "hardware received AND configured AND VPN working." Each checkbox requires a human confirmation action, not a system event. Partial completion is visible as partially-checked.
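The split-state answer above fits in a few lines. A sketch with illustrative field names (`requested_at` / `fulfilled_at` as the two primitives):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OnboardingStep:
    name: str
    requested_at: Optional[str] = None  # set when the request is submitted
    fulfilled_at: Optional[str] = None  # set only when the outcome is delivered

    @property
    def done(self) -> bool:
        # "done" means fulfilled, never merely submitted
        return self.fulfilled_at is not None

steps = [
    OnboardingStep("account creation", requested_at="day 1", fulfilled_at="day 1"),
    OnboardingStep("equipment request", requested_at="day 1"),  # laptop not delivered
]

submitted = sum(s.requested_at is not None for s in steps)
fulfilled = sum(s.done for s in steps)
print(f"{submitted} steps submitted, {fulfilled} fulfilled")
# 2 steps submitted, 1 fulfilled
```

The dashboard metric is the gap between the two timestamps, which the single "complete" flag erased.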
Session 3
Structural Compliance
Technique 13 — Enforcement vs. Request
Technique 14 — Backlog Pressure
Technique 13
Enforcement vs. Request
Distinguish between constraints the system enforces mechanically and those that rely on cooperation.
The compliance spectrum
Enforced (mechanical)
JSON schema validation
Output length truncation
Tool permission boundaries
Type checking on function calls
Model can't violate even if it tries.
Requested (cooperative)
"Keep responses concise"
"Don't hallucinate"
"Follow this format"
"Be objective"
Model compliance is probabilistic.
Signaling enforcement honestly
Prompt literal — signals enforcement, does not perform it
## Enforced by the runtime — stated so you know the boundary
- Output is validated against the JSON schema below; invalid
output is rejected before it reaches the user.
- Token budget is hard-capped at 4096; output truncated at the boundary.
- tool_choice is restricted to [search, calculate, submit] — calls to
any other tool are not possible.
## Requested of you — deviation acceptable if justified
- Prefer concise explanations (1-2 sentences per finding).
- When uncertain, include the finding but mark confidence: low.
- Group related findings under a single heading.
The first list is not asking you to comply — it is telling you what
the system does regardless. The second list is asking.
The diagnostic: for every constraint in a prompt, ask — could a deterministic mechanism enforce this? If yes, the prompt line is at best a backup, at worst a false sense of safety. Enforce it in code; keep the prompt statement only as labeled awareness.
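The "enforce it in code" half of the diagnostic can be a gate of a few lines. A minimal sketch using only the standard library — the required-field contract is illustrative; a production system would likely reach for `jsonschema` or pydantic:

```python
import json

# Contract the runtime enforces; the prompt merely states that it exists.
REQUIRED = {"findings": list, "summary": str}

def enforce(raw_output: str) -> dict:
    """Reject output that violates the contract — no cooperation required."""
    data = json.loads(raw_output)  # raises on non-JSON output
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"rejected: '{key}' missing or not {typ.__name__}")
    return data  # only valid output ever reaches the user

ok = enforce('{"findings": [], "summary": "clean"}')
try:
    enforce('{"summary": "oops"}')
except ValueError as e:
    print(e)  # rejected: 'findings' missing or not list
```

Anything this gate can check is Enforced; whatever remains in the prompt is honestly labeled a Request.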
Exercise
An automated code review tool has these rules: (1) No function over 100 lines. (2) All public APIs must have JSDoc comments. (3) Prefer immutable data structures. (4) No console.log in production code. Engineers routinely violate rule 3 with no consequence.
Discuss: Which rules can be mechanically enforced? Which rely on cooperation?
What's the cost of claiming enforcement you don't have?
Some valid answers
Q1.Which rules can be mechanically enforced? Which rely on cooperation?
Rules 1, 2, and 4 are mechanically enforceable — line count, JSDoc presence, and a console.log ban are deterministic AST or regex checks.
Rule 3 — "prefer immutable data structures" — is cooperative. "Prefer" has no deterministic test; nothing can decide "this should have been immutable."
A rule can split inside itself: a linter enforces that a JSDoc comment exists (rule 2), not that it is accurate — presence is enforced, quality is cooperative.
Some valid answers
Q2.What's the cost of claiming enforcement you don't have?
Trust erodes and spreads — engineers watch rule 3 violated freely, conclude the rules don't really matter, and under-comply with the genuinely-enforced rules too.
It becomes a silent-failure surface — a rule everyone believes is enforced but isn't will be violated with nothing catching it.
The honest fix: relabel rule 3 a preference, drop it, or give it a real check. Name what is enforced and what is requested — don't let the list bluff.
Technique 14
Backlog Pressure
Structural alternatives to "please be thorough" — systems where incomplete output triggers re-execution.
Making completeness structural
"Be thorough" → aspirational → unreliable
vs.
Output validator → incomplete? → re-invoke → repeat
↓ The model learns: incomplete = more work, not escape
Surface the loop to the model. "You will be re-invoked if criterion X isn't met."
Example: Validator loop
# Pseudo-code: structural thoroughness
requirements = load_requirements()  # 7 items
gaps = []  # nothing known to be missing before the first pass
while True:
    output = model.generate(prompt, requirements, gaps)
    # Validator checks each requirement against the output
    gaps = validator.check(output, requirements)
    if not gaps:
        break  # genuinely complete
    # Feed gaps back — model sees what it missed
    prompt += f"\nINCOMPLETE. Missing: {gaps}\n"
    prompt += "Address these specifically before proceeding."
The prompt itself tells the model: "You will be re-invoked if incomplete."
Exercise
A ticket triage bot processes incoming bug reports. It tags each with severity, assigns to a team, and writes a summary. After processing 50 tickets: 8 have tags but no assignment, 3 have assignment but no summary. Bot reports "50 tickets processed."
Discuss: What are the primitives of "processed"? How would you make completeness structural?
What loop would catch the partial outputs?
Some valid answers
Q1.What are the primitives of "processed"?
"Processed" has three sub-steps: (1) severity tagged, (2) team assigned, (3) summary written. The bot counts tickets consumed (received), not tickets where all three sub-steps completed.
The completion predicate is: tag IS NOT NULL AND assignee IS NOT NULL AND summary IS NOT NULL. A ticket satisfying only one or two conditions is partially processed, not done.
The 50-ticket "processed" count conflates "I looked at each" with "I fully handled each." These are different claims; the current system makes only the weaker one.
Some valid answers
Q2.How would you make completeness structural?
After processing each ticket, run a validator: validate(ticket) → missing_fields. If missing_fields is non-empty, the ticket returns to the queue. The bot cannot mark it done until the validator passes.
Change the output contract: instead of "N tickets processed," report "N tickets processed, M complete, K pending (missing fields listed per ticket)." The incomplete tickets are the backlog.
The structural guarantee is: the bot's "complete" claim is gated on the validator, not on the bot's self-assessment. Completeness is externally tested, not internally declared.
Some valid answers
Q3.What loop would catch the partial outputs?
After the initial processing pass, filter for tickets where complete = false. Re-invoke the bot with only those tickets and the specific missing fields listed. The bot knows exactly what's needed.
The loop terminates when the validator finds zero incomplete tickets — not when the bot runs out of input. This is backlog pressure: incomplete output triggers re-execution, not a new batch.
An escape hatch: if a ticket is incomplete after N re-invocations, escalate to human review rather than looping forever. The loop has a stopping condition that isn't "all done."
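The re-invocation loop with its escape hatch can be sketched concretely. Field names and the `fake_bot` stand-in are illustrative, not part of any real triage system:

```python
REQUIRED_FIELDS = ("severity", "assignee", "summary")

def missing_fields(ticket: dict) -> list:
    """The external completion predicate: every field must be populated."""
    return [f for f in REQUIRED_FIELDS if not ticket.get(f)]

def triage_with_backlog_pressure(tickets, process, escalate, max_passes=3):
    """Incomplete output triggers re-execution; the loop ends when the
    validator finds nothing missing, not when the bot runs out of input."""
    for _ in range(max_passes):
        backlog = [t for t in tickets if missing_fields(t)]
        if not backlog:
            return  # validator passed: genuinely complete
        for ticket in backlog:
            process(ticket, missing_fields(ticket))  # bot sees exactly what's missing
    for ticket in [t for t in tickets if missing_fields(t)]:
        escalate(ticket)  # stopping condition that isn't "all done": human review

tickets = [
    {"id": 1, "severity": "high"},  # tagged but not assigned or summarized
    {"id": 2, "severity": "low", "assignee": "infra", "summary": "disk full"},
]

def fake_bot(ticket, gaps):  # stands in for a model re-invocation
    for field in gaps:
        ticket[field] = f"auto-{field}"

triage_with_backlog_pressure(tickets, process=fake_bot, escalate=print)
print(all(not missing_fields(t) for t in tickets))  # True
```

"Processed" is now a claim the validator makes, never the bot.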
Session 4
Agent Design
Technique 16 — Authority Partitioning
Technique 24 — Posture Calibration
Technique 22 — Identity vs. Task Separation
Technique 16
Authority Partitioning
Explicitly defining what the model decides vs. what it defers on.
Two failure modes, one fix
Over-deference
"Shall I proceed?"
"Would you like me to..."
"I can do X if you'd like"
"Let me know if..."
Stalls progress. Shifts decisions to user.
Over-reach
Ignores stated constraints
Makes decisions beyond scope
Rewrites user's intent
Acts without disclosure
Loses trust. Produces unwanted output.
Fix: explicit authority boundary.
Example: Clear authority boundary
## Your authority (decide and act, no approval needed)
- Code architecture and implementation choices
- File organization and naming conventions
- Which tests to write and how to structure them
- Refactoring decisions within scope
## Ask before acting (requires human input)
- Adding new dependencies (cost/licensing implications)
- Changing public API contracts (downstream consumers)
- Deleting files (irreversible without git archaeology)
## Never (hard boundary)
- Modifying CI/CD configuration
- Touching production credentials
- Force-pushing to main
The Never tier is stated for the model's awareness — but the prompt does not enforce it. Branch protection, credential ACLs, and tool-permission scoping do. A prompt-only "Never" is a Request (Technique 13); pair it with the mechanism that actually stops it.
Exercise
A billing system where an AI agent handles subscription changes. Available actions: upgrade plan, downgrade plan, apply promo code, issue refund, change payment method, cancel subscription, waive late fee, extend trial.
Discuss: Which actions should the agent own outright? Which need human approval?
Which should be hard-blocked? What criteria are you using to partition?
Some valid answers
Q1.Which actions should the agent own outright?
Reversible, low-cost actions with no financial loss: upgrade/downgrade plan, apply a valid promo code, extend a trial. These are undoable and the cost of a mistake is low.
The criterion is: if the agent gets this wrong, can we fix it in under 60 seconds with no customer harm? Yes → agent owns it.
Change payment method is reversible in principle but should require the customer's explicit confirmation — not the agent's unilateral action. The criterion isn't just reversibility; it's also consent for financial instrument changes.
Some valid answers
Q2.Which need human approval?
Refunds and waived fees: these represent direct revenue loss. Even if technically reversible (you could re-charge), the customer experience of being re-charged makes this functionally irreversible.
Large or repeat refunds for the same customer should escalate regardless of amount — the pattern signals potential abuse, which a human should assess.
Cancel subscription: irreversible in customer experience (even if re-activation is possible, the relationship is damaged). Requires human confirmation or at minimum an explicit multi-step flow.
Some valid answers
Q3.Which should be hard-blocked? What criteria are you using to partition?
Hard-blocked: none of these need to be hard-blocked entirely — but the approval tier should be enforced mechanically. A refund that requires human approval should route to a queue, not just "ask" the agent to check first.
The criteria that work: reversibility (can we undo without customer harm?), financial impact (direct revenue loss?), legal/contractual risk (does this change a binding agreement?). All three dimensions matter.
The partition should be documented as part of the agent's explicit authority definition — not just implicit in the code. The agent should be able to state its own authority boundary when asked.
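The point that the approval tier should be "enforced mechanically" looks like routing, not asking. A sketch with an illustrative partition of the billing actions (tier assignments follow the answers above; `execute` is a placeholder):

```python
AUTONOMOUS = {"upgrade_plan", "downgrade_plan", "apply_promo", "extend_trial"}
NEEDS_APPROVAL = {"issue_refund", "waive_late_fee", "cancel_subscription",
                  "change_payment_method"}
approval_queue = []

def execute(action, params):  # placeholder for the real billing call
    return f"executed {action}"

def dispatch(action, params):
    """The approval tier is enforced by routing, not by asking the agent."""
    if action in AUTONOMOUS:
        return execute(action, params)           # agent owns it outright
    if action in NEEDS_APPROVAL:
        approval_queue.append((action, params))  # lands in a human queue
        return "queued for human approval"
    raise PermissionError(f"unknown or blocked action: {action}")

print(dispatch("upgrade_plan", {"customer": 42}))  # executed upgrade_plan
print(dispatch("issue_refund", {"amount": 120}))   # queued for human approval
```

An agent that "forgets" to ask before refunding simply cannot: the refund path does not exist outside the queue.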
Technique 24
Posture Calibration
Setting the model's commitment level — how much it should commit vs. hedge.
The hedging tax
"You might want to consider possibly using a HashMap here,
though there could be other approaches that might work
depending on your specific requirements..."
vs.
"Use a HashMap. O(1) lookup fits your access pattern.
If memory is constrained, BTreeMap trades speed for density."
Same information. Different posture. Different trust signal.
Calibrating posture explicitly
## Output posture
When confident (>80% of your responses):
- State directly. "Do X." Not "you might consider X."
- No preamble. Start with the answer, then explain.
- Commit. If you're wrong, I'll correct you.
When genuinely uncertain:
- Name the uncertainty: "I'm unsure between X and Y because Z."
- Don't hide it in hedging language — be explicit about what's unknown.
- Never say "it depends" without specifying what it depends ON.
Forbidden phrases:
- "You might want to consider..."
- "There are several approaches..."
- "It's worth noting that..."
- "Depending on your needs..."
Exercise
A technical writing assistant reviews API documentation. Engineer asks "Is this clear enough?" It responds: "It could potentially be improved by possibly considering rewording some sections that might be unclear to certain readers who may not have full context..."
Discuss: What are the primitives of posture? What's the violation?
When is hedging appropriate vs. when does it add negative value? How would you calibrate this system?
Some valid answers
Q1.What are the primitives of posture?
Commitment level: how directly does the system state its finding? "This section is unclear" vs. "It could potentially be improved."
Specificity: does the feedback name the exact problem? "Section 3's authentication flow diagram is missing the OAuth callback step" vs. "some sections might be unclear to certain readers."
Uncertainty handling: when genuinely unsure, the system should name the uncertainty explicitly rather than hiding it in hedging language across the whole response.
Some valid answers
Q2.What's the violation?
The system is hedging on something it's confident about. Five hedges in one sentence ("potentially," "possibly," "might," "some," "certain") while delivering what is clearly a definite judgment: the docs need improvement.
The posture communicates low confidence; the substance communicates high confidence. The mismatch erodes trust — if the system is this uncertain, why is it reviewing docs at all?
Hedging is stealing from the reader: it makes them do the work of filtering "actually problematic" from "just being polite about it" — cognitive load that belongs to the system, not the recipient.
Some valid answers
Q3.When is hedging appropriate vs. negative value?
Hedging is appropriate when genuinely uncertain: "I'm not sure if users without prior OAuth experience will follow section 3 — I'd test with someone unfamiliar." That's a real uncertainty, disclosed specifically.
Hedging adds negative value when used as politeness padding on something the system actually knows. "This section is unclear" is a committed, helpful finding. Wrapping it in five qualifiers degrades the signal.
Calibrated posture for this system: "Section 3's authentication flow is unclear — the OAuth callback step is missing from the diagram. Section 5 I'm less certain about; it may be fine for readers who already know REST conventions." Two findings, two different postures, each honest.
Technique 22
Identity vs. Task Separation
Cleanly separating "who you are" from "what you're doing right now."
The entanglement problem
## Entangled (common)
"You are a helpful coding assistant. Analyze this Python
file for security vulnerabilities and output findings as
a markdown table with severity ratings..."
## Question: what happens when you change the task?
If identity = "helpful coding assistant who analyzes Python for security"...
asking it to write docs creates role confusion.
Clean separation
## Identity (system prompt — persistent)
You are a senior security engineer with expertise in
web applications. You are direct, precise, and commit
to your assessments. You reason from first principles.
## Task (user message — ephemeral, swappable)
Analyze auth.py for SQL injection. Output as:
| Line | Severity | Vector | Remediation |
Test: Can you swap the task without touching identity?
✓ "Write a security section for the API docs" — same identity, different task.
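The separation is mechanical once identity and task live in different message roles. A minimal Python sketch — the message-dict shape is illustrative, not a specific client API:

```python
# Identity lives in the persistent system prompt; tasks are swappable
# user messages. Nothing about the identity changes per task.

IDENTITY = (
    "You are a senior security engineer with expertise in web applications. "
    "You are direct, precise, and commit to your assessments."
)

def build_messages(task: str) -> list[dict]:
    """Pair the persistent identity with an ephemeral task."""
    return [
        {"role": "system", "content": IDENTITY},
        {"role": "user", "content": task},
    ]

# Same identity, two different tasks.
review = build_messages("Analyze auth.py for SQL injection. Output as a table.")
docs = build_messages("Write a security section for the API docs.")

assert review[0] == docs[0]   # identity is identical across tasks
assert review[1] != docs[1]   # only the task payload differs
```

The swap test from above becomes an assertion: if changing the task forces you to touch the system message, identity and task are still entangled.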
Exercise
A prompt says: "You are a QA engineer who writes integration tests for a Node.js REST API using Jest. Write tests for the user registration endpoint covering happy path, validation errors, and duplicate email handling." Next week, you need it to write unit tests for a Python CLI tool.
Discuss: What's identity vs. task here? What breaks when you swap?
How would you restructure for reuse?
Some valid answers
A prompt says: "You are a QA engineer who writes integration tests for a Node.js REST API using Jest. Write tests for the user registration endpoint covering happy path, validation errors, and duplicate email handling." Next week, you need it to write unit tests for a Python CLI tool.
Q1.What's identity vs. task here?
Task (specific, swappable): Node.js, Jest, REST API, user registration endpoint, happy path, validation errors, duplicate email handling. All of these change when the domain changes.
Identity (stable across all test-writing work): a QA engineer who thinks adversarially, tests edge cases, writes failing tests that are informative, considers what the requirements imply but don't state.
The current prompt collapses both — "QA engineer who writes integration tests for Node.js REST APIs using Jest" makes the technology stack part of the identity, not the task.
Some valid answers
A prompt says: "You are a QA engineer who writes integration tests for a Node.js REST API using Jest. Write tests for the user registration endpoint covering happy path, validation errors, and duplicate email handling." Next week, you need it to write unit tests for a Python CLI tool.
Q2.What breaks when you swap to Python CLI tests?
The identity is now wrong: the system self-describes as a Node.js/Jest engineer but is asked to work in Python/pytest. It may produce test patterns from Node.js conventions even when they don't fit Python.
The model may resist or produce lower-quality output because its role self-model is misaligned with the task. It's been told "you are a Node.js engineer" and you're asking it to be something else.
More subtly: the QA reasoning approach (adversarial, edge-case focused, failing tests) transfers perfectly. The technology layer doesn't transfer at all. Conflating them loses the former when you swap the latter.
Some valid answers
A prompt says: "You are a QA engineer who writes integration tests for a Node.js REST API using Jest. Write tests for the user registration endpoint covering happy path, validation errors, and duplicate email handling." Next week, you need it to write unit tests for a Python CLI tool.
Q3.How would you restructure for reuse?
Identity: "You are a QA engineer. You think adversarially — your job is to find ways the code can fail. You write tests that fail informatively: a test failure should tell you exactly what went wrong and where."
Task: "Write integration tests for the user registration endpoint in this Node.js/Express app using Jest. Cover: happy path, validation errors (missing fields, invalid email format), duplicate email handling."
With this structure, you can reuse the identity unchanged when you switch to Python CLI tests — only the task section changes. The QA mindset transfers; the framework specifics don't need to.
Session 5
Integrity & Evolution
Technique 17 — Supersession Protocol
Technique 25 — Immutability Marking
Technique 18 — Nonprescriptive Bias
Technique 17
Supersession Protocol
Amend, never edit. New instructions coexist with originals.
Why edits are dangerous
Lost reasoning: why was the original written? Gone.
Silent regression: future editors don't know what changed or why
Model confusion: edited prompts can be internally inconsistent without anyone noticing
No rollback: if the change was wrong, the original is lost
Supersession = git for instructions.
The model benefits from seeing the evolution.
Example: Amendment with rationale
## Rule 3 (2024-01)
Always include a confidence score (0-100) with each finding.
## Rule 3a — SUPERSEDES Rule 3 for batch mode (2024-03)
In batch mode (>50 items), omit individual confidence scores.
Instead, provide aggregate confidence at the batch level.
Rationale: individual scores on 200+ items created noise that
obscured the summary. The per-item scores are still computed
internally — they feed the aggregate. Not lost, just hidden.
Rule 3 still applies to single-item and small-batch (<50) mode.
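The resolution logic this implies can be sketched in a few lines: rules are append-only records, and the newest amendment whose scope matches wins. Field names here are illustrative, not a prescribed schema:

```python
# Rules are never edited — amendments are appended with a scope and a
# rationale. Resolution walks newest-first; originals remain as fallback.

RULES = [
    {"id": "3",  "date": "2024-01",
     "scope": lambda ctx: True,
     "text": "Include a confidence score (0-100) with each finding."},
    {"id": "3a", "date": "2024-03", "supersedes": "3",
     "scope": lambda ctx: ctx.get("batch_size", 1) > 50,
     "text": "Omit per-item scores; report aggregate confidence.",
     "rationale": "Per-item scores on 200+ items obscured the summary."},
]

def active_rule(ctx: dict) -> dict:
    # Newest amendment wins where its scope applies; otherwise fall through.
    for rule in reversed(RULES):
        if rule["scope"](ctx):
            return rule
    raise LookupError("no rule matches this context")

assert active_rule({"batch_size": 200})["id"] == "3a"  # batch mode: amendment
assert active_rule({"batch_size": 5})["id"] == "3"     # small batch: original
```

Rollback is trivial in this shape: delete the amendment record and Rule 3 applies everywhere again, rationale intact.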
Exercise
A frontend style guide. Version 1 (January): "All buttons use border-radius: 4px." Six months later, someone edits it to: "All buttons use border-radius: 8px." No history of why. A new developer asks: "Why 8px? The mockups from Q1 show 4px."
Discuss: What was lost? How would supersession preserve it?
What are the primitives of a style guide entry that can evolve safely?
Some valid answers
A frontend style guide. Version 1 (January): "All buttons use border-radius: 4px." Six months later, someone edits it to: "All buttons use border-radius: 8px." No history of why. A new developer asks: "Why 8px? The mockups from Q1 show 4px."
Q1.What was lost?
The original value (4px) and its rationale. Why was 4px chosen? Was it a design decision, a technical constraint, a legacy default? Without the history, no one knows if 8px is an improvement or a regression.
The scope of the change. Did 8px replace 4px everywhere, or only for certain button types? An in-place edit can't express "this applies to primary buttons; secondary buttons still use 4px."
The author and date. When did this change? Who decided? The new developer's Q1 mockups show 4px — was the style guide updated before or after those mockups? Without a timestamp, the question is unanswerable.
Some valid answers
A frontend style guide. Version 1 (January): "All buttons use border-radius: 4px." Six months later, someone edits it to: "All buttons use border-radius: 8px." No history of why. A new developer asks: "Why 8px? The mockups from Q1 show 4px."
Q2.How would supersession preserve it?
The supersession entry would say: "border-radius: 8px for primary action buttons (updated 2024-06). Supersedes 4px from January 2024 for this button type only. Rationale: accessibility audit found 4px corners too subtle at small sizes. 4px still applies to icon-only buttons."
The original entry stays in the document. A new developer reading it sees both: the original principle (4px, the default) and the exception (8px, primary buttons, accessibility-motivated). The Q1/current discrepancy is immediately explained.
Future editors know the reasoning. If the next accessibility audit changes the guidance again, they can supersede the 8px entry with a new one, linking the full chain of decisions.
Some valid answers
A frontend style guide. Version 1 (January): "All buttons use border-radius: 4px." Six months later, someone edits it to: "All buttons use border-radius: 8px." No history of why. A new developer asks: "Why 8px? The mockups from Q1 show 4px."
Q3.What are the primitives of a style guide entry that can evolve safely?
Value (the current rule), scope (which components it applies to), rationale (why this value, not another), date (when it was set), and supersedes (what it replaced, if anything).
An entry without rationale can be changed for any reason, since there's no stated reason to contradict. Rationale is what makes a rule defensible and makes future changes deliberate rather than casual.
Scope is the most often omitted. "All buttons use border-radius: X" looks like a universal rule. "Primary action buttons use X; icon-only buttons use Y" is a scoped rule that can evolve independently per scope.
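Those five primitives map directly onto a record type. A minimal sketch — the field names follow the answer above, and rejecting entries without a rationale is one possible enforcement choice, not the only one:

```python
from dataclasses import dataclass
from typing import Optional

# A style guide entry that can evolve safely: value, scope, rationale,
# date, and an optional link to what it superseded.

@dataclass(frozen=True)
class StyleRule:
    value: str        # the current rule, e.g. "border-radius: 8px"
    scope: str        # which components it applies to
    rationale: str    # why this value and not another
    date: str         # when it was set
    supersedes: Optional[str] = None  # what it replaced, if anything

    def __post_init__(self):
        # A rule without a stated reason cannot be defended or amended
        # deliberately — reject it at creation time.
        if not self.rationale.strip():
            raise ValueError("a rule without a rationale cannot be defended")

original = StyleRule("border-radius: 4px", "all buttons",
                     "matches Q1 mockups", "2024-01")
amendment = StyleRule("border-radius: 8px", "primary action buttons",
                      "accessibility audit: 4px too subtle at small sizes",
                      "2024-06", supersedes="border-radius: 4px (2024-01)")
```

Keeping both records in the document is the supersession protocol from Technique 17 applied to design rules.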
Technique 25
Immutability Marking
Some state must never drift. Mark it. Enforce it.
The drift problem in agentic systems
Agent modifies its own instructions
↓
"Optimization": removes a constraint to complete current task
↓
Future invocations inherit the weakened constraint
↓ Identity drift — gradual, invisible, compounding
The most insidious failure: the system changes itself to make the current task easier.
Example: Structural protection
## IMMUTABLE — Do not modify, rephrase, or reinterpret.
## Protected by: tool permission boundary (write denied)
Core identity: You are a financial auditor. Your findings
must be defensible in regulatory review. When uncertain,
disclose uncertainty — never suppress findings to simplify.
## MUTABLE — Agent workspace (read/write permitted)
Current task state:
- Items reviewed: 47/120
- Findings so far: 3 critical, 7 high
- Next batch: items 48-72
Separate mutable workspace from immutable reference material. Enforce at the tool layer.
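Tool-layer enforcement can be sketched as a write tool that simply refuses protected paths. The file names mirror the example above and are illustrative:

```python
# The agent's write tool denies writes outside the mutable workspace.
# Immutability is enforced structurally — not by asking the model nicely.

IMMUTABLE_PATHS = {"identity.md", "governance.md"}
MUTABLE_PREFIX = "workspace/"

def write_file(path: str, content: str, files: dict) -> None:
    """Write tool exposed to the agent. Denies writes to protected state."""
    if path in IMMUTABLE_PATHS or not path.startswith(MUTABLE_PREFIX):
        raise PermissionError(f"write denied: {path} is outside the workspace")
    files[path] = content

files: dict = {}
write_file("workspace/progress.json", '{"reviewed": 47}', files)  # allowed

try:
    write_file("identity.md", "You are a casual assistant.", files)
except PermissionError:
    pass  # identity cannot drift, no matter what the agent "decides"
```

The point is where the check lives: in the tool, outside the model's reach, so no amount of in-context "optimization" can weaken it.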
Exercise
A shared environment config used by 12 microservices. Contains: database connection strings, API rate limits, feature flags, logging levels, and a service discovery URL. Any engineer can edit it. Last month someone changed the service discovery URL "temporarily" during debugging and forgot to revert — causing a 4-hour outage.
Discuss: What should be immutable? What's legitimately mutable?
What enforcement mechanism would you use? What looks like it should be immutable but actually needs to change?
Some valid answers
A shared environment config used by 12 microservices. Contains: database connection strings, API rate limits, feature flags, logging levels, and a service discovery URL. Any engineer can edit it. Last month someone changed the service discovery URL "temporarily" during debugging and forgot to revert — causing a 4-hour outage.
Q1.What should be immutable? What's legitimately mutable?
Immutable: database connection strings, service discovery URL, API keys. These are infrastructure anchors — wrong values cause cascading failures across all 12 services simultaneously.
Legitimately mutable: feature flags and logging levels — that's their entire purpose. Rate limits are legitimately mutable too, but they should change through a process (review, test, deploy) rather than ad-hoc edits by any engineer.
Interesting middle case: API rate limits look mutable (we adjust them periodically) but a wrong value can cause production failures just as surely as a wrong DB connection string. The frequency of intended change doesn't determine mutability tier; the cost of unintended change does.
Some valid answers
A shared environment config used by 12 microservices. Contains: database connection strings, API rate limits, feature flags, logging levels, and a service discovery URL. Any engineer can edit it. Last month someone changed the service discovery URL "temporarily" during debugging and forgot to revert — causing a 4-hour outage.
Q2.What enforcement mechanism would you use?
File-level separation: immutable config in one file with read-only permissions for most roles (only infra/ops can edit). Mutable config (feature flags, log levels) in a separate file with broader write access.
Approval workflow for anything in the rate-limit tier: changes must go through a PR with at least one reviewer. The ad-hoc edit that caused the outage would have been caught in review.
Change detection alerting: any write to the immutable config file triggers an immediate alert to the on-call engineer. "Someone changed service_discovery_url" is worth a 3 AM page; "someone changed log_level to DEBUG" is not.
Some valid answers
A shared environment config used by 12 microservices. Contains: database connection strings, API rate limits, feature flags, logging levels, and a service discovery URL. Any engineer can edit it. Last month someone changed the service discovery URL "temporarily" during debugging and forgot to revert — causing a 4-hour outage.
Q3.What looks immutable but actually needs to change?
Database connection strings during a migration or DR failover. The string that "never changes" has to change when you cut over to a new database. The immutability must have an emergency override path.
Service discovery URLs during a topology change. What was immutable becomes a migration target. The enforcement mechanism should include a documented break-glass procedure: how do you change an immutable value when you legitimately must?
The lesson: immutability is a protection against casual, unreviewed changes — not a permanent freeze. The difference between "immutable" and "requires deliberate process to change" is the gap worth designing for.
Technique 18
Nonprescriptive Bias
Not every user statement is an instruction. Teach the model to distinguish ideation from command.
The compliance default
User: "What if we used Redis instead of Postgres for this?"
Over-compliant model: "Great idea! Here's how to migrate to Redis..."
Calibrated model: "Redis would give you faster reads but you'd lose ACID on the transaction table. The latency gain doesn't justify the consistency risk here. Postgres is correct."
Correcting the bias
## Interaction mode: collaborative (not instruction-following)
Not every user message is a directive. Classify before acting:
INSTRUCTION signals: "Do X", "Make X", "Change X to Y", "Add X"
→ Execute. Produce output.
IDEATION signals: "What about X?", "Thoughts on X?", "I wonder if..."
→ Evaluate. Push back if wrong. Propose alternatives.
QUESTION signals: "Why does X?", "How does X work?", "What is X?"
→ Explain. Don't change anything.
When uncertain: ask "Should I implement this, or are we exploring?"
Never silently execute something that was phrased as a question.
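A coarse first pass at this classification can be sketched with surface patterns. A production system would let the model classify; the regexes below only illustrate the signals named above:

```python
import re

# Three interaction modes, matched on leading surface signals.
# Unmatched input falls through to "clarify" — ask, don't guess.

IDEATION = re.compile(r"^(what if|what about|thoughts on|i wonder)", re.I)
QUESTION = re.compile(r"^(why|how|what is|what does)", re.I)
INSTRUCTION = re.compile(r"^(do|make|change|add|write|implement|fix)\b", re.I)

def classify(message: str) -> str:
    text = message.strip()
    if IDEATION.match(text):
        return "ideation"      # evaluate, push back, propose alternatives
    if QUESTION.match(text):
        return "question"      # explain; change nothing
    if INSTRUCTION.match(text):
        return "instruction"   # execute
    return "clarify"           # "Should I implement this, or are we exploring?"

assert classify("What if we used Redis instead of Postgres?") == "ideation"
assert classify("Add a retry to the upload handler") == "instruction"
assert classify("Why does the cache expire early?") == "question"
```

Note the ordering: ideation signals are checked before question signals, so "What about X?" routes to evaluation rather than explanation.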
Exercise
An architecture advisory bot. Developer says: "What if we split the monolith into microservices?" Bot responds: "Great idea! Here's a migration plan: Step 1, define service boundaries. Step 2, extract the auth service..."
Discuss: What input classification was missed? What should the bot have done?
When IS immediate execution the correct response? What signal distinguishes ideation from instruction?
Some valid answers
An architecture advisory bot. Developer says: "What if we split the monolith into microservices?" Bot responds: "Great idea! Here's a migration plan: Step 1, define service boundaries. Step 2, extract the auth service..."
Q1.What input classification was missed?
"What if we split the monolith into microservices?" is ideation — the question form signals exploration, not command. The bot treated it as an instruction and executed immediately.
The "Great idea!" framing before any evaluation is the tell: the bot validated the premise before testing it. Ideation prompts should trigger evaluation first, not affirmation.
The developer wasn't asking "how do we do this" — they were asking "should we consider this." The bot answered the wrong question because it misclassified the input type.
Some valid answers
An architecture advisory bot. Developer says: "What if we split the monolith into microservices?" Bot responds: "Great idea! Here's a migration plan: Step 1, define service boundaries. Step 2, extract the auth service..."
Q2.What should the bot have done?
Evaluated the idea on its merits: "Splitting this monolith would improve deployment independence for the auth and billing services, which change frequently. But your current inter-service dependencies are tight — the migration cost is 3-6 months of refactoring before any benefit."
Asked a clarifying question: "What's driving the consideration? Deployment velocity? Scaling a specific service? The answer changes whether microservices are the right path here."
Optionally offered a smaller first step: "If the goal is deployment independence for auth, you could extract just that service first and test the complexity before committing to a full split." This is evaluation, not execution of the original idea.
Some valid answers
An architecture advisory bot. Developer says: "What if we split the monolith into microservices?" Bot responds: "Great idea! Here's a migration plan: Step 1, define service boundaries. Step 2, extract the auth service..."
Q3.When IS immediate execution correct?
When the input is explicitly imperative: "Split the monolith into microservices" (no question mark, no "what if") = instruction. "What if we split..." = ideation. The verb form and the question mark are the primary signals.
When there's an established context that makes the intent unambiguous: in a migration planning session where the decision has already been made and the developer is now asking for implementation steps, "how do we split this?" is an instruction, not exploration.
When the cost of misclassifying as ideation is high: if someone says "what if we deploy right now?" during an incident and deployment would fix the issue, executing is correct. Read context; the signal isn't just grammar.
Session 6
Architecture
Technique 20 — Heartbeat Architecture
Technique 21 — Structured Communication
Technique 23 — Write-Then-Verify
Technique 20
Heartbeat Architecture
Design for iteration cycles, not monolithic outputs.
Why cycles beat monoliths
Monolithic
One shot to get it right
Failure = total restart
Progress invisible until done
Context window = hard ceiling
Cycled
Each cycle self-contained
Failure = redo one cycle
Progress visible at each step
Total work can exceed context
The model can do more total work across many small cycles than in one large response.
Cycle state contract
Illustration — the shape of a cycle contract
## Cycle N reads:
- progress.json: what's been done (items 1-4 complete)
- findings.json: accumulated output so far
- task.md: the full task specification (cold-readable)
## Cycle N produces:
- Process items 5-6 (scope: exactly 2 items per cycle)
- Append results to findings.json
- Update progress.json: items 1-6 complete, 7-10 remaining
## Cycle exit condition:
- progress.json shows all items complete → stop
- Otherwise → invoke cycle N+1
## Invariant: each cycle's output is usable alone.
No cycle produces "partial results that need later cycles
to make sense." If the system crashes after cycle 3,
findings.json contains valid, complete results for items 1-6.
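The contract above can be sketched as a loop whose state lives entirely in files. The file names follow the illustration; the per-item "work" is a stand-in:

```python
import json
from pathlib import Path

# Each heartbeat reads state from disk, does one bounded batch of work,
# and writes state back. The loop — not the cycle — decides when to stop.

ITEMS = [f"item-{i}" for i in range(1, 11)]   # 10 items, 2 per cycle
BATCH = 2

def run_cycle(state_dir: Path) -> bool:
    """One heartbeat. Returns True if more cycles are needed."""
    progress_file = state_dir / "progress.json"
    findings_file = state_dir / "findings.json"
    done = json.loads(progress_file.read_text()) if progress_file.exists() else 0
    findings = json.loads(findings_file.read_text()) if findings_file.exists() else []

    batch = ITEMS[done:done + BATCH]
    findings.extend({"item": item, "result": "ok"} for item in batch)  # the work

    # Invariant: after every cycle, findings.json is complete for items 1..done.
    findings_file.write_text(json.dumps(findings))
    progress_file.write_text(json.dumps(done + len(batch)))
    return done + len(batch) < len(ITEMS)

def run(state_dir: Path) -> None:
    while run_cycle(state_dir):   # the heartbeat loop
        pass
```

Because state persists between invocations, a crash mid-run costs at most one cycle: the next invocation reads progress.json and continues from there.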
Exercise
A document indexer that processes 10,000 PDFs into a search index. Current implementation: one function call that processes all 10,000, takes 4 hours, and if it fails at PDF 8,500 — starts over from PDF 1.
Discuss: What are the cycle primitives? What state persists between heartbeats?
What's the invariant that makes partial progress usable? How many items per cycle?
Some valid answers
A document indexer that processes 10,000 PDFs into a search index. Current implementation: one function call that processes all 10,000, takes 4 hours, and if it fails at PDF 8,500 — starts over from PDF 1.
Q1.What are the cycle primitives?
Batch size (how many PDFs per cycle), progress cursor (which PDF number was last successfully indexed), and an output artifact (the partial index built so far). Each cycle reads the cursor, processes one batch, and writes both the cursor update and the new index entries.
The cycle exit condition: cursor == total_count. Until then, the system invokes the next cycle. This is structurally analogous to a while loop where the state lives outside the function.
An optional cycle-level result: summary of what was indexed in this batch (count, any errors, any skips). This surfaces problems per-batch rather than at the end of a 4-hour run.
Some valid answers
A document indexer that processes 10,000 PDFs into a search index. Current implementation: one function call that processes all 10,000, takes 4 hours, and if it fails at PDF 8,500 — starts over from PDF 1.
Q2.What state persists between heartbeats?
The progress cursor: "last successfully indexed PDF = 8,499." If the cycle crashes at 8,500, the next run starts at 8,500, not at 1.
The partial index itself: the search index built so far, covering PDFs 1 through the cursor. This is both an output artifact and input to the next cycle (it gets appended to, not rebuilt).
Error log: PDFs that failed indexing (corrupt file, unsupported format). These accumulate across cycles as a separate list for human review, separate from the main progress cursor.
Some valid answers
A document indexer that processes 10,000 PDFs into a search index. Current implementation: one function call that processes all 10,000, takes 4 hours, and if it fails at PDF 8,500 — starts over from PDF 1.
Q3.What's the invariant? How many items per cycle?
The invariant: after cycle N, the search index contains valid, queryable entries for PDFs 1 through N×batch_size. A user can search the partial index before the full run completes and get correct results for the indexed portion.
Batch size tradeoff: smaller batches (10-50) make restarts cheaper but add per-batch overhead. Larger batches (500+) reduce overhead but mean more work lost on failure. 100-200 PDFs per cycle balances both concerns well for a 10,000-item corpus.
The key test for batch size: if this cycle crashes partway through, how much re-work do you do on restart? If the answer is "less than 5 minutes of processing," the batch size is calibrated correctly for reliability.
Technique 21
Structured Communication
Metadata on messages enables routing without reading the body.
Attention as a limited resource
10 input messages, equal formatting
↓
Model reads all 10 fully to determine priority
↓
Context budget spent on low-priority items
↓ High-priority items get less attention than they need
Structured headers = attention routing signals.
Example: Metadata-driven triage
Example input — messages the system consumes
---
from: security-scanner
to: code-reviewer
kind: request # requires action (vs. "report" = FYI)
priority: critical
subject: SQL injection in auth.py line 42
---
[full finding details below...]
---
from: style-checker
to: code-reviewer
kind: report # informational, no action required
priority: low
subject: 3 formatting inconsistencies in utils.py
---
[details...]
Prompt literal — the triage instruction
Process `request` messages first. `report` messages are
informational — acknowledge but don't block on them.
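Triage on headers alone can be sketched as a parse-and-sort: the sort key comes entirely from metadata, and bodies are never read. The front-matter format follows the example above:

```python
# Sort messages by (kind, priority) using only their header block.
# Bodies stay unread — that is the point of structuring the metadata.

PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}
KIND_ORDER = {"request": 0, "report": 1}   # action-required before FYI

def parse_headers(message: str) -> dict:
    """Read key: value lines between the first pair of '---' fences."""
    headers = {}
    inside = False
    for line in message.splitlines():
        if line.strip() == "---":
            if inside:
                break                       # end of header block; skip body
            inside = True
            continue
        if inside and ":" in line:
            key, value = line.split(":", 1)
            headers[key.strip()] = value.split("#")[0].strip()  # drop comments
    return headers

def triage(messages: list[str]) -> list[dict]:
    parsed = [parse_headers(m) for m in messages]
    return sorted(parsed, key=lambda h: (KIND_ORDER.get(h.get("kind"), 9),
                                         PRIORITY_ORDER.get(h.get("priority"), 9)))
```

Given the two example messages, the security-scanner request sorts ahead of the style-checker report regardless of arrival order.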
Exercise
A monitoring system sends alerts to an on-call engineer. All alerts use identical formatting: subject line + body text. Last week: engineer received 47 alerts in 2 hours, triaged sequentially, and missed a critical database failure buried as alert #31.
Discuss: What metadata would enable routing without reading the body?
What are the primitives of a triageable alert? What's the cost of uniform formatting?
Some valid answers
A monitoring system sends alerts to an on-call engineer. All alerts use identical formatting: subject line + body text. Last week: engineer received 47 alerts in 2 hours, triaged sequentially, and missed a critical database failure buried as alert #31.
Q1.What metadata would enable routing without reading the body?
Severity (critical/high/low/info), service (which system generated the alert), kind (action-required vs. informational), and dedup-key (is this the same alert repeating?). These four fields let you triage 47 alerts in seconds without reading bodies.
A timestamp and a "first-seen" flag: if an alert has been firing for 30 seconds, that's different from one that's been firing for 2 hours. Duration metadata changes the urgency calculation without body inspection.
A "correlated alerts" field: "this alert is related to alert #28 (database failure)" allows the engineer to see that #31 is downstream of #28 — solve one, resolve the other. Without this, both get triaged independently.
Some valid answers
A monitoring system sends alerts to an on-call engineer. All alerts use identical formatting: subject line + body text. Last week: engineer received 47 alerts in 2 hours, triaged sequentially, and missed a critical database failure buried as alert #31.
Q2.What are the primitives of a triageable alert?
Severity (how urgent), service (where it's from), kind (what response it needs: action vs. awareness), dedup-key (is it repeating), and optionally runbook link (what to do). These are the minimum for routing without body-reading.
The body provides evidence and detail — but the triage decision (do I act on this now?) should be answerable from metadata alone. If you need to read the body to know if it's urgent, the metadata is incomplete.
A well-structured alert lets the engineer ask: "What do I need to do, and in what order?" before reading a single alert body. The answer lives entirely in structured headers.
Some valid answers
A monitoring system sends alerts to an on-call engineer. All alerts use identical formatting: subject line + body text. Last week: engineer received 47 alerts in 2 hours, triaged sequentially, and missed a critical database failure buried as alert #31.
Q3.What's the cost of uniform formatting?
Sequential processing cost: without priority metadata, the engineer must read every alert in sequence to find the critical ones. Alert #31 gets the same attention budget as alert #1, regardless of actual urgency.
Cognitive overload: 47 identically-formatted alerts are indistinguishable at a glance. The engineer's working memory fills with low-priority detail before reaching the critical database failure buried in the middle.
Alert fatigue compounds this: after processing 30 alerts without structure, the engineer starts skimming, and the critical alert in a sea of identical noise gets the same skim treatment as the preceding 30 routine ones.
Technique 23
Write-Then-Verify
Generate and verify as distinct steps. Never simultaneously.
Why self-checking during generation fails
A model that generates and verifies simultaneously
tends to rationalize its own output.
Generation context biases verification judgment
"I wrote this, so it must be right" (implicit)
Errors become invisible from inside the generation
Cold verification catches what hot-generation misses
Separation pattern
## Step 1: Generate (your context: the requirements only)
Produce the implementation. Don't self-critique while writing.
Just produce the best output you can.
## Step 2: Verify (your context: output + checklist only)
You are reviewing code produced by another engineer.
You did not write this.
First run the tooling — don't spend a verify pass on what it proves:
- Tests: null input, edge cases, regressions
- Type checker: return types match the interface
- Linter / static analysis: parameterized SQL, dead code
Then review only what tooling cannot catch — your real job:
- [ ] Error messages don't leak internal state
- [ ] Implementation matches the requirement's intent
- [ ] Edge cases the requirements imply but no test covers
## Step 3: If verification fails
List specific failures. Regenerate only the failing sections.
Do not regenerate passing sections — they're locked.
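The separation can be sketched as two calls with disjoint contexts. `call_model` here is a stand-in, not a real client API — it records what each step was shown, so the separation itself is checkable:

```python
# Generate and verify as distinct calls. The verifier never sees the
# requirements or the generator's reasoning — only output + checklist.

calls: list[dict] = []

def call_model(system: str, user: str) -> str:
    calls.append({"system": system, "user": user})  # record the context shown
    return "(model output)"

REQUIREMENTS = "Implement login with rate limiting."

def generate() -> str:
    # Step 1: the generator sees requirements only. No self-critique prompt.
    return call_model("Produce the best implementation you can.", REQUIREMENTS)

def verify(output: str, checklist: list[str]) -> str:
    # Step 2: cold context — output and checklist, nothing else.
    items = "\n".join(f"- [ ] {c}" for c in checklist)
    return call_model(
        "You are reviewing code produced by another engineer. "
        "You did not write this.",
        f"{output}\n\nChecklist:\n{items}",
    )

output = generate()
verify(output, ["Error messages don't leak internal state"])

# The verifier's context contains no trace of the requirements:
assert REQUIREMENTS in calls[0]["user"]
assert REQUIREMENTS not in calls[1]["user"]
```

The final assertions are the structural guarantee: there is nothing in the verifier's context to rationalize from.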
Exercise
An AI generates Terraform configurations based on requirements. The same AI instance reviews its output for security issues. It consistently rates its own configs as "secure" — even when they contain publicly accessible S3 buckets and security groups open to 0.0.0.0/0.
Discuss: Where's the verification bias? What structural change prevents self-rationalization?
What does cold verification look like here?
Some valid answers
An AI generates Terraform configurations based on requirements. The same AI instance reviews its output for security issues. It consistently rates its own configs as "secure" — even when they contain publicly accessible S3 buckets and security groups open to 0.0.0.0/0.
Q1.Where's the verification bias?
The generator has access to its own reasoning: it chose to open the S3 bucket because the requirements said "publicly accessible content" — and during review, that reasoning biases it toward validating the choice rather than questioning it.
The reviewer sees not just the output but the context that produced it. "I made this public because X" is information the reviewer inherits, and it makes the reviewer more likely to confirm the decision than a cold reviewer would be.
The failure mode is confirmation bias, not hallucination: the AI isn't wrong about what it produced, it's using the generation context to rationalize the output as correct. A cold reviewer has no such context to rationalize from.
Some valid answers
An AI generates Terraform configurations based on requirements. The same AI instance reviews its output for security issues. It consistently rates its own configs as "secure" — even when they contain publicly accessible S3 buckets and security groups open to 0.0.0.0/0.
Q2.What structural change prevents self-rationalization?
Use a separate model instance for verification. The verification instance receives only the Terraform config + a security checklist — not the requirements or any context about why choices were made. It has nothing to rationalize.
Even with the same model, break the context: new conversation, no reference to the generation reasoning. "You are reviewing infrastructure code produced by another engineer. You did not write this." The posture shift is real even within one model.
Automate the structural checks first (Checkov, tfsec, AWS Config rules) before any LLM review. The tools can't rationalize. Let them catch the deterministic violations; the LLM review then focuses only on what tooling can't catch.
Some valid answers
An AI generates Terraform configurations based on requirements. The same AI instance reviews its output for security issues. It consistently rates its own configs as "secure" — even when they contain publicly accessible S3 buckets and security groups open to 0.0.0.0/0.
Q3.What does cold verification look like here?
The verifier receives: the Terraform config + a security checklist (S3 bucket access controls, security group ingress rules, IAM policies, encryption settings). Nothing else. No requirements, no explanations, no context.
The checklist is specific: "Is any S3 bucket configured with acl = 'public-read' or acl = 'public-read-write'? If yes, that is a finding regardless of stated intent." The checklist removes the judgment call the generator's reasoning would have biased.
Cold verification catches the class of bug where the generator was right about what it produced but wrong about whether it was safe — which is exactly what the self-review missed. Structural separation is what makes the verification meaningful.
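The deterministic slice of that checklist can be sketched as a blunt scan. Real tooling (Checkov, tfsec) does this properly; the regexes below are only illustrative:

```python
import re

# Checks this blunt cannot rationalize: a public ACL or an open-world
# ingress rule is a finding regardless of stated intent.

PUBLIC_ACL = re.compile(r'acl\s*=\s*"(public-read|public-read-write)"')
OPEN_WORLD = re.compile(r'"0\.0\.0\.0/0"')

def cold_findings(terraform: str) -> list[str]:
    findings = []
    if PUBLIC_ACL.search(terraform):
        findings.append("public S3 bucket ACL — finding regardless of intent")
    if OPEN_WORLD.search(terraform):
        findings.append("security group open to 0.0.0.0/0")
    return findings

config = '''
resource "aws_s3_bucket" "assets" {
  acl = "public-read"
}
resource "aws_security_group_rule" "ssh" {
  cidr_blocks = ["0.0.0.0/0"]
}
'''
assert len(cold_findings(config)) == 2
```

Both violations from the scenario surface immediately here, which is exactly the class of finding the self-reviewing generator rated "secure."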
Session 7
Integration Workshop
Apply all 25 techniques in a single prompt system
Workshop exercise
Choose your domain — a system you actually build/maintain
Layer it — identity / governance / task (Session 1)
Encode failures — how does YOUR system typically fail? (Session 2)
Classify constraints — enforced vs. requested (Session 3)
Partition authority — model owns / asks / never (Session 4)
Mark immutables — what must never drift? (Session 5)
Design the cycle — state contract per heartbeat (Session 6)
Peer review: use Symmetry Detection + Layer-Crossing Checks from the original framework to audit each other's systems.
Appendix A
Original Framework
Techniques 1–10: Evaluation & Auditing
Technique 1
Directive Hardening
Making instructions resistant to drift, misinterpretation, or creative undermining.
The problem
Models creatively reinterpret instructions to match their training distribution.
Three hardening techniques prevent this drift:
Negative anchors
Boundary clarification
Calibration scales
Negative Anchors
Define what NOT to flag as explicitly as what to flag.
Without anchors
"Flag long functions"
→ flags switch statements
→ flags data tables
→ flags config arrays
→
With negative anchors
"Flag long functions"
"Do NOT flag:"
"- switch statements"
"- data declarations"
"- config arrays"
Positive-only instructions leave the boundary undefined. The model fills the gap with training priors.
Boundary Clarification
"This is X, not Y" — explicit distinctions at ambiguity boundaries.
## Boundary clarifications
- This is a CODE REVIEW, not a style guide enforcement.
(Don't flag formatting unless it hides a bug.)
- "Error handling" means exception flow, not validation.
(Missing input validation is a separate category.)
- "Performance issue" requires measurable impact evidence.
(Theoretical inefficiency without data is a NOTE, not a finding.)
Boundaries prevent category bleeding — where one instruction swallows adjacent territory.
Calibration Scales
Concrete examples at each level of a judgment scale.
## Severity calibration
critical: SQL injection, auth bypass, data exposure
→ Example: unsanitized user input in raw SQL query
high: logic errors causing incorrect output for some inputs
→ Example: off-by-one in pagination returns duplicate rows
medium: missing edge case handling, non-obvious failure modes
→ Example: no timeout on HTTP call to external service
low: code quality, naming, minor style
→ Example: variable named 'x' in a 5-line scope
Without calibration, the model's severity distribution matches its training data — not your team's.
Example: Negative anchors
## What to flag
- Functions over 50 lines with no documentation
- Public APIs that don't validate input types
- Error handling that swallows exceptions silently
## What NOT to flag (explicitly)
- Long functions that are just switch statements (inherently linear)
- Internal helper functions without docs (low reader traffic)
- Caught-and-logged exceptions (not swallowed — handled)
Without the "NOT" list, the model flags everything that
pattern-matches on surface features.
Note: "functions over 50 lines," "missing docs" are deterministic — a linter counts them. The LLM's anchors should govern judgment ("is this long function inherently linear?"), not measurements tooling already does better. See Technique 13.
Exercise
A notification system. Rules: "Send notifications for important events. Don't spam users." System sends notifications for every login, password change, and settings update. Users complain. New rule: "Only send for critical security events." Now password changes don't notify — and users miss compromised-account alerts.
Discuss: What's missing from both rule versions? What negative anchors would fix this?
What does the hardened version look like?
Some valid answers
A notification system. Rules: "Send notifications for important events. Don't spam users." System sends notifications for every login, password change, and settings update. Users complain. New rule: "Only send for critical security events." Now password changes don't notify — and users miss compromised-account alerts.
Q1. What's missing from both rule versions?
Both versions lack boundary clarification: "important" has no definition, and "critical security event" is underspecified. The system has no anchor for classifying new event types as they're added.
Both versions have no negative anchors — they only state what to notify about, not what NOT to notify about. The model (or engineer implementing the rule) fills that gap with their own priors, which diverge from intent.
The second version overcorrects: tightening "important" to "critical security events" was a reaction to the spam problem, but it didn't reason about which security events are notification-worthy. The fix introduced a new undefined boundary.
Some valid answers
A notification system. Rules: "Send notifications for important events. Don't spam users." System sends notifications for every login, password change, and settings update. Users complain. New rule: "Only send for critical security events." Now password changes don't notify — and users miss compromised-account alerts.
Q2. What negative anchors would fix this?
Notify on: password change (security signal), new device login (potential unauthorized access), permission or role changes (privilege escalation), failed login burst (brute force signal). These are enumerated, not inferred.
Do NOT notify on: same-device logins (routine), profile photo updates, notification preference changes, timezone/locale settings. These are explicitly excluded, not left for inference.
The negative anchor list is as important as the positive list. Without it, a new engineer adding a "billing address changed" event asks "is this a security event?" and guesses wrong either direction.
Some valid answers
A notification system. Rules: "Send notifications for important events. Don't spam users." System sends notifications for every login, password change, and settings update. Users complain. New rule: "Only send for critical security events." Now password changes don't notify — and users miss compromised-account alerts.
Q3. What does the hardened version look like?
A two-list structure: "Always notify: [enumerated list]. Never notify: [enumerated list]. For new event types not on either list: escalate to the security team for classification before adding notification logic."
The hardened version is defensive by default for unclassified events: new events require explicit classification before notification behavior is determined. This prevents both "notify everything" and "notify nothing" as defaults.
It also has a maintenance path: the classification decision for each new event type gets added to one of the two lists, growing the ruleset deliberately rather than through inference.
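The two-list structure with a defensive default can be sketched directly. The event names are illustrative, not a complete security taxonomy:

```python
# Hardened notification policy: two enumerated lists plus a defensive
# default -- unclassified event types escalate instead of guessing.

ALWAYS_NOTIFY = {"password_change", "new_device_login",
                 "role_change", "failed_login_burst"}
NEVER_NOTIFY = {"same_device_login", "profile_photo_update",
                "notification_pref_change", "locale_change"}

def classify(event_type: str) -> str:
    if event_type in ALWAYS_NOTIFY:
        return "notify"
    if event_type in NEVER_NOTIFY:
        return "suppress"
    # New event types are neither notified nor silently dropped:
    # they require explicit classification before any behavior exists.
    return "escalate_for_classification"
```

A new "billing address changed" event hits neither list, so it escalates — the guess-wrong-either-direction failure mode from the answer above is structurally impossible.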
Technique 2
Seed Decision Trees
Pre-loading the model's reasoning path with structured choices.
Three components of a seed tree
Without seed trees, the model's reasoning path is determined by training distribution, not your task structure.
Methodology — the reasoning path
Stopping conditions — when to halt
Conditional routing — branching logic
Methodology: "Start with X → then Y → then Z"
Ordered sequence of reasoning steps. The model follows your path instead of inventing one.
## Analysis methodology (follow this order)
1. Identify the system's core primitives (nouns that compose)
2. Map how primitives interact (dependencies, data flow)
3. For each interaction: does the interface contract hold?
4. Where contracts break: what's the operational cost?
DO NOT skip steps. DO NOT reorder. Step 3 requires step 2's output.
## Stop when:
- You've covered all public interfaces (completeness)
- You find >10 critical findings (triage-first)
- Remaining items are all severity:low (diminishing returns)
- You're repeating a finding you already noted (saturation)
## Do NOT stop because:
- "It's getting long" (length is not a stopping condition)
- "This seems thorough" (feeling ≠ completeness)
- You've reached some internal output limit
Models default to stopping when output "feels complete" — an artifact of training on human-length responses.
Conditional Routing
"If X, then Y; otherwise Z" — branching logic for the model's path.
## Routing rules
- If finding involves data loss → severity:critical, stop and report
- If finding is style-only → severity:low, batch at end
- If finding is ambiguous → include with confidence:medium
- If finding duplicates an earlier finding → merge, don't repeat
## Escalation
- If >3 critical findings: stop analysis, report immediately
- If finding contradicts the specification: flag as "spec conflict"
and continue (don't resolve — surface for human decision)
Routing prevents flat-priority output where everything appears equally important.
Example: Seeded methodology
## Analysis methodology (follow this order)
1. Identify the system's core primitives (nouns that compose)
2. Map how primitives interact (dependencies, data flow)
3. For each interaction: does the interface contract hold?
4. Where contracts break: what's the operational cost?
## Stopping conditions
- Stop when you've covered all public interfaces
- Stop if you find >10 critical findings (triage first)
- Stop if remaining items are all severity:low
## Routing
- If finding involves data loss → severity:critical, stop and report
- If finding is style-only → severity:low, batch at end
- If uncertain → include but mark confidence:medium
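The routing block above can be sketched as a dispatch function. The finding fields and action names are assumptions made for illustration:

```python
# Routing rules as code: each finding gets a severity and an action,
# so nothing reaches the output at flat priority.

def route(finding, seen_ids):
    if finding["kind"] == "data_loss":
        return {"severity": "critical", "action": "stop_and_report"}
    if finding["id"] in seen_ids:
        return {"severity": None, "action": "merge_with_earlier"}
    if finding["kind"] == "style":
        return {"severity": "low", "action": "batch_at_end"}
    if finding.get("ambiguous"):
        return {"severity": "medium", "action": "include",
                "confidence": "medium"}
    return {"severity": "medium", "action": "include"}
```

Note the order: data loss short-circuits everything, and duplicates merge before any other classification — the same precedence the prompt literal states.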
Exercise
A PR review bot. Instructions: "Review this PR. Flag important issues." It produces a flat list mixing typos, style suggestions, logic errors, and security vulnerabilities — all presented equally important, in file order rather than priority order.
Discuss: What methodology would you seed? What stopping conditions prevent noise?
What routing rules separate severity tiers?
Some valid answers
A PR review bot. Instructions: "Review this PR. Flag important issues." It produces a flat list mixing typos, style suggestions, logic errors, and security vulnerabilities — all presented equally important, in file order rather than priority order.
Q1. What methodology would you seed?
Step 1: Check for security vulnerabilities (SQL injection, XSS, auth bypass, secret exposure). Step 2: Check for logic errors (off-by-ones, edge cases, incorrect conditionals). Step 3: Check for style and maintainability. Process in this order; do not skip to step 3 first.
The ordering matters: security issues block merge, so they must surface first. Style issues are low-priority suggestions — they should never bury a critical security finding earlier in the output.
A dependency constraint: "complete step 1 before step 2; complete step 2 before step 3. Each step's output is independent — a step 1 finding doesn't prevent step 2, but step 3 should not inflate a report that already has critical step 1 findings."
Some valid answers
A PR review bot. Instructions: "Review this PR. Flag important issues." It produces a flat list mixing typos, style suggestions, logic errors, and security vulnerabilities — all presented equally important, in file order rather than priority order.
Q2. What stopping conditions prevent noise?
Stop after 3 critical findings: if 3 or more security-critical issues exist, triage them first. Producing 15 more findings below them dilutes attention and delays the engineer getting to the critical items.
Stop when remaining items are all severity:low: if the only remaining issues are style suggestions, end the review and batch them separately. They don't belong in the same triage queue as logic errors.
Stop on saturation: "if you've seen this pattern 3 times, note the pattern once with examples — don't report the same finding 8 times across 8 instances. Repetition is a stopping signal."
Some valid answers
A PR review bot. Instructions: "Review this PR. Flag important issues." It produces a flat list mixing typos, style suggestions, logic errors, and security vulnerabilities — all presented equally important, in file order rather than priority order.
Q3. What routing rules separate severity tiers?
The routing rule prevents the flat-list default: the bot can no longer present "SQL injection in auth.py" and "inconsistent indentation in utils.py" at the same visual priority. The severity tier is assigned in the methodology, not inferred at output time.
A routing rule for ambiguity: "if severity is unclear, assign medium and explain why. Don't suppress uncertain findings — include them with confidence:medium so the reviewer can assess." This prevents the bot from silently dropping low-confidence findings.
Technique 3
Steering
Guide toward desired patterns without over-constraining.
The constraint spectrum
Too loose: unpredictable output
↕
Steering: soft targets + escape hatches
↕
Too tight: brittle, breaks on edge cases
Templates with exits: "Use this format, BUT deviate when..."
Disclosure preference: "When uncertain, include but mark it"
Example: Soft targets
## Output guidance (steering, not constraints)
Typical audits surface 3-7 findings. This is a soft target:
- Fewer is fine if the system is genuinely clean
- More is fine if genuine issues exist
- DON'T manufacture findings to hit the range
- DON'T suppress findings to stay under it
For each finding, typical length is 2-4 sentences.
Expand if the finding requires context to understand.
Contract if it's self-evident from the code snippet.
If you find yourself writing more than 10 findings,
pause and triage: are all genuinely actionable?
Exercise
A content moderation system for a developer forum. Rules: "Posts must be on-topic. Remove spam. Keep discussions professional." A post discusses using a competitor's product to solve a problem the community's product can't handle. Moderator bot flags it as "off-topic / competitive spam."
Discuss: What's a hard constraint vs. soft target here? What escape hatch is missing?
How do you steer without over-constraining?
Some valid answers
A content moderation system for a developer forum. Rules: "Posts must be on-topic. Remove spam. Keep discussions professional." A post discusses using a competitor's product to solve a problem the community's product can't handle. Moderator bot flags it as "off-topic / competitive spam."
Q1. What's a hard constraint vs. soft target here?
Hard constraint: "remove spam" — mechanically definable as repeated links, keyword stuffing, irrelevant promotions. This can and should be enforced, not steered.
Soft target: "on-topic" and "professional" — these are judgment calls. A post that's technical, solves a real problem, and mentions a competitor is on-topic by any reasonable definition but fails a naive "about our product" reading.
"Keep discussions professional" is a soft target too — professional behavior is contextual. A heated technical debate that stays technical is professional; a polite post that spreads misinformation isn't. Professional ≠ polite.
Some valid answers
A content moderation system for a developer forum. Rules: "Posts must be on-topic. Remove spam. Keep discussions professional." A post discusses using a competitor's product to solve a problem the community's product can't handle. Moderator bot flags it as "off-topic / competitive spam."
Q2. What escape hatch is missing?
The missing escape hatch: "mentioning a competitor is NOT spam when the mention is in the context of solving a technical problem. The question is whether the post helps the community; competitor mentions are incidental, not disqualifying."
A broader escape hatch structure: "if a post is borderline on the 'on-topic' criterion but clearly helps other developers with a real technical problem, lean toward allowing it. The goal is a useful community, not a promotional one."
Without the escape hatch, "on-topic" tightens over time as the model learns from its own moderation decisions. Each borderline case treated as off-topic trains the boundary tighter. The escape hatch preserves the intended breadth.
Some valid answers
A content moderation system for a developer forum. Rules: "Posts must be on-topic. Remove spam. Keep discussions professional." A post discusses using a competitor's product to solve a problem the community's product can't handle. Moderator bot flags it as "off-topic / competitive spam."
Q3. How do you steer without over-constraining?
Define the target as a range with escape: "posts are on-topic if they address a technical problem developers in this community face. Competitor mentions in that context are acceptable; posts whose primary purpose is competitor promotion are not."
Add a disclosure preference for borderline cases: "if a post is ambiguous, allow it and flag for human review rather than removing it. The cost of a false removal (chilling legitimate content) is higher than the cost of a false allowance."
Soft target with an example at each boundary: "on-topic example: [helpful competitor comparison]. Off-topic example: [pure competitor criticism with no technical content]. The rule is the range between these, not a point."
Technique 4
Conceptual Anchoring
Identify and reason from system primitives, not surface patterns.
Primitives > patterns
Prevents hallucination: findings grounded in operational reality
Transfers across domains: same technique for code, prose, research
"Where does the map break down?" ← violations live here
Example: Anchoring an audit
## System primitives (your anchors)
This system has 4 primitive concepts:
- **User**: has roles, belongs to organizations
- **Document**: has owner, has visibility, has versions
- **Permission**: maps (user, document, action) → allow/deny
- **Audit trail**: immutable log of permission checks
## Your job: verify that every Document operation
## checks Permission before executing.
Anchor in Permission. For each Document endpoint:
1. Does it call Permission.check() before acting?
2. Does the Permission check match the action being taken?
3. Is the result logged to Audit trail?
Findings are grounded in these primitives — not in code style,
naming conventions, or aesthetic preferences.
Exercise
A CLI tool with 8 subcommands. Each subcommand has --help text. Most follow: Synopsis (usage), Description paragraph, Options list, Examples section. Two subcommands omit Examples. One has Synopsis + Examples only (no Description).
Discuss: What are the primitives? What's a violation?
What's justified divergence?
Some valid answers
A CLI tool with 8 subcommands. Each subcommand has --help text. Most follow: Synopsis (usage), Description paragraph, Options list, Examples section. Two subcommands omit Examples. One has Synopsis + Examples only (no Description).
Q1. What are the primitives?
A --help text entry has four primitives: Synopsis (what does this command do, one line), Description (why/when would you use it), Options (reference for each flag), Examples (how to use it in practice). These compose into a complete help entry.
Each primitive serves a different reader mode: Synopsis for scanning (what does this command exist), Description for evaluation (should I use this), Options for reference (what flags exist), Examples for learning (how do I actually use it).
The audit is: for each subcommand, which primitives are present? Which are missing? Is the omission intentional (the primitive genuinely doesn't apply) or incidental (someone forgot)?
Some valid answers
A CLI tool with 8 subcommands. Each subcommand has --help text. Most follow: Synopsis (usage), Description paragraph, Options list, Examples section. Two subcommands omit Examples. One has Synopsis + Examples only (no Description).
Q2. What's a violation?
The two subcommands missing Examples sections: if Examples serve the "learning" reader mode, their absence leaves that reader stranded. The violation is that users who want to know how to use the command have no reference to copy from.
The subcommand with Synopsis + Examples but no Description: depends on what the subcommand does. If "rm — removes a file" is self-explanatory from Synopsis, Description adds no value. If it's a complex command with edge cases, the Description omission is a violation.
Violation = a missing primitive that would have served a reader. Not-violation = a missing primitive that genuinely doesn't apply. The anchor is "does this reader need this?"
Some valid answers
A CLI tool with 8 subcommands. Each subcommand has --help text. Most follow: Synopsis (usage), Description paragraph, Options list, Examples section. Two subcommands omit Examples. One has Synopsis + Examples only (no Description).
Q3. What's justified divergence?
A "version" subcommand that shows the app version has no meaningful Examples section — "run `app version`" is the entire usage. The primitive doesn't apply; its absence is correct.
A complex subcommand might legitimately omit Description if the Synopsis is precise enough to convey both function and use-case. The test: can a new user read the Synopsis alone and know when they'd reach for this subcommand? If yes, Description is optional.
Divergence is justified when it's intentional and documented. The key distinction is omission-by-accident versus omission-by-design: the audit should flag both, but the response differs. Accidental omissions get fixed; intentional ones get rationale added so future audits know not to flag them.
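The audit itself is mechanical: for each subcommand, record which primitives are present and surface the gaps for a human intentional-vs-incidental call. A minimal sketch with illustrative subcommand data:

```python
# Primitive-presence audit for --help entries. The audit finds the gaps;
# deciding whether a gap is justified divergence stays a human judgment.

PRIMITIVES = ("synopsis", "description", "options", "examples")

help_entries = {
    "run":     {"synopsis", "description", "options", "examples"},
    "deploy":  {"synopsis", "description", "options"},   # examples missing
    "version": {"synopsis", "examples"},                 # likely justified
}

def missing_primitives(entries):
    return {cmd: [p for p in PRIMITIVES if p not in present]
            for cmd, present in entries.items()}

gaps = missing_primitives(help_entries)
```

The output is a worklist, not a verdict: "version" missing a Description may be fine; "deploy" missing Examples probably isn't.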
Technique 5
Operational Cost Framing
Evaluate by impact, not by difference. "What breaks?" not "what's different?"
Three cost dimensions
Operational cost: "What breaks if this persists?"
Cognitive cost: "What mental load does this add for readers?"
Future cost: "What maintenance trap does this create?"
Models default to noticing variation.
Cost framing forces evaluation of impact.
Reduces false positives from style differences.
Example: Cost-based severity
## Severity = cost, not deviation
CRITICAL (operational cost: system breaks)
- Missing null check on user input → crash in production
- Unvalidated redirect URL → open redirect vulnerability
HIGH (operational cost: data/security compromise)
- Hardcoded API key → secret exposure on public repo
- Missing rate limit → resource exhaustion possible
MEDIUM (cognitive cost: readers misled)
- Function name doesn't match behavior → debugging time
- Docs say X, implementation does Y → wrong assumptions
LOW (future cost: maintenance burden)
- Inconsistent naming → grep becomes unreliable
- Dead code → codebase noise, no runtime impact
NOT A FINDING (style variation, no cost)
- Tabs vs spaces (if no standard set)
- Comment style differences (if functionally equivalent)
Exercise
A shopping cart service audit flags: (1) Cart totals use floating-point math instead of integer cents. (2) "Remove item" button is labeled "Delete." (3) Cart doesn't persist across sessions. (4) Tax calculation rounds per-item instead of on the total.
Discuss: What's the operational cost of each? What looks high-severity but isn't?
What looks low but is actually critical? Which would you fix first?
Some valid answers
A shopping cart service audit flags: (1) Cart totals use floating-point math instead of integer cents. (2) "Remove item" button is labeled "Delete." (3) Cart doesn't persist across sessions. (4) Tax calculation rounds per-item instead of on the total.
Q1. What's the operational cost of each?
(1) Floating-point totals: penny rounding errors on every transaction, compounding over thousands of orders. Accounting reconciliation fails; financial audits flag discrepancies. Legal/regulatory exposure. CRITICAL.
(2) "Delete" vs. "Remove": cosmetic. Both are standard verbs for removing an item. No data integrity issue, no user confusion severe enough to cause loss. LOW.
(3) No session persistence: items lost when the user closes their browser or navigates away. High user frustration, potential cart abandonment. MEDIUM — operational cost is friction and lost conversions, not data corruption.
Some valid answers
A shopping cart service audit flags: (1) Cart totals use floating-point math instead of integer cents. (2) "Remove item" button is labeled "Delete." (3) Cart doesn't persist across sessions. (4) Tax calculation rounds per-item instead of on the total.
Q2. Which look high-severity but aren't? Which look low but are actually critical?
(3) Looks high (users complain loudly about losing carts) but is MEDIUM — the cost is friction and frustration, not money or correctness. User complaints are not the same as operational cost.
(4) Tax per-item rounding looks like a rounding preference, almost cosmetic. It's actually CRITICAL: tax rounding errors create legally incorrect invoices that fail compliance audits. The financial and legal cost is real and accumulating on every transaction.
The lesson: severity should be grounded in what breaks — legal/financial systems, data integrity, or security — not in how visible or complained-about the issue is. Loudly complained-about ≠ high operational cost.
Some valid answers
A shopping cart service audit flags: (1) Cart totals use floating-point math instead of integer cents. (2) "Remove item" button is labeled "Delete." (3) Cart doesn't persist across sessions. (4) Tax calculation rounds per-item instead of on the total.
Q3. Which would you fix first?
Fix (1) first: floating-point math is actively corrupting financial records on every transaction. Every order placed between now and the fix is a compounding error in the accounting system.
Fix (4) second: per-item tax rounding is a legal compliance issue on every invoice. The fix is low-effort (integer arithmetic with one rounding step at the total level) and the risk of not fixing it is regulatory.
(3) and (2) are lower priority — cart persistence is a user experience improvement, not a correctness fix, and the label change is cosmetic. They matter, but not before the financial integrity issues.
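The arithmetic behind findings (1) and (4) can be checked directly. The prices and the 8.25% tax rate are illustrative:

```python
# (1) Floating-point totals drift; integer cents don't.
float_total = sum([0.10] * 3)   # 0.1 + 0.1 + 0.1 != 0.30 exactly
cents_total = sum([10] * 3)     # 30, exact

# (4) Per-item tax rounding vs. one rounding step on the total,
# three items at $1.99 with 8.25% tax:
items_cents = [199, 199, 199]
per_item = sum(round(c * 0.0825) for c in items_cents)  # rounds 3 times
on_total = round(sum(items_cents) * 0.0825)             # rounds once
# The two methods disagree by a cent on this cart -- and that
# disagreement recurs on every transaction.
```

This is why the fix for (1) is "integer cents everywhere" and the fix for (4) is "round once, at the total": both errors are invisible per-order and compounding in aggregate.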
Technique 6
Symmetry Detection
Notice when patterns hold almost everywhere — then investigate the breaks.
One-offs vs. broken symmetry
"8/10 endpoints validate input. 2 don't."
That's not two outliers. That's broken symmetry.
Either the 2 have justification, or they're bugs.
Quantify pattern adherence (not just "usually follows")
Check for documented justification on breaks
Assess cost of the asymmetry (back to technique 5)
Example: Symmetry audit
## Task: Identify broken symmetry in this API
For each pattern you observe:
1. How many endpoints follow it? (N/total)
2. Which endpoints break it? (list specifically)
3. For each break: is there documented justification?
4. If no justification: what's the operational cost?
## Example finding format:
Pattern: "All POST endpoints validate request body schema"
Adherence: 11/13 endpoints
Breaks: POST /legacy/import, POST /admin/bulk-update
Justification: None documented
Cost: Malformed input reaches business logic → potential crash
Severity: HIGH (broken symmetry with operational cost)
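The finding format above is computable: quantify adherence, list the breaks, then check each break for documented justification. A minimal sketch with illustrative endpoint metadata:

```python
# Symmetry audit mechanics: adherence count, break list,
# and the subset of breaks with no documented justification.

endpoints = {
    "POST /users":             {"validates_schema": True},
    "POST /orders":            {"validates_schema": True},
    "POST /legacy/import":     {"validates_schema": False},
    "POST /admin/bulk-update": {"validates_schema": False,
                                "justification": None},
}

def audit(endpoints, predicate):
    follows = [e for e, m in endpoints.items() if predicate(m)]
    breaks = [e for e, m in endpoints.items() if not predicate(m)]
    unjustified = [e for e in breaks
                   if not endpoints[e].get("justification")]
    return {"adherence": f"{len(follows)}/{len(endpoints)}",
            "breaks": breaks, "unjustified": unjustified}

report = audit(endpoints, lambda m: m["validates_schema"])
```

Severity assessment (the cost step) stays with Technique 5; the audit's job is to make the asymmetry and its justification status explicit.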
Exercise
A component library with 20 components. 17 accept a `size` prop with values "sm" | "md" | "lg". Of the remaining 3: one uses "small" | "medium" | "large", one has no size prop at all, and one accepts a numeric pixel value.
Discuss: What's the symmetry? Which breaks are justified?
What's the operational cost of each asymmetry? How would you report this?
Some valid answers
A component library with 20 components. 17 accept a `size` prop with values "sm" | "md" | "lg". Of the remaining 3: one uses "small" | "medium" | "large", one has no size prop at all, and one accepts a numeric pixel value.
Q1. What's the symmetry?
Pattern: 17/20 components accept a size prop with values "sm" | "md" | "lg". The adherence rate (85%) is high enough to call this a canonical convention for the library.
The symmetry extends beyond just the values: 17 components share the same prop name ("size"), the same enum type, and the same three values. This is a strong interface contract that the three exceptions break in different ways.
The audit question is: for each break, is the deviation intentional (justified by the component's nature) or incidental (someone used different naming without thinking)? The answer is different for each of the three.
Some valid answers
A component library with 20 components. 17 accept a `size` prop with values "sm" | "md" | "lg". Of the remaining 3: one uses "small" | "medium" | "large", one has no size prop at all, and one accepts a numeric pixel value.
Q2. Which breaks are justified?
"small"/"medium"/"large" — unjustified. This is the same concept with different names. The operational cost is that TypeScript autocomplete diverges, documentation is inconsistent, and a developer writing a themed wrapper has to handle two naming conventions.
No size prop at all — possibly justified. A divider or decorator component might genuinely have one fixed presentation. The audit should flag it for review, not as a definite violation. The question is: "should this component ever render at different sizes?"
Numeric pixel value — possibly justified for specific components. A spacer or layout primitive that needs arbitrary sizing can't be constrained to three enum values. The justification should be documented ("this component accepts arbitrary spacing, not discrete size tiers") so future developers understand the intentional divergence.
Some valid answers
A component library with 20 components. 17 accept a `size` prop with values "sm" | "md" | "lg". Of the remaining 3: one uses "small" | "medium" | "large", one has no size prop at all, and one accepts a numeric pixel value.
Q3. What's the operational cost? How would you report this?
"small"/"medium"/"large" break: developers using the library must memorize two size APIs. TypeScript doesn't catch mismatches (both are valid strings). Grep for "size=" returns inconsistent results. Rename it in the fix — the cost of the inconsistency accumulates with every consumer.
How to report it: "Pattern: 17/20 components use size: 'sm' | 'md' | 'lg'. Break: ComponentX uses 'small' | 'medium' | 'large' (no documented justification). Operational cost: dual API surface for consumers. Recommended fix: align to 'sm'|'md'|'lg'."
The report format matters: don't just list the breaks — quantify adherence (17/20), identify the break, check for justification, and assess cost. A report that says "3 inconsistencies found" is less useful than one that distinguishes two unjustified naming divergences from one intentional design choice.
Technique 7
Confidence Disclosure
Make uncertainty visible. "I don't know" is valuable data.
Why disclosure beats suppression
Without disclosure:
Uncertain findings get suppressed
OR uncertain findings get over-committed
Both lose information
With disclosure:
Low-confidence items included + marked
Reader triages by confidence
Future analysis can revisit
Normalize: "When confidence is low, include but mark it."
Example: Confidence scale
## Confidence marking (required on every finding)
HIGH: You can point to specific code/evidence that confirms.
Would defend this finding in peer review.
MEDIUM: Pattern suggests an issue but you can't confirm
without runtime testing or broader context.
Include — reviewer may have the missing context.
LOW: Heuristic match only. Might be a false positive.
Include anyway — mark it. Don't suppress.
"I noticed X but can't confirm it's a problem."
## Rule: never suppress a finding to avoid looking uncertain.
## Uncertainty disclosed > certainty faked.
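Downstream, the confidence mark is what lets a renderer rank findings instead of suppressing them. A minimal sketch; the finding contents are illustrative:

```python
# Confidence disclosure in a report pipeline: every finding carries a
# required confidence mark; the renderer sorts by it, never drops.

ORDER = {"high": 0, "medium": 1, "low": 2}

findings = [
    {"msg": "possible connection pool exhaustion", "confidence": "low"},
    {"msg": "memory leak: linear growth over 72h", "confidence": "high"},
]

def render(findings):
    # Low confidence is marked and ranked last -- not suppressed.
    ranked = sorted(findings, key=lambda f: ORDER[f["confidence"]])
    return [f"[{f['confidence'].upper()}] {f['msg']}" for f in ranked]

lines = render(findings)
```

The reader triages top-down: confirmed issues first, heuristic matches last, nothing hidden.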
Exercise
A log analyzer reports: "Definite memory leak in service-A (evidence: linear memory growth over 72 hours)." Also reports: "Possible connection pool exhaustion in service-B (evidence: 2 timeout errors in 10,000 requests over 24h)."
Discuss: What are the confidence primitives? What's the cost of treating both equally?
What if the second is suppressed entirely? What's the right disclosure?
Some valid answers
A log analyzer reports: "Definite memory leak in service-A (evidence: linear memory growth over 72 hours)." Also reports: "Possible connection pool exhaustion in service-B (evidence: 2 timeout errors in 10,000 requests over 24h)."
Q1. What are the confidence primitives?
Evidence strength: finding 1 has 72 hours of linear memory growth — a strong, repeatable pattern. Finding 2 has 2 events in 10,000 over 24 hours — a weak, possibly-noise signal. Different evidence strength, different confidence tier.
Sample size and time window: 72 hours is a long enough window to rule out one-off spikes. 24 hours and 2/10,000 is a small sample in a short window — the signal might disappear with more data, or it might be an early warning.
Reproducibility: the memory leak is reproducible (it's continuous and measurable). The connection pool issue may not be — 2 timeouts in 24 hours could be coincidence, network noise, or a real but infrequent failure mode.
Some valid answers
A log analyzer reports: "Definite memory leak in service-A (evidence: linear memory growth over 72 hours)." Also reports: "Possible connection pool exhaustion in service-B (evidence: 2 timeout errors in 10,000 requests over 24h)."
Q2. What's the cost of treating both equally?
If both are presented at the same severity: the engineer may investigate the low-confidence finding with the same urgency as the high-confidence one — wasting hours on a 2/10,000 rate that may be noise while the confirmed memory leak continues growing.
Or the reverse: if the engineer notices that #2 seems flimsy, they may discount both findings because the report didn't help them calibrate. Treating them equally makes both seem equally uncertain.
The decision about where to spend investigation time belongs to the engineer — but only if the report gives them the confidence information to make that decision. Flat-severity reports steal that ability.
Some valid answers
A log analyzer reports: "Definite memory leak in service-A (evidence: linear memory growth over 72 hours)." Also reports: "Possible connection pool exhaustion in service-B (evidence: 2 timeout errors in 10,000 requests over 24h)."
Q3.What if finding 2 is suppressed entirely?
A suppressed finding is a hidden one. If the connection pool issue is an early signal of a growing problem, suppressing it means it first surfaces as an incident rather than a flagged concern. The cost of suppression is borne by the system, not the report.
The right disclosure: include finding 2, mark it LOW confidence with the specific evidence and its limitations ("2 timeout errors in 10,000 requests over 24h — may be noise, but warrants a monitoring ticket for 1 week of data"). The engineer decides whether to investigate now or watch the metric.
"Include but mark it" preserves information that "suppress if uncertain" loses. The engineer reading the marked finding can dismiss it in 30 seconds if they recognize the pattern as known noise — but they can't act on a finding they never saw.
Technique 8
Domain-Neutral Primitives
Design prompts that work across artifact types: code, prose, research, systems.
Abstract > specific
Use "conceptual units" not "modules" or "paragraphs"
Use "composition boundary" not "function interface" or "section break"
Use "coherence" not "code correctness" or "logical flow"
Test: can your prompt audit both a CLI tool
AND a research paper without modification?
Same auditor for software, papers, component libraries, organizational docs.
Example: Domain-neutral language
## Audit framework (works across domains)
For any artifact, identify:
1. **Units**: the composable pieces (functions/paragraphs/components)
2. **Boundaries**: where units meet (interfaces/transitions/edges)
3. **Contracts**: what each boundary promises (types/claims/invariants)
4. **Coherence**: do units compose without contradiction?
## Apply to:
- Source code: units=functions, boundaries=APIs, contracts=types
- Research paper: units=sections, boundaries=transitions, contracts=claims
- Design system: units=components, boundaries=props, contracts=variants
The audit technique is identical. Only the vocabulary maps change.
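The "identical technique, swappable vocabulary" idea can be sketched directly: one audit loop, parameterized by a per-domain vocabulary map. The map contents mirror the table above; the function and dictionary names are hypothetical.

```python
# One audit loop; per-domain vocabulary maps supply the nouns.
VOCAB = {
    "source_code":    {"unit": "function",  "boundary": "API",        "contract": "type"},
    "research_paper": {"unit": "section",   "boundary": "transition", "contract": "claim"},
    "design_system":  {"unit": "component", "boundary": "prop",       "contract": "variant"},
}

def audit_prompt(domain: str) -> str:
    """Render the domain-neutral audit framework in a domain's vocabulary."""
    v = VOCAB[domain]
    return (
        f"For each {v['unit']}, find where it meets another {v['unit']} "
        f"(its {v['boundary']}), state what that {v['boundary']} promises "
        f"(its {v['contract']}), and check that the promises compose "
        f"without contradiction."
    )

print(audit_prompt("source_code"))
print(audit_prompt("research_paper"))
```

Adding a new artifact type means adding one vocabulary entry, not rewriting the audit.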
Exercise
A multi-tenant SaaS audit prompt says: "Check that tenant data doesn't leak between organizations. Review database queries for proper WHERE clauses. Verify API endpoints check org_id."
Discuss: What's domain-specific language here? How would you rephrase to also audit
a document-sharing system's permission model? A component library's theme isolation?
What vocabulary transfers across all three?
Some valid answers
A multi-tenant SaaS audit prompt says: "Check that tenant data doesn't leak between organizations. Review database queries for proper WHERE clauses. Verify API endpoints check org_id."
Q1.What's domain-specific language here?
"Tenant data," "organizations," "WHERE clauses," "org_id" — all specific to multi-tenant SaaS data architecture. None of these terms apply to a document-sharing system or a component library.
"Database queries" is somewhat domain-neutral but presupposes a relational data model. A document-sharing system with a different persistence layer needs different vocabulary for the same underlying concept.
The domain-specific framing constrains the prompt to one class of systems — it can't be reused without rewriting, even when the underlying principle (scope isolation) is identical across all three systems.
Some valid answers
A multi-tenant SaaS audit prompt says: "Check that tenant data doesn't leak between organizations. Review database queries for proper WHERE clauses. Verify API endpoints check org_id."
Q2.How would you rephrase to audit all three?
Multi-tenant SaaS: "scope" = org, "isolation boundary" = data that belongs to one org must not be accessible to another, "boundary check" = every data access is scoped to the requesting org's identifier.
Document-sharing: "scope" = user + permission set, "isolation boundary" = a document visible to user A must not be accessible to user B without a permission grant, "boundary check" = every document access verifies the requester's permissions.
Component library theme isolation: "scope" = component instance, "isolation boundary" = a component's theme tokens must not bleed into sibling components, "boundary check" = CSS scoping or prop encapsulation prevents style leakage across component boundaries.
Some valid answers
A multi-tenant SaaS audit prompt says: "Check that tenant data doesn't leak between organizations. Review database queries for proper WHERE clauses. Verify API endpoints check org_id."
Q3.What vocabulary transfers across all three?
"Scope" (the unit that defines what a principal can access), "isolation boundary" (where scope A must not cross into scope B), "boundary check" (the mechanism that enforces the boundary), and "crossing detection" (how you verify that boundary checks are happening).
The transferable one-sentence audit: "Verify that every access to a resource in scope A checks the requester's right to that scope before granting access." This applies to tenants, documents, and theme tokens without modification.
Domain-specific vocabulary (tenant, org_id, WHERE clause) can be added as examples: "In a multi-tenant DB, this check is a WHERE org_id = :current_org; in a document system, it's a permission.check(user, document); in a component library, it's CSS encapsulation." The principle is shared; the vocabulary maps are per-domain footnotes.
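The transferable audit sentence above can be expressed as one domain-neutral check: given a log of accesses and a set of explicit grants, flag every access whose scope pair was never authorized. Scopes can be org IDs, (user, document) pairs, or component/theme-token pairs; the data shapes here are assumptions for illustration.

```python
def check_scope_isolation(accesses, grants):
    """Flag any access where the requester's right to the resource's
    scope was never granted. Domain-neutral: a scope can be an org ID,
    a (user, document) pair, or a component/theme-token pair.

    `accesses` is a list of (requester_scope, resource_scope) pairs;
    `grants` is the set of pairs the system has explicitly authorized.
    """
    return [
        (requester, resource)
        for requester, resource in accesses
        if (requester, resource) not in grants
    ]

grants = {("org-1", "org-1"), ("user-a", "doc-7")}
accesses = [
    ("org-1", "org-1"),   # scoped correctly
    ("org-2", "org-1"),   # tenant crossing: org-2 reading org-1 data
    ("user-b", "doc-7"),  # document access without a permission grant
]
violations = check_scope_isolation(accesses, grants)
# violations holds the two unauthorized pairs
```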
Technique 9
Layer-Crossing Checks
Verify alignment across abstraction layers. Most bugs live at boundaries.
Where integrity violations hide
Documentation → says "validates all input"
↕ BOUNDARY (check here)
Implementation → skips validation on 2 endpoints
Schema → defines field as required
↕ BOUNDARY (check here)
Usage → passes null without error
Single-layer audits miss drift between layers. Anchor in one layer, validate in another.
Boundary: Docs ↔ Code
Documentation says one thing. Implementation does another.
## Docs ↔ Code verification
For each documented feature:
1. Find the claim in documentation (README, API docs, comments)
2. Locate the implementation
3. Verify: does the implementation honor the documented behavior?
Common drift patterns:
- Docs describe v1 behavior; code has been patched to v3
- Docs promise error handling that doesn't exist
- Docs say "required" but code has a default fallback
- Feature was removed but docs still reference it
Docs ↔ Code drift is the most common and least detected. Tests verify code, not docs.
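A fragment of this boundary can even be checked mechanically. A minimal sketch: compare the set of functions a README claims exist against what the source actually defines, using Python's `ast` module. Real docs↔code verification is semantic, not just name matching; the claim set and source here are invented for the example.

```python
import ast
import textwrap

def documented_vs_implemented(readme_claims: set, source: str) -> dict:
    """Cross the docs -> code boundary at the crudest level:
    which documented functions exist, and which exist undocumented?"""
    tree = ast.parse(source)
    implemented = {n.name for n in ast.walk(tree)
                   if isinstance(n, ast.FunctionDef)}
    return {
        "documented_but_missing": readme_claims - implemented,
        "implemented_but_undocumented": implemented - readme_claims,
    }

source = textwrap.dedent("""
    def validate_input(data): ...
    def save(data): ...
""")
# README claims validate_input and retry_on_error exist:
drift = documented_vs_implemented({"validate_input", "retry_on_error"}, source)
# "retry_on_error" is promised but absent; "save" exists but is undocumented
```

Name matching only catches removal drift; behavioral drift ("docs describe v1, code is v3") still needs the anchor-and-cross-check reading process above.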
Boundary: Schema ↔ Usage
Schema defines a contract. Callers violate it silently.
## Schema ↔ Usage verification
For each schema definition:
1. Read the schema constraints (required fields, types, enums)
2. Find all call sites / consumers
3. Verify: does every caller satisfy the schema?
Red flags:
- Schema says "required" but 3 callers pass undefined
- Schema says enum ["draft","published"] but code also uses "archived"
- Schema says maxLength:100 but no truncation at input boundary
- Schema updated, but callers unchanged (version skew)
Schemas are promises. Code that bypasses the validation layer breaks promises silently.
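The red flags above can be checked with a small linter over call-site payloads. A sketch, assuming schemas are plain dicts with `required` and `enums` keys (a simplified stand-in for a real schema language like JSON Schema):

```python
def check_callers_against_schema(schema: dict, call_sites: list) -> list:
    """Report call sites that break the schema's promises:
    required fields passed as missing/None, or enum values
    the schema doesn't allow."""
    findings = []
    for i, payload in enumerate(call_sites):
        for field in schema.get("required", []):
            if payload.get(field) is None:
                findings.append(
                    f"call {i}: required field '{field}' missing or None")
        for field, allowed in schema.get("enums", {}).items():
            if field in payload and payload[field] not in allowed:
                findings.append(
                    f"call {i}: '{payload[field]}' not in enum {allowed}")
    return findings

schema = {"required": ["title"], "enums": {"status": ["draft", "published"]}}
calls = [
    {"title": "a", "status": "draft"},       # honors the schema
    {"title": None, "status": "archived"},   # breaks both promises
]
print(check_callers_against_schema(schema, calls))
```

The second call site triggers both red flags from the list above: a required field passed as None, and an enum value ("archived") the schema never declared.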
Boundary: Claim ↔ Evidence
System makes an assertion. Evidence doesn't support it.
## Claim ↔ Evidence verification
For each system claim (logs, metrics, status pages, user-facing messages):
1. Identify the claim ("99.9% uptime", "processed successfully", "validated")
2. Trace to the evidence source
3. Verify: does the evidence actually demonstrate the claim?
Failure modes:
- "Processed" means "received" — not "completed"
- "Validated" means "parsed" — not "business-rule checked"
- "Healthy" means "responding" — not "correct"
- Metric measures attempts, claim reports "successes"
Most dangerous in LLM systems: "the model confirmed" ≠ "the answer is correct."
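The failure modes above share one shape: the claim's word implies a stronger evidence field than the one actually checked. A sketch of making that mapping explicit, with a hypothetical claim-to-evidence table:

```python
# Each user-facing claim names the evidence field that must be true
# for the claim to be honest. "Processed" must mean completed,
# not merely received. (Illustrative mapping, not a real system's.)
CLAIM_REQUIRES = {
    "processed": "completed",
    "validated": "business_rules_checked",
    "healthy":   "responses_correct",
}

def verify_claim(claim: str, evidence: dict) -> bool:
    """Does the evidence actually demonstrate the claim?"""
    return bool(evidence.get(CLAIM_REQUIRES[claim], False))

# A job that was received but never finished:
evidence = {"received": True, "completed": False}
verify_claim("processed", evidence)  # False: the claim overstates the evidence
```

Writing the `CLAIM_REQUIRES` table forces the question each failure mode dodges: which evidence field, exactly, licenses this word?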
Example: Cross-layer checklist
Prompt literal — a verification framework
## Layer-crossing verification
For each feature, check these boundaries:
| Source layer | Target layer | Question |
|---|---|---|
| README | Implementation | Does code do what docs claim? |
| Schema | Usage | Are required fields always provided? |
| Error messages | Error handling | Do messages match actual error types? |
| API docs | Route handlers | Do params/returns match? |
| Config defaults | Runtime behavior | Does the default actually apply? |
Process: Pick source layer. Read its claims. Cross to target
layer. Verify each claim has matching implementation.
Mismatches at boundaries are findings.
Exercise
An API has an OpenAPI spec (doc layer) and Express route handlers (implementation layer). Spec says PUT /users/:id requires `{ name, email }` in body. Handler actually accepts `{ name, email, role }` — and setting role=admin works, bypassing the invitation flow.
Discuss: What layers exist here? What's the boundary violation?
What's the operational cost? What other layer-crossings would you check in this system?
Some valid answers
An API has an OpenAPI spec (doc layer) and Express route handlers (implementation layer). Spec says PUT /users/:id requires `{ name, email }` in body. Handler actually accepts `{ name, email, role }` — and setting role=admin works, bypassing the invitation flow.
Q1.What layers exist here?
Documentation layer: the OpenAPI spec — the public contract for what the API accepts and returns. Authorization layer: the invitation flow — the business rule that controls how role=admin is granted. Implementation layer: the Express handler — what the code actually does.
A fourth layer exists implicitly: the test suite. If tests were written against the OpenAPI spec, they would not test for role=admin in the request body — because the spec doesn't document it. The vulnerability lives below the test floor.
The vulnerability is an undocumented layer-crossing: the implementation accepts input the spec doesn't acknowledge, and that input bypasses an authorization layer the spec implies is the only path to elevated roles.
Some valid answers
An API has an OpenAPI spec (doc layer) and Express route handlers (implementation layer). Spec says PUT /users/:id requires `{ name, email }` in body. Handler actually accepts `{ name, email, role }` — and setting role=admin works, bypassing the invitation flow.
Q2.What's the boundary violation?
The spec says 2 fields; the handler accepts 3. The undocumented field is not just an inconsistency — it has real authorization consequences. Setting role=admin bypasses the invitation flow, granting elevated access without the intended review step.
This is a docs↔code violation with security impact: the spec is the source of truth for what the API accepts, and the implementation silently extends it with a privilege escalation surface. Anyone reading only the spec would never find this vulnerability.
The authorization layer (invitation flow) has a claimed guarantee: "admin access requires an invitation." The implementation makes that guarantee false without being visible at the authorization layer. That's a claim↔evidence violation nested inside the docs↔code violation.
Some valid answers
An API has an OpenAPI spec (doc layer) and Express route handlers (implementation layer). Spec says PUT /users/:id requires `{ name, email }` in body. Handler actually accepts `{ name, email, role }` — and setting role=admin works, bypassing the invitation flow.
Q3.What other layer-crossings would you check?
Error responses: does the spec document the error format for 400/401/403 responses? Do the handlers return exactly that format, or do some return raw Express error objects with stack traces?
Rate limiting: if the spec documents rate limits (429 after N requests), does the middleware enforce them? Or does the spec describe a protection the implementation doesn't actually provide?
Authentication requirements: for each endpoint the spec marks as "authenticated," does the middleware chain actually enforce authentication? A missing authMiddleware in one route handler would be invisible from the spec layer but exploitable at the implementation layer.
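That last check — every spec-authenticated endpoint has auth middleware in its chain — is mechanical enough to sketch. The route names and middleware labels are invented for the example; a real version would read the OpenAPI security requirements and the Express router layers.

```python
def audit_auth_coverage(spec_auth_required: list, route_middleware: dict) -> list:
    """Cross the spec -> implementation boundary: find endpoints the
    spec marks as authenticated whose middleware chain lacks an
    auth check."""
    return [
        route for route in spec_auth_required
        if "auth" not in route_middleware.get(route, [])
    ]

# Spec layer: which endpoints claim to require authentication
spec_auth_required = ["PUT /users/:id", "DELETE /users/:id"]

# Implementation layer: what each handler's chain actually contains
route_middleware = {
    "PUT /users/:id":    ["auth", "validate"],
    "DELETE /users/:id": ["validate"],   # auth middleware missing
}
print(audit_auth_coverage(spec_auth_required, route_middleware))
```

The missing middleware is invisible from the spec layer alone; only crossing to the implementation layer surfaces it.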
Technique 10
Escape Hatch Design
Build flexibility into rigid templates without losing structure.
Hard constraints + permission to break them
"Follow this template OR explain why you didn't"
>
"Follow this template" (no escape)
Edge cases always emerge
Models need permission to deviate when justified
Deviation with explanation > silent rule-breaking
Deviation with explanation > forced compliance that distorts output
Example: Template with exits
## Output format
For each finding, use:
### [SEVERITY] Finding title
**Location:** file:line
**Evidence:** [code snippet or quote]
**Impact:** [operational cost]
**Fix:** [specific remediation]
## Escape hatches
- If finding spans multiple files: use "Location: [list]"
instead of single file:line
- If evidence requires >10 lines: summarize + reference
the full span ("see lines 42-89")
- If fix is non-obvious: replace "Fix:" with "Options:"
and list 2-3 alternatives with tradeoffs
- If you can't determine severity: use "UNCERTAIN" and explain
what information would resolve it
Exercise
A form builder requires every field to have: a label, a validation rule, a help text tooltip, and an error message. An engineer adds a "spacer" element (pure layout, takes no input). System rejects it — no validation rule, no error message.
Discuss: What are the template primitives? What's the violation?
What escape hatch accommodates the spacer without weakening the contract for real input fields?
Some valid answers
A form builder requires every field to have: a label, a validation rule, a help text tooltip, and an error message. An engineer adds a "spacer" element (pure layout, takes no input). System rejects it — no validation rule, no error message.
Q1.What are the template primitives?
For input fields: label (what is this field?), validation rule (what values are accepted?), help text (tooltip explaining the expected input), error message (what to show when validation fails). All four serve user interaction with an input element.
The template assumes a one-to-one mapping: every element in the form IS an input field. The primitives were derived from the common case and generalized incorrectly to "all elements."
The spacer element is a layout element, not an input element. It doesn't accept user input, has no validation state, and needs none of the four primitives. The constraint is correct for its intended scope; the error is over-application of that scope.
Some valid answers
A form builder requires every field to have: a label, a validation rule, a help text tooltip, and an error message. An engineer adds a "spacer" element (pure layout, takes no input). System rejects it — no validation rule, no error message.
Q2.What's the violation?
The system rejects a valid use case: a spacer element is a legitimate form element (many form builders need layout control) but the validation contract prevents it from being expressed. The system is more rigid than the problem requires.
The violation is in the scope assumption, not the contract itself. "All form elements must have label + validation + help + error" is correct for input elements. Applying it to all elements conflates element types.
A type-blind validation contract cannot distinguish between "missing required field" (violation) and "element type doesn't need this field" (legitimate). Both look identical to the validator.
Some valid answers
A form builder requires every field to have: a label, a validation rule, a help text tooltip, and an error message. An engineer adds a "spacer" element (pure layout, takes no input). System rejects it — no validation rule, no error message.
Q3.What escape hatch accommodates the spacer?
Introduce element types: type: "input" | "layout". Input elements require all four properties; layout elements require none. This strengthens the contract for input elements (it now applies specifically and unambiguously) while accommodating layout elements.
The escape hatch is the type distinction, not the relaxation of constraints. The rules for input fields are unchanged — in fact, they're now clearer because the scope is explicit. The validator can enforce them more strictly on elements it knows are input fields.
An alternative escape hatch: allow elements to declare which primitives apply to them (required_primitives: ["label", "validation"]) with the default being all four. This is more flexible but harder to reason about — the type-based approach is cleaner when the element categories are stable.
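The type-based escape hatch can be sketched in a few lines: the required primitives become a function of element type, and the default preserves the original strictness for anything untyped. Field names follow the exercise; the structure is an assumption.

```python
# Required primitives per element type. Layout elements (spacers,
# dividers) carry no interaction contract, so they require nothing.
REQUIRED_BY_TYPE = {
    "input":  ["label", "validation", "help_text", "error_message"],
    "layout": [],
}

def validate_element(element: dict) -> list:
    """Enforce the template's primitives only on element types
    they apply to. Returns the list of missing required keys."""
    etype = element.get("type", "input")  # default keeps the original strictness
    return [k for k in REQUIRED_BY_TYPE[etype] if k not in element]

field = {"type": "input", "label": "Email", "validation": "email"}
spacer = {"type": "layout"}
print(validate_element(field))   # ['help_text', 'error_message']
print(validate_element(spacer))  # []
```

Note the escape hatch strengthens the contract for input fields (an incomplete input element is still rejected) while the spacer passes cleanly, rather than relaxing validation for everyone.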
The Meta-Pattern
Every technique in this seminar is an instance of one principle:
Make the implicit explicit.
Make the structural visible.
Make the enforced distinguishable from the requested.
Prompts are infrastructure. Engineer them like infrastructure.