Patterns

Six canonical patterns for LLM inference pipelines. Start with the simplest one that meets your requ

Pipeline Patterns

Six canonical patterns for LLM inference pipelines. Start with the simplest one that meets your requirements.

Decision Guide

Apply in order — stop at the first match:

Tool use + dynamic branching?
  → Embedded Agent

Explicit states + error recovery + compliance auditability?
  → State Machine

External document retrieval required?
  → RAG Pipeline

Runtime prompt assembly (multi-tenant, feature flags)?
  → Dynamic Prompt

Quality gate on output only (no multi-step pipeline)?
  → Eval Loop (standalone)

Everything else:
  → Simple Chain ← DEFAULT

Bias toward Simple Chain. It handles ≥70% of standard use cases. Complexity is a cost — in code, in ops, in debugging.

Pattern 1: Simple Chain

When to use: Single-responsibility tasks, high-volume inference, latency-sensitive paths, cost-constrained deployments.

Structure:

Input → [Prompt A] → LLM → Parse → [Prompt B] → LLM → Output

Generated artifacts:

`prompts/step-a.prompt.md`, `prompts/step-b.prompt.md`
`pipeline.config.yaml`
`src/pipeline.py` or `src/pipeline.ts`
`eval/cases.jsonl`, `eval/eval.py`
`cost-estimate.md`

Anti-patterns to avoid:

Adding agent loop "just in case"
Using sonnet when haiku passes eval at >85%
Adding framework dependencies not needed for the task

Pattern 2: Embedded Agent

When to use: Routing decisions, structured extraction with ambiguity, tool-gated tasks needing retry logic but not full autonomy.

The embedded agent is a component in a flow — not the flow itself. Key constraints:

≤5 tools
Bounded iterations (max_iterations required — no infinite loops)
Deterministic exit conditions
Falls back to `escalate` if exit condition not met

Structure:

Flow → [Agent: classify + route] → Flow
Flow → [Agent: extract structured data] → Flow

Generated artifacts:

`prompts/system.prompt.md` (scoped system prompt)
`tools/` (typed tool definitions)
`pipeline.config.yaml` (with `agent_config`)
`src/agent.py` or `src/agent.ts`

When NOT to use:

More than 5 tools needed → redesign as State Machine or pipeline steps
Unbounded iteration required → use Ralph loop (development) or State Machine (production)
Full autonomy needed → use AIWG-style agents, not embedded agent

Pattern 3: State Machine

When to use: Document processing pipelines, multi-stage classification, workflows with explicit retry/escalation logic, compliance-critical flows where state must be auditable.

Structure:

INIT → EXTRACT → VALIDATE → [PASS → ENRICH → OUTPUT] | [FAIL → RETRY(n) → ESCALATE]

Generated artifacts:

`fsm.config.yaml` (states, transitions, guards)
`prompts/` (one prompt file per LLM state)
`src/pipeline.py` or `src/pipeline.ts` (FSM runtime)
`audit/transitions.jsonl` (append-only audit log)

Key principles:

Every state has a defined type (`llm`, `transform`, `decision`, `terminal`, `escalate`)
Every transition has a guard condition
Terminal states have explicit outcomes: `accept`, `reject`, `escalate`
Max retries defined — no infinite loops
Audit log captures all transitions

When NOT to use:

Simple sequential steps with no branching → Simple Chain
Need for unbounded tool use → Embedded Agent or full agent system

Pattern 4: RAG Pipeline

When to use: Knowledge base Q&A, document-grounded generation, any case where the LLM needs external context it cannot have in the system prompt.

Structure:

Query → Embed → Retrieve(k) → Rerank(optional) → [Context + Query → Prompt] → LLM → Response

Generated artifacts:

`retrieval.config.yaml` (chunk size, overlap, k, embedding model)
`prompts/rag.prompt.md` (with `{{context}}` injection)
`src/retrieval.py`, `src/pipeline.py`
`eval/rag-eval.py` (RAGAS-compatible eval harness)

Key parameters:

`k` = number of chunks to retrieve (default: 5; increase if recall is low)
`chunk_size` = 512 words (default; decrease for precise retrieval)
`chunk_overlap` = 64 words (prevents context boundary splits)
`rerank` = false by default (enable if recall is important; adds latency)

When NOT to use:

"The context might be long" → use prompt caching on simple chain instead
Context fits in system prompt → use simple chain with direct injection
Knowledge changes per-request → dynamic prompt may be more appropriate

Pattern 5: Eval Loop

When to use: Quality gate over generated output where a single-pass generation isn't reliable enough. Standalone or composed with any other pattern.

Structure:

GENERATE(prompt, input)
  → output
  → EVAL(eval_prompt, input, output)   ← isolated call
    → {score, pass, feedback}
    → if pass: ACCEPT
    → if fail and attempts < max: REFINE(feedback) → GENERATE again
    → if fail and attempts >= max: ESCALATE

The most important property: strict isolation. The evaluator has NO knowledge of the generator's internals, chain-of-thought, or intermediate steps.

Generated artifacts:

`prompts/generator.prompt.md`
`prompts/evaluator.prompt.md` (separate file — never mixed with generator)
`eval/loop.py` or `eval/loop.ts` (configurable: max_attempts, pass_threshold)

Configuration:

`pass_threshold`: 0.85 (default)
`max_attempts`: 3 (default)
`eval_model`: haiku (cheaper than generator; sufficient for scoring)

Anti-patterns:

Evaluator and generator in the same prompt file → isolation violation
Evaluator receiving chain-of-thought or intermediate steps → isolation violation
Using the same model as generator for evaluation → increases cost with no benefit

Pattern 6: Dynamic Prompt

When to use: Personalized generation, multi-tenant prompts, feature-flagged prompt variants, systematic prompt iteration.

Structure:

Config + Context → PromptBuilder → Rendered Prompt → [Eval Loop] → Accepted Prompt

Generated artifacts:

`prompts/builder.config.yaml` (template blocks, variable schema)
`prompts/template.prompt.md.j2` (Jinja2 or Handlebars template)
`src/prompt_builder.py` or `src/prompt_builder.ts`
`eval/prompt-eval.py`

Key principle: The template is code. Version it, test it, review it. A/B testing prompt variants is an explicit use case — not a side effect.

Anti-Pattern Reference

Anti-Pattern	Signal	Recommended Pattern
Agentic overkill	"I need an agent that..." for single-step extraction	Simple Chain
Tool proliferation	>5 tools in embedded agent	State Machine or split pipeline
Framework cargo-cult	"We're using LangChain, so..."	Evaluate if load-bearing; default to clean stub
Missing eval	No mention of quality measurement	Add eval loop to any pattern
Eval contamination	Evaluator knows generator's reasoning	Strict isolation protocol
Infinite loop risk	No exit condition or max_iterations	Embedded Agent with bounds or State Machine
RAG for everything	"Context might be large"	Check if prompt caching handles it first