Productionization
The aiwg nlp productionize command reviews a pipeline for production readiness and generates a harde
Productionization Guide
The `aiwg nlp productionize` command reviews a pipeline for production readiness and generates a hardened `prod/` version. This document explains the checklist, what gets added, and what gets removed.
The Production Checklist
Before generating production artifacts, `productionize` runs this checklist:
Prompts
- [ ] All prompt files exist with version headers
- [ ] Evaluator prompt is a separate file
- [ ] System prompt is ≤2000 tokens (flag if larger)
- [ ] Output format is explicitly specified
Eval
- [ ] `eval/cases.jsonl` exists with ≥5 test cases
- [ ] A recent eval run exists (`eval/results.jsonl`)
- [ ] Most recent pass rate ≥85% (configurable threshold)
- [ ] If pass rate <85%: block productionization until quality gate passes
Code
- [ ] Timeout handling on all LLM calls
- [ ] Retry wrapper for rate limits (429) and transient errors (502, 503)
- [ ] Structured output validation (Pydantic/Zod schema)
- [ ] Token budget enforcement (max_tokens set and enforced at call site)
- [ ] No verbose dev logging in hot path
Cost
- [ ] `cost-model.yaml` exists
- [ ] Cost per call is within `warn_above_usd` threshold
What Gets Added
Timeout handling
# Before (dev)
response = client.messages.create(model=MODEL, ...)
# After (prod)
response = client.messages.create(model=MODEL, ..., timeout=TIMEOUT_SECONDS)
Retry wrapper
# Added automatically:
for attempt in range(MAX_RETRIES):
try:
return client.messages.create(...)
except anthropic.RateLimitError:
time.sleep(BACKOFF_SECONDS * (2 ** attempt))
except anthropic.APIStatusError as e:
if e.status_code in {502, 503}:
time.sleep(BACKOFF_SECONDS * (2 ** attempt))
else:
raise
Structured output validation (Python)
# Added automatically using Pydantic:
from pydantic import BaseModel, ValidationError
class PipelineOutput(BaseModel):
field_one: str | None
field_two: str | None
try:
validated = PipelineOutput.model_validate(json.loads(raw))
except ValidationError as e:
raise ValueError(f"Output schema validation failed: {e}") from e
Token budget enforcement
# max_tokens from pipeline.config.yaml enforced at call site:
response = client.messages.create(
model=MODEL,
max_tokens=pipeline_config["steps"][0]["max_tokens"], # enforced
...
)
Cost cap guard
# Abort if estimated cost exceeds warn_above_usd:
estimated_cost = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1000
if estimated_cost > COST_WARN_THRESHOLD:
raise CostGuardError(f"Estimated call cost ${estimated_cost:.4f} exceeds threshold")
What Gets Removed
Framework boilerplate
If LangChain or LangGraph is detected and the dependency is not load-bearing (i.e., the pipeline doesn't use graph traversal, LCEL composition, or LangSmith tracing), it gets replaced with direct SDK calls.
Why: LangChain adds ~180ms cold start, 40+ transitive dependencies, and an abstraction layer that makes debugging harder. For most inference pipelines, the Anthropic SDK is sufficient.
Replacement pattern:
# Before (with langchain)
from langchain_anthropic import ChatAnthropic
chain = ChatAnthropic(model=MODEL) | StructuredOutputParser.from_response_schemas(schemas)
result = chain.invoke({"input": input_text})
# After (direct SDK)
client = anthropic.Anthropic()
response = client.messages.create(model=MODEL, ..., system=system, messages=[...])
output = PipelineOutput.model_validate(json.loads(response.content[0].text))
Dev-only logging
# Removed from prod/:
print(f"DEBUG: raw response = {raw}")
logger.debug(f"Prompt tokens: {response.usage.input_tokens}")
Structured logging (stderr, machine-readable) is added in its place.
The `prod/` Directory
After productionization:
prod/
├── prompts/ # Copied from dev (hardened if changes were made)
├── src/
│ └── pipeline.py # Hardened: timeouts, retries, validation, no framework
├── Dockerfile # Minimal container
├── cost-model.yaml # From cost analysis
└── README.md # This ops runbook
The `dev/` pipeline is unchanged. `prod/` is a separate hardened version.
Productionization Criteria
A pipeline is production-ready when:
1. Eval pass rate ≥ configured threshold (default 85%)
2. All timeout and retry handling is in place
3. Structured output validation enforced at call site
4. No framework dependencies that aren't load-bearing
5. `cost-model.yaml` is current and cost is within budget
6. Ops runbook (`prod/README.md`) documents start/stop/rollback/monitoring
When NOT to Productionize
- Pass rate <85%: fix the prompts first
- No test cases: write eval cases first
- Eval run is >7 days old: re-run eval to confirm quality
- Cost model shows cost spike: investigate before deploying