Rlm Agent Examples

Externalized from the agent definition per the few-shot-examples rule (#1587/#1600).

Recursive Language Model Agent — Worked Examples

These worked examples demonstrate the RLM execution patterns (keyword filtering,
structural chunking, recursive corpus analysis, incremental aggregation, and
model-prior-guided search) end-to-end. The agent definition retains the
capability descriptions and pattern catalog; the full input → output walkthroughs
live here.

Example 1: Simple — Search Large File for Specific Function

Task: Find the implementation of `validatePassword()` in a 3000-line auth module.

Context:

  • File: `src/auth/authentication.ts` (3000 lines)
  • User wants to understand password validation logic

RLM Approach:

THOUGHT [Goal]: I need to find validatePassword() in a large file.

THOUGHT [Research]: Instead of reading 3000 lines, I'll search for the function
definition first to get the line number.

ACTION: Grep
  pattern: "validatePassword"
  path: "src/auth/authentication.ts"
  output_mode: "content"
  -n: true

OBSERVATION:
  142: export function validatePassword(password: string): ValidationResult {
  143:   const minLength = 8;
  144:   const hasUpper = /[A-Z]/.test(password);
  ...
  168: }

THOUGHT [Extraction]: Function starts at line 142, likely ends around 168.

ACTION: Read
  file_path: "src/auth/authentication.ts"
  offset: 140
  limit: 35

OBSERVATION: [Full function implementation]

THOUGHT [Synthesis]: I now have the complete validatePassword() function.
Writing to intermediate result for reference.

ACTION: Write
  file_path: ".aiwg/rlm/tasks/task-001/final-result.md"
  content: |
    # validatePassword() Implementation

    Located at: src/auth/authentication.ts:142-168

    ## Function:

    [full function code]

    ## Summary:
    - Minimum 8 characters
    - Requires uppercase, lowercase, number, special character
    - Returns ValidationResult with error messages

COMPLETION: Task complete. Function analysis written to final-result.md.

Why This Is Good:

  • Used Grep to locate the function before reading (kept about 2,965 of the file's 3,000 lines out of context)
  • Read only relevant section (35 lines vs 3000)
  • Wrote final result to file for persistence
  • Clear completion signal

Example 2: Moderate — Analyze Multiple Files and Aggregate

Task: Find all database migration files and summarize their changes.

Context:

  • Directory: `db/migrations/` (unknown number of files)
  • User wants a changelog of all schema changes

RLM Approach:

THOUGHT [Goal]: I need to find all migration files and summarize their changes.

THOUGHT [Research]: First discover how many migration files exist, then analyze each.

ACTION: Glob
  pattern: "db/migrations/*.sql"

OBSERVATION:
  db/migrations/001-create-users.sql
  db/migrations/002-add-email-verification.sql
  db/migrations/003-create-sessions.sql
  db/migrations/004-add-user-roles.sql
  [12 total files]

THOUGHT [Reasoning]: 12 files is a manageable count, but reading them all directly would crowd my context.
I'll delegate per-file analysis to sub-agents and aggregate their results.

ACTION: Write
  file_path: ".aiwg/rlm/tasks/task-002/query-plan.md"
  content: |
    # Migration Analysis Plan

    Files to analyze: 12
    Strategy: Delegate per-file to sub-agents
    Aggregation: Collect summaries into changelog

ACTION: Task (parallel)
  [Spawn 12 sub-agents in parallel]

  Sub-agent 1: "Summarize database changes in db/migrations/001-create-users.sql.
                Write result to .aiwg/rlm/tasks/task-002/intermediate/summary-001.md"

  Sub-agent 2: "Summarize database changes in db/migrations/002-add-email-verification.sql.
                Write result to .aiwg/rlm/tasks/task-002/intermediate/summary-002.md"

  [... 10 more sub-agents ...]

OBSERVATION: All sub-agents completed successfully.
  [12 summary files written to intermediate/]

THOUGHT [Extraction]: Sub-agents have analyzed each migration. Now aggregate.

ACTION: Bash
  command: "cat .aiwg/rlm/tasks/task-002/intermediate/summary-*.md > .aiwg/rlm/tasks/task-002/intermediate/all-summaries.md"

OBSERVATION: Aggregated file created (2.1KB).

ACTION: Read
  file_path: ".aiwg/rlm/tasks/task-002/intermediate/all-summaries.md"

OBSERVATION: [All 12 summaries]

THOUGHT [Synthesis]: I now have all migration summaries. Creating final changelog.

ACTION: Write
  file_path: ".aiwg/rlm/tasks/task-002/final-result.md"
  content: |
    # Database Migration Changelog

    ## Summary
    12 migrations applied from 2024-01 to 2025-12

    ## Changes by Migration

    ### 001 - Create Users (2024-01-15)
    - Created `users` table
    - Columns: id, email, password_hash, created_at

    ### 002 - Add Email Verification (2024-02-10)
    - Added `email_verified` boolean to users
    - Added `verification_token` column

    [... remaining 10 migrations ...]

    ## Schema Impact
    - 5 new tables created
    - 8 columns added to existing tables
    - 2 indexes created for performance

COMPLETION: Migration changelog complete. Final result written to final-result.md.

Why This Is Good:

  • Used Glob to discover files (no guessing)
  • Delegated to parallel sub-agents (12 independent analyses)
  • Sub-agents wrote structured output to intermediate files
  • Root agent aggregated efficiently (read the intermediate files instead of re-analyzing the migrations)
  • Clear completion with comprehensive summary

Example 3: Complex — Recursive Corpus Analysis with Sub-Agent Delegation

Task: Analyze the AIWG research corpus to find all papers discussing "agent memory" and synthesize their key approaches.

Context:

  • Corpus: `.aiwg/research/sources/` (89 papers)
  • User wants comparative analysis of memory systems across research

RLM Approach:

THOUGHT [Goal]: Find all papers discussing agent memory and synthesize approaches.

THOUGHT [Research]: 89 papers is too many to process directly. I'll use a two-phase
approach: (1) Filter to relevant papers, (2) Deep analysis of filtered set.

ACTION: Bash
  command: "ls .aiwg/research/sources/ | wc -l"

OBSERVATION: 89

THOUGHT [Reasoning]: Step 1 - Filter corpus to memory-relevant papers using keyword search.

ACTION: Grep
  pattern: "memory|episodic|semantic|reflexion|recall"
  path: ".aiwg/research/sources/"
  glob: "*.md"
  output_mode: "files_with_matches"

OBSERVATION:
  .aiwg/research/sources/REF-021-reflexion.md
  .aiwg/research/sources/REF-035-memgpt.md
  .aiwg/research/sources/REF-041-generative-agents.md
  .aiwg/research/sources/REF-057-agent-laboratory.md
  .aiwg/research/sources/REF-089-recursive-language-models.md
  [12 total papers]

THOUGHT [Progress]: Filtered from 89 to 12 papers. Now write the filtered list.

ACTION: Write
  file_path: ".aiwg/rlm/tasks/task-003/intermediate/memory-relevant-papers.txt"
  content: |
    REF-021-reflexion.md
    REF-035-memgpt.md
    REF-041-generative-agents.md
    REF-057-agent-laboratory.md
    REF-089-recursive-language-models.md
    [... 7 more papers]

THOUGHT [Reasoning]: 12 papers. Delegate deep analysis to sub-agents (one per paper).
Each sub-agent will extract the memory approach and key findings.

ACTION: Task (parallel)
  [Spawn 12 sub-agents]

  Sub-agent 1 (Depth 1):
    Task: "Analyze @.aiwg/research/sources/REF-021-reflexion.md
           Extract: (1) Type of memory system, (2) Key mechanism, (3) Performance impact
           Write: .aiwg/rlm/tasks/task-003/intermediate/analysis-REF-021.yaml"

  Sub-agent 2 (Depth 1):
    Task: "Analyze @.aiwg/research/sources/REF-035-memgpt.md
           [same extraction criteria]
           Write: .aiwg/rlm/tasks/task-003/intermediate/analysis-REF-035.yaml"

  [... 10 more sub-agents ...]

OBSERVATION: All 12 sub-agents completed.
  Sub-agent 1 wrote: analysis-REF-021.yaml (3.2KB)
  Sub-agent 2 wrote: analysis-REF-035.yaml (2.8KB)
  [... 10 more ...]

THOUGHT [Extraction]: Sub-agents have extracted structured data. Now aggregate.

ACTION: Bash
  command: |
    for f in .aiwg/rlm/tasks/task-003/intermediate/analysis-*.yaml; do
      echo "---"
      cat "$f"
    done > .aiwg/rlm/tasks/task-003/intermediate/all-analyses.yaml

OBSERVATION: Aggregated YAML created (34KB).

ACTION: Read
  file_path: ".aiwg/rlm/tasks/task-003/intermediate/all-analyses.yaml"

OBSERVATION: [12 structured analyses]

THOUGHT [Synthesis]: I now have all memory approaches. Synthesizing into comparative analysis.

ACTION: Write
  file_path: ".aiwg/rlm/tasks/task-003/final-result.md"
  content: |
    # Agent Memory Systems: Comparative Analysis

    ## Papers Analyzed
    12 papers from AIWG research corpus (filtered from 89)

    ## Memory System Types

    ### Episodic Memory (4 papers)
    - **Reflexion** (REF-021): Sliding window of past task attempts
      - Stores: trajectory, reward, reflection
      - Window size: Ω=1-5 (task-dependent)
      - Impact: +34% success rate on HumanEval

    - **Agent Laboratory** (REF-057): Draft-then-edit memory
      - Stores: previous drafts, human feedback
      - Window: Last 3 iterations
      - Impact: 84% cost reduction vs autonomous

    [... 2 more episodic systems ...]

    ### Working Memory (3 papers)
    - **MemGPT** (REF-035): Hierarchical context management
      - Stores: Core context + archival
      - Paging: Move between tiers
      - Impact: 10x context capacity

    [... 2 more working memory systems ...]

    ### Semantic Memory (2 papers)
    - **Generative Agents** (REF-041): Retrieval-based memory
      - Stores: Embeddings of past interactions
      - Retrieval: Similarity search
      - Impact: Consistent long-term behavior

    [... 1 more semantic system ...]

    ### Hybrid Systems (3 papers)
    - **RLM** (REF-089): External environment as memory
      - Stores: REPL state variables
      - Access: Programmatic via code
      - Impact: Lossless, scales unbounded

    [... 2 more hybrid systems ...]

    ## Key Findings

    | Memory Type | Best For | Cost | Complexity |
    |-------------|----------|------|------------|
    | Episodic | Iterative tasks | Low | Low |
    | Working | Long conversations | Medium | Medium |
    | Semantic | Long-term consistency | High | High |
    | Hybrid | Large-scale analysis | Variable | Medium |

    ## Recommendations for AIWG

    1. **Agent loops**: Episodic memory (Reflexion pattern) - already implemented
    2. **Long sessions**: Working memory (MemGPT-style paging) - planned
    3. **Cross-session**: Semantic memory (embeddings) - future work
    4. **Large corpora**: Hybrid (RLM pattern) - this agent!

    ## References

    All 12 papers analyzed:
    - @.aiwg/research/sources/REF-021-reflexion.md
    - @.aiwg/research/sources/REF-035-memgpt.md
    [... full list ...]

    ## Detailed Analyses

    Individual paper analyses available at:
    .aiwg/rlm/tasks/task-003/intermediate/analysis-*.yaml

COMPLETION: Agent memory comparative analysis complete.
  - 89 papers screened
  - 12 papers deeply analyzed
  - Synthesis written to final-result.md

Why This Is Good:

  • Two-phase approach: filter, then analyze (skipped deep analysis of 77 irrelevant papers)
  • Used Grep to identify relevant subset before deep analysis
  • Delegated per-paper analysis to parallel sub-agents (12 independent tasks)
  • Sub-agents wrote structured YAML for machine-readable aggregation
  • Root agent synthesized high-level comparative analysis
  • All intermediate data preserved for future reference
  • Clear completion with actionable recommendations

Key RLM Patterns Demonstrated:

1. Keyword filtering: Used Grep to narrow corpus from 89 to 12

2. Recursive delegation: Root agent spawned 12 sub-agents (depth 1)

3. Structured output: Sub-agents wrote YAML for easy aggregation

4. Incremental aggregation: Collected all analyses before synthesis

5. Model priors: Used domain knowledge (memory keywords) to guide search

6. Explicit completion: Clear COMPLETION signal with summary

RLM-Specific Pattern Walkthroughs

These are the worked command sequences for the five RLM strategy patterns. The
agent definition keeps the strategy descriptions; the demonstrative command
sequences live here.

Pattern 1: Keyword Filtering Before Reading

Problem: Need to find authentication logic in a 5000-line file.

RLM Solution:

# Step 1: Find relevant line numbers
grep -n "authenticate\|login\|auth" src/large-file.ts

# Step 2: Read only relevant sections (±20 lines of context)
# If line 142 matched, read lines 122-162

Anti-Pattern: Reading the entire file into context.
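
The two steps above can be sketched in Python. This is a minimal illustration, not the agent's actual tooling: `matching_windows` is a hypothetical helper that returns the line ranges to read, using the ±20 lines of context mentioned in Step 2 and merging ranges that overlap so no line is read twice. The script itself scans the file on disk; the saving is that only the returned ranges need to enter the model's context.

```python
import re
from pathlib import Path


def matching_windows(path, pattern, context=20):
    """Return merged (start, end) 1-based line ranges around each regex match."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    regex = re.compile(pattern)
    windows = []
    for number, line in enumerate(lines, start=1):
        if not regex.search(line):
            continue
        start = max(1, number - context)
        end = min(len(lines), number + context)
        if windows and start <= windows[-1][1] + 1:
            # Overlapping or adjacent windows are merged into one read.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows
```

For a file whose only match is on line 142, `matching_windows("src/large-file.ts", r"authenticate|login|auth")` yields the single range 122-162, matching the comment in Step 2.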

Pattern 2: Structural Chunking

Problem: Analyze all functions in a module.

RLM Solution:

# Step 1: Extract function names
grep -E "^(export )?function \w+|^(export )?(const|let) \w+ = " src/module.ts

# Step 2: Delegate per-function analysis (one Task per function found in Step 1)
Task("Analyze function authenticateUser()")
Task("Analyze function validateToken()")
...

# Step 3: Aggregate results
Write intermediate/{function-name}-analysis.md for each
Synthesize into final-module-analysis.md
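
A rough Python equivalent of Steps 1 and 2, assuming the module source is already available as a string. The regular expression mirrors the grep pattern above; `function_names` and `analysis_tasks` are hypothetical helper names, and the returned prompts only stand in for real `Task(...)` calls.

```python
import re

# Mirrors the grep pattern above: top-level `function` declarations and
# `const`/`let` bindings, each optionally exported.
DECLARATION = re.compile(
    r"^(?:export )?function (\w+)|^(?:export )?(?:const|let) (\w+) = ",
    re.MULTILINE,
)


def function_names(source):
    """Return declared names in source order."""
    return [m.group(1) or m.group(2) for m in DECLARATION.finditer(source)]


def analysis_tasks(source):
    """Build one hypothetical sub-agent prompt per declaration."""
    return [f"Analyze function {name}()" for name in function_names(source)]
```

Like the grep pattern it copies, this also matches plain value bindings such as `let x = 1`, so the chunk list may need a second filtering pass before delegation.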

Pattern 3: Recursive Corpus Analysis

Problem: Analyze 100 research papers for a specific claim.

RLM Solution:

# Root agent (you):
1. Glob for all papers: .aiwg/research/sources/*.pdf
2. Spawn sub-agent per paper: Task("Extract key claims from {paper}")
3. Sub-agents write: intermediate/claims-{paper-id}.yaml
4. Root aggregates: Read all intermediate/*.yaml → synthesize

# Sub-agents (depth 1):
1. Receive single paper path
2. Search for keywords
3. Extract relevant passages
4. Write structured claims YAML
5. DONE (no further recursion needed)
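
The root/sub-agent split above can be sketched as a fan-out followed by a file-based fan-in. This is an illustration under stated assumptions: `analyze_one` is a hypothetical stand-in for a depth-1 sub-agent call, a thread pool stands in for parallel Task execution, and results are stored as JSON rather than YAML only to stay within the Python standard library.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def analyze_corpus(paper_paths, analyze_one, out_dir, max_workers=4):
    """Fan out one analysis per paper, persist each result, then aggregate.

    `analyze_one` is a hypothetical stand-in for a depth-1 sub-agent call:
    it takes a paper path and returns a JSON-serializable dict of claims.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [Path(p) for p in paper_paths]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(analyze_one, paths))

    for path, claims in zip(paths, results):
        # One intermediate file per paper, named after the paper id.
        (out_dir / f"claims-{path.stem}.json").write_text(json.dumps(claims))

    # Root step: read the intermediate files back instead of trusting memory.
    return {
        f.stem.removeprefix("claims-"): json.loads(f.read_text())
        for f in sorted(out_dir.glob("claims-*.json"))
    }
```

Writing each result to disk before aggregating means a failed or interrupted run can be resumed from the intermediate files. `str.removeprefix` requires Python 3.9 or later.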

Pattern 4: Incremental Aggregation

Problem: Find all API endpoints across a codebase.

RLM Solution:

# Step 1: Discover route files
glob "**/*route*.{ts,js}" → intermediate/route-files.txt

# Step 2: Extract endpoints per file
For each file in route-files.txt:
  grep -E "router\.(get|post|put|delete)" {file} → intermediate/endpoints-{file}.txt

# Step 3: Aggregate
cat intermediate/endpoints-*.txt → intermediate/all-endpoints.txt

# Step 4: Deduplicate and structure
Parse intermediate/all-endpoints.txt → Write final-api-inventory.json
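
Steps 2 through 4 can be sketched in Python as follows. The regular expression extends the grep pattern above to also capture the route string, which assumes Express-style calls with a string-literal path as the first argument; `endpoints_in` and `build_inventory` are hypothetical helper names, and the JSON record shape is an assumption rather than a defined schema.

```python
import json
import re
from pathlib import Path

# Assumes calls of the form router.get('/path', handler).
ROUTE_CALL = re.compile(r"""router\.(get|post|put|delete)\(\s*['"]([^'"]+)['"]""")


def endpoints_in(path):
    """Return (METHOD, route) pairs found in one route file."""
    text = Path(path).read_text(encoding="utf-8")
    return [(m.group(1).upper(), m.group(2)) for m in ROUTE_CALL.finditer(text)]


def build_inventory(route_files, out_path):
    """Aggregate per-file endpoints, deduplicate, and write a JSON inventory."""
    seen = set()
    inventory = []
    for path in sorted(str(p) for p in route_files):
        for method, route in endpoints_in(path):
            if (method, route) in seen:
                continue  # Keep only the first file that declares the endpoint.
            seen.add((method, route))
            inventory.append({"method": method, "route": route, "file": path})
    Path(out_path).write_text(json.dumps(inventory, indent=2))
    return inventory
```

Processing one file at a time keeps each step small; only the deduplicated inventory needs to be read back for synthesis.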

Pattern 5: Model-Prior-Guided Search

Problem: Find where database transactions are handled.

RLM Solution:

# Use domain knowledge to narrow search BEFORE reading
# Likely locations: repositories, services, database modules

# Step 1: Search likely paths first
grep -r "transaction\|BEGIN\|COMMIT" src/repositories/ src/services/ src/db/

# Step 2: If found, read those files
# Step 3: If not found, expand search
grep -r "transaction\|BEGIN\|COMMIT" src/

Key Insight: Use your prior knowledge about code organization to guide the search. Don't search exhaustively when domain priors exist.
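
The narrow-then-widen search can be sketched in Python. This is a minimal illustration: `files_matching` and `prior_guided_search` are hypothetical helpers, and the second element of the return value simply reports whether the prior was sufficient or the search had to expand.

```python
import re
from pathlib import Path


def files_matching(root, regex):
    """Return sorted paths of files under `root` whose text matches `regex`."""
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and regex.search(path.read_text(errors="ignore")):
            hits.append(path)
    return hits


def prior_guided_search(project_root, pattern, likely_dirs):
    """Search the likely directories first; widen to the whole tree on a miss."""
    regex = re.compile(pattern)
    project_root = Path(project_root)
    hits = []
    for rel in likely_dirs:
        hits.extend(files_matching(project_root / rel, regex))
    if hits:
        return hits, "prior"
    return files_matching(project_root, regex), "expanded"
```

A call such as `prior_guided_search("src", r"transaction|BEGIN|COMMIT", ["repositories", "services", "db"])` corresponds to Steps 1 and 3 above.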

Output Format Templates

During Execution

─────────────────────────────────────────
RLM Agent: {task-id}
─────────────────────────────────────────

Phase: DISCOVERY
- Scanning: {directory/file}
- Found: {N} relevant files/sections

Phase: DECOMPOSITION
- Strategy: {by_structure | by_keyword | by_file}
- Sub-calls: {N} parallel tasks

Phase: AGGREGATION
- Collected: {N} intermediate results
- Aggregating: {approach}

Phase: SYNTHESIS
- Synthesizing final result
- Writing: {completion-artifact}

Completed: {timestamp}
─────────────────────────────────────────

On Completion

═══════════════════════════════════════════
RLM Agent: COMPLETE
═══════════════════════════════════════════

Task: {task-description}
Status: SUCCESS

Execution Summary:
- Files analyzed: {N}
- Sub-agents spawned: {M}
- Recursion depth: {D}
- Duration: {time}

Cost Metrics:
- Total tokens: {tokens}
- Sub-call tokens: {sub-tokens}
- Cost vs baseline: {percentage}

Artifacts:
- Query plan: {path}/query-plan.md
- Intermediate results: {path}/intermediate/ ({N} files)
- Final result: {path}/final-result.md

═══════════════════════════════════════════

Execution-Pattern Flow Diagram

The environment-first loop, drawn out:

┌─────────────────────────────────────────┐
│         RLM EXECUTION PATTERN           │
├─────────────────────────────────────────┤
│                                         │
│  ┌──────────────┐                      │
│  │ Identify     │                      │
│  │ What to Know │                      │
│  └──────┬───────┘                      │
│         │                               │
│         ▼                               │
│  ┌──────────────┐                      │
│  │ Write Code   │ ◀─ Grep/Glob/Read    │
│  │ to Query     │    with line ranges  │
│  └──────┬───────┘                      │
│         │                               │
│         ▼                               │
│  ┌──────────────┐                      │
│  │ Execute &    │                      │
│  │ Observe      │                      │
│  └──────┬───────┘                      │
│         │                               │
│    ┌────┴────┐                         │
│    │ Enough? │                         │
│    └────┬────┘                         │
│         │                               │
│     NO  │        YES                    │
│         │         │                     │
│    ┌────▼────┐   ▼                     │
│    │ Recurse │  Set                    │
│    │ Deeper  │  Completion             │
│    └─────────┘   State                 │
│         │         │                     │
│         │    ┌────▼────┐               │
│         └───▶│ DONE    │               │
│              └─────────┘               │
│                                         │
└─────────────────────────────────────────┘
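
One way to read the diagram is as a bounded loop. The sketch below is an interpretation, not the agent's implementation: `query`, `is_enough`, and `refine` are hypothetical hooks supplied by the caller, and the depth limit is an assumption borrowed from the `max_depth` setting in the configuration samples.

```python
def rlm_loop(question, query, is_enough, refine, max_depth=5):
    """Run query -> observe -> decide until enough is known or depth runs out.

    The three callables are hypothetical hooks:
      query(question)               -> observation
      is_enough(observations)       -> bool
      refine(question, observation) -> narrower follow-up question
    """
    observations = []
    for depth in range(max_depth + 1):
        observations.append(query(question))      # Execute & Observe
        if is_enough(observations):                # Enough? YES
            return {"status": "done", "depth": depth, "observations": observations}
        question = refine(question, observations[-1])  # Enough? NO: recurse deeper
    return {"status": "depth_limit", "depth": max_depth, "observations": observations}
```

Returning an explicit status mirrors the "Set Completion State" box: the caller can tell a finished task from one that stopped at the depth limit.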

State Directory Layout

RLM agents maintain explicit state on the filesystem:

.aiwg/rlm/tasks/{task-id}/
├── query-plan.md          # Task decomposition plan
├── intermediate/          # Named intermediate results
│   ├── search-results.txt
│   ├── filtered-files.json
│   └── aggregated-data.yaml
├── sub-calls/             # Delegated sub-tasks
│   ├── analyze-module-a.md
│   └── analyze-module-b.md
└── final-result.md        # Completion artifact

Key Principle: If an intermediate result might be useful later, write it to a file. Don't rely on context memory.
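
A small sketch of how that layout might be created and used. `init_task_dir` and `save_intermediate` are hypothetical helpers, and the placeholder text written to `query-plan.md` is an assumption for illustration.

```python
from pathlib import Path


def init_task_dir(task_id, root=".aiwg/rlm/tasks"):
    """Create the per-task layout and return the task directory."""
    task_dir = Path(root) / task_id
    for sub in ("intermediate", "sub-calls"):
        (task_dir / sub).mkdir(parents=True, exist_ok=True)
    plan = task_dir / "query-plan.md"
    if not plan.exists():
        plan.write_text("# Query Plan\n")  # Placeholder; never overwrites a real plan.
    return task_dir


def save_intermediate(task_dir, name, text):
    """Persist a named intermediate result so it outlives the context window."""
    path = Path(task_dir) / "intermediate" / name
    path.write_text(text)
    return path
```

Both helpers are idempotent with respect to directory creation, so a resumed task can call `init_task_dir` again without losing earlier results.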

Configuration Samples

Basic Configuration

rlm_config:
  max_depth: 5                    # Maximum recursion depth
  max_sub_calls: 20               # Maximum sub-agents per task
  sub_model: "sonnet"             # Model for sub-agents (default: same as parent)
  parallel_sub_calls: true        # Allow parallel Task execution
  intermediate_dir: ".aiwg/rlm/tasks/{task-id}/intermediate/"
  completion_artifact: "final-result.md"

Advanced Configuration

advanced_rlm_config:
  chunk_strategy: "auto"          # auto | by_function | by_section | fixed_size
  chunk_size: 1000                # Lines per chunk (if fixed_size)
  cache_intermediate: true        # Reuse intermediate results
  cost_tracking: true             # Track token costs per sub-call
  timeout_per_subcall: 300        # Seconds (5 minutes default)
  fallback_on_depth_limit: true   # Warn instead of error at max depth
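
How these limits are enforced is not specified here, so the following is only one plausible reading: cap the number of sub-agents at `max_sub_calls`, and at `max_depth` either warn and stop recursing or raise, depending on `fallback_on_depth_limit`. `RlmConfig` and `allowed_sub_calls` are hypothetical names, and the dataclass combines fields from both sample blocks for brevity.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class RlmConfig:
    # Defaults copied from the sample configuration above.
    max_depth: int = 5
    max_sub_calls: int = 20
    fallback_on_depth_limit: bool = True


def allowed_sub_calls(config, depth, requested):
    """Return how many of `requested` sub-agents may be spawned at `depth`."""
    if depth >= config.max_depth:
        message = f"max_depth {config.max_depth} reached; not recursing further"
        if config.fallback_on_depth_limit:
            warnings.warn(message)  # Assumed meaning of "warn instead of error".
            return 0
        raise RuntimeError(message)
    return min(requested, config.max_sub_calls)
```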

Cost Model Table

Based on REF-089 research findings:

| Metric | Direct Processing | RLM Pattern |
|--------|-------------------|-------------|
| Median cost | 1.0x (baseline) | 0.8-1.2x |
| Cost variance | Low | Moderate (some outliers 3x+) |
| When cheaper | Short contexts | Long contexts, sparse access |
| When expensive | Long contexts | Inefficient decomposition |

Key Insight: RLM is up to 3x cheaper than summarization agents when context access is selective. Cost depends on decomposition quality.
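
The multipliers in this section are ratios of token spend to a baseline. As a minimal sketch, the "Cost vs baseline" figure in the completion template could be computed as below; `cost_vs_baseline` is a hypothetical helper, and the token counts in the usage note are made-up illustrative numbers, not measurements.

```python
def cost_vs_baseline(rlm_tokens, baseline_tokens):
    """Return RLM token spend as a fraction of the baseline's token spend."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return rlm_tokens / baseline_tokens
```

For example, a run that spends 40,000 tokens against a 120,000-token baseline has a ratio of about 0.33, which is what "3x cheaper" means; a ratio between 0.8 and 1.2 falls in the median band from the table.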

Research Foundation Quotes

Verbatim quotes from REF-089: Recursive Language Models (Zhang et al., 2026):

"The key insight is that arbitrarily long user prompts should not be fed into the neural network directly but should instead be treated as part of the environment that the LLM is tasked to symbolically and recursively interact with."

"Compared to the summarization agent which ingests the entire input context, RLMs are up to 3× cheaper while maintaining stronger performance across all tasks because the RLM is able to selectively view context."

"Even without explicit training, RLMs exhibit interesting context decomposition and problem decomposition behavior."

Comparison Tables

RLM vs Context Compaction

| Dimension | Context Compaction | RLM Pattern |
|-----------|--------------------|-------------|
| Information loss | Lossy (summarized) | Lossless (original preserved) |
| Access pattern | Sequential | Random access via code |
| Cost | Fixed | Variable (sub-call dependent) |
| Scale ceiling | Limited by compressed size | Unbounded (recursive) |
| Best for | Short/medium contexts | Long/information-dense contexts |

RLM vs RAG

| Dimension | RAG | RLM Pattern |
|-----------|-----|-------------|
| Retrieval | Pre-computed embeddings | Dynamic, code-driven |
| Flexibility | Fixed strategy | Adaptive per query |
| Multi-hop | Difficult | Natural (recursive) |
| Setup cost | High (indexing) | Zero (no preprocessing) |
| Best for | Known patterns, stable corpora | Ad-hoc analysis, changing data |