Architecture Designer ToT Protocol: Worked Examples

Externalized from the agent definition per the few-shot-examples rule (#1587).


Phase 1: Criteria Definition — Example

## Evaluation Criteria

| Criterion | Weight | Description | Source NFR |
|-----------|--------|-------------|------------|
| Performance | 30% | Sub-100ms API response times | @.aiwg/requirements/nfr-modules/performance.md |
| Scalability | 25% | Handle 10x growth without redesign | @.aiwg/requirements/nfr-modules/scalability.md |
| Maintainability | 20% | Team can modify without specialist help | @.aiwg/requirements/nfr-modules/maintainability.md |
| Security | 15% | OWASP Top 10 mitigation | @.aiwg/requirements/nfr-modules/security.md |
| Cost | 10% | Operational costs <$5K/month | Budget constraint |

Minimum acceptable score: 65/100 (weighted total)
Critical criterion: Security must score at least 8/10 (pass/fail)
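Encoding the criteria as data makes it possible to check the weights before any option is scored. The following is a minimal sketch, assuming weights are stored as fractions of 1.0. The names `CRITERIA`, `MIN_SCORE`, `CRITICAL` and `validate_criteria` are hypothetical and not part of any existing AIWG tooling.

```python
import math

# Hypothetical encoding of the criteria table above; weights are fractions of 1.0.
CRITERIA = {
    "performance": 0.30,
    "scalability": 0.25,
    "maintainability": 0.20,
    "security": 0.15,
    "cost": 0.10,
}
MIN_SCORE = 65.0            # minimum acceptable weighted total, out of 100
CRITICAL = {"security": 8}  # pass/fail gates on the 0-10 scale

def validate_criteria(criteria, critical):
    """Raise ValueError if weights do not sum to 100% or a gate names an unknown criterion."""
    total = sum(criteria.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"weights sum to {total:.2f}, expected 1.00")
    unknown = sorted(set(critical) - set(criteria))
    if unknown:
        raise ValueError(f"critical gates reference unknown criteria: {unknown}")

validate_criteria(CRITERIA, CRITICAL)  # passes silently for the table above
```

`math.isclose` is used rather than `==` because the floating-point sum of the weights may differ from 1.0 by rounding error.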

Phase 2: Alternative Generation — Example

## Options Considered

### Option 1: Microservices with REST APIs
- Service mesh (Istio) for inter-service communication
- Kubernetes orchestration
- PostgreSQL per-service databases
- Event bus for async workflows (Kafka)

### Option 2: Modular Monolith
- Single deployable with clear module boundaries
- Shared PostgreSQL with schema-per-module
- In-process event bus
- Future extraction path to microservices

### Option 3: Serverless Functions (AWS Lambda)
- API Gateway + Lambda functions
- DynamoDB for data persistence
- EventBridge for async workflows
- Fully managed, auto-scaling

### Option 4: Hybrid Microservices + Monolith
- Core domain as microservices (high-change areas)
- Stable features as modular monolith
- Shared API gateway
- Mixed persistence (PostgreSQL + DynamoDB)

### Option 5: Service-Oriented Architecture (SOA)
- Coarse-grained services (larger than microservices)
- ESB for orchestration
- Centralized PostgreSQL with replication
- Strong versioning and contracts

Phase 3: Systematic Evaluation — Example

### Option 1: Microservices with REST APIs

#### Evaluation

| Criterion | Score (0-10) | Rationale |
|-----------|--------------|-----------|
| Performance | 7 | Good latency with caching. Network hops add 10-20ms. Service mesh overhead 5ms. |
| Scalability | 9 | Kubernetes auto-scaling, horizontal pod scaling. Well-proven at scale. |
| Maintainability | 6 | High operational complexity. Requires DevOps expertise. Distributed debugging difficult. |
| Security | 8 | Service mesh provides mTLS. RBAC at service level. Attack surface larger than monolith. |
| Cost | 5 | High infrastructure costs. Kubernetes cluster overhead. Multiple databases expensive. |

**Weighted Score:** (7×0.30) + (9×0.25) + (6×0.20) + (8×0.15) + (5×0.10) = 7.25 × 10 = 72.5/100

**Critical Criteria Check:**
- [x] Security (8/10) → PASS (threshold: 8+)

#### Pros
- Independent scaling of services
- Technology diversity possible
- Team autonomy (service ownership)
- Fault isolation between services

#### Cons
- High operational complexity
- Distributed system challenges (debugging, monitoring)
- Network latency between services
- Higher infrastructure costs

#### Risks
- **Risk:** Team lacks microservices expertise
  **Mitigation:** Training program, hire experienced SRE, start with 2-3 services only
- **Risk:** Over-engineering for current scale (1K users)
  **Mitigation:** Begin with modular monolith, extract services later when needed

Phase 4: Comparison and Selection — Example

## Options Comparison Matrix

| Option | Perf (30%) | Scale (25%) | Maint (20%) | Sec (15%) | Cost (10%) | **Total** | Critical Pass? |
|--------|------------|-------------|-------------|-----------|------------|-----------|----------------|
| 1. Microservices REST | 7 (2.1) | 9 (2.25) | 6 (1.2) | 8 (1.2) | 5 (0.5) | **72.5** | Yes |
| 2. Modular Monolith | 8 (2.4) | 6 (1.5) | 8 (1.6) | 8 (1.2) | 8 (0.8) | **75.0** | Yes |
| 3. Serverless Lambda | 9 (2.7) | 10 (2.5) | 7 (1.4) | 8 (1.2) | 7 (0.7) | **85.0** | Yes |
| 4. Hybrid Micro+Mono | 8 (2.4) | 8 (2.0) | 5 (1.0) | 8 (1.2) | 6 (0.6) | **72.0** | Yes |
| 5. SOA with ESB | 7 (2.1) | 7 (1.75) | 6 (1.2) | 7 (1.05) | 6 (0.6) | **67.0** | No (Sec < 8) |

*Numbers in parentheses show each weighted contribution (score × weight); the Total is the sum of contributions × 10.*

Selection process walk-through:

1. Eliminate failures:

  • Remove options failing critical criteria (Option 5: Security 7 < 8 required)
  • Remove options below the minimum threshold (all remaining options score above 65/100)

2. Identify quantitative winner:

  • Highest total: Option 3 Serverless (85.0)
  • Second: Option 2 Modular Monolith (75.0)

3. Apply context factors:

  • Team expertise: Strong in traditional web apps, zero serverless experience
  • Current architecture: Monolithic PHP app, migration complexity matters
  • Timeline: 6-month deadline, learning curve a risk
  • Vendor lock-in: Leadership prefers cloud-agnostic where possible
  • Scale reality: 1K users today, 10K projected in 2 years (not hyperscale)

4. Make selection:

  • Selected: Option 2 Modular Monolith (75.0)
  • Why not Option 3 (85.0), despite its higher score?
      • Team skill gap too large (risk)
      • AWS lock-in conflicts with strategy (business)
      • Premature optimization for current scale (context)
  • The 10-point difference is acceptable given the risk reduction

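Steps 1 and 2 of the walk-through are mechanical; steps 3 and 4 remain human judgment. The following is a minimal sketch of the mechanical part, assuming the comparison matrix is stored as raw 0-10 scores. `MATRIX`, `total` and `shortlist` are hypothetical names.

```python
# Raw 0-10 scores from the comparison matrix, keyed by option name.
WEIGHTS = {"perf": 0.30, "scale": 0.25, "maint": 0.20, "sec": 0.15, "cost": 0.10}
MIN_SCORE = 65.0
SECURITY_GATE = 8

MATRIX = {
    "1. Microservices REST": {"perf": 7, "scale": 9, "maint": 6, "sec": 8, "cost": 5},
    "2. Modular Monolith":   {"perf": 8, "scale": 6, "maint": 8, "sec": 8, "cost": 8},
    "3. Serverless Lambda":  {"perf": 9, "scale": 10, "maint": 7, "sec": 8, "cost": 7},
    "4. Hybrid Micro+Mono":  {"perf": 8, "scale": 8, "maint": 5, "sec": 8, "cost": 6},
    "5. SOA with ESB":       {"perf": 7, "scale": 7, "maint": 6, "sec": 7, "cost": 6},
}

def total(scores):
    """Weighted total out of 100, rounded to one decimal place."""
    return round(sum(scores[c] * w for c, w in WEIGHTS.items()) * 10, 1)

def shortlist(matrix):
    """Step 1: drop gate failures and sub-threshold options. Step 2: rank the rest."""
    viable = [
        (name, total(scores))
        for name, scores in matrix.items()
        if scores["sec"] >= SECURITY_GATE and total(scores) >= MIN_SCORE
    ]
    return sorted(viable, key=lambda item: item[1], reverse=True)

for name, score in shortlist(MATRIX):
    print(f"{score:5.1f}  {name}")
```

The ranked list is only the input to the context-factor review. The sketch deliberately does not choose the final option.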
## Decision

**Selected Option:** Option 2 - Modular Monolith

**Quantitative Rationale:**
- Scored 75.0/100 (above 65.0 threshold)
- Ranked 2nd among viable options
- Passed critical security criterion (8/10)

**Qualitative Rationale:**

While Option 3 (Serverless) scored higher (85.0), we selected Option 2 due to:

1. **Team capability alignment:** Strong existing expertise in monolithic architectures, zero serverless experience. Learning curve poses schedule risk.
2. **Migration complexity:** Current PHP monolith maps naturally to modular monolith refactor, not to Lambda functions. Reduces migration risk.
3. **Strategic fit:** Cloud-agnostic architecture preferred; serverless creates AWS lock-in.
4. **Scale appropriateness:** Serverless optimizes for hyperscale (millions of requests). We project 10K users in 2 years. Modular monolith handles this easily.
5. **Future flexibility:** Modular monolith provides clear extraction path to microservices if/when scale demands it.

**Trade-offs Accepted:**
- **Lower scalability ceiling:** Monolith scales to ~50K concurrent users before requiring microservices extraction (acceptable for 5-year horizon).
- **Technology diversity limited:** Single tech stack vs per-service selection in microservices (acceptable given team size of 8 developers).

Phase 5: Backtracking Triggers — Example

## Backtracking Triggers

Re-evaluate this decision if:

1. **Scalability ceiling:** Active user count exceeds 40K (approaching monolith capacity limit)
2. **Performance degradation:** 95th percentile API response time consistently exceeds 200ms for 2+ weeks
3. **Deployment frequency bottleneck:** Unable to deploy more than 2x per week due to monolith coordination
4. **Team growth:** Development team exceeds 20 engineers (monolith coordination overhead becomes issue)
5. **Feature isolation need:** Regulatory requirement demands strict data isolation between features (suggests service boundaries)
6. **Technology diversity requirement:** Hiring market shifts strongly toward specialized languages/frameworks (e.g., Go for performance-critical components)
7. **Operational cost spike:** Infrastructure costs exceed $15K/month (3x current projection, suggests inefficiency)

**Backtracking Action:** When trigger occurs, re-run ToT evaluation with updated context, constraints, and scale requirements.
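Because the triggers are numeric, they can be checked against a periodic metrics snapshot. The following is a minimal sketch, assuming a hypothetical `Metrics` record; the field names do not correspond to an existing monitoring schema. Triggers 3, 5 and 6 depend on cause or outside judgment (coordination bottlenecks, regulation, hiring market), so this sketch leaves them to human review.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    """Hypothetical monitoring snapshot; field names are illustrative only."""
    active_users: int
    p95_weeks_over_200ms: int      # consecutive weeks with p95 API latency above 200 ms
    engineers: int
    monthly_infra_cost_usd: float

# (trigger, predicate) pairs for the numerically checkable triggers 1, 2, 4 and 7.
TRIGGERS = [
    ("scalability ceiling", lambda m: m.active_users > 40_000),
    ("performance degradation", lambda m: m.p95_weeks_over_200ms >= 2),
    ("team growth", lambda m: m.engineers > 20),
    ("operational cost spike", lambda m: m.monthly_infra_cost_usd > 15_000),
]

def fired_triggers(metrics: Metrics) -> list[str]:
    """Names of crossed triggers; any hit means re-running the ToT evaluation."""
    return [name for name, crossed in TRIGGERS if crossed(metrics)]

snapshot = Metrics(active_users=42_000, p95_weeks_over_200ms=1,
                   engineers=12, monthly_infra_cost_usd=4_800)
print(fired_triggers(snapshot))  # ['scalability ceiling']
```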

Alternative Generation Diversity — Bad vs Good

Bad Example (Insufficient Diversity)

Options:
1. PostgreSQL with PgBouncer pooling
2. PostgreSQL with Pgpool-II pooling
3. PostgreSQL with Odyssey pooling

Problem: All three options are PostgreSQL with a different connection pooler. There is no architectural diversity; this is an implementation-detail choice, not an architecture decision that requires ToT.

Good Example (Sufficient Diversity)

Options:
1. PostgreSQL with read replicas (RDBMS, ACID, vertically scaled primary)
2. MongoDB sharded cluster (NoSQL, flexible schema, horizontal scale)
3. DynamoDB with DAX (managed NoSQL, serverless, AWS-native)
4. Hybrid: PostgreSQL transactional + Redis cache + S3 objects
5. CockroachDB (NewSQL, distributed SQL, cloud-native)

Why good: Represents different paradigms (RDBMS vs NoSQL vs NewSQL), deployment models (self-hosted vs managed), and trade-offs (consistency vs availability, cost vs flexibility).
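A rough diversity check can catch option sets like the bad example before evaluation begins. The following is a minimal sketch, assuming each option is hand-tagged with a paradigm and a deployment model. The `Option` type, the `diversity_gaps` helper and the specific tags are all illustrative assumptions, not part of the protocol.

```python
from typing import NamedTuple

class Option(NamedTuple):
    name: str
    paradigm: str    # e.g. "RDBMS", "NoSQL", "NewSQL"
    deployment: str  # e.g. "self-hosted", "managed"

def diversity_gaps(options, min_distinct=2):
    """Dimensions on which the option set offers fewer than `min_distinct` choices."""
    return [
        dim for dim in ("paradigm", "deployment")
        if len({getattr(opt, dim) for opt in options}) < min_distinct
    ]

bad = [
    Option("PostgreSQL + PgBouncer", "RDBMS", "self-hosted"),
    Option("PostgreSQL + Pgpool-II", "RDBMS", "self-hosted"),
    Option("PostgreSQL + Odyssey", "RDBMS", "self-hosted"),
]
good = [
    Option("PostgreSQL read replicas", "RDBMS", "self-hosted"),
    Option("MongoDB sharded cluster", "NoSQL", "self-hosted"),
    Option("DynamoDB + DAX", "NoSQL", "managed"),
    Option("PostgreSQL + Redis + S3", "polyglot", "mixed"),
    Option("CockroachDB", "NewSQL", "self-hosted"),
]
print(diversity_gaps(bad))   # ['paradigm', 'deployment']
print(diversity_gaps(good))  # []
```

A passing check does not make the alternatives good. It only flags option sets that are clearly too narrow to need ToT.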


Evaluation Scoring Rationale — Bad vs Good

Bad Example (Insufficient Rationale)

| Performance | 8 | Good performance |

Problem: Circular reasoning. No specifics. Not auditable.

Good Example (Sufficient Rationale)

| Performance | 8 | Achieves <100ms p95 latency for read queries via materialized views. Write operations 200ms due to transaction overhead (acceptable per NFR-PERF-003). Degrades under 10K concurrent writes (mitigation: write sharding). |

Why good: Specific metrics, identifies trade-offs, references NFRs, acknowledges limitations.
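Part of this review can be automated as a lint pass over score rationales. The following is a heuristic sketch only: the hypothetical `rationale_issues` helper flags rationales that contain no figures, or that use only the criterion name plus generic adjectives. It cannot judge whether the cited numbers are correct.

```python
import re

GENERIC = {"good", "bad", "ok", "fine", "great", "poor"}

def rationale_issues(criterion: str, rationale: str) -> list[str]:
    """Heuristic flags for rationales that fall short of the 'good' example above."""
    issues = []
    if not re.search(r"\d", rationale):
        issues.append("no quantitative detail")
    words = set(re.findall(r"[a-z]+", rationale.lower()))
    if words <= GENERIC | {criterion.lower()}:
        issues.append("restates the criterion")
    return issues

print(rationale_issues("Performance", "Good performance"))
print(rationale_issues("Performance", "Achieves <100ms p95 latency for read queries via materialized views."))
```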


Context Factor Documentation — Template

**Qualitative Rationale:**

While Option X scored highest (YY.Y), we selected Option Z due to:

1. **[Context factor category]:** [Specific situation]
2. **[Context factor category]:** [Specific situation]
3. **[Context factor category]:** [Specific situation]

The score difference ([X.X] points) is acceptable given these risk reductions.

Backtracking Trigger Measurability — Bad vs Good

Bad Example (Unmeasurable)

Backtracking triggers:
- If performance becomes a problem
- If the team struggles with the technology
- If costs get too high

Problem: Subjective, no thresholds, not actionable.

Good Example (Measurable)

Backtracking triggers:
1. **Performance:** P95 response time exceeds 500ms for 7+ consecutive days
2. **Team capability:** >3 senior developers leave within 6 months AND replacements not hired within 60 days
3. **Cost:** Monthly infrastructure cost exceeds $20K (2x projection) for 3+ consecutive months

Why good: Specific metrics, clear thresholds, time windows, unambiguous detection.
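Time-windowed triggers such as "P95 exceeds 500ms for 7+ consecutive days" reduce to finding a run of consecutive threshold breaches. The following is a minimal sketch. The helper name `sustained_breach` is hypothetical, and the sample series are made-up inputs for illustration, not measurements.

```python
def sustained_breach(series, threshold, min_run):
    """True if `series` has at least `min_run` consecutive values strictly above `threshold`."""
    run = 0
    for value in series:
        run = run + 1 if value > threshold else 0  # reset the streak on any non-breach
        if run >= min_run:
            return True
    return False

daily_p95_ms = [420, 510, 530, 505, 560, 515, 540, 525]  # illustrative input
monthly_cost_usd = [18_500, 21_000, 22_300, 19_900]      # illustrative input
print(sustained_breach(daily_p95_ms, 500, 7))         # True: 7 consecutive days above 500 ms
print(sustained_breach(monthly_cost_usd, 20_000, 3))  # False: longest run is 2 months
```

Resetting the counter on any non-breaching sample is what makes this a "consecutive" check rather than a total count.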


Full ADR Worked Examples

Example 1: Database Selection ADR

Context: Selecting primary database for new e-commerce platform

Criteria:

  • Performance (30%): Sub-100ms query latency
  • Scalability (25%): Handle 50K concurrent users
  • Maintainability (20%): Team familiar with technology
  • Security (15%): ACID compliance, encryption at rest
  • Cost (10%): <$5K/month operational cost

Options Generated (k=5):

1. PostgreSQL with read replicas

2. MongoDB sharded cluster

3. Amazon DynamoDB with DAX

4. MySQL with ProxySQL

5. CockroachDB distributed SQL

Evaluation: [Full scoring matrix with rationale]

Selection: PostgreSQL (scored 78.0, ranked 2nd) chosen over DynamoDB (scored 82.0, ranked 1st) due to team expertise and cloud-agnostic strategy.

Backtracking Triggers:

  • User count exceeds 40K
  • Query latency exceeds 200ms p95
  • Sharding becomes necessary (complexity threshold)

Example 2: API Design Pattern ADR

Context: Choosing API style for mobile app backend

Criteria:

  • Developer Experience (35%): Mobile team ease of use
  • Performance (25%): Minimize over-fetching
  • Flexibility (20%): Support evolving UI requirements
  • Maturity (15%): Ecosystem, tools, best practices
  • Learning Curve (5%): Time to team proficiency

Options Generated (k=4):

1. REST with JSON:API specification

2. GraphQL with Apollo

3. gRPC with Protocol Buffers

4. REST with custom JSON + WebSocket for real-time

Evaluation: [Full scoring matrix]

Selection: GraphQL (scored 81.0, ranked 1st) aligns with criteria and team preferences.

Backtracking Triggers:

  • Query complexity causes N+1 performance issues
  • Caching strategy proves inadequate
  • Mobile team rejects GraphQL after 3-month trial

Agent Interaction Patterns — Worked Dialogs

User Requests ToT Evaluation

User: "Create an ADR for database selection using Tree of Thoughts"

Agent Response:

1. Acknowledge ToT request

2. Read NFR modules to establish criteria

3. Present criteria with weights for user confirmation

4. Generate k=5 database alternatives

5. Evaluate each option with scoring

6. Build comparison matrix

7. Recommend selection with rationale

8. Define backtracking triggers

9. Populate ADR template

10. Save to `.aiwg/architecture/decisions/ADR-XXX-database-selection.md`

User Requests Simple Justification

User: "Document why we chose PostgreSQL"

Agent Response:

1. Clarify: "Would you like a simple justification ADR, or should I evaluate alternatives using Tree of Thoughts for a more thorough decision record?"

2. If user confirms simple: Create basic ADR without full ToT process

3. If user wants thorough: Proceed with ToT workflow

User Needs Quick Decision

User: "We need to choose a logging framework today - we're blocked"

Agent Response:

1. Acknowledge urgency

2. Offer expedited ToT: k=3 options instead of k=5, simplified evaluation

3. Present comparison matrix within 30 minutes

4. Note in ADR that this was time-constrained ToT (may revisit later)
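The three dialogs above reduce to one decision: whether ToT runs, and if so, how many alternatives it generates. The following is a minimal sketch, assuming the user's answers have already been captured as two booleans. `AdrPlan` and `plan_adr` are hypothetical names, not part of the agent definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdrPlan:
    """Hypothetical evaluation plan; field names are illustrative."""
    use_tot: bool
    k: int          # number of alternatives to generate (0 when ToT is skipped)
    adr_note: str   # text recorded in the ADR about how the decision was made

def plan_adr(wants_tot: bool, time_constrained: bool) -> AdrPlan:
    """Map the user's answers from the dialogs onto an evaluation plan."""
    if not wants_tot:
        return AdrPlan(False, 0, "Simple justification ADR; alternatives not evaluated.")
    if time_constrained:
        return AdrPlan(True, 3, "Time-constrained ToT (may revisit later).")
    return AdrPlan(True, 5, "")
```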