Regression Analyst Examples

Externalized from the agent definition per the few-shot-examples rule (#1587).

Few-Shot Examples

Example 1: Detecting a Regression from Test Failures

Input:

"The CI pipeline is failing with 3 test failures that were passing yesterday. Investigate and identify the regression."

Output:

## Regression Detection Report

**Analysis Date**: 2024-01-20
**CI Build**: #4521 (failed)
**Previous Passing Build**: #4520 (passed)

### Thought: Goal
I need to identify what changed between build #4520 and #4521 that caused 3 tests to fail.

### Step 1: Identify Failing Tests

**Extraction**: From CI logs, the failing tests are:
- `test/unit/auth/token.test.ts`: "should refresh expired token"
- `test/unit/auth/token.test.ts`: "should handle refresh failure gracefully"
- `test/integration/auth/session.test.ts`: "should maintain session across refresh"

All failures are in the authentication module, specifically token refresh.

### Step 2: Compare Git History

**Action**: Check commits between builds

git log --oneline "$BUILD_4520_SHA..$BUILD_4521_SHA"  # commit SHAs recorded by CI for each build; build numbers are not git refs


**Observation**: 4 commits found:

def7890 Update README with new API docs

abc4567 Refactor token refresh logic for clarity

890abcd Add logging to payment module

123ef45 Fix typo in error message


**Reasoning**: Commit `abc4567` "Refactor token refresh logic" is the most likely culprit given all failures are in token refresh.

### Step 3: Analyze the Suspect Commit

**Action**: Review the diff

git show abc4567 --stat


**Observation**:

src/auth/token-manager.ts | 45 +++++++++++++++++++++------------------------

1 file changed, 21 insertions(+), 24 deletions(-)


**Extraction**: Key changes in the diff:

// Before
async refreshToken(token: string): Promise<string> {
  if (this.isExpired(token)) {
    return await this.requestNewToken(token);
  }
  return token;
}

// After (abc4567)
async refreshToken(token: string): Promise<string> {
  // Refactored for clarity
  if (!this.isExpired(token)) {
    return token; // Early return for valid tokens
  }
  return await this.requestNewToken(token);
}


**Exception**: Wait - the logic appears identical after refactor. Let me check more carefully.

**Further Extraction**: Looking at the full diff:

// Before
private isExpired(token: string): boolean {
  const exp = this.decodeExpiry(token);
  return Date.now() > exp;
}

// After (abc4567)
private isExpired(token: string): boolean {
  const exp = this.decodeExpiry(token);
  return Date.now() >= exp; // Changed > to >=
}


**Reasoning**: Found it! The comparison changed from `>` to `>=`. This means tokens expiring at exactly the current timestamp are now considered expired when they weren't before. The tests likely use fixed timestamps that hit this edge case.
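The boundary effect can be checked in isolation. The sketch below uses two hypothetical stand-in predicates for the old and new `isExpired` comparisons and an arbitrary fixed expiry time; it is an illustration of the reasoning, not the project's code.

```typescript
// Hypothetical stand-ins for the two versions of isExpired (times in milliseconds).
const isExpiredBefore = (now: number, exp: number): boolean => now > exp;
const isExpiredAfter = (now: number, exp: number): boolean => now >= exp;

// Arbitrary fixed expiry, used only for illustration.
const exp = 1705708800000;

// Probe one millisecond before, exactly at, and one millisecond after expiry.
const disagreements = [exp - 1, exp, exp + 1].filter(
  (now) => isExpiredBefore(now, exp) !== isExpiredAfter(now, exp),
);

// The two versions differ only when now === exp.
console.log(disagreements.length, disagreements[0] === exp); // 1 true
```

A test that pins the clock to the exact expiry instant is therefore the only kind that changes outcome between the two builds.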

### Step 4: Confirm with Git Bisect

**Action**: Run automated bisect

git bisect start HEAD HEAD~4  # bad first, then good: HEAD is build #4521, HEAD~4 is build #4520
git bisect run npm test -- --grep "should refresh expired token"
git bisect reset


**Observation**:

abc4567 is the first bad commit


Confirmed: Commit `abc4567` introduced the regression.
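For readers unfamiliar with what the bisect run does, the following is a minimal sketch of the underlying binary search over a linear history. The `firstBadCommit` helper, the `build-4520` and `typo-fix` labels, and the `isBad` predicate are illustrative assumptions; real `git bisect` also handles skipped commits and non-linear history.

```typescript
// Commits ordered oldest to newest. The first entry is known good, the last known bad.
function firstBadCommit(commits: string[], isBad: (sha: string) => boolean): string {
  let good = 0; // index of a commit known to be good
  let bad = commits.length - 1; // index of a commit known to be bad
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (isBad(commits[mid])) {
      bad = mid; // regression is at mid or earlier
    } else {
      good = mid; // regression is after mid
    }
  }
  return commits[bad];
}

// Placeholder labels for the last good build and the typo commit; the rest are from the log above.
const commits = ["build-4520", "typo-fix", "890abcd", "abc4567", "def7890"];

// Stand-in for "npm test fails": everything from abc4567 onward is bad.
const isBad = (sha: string): boolean => commits.indexOf(sha) >= commits.indexOf("abc4567");

const culprit = firstBadCommit(commits, isBad);
console.log(culprit); // abc4567
```

With four candidate commits the search needs only two test runs, which is why bisect stays cheap even when each run is a full test suite.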

### Regression Summary

**ID**: REG-2024-001
**Severity**: High (authentication affected)
**Type**: Functional - boundary condition change

| Aspect | Value |
|--------|-------|
| Introduced In | abc4567 |
| Author | [email protected] |
| Root Cause | Comparison operator change `>` to `>=` |
| Blast Radius | 3 tests, auth module only |

### Recommended Fix

// Revert the comparison operator
private isExpired(token: string): boolean {
  const exp = this.decodeExpiry(token);
  return Date.now() > exp; // Use > not >=
}


### Regression Test to Add

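// Freeze the clock so Date.now() is identical when the token is created and
// when it is checked (assumes Jest-style fake timers are available).
beforeEach(() => jest.useFakeTimers().setSystemTime(new Date('2024-01-20T00:00:00Z')));
afterEach(() => jest.useRealTimers());

it('should not consider token expired at exact expiry time', () => {
  // This catches the boundary condition
  const exactExpiryToken = createToken({ exp: Date.now() / 1000 });
  expect(tokenManager.isExpired(exactExpiryToken)).toBe(false);
});

it('should consider token expired 1ms after expiry', () => {
  const expiredToken = createToken({ exp: (Date.now() - 1) / 1000 });
  expect(tokenManager.isExpired(expiredToken)).toBe(true);
});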

Why This Is Good:

- **Systematic investigation**: Follows a logical progression from symptom to root cause
- **Uses thought protocol**: Explicitly tracks reasoning with Goal, Extraction, Reasoning, Exception
- **Validates findings**: Confirms the suspect commit with git bisect before concluding
- **Identifies root cause precisely**: Pinpoints the exact line change causing the regression
- **Provides actionable fix**: Includes the specific code change to resolve it
- **Recommends prevention**: Adds boundary condition tests to prevent recurrence

Example 2: Analyzing Root Cause with Git Bisect

Input:

"Our API response time has degraded from p99 of 200ms to 450ms. Find when this started and identify the cause."

Output:

## Performance Regression Analysis

**Metric**: API p99 Response Time
**Baseline**: 200ms (v2.3.0)
**Current**: 450ms (v2.4.0-rc2)
**Degradation**: +125%

### Thought: Goal
I need to identify when the p99 latency regression was introduced and determine the root cause.

### Step 1: Establish Timeline

**Action**: Query performance metrics over time

# Get daily p99 values for the past 2 weeks

curl -s "$METRICS_API/query?metric=api_latency_p99&range=14d" | jq '.data'


**Observation**: Performance data shows degradation starting 2024-01-18:

2024-01-15: 195ms

2024-01-16: 201ms

2024-01-17: 198ms

2024-01-18: 312ms <-- Degradation starts

2024-01-19: 425ms

2024-01-20: 448ms


**Extraction**: The regression was introduced between 2024-01-17 and 2024-01-18.
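The onset date can also be derived mechanically from the series. This is a minimal sketch that assumes a baseline taken from the first three days and a 25% tolerance; both parameters and the `findOnset` helper are illustrative choices, not part of the metrics API.

```typescript
type DailyP99 = { date: string; p99Ms: number };

const series: DailyP99[] = [
  { date: "2024-01-15", p99Ms: 195 },
  { date: "2024-01-16", p99Ms: 201 },
  { date: "2024-01-17", p99Ms: 198 },
  { date: "2024-01-18", p99Ms: 312 },
  { date: "2024-01-19", p99Ms: 425 },
  { date: "2024-01-20", p99Ms: 448 },
];

// Returns the first day whose p99 exceeds the baseline mean by more than `tolerance`.
function findOnset(data: DailyP99[], baselineDays: number, tolerance: number): DailyP99 | undefined {
  const baseline =
    data.slice(0, baselineDays).reduce((sum, d) => sum + d.p99Ms, 0) / baselineDays;
  return data.slice(baselineDays).find((d) => d.p99Ms > baseline * (1 + tolerance));
}

const onset = findOnset(series, 3, 0.25);
console.log(onset?.date); // 2024-01-18
```

The baseline here is 198ms, so the cutoff is 247.5ms and 2024-01-18 is the first day above it.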

### Step 2: Identify Commits in Window

**Action**: Find commits deployed on 2024-01-18

git log --after="2024-01-17" --before="2024-01-19" --oneline


**Observation**:

f1e2d3c Add detailed request logging

a4b5c6d Implement user activity tracking

789abc0 Update dependency versions


### Step 3: Create Bisect Test Script

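**Reasoning**: I need a test that measures p99 latency reliably: a benchmark that issues 100 requests and reports the 99th percentile. With only 100 samples the p99 is close to the slowest request, so the pass/fail threshold should sit well clear of both the good (~200ms) and bad (~450ms) values.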
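As a reference for what the benchmark has to compute, here is a small nearest-rank percentile sketch. The `percentile` helper and the synthetic latencies are assumptions for illustration; the project's actual benchmark may use a different percentile definition.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Synthetic latencies of 1..100 ms, in shuffled-looking order.
const latencies = Array.from({ length: 100 }, (_, i) => ((i * 37) % 100) + 1);

console.log(percentile(latencies, 50), percentile(latencies, 99)); // 50 99
```

With 100 samples, p99 is the second-slowest request, which is why a single outlier can move it noticeably.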

**Action**: Create bisect test

#!/bin/bash
# performance-bisect.sh

# Exit code 125 tells git bisect to skip commits that do not build
npm run build || exit 125

# Start server in background
npm run start:test &
SERVER_PID=$!
sleep 5

# Run benchmark (--silent keeps npm's own banner out of the JSON)
RESULT=$(npm run --silent benchmark -- --requests=100 --json)
P99=$(echo "$RESULT" | jq '.p99')

kill "$SERVER_PID"

# Threshold: 250ms (between good and bad)
if (( $(echo "$P99 > 250" | bc -l) )); then
  exit 1 # Bad commit
else
  exit 0 # Good commit
fi

### Step 4: Run Git Bisect

**Action**: Execute bisect

git bisect start HEAD v2.3.0

git bisect run ./performance-bisect.sh


**Observation**:

running ./performance-bisect.sh

p99: 198ms - PASS

running ./performance-bisect.sh

p99: 445ms - FAIL

running ./performance-bisect.sh

p99: 312ms - FAIL

running ./performance-bisect.sh

p99: 199ms - PASS

f1e2d3c is the first bad commit

commit f1e2d3c

Author: [email protected]

Date: Thu Jan 18 09:15:00 2024

Add detailed request logging


### Step 5: Analyze the Culprit Commit

**Action**: Review the diff

git show f1e2d3c


**Extraction**: Key changes:

import { RequestLogger } from './logging/request-logger';

async handleRequest(req: Request, res: Response) {
  const logger = new RequestLogger();
  await logger.logRequestStart(req);

  const result = await this.processRequest(req);

  await logger.logRequestEnd(req, result);
  await logger.flush(); // Response waits for the disk write!

  return res.json(result);
}


**Reasoning**: The issue is `await logger.flush()`: the handler waits for a disk write on every request, so the response is not sent until logging completes. This adds significant latency, especially under load when disk I/O queues up.

### Step 6: Profile to Confirm

**Action**: Add timing instrumentation

npm run profile -- --endpoint=/api/users --requests=100


**Observation**:

Request breakdown (avg):

Business logic: 45ms
Database query: 35ms
Serialization: 12ms
Logging flush: 245ms <-- 70% of request time!
Other: 11ms

Total: 348ms


**Synthesis**: Confirmed. The logging flush accounts for 245ms of the 348ms average response time, which is consistent with the p99 jump from 200ms to 450ms.
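The share attributed to each component can be recomputed directly from the profiler breakdown. The snippet below is a small arithmetic check using the averages reported in the observation; the variable names are illustrative.

```typescript
// Average time per component, in milliseconds, from the profiler output.
const breakdownMs: Record<string, number> = {
  "Business logic": 45,
  "Database query": 35,
  "Serialization": 12,
  "Logging flush": 245,
  "Other": 11,
};

const totalMs = Object.values(breakdownMs).reduce((sum, ms) => sum + ms, 0);
const flushSharePct = Math.round((breakdownMs["Logging flush"] * 100) / totalMs);

console.log(totalMs, flushSharePct); // 348 70
```

Removing the flush from the request path would leave roughly 103ms of average work per request.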

### Root Cause Analysis

**ID**: REG-PERF-2024-002
**Type**: Performance regression
**Severity**: High
**Introduced**: f1e2d3c (2024-01-18)

| Factor | Finding |
|--------|---------|
| What | Synchronous disk write on every request |
| Why | `logger.flush()` called with await in request path |
| Impact | +245ms average latency per request |
| Blast Radius | All API endpoints |

### Recommended Fixes

**Option A: Async Logging (Recommended)**

async handleRequest(req: Request, res: Response) {

const logger = new RequestLogger();

logger.logRequestStart(req); // Fire and forget

const result = await this.processRequest(req);

logger.logRequestEnd(req, result);

logger.scheduleFlush(); // Batch and flush asynchronously

return res.json(result);

}
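One way `scheduleFlush()` could behave is sketched below. `BatchingLogger`, its `sink` callback, and the 100ms default delay are hypothetical; the real `RequestLogger` API is not shown in the source, so treat this as an illustration of batching rather than its implementation.

```typescript
// Hypothetical batching logger; `sink` stands in for the real disk writer.
class BatchingLogger {
  private buffer: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly sink: (lines: string[]) => void;
  private readonly delayMs: number;

  constructor(sink: (lines: string[]) => void, delayMs: number = 100) {
    this.sink = sink;
    this.delayMs = delayMs;
  }

  // Buffering is an in-memory push, so the request path never waits on I/O.
  log(line: string): void {
    this.buffer.push(line);
  }

  // Schedule one flush for the whole batch; repeated calls reuse the pending timer.
  scheduleFlush(): void {
    if (this.timer !== null) return;
    this.timer = setTimeout(() => this.flushNow(), this.delayMs);
  }

  // Write everything buffered so far in a single call (also useful on shutdown).
  flushNow(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sink(batch);
  }
}

const written: string[][] = [];
const batchLogger = new BatchingLogger((lines) => { written.push(lines); });
batchLogger.log("request start");
batchLogger.log("request end");
batchLogger.scheduleFlush();
batchLogger.scheduleFlush(); // no second timer is created
batchLogger.flushNow(); // force the write, as a shutdown hook might

console.log(written.length, written[0].length); // 1 2
```

The trade-off is durability: entries buffered but not yet flushed are lost if the process crashes.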


**Option B: Background Queue**

// Use a dedicated logging worker

import { loggingQueue } from './logging/queue';

async handleRequest(req: Request, res: Response) {

const result = await this.processRequest(req);

// Queue for background processing

loggingQueue.enqueue({ type: 'request', req, result });

return res.json(result);

}
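A background queue needs a bound so that a slow worker cannot exhaust memory. The sketch below shows one possible shape for `loggingQueue`; the `LoggingQueue` class, its capacity, and the drop-when-full policy are assumptions made for illustration.

```typescript
type LogEntry = { type: "request"; path: string; statusCode: number };

// Hypothetical bounded in-memory queue drained by a background worker.
class LoggingQueue {
  private entries: LogEntry[] = [];
  private readonly capacity: number;
  dropped = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Constant-time and non-blocking; drops the entry when the queue is full.
  enqueue(entry: LogEntry): boolean {
    if (this.entries.length >= this.capacity) {
      this.dropped++;
      return false;
    }
    this.entries.push(entry);
    return true;
  }

  // Called by the worker: removes and returns up to `max` entries in FIFO order.
  drain(max: number): LogEntry[] {
    return this.entries.splice(0, max);
  }
}

const demoQueue = new LoggingQueue(2);
demoQueue.enqueue({ type: "request", path: "/api/users", statusCode: 200 });
demoQueue.enqueue({ type: "request", path: "/api/orders", statusCode: 200 });
demoQueue.enqueue({ type: "request", path: "/api/items", statusCode: 500 }); // dropped

const drained = demoQueue.drain(10);
console.log(drained.length, demoQueue.dropped); // 2 1
```

Compared with Option A, this keeps logging entirely out of the handler at the cost of possibly dropping entries under sustained load.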


### Performance Guardrails to Add

# .github/workflows/performance-gate.yml
performance-test:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - run: npm run --silent benchmark -- --json > results.json
    - name: Check p99 threshold
      run: |
        P99=$(jq '.p99' results.json)
        if (( $(echo "$P99 > 250" | bc -l) )); then
          echo "Performance regression: p99 ${P99}ms > 250ms threshold"
          exit 1
        fi

Why This Is Good:

- **Data-driven timeline**: Uses metrics to establish when the regression started
- **Automated bisect with performance test**: Creates a reproducible benchmark script
- **Profiling confirmation**: Validates the root cause with actual timing data
- **Multiple fix options**: Offers two alternative fixes (asynchronous batching and a background queue)
- **CI guardrail**: Recommends automated prevention for the future

Example 3: Generating Regression Prevention Recommendations

Input:

"We've had 5 regressions in the billing module in the past quarter. Analyze the pattern and recommend prevention strategies."

Output:

## Regression Pattern Analysis & Prevention Plan

**Module**: `src/billing/`
**Analysis Period**: Q4 2023 (Oct - Dec)
**Regressions Analyzed**: 5

### Thought: Goal
I need to identify patterns in the billing module regressions and recommend targeted prevention strategies.

### Historical Regression Analysis

| ID | Date | Type | Root Cause | Time to Detect | Fix Time |
|----|------|------|------------|----------------|----------|
| REG-041 | Oct 5 | Calculation | Float precision in tax calc | 3 days | 2 hours |
| REG-047 | Oct 22 | Integration | Payment gateway timeout not handled | 1 day | 4 hours |
| REG-052 | Nov 8 | Boundary | Discount > 100% allowed | 5 days | 1 hour |
| REG-058 | Nov 29 | State | Invoice state machine invalid transition | 2 days | 3 hours |
| REG-063 | Dec 15 | Calculation | Currency rounding error | 7 days | 2 hours |

### Pattern Extraction

**Reasoning**: Analyzing the regression types and root causes:

| Pattern | Count | Percentage | Examples |
|---------|-------|------------|----------|
| Calculation/Precision | 3 | 60% | REG-041, REG-052, REG-063 |
| Integration/External | 1 | 20% | REG-047 |
| State Management | 1 | 20% | REG-058 |

**Key Finding**: 60% of regressions are related to numerical calculations (tax, discount, currency). This indicates a systemic weakness in testing financial calculations.
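The pattern percentages follow from a simple group-and-count over the regression list. This sketch uses the pattern assignments from the table above; the `patternShares` helper is illustrative.

```typescript
type Regression = { id: string; pattern: string };

const regressions: Regression[] = [
  { id: "REG-041", pattern: "Calculation/Precision" },
  { id: "REG-047", pattern: "Integration/External" },
  { id: "REG-052", pattern: "Calculation/Precision" },
  { id: "REG-058", pattern: "State Management" },
  { id: "REG-063", pattern: "Calculation/Precision" },
];

// Count regressions per pattern, then convert each count to a percentage of the total.
function patternShares(items: Regression[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    counts[item.pattern] = (counts[item.pattern] ?? 0) + 1;
  }
  const result: Record<string, number> = {};
  for (const pattern of Object.keys(counts)) {
    result[pattern] = (counts[pattern] * 100) / items.length;
  }
  return result;
}

const shares = patternShares(regressions);
console.log(shares["Calculation/Precision"], shares["State Management"]); // 60 20
```

With only five data points each regression moves a share by 20 points, so the 60% figure indicates a direction rather than a precise rate.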

### Coverage Analysis

**Action**: Analyze current test coverage

npm run coverage -- --scope=src/billing


**Observation**:

--------------------------|---------|----------|---------|---------|
File                      | % Stmts | % Branch | % Funcs | % Lines |
--------------------------|---------|----------|---------|---------|
src/billing/              |    72.3 |     58.4 |    81.2 |    71.8 |
  calculator.ts           |    68.1 |     45.2 |    75.0 |    67.3 | <-- Weak
  tax-service.ts          |    65.4 |     52.1 |    70.0 |    64.8 | <-- Weak
  discount-engine.ts      |    71.2 |     48.9 |    80.0 |    70.5 | <-- Weak
  invoice-state.ts        |    82.3 |     71.4 |    90.0 |    81.9 |
  payment-gateway.ts      |    78.5 |     62.3 |    85.0 |    77.2 |


**Extraction**: The three files with lowest branch coverage (`calculator.ts`, `tax-service.ts`, `discount-engine.ts`) are exactly where calculation regressions occurred.

### Risk Heat Map

**Reasoning**: Combining regression history, coverage, and complexity:

| Component | Regressions | Coverage | Complexity | Risk Score |
|-----------|-------------|----------|------------|------------|
| calculator.ts | 2 | 45% branch | 15 cyclomatic | **CRITICAL** |
| tax-service.ts | 1 | 52% branch | 12 cyclomatic | **HIGH** |
| discount-engine.ts | 1 | 49% branch | 11 cyclomatic | **HIGH** |
| invoice-state.ts | 1 | 71% branch | 8 cyclomatic | MEDIUM |
| payment-gateway.ts | 1 | 62% branch | 9 cyclomatic | MEDIUM |

### Prevention Recommendations

#### 1. Property-Based Testing for Calculations (CRITICAL)

**Rationale**: Calculation regressions stem from edge cases that example-based tests miss. Property-based testing generates hundreds of test cases automatically.

**Implementation**:

// test/billing/calculator.property.test.ts
import * as fc from 'fast-check';
import { calculateTotal, applyDiscount, calculateTax } from '@/billing/calculator';

// noNaN: true keeps fc.float from generating NaN, which would fail every comparison
describe('Calculator Properties', () => {
  describe('calculateTotal', () => {
    it('should always be >= 0', () => {
      fc.assert(
        fc.property(
          fc.array(fc.record({
            price: fc.float({ min: 0, max: 10000, noNaN: true }),
            quantity: fc.integer({ min: 1, max: 100 })
          })),
          (items) => calculateTotal(items) >= 0
        )
      );
    });

    it('should be sum of (price * quantity)', () => {
      fc.assert(
        fc.property(
          fc.array(fc.record({
            price: fc.float({ min: 0, max: 10000, noNaN: true }),
            quantity: fc.integer({ min: 1, max: 100 })
          })),
          (items) => {
            const expected = items.reduce((sum, i) => sum + i.price * i.quantity, 0);
            return Math.abs(calculateTotal(items) - expected) < 0.01;
          }
        )
      );
    });
  });

  describe('applyDiscount', () => {
    it('discount should never exceed original price', () => {
      fc.assert(
        fc.property(
          fc.float({ min: 0, max: 10000, noNaN: true }),
          fc.float({ min: 0, max: 100, noNaN: true }),
          (price, discountPercent) => applyDiscount(price, discountPercent) <= price
        )
      );
    });

    it('discount should never result in negative price', () => {
      fc.assert(
        fc.property(
          fc.float({ min: 0, max: 10000, noNaN: true }),
          fc.float({ min: 0, max: 200, noNaN: true }), // Even with invalid >100% discount
          (price, discountPercent) => applyDiscount(price, discountPercent) >= 0
        )
      );
    });
  });
});


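**Expected Impact**: Targets the three calculation regressions (REG-041, REG-052, REG-063). The discount properties cover REG-052 directly; analogous properties are needed for tax precision and currency rounding.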
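To show the idea without the fast-check dependency, here is a self-contained sketch of the same two discount properties checked over seeded random inputs. `applyDiscountCents`, the integer-cents representation, the clamping rule, and the small generator are all assumptions for illustration, not the billing module's actual code.

```typescript
// Hypothetical discount function: works in integer cents and clamps the percent to [0, 100].
function applyDiscountCents(priceCents: number, discountPercent: number): number {
  const pct = Math.min(Math.max(discountPercent, 0), 100);
  return priceCents - Math.round((priceCents * pct) / 100);
}

// Small linear congruential generator so every run uses the same inputs.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
}

const rng = makeRng(42);
let violations = 0;
for (let i = 0; i < 1000; i++) {
  const priceCents = Math.floor(rng() * 1000000); // 0 to 999999 cents
  const percent = rng() * 200; // deliberately includes invalid values above 100%
  const discounted = applyDiscountCents(priceCents, percent);
  // Properties: the result is never negative and never exceeds the original price.
  if (discounted < 0 || discounted > priceCents) violations++;
}

console.log(violations); // 0
```

A library such as fast-check adds shrinking and richer generators on top of this loop, which is why the recommendation above uses it rather than a hand-rolled generator.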

#### 2. State Machine Testing for Invoice Lifecycle (HIGH)

**Rationale**: State transition bugs are hard to find with example tests. Model-based testing exhaustively tests all state transitions.

**Implementation**:

// test/billing/invoice-state.model.test.ts
import { createModel } from '@xstate/test';
import { invoiceStateMachine } from '@/billing/invoice-state';
import { Invoice } from '@/billing/invoice'; // assumed location of the Invoice class

const testModel = createModel(invoiceStateMachine).withEvents({
  CREATE: { exec: async (context) => await context.invoice.create() },
  APPROVE: { exec: async (context) => await context.invoice.approve() },
  SEND: { exec: async (context) => await context.invoice.send() },
  PAY: { exec: async (context) => await context.invoice.pay() },
  CANCEL: { exec: async (context) => await context.invoice.cancel() },
});

describe('Invoice State Machine', () => {
  const testPlans = testModel.getSimplePathPlans();

  testPlans.forEach(plan => {
    describe(plan.description, () => {
      plan.paths.forEach(path => {
        it(path.description, async () => {
          const context = { invoice: new Invoice() };
          await path.test(context);
        });
      });
    });
  });
});


**Expected Impact**: Would have caught REG-058 (invalid state transition)
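The core of model-based state testing is an explicit transition table that every (state, event) pair is checked against. The states, events, and table below are hypothetical, since the source does not list the invoice machine's actual transitions.

```typescript
type InvoiceState = "draft" | "approved" | "sent" | "paid" | "cancelled";
type InvoiceEvent = "APPROVE" | "SEND" | "PAY" | "CANCEL";

// Hypothetical transition table: any pair not listed is an invalid transition.
const transitions: Record<InvoiceState, Partial<Record<InvoiceEvent, InvoiceState>>> = {
  draft: { APPROVE: "approved", CANCEL: "cancelled" },
  approved: { SEND: "sent", CANCEL: "cancelled" },
  sent: { PAY: "paid", CANCEL: "cancelled" },
  paid: {},
  cancelled: {},
};

function nextState(state: InvoiceState, event: InvoiceEvent): InvoiceState {
  const target = transitions[state][event];
  if (target === undefined) {
    throw new Error(`Invalid transition: ${event} from ${state}`);
  }
  return target;
}

// Enumerate every (state, event) pair, as a model-based test would.
const allStates = Object.keys(transitions) as InvoiceState[];
const allEvents: InvoiceEvent[] = ["APPROVE", "SEND", "PAY", "CANCEL"];
let accepted = 0;
let rejected = 0;
for (const state of allStates) {
  for (const event of allEvents) {
    try {
      nextState(state, event);
      accepted++;
    } catch {
      rejected++;
    }
  }
}

console.log(accepted, rejected); // 6 14
```

Asserting both counts pins down the rejected pairs as well as the accepted ones, which is the part example-based tests tend to skip.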

#### 3. Integration Test with Chaos Engineering (MEDIUM)

**Rationale**: REG-047 was caused by unhandled timeout. Inject failures to test resilience.

**Implementation**:

// test/integration/payment-gateway.chaos.test.ts
import { PaymentGateway } from '@/billing/payment-gateway';
import { mockGatewayWithLatency, mockGatewayWithFailure } from '@/test/mocks';

describe('Payment Gateway Resilience', () => {
  it('should handle gateway timeout gracefully', async () => {
    const gateway = new PaymentGateway(mockGatewayWithLatency(5000));

    const result = await gateway.processPayment({
      amount: 100,
      timeout: 3000
    });

    expect(result.status).toBe('timeout');
    expect(result.retryable).toBe(true);
  });

  it('should handle gateway 5xx errors with retry', async () => {
    const gateway = new PaymentGateway(mockGatewayWithFailure(500, 2)); // Fail twice, then succeed

    const result = await gateway.processPayment({ amount: 100 });

    expect(result.status).toBe('success');
    expect(result.attempts).toBe(3);
  });
});


**Expected Impact**: Would have caught REG-047 (gateway timeout)

#### 4. CI Pipeline Enhancements

# .github/workflows/billing-quality-gate.yml
name: Billing Quality Gate

on:
  pull_request:
    paths:
      - 'src/billing/**'

jobs:
  quality-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Property-based tests
        run: npm run test:property -- --scope=billing

      - name: Mutation testing
        run: npx stryker run --mutate="src/billing/**/*.ts"

      - name: Branch coverage check
        run: |
          npm run --silent coverage -- --scope=src/billing --json > coverage.json
          BRANCH=$(jq '.total.branches.pct' coverage.json)
          if (( $(echo "$BRANCH < 75" | bc -l) )); then
            echo "Branch coverage ${BRANCH}% below 75% threshold"
            exit 1
          fi

      - name: Calculation precision audit
        run: |
          # Fail on loose equality (== / !=), a rough proxy for float comparisons without epsilon
          grep -r "==\|!=" src/billing/*.ts | grep -v "===" | grep -v "!==" && exit 1
          # Fail on division by a numeric literal in financial code
          grep -r "/ [0-9]" src/billing/*.ts && exit 1
          exit 0


### Implementation Roadmap

| Phase | Action | Owner | ETA | Impact |
|-------|--------|-------|-----|--------|
| 1 | Add property-based tests for calculator.ts | Test Engineer | Week 1 | Prevents 60% of regressions |
| 2 | Add property-based tests for tax and discount | Test Engineer | Week 2 | Completes calculation coverage |
| 3 | Add state machine tests for invoice | Test Engineer | Week 3 | Prevents state regressions |
| 4 | Enable mutation testing in CI | DevOps | Week 3 | Catches weak tests |
| 5 | Add chaos tests for payment gateway | Test Engineer | Week 4 | Prevents integration regressions |
| 6 | Implement coverage gates | DevOps | Week 4 | Maintains quality |

### Synthesis: Expected Outcomes

| Metric | Current | Target | Timeline |
|--------|---------|--------|----------|
| Regressions per quarter | 5 | 1 | Q1 2024 |
| Branch coverage | 58% | 80% | 4 weeks |
| Mutation score | Unknown | 75% | 4 weeks |
| Time to detect regression | 3.6 days avg | < 1 day | Immediate |
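The current time-to-detect figure comes straight from the historical table. A quick check, using the five "Time to Detect" values listed there:

```typescript
// Days to detect for REG-041, REG-047, REG-052, REG-058, REG-063.
const daysToDetect = [3, 1, 5, 2, 7];

const meanDaysToDetect =
  daysToDetect.reduce((sum, days) => sum + days, 0) / daysToDetect.length;
const slowestDetection = Math.max(...daysToDetect);

console.log(meanDaysToDetect.toFixed(1), slowestDetection); // 3.6 7
```

The slowest detection (7 days, REG-063) is worth tracking alongside the mean, since the target of under one day applies to every regression rather than to the average.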

### Regression Register Update

Added prevention tracking to `.aiwg/testing/regression-register/prevention-plan.yaml`:

module: src/billing/
analysis_date: 2024-01-20
regressions_analyzed: 5

patterns_identified:
  - calculation_precision
  - integration_timeout
  - state_transition

prevention_measures:
  - id: PREV-001
    type: property_based_testing
    status: planned
    target_regressions: [REG-041, REG-052, REG-063]
  - id: PREV-002
    type: state_machine_testing
    status: planned
    target_regressions: [REG-058]
  - id: PREV-003
    type: chaos_testing
    status: planned
    target_regressions: [REG-047]

next_review: 2024-04-01

Why This Is Good:

- **Pattern-based analysis**: Groups regressions by type to identify systemic issues
- **Data-driven recommendations**: Uses coverage data to target weak areas
- **Specific test code**: Provides ready-to-implement test examples
- **CI integration**: Includes workflow configuration for automated enforcement
- **Prioritized roadmap**: Orders actions by impact with clear ownership
- **Measurable outcomes**: Sets specific targets for improvement

Reference Output Formats

Externalized from the agent definition (#1600). These are the full templates the agent definition summarizes; reproduce them verbatim when emitting reports.

Regression Analysis Report

## Regression Analysis Report

**Project**: [project-name]
**Analysis Date**: YYYY-MM-DD
**Baseline Version**: v2.1.0
**Current Version**: v2.2.0-rc1

### Executive Summary

- **Total Regressions Found**: 3
- **Critical**: 1 (blocks release)
- **High**: 1 (fix before release)
- **Medium**: 1 (schedule fix)

### Critical Regressions (Fix Immediately)

#### REG-001: Payment calculation returns incorrect discount

**Symptom**: Discount calculation fails for orders > $1000
**Introduced In**: commit abc1234 (2024-01-15)
**Author**: [email protected]
**Root Cause**: Floating-point rounding error from dividing before multiplying in the discount calculation

**Git Bisect Results**:

abc1234 is the first bad commit

commit abc1234

Author: [email protected]

Date: Mon Jan 15 10:30:00 2024

Optimize discount calculation for performance


**Affected Code**:

// Before
const discount = (price * discountPercent) / 100;
// After (abc1234)
const discount = price * (discountPercent / 100); // Divides first, so the rounding error in the quotient is scaled by price


**Blast Radius**:
- Direct: `src/billing/discount.ts`
- Transitive: `src/checkout/cart.ts`, `src/orders/summary.ts`, `src/reports/revenue.ts`
- Tests Affected: 12 unit tests, 3 integration tests

**Recommended Fix**:

const discount = (price * discountPercent) / 100; // Multiply first, then divide; `/ 100.0` would change nothing because every TypeScript number is a double


**Regression Tests to Add**:

describe('discount calculation', () => {
  it('should handle large orders correctly', () => {
    expect(calculateDiscount(10000, 15)).toBe(1500);
  });

  it('should maintain precision for percentage calculations', () => {
    expect(calculateDiscount(33, 10)).toBeCloseTo(3.3, 2);
  });
});


### High Priority Regressions

[... detailed analysis for each ...]

### Regression Prevention Recommendations

| Area | Risk Level | Current Coverage | Recommended Action |
|------|------------|------------------|-------------------|
| `src/billing/` | High | 65% | Add property-based tests for calculations |
| `src/auth/` | Medium | 78% | Add contract tests for token validation |
| `src/api/` | Medium | 72% | Add performance benchmarks |

### Metrics

| Metric | Baseline | Current | Delta |
|--------|----------|---------|-------|
| Test Pass Rate | 100% | 97.2% | -2.8% |
| p50 Latency | 45ms | 52ms | +15.6% |
| Error Rate | 0.1% | 0.3% | +200% |

Regression Register Entry

# .aiwg/testing/regression-register/REG-001.yaml
id: REG-001
title: "Payment calculation returns incorrect discount"
status: open  # open, investigating, fixing, resolved, wont-fix
severity: critical
type: functional

detection:
  date: 2024-01-20
  method: automated_test_failure
  reporter: ci-pipeline
  test_name: "billing.discount.should handle large orders"

analysis:
  root_cause: "Floating-point rounding from dividing before multiplying in discount calculation"
  introduced_in:
    commit: abc1234
    date: 2024-01-15
    author: [email protected]
    pr: "#456"
  blast_radius:
    direct_files: 1
    transitive_files: 3
    affected_tests: 15

resolution:
  fix_commit: null
  fix_pr: null
  regression_test_added: false
  resolved_date: null

prevention:
  guardrails_recommended:
    - "Add property-based tests for all financial calculations"
    - "Enable integer overflow detection in CI"
  similar_risks:
    - "src/billing/tax.ts uses same pattern"

High-Risk Area Identification (reference implementation)

interface HighRiskArea {
  path: string;
  riskFactors: string[];
  regressionHistory: number;  // Past regression count
  testCoverage: number;       // Fraction from 0 to 1 (0.8 = 80%)
  complexityScore: number;    // Cyclomatic complexity
  recommendation: string;
}

function identifyHighRiskAreas(
  codebase: CodebaseAnalysis,
  regressionHistory: RegressionRegister
): HighRiskArea[] {
  return codebase.modules
    .map(module => ({
      path: module.path,
      riskFactors: [
        module.testCoverage < 0.8 ? 'Low test coverage' : null,
        module.complexityScore > 10 ? 'High complexity' : null,
        module.changeFrequency > 5 ? 'Frequently modified' : null,
        regressionHistory.countForPath(module.path) > 2 ? 'Prior regressions' : null
      ].filter((factor): factor is string => factor !== null),
      regressionHistory: regressionHistory.countForPath(module.path),
      testCoverage: module.testCoverage,
      complexityScore: module.complexityScore,
      recommendation: generateRecommendation(module)
    }))
    .filter(area => area.riskFactors.length > 0)
    .sort((a, b) => b.riskFactors.length - a.riskFactors.length);
}