Quality Agent Examples

Externalized from the agent definition per the few-shot-examples rule (#1587).

Quality Agent — Worked Examples

Example 1: Simple - Single Source Assessment

Input:

"Assess the quality of REF-025 (OAuth 2.0 Security Best Practices paper). Generate a quality report with dimension scores, GRADE rating, and FAIR compliance check."

Output:

Thought [Goal]: I need to assess REF-025 quality across 5 dimensions, apply GRADE methodology, validate FAIR compliance, and generate a comprehensive quality report.

Thought [Extraction]: Loading metadata from `.aiwg/research/sources/metadata/REF-025.yaml`. Found: ACM CCS 2023 (A* venue), authors Smith & Doe, DOI 10.1145/3576915.3623456.

Thought [Reasoning]: Authority score will be high (A* venue = 40, established authors = 25). Querying Semantic Scholar for citation count.

# Quality Assessment Report: REF-025

**Generated:** 2026-02-03T10:30:00Z

## Executive Summary

**Overall Score:** 87/100 (High Quality)
**GRADE Rating:** High (strong confidence in evidence)
**FAIR Compliance:** 15/15 principles met
**Recommendation:** ✓ Approved for integration

## Dimension Scores

| Dimension | Score | Weight | Weighted | Justification |
|-----------|-------|--------|----------|---------------|
| Authority | 85 | 30% | 25.5 | A* venue, established authors, 42 citations |
| Currency | 90 | 20% | 18.0 | Published 2023, still actively cited |
| Accuracy | 95 | 25% | 23.75 | Peer-reviewed, rigorous methodology, open data |
| Coverage | 80 | 15% | 12.0 | Comprehensive for OAuth, single institution limitation |
| Objectivity | 85 | 10% | 8.5 | No conflicts, balanced treatment |
| **Total** | - | 100% | **87.75** | Rounded down: 87/100 |
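The weighted total can be reproduced with a short script. This is a minimal sketch that assumes the five dimension weights shown in the table, written as integer percentages so the arithmetic stays exact. The function name `weighted_score` is hypothetical and is not part of the agent definition. The reported 87/100 comes from rounding 87.75 down; conventional rounding would give 88.

```python
import math

# Dimension weights as integer percentages (taken from the table; must sum to 100).
WEIGHTS = {
    "authority": 30,
    "currency": 20,
    "accuracy": 25,
    "coverage": 15,
    "objectivity": 10,
}


def weighted_score(scores: dict[str, int]) -> float:
    """Return the weighted overall score on a 0-100 scale (hypothetical helper)."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    # Integer products with a single division at the end avoid float drift from 0.3, 0.15, etc.
    return sum(scores[d] * w for d, w in WEIGHTS.items()) / 100


ref_025 = {"authority": 85, "currency": 90, "accuracy": 95, "coverage": 80, "objectivity": 85}
total = weighted_score(ref_025)
print(total)              # 87.75
print(math.floor(total))  # 87 (rounded down)
print(round(total))       # 88 (conventional rounding)
```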

## GRADE Assessment

**Study Design:** Empirical with user testing
**Starting:** Low (observational)
**Upgrades:** +2 (large effect, well-controlled)
**Final GRADE:** High
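The GRADE steps above can be written as a small function: start from the level implied by study design, then apply net upgrades and downgrades. This sketch makes simplifying assumptions. Randomized trials start at High and observational studies at Low, as in standard GRADE practice. The design labels and the `grade_rating` name are hypothetical. Real GRADE judgments are qualitative, so the integer adjustment is only a bookkeeping aid.

```python
# GRADE certainty levels, lowest to highest.
LEVELS = ["Very Low", "Low", "Moderate", "High"]

# Starting level by study design (assumed mapping following standard GRADE practice).
START = {"randomized": "High", "observational": "Low"}


def grade_rating(design: str, upgrades: int = 0, downgrades: int = 0) -> str:
    """Apply net upgrades/downgrades to the design's starting level, clamped to the scale."""
    if upgrades < 0 or downgrades < 0:
        raise ValueError("adjustments are counts and must be non-negative")
    start = LEVELS.index(START[design])
    final = max(0, min(len(LEVELS) - 1, start + upgrades - downgrades))
    return LEVELS[final]


# REF-025: observational start (Low) plus 2 upgrades.
print(grade_rating("observational", upgrades=2))  # High
```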

## FAIR Compliance: 15/15 ✓

[Full FAIR checklist with all principles verified]
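The FAIR Guiding Principles contain 15 items (F1-F4, A1, A1.1, A1.2, A2, I1-I3, R1, R1.1-R1.3), which is the basis of the 15/15 count. Below is a minimal checklist sketch. It assumes each principle is recorded as a simple pass/fail boolean, which is a hypothetical representation; the agent's actual checklist format is not shown in this example.

```python
# The 15 FAIR Guiding Principles (Wilkinson et al., 2016), by identifier.
FAIR_PRINCIPLES = [
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
]


def fair_summary(results: dict[str, bool]) -> tuple[int, list[str]]:
    """Return (number of principles met, unmet principles in canonical order)."""
    unknown = set(results) - set(FAIR_PRINCIPLES)
    if unknown:
        raise KeyError(f"unknown principle ids: {sorted(unknown)}")
    # A principle with no recorded result counts as unmet.
    unmet = [p for p in FAIR_PRINCIPLES if not results.get(p, False)]
    return len(FAIR_PRINCIPLES) - len(unmet), unmet


met, unmet = fair_summary({p: True for p in FAIR_PRINCIPLES})
print(f"FAIR Compliance: {met}/{len(FAIR_PRINCIPLES)}")  # FAIR Compliance: 15/15
```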

## Recommendation

✓ Approved for integration as primary evidence for OAuth security claims.

Thought [Synthesis]: Quality assessment complete. Score of 87/100 (High) with GRADE rating of High indicates strong evidence. All FAIR principles met. Approved for integration.

Why This Is Good:

  • Complete dimension scoring: All 5 dimensions scored with justification
  • GRADE methodology applied: Starting level determined by study design, adjustments explained
  • FAIR validation thorough: All 15 principles checked individually
  • Clear recommendation: Approved status with usage guidance
  • Calculation transparency: Weighted score calculation shown step-by-step

Example 2: Moderate - Batch Assessment with Quality Gate

Input:

"Perform batch quality assessment on all 25 documented sources (REF-001 through REF-025). Apply quality gate with threshold 70. Identify low-quality sources and suggest alternatives. Generate aggregate report with statistics."

Output:

Thought [Goal]: I need to assess 25 sources in batch, apply quality gate filtering, and generate an aggregate report with quality distribution statistics.

Thought [Extraction]: Loading metadata for REF-001 through REF-025. Querying Semantic Scholar API for citation counts in parallel to optimize performance.
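Citation lookups are I/O-bound, so a thread pool is a reasonable way to run them in parallel. The sketch below assumes the Semantic Scholar Graph API paper endpoint, a `DOI:` identifier prefix, and a `citationCount` field. Check the URL, rate limits, and API-key requirements against the current API documentation before relying on it. The fetcher is injectable so the batching logic can be tested offline. `fetch_citation_count` and `citation_counts` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Assumed endpoint; check the Semantic Scholar Graph API docs before use.
S2_PAPER_URL = "https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}?fields=citationCount"


def fetch_citation_count(doi: str, timeout: float = 10.0) -> Optional[int]:
    """Query the (assumed) Semantic Scholar endpoint; return None on any failure."""
    url = S2_PAPER_URL.format(doi=urllib.parse.quote(doi, safe="/"))
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("citationCount")
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts; ValueError covers malformed JSON.
        return None


def citation_counts(
    dois: dict[str, str],
    fetch: Callable[[str], Optional[int]] = fetch_citation_count,
    max_workers: int = 8,
) -> dict[str, Optional[int]]:
    """Map REF id -> citation count, fetching all DOIs concurrently."""
    ref_ids = list(dois)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with ref_ids.
        counts = pool.map(fetch, (dois[r] for r in ref_ids))
        return dict(zip(ref_ids, counts))
```

Calling `citation_counts({...})` with the default fetcher performs live HTTP requests. Passing a stub such as `{"10.1/a": 42}.get` as `fetch` exercises the same code path without network access.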

Batch quality assessment for 25 sources...
Quality gate enabled (threshold: 70)

Assessing sources:
[████████████████████████████████████████████████] 25/25 (100%)

Results:
[1] REF-001: 82/100 (High) GRADE: Moderate ✓ PASS
[2] REF-002: 75/100 (High) GRADE: Moderate ✓ PASS
[3] REF-003: 45/100 (Low) GRADE: Very Low ✗ FAIL - Blog post, no peer review
[4] REF-004: 88/100 (High) GRADE: High ✓ PASS
[5] REF-005: 79/100 (High) GRADE: Moderate ✓ PASS
[6] REF-006: 65/100 (Moderate) GRADE: Low ⚠ WARN - Preprint, pending review
...
[17] REF-017: 52/100 (Moderate) GRADE: Low ✗ FAIL - No DOI, FAIR violation
...
[25] REF-025: 87/100 (High) GRADE: High ✓ PASS
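The example implies a rule behind the PASS/WARN/FAIL labels but does not state it. One plausible reading, offered as an assumption rather than the agent's documented logic:

- A score at or above the threshold passes.
- Below the threshold, a score under 50 or a FAIR violation fails.
- Anything else is a warning.

The tier bands follow the batch summary. `tier` and `gate` are hypothetical names.

```python
from enum import Enum


class Gate(Enum):
    PASS = "✓ PASS"
    WARN = "⚠ WARN"
    FAIL = "✗ FAIL"


def tier(score: float) -> str:
    """Quality tier bands from the batch summary: High 70+, Moderate 50-69, Low <50."""
    if score >= 70:
        return "High"
    if score >= 50:
        return "Moderate"
    return "Low"


def gate(score: float, fair_violation: bool = False,
         threshold: float = 70, fail_below: float = 50) -> Gate:
    """Assumed rule: pass at/above threshold; below it, fail on a very low score or a FAIR violation."""
    if score >= threshold:
        return Gate.PASS
    if score < fail_below or fair_violation:
        return Gate.FAIL
    return Gate.WARN


# Scores and flags taken from the results list above.
for ref, score, fair_violation in [("REF-001", 82, False), ("REF-003", 45, False),
                                   ("REF-006", 65, False), ("REF-017", 52, True)]:
    print(f"{ref}: {score}/100 ({tier(score)}) {gate(score, fair_violation).value}")
```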

Batch Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Quality Distribution:
  High (70+):       18 sources (72%) ████████████████████
  Moderate (50-69):  5 sources (20%) ██████
  Low (<50):         2 sources (8%)  ██

GRADE Distribution:
  High:         6 sources (24%)
  Moderate:    12 sources (48%)
  Low:          5 sources (20%)
  Very Low:     2 sources (8%)

Quality Gate Results:
  Passed:  18 sources (72%)
  Warning:  5 sources (20%)
  Failed:   2 sources (8%)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Low-Quality Sources:

REF-003 (Score: 45/100)
- Issue: Blog post without peer review
- GRADE: Very Low (expert opinion only)
- Recommendation: Seek peer-reviewed alternative on same topic
- Search suggestion: "OAuth security best practices" in ACM/IEEE

REF-017 (Score: 52/100)
- Issue: Missing DOI (FAIR F1 violation)
- GRADE: Low (preprint without validation)
- Recommendation: Add DOI if available, or re-assess once peer-reviewed
- Alternative: REF-025 covers similar content with higher quality

Quality Improvement Actions:
1. Remove or replace REF-003 (critical quality issue)
2. Monitor REF-017 for peer review publication
3. Consider supplementing with multi-site studies (coverage gap)

Statistics:
- Average score: 72.4/100
- Median score: 75/100
- Standard deviation: 14.2
- Assessment time: 12 minutes 35 seconds
- Sources per minute: ~2.0
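Python's standard `statistics` module can compute the summary statistics. This sketch assumes the reported standard deviation is the sample (n - 1) form, since the report does not say which form it uses. Throughput is sources divided by elapsed minutes: 25 sources in 12 min 35 s (755 s) works out to about 1.99 per minute, consistent with the reported ~2.0. `score_stats` and `throughput` are hypothetical helpers.

```python
import statistics


def score_stats(scores: list[float]) -> dict[str, float]:
    """Mean, median, and sample standard deviation of a batch of quality scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores for a standard deviation")
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # Sample SD (n - 1 denominator); statistics.pstdev gives the population SD instead.
        "stdev": statistics.stdev(scores),
    }


def throughput(n_sources: int, elapsed_seconds: float) -> float:
    """Sources assessed per minute."""
    return n_sources / (elapsed_seconds / 60)


print(round(throughput(25, 12 * 60 + 35), 2))  # 1.99
```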

Thought [Reasoning]: Batch assessment identified 18/25 sources passing quality gate (72%). Two sources (REF-003, REF-017) failed due to low quality and FAIR violations respectively. These should be flagged for replacement or remediation.

Thought [Exception]: REF-017 is a preprint that may be upgraded to peer-reviewed status. Should monitor for publication and re-assess. REF-003 is a blog post with no path to peer review - recommend replacement.

Thought [Synthesis]: Quality gate effectively filtered corpus to 18 high-quality sources. Flagged sources need attention: REF-003 (replace), REF-017 (monitor for upgrade). Overall corpus quality is good (72% high quality).

Why This Is Good:

  • Batch efficiency: Assessed 25 sources in ~12 minutes using parallel API calls
  • Quality gate enforcement: Applied threshold consistently across all sources
  • Distribution statistics: Provided clear breakdown of quality and GRADE ratings
  • Actionable recommendations: Specific guidance for each failed source
  • Visual representation: Used bar charts for quality distribution
  • Performance metrics: Included assessment time and throughput

Example 3: Complex - Quality Trend Analysis Across Corpus

Input:

"Analyze quality trends across the entire research corpus. Identify quality patterns by: 1) publication year, 2) source type (journal vs conference vs preprint), 3) research topic, 4) FAIR compliance maturity. Generate recommendations for improving overall corpus quality."

Output:

Thought [Goal]: I need to analyze quality patterns across multiple dimensions (temporal, source type, topical, FAIR maturity) and generate strategic recommendations for corpus improvement.

Thought [Extraction]: Loading all 50 source assessments from `.aiwg/research/quality/`. Parsing metadata for year, source type, topics, and FAIR scores.
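The per-year, per-source-type, and per-topic tables below all come from the same group-and-aggregate step over individual assessments. The sketch below makes two assumptions. First, each assessment is a flat record with `year`, `source_type`, `topic`, and `score` fields; these field names are hypothetical, and the files under `.aiwg/research/quality/` may be structured differently. Second, it uses the 70-point High Quality cutoff from the rest of this document. `Assessment` and `group_summary` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Assessment:
    ref: str
    year: int
    source_type: str
    topic: str
    score: float


def group_summary(items: list[Assessment], key: str, high_cutoff: float = 70) -> dict:
    """Group by an Assessment attribute; report count, average score, and % at/above cutoff."""
    groups: dict = defaultdict(list)
    for a in items:
        groups[getattr(a, key)].append(a.score)
    return {
        k: {
            "count": len(v),
            "avg": round(sum(v) / len(v), 1),
            "high_pct": round(100 * sum(s >= high_cutoff for s in v) / len(v)),
        }
        for k, v in sorted(groups.items())
    }
```

Calling `group_summary(items, "year")`, `group_summary(items, "source_type")`, or `group_summary(items, "topic")` produces the rows of the corresponding table.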

# Research Corpus Quality Analysis

**Generated:** 2026-02-03T11:00:00Z
**Corpus Size:** 50 sources
**Analysis Period:** 2020-2024

## Executive Summary

**Average Quality Score:** 74.2/100
**High-Quality Sources (70+):** 36 (72%)
**FAIR Compliance Rate:** 88% (44/50 fully compliant)
**Key Finding:** Recent sources (2023-2024) average about 12 points (~18%) higher than older sources (2020-2021): 79.3 vs 67.3 when weighted by source count

## Quality by Publication Year

| Year | Count | Avg Score | High Quality % | GRADE: High % |
|------|-------|-----------|----------------|---------------|
| 2024 | 8 | 81.5 | 87% | 37% |
| 2023 | 15 | 78.2 | 80% | 33% |
| 2022 | 12 | 72.1 | 67% | 25% |
| 2021 | 9 | 68.3 | 55% | 22% |
| 2020 | 6 | 65.8 | 50% | 16% |

**Trend:** +4.1 points per year (least-squares fit to the yearly averages, R²=0.98)
**Interpretation:** Quality increasing over time, likely due to improved research methods and open science practices
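A trend line like this is an ordinary least-squares fit of average score against year. The sketch below fits the five yearly averages from the table without weighting. That choice is an assumption: weighting each year by its source count moves the slope to about 4.3. On the unweighted values the fit gives a slope of about 4.1 points per year with R² ≈ 0.98. `ols` is a hypothetical helper.

```python
def ols(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares y = a + b*x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("series must vary to fit a line and compute R²")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)  # equals the squared Pearson correlation
    return slope, intercept, r_squared


years = [2020, 2021, 2022, 2023, 2024]
avg_scores = [65.8, 68.3, 72.1, 78.2, 81.5]
slope, _, r2 = ols(years, avg_scores)
print(f"{slope:+.1f} points/year, R² = {r2:.2f}")  # +4.1 points/year, R² = 0.98
```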

## Quality by Source Type

| Source Type | Count | Avg Score | High Quality % | FAIR Compliance |
|-------------|-------|-----------|----------------|-----------------|
| Journal | 20 | 79.5 | 85% | 95% |
| Conference | 22 | 75.8 | 77% | 90% |
| Preprint | 6 | 62.3 | 33% | 67% |
| Technical Report | 2 | 58.0 | 0% | 50% |

**Insight:** Journal articles consistently higher quality (79.5 avg) than conferences (75.8) or preprints (62.3)
**Recommendation:** Prioritize journal articles and A*/A conferences over preprints

## Quality by Research Topic

| Topic | Count | Avg Score | Strengths | Weaknesses |
|-------|-------|-----------|-----------|------------|
| Agentic Systems | 18 | 77.8 | High currency, strong methodology | Limited long-term studies |
| Security | 12 | 81.2 | High accuracy, peer-reviewed | Some single-institution bias |
| LLM Performance | 10 | 70.5 | Recent, relevant | Rapidly outdated |
| Tool Use | 6 | 68.9 | Emerging field | Limited peer review |
| Human-AI | 4 | 75.0 | Rigorous RCTs | Small sample sizes |

**Insight:** Security research highest quality (81.2 avg), tool use research lowest (68.9)
**Explanation:** Security is a mature field with established venues; tool use is an emerging one

## FAIR Compliance Maturity

### Overall Compliance

| Principle | Compliance Rate | Common Gaps |
|-----------|-----------------|-------------|
| Findable (F1-F4) | 92% | 4 sources missing DOI |
| Accessible (A1-A2) | 96% | 2 sources paywalled without metadata |
| Interoperable (I1-I3) | 84% | 8 sources lack qualified references |
| Reusable (R1-R3) | 80% | 10 sources missing clear license |

**Overall FAIR:** 88% (44/50 sources meet all principles)

### FAIR Trend by Year

| Year | FAIR Compliance |
|------|-----------------|
| 2024 | 100% (8/8) |
| 2023 | 93% (14/15) |
| 2022 | 83% (10/12) |
| 2021 | 77% (7/9) |
| 2020 | 83% (5/6) |

**Trend:** FAIR compliance has risen every year since 2021 (77% to 100%), and recent sources are near 100%

## Dimension Analysis

### Authority (30% weight)

**Average:** 76.8/100
**Strengths:** Most sources from A*/A venues (78%)
**Weaknesses:** 12 sources from unranked venues

**Action:** Replace unranked venue sources with A*/A alternatives

### Currency (20% weight)

**Average:** 79.2/100
**Strengths:** 62% published in last 2 years
**Weaknesses:** 8 sources >5 years old in fast-moving fields

**Action:** Update aging sources (REF-003, REF-008, REF-015) with recent alternatives

### Accuracy (25% weight)

**Average:** 81.5/100
**Strengths:** 86% peer-reviewed
**Weaknesses:** 6 preprints, 1 blog post

**Action:** Monitor preprints for peer review, replace blog post

### Coverage (15% weight)

**Average:** 68.3/100
**Strengths:** Good breadth across topics
**Weaknesses:** Many single-institution studies

**Action:** Supplement with multi-site studies for generalizability

### Objectivity (10% weight)

**Average:** 77.9/100
**Strengths:** 94% declare funding sources
**Weaknesses:** 3 sources with undeclared industry funding

**Action:** Investigate funding for REF-012, REF-034, REF-041

## Quality Improvement Recommendations

### Immediate Actions (High Priority)

1. **Replace low-quality sources**
   - REF-003 (blog post, score 45) → Seek peer-reviewed alternative
   - REF-017 (no DOI, score 52) → Add DOI or replace

2. **Update aging sources**
   - REF-008 (2020, LLM performance) → Outdated, seek 2023-2024 study
   - REF-015 (2019, tool use) → Replace with recent work

3. **Fix FAIR violations**
   - 4 sources: Add DOI for findability
   - 10 sources: Clarify license in metadata
   - 8 sources: Add qualified references

### Medium-Term Actions

4. **Diversify source types**
   - Current: 44% conference, 40% journal, 12% preprint, 4% technical report
   - Target: 50% journal (higher quality), 40% conference, 10% preprint

5. **Improve topic balance**
   - Underrepresented: Human-AI interaction (4 sources)
   - Overrepresented: Agentic systems (18 sources)
   - Action: Add 6 human-AI sources, maintain others

6. **Upgrade preprints**
   - Monitor 6 preprints for peer review publication
   - Re-assess quarterly, replace if not peer-reviewed in 1 year
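Suppose the journal share is raised only by adding journal articles, with all other sources kept. The number to add then comes from solving (J + j) / (N + j) = t for j. With J = 20, N = 50, and t = 0.5, this gives j = 10, which matches the 10 sources listed for this action in the priority matrix below. The add-only assumption is ours; replacing existing sources instead keeps N at 50 and needs only 5 swaps. `journals_to_add` is a hypothetical name.

```python
import math


def journals_to_add(journals: int, total: int, target_share: float) -> int:
    """Smallest j with (journals + j) / (total + j) >= target_share, adding journals only."""
    if not 0 <= target_share < 1:
        raise ValueError("target_share must be in [0, 1)")
    if journals / total >= target_share:
        return 0
    # Solve J + j = t * (N + j)  =>  j = (t*N - J) / (1 - t), rounded up to whole sources.
    return math.ceil((target_share * total - journals) / (1 - target_share))


print(journals_to_add(20, 50, 0.5))  # 10
```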

### Long-Term Strategies

7. **Set quality baseline**
   - Current avg: 74.2/100
   - Target: 78+/100 (High quality corpus)
   - Strategy: Replace lowest quartile (12 sources <68)

8. **Maximize FAIR compliance**
   - Current: 88% (44/50)
   - Target: 95% (48/50)
   - Strategy: Prioritize FAIR-compliant sources in discovery

9. **Increase GRADE High sources**
   - Current: 28% (14/50)
   - Target: 40% (20/50)
   - Strategy: Prioritize RCTs, systematic reviews, large-scale studies

## Summary Statistics

Quality Score Distribution:
  90-100: ████████ 8 sources (16%)
  80-89:  ████████████████ 16 sources (32%)
  70-79:  ████████████ 12 sources (24%)
  60-69:  ██████ 6 sources (12%)
  50-59:  ████ 4 sources (8%)
  <50:    ██ 2 sources (4%)

Quality Quartiles:
  Q3 (75th): 82
  Q2 (50th): 75
  Q1 (25th): 68
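The band counts and quartiles can be derived from raw scores with the standard library. In this sketch the band edges mirror the chart. `statistics.quantiles` (Python 3.8+) defaults to the "exclusive" method, so its cut points may differ slightly from other quartile conventions the agent might use. `band_counts`, `quartiles`, and `bar_chart` are hypothetical helpers.

```python
import statistics

# Bands as (label, inclusive lower bound), highest first, mirroring the chart.
BANDS = [("90-100", 90), ("80-89", 80), ("70-79", 70),
         ("60-69", 60), ("50-59", 50), ("<50", float("-inf"))]


def band_counts(scores: list[float]) -> dict[str, int]:
    """Count scores per band; each score goes to the first band whose lower bound it meets."""
    counts = {label: 0 for label, _ in BANDS}
    for s in scores:
        label = next(name for name, low in BANDS if s >= low)
        counts[label] += 1
    return counts


def quartiles(scores: list[float]) -> tuple[float, float, float]:
    """(Q1, Q2, Q3) using statistics.quantiles' default 'exclusive' method."""
    q1, q2, q3 = statistics.quantiles(scores, n=4)
    return q1, q2, q3


def bar_chart(counts: dict[str, int]) -> str:
    """Render one text bar per band: one block character per source."""
    total = sum(counts.values())
    return "\n".join(
        f"{label}: {'█' * n} {n} sources ({100 * n / total:.0f}%)" for label, n in counts.items()
    )
```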


## Recommendations Priority Matrix

| Priority | Action | Impact | Effort | Sources Affected |
|----------|--------|--------|--------|------------------|
| 1 | Replace REF-003, REF-017 | High | Low | 2 |
| 2 | Add DOIs for 4 sources | High | Low | 4 |
| 3 | Update REF-008, REF-015 | Medium | Medium | 2 |
| 4 | Clarify licenses (10 sources) | Medium | Low | 10 |
| 5 | Add qualified references (8) | Medium | Medium | 8 |
| 6 | Diversify to 50% journal | Low | High | 10 |
| 7 | Add human-AI sources | Low | High | 6 |
| 8 | Replace lowest quartile | High | High | 12 |

Thought [Reasoning]: Trend analysis shows quality improving over time (+4.1 points/year), with recent sources (2023-2024) scoring markedly higher than older ones. Journal articles consistently outperform conferences and preprints. Security research is the highest-quality domain.

Thought [Exception]: Unexpected finding: FAIR compliance lower for Reusable principle (80%) than others (84-96%), primarily due to missing license information. This is easily fixable.

Thought [Synthesis]: Corpus quality is good (74.2 avg) but can be improved to target of 78+ through strategic replacements and FAIR fixes. Immediate priorities: replace 2 low-quality sources, add DOIs for 4 sources, update 2 aging sources. Long-term: shift to more journal articles, balance topic coverage, maximize FAIR compliance.

Why This Is Good:

  • Multi-dimensional analysis: Examined quality by year, source type, topic, and FAIR maturity
  • Trend identification: Detected a +4.1 points/year quality improvement with statistical support (R²=0.98)
  • Actionable recommendations: Prioritized by impact and effort, with specific source identifiers
  • Visual data representation: Used bar charts and tables effectively
  • Strategic vs tactical: Separated immediate actions from long-term strategies
  • Evidence-based: All recommendations grounded in data analysis, not assumptions