Research Acquisition Agent Examples

Externalized from the agent definition per the few-shot-examples rule (#1587).

Example 1: Simple - Single Paper Acquisition

Input:

"Acquire paper REF-025 from the queue."

Thought (Goal):

I need to download the paper, extract metadata, assign REF-025, compute checksum, and validate FAIR compliance.

Action:

Load acquisition queue, find paper ID abc123def456, query Semantic Scholar API for open access URL.

Observation:

Found open access URL: https://arxiv.org/pdf/2301.12345.pdf. Paper is CC-BY-4.0 licensed.

Thought (Reasoning):

arXiv URL is reliable and fast. CC-BY-4.0 means high FAIR reusability score. I'll download directly from arXiv.

Action:

Download PDF using curl with 60-second timeout.

Observation:

Download complete: 2.4 MB in 3 seconds. File saved successfully.

Thought (Extraction):

PDF metadata shows: Title "OAuth 2.0 Security Best Practices", authors Smith & Doe, year 2023. Need to validate format.

Action:

Validate PDF format using `file` command and compute SHA-256 checksum.

Observation:

File type: PDF document, version 1.4. SHA-256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
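
The format check and checksum in this step could also be done in-process. The following is a minimal Python sketch, not the agent's actual implementation: it assumes a PDF can be recognized by its `%PDF-` header (a looser check than the `file` command performs), and the helper names `looks_like_pdf` and `sha256_of` are hypothetical.

```python
import hashlib
from pathlib import Path


def looks_like_pdf(path: Path) -> bool:
    """Return True if the file starts with the PDF magic bytes (assumed check)."""
    with path.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large PDF is never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The hex digest returned by `sha256_of` is the 64-character value that would be recorded in the checksums file.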

Thought (Reasoning):

FAIR scoring (per-dimension points): DOI present (+40), all metadata fields complete (+50), CC-BY-4.0 license (+100 across accessible and reusable), JSON format (+50), provenance will be documented (+50). Overall: 94/100 (High).

Output:

# Download log
Acquiring REF-025: "OAuth 2.0 Security Best Practices"
Downloading from: https://arxiv.org/pdf/2301.12345.pdf
Download complete: 2.4 MB in 3 seconds
Validating PDF format... OK
Extracting metadata... OK
Computing SHA-256 checksum... OK
FAIR validation: 94/100 (High)

Saved:
- PDF: .aiwg/research/sources/pdfs/REF-025-oauth-2-security-best-practices.pdf
- Metadata: .aiwg/research/sources/metadata/REF-025-metadata.json
- Checksum: Updated .aiwg/research/sources/checksums.txt

Why This Is Good:

  • Uses structured thought protocol (Goal → Action → Observation → Extraction → Reasoning)
  • Shows decision-making for source selection (arXiv vs others)
  • Validates file integrity with checksum
  • Calculates detailed FAIR score with breakdown
  • Provides clear output showing all saved artifacts

Example 2: Moderate - Bulk Acquisition with Paywalled Paper

Input:

"Process the full acquisition queue with 5 papers."

Thought (Goal):

I need to process 5 papers: download PDFs, handle any paywalled papers, extract metadata, assign REF-XXX identifiers, and generate acquisition report.

Action:

Load `.aiwg/research/discovery/acquisition-queue.json`, read current ref counter (024), process papers sequentially with 3-second delays to respect rate limits.
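
As an illustration of this step, identifier assignment from the ref counter might look like the sketch below. It assumes identifiers are the counter value zero-padded to three digits, as the REF-025 to REF-029 sequence suggests; `format_ref` and `assign_refs` are hypothetical names, not part of the `aiwg` tooling.

```python
import time
from typing import Iterable, Iterator, Tuple


def format_ref(number: int) -> str:
    """Format a counter value as a zero-padded identifier, e.g. 25 -> REF-025."""
    return f"REF-{number:03d}"


def assign_refs(
    paper_ids: Iterable[str], last_ref: int, delay_seconds: float = 3.0
) -> Iterator[Tuple[str, str]]:
    """Yield (paper_id, ref) pairs in queue order, pausing between papers."""
    for offset, paper_id in enumerate(paper_ids, start=1):
        if offset > 1:
            time.sleep(delay_seconds)  # rate-limit pause between papers
        yield paper_id, format_ref(last_ref + offset)
```

With a counter of 24 and the five queued IDs, the third paper (ghi789) receives REF-027 even though it is later flagged as paywalled, so the manual upload can reuse that identifier.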

Observation:

Queue contains: abc123 (arXiv), def456 (Semantic Scholar), ghi789 (paywalled-ACM), jkl012 (arXiv), mno345 (Semantic Scholar)

Paper 1: abc123

Thought (Reasoning):

Paper abc123 has an arXiv ID, so I will download it directly from arXiv for reliability.

Action:

Download https://arxiv.org/pdf/2206.00001.pdf

Observation:

Success: 1.8 MB, 2 seconds. PDF valid. Metadata extracted.

Thought (Progress):

Acquired REF-025 successfully. 1/5 complete.

Paper 2: def456

Action:

Query Semantic Scholar for paper def456 open access URL.

Observation:

Found OA URL via publisher. Download successful: 3.2 MB, 4 seconds.

Thought (Progress):

Acquired REF-026 successfully. 2/5 complete.

Paper 3: ghi789 (Paywalled)

Action:

Query Semantic Scholar and Unpaywall for paper ghi789.

Observation:

Semantic Scholar: No OA URL. Unpaywall: No OA version found. Publisher: ACM (paywalled).
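
The lookup order used here (try one source, fall back to the next, flag as paywalled if none succeeds) can be sketched as follows. This is an assumption-laden illustration: the resolvers are stand-in stubs that make no network calls, and `find_open_access_url` is a hypothetical helper, not an API of Semantic Scholar or Unpaywall.

```python
from typing import Callable, Optional, Sequence, Tuple

# A resolver takes a paper ID and returns an open access URL, or None if it finds none.
Resolver = Callable[[str], Optional[str]]


def find_open_access_url(
    paper_id: str, resolvers: Sequence[Tuple[str, Resolver]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each named resolver in order; return (source_name, url) or (None, None)."""
    for name, resolve in resolvers:
        url = resolve(paper_id)
        if url:
            return name, url
    return None, None


# Stand-in resolvers for illustration only; real ones would query the services.
def semantic_scholar_stub(paper_id: str) -> Optional[str]:
    return None


def unpaywall_stub(paper_id: str) -> Optional[str]:
    return None


source, url = find_open_access_url(
    "ghi789",
    [("Semantic Scholar", semantic_scholar_stub), ("Unpaywall", unpaywall_stub)],
)
status = "acquire" if url else "paywalled"  # no source found a URL, so flag it
```

Keeping the resolvers as plain callables means a new source can be added to the chain without changing the fallback logic.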

Thought (Exception):

Paper ghi789 is paywalled. Cannot auto-download. Will flag for manual upload.

Action:

Add paper ghi789 to paywalled list in acquisition report.

Observation:

Paywalled paper logged. Metadata saved for manual upload workflow.

Thought (Progress):

REF-027 flagged for manual upload. 3/5 processed.

Paper 4: jkl012

Action:

Download from arXiv.

Observation:

Success: 2.1 MB. REF-028 acquired.

Thought (Progress):

4/5 processed (3 acquired, 1 flagged).

Paper 5: mno345

Action:

Download from Semantic Scholar OA link.

Observation:

Success: 2.9 MB. REF-029 acquired.

Thought (Synthesis):

Batch complete: 4/5 papers acquired (80%), 1 paywalled (20%). Total size: 10.0 MB. Average FAIR score: 87/100 (4 papers averaged, excluding paywalled).
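
The synthesis figures can be checked with a few lines of arithmetic. The sketch below takes the file sizes from the observations above and the per-paper FAIR scores from the report table; the tuple layout is an assumption made for illustration.

```python
from statistics import mean

# (ref, size in MB, FAIR score); None marks the paper that was not acquired.
results = [
    ("REF-025", 1.8, 90),
    ("REF-026", 3.2, 88),
    ("REF-027", None, None),  # paywalled, flagged for manual upload
    ("REF-028", 2.1, 92),
    ("REF-029", 2.9, 78),
]

acquired = [r for r in results if r[2] is not None]
acquired_pct = 100 * len(acquired) / len(results)
# Round to one decimal to absorb floating-point noise in the sum.
total_size_mb = round(sum(size for _, size, _ in acquired), 1)
# The paywalled paper is excluded from the average, as in the synthesis.
average_fair = mean(score for _, _, score in acquired)

print(
    f"{len(acquired)}/{len(results)} acquired ({acquired_pct:.0f}%), "
    f"{total_size_mb} MB, average FAIR {average_fair:.0f}/100"
)
```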

Output:

# Acquisition Report: 2026-01-25T14:30:00Z

**Queue Size:** 5 papers
**Acquired:** 4 papers (80%)
**Paywalled:** 1 paper (20%)
**Failed:** 0 papers

## Summary
- Total size: 10.0 MB
- Average FAIR score: 87/100
- Time elapsed: 25 seconds

## Acquired Papers
| REF | Title | Source | FAIR Score |
|-----|-------|--------|------------|
| REF-025 | "Attention Is All You Need" | arXiv | 90/100 |
| REF-026 | "BERT: Pre-training of Deep Bidirectional Transformers" | Semantic Scholar | 88/100 |
| REF-028 | "Language Models are Few-Shot Learners" | arXiv | 92/100 |
| REF-029 | "Constitutional AI: Harmlessness from AI Feedback" | Semantic Scholar | 78/100 |

## Paywalled Papers (Manual Upload Required)
| Paper ID | Title | Publisher | Assigned REF |
|----------|-------|-----------|--------------|
| ghi789 | "Formal Verification of Neural Networks" | ACM | REF-027 |

**Next Steps:**
1. Manually download REF-027 from ACM Digital Library
2. Run: `aiwg research acquire --upload /path/to/paper.pdf --ref REF-027`

## FAIR Score Distribution
- High (≥80): 3 papers (75%)
- Moderate (60-79): 1 paper (25%)
- Low (<60): 0 papers (0%)
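
The distribution in the report follows from the band thresholds it lists (High at 80 and above, Moderate from 60 to 79, Low below 60). A minimal sketch, with `fair_band` as a hypothetical helper name:

```python
from collections import Counter


def fair_band(score: int) -> str:
    """Map a 0-100 FAIR score to the report's bands."""
    if score >= 80:
        return "High"
    if score >= 60:
        return "Moderate"
    return "Low"


scores = [90, 88, 92, 78]  # FAIR scores of the four acquired papers
distribution = Counter(fair_band(s) for s in scores)
for band in ("High", "Moderate", "Low"):
    count = distribution[band]
    print(f"{band}: {count} ({100 * count / len(scores):.0f}%)")
```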

Why This Is Good:

  • Handles mixed sources (arXiv, Semantic Scholar, paywalled)
  • Gracefully handles paywalled paper without blocking others
  • Tracks progress through batch with counters
  • Generates comprehensive report with actionable next steps
  • Calculates FAIR score distribution for quality insight
  • Uses thought protocol to show decision-making at each step

Example 3: Complex - Manual Upload with Metadata Extraction

Input:

"User uploads a paywalled paper manually: `aiwg research acquire --upload /tmp/acm-paper.pdf --ref REF-027`"

Thought (Goal):

I need to validate the uploaded PDF, extract metadata from the PDF itself (no API metadata is available), ask the user to confirm or correct it, compute the checksum, calculate the FAIR score (expected to be lower for a manual upload), and complete the acquisition.

Action:

Validate PDF format using `file` command.

Observation:

File type: PDF document, version 1.7. Size: 1.5 MB. Format valid.

Thought (Reasoning):

PDF is valid. Now attempt metadata extraction from PDF using pdftotext and parsing.

Action:

Extract text from first page using `pdftotext -f 1 -l 1 /tmp/acm-paper.pdf -`.

Observation:

Extracted text contains:

Formal Verification of Neural Networks: A Survey
John Smith¹, Jane Doe², Alice Johnson¹
¹University of California, Berkeley
²Stanford University
ACM Computing Surveys, Vol. 55, No. 3, 2023
DOI: 10.1145/3576915.3000000

Thought (Extraction):

Successfully extracted: Title "Formal Verification of Neural Networks: A Survey", Authors: Smith, Doe, Johnson, Year: 2023, Venue: "ACM Computing Surveys", DOI: 10.1145/3576915.3000000.
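
One way the first-page text could be parsed is sketched below. It is a heuristic that assumes the layout seen in this example (title on the first line, authors on the second) and would not generalize to every paper; `parse_first_page` and `to_last_first` are hypothetical helpers. The user confirmation step exists precisely because heuristics like this can misfire.

```python
import re
from typing import Dict, List

FIRST_PAGE = """Formal Verification of Neural Networks: A Survey
John Smith¹, Jane Doe², Alice Johnson¹
¹University of California, Berkeley
²Stanford University
ACM Computing Surveys, Vol. 55, No. 3, 2023
DOI: 10.1145/3576915.3000000
"""

# Superscript digits used as affiliation markers.
SUPERSCRIPTS = re.compile("[\u00b9\u00b2\u00b3\u2070\u2074-\u2079]")


def to_last_first(name: str) -> str:
    """Convert 'John Smith' to 'Smith, John' (naive: last word is the surname)."""
    parts = name.rsplit(" ", 1)
    return f"{parts[1]}, {parts[0]}" if len(parts) == 2 else name


def parse_first_page(text: str) -> Dict[str, object]:
    """Heuristic parse: line 1 is the title, line 2 the authors; DOI and year by regex."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    authors: List[str] = []
    if len(lines) > 1:
        cleaned = SUPERSCRIPTS.sub("", lines[1])
        authors = [to_last_first(a.strip()) for a in cleaned.split(",") if a.strip()]
    doi = re.search(r"\b10\.\d{4,9}/\S+", text)
    year = re.search(r"\b(?:19|20)\d{2}\b", text)
    return {
        "title": lines[0] if lines else None,
        "authors": authors,
        "year": int(year.group(0)) if year else None,
        "doi": doi.group(0) if doi else None,
    }


metadata = parse_first_page(FIRST_PAGE)
```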

Action:

Present extracted metadata to user for confirmation.

User Interaction:

Extracted metadata:
- Title: "Formal Verification of Neural Networks: A Survey"
- Authors: ["Smith, John", "Doe, Jane", "Johnson, Alice"]
- Year: 2023
- Venue: "ACM Computing Surveys"
- DOI: 10.1145/3576915.3000000

Confirm metadata? (y/n/edit): y

Observation:

User confirmed metadata is correct.

Thought (Reasoning):

Metadata confirmed. Now compute the checksum and FAIR score. The FAIR score will be moderate: the paywalled source lowers accessibility, but the DOI is present and the metadata is complete.

Action:

Compute SHA-256 checksum using `sha256sum /tmp/acm-paper.pdf`.

Observation:

SHA-256: a7b9c8d1e2f3456789abcdef0123456789abcdef0123456789abcdef01234567

Thought (Reasoning):

FAIR scoring for manually uploaded paper:

  • Findable: DOI present (+40), metadata complete (+50 for all fields) = 90/100
  • Accessible: No persistent URL (-50), License unknown (-50) = 0/100
  • Interoperable: JSON format (+50), schema compliance (+50) = 100/100
  • Reusable: License unknown (-50), provenance documented (+50) = 50/100

Overall: (90 + 0 + 100 + 50) / 4 = 60/100 (Moderate)
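
The overall figure is the unweighted mean of the four dimension scores. A small sketch of that calculation, assuming equal weights as the formula above implies; `overall_fair` is a hypothetical helper name:

```python
from statistics import mean
from typing import Mapping


def overall_fair(dimensions: Mapping[str, int]) -> float:
    """Average the per-dimension scores, each expected to lie in 0-100."""
    for name, score in dimensions.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score out of range: {score}")
    return mean(dimensions.values())


# Dimension scores from the breakdown above.
ref_027 = {"findable": 90, "accessible": 0, "interoperable": 100, "reusable": 50}
overall = overall_fair(ref_027)
```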

Action:

Copy PDF to `.aiwg/research/sources/pdfs/REF-027-formal-verification-neural-networks-survey.pdf`, save metadata JSON, update checksums file, generate acquisition confirmation.

Observation:

All files saved successfully. Acquisition complete.

Thought (Exception):

FAIR score is only 60/100 (Moderate) due to missing license information and lack of persistent URL. This is expected for manually uploaded paywalled papers. User should be informed.

Output:

Validating PDF format... OK
Extracting metadata from PDF...
  - Title: "Formal Verification of Neural Networks: A Survey"
  - Authors: ["Smith, John", "Doe, Jane", "Johnson, Alice"]
  - Year: 2023
  - Venue: "ACM Computing Surveys"
  - DOI: 10.1145/3576915.3000000

Confirm metadata? (y/n/edit): y

Metadata confirmed by user.
Assigning identifier: REF-027
Computing SHA-256 checksum... OK
FAIR validation: 60/100 (Moderate - manual upload, missing license info)

⚠️  Note: FAIR score is moderate due to:
  - No persistent open access URL (paywalled source)
  - License information not available

Saved:
- PDF: .aiwg/research/sources/pdfs/REF-027-formal-verification-neural-networks-survey.pdf
- Metadata: .aiwg/research/sources/metadata/REF-027-metadata.json
- Checksum: Updated .aiwg/research/sources/checksums.txt

Acquisition complete for REF-027.

Why This Is Good:

  • Handles manual upload workflow gracefully
  • Extracts metadata from PDF when API unavailable
  • Prompts user for confirmation to catch extraction errors
  • Explains FAIR score reduction with specific reasons
  • Completes acquisition despite missing information
  • Warns user about quality limitations
  • Uses thought protocol to show metadata extraction and FAIR scoring logic