Research Acquisition Agent Examples
Externalized from the agent definition per the few-shot-examples rule (#1587).
Example 1: Simple - Single Paper Acquisition
Input:
"Acquire paper REF-025 from the queue."
Thought (Goal):
I need to download the paper, extract metadata, assign REF-025, compute checksum, and validate FAIR compliance.
Action:
Load acquisition queue, find paper ID abc123def456, query Semantic Scholar API for open access URL.
Observation:
Found open access URL: https://arxiv.org/pdf/2301.12345.pdf. Paper is CC-BY-4.0 licensed.
Thought (Reasoning):
arXiv URL is reliable and fast. CC-BY-4.0 means high FAIR reusability score. I'll download directly from arXiv.
Action:
Download PDF using curl with 60-second timeout.
Observation:
Download complete: 2.4 MB in 3 seconds. File saved successfully.
Thought (Extraction):
PDF metadata shows: Title "OAuth 2.0 Security Best Practices", authors Smith & Doe, year 2023. Need to validate format.
Action:
Validate PDF format using `file` command and compute SHA-256 checksum.
Observation:
File type: PDF document, version 1.4. SHA-256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Thought (Reasoning):
FAIR scoring: DOI present (+40), all metadata fields complete (+50), CC-BY-4.0 license (+100 for accessible+reusable), JSON format (+50), provenance will be documented (+50). Total: 94/100 (High).
Output:
# Download log
Acquiring REF-025: "OAuth 2.0 Security Best Practices"
Downloading from: https://arxiv.org/pdf/2301.12345.pdf
Download complete: 2.4 MB in 3 seconds
Validating PDF format... OK
Extracting metadata... OK
Computing SHA-256 checksum... OK
FAIR validation: 94/100 (High)
Saved:
- PDF: .aiwg/research/sources/pdfs/REF-025-oauth-2-security-best-practices.pdf
- Metadata: .aiwg/research/sources/metadata/REF-025-metadata.json
- Checksum: Updated .aiwg/research/sources/checksums.txt
Why This Is Good:
- Uses structured thought protocol (Goal → Action → Observation → Extraction → Reasoning)
- Shows decision-making for source selection (arXiv vs others)
- Validates file integrity with checksum
- Calculates detailed FAIR score with breakdown
- Provides clear output showing all saved artifacts
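The format check and checksum steps in this example can be sketched in Python. This is a minimal illustration, not the agent's implementation: it assumes that testing for the `%PDF-` magic bytes is an acceptable stand-in for the `file` command, and the function names `looks_like_pdf` and `sha256_of` are hypothetical.

```python
import hashlib
from pathlib import Path


def looks_like_pdf(path: Path) -> bool:
    """Return True if the file starts with the PDF magic bytes."""
    with path.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large PDFs are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Streaming in chunks keeps memory use flat regardless of PDF size, and the hex digest can be compared directly against the output of `sha256sum`.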
Example 2: Moderate - Bulk Acquisition with Paywalled Paper
Input:
"Process the full acquisition queue with 5 papers."
Thought (Goal):
I need to process 5 papers: download PDFs, handle any paywalled papers, extract metadata, assign REF-XXX identifiers, and generate acquisition report.
Action:
Load `.aiwg/research/discovery/acquisition-queue.json`, read current ref counter (024), process papers sequentially with 3-second delays to respect rate limits.
Observation:
Queue contains: abc123 (arXiv), def456 (Semantic Scholar), ghi789 (paywalled-ACM), jkl012 (arXiv), mno345 (Semantic Scholar)
Paper 1: abc123
Thought (Reasoning):
Paper abc123 has an arXiv ID, so I will download it directly from arXiv for reliability.
Action:
Download https://arxiv.org/pdf/2206.00001.pdf
Observation:
Success: 1.8 MB, 2 seconds. PDF valid. Metadata extracted.
Thought (Progress):
Acquired REF-025 successfully. 1/5 complete.
Paper 2: def456
Action:
Query Semantic Scholar for paper def456 open access URL.
Observation:
Found OA URL via publisher. Download successful: 3.2 MB, 4 seconds.
Thought (Progress):
Acquired REF-026 successfully. 2/5 complete.
Paper 3: ghi789 (Paywalled)
Action:
Query Semantic Scholar and Unpaywall for paper ghi789.
Observation:
Semantic Scholar: No OA URL. Unpaywall: No OA version found. Publisher: ACM (paywalled).
Thought (Exception):
Paper ghi789 is paywalled. Cannot auto-download. Will flag for manual upload.
Action:
Add paper ghi789 to paywalled list in acquisition report.
Observation:
Paywalled paper logged. Metadata saved for manual upload workflow.
Thought (Progress):
REF-027 flagged for manual upload. 3/5 processed.
Paper 4: jkl012
Action:
Download from arXiv.
Observation:
Success: 2.1 MB. REF-028 acquired.
Thought (Progress):
4/5 processed (3 acquired, 1 flagged for manual upload).
Paper 5: mno345
Action:
Download from Semantic Scholar OA link.
Observation:
Success: 2.9 MB. REF-029 acquired.
Thought (Synthesis):
Batch complete: 4/5 papers acquired (80%), 1 paywalled (20%). Total size: 10.0 MB. Average FAIR score: 87/100 (4 papers averaged, excluding paywalled).
Output:
# Acquisition Report: 2026-01-25T14:30:00Z
**Queue Size:** 5 papers
**Acquired:** 4 papers (80%)
**Paywalled:** 1 paper (20%)
**Failed:** 0 papers
## Summary
- Total size: 10.0 MB
- Average FAIR score: 87/100
- Time elapsed: 25 seconds
## Acquired Papers
| REF | Title | Source | FAIR Score |
|-----|-------|--------|------------|
| REF-025 | "Attention Is All You Need" | arXiv | 90/100 |
| REF-026 | "BERT: Pre-training of Deep Bidirectional Transformers" | Semantic Scholar | 88/100 |
| REF-028 | "Language Models are Few-Shot Learners" | arXiv | 92/100 |
| REF-029 | "Constitutional AI: Harmlessness from AI Feedback" | Semantic Scholar | 78/100 |
## Paywalled Papers (Manual Upload Required)
| Paper ID | Title | Publisher | Assigned REF |
|----------|-------|-----------|--------------|
| ghi789 | "Formal Verification of Neural Networks" | ACM | REF-027 |
**Next Steps:**
1. Manually download REF-027 from ACM Digital Library
2. Run: `aiwg research acquire --upload /path/to/paper.pdf --ref REF-027`
## FAIR Score Distribution
- High (≥80): 3 papers (75%)
- Moderate (60-79): 1 paper (25%)
- Low (<60): 0 papers (0%)
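The summary figures in this report follow from the per-paper sizes and FAIR scores in the trace. A small Python sketch that recomputes them, assuming the band thresholds shown in the distribution (High ≥80, Moderate 60-79, Low <60); the `Acquired` record and `fair_band` helper are hypothetical names used only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Acquired:
    ref: str
    size_mb: float
    fair: int


def fair_band(score: int) -> str:
    """Map a FAIR score to the bands used in the report."""
    if score >= 80:
        return "High"
    if score >= 60:
        return "Moderate"
    return "Low"


# Sizes come from the trace observations, scores from the report table.
papers = [
    Acquired("REF-025", 1.8, 90),
    Acquired("REF-026", 3.2, 88),
    Acquired("REF-028", 2.1, 92),
    Acquired("REF-029", 2.9, 78),
]

total_mb = round(sum(p.size_mb for p in papers), 1)
average_fair = sum(p.fair for p in papers) / len(papers)
bands = Counter(fair_band(p.fair) for p in papers)

print(f"Total size: {total_mb} MB")
print(f"Average FAIR score: {average_fair:.0f}/100")
print(dict(bands))
```

The paywalled paper is left out of `papers`, which matches the report's choice to average over the four acquired papers only.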
Why This Is Good:
- Handles mixed sources (arXiv, Semantic Scholar, paywalled)
- Gracefully handles paywalled paper without blocking others
- Tracks progress through batch with counters
- Generates comprehensive report with actionable next steps
- Calculates FAIR score distribution for quality insight
- Uses thought protocol to show decision-making at each step
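The batch flow in this example (sequential processing, a running REF counter, and a paywalled paper that is flagged without stopping the batch) might be sketched as follows. Everything here is an assumption for illustration: `process_queue` is a hypothetical name, and `fetch` is a stand-in callback for the real download logic that returns `None` when no open access copy exists.

```python
import time
from typing import Callable, Optional


def process_queue(
    paper_ids: list[str],
    last_ref: int,
    fetch: Callable[[str], Optional[bytes]],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, list[tuple[str, str]]]:
    """Process papers in order, assigning REF ids and flagging paywalled ones."""
    report: dict[str, list[tuple[str, str]]] = {"acquired": [], "paywalled": []}
    for index, paper_id in enumerate(paper_ids):
        if index > 0:
            sleep(delay_seconds)  # pause between papers to respect rate limits
        last_ref += 1
        ref = f"REF-{last_ref:03d}"
        pdf = fetch(paper_id)
        # A paywalled paper keeps its REF id and does not stop the batch.
        bucket = "acquired" if pdf is not None else "paywalled"
        report[bucket].append((ref, paper_id))
    return report


# Usage with the queue from this example; the fake fetch and no-op sleep
# replace network access and waiting so the sketch runs instantly.
queue = ["abc123", "def456", "ghi789", "jkl012", "mno345"]
result = process_queue(
    queue,
    last_ref=24,
    fetch=lambda pid: None if pid == "ghi789" else b"%PDF-",
    sleep=lambda seconds: None,
)
print(result["paywalled"])
```

Passing `sleep` as a parameter is a design choice for testability: the 3-second delay stays the default, but a test can swap in a no-op.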
Example 3: Complex - Manual Upload with Metadata Extraction
Input:
"User uploads a paywalled paper manually: `aiwg research acquire --upload /tmp/acm-paper.pdf --ref REF-027`"
Thought (Goal):
I need to validate the uploaded PDF, extract metadata from the PDF itself (no API metadata is available), ask the user to confirm or correct the extracted metadata, compute the checksum, calculate the FAIR score (expected to be lower for a manual upload), and complete the acquisition.
Action:
Validate PDF format using `file` command.
Observation:
File type: PDF document, version 1.7. Size: 1.5 MB. Format valid.
Thought (Reasoning):
PDF is valid. Now attempt metadata extraction from PDF using pdftotext and parsing.
Action:
Extract text from first page using `pdftotext -f 1 -l 1 /tmp/acm-paper.pdf -`.
Observation:
Extracted text contains:
Formal Verification of Neural Networks: A Survey
John Smith¹, Jane Doe², Alice Johnson¹
¹University of California, Berkeley
²Stanford University
ACM Computing Surveys, Vol. 55, No. 3, 2023
DOI: 10.1145/3576915.3000000
Thought (Extraction):
Successfully extracted: Title "Formal Verification of Neural Networks: A Survey", Authors: Smith, Doe, Johnson, Year: 2023, Venue: "ACM Computing Surveys", DOI: 10.1145/3576915.3000000.
Action:
Present extracted metadata to user for confirmation.
User Interaction:
Extracted metadata:
- Title: "Formal Verification of Neural Networks: A Survey"
- Authors: ["Smith, John", "Doe, Jane", "Johnson, Alice"]
- Year: 2023
- Venue: "ACM Computing Surveys"
- DOI: 10.1145/3576915.3000000
Confirm metadata? (y/n/edit): y
Observation:
User confirmed metadata is correct.
Thought (Reasoning):
Metadata confirmed. Now compute checksum and FAIR score. The FAIR score will be moderate: the source is paywalled (lower accessibility), but a DOI is present and the metadata is complete.
Action:
Compute SHA-256 checksum using `sha256sum /tmp/acm-paper.pdf`.
Observation:
SHA-256: a7b9c8d1e2f3456789abcdef0123456789abcdef0123456789abcdef01234567
Thought (Reasoning):
FAIR scoring for manually uploaded paper:
- Findable: DOI present (+40), metadata complete (+50 for all fields) = 90/100
- Accessible: no persistent URL (+0 of 50), license unknown (+0 of 50) = 0/100
- Interoperable: JSON format (+50), schema compliance (+50) = 100/100
- Reusable: license unknown (+0 of 50), provenance documented (+50) = 50/100
Overall: (90 + 0 + 100 + 50) / 4 = 60/100 (Moderate)
Action:
Copy PDF to `.aiwg/research/sources/pdfs/REF-027-formal-verification-neural-networks-survey.pdf`, save metadata JSON, update checksums file, generate acquisition confirmation.
Observation:
All files saved successfully. Acquisition complete.
Thought (Exception):
FAIR score is only 60/100 (Moderate) due to missing license information and lack of persistent URL. This is expected for manually uploaded paywalled papers. User should be informed.
Output:
Validating PDF format... OK
Extracting metadata from PDF...
- Title: "Formal Verification of Neural Networks: A Survey"
- Authors: ["Smith, John", "Doe, Jane", "Johnson, Alice"]
- Year: 2023
- Venue: "ACM Computing Surveys"
- DOI: 10.1145/3576915.3000000
Confirm metadata? (y/n/edit): y
Metadata confirmed by user.
Assigning identifier: REF-027
Computing SHA-256 checksum... OK
FAIR validation: 60/100 (Moderate - manual upload, missing license info)
⚠️ Note: FAIR score is moderate due to:
- No persistent open access URL (paywalled source)
- License information not available
Saved:
- PDF: .aiwg/research/sources/pdfs/REF-027-formal-verification-neural-networks-survey.pdf
- Metadata: .aiwg/research/sources/metadata/REF-027-metadata.json
- Checksum: Updated .aiwg/research/sources/checksums.txt
Acquisition complete for REF-027.
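Pulling the DOI and year out of the first-page text could look like the sketch below. The regular expressions are common heuristics, not a full DOI grammar, and the helper names `extract_doi` and `extract_year` are hypothetical. Taking the first year-like number on the page is an assumption that can pick the wrong value, which is one reason the confirmation prompt in this example matters.

```python
import re
from typing import Optional

# Heuristic DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
# Heuristic year pattern: a standalone four-digit number starting with 19 or 20.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_doi(text: str) -> Optional[str]:
    match = DOI_PATTERN.search(text)
    # Strip trailing punctuation that often follows a DOI in running text.
    return match.group(0).rstrip(".,;)") if match else None


def extract_year(text: str) -> Optional[int]:
    match = YEAR_PATTERN.search(text)
    return int(match.group(0)) if match else None


# First-page text as shown in the Observation of this example.
first_page = """Formal Verification of Neural Networks: A Survey
John Smith¹, Jane Doe², Alice Johnson¹
¹University of California, Berkeley
²Stanford University
ACM Computing Surveys, Vol. 55, No. 3, 2023
DOI: 10.1145/3576915.3000000
"""

print(extract_doi(first_page))
print(extract_year(first_page))
```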
Why This Is Good:
- Handles manual upload workflow gracefully
- Extracts metadata from PDF when API unavailable
- Prompts user for confirmation to catch extraction errors
- Explains FAIR score reduction with specific reasons
- Completes acquisition despite missing information
- Warns user about quality limitations
- Uses thought protocol to show metadata extraction and FAIR scoring logic
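The overall score in this example is the unweighted mean of the four FAIR dimension scores. A short Python sketch of that arithmetic, using the dimension values from the trace; the dictionary layout and the `overall_fair` name are hypothetical, and equal weighting is inferred from the "/ 4" in the trace and not from a documented scoring rule.

```python
# Dimension scores from the trace in this example, each out of 100.
dimensions = {
    "findable": 40 + 50,       # DOI present, metadata complete
    "accessible": 0 + 0,       # no persistent URL, license unknown
    "interoperable": 50 + 50,  # JSON format, schema compliance
    "reusable": 0 + 50,        # license unknown, provenance documented
}


def overall_fair(scores: dict[str, int]) -> float:
    """Unweighted mean of the per-dimension scores."""
    return sum(scores.values()) / len(scores)


print(f"FAIR validation: {overall_fair(dimensions):.0f}/100")
```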