Source Tracking

Source / Discovery Tracking Schema

Purpose: record where each source was discovered (which account / surface), so high-yield curators are identified and can be revisited deliberately.

Optional by design: the `discovery:` block is OPTIONAL. Its absence is normal and never an error (see §0).

Tooling: `aiwg corpus discovery-log` (record), `aiwg corpus curator-init` (scaffold PROF-S), `aiwg corpus curator-status` (yield + orphans). Read views: `by-source`, `by-curator` (rendered by `aiwg index build`).


0. Optionality & exemptions (read first)

Discovery metadata is best-effort signal, not a required field. Three cases where it is legitimately absent — none are gaps, none are flagged by audits:

| Case | State | Treatment |
|---|---|---|
| Legacy refs (inducted before source-tracking adoption) | no `discovery:` block | Normal. Not backfilled. Audits ignore. |
| Operator-direct (you brought the paper/source directly) | `surface: direct`, `curator-id: null` (or block omitted) | First-class, curator-less. Never an orphan. |
| Curator unknown (found via search/feed with no clear account) | `surface: x-search`/`x-foryou`/…, `curator-id: null` | Surface recorded, curator left null. Fine. |

Only set a `curator-id` when a source genuinely came through a named, repeatable curator worth returning to. When in doubt, record the `surface` and leave `curator-id` null.
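The cases above can be sketched as a small classifier. This is an illustrative sketch only: the function name and the returned labels are hypothetical, and the discovery block is assumed to be already parsed from frontmatter into a plain dict.

```python
from typing import Optional


def classify_discovery(discovery: Optional[dict]) -> str:
    """Label the optionality case for one sidecar (labels are hypothetical)."""
    if not discovery:
        # No block at all: a legacy ref, or operator-direct with the block omitted.
        return "absent"
    if discovery.get("surface") == "direct":
        # Operator supplied the source; curator-less by definition.
        return "operator-direct"
    if discovery.get("curator-id") is None:
        # Surface is recorded, but no named curator is attached.
        return "curator-unknown"
    return "curated"
```

Only the last label places a sidecar inside the discovery graph; the other three are the legitimate "absent" cases and should not be reported as gaps.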


1. Per-paper: `discovery:` block (citation sidecar)

Added to `documentation/citations/REF-XXX-citations.md` frontmatter:

discovery:
  date: 2026-05-25                 # when the source was first surfaced
  surface: x-account               # controlled vocab — see below
  via: "x.com/@askalphaxiv"        # human-readable origin (account/URL/feed)
  curator-id: PROF-S-askalphaxiv   # link to curator profile; null if no curator
  harvest-batch: 2026-05-25-morning # optional: groups a harvesting session
  harvested-by: claude-opus-4-7    # agent/human that performed the harvest

When the block is present, only `date` and `surface` are required; all other fields are optional.

`surface` controlled vocabulary

| Value | Meaning |
|---|---|
| `x-account` | A specific X account's timeline (curator) |
| `x-search` | X search results (query-driven, often no curator) |
| `x-bookmarks` | Operator's own X bookmarks |
| `x-foryou` | X "For You" algorithmic feed |
| `x-following` | X "Following" feed |
| `rss` | RSS/Atom feed |
| `newsletter` | Email newsletter / digest |
| `web` | Direct web browsing / blog |
| `referral` | Cited by / linked from another corpus paper |
| `direct` | Operator supplied directly (no discovery surface) |
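A minimal validation sketch for the block, assuming it has already been parsed into a dict. The function name and the message strings are hypothetical, and this is not a description of how the `aiwg` tooling checks sidecars.

```python
from typing import List, Optional

# Controlled vocabulary for `surface`, copied from the list above.
SURFACES = {
    "x-account", "x-search", "x-bookmarks", "x-foryou", "x-following",
    "rss", "newsletter", "web", "referral", "direct",
}
REQUIRED_FIELDS = ("date", "surface")


def validate_discovery(discovery: Optional[dict]) -> List[str]:
    """Return a list of problems; an absent block is valid and yields none."""
    if discovery is None:
        return []
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if discovery.get(name) is None
    ]
    surface = discovery.get("surface")
    if surface is not None and surface not in SURFACES:
        problems.append(f"unknown surface: {surface}")
    return problems
```

Returning an empty list for a missing block keeps the check consistent with §0: absence is normal, and only a present but malformed block is reported.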

Distinct from radar `sources-searched` (surfaces queried during a freshness refresh): `discovery` records the surface a paper was originally found through. They are orthogonal.


2. Curator: `PROF-S-` source profile

A curator profile uses the `source` value of the entity-profile `type` enum and is stored in `documentation/profiles/sources/PROF-S-{slug}.md` (see the `source-profile` template).

  • slug = handle lowercased, leading punctuation stripped, `_`→`-` (`@_akhaliq` → `PROF-S-akhaliq`).
  • `corpus-refs` = inducted REFs discovered via this curator (NOT candidates).
  • `signal-quality` = curator signal density (A = paper-per-post, high relevance; … D = low), graded A–D.
  • `revisit-cadence` = `daily | weekly | biweekly | monthly | on-demand`.
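The slug rule can be sketched as follows. The function name is hypothetical, and "punctuation" is assumed here to mean the ASCII punctuation set in Python's `string.punctuation`.

```python
import string


def curator_profile_id(handle: str) -> str:
    """Derive a PROF-S id from an account handle using the slug rule above."""
    slug = handle.lower()
    slug = slug.lstrip(string.punctuation)  # drop leading "@", "_", etc.
    slug = slug.replace("_", "-")           # remaining underscores become hyphens
    return f"PROF-S-{slug}"
```

For example, `curator_profile_id("@_akhaliq")` gives `PROF-S-akhaliq`, matching the example in the rule; only leading punctuation is stripped, so an interior underscore still becomes a hyphen.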

"Good accounts to return to" = PROF-S ranked by return-to score (inducted-ref count × avg surfaced-paper GRADE) — see `aiwg corpus curator-status`.
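The return-to score can be sketched as below. The source does not say how GRADE is turned into a number, so the grades are assumed to arrive as numeric values already; the function name and the zero fallback for empty input are also assumptions, not the behavior of `aiwg corpus curator-status`.

```python
from statistics import mean
from typing import Sequence


def return_to_score(inducted_ref_count: int, surfaced_grades: Sequence[float]) -> float:
    """Inducted-ref count times the average grade of surfaced papers.

    `surfaced_grades` holds numeric grade values; the numeric mapping for
    GRADE is hypothetical and must be supplied by the caller.
    """
    if inducted_ref_count == 0 or not surfaced_grades:
        return 0.0  # assumed fallback: nothing inducted or nothing graded
    return inducted_ref_count * mean(surfaced_grades)
```

Sorting PROF-S profiles by this value in descending order gives one possible "accounts to return to" ranking.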


3. Bidirectionality + orphan rule

When a paper is inducted with `discovery.curator-id: PROF-S-x`:

1. Add the REF to `PROF-S-x` frontmatter `corpus-refs:` and its §2 "Sources Surfaced" table.

2. The sidecar's `discovery.curator-id` IS the backlink (no separate REF-doc edit).

3. Recompute the curator's yield stats.

A PROF-S that is referenced by a sidecar's `discovery.curator-id` but does not list that REF in its `corpus-refs` is a curator orphan, flagged by `curator-status` (and `research-lint`). The check fires only when `curator-id` is set: a missing `discovery:` block, a `null` `curator-id`, or `surface: direct` is never an orphan (it is simply outside the discovery graph).
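The orphan rule can be sketched as a pure function over already-parsed data. The names and input shapes are hypothetical: `sidecars` maps each REF id to its discovery dict (or `None`), and `profiles` maps each PROF-S id to its set of `corpus-refs`. Treating a curator id with no profile at all as an orphan is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Set, Tuple


def find_curator_orphans(
    sidecars: Dict[str, Optional[dict]],
    profiles: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (ref, curator-id) pairs where the curator profile lacks the REF."""
    orphans = []
    for ref, discovery in sorted(sidecars.items()):
        curator_id = (discovery or {}).get("curator-id")
        if curator_id is None:
            # No block or null curator: outside the discovery graph, never an orphan.
            continue
        if ref not in profiles.get(curator_id, set()):
            orphans.append((ref, curator_id))
    return orphans
```

The early `continue` is what keeps legacy, operator-direct, and curator-unknown sidecars out of the report.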


4. Candidate (pre-induction) curator records

Curator profiles may be seeded before their surfaced papers are inducted:

  • Set `corpus-refs: []` (empty until induction).
  • Record observed candidate yield under §2 "Candidate Sources Surfaced (not yet inducted)".
  • As candidates are inducted, move them into `corpus-refs` and the "Sources Surfaced" table.
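Promotion of a candidate on induction can be sketched as below. The profile is modeled as a plain dict; the `candidates` key, the function name, and the idea that a candidate is identified by a string such as a URL are all hypothetical, since the source only names `corpus-refs` and the two §2 tables.

```python
def promote_candidate(profile: dict, candidate: str, ref: str) -> dict:
    """Return a copy of the profile with `candidate` removed and `ref` inducted."""
    # Hypothetical `candidates` list stands in for the "Candidate Sources Surfaced" table.
    candidates = [c for c in profile.get("candidates", []) if c != candidate]
    corpus_refs = list(profile.get("corpus-refs", []))
    if ref not in corpus_refs:  # keep the operation idempotent
        corpus_refs.append(ref)
    return {**profile, "candidates": candidates, "corpus-refs": corpus_refs}
```

Returning a new dict rather than mutating in place makes it easy to diff the profile before and after induction, and running the promotion twice leaves `corpus-refs` unchanged.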