Building a corpus

The shape

Each entity is a YAML file (or a Markdown file with YAML frontmatter) stored in the directory for its kind:

corpus/
  persons/jordan_taylor.yaml
  organizations/acme_analytics.yaml
  roles/senior_swe_acme_2021.yaml
  achievements/launched_metered_billing.yaml
  skills/python.yaml

Every file carries a kind: discriminator, a stable id:, and a schema_version: that defaults to the current release.
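
As a rough illustration of these three fields, the check below validates an already-parsed entity and fills in the default schema version. It is a minimal sketch, not the tool's own loader: `check_entity`, `DIR_TO_KIND`, and `CURRENT_SCHEMA_VERSION` are hypothetical names, and the singular kind values are assumed from the directory names in the layout above (only `achievement` appears elsewhere on this page).

```python
from pathlib import PurePosixPath

# Hypothetical mapping from per-kind directory to the expected `kind:` value,
# inferred from the example layout; the real tool may use different names.
DIR_TO_KIND = {
    "persons": "person",
    "organizations": "organization",
    "roles": "role",
    "achievements": "achievement",
    "skills": "skill",
}

# Placeholder only: the actual current schema version is not stated here.
CURRENT_SCHEMA_VERSION = "1"


def check_entity(path: str, data: dict) -> dict:
    """Return a copy of `data` with schema_version defaulted, or raise ValueError."""
    # `kind` and `id` must be present and non-empty.
    for field in ("kind", "id"):
        if not data.get(field):
            raise ValueError(f"{path}: missing required field {field!r}")

    # The file's parent directory should agree with its `kind:` discriminator.
    directory = PurePosixPath(path).parent.name
    expected = DIR_TO_KIND.get(directory)
    if expected is not None and data["kind"] != expected:
        raise ValueError(
            f"{path}: kind {data['kind']!r} does not match directory {directory!r}"
        )

    # An explicit schema_version in the file wins over the default.
    return {"schema_version": CURRENT_SCHEMA_VERSION, **data}
```

Called as `check_entity("corpus/skills/python.yaml", {"kind": "skill", "id": "skill_python"})`, this returns the entity with `schema_version` filled in; a file whose `kind:` disagrees with its directory raises `ValueError`.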

Claims over overwrites

When a fact is refined, append a Claim rather than mutating the underlying entity. The generator picks the best-scored claim at render time; older claims remain for audit.

kind: claim
id: claim_arr_refined
subject_id: ach_launched_metered_billing
subject_kind: achievement
type: impact
text: "Net new ARR attributable to metered billing: ~$4.2M in Q4 2022."
sources: [perf_review_2022_q4]
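
The append-then-select behavior can be sketched as follows. How the generator actually scores claims is not described here, so the `score` field, the `Claim` dataclass, and `best_claim` are assumptions made for illustration; the older claim and both score values are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    id: str
    subject_id: str
    type: str
    text: str
    sources: list = field(default_factory=list)
    score: float = 0.0  # hypothetical: the real scoring inputs are not specified


def best_claim(claims, subject_id, claim_type):
    """Return the highest-scored claim for a subject and type, or None."""
    candidates = [
        c for c in claims if c.subject_id == subject_id and c.type == claim_type
    ]
    # max() returns the first of equally scored claims; `claims` is never modified,
    # so superseded claims stay available for audit.
    return max(candidates, key=lambda c: c.score, default=None)


# Illustrative data: an older claim (invented) plus the refined claim shown above.
claims = [
    Claim("claim_arr_initial", "ach_launched_metered_billing", "impact",
          "Metered billing added new ARR.", score=0.4),
    Claim("claim_arr_refined", "ach_launched_metered_billing", "impact",
          "Net new ARR attributable to metered billing: ~$4.2M in Q4 2022.",
          sources=["perf_review_2022_q4"], score=0.9),
]

chosen = best_claim(claims, "ach_launched_metered_billing", "impact")
print(chosen.id)
```

Because selection happens when the claims are read rather than when they are written, refining a fact never requires editing or deleting an earlier file.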

Visibility and redaction

Every entity has a visibility: of public, nda, or private. Redaction profiles filter at render time, so you can keep one corpus rather than one per recipient. The default public profile drops anything more restricted than public (that is, nda and private entities) before the rendered intermediate is handed to a renderer.
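
A redaction profile can be thought of as a ceiling on visibility. The sketch below assumes the levels are ordered public < nda < private, following the order in which they are listed; `VISIBILITY_RANK` and `redact` are hypothetical names rather than the tool's API, and treating a missing `visibility` as private is an extra fail-closed assumption.

```python
# Assumed ordering, least to most restricted.
VISIBILITY_RANK = {"public": 0, "nda": 1, "private": 2}


def redact(entities, max_visibility="public"):
    """Keep only entities whose visibility is at or below `max_visibility`."""
    limit = VISIBILITY_RANK[max_visibility]
    return [
        e
        for e in entities
        # Assumption: an entity without a visibility is treated as private.
        if VISIBILITY_RANK[e.get("visibility", "private")] <= limit
    ]


# Illustrative entities; the ids are made up for this example.
entities = [
    {"id": "skill_python", "visibility": "public"},
    {"id": "ach_internal_migration", "visibility": "nda"},
    {"id": "note_salary_history", "visibility": "private"},
]

public_view = redact(entities)                    # default public profile
nda_view = redact(entities, max_visibility="nda")
```

Here `public_view` contains only `skill_python`, while `nda_view` also includes `ach_internal_migration`; the same corpus serves both recipients.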

External sources

See "Ingesting external sources" for pulling from LinkedIn exports, GitHub, and ORCID.