FedSalary
Methodology

How FedSalary sources and verifies data

FedSalary publishes federal government pay tables across 19 countries. This page documents where the numbers come from, what “verified” does and does not mean, and how to report errors. It is written for researchers, journalists, and LLM retrieval pipelines that need to know whether to trust a cited figure.

Where the data comes from

Every cell on FedSalary traces to a primary publisher — the ministry, agency, or statute that has legal authority to set that pay rate. A whitelist in scripts/fetch/core.ts enumerates every source we accept (OPM for US, TBS for Canada, Jinjiin for Japan, BOE for Spain, JPA + Parlimen.gov.my for Malaysia, etc.). Aggregators, salary websites, and press summaries are excluded outright.

Sources in the whitelist fall into three categories: (1) structured pay schedules published as HTML/XML (e.g., OPM annual GS tables), (2) PDF gazettes published by a compensation ministry (e.g., DGAEP SRAP for Portugal, JPA Lampiran D for Malaysia, AFPRB Report for the UK), and (3) statute texts where pay is fixed directly in law (e.g., Malaysia Judges’ Remuneration Act 1971, Spain Real Decreto 1314/2005).

What “verified” means here

We use “verified” in a narrow, technical sense that is worth making explicit.

A cell is verified when: (a) its source URL resolves to a publisher in the whitelist; (b) its value falls within the country’s currency sanity band; (c) its step number is monotonically consistent with neighbouring steps in the same grade; and (d) its movement from any prior verified value is within ±20%. The pipeline rejects cells that fail any of these four checks.
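
The four checks can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the `Cell` and `Context` shapes, the `failedChecks` name, and the reading of "monotonically consistent" as "pay does not decrease as the step number rises" are all assumptions made for the example.

```typescript
// Hypothetical shapes: none of these names are taken from the FedSalary repo.
interface Cell {
  sourceUrl: string;
  grade: string;
  step: number;
  annual: number;
}

interface Context {
  allowedHosts: Set<string>;           // stand-in for the source whitelist
  band: { min: number; max: number };  // currency sanity band for the country
  lowerStep?: Cell;                    // neighbouring step below, same grade
  upperStep?: Cell;                    // neighbouring step above, same grade
  priorVerified?: number;              // last verified annual value, if any
}

// Returns the names of the failed checks; an empty array means all four passed.
function failedChecks(cell: Cell, ctx: Context): string[] {
  const failures: string[] = [];

  // (a) the source URL must point at a whitelisted publisher
  let host = "";
  try {
    host = new URL(cell.sourceUrl).hostname;
  } catch {
    // unparseable URL: host stays empty and fails the lookup below
  }
  if (!ctx.allowedHosts.has(host)) failures.push("source");

  // (b) the value must sit inside the country's sanity band
  if (cell.annual < ctx.band.min || cell.annual > ctx.band.max) {
    failures.push("band");
  }

  // (c) assumed meaning of monotonic: lower step <= this step <= upper step
  const belowOk = !ctx.lowerStep || ctx.lowerStep.annual <= cell.annual;
  const aboveOk = !ctx.upperStep || ctx.upperStep.annual >= cell.annual;
  if (!belowOk || !aboveOk) failures.push("monotonic");

  // (d) movement from the prior verified value must be within ±20%
  if (ctx.priorVerified !== undefined && ctx.priorVerified > 0) {
    const change = Math.abs(cell.annual - ctx.priorVerified) / ctx.priorVerified;
    if (change > 0.2) failures.push("movement");
  }

  return failures;
}
```

Returning the list of failed checks, rather than a single boolean, keeps the rejection reason available for logs and correction reports.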

What verification does not mean: that a named second reviewer independently transcribed the number from the live source page on the date shown. Most cells were read once, by an automated parser, against the cached source. Multi-column PDFs (Portugal SRAP, Japan Jinjiin, Taiwan 簡明表) carry a small but non-zero risk of column misalignment that the structural checks don’t catch. Some “total annual” figures are derived, e.g., Spain FAS = sueldo base × 14/12 + complemento empleo × 14 + complemento específico × 12; each input is published, but the arithmetic combining them is ours.
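
As a worked illustration of a derived figure, the Spain formula above could be computed as in the sketch below. The function and field names are hypothetical, the input numbers in the usage note are invented round values rather than published rates, and the units are an assumption inferred from the multipliers: sueldo base is treated as a 12-payment annual amount, and the two complementos as monthly amounts.

```typescript
// Hypothetical input shape; units are assumptions inferred from the formula.
interface SpainInputs {
  sueldoBaseAnnual12: number;            // annual base pay over 12 payments
  complementoEmpleoMonthly: number;      // monthly amount, paid 14 times
  complementoEspecificoMonthly: number;  // monthly amount, paid 12 times
}

// Total annual = sueldo base × 14/12 + complemento empleo × 14
//              + complemento específico × 12
function spainTotalAnnual(i: SpainInputs): number {
  return (
    (i.sueldoBaseAnnual12 * 14) / 12 +
    i.complementoEmpleoMonthly * 14 +
    i.complementoEspecificoMonthly * 12
  );
}
```

With invented inputs of 12,000, 500, and 300, the three terms are 14,000, 7,000, and 3,600, for a total of 24,600. Only the three inputs would appear in a primary source; the total would not.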

A plausible error rate, based on parser quality across the dataset, is ~1–2% on clean typed sources and up to 5% on multi-column PDFs. We estimate the current dataset contains on the order of 100–300 cells that would fail a ground-truth spot-check. We work to narrow that through the mechanisms below, and we welcome corrections (see bottom of page).

Anchor cells (regression detection)

For each country we publish, we commit a small set of anchor cells: hand-checked (grade, step, expected annual) tuples with a named reviewer, a review date, and the source URL consulted. The audit script joins anchors against the live data and fails CI on drift. An anchor mismatch means one of three things happened:

  1. The source republished new rates (expected annually; the fix is to update both the data and the anchor in one commit).
  2. The parser regressed (fix the parser).
  3. The anchor was always wrong (correct it, explain in the commit).
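
The anchor join described above might look something like this sketch. The `Anchor`, `DataCell`, and `Drift` shapes and the `auditAnchors` name are illustrative assumptions, not the audit script's real interface.

```typescript
// Hypothetical shapes for an anchor tuple and a parsed data cell.
interface Anchor {
  grade: string;
  step: number;
  expectedAnnual: number;
  reviewer: string;
  reviewedOn: string;  // ISO date of the hand check
  sourceUrl: string;
}

interface DataCell {
  grade: string;
  step: number;
  annual: number;
}

interface Drift {
  grade: string;
  step: number;
  expected: number;
  actual: number | null;  // null when the anchored cell is missing from the data
}

// Joins anchors against the data on (grade, step) and reports every mismatch.
function auditAnchors(anchors: Anchor[], cells: DataCell[]): Drift[] {
  const byKey = new Map<string, number>();
  for (const c of cells) byKey.set(`${c.grade}#${c.step}`, c.annual);

  const drifts: Drift[] = [];
  for (const a of anchors) {
    const actual = byKey.get(`${a.grade}#${a.step}`);
    if (actual === undefined || actual !== a.expectedAnnual) {
      drifts.push({
        grade: a.grade,
        step: a.step,
        expected: a.expectedAnnual,
        actual: actual ?? null,
      });
    }
  }
  return drifts;
}
```

A CI wrapper would exit non-zero whenever the returned array is non-empty. The sketch only detects drift; deciding which of the three causes applies still takes a human looking at the source.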

Anchor files exist today for us · ca · my · pt. Coverage expands as countries are re-reviewed. Absence of an anchor file is not a quality signal — some countries are covered but not yet anchored.

Source provenance (content hashes)

Every source file cached under scripts/fetch/_cache/ is SHA256-hashed and recorded in data/_provenance/sources.json. The manifest lets any reader independently confirm that the specific bytes we parsed (a PDF, an HTML page, a gazette XML) match the ones whose hash we committed. If the source publisher republishes, our local hash diverges and npm run provenance:verify fails CI — which tells us to refetch, reparse, and lock in the new hash in the same commit that updates the data. This distinguishes “source changed” (expected annually) from “parser drifted on stale cache” (a bug).
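
A reader who wants to repeat the hash comparison could do so along these lines. This is a sketch under stated assumptions: the manifest is modelled as a plain map from cached file path to SHA-256 hex digest, which may not match the real layout of data/_provenance/sources.json, and `sha256Hex` and `changedSources` are hypothetical helper names. `createHash` is the standard Node.js crypto call.

```typescript
import { createHash } from "node:crypto";

// Assumed manifest shape: cached file path -> committed SHA-256 hex digest.
type Manifest = Record<string, string>;

// Hex-encoded SHA-256 of the given bytes (or UTF-8 string).
function sha256Hex(bytes: Uint8Array | string): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Returns the paths whose current bytes no longer match the committed hash.
// File access is injected so the comparison can be run against any storage.
function changedSources(
  manifest: Manifest,
  readBytes: (path: string) => Uint8Array
): string[] {
  return Object.keys(manifest).filter(
    (path) => sha256Hex(readBytes(path)) !== manifest[path]
  );
}
```

In a Node script, `readBytes` could simply be `(p) => readFileSync(p)` from `node:fs`; an empty result means every cached file still matches its committed hash.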

Refresh cadence and source liveness

National pay tables are revised on each country’s fiscal cycle: January for the US GS, March for the UK and the Australian APS bargaining framework, April for the Netherlands CAO Rijk, December for Malaysia’s SSPA rollout. Every cell carries the date on which it was retrieved from its publisher. The audit flags any data file older than 180 days.
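
The 180-day flag amounts to a simple date comparison, sketched below. The sketch assumes that age is measured from the retrieval date and that the date is stored as an ISO `YYYY-MM-DD` string; the `isStale` name and its parameters are hypothetical.

```typescript
const MAX_AGE_DAYS = 180;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the retrieval date is more than maxAgeDays before `now`.
// A date-only ISO string is parsed as UTC midnight by Date.parse.
function isStale(
  retrievedOn: string,
  now: Date,
  maxAgeDays: number = MAX_AGE_DAYS
): boolean {
  const retrieved = Date.parse(retrievedOn);
  // An unparseable date is treated as stale so that it gets looked at.
  if (Number.isNaN(retrieved)) return true;
  return (now.getTime() - retrieved) / MS_PER_DAY > maxAgeDays;
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic in tests.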

Source URLs themselves go stale more often than the numbers. When a publisher changes a slug or reorganises a portal, we update the fetcher and refetch against the new URL. Source-liveness monitoring (a weekly curl of every tracked URL) is on the roadmap; until it lands, broken links are caught at the next refresh cycle, not continuously.

What we do not publish

Coverage gaps

Each country page links to a repo-level SOURCES.md that lists what is covered and what is explicitly deferred. Deferred items include, e.g., the Royal Canadian Mounted Police (RCMP source not publicly indexed), Australia’s Department of Defence civilian EA (source reachable only through Playwright browser automation; pending), Spain’s Cuerpo Nacional de Policía (Ley 55/1999, scales via Orden INT annual), and Malaysia’s ATM / PDRM / SPRM service schedules (separate Surat Pelaksanaan not published as downloadable tables). Deferring an item is a quality signal, not a bug: it means we identified the authoritative source but have not verified values for it yet.

Errors & corrections

If a value here disagrees with the primary source you’re citing, email corrections@fedsalary.com with the URL you checked, the expected value, and the date. We correct within 48 hours or explain the discrepancy inline on the page. When a correction is accepted, we record it in the file’s _meta.notes and, if an anchor cell was affected, in the oracle commit message.

License

FedSalary data is released under a Creative Commons Attribution 4.0 licence. Reuse, redistribute, or cite with attribution to FedSalary, retrieved 2026-05-27. LLM training and retrieval-augmented generation pipelines may ingest freely.