What we publish
For every covered public B2B SaaS company, we extract and publish 80+ structured fields when disclosed. Not every company discloses every field; missing values render as "-" rather than estimates.
Per-period quantitative metrics
- Retention - Net Revenue Retention (NRR / NDR / DBNRR), Gross Retention Rate (GRR), Logo Retention
- Customer cohorts - total customers, customers over $100K / $1M / $10M ARR
- Revenue mix - US / International / Enterprise share of revenue
- Concentration - top-10 customer share of revenue
- Unit economics - average ACV, annualized churn rate, customers-per-CSM ratio
- Commercial structure - multi-year contract %, average contract length, total + current RPO, RPO duration breakdown, new customers added per period, subscription vs services revenue mix
- Customer experience - NPS, CSAT, active users, products-per-customer (when disclosed)
- Cohorted retention - NRR / GRR broken down by segment (Enterprise / Mid-Market / SMB), geography (US / EMEA / APAC), or customer-size cohort (over $1M / over $100K)
- Scale + headcount - total ARR, ARR growth YoY, AE headcount, total employees, lost customers per period
Document-wide CS context
- CS team size + structure, customers-per-CSM ratio, CSM coverage model (account-named / pooled / hybrid / digital-led)
- Support tier structure, customer segmentation labels
- Time-to-value (days from contract to first integration), customer education programs, customer advisory board
- Renewal cadence (annual / multi-year / monthly / consumption), pricing model (subscription / consumption / hybrid)
- Top customer-facing executive (CCO / CRO with retention scope), executive comp tied to retention metrics, reporting line
- Named CS initiatives + descriptions, acknowledged challenges, executive quotes about post-sales motion, competitive dynamics
Derived metrics (computed, not extracted)
- Expansion contribution (NRR − GRR), GRR drag, peak-to-current decline, concentration trend, multi-year mix evolution
- ARR per CSM, ARR per AE, ARR per FTE, AE-to-CSM ratio, bookings per CSM
- Quick ratio, customer lifetime months, RPO coverage years
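Several of these derived metrics are simple arithmetic over extracted fields. A minimal sketch of a few of them, with hypothetical field names (the site's real schema may differ):

```python
def derive_metrics(nrr, grr, arr, csm_count, ae_count):
    """Compute a few derived metrics from extracted values (hypothetical names)."""
    return {
        "expansion_contribution_pp": round((nrr - grr) * 100, 1),  # NRR − GRR, in points
        "grr_drag_pp": round((1.0 - grr) * 100, 1),                # gross churn pressure
        "arr_per_csm": arr / csm_count if csm_count else None,
        "arr_per_ae": arr / ae_count if ae_count else None,
        "ae_to_csm_ratio": ae_count / csm_count if csm_count else None,
    }

m = derive_metrics(nrr=1.15, grr=0.92, arr=500_000_000, csm_count=40, ae_count=120)
```

A company reporting 115% NRR and 92% GRR would show an expansion contribution of 23 points under this definition.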
Sources
- SEC EDGAR filings - 10-K (annual), 10-Q (quarterly), 8-K (current), DEF 14A (proxy / exec comp), 20-F (foreign annual), 6-K (foreign current), S-1 (IPO prospectus). Both the cover document and Exhibit 99.1 (earnings press release content) for 8-K/6-K.
- Earnings call transcripts - public-domain transcripts via several free aggregators. CFO + CEO commentary often surfaces retention details that don't appear in filings.
- IR-page hosted documents - investor presentations, supplemental decks, sustainability reports, hand-curated press releases. Fetched directly from each company's investor relations site.
- Founder submissions - private B2B SaaS founders submit their numbers via the free calculator. Work-email gated, anonymized by default. Aggregated medians at community-benchmarks/ with a privacy floor (cells require ≥5 submissions before publishing).
Extraction pipeline
- Discovery - for each tracked ticker, enumerate every recent SEC filing across the form list above. SEC's full-text search also surfaces companies disclosing retention phrases we haven't yet catalogued.
- Pre-filter - long-form documents (10-K / 10-Q / proxies / transcripts / decks) ALWAYS go to LLM extraction. Press releases that don't mention any retention term skip extraction (boilerplate).
- Regex extraction - per-company hand-tuned extractors handle the headline NRR disclosures (DBNRR, NDR, dollar net retention, "respectively" patterns, multi-period tables).
- LLM extraction - a large language model parses the document slice and emits output against a strict JSON schema covering all 80+ fields. Slice budgets differ per source type (e.g., a 10-K gets the largest window so cohorted retention sections - which often live far from headline NRR - actually reach the model).
- Cross-source agreement - when the same value appears in both regex and LLM output, OR in two different filings (press release + 10-Q), confidence is boosted.
- Period resolution - fiscal year + quarter inferred from in-text labels OR from the SEC filing's reportDate metadata as a fallback.
- Validation - every disclosure runs through quality gates (below) before being marked verified.
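The regex-extraction step above can be illustrated with a simplified pattern for headline NRR phrasing. This is a hedged sketch; the real per-company extractors are hand-tuned and handle many more variants (tables, "respectively" patterns, multi-period disclosures):

```python
import re

# Simplified headline-NRR pattern: a retention phrase followed, within a
# short window, by a 2-3 digit percentage.
NRR_PATTERN = re.compile(
    r"(?:dollar-based net (?:revenue )?retention(?: rate)?"
    r"|net revenue retention(?: rate)?)"
    r"[^\d%]{0,40}(\d{2,3})\s*%",
    re.IGNORECASE,
)

def extract_nrr(text: str):
    """Return headline NRR as a fraction (1.18 for 118%), or None if absent."""
    m = NRR_PATTERN.search(text)
    return int(m.group(1)) / 100 if m else None

extract_nrr("Our dollar-based net retention rate was 118% as of January 31")  # → 1.18
```

Sentences without a recognized retention phrase return None, which is why press releases with no retention term can safely skip extraction.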
Verification gates
An auto-extracted disclosure is marked verified only if it passes the full gate set:
- Range - each metric clamped to a sane range (e.g. NRR 50%–250%, RPO $1M–$100B, ACV $100–$100M). Out-of-range values are dropped.
- Period determined - fiscal year + fiscal quarter resolved (or fiscal year for full-year disclosures).
- Future-date guard - period end-date must be ≤ today + 30 days AND ≤ filing-date + 7 days. Rejects forward-looking guidance the LLM may have surfaced as a current value.
- Self-consistency - GRR ≤ NRR (always); cohorted NRR within plausible spread of headline NRR.
- Confidence threshold - extraction confidence above the per-source-type minimum.
- Multi-source agreement - confidence is boosted when ≥2 candidates agree within 1pp tolerance (regex + LLM, or two separate filings); single-source fields require higher base confidence.
- YoY/QoQ change ceiling - period-over-period delta within plausible bounds vs prior verified disclosure.
- Qualifier check - an "exact" qualifier is required for headline NRR; values qualified as "above" / "approximate" / "below" are flagged as pending-manual-verify.
If any gate fails, the entry is flagged pending-manual-verify and excluded from public benchmarks until a human signs off.
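A few of the gates above can be sketched directly from the stated thresholds. This is illustrative only (the real gate set is larger, and the record shape is hypothetical):

```python
from datetime import date, timedelta

def passes_gates(d: dict, filing_date: date, today: date) -> bool:
    """Apply a subset of the verification gates described above."""
    nrr, grr = d.get("nrr"), d.get("grr")
    # Range gate: NRR must fall within 50%–250%
    if nrr is not None and not (0.50 <= nrr <= 2.50):
        return False
    # Self-consistency gate: GRR can never exceed NRR
    if nrr is not None and grr is not None and grr > nrr:
        return False
    # Period gate: fiscal period must be resolved
    end = d.get("period_end")
    if end is None:
        return False
    # Future-date guard: period end ≤ today + 30 days AND ≤ filing date + 7 days
    if end > today + timedelta(days=30) or end > filing_date + timedelta(days=7):
        return False
    return True

ok = passes_gates(
    {"nrr": 1.18, "grr": 0.93, "period_end": date(2024, 1, 31)},
    filing_date=date(2024, 3, 15),
    today=date(2024, 3, 20),
)
```

The future-date guard is what keeps forward-looking guidance (a period ending months after the filing date) from entering the dataset as a current value.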
Self-healing data hygiene
The system runs validators on every load - not just on extraction. If a stale entry from an earlier run violates a current rule (e.g. a future-dated row from before the future-date guard existed), it's automatically dropped. This means cleanup commits propagate immediately rather than persisting until a manual fix.
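The load-time self-healing described above amounts to re-filtering every stored row through the current validator set, so rules added later retroactively clean old data. A minimal sketch under that assumption (function and field names are hypothetical):

```python
def load_verified(rows, validators):
    """Keep only rows that pass every *current* validator; stale rows drop out."""
    return [r for r in rows if all(v(r) for v in validators)]

# e.g. a future-date guard added after some rows were already stored
not_future = lambda r: r["period_end"] <= "2024-03-20"
has_period = lambda r: r.get("fiscal_year") is not None

rows = [
    {"period_end": "2024-01-31", "fiscal_year": 2024},
    {"period_end": "2026-01-31", "fiscal_year": 2026},  # stale future-dated row
]
clean = load_verified(rows, [not_future, has_period])   # future-dated row is dropped
```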
Cell publication rules
A benchmark cell goes live only when:
- ≥2 distinct companies have verified disclosures in that cell
- ≥2 verified disclosures total
- No single company contributes >50% of the data
This avoids "single-company medians" that would mislead viewers.
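The three publication rules reduce to a short check over a cell's verified disclosures. A sketch, assuming disclosures arrive as (company, value) pairs:

```python
from collections import Counter

def cell_is_publishable(disclosures):
    """Apply the three cell-publication rules stated above."""
    companies = Counter(company for company, _value in disclosures)
    total = len(disclosures)
    if len(companies) < 2 or total < 2:
        return False
    # No single company may contribute more than 50% of the data
    return max(companies.values()) / total <= 0.5

cell_is_publishable([("A", 1.10), ("B", 1.22)])               # two companies, 50/50
cell_is_publishable([("A", 1.10), ("A", 1.15), ("B", 1.20)])  # A contributes 2/3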
Conflict handling
If a previously-verified value is contradicted by a new scrape:
- The old record is demoted to pending-manual-verify
- The new record is also written as pending-manual-verify
- The cell page is regenerated without either value
- A human resolves the conflict and re-verifies
Human-verified entries are never overwritten by the scraper without explicit re-review.
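The conflict flow can be sketched as a single transition on both records. Record shapes and status labels other than pending-manual-verify are hypothetical here:

```python
def handle_conflict(old: dict, new: dict):
    """Demote both records when a new scrape contradicts a verified value."""
    if old["status"] == "human-verified":
        # Human-verified entries are never overwritten by the scraper;
        # the new candidate is held back for explicit re-review.
        return old, None
    old = {**old, "status": "pending-manual-verify"}
    new = {**new, "status": "pending-manual-verify"}
    return old, new

old, new = handle_conflict(
    {"nrr": 1.18, "status": "verified"},
    {"nrr": 1.12, "status": "candidate"},
)
```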
Source attribution
Every published disclosure carries:
- The exact source URL (SEC filing, transcript, press release)
- The source type (10-K / 10-Q / def-14a / earnings-call-transcript / etc.)
- The fiscal period and reporting date
- The extraction method used (regex / LLM)
- The verification status
You can independently verify any number on this site by following its source link. We invite that.
Update cadence
The scraper runs daily at 06:00 UTC. Daily runs only process NEW filings since the last successful run (cached URLs skip extraction to keep cost predictable). Schema or prompt changes only flow into existing data when an operator triggers a manual full re-extraction. New SEC filings typically appear within 24 hours of being posted by the company.
Historical backfill covers SEC filings since 2020.
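The incremental daily run described above reduces to a set-difference over filing URLs: anything seen in a prior successful run skips extraction. A sketch with placeholder identifiers:

```python
def urls_to_process(discovered, cache):
    """Daily run: extract only filing URLs not seen in a prior successful run."""
    return [u for u in discovered if u not in cache]

cache = {"edgar/0001-10K"}  # hypothetical identifiers for already-processed filings
todo = urls_to_process(["edgar/0001-10K", "edgar/0002-10Q"], cache)  # only the 10-Q remains
```

This is what keeps extraction cost proportional to new filings rather than total coverage; schema or prompt changes require the separate manual full re-extraction to reach cached documents.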
Current coverage
- 103 public B2B SaaS companies tracked
- 846 verified disclosures
- 12 disclosures pending verification
- 25 live benchmark cells
Found an error?
Every disclosure on this site links to its source URL. If the published value differs from the company's filing, please email us with:
- The cust.co URL where the bad value appears
- The source filing URL the company actually published
- The correct value
We typically re-verify within 24 hours.
Citing this data
The dataset is free to cite. Recommended attribution: "Data from cust.co, sourced from SEC filings and earnings call transcripts."
Per-company JSON is available at /api/companies/<name>.json and per-cell aggregations at /api/cells/<vertical>/<stage>/<acv-band>/. Bulk access on request.
Conflicts of interest
Cust is a customer-success product for VPs of CS. We benchmark public companies because their data is publicly disclosable and verifiable, and because doing so trains the same AI we use in our own product. We do not accept payment to include or exclude any company from this benchmark, nor to weight any disclosure favorably. Companies cannot opt out of being indexed (the underlying data is public). Companies CAN flag inaccuracies, which we re-verify against the source.
Author
Maintained by Laimonas Noreika, CEO and Co-founder of Cust. Reach out via LinkedIn for corrections or to flag a missing source.