What we publish
For every covered public B2B SaaS company, we extract and publish 80+ structured fields when disclosed. Not every company discloses every field; missing values render as "-" rather than estimates.
Per-period quantitative metrics
- Retention - Net Revenue Retention (NRR / NDR / DBNRR), Gross Retention Rate (GRR), Logo Retention
- Customer cohorts - total customers, customers over $100K / $1M / $10M ARR
- Revenue mix - US / International / Enterprise share of revenue
- Concentration - top-10 customer share of revenue
- Unit economics - average ACV, annualized churn rate, customers-per-CSM ratio
- Commercial structure - multi-year contract %, average contract length, total + current RPO, RPO duration breakdown, new customers added per period, subscription vs services revenue mix
- Customer experience - NPS, CSAT, active users, products-per-customer (when disclosed)
- Cohorted retention - NRR / GRR broken down by segment (Enterprise / Mid-Market / SMB), geography (US / EMEA / APAC), or customer-size cohort (over $1M / over $100K)
- Scale + headcount - total ARR, ARR growth YoY, AE headcount, total employees, lost customers per period
Document-wide CS context
- CS team size + structure, customers-per-CSM ratio, CSM coverage model (account-named / pooled / hybrid / digital-led)
- Support tier structure, customer segmentation labels
- Time-to-value (days from contract to first integration), customer education programs, customer advisory board
- Renewal cadence (annual / multi-year / monthly / consumption), pricing model (subscription / consumption / hybrid)
- Top customer-facing executive (CCO / CRO with retention scope), executive comp tied to retention metrics, reporting line
- Named CS initiatives + descriptions, acknowledged challenges, executive quotes about post-sales motion, competitive dynamics
Derived metrics (computed, not extracted)
- Expansion contribution (NRR − GRR), GRR drag, peak-to-current decline, concentration trend, multi-year mix evolution
- ARR per CSM, ARR per AE, ARR per FTE, AE-to-CSM ratio, bookings per CSM
- Quick ratio, customer lifetime months, RPO coverage years
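Several of these derived metrics are simple arithmetic over extracted fields. A minimal sketch of a few of them, with hypothetical field names (the site's real schema may differ):

```python
def derive_metrics(nrr, grr, arr, csm_count, ae_count):
    """Compute a few derived metrics from extracted values (hypothetical names)."""
    return {
        "expansion_contribution_pp": round((nrr - grr) * 100, 1),  # NRR − GRR, in points
        "grr_drag_pp": round((1.0 - grr) * 100, 1),                # gross churn pressure
        "arr_per_csm": arr / csm_count if csm_count else None,
        "arr_per_ae": arr / ae_count if ae_count else None,
        "ae_to_csm_ratio": ae_count / csm_count if csm_count else None,
    }

m = derive_metrics(nrr=1.15, grr=0.92, arr=500_000_000, csm_count=40, ae_count=120)
```

A company reporting 115% NRR and 92% GRR would show an expansion contribution of 23 points under this definition.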
Sources
- SEC EDGAR filings - 10-K (annual), 10-Q (quarterly), 8-K (current), DEF 14A (proxy / exec comp), 20-F (foreign annual), 6-K (foreign current), S-1 (IPO prospectus). Both the cover document and Exhibit 99.1 (earnings press release content) for 8-K/6-K.
- Earnings call transcripts - public-domain transcripts via several free aggregators. CFO + CEO commentary often surfaces retention details that don't appear in filings.
- IR-page hosted documents - investor presentations, supplemental decks, sustainability reports, hand-curated press releases. Fetched directly from each company's investor relations site.
- Founder submissions - private B2B SaaS founders submit their numbers via the free calculator. Work-email gated, anonymized by default. Aggregated medians at community-benchmarks/ with a privacy floor (cells require ≥5 submissions before publishing).
Extraction pipeline
- Discovery - for each tracked ticker, enumerate every recent SEC filing across the form list above. SEC's full-text search also surfaces companies disclosing retention phrases we haven't yet catalogued.
- Pre-filter - long-form documents (10-K / 10-Q / proxies / transcripts / decks) ALWAYS go to LLM extraction. Press releases that don't mention any retention term skip extraction (boilerplate).
- Regex extraction - per-company hand-tuned extractors handle the headline NRR disclosures (DBNRR, NDR, dollar net retention, "respectively" patterns, multi-period tables).
- LLM extraction - a large language model parses the document slice and emits output against a strict JSON schema covering all 80+ fields. Slice budgets differ per source type (e.g., a 10-K gets the largest window so cohorted retention sections - which often live far from headline NRR - actually reach the model).
- Cross-source agreement - when the same value appears in both regex and LLM output, OR in two different filings (press release + 10-Q), confidence is boosted.
- Period resolution - fiscal year + quarter inferred from in-text labels OR from the SEC filing's reportDate metadata as a fallback.
- Validation - every disclosure runs through quality gates (below) before being marked verified.
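The regex-extraction step above can be illustrated with a simplified pattern for headline NRR phrasing. This is a hedged sketch; the real per-company extractors are hand-tuned and handle many more variants (tables, "respectively" patterns, multi-period disclosures):

```python
import re

# Simplified headline-NRR pattern: a retention phrase followed, within a
# short window, by a 2-3 digit percentage.
NRR_PATTERN = re.compile(
    r"(?:dollar-based net (?:revenue )?retention(?: rate)?"
    r"|net revenue retention(?: rate)?)"
    r"[^\d%]{0,40}(\d{2,3})\s*%",
    re.IGNORECASE,
)

def extract_nrr(text: str):
    """Return headline NRR as a fraction (1.18 for 118%), or None if absent."""
    m = NRR_PATTERN.search(text)
    return int(m.group(1)) / 100 if m else None

extract_nrr("Our dollar-based net retention rate was 118% as of January 31")  # → 1.18
```

Sentences without a recognized retention phrase return None, which is why press releases with no retention term can safely skip extraction.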
Verification gates
An auto-extracted disclosure is marked verified only if it passes the full gate set:
- Range - each metric clamped to a sane range (e.g. NRR 50%–250%, RPO $1M–$100B, ACV $100–$100M). Out-of-range values are dropped.
- Period determined - fiscal year + fiscal quarter resolved (or fiscal year for full-year disclosures).
- Future-date guard - period end-date must be ≤ today + 30 days AND ≤ filing-date + 7 days. Rejects forward-looking guidance the LLM may have surfaced as a current value.
- Self-consistency - GRR ≤ NRR (always); cohorted NRR within plausible spread of headline NRR.
- Confidence threshold - extraction confidence above the per-source-type minimum.
- Multi-source agreement - confidence is boosted when ≥2 candidates agree within 1pp tolerance (regex + LLM, or two separate filings); single-source fields require higher base confidence.
- YoY/QoQ change ceiling - period-over-period delta within plausible bounds vs prior verified disclosure.
- Qualifier check - an "exact" qualifier is required for headline NRR; values qualified as "above" / "approximate" / "below" are flagged as pending-manual-verify.
If any gate fails, the entry is flagged pending-manual-verify and excluded from public benchmarks until a human signs off.
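A few of the gates above can be sketched directly from the stated thresholds. This is illustrative only (the real gate set is larger, and the record shape is hypothetical):

```python
from datetime import date, timedelta

def passes_gates(d: dict, filing_date: date, today: date) -> bool:
    """Apply a subset of the verification gates described above."""
    nrr, grr = d.get("nrr"), d.get("grr")
    # Range gate: NRR must fall within 50%–250%
    if nrr is not None and not (0.50 <= nrr <= 2.50):
        return False
    # Self-consistency gate: GRR can never exceed NRR
    if nrr is not None and grr is not None and grr > nrr:
        return False
    # Period gate: fiscal period must be resolved
    end = d.get("period_end")
    if end is None:
        return False
    # Future-date guard: period end ≤ today + 30 days AND ≤ filing date + 7 days
    if end > today + timedelta(days=30) or end > filing_date + timedelta(days=7):
        return False
    return True

ok = passes_gates(
    {"nrr": 1.18, "grr": 0.93, "period_end": date(2024, 1, 31)},
    filing_date=date(2024, 3, 15),
    today=date(2024, 3, 20),
)
```

The future-date guard is what keeps forward-looking guidance (a period ending months after the filing date) from entering the dataset as a current value.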
Self-healing data hygiene
The system runs validators on every load - not just on extraction. If a stale entry from an earlier run violates a current rule (e.g. a future-dated row from before the future-date guard existed), it's automatically dropped. This means cleanup commits propagate immediately rather than persisting until a manual fix.
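The load-time self-healing described above amounts to re-filtering every stored row through the current validator set, so rules added later retroactively clean old data. A minimal sketch under that assumption (function and field names are hypothetical):

```python
def load_verified(rows, validators):
    """Keep only rows that pass every *current* validator; stale rows drop out."""
    return [r for r in rows if all(v(r) for v in validators)]

# e.g. a future-date guard added after some rows were already stored
not_future = lambda r: r["period_end"] <= "2024-03-20"
has_period = lambda r: r.get("fiscal_year") is not None

rows = [
    {"period_end": "2024-01-31", "fiscal_year": 2024},
    {"period_end": "2026-01-31", "fiscal_year": 2026},  # stale future-dated row
]
clean = load_verified(rows, [not_future, has_period])   # future-dated row is dropped
```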
Cell publication rules
A benchmark cell goes live only when:
- ≥2 distinct companies have verified disclosures in that cell
- ≥2 verified disclosures total
- No single company contributes >50% of the data
This avoids "single-company medians" that would mislead viewers.
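The three publication rules reduce to a short check over a cell's verified disclosures. A sketch, assuming disclosures arrive as (company, value) pairs:

```python
from collections import Counter

def cell_is_publishable(disclosures):
    """Apply the three cell-publication rules stated above."""
    companies = Counter(company for company, _value in disclosures)
    total = len(disclosures)
    if len(companies) < 2 or total < 2:
        return False
    # No single company may contribute more than 50% of the data
    return max(companies.values()) / total <= 0.5

cell_is_publishable([("A", 1.10), ("B", 1.22)])               # two companies, 50/50
cell_is_publishable([("A", 1.10), ("A", 1.15), ("B", 1.20)])  # A contributes 2/3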
Conflict handling
If a previously-verified value is contradicted by a new scrape:
- The old record is demoted to pending-manual-verify
- The new record is also written as pending-manual-verify
- The cell page is regenerated without either value
- A human resolves the conflict and re-verifies
Human-verified entries are never overwritten by the scraper without explicit re-review.
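The conflict flow can be sketched as a single transition on both records. Record shapes and status labels other than pending-manual-verify are hypothetical here:

```python
def handle_conflict(old: dict, new: dict):
    """Demote both records when a new scrape contradicts a verified value."""
    if old["status"] == "human-verified":
        # Human-verified entries are never overwritten by the scraper;
        # the new candidate is held back for explicit re-review.
        return old, None
    old = {**old, "status": "pending-manual-verify"}
    new = {**new, "status": "pending-manual-verify"}
    return old, new

old, new = handle_conflict(
    {"nrr": 1.18, "status": "verified"},
    {"nrr": 1.12, "status": "candidate"},
)
```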
Source attribution
Every published disclosure carries:
- The exact source URL (SEC filing, transcript, press release)
- The source type (10-K / 10-Q / def-14a / earnings-call-transcript / etc.)
- The fiscal period and reporting date
- The extraction method used (regex / LLM)
- The verification status
You can independently verify any number on this site by following its source link. We invite that.
Update cadence
The scraper runs daily at 06:00 UTC. Daily runs only process NEW filings since the last successful run (cached URLs skip extraction to keep cost predictable). Schema or prompt changes only flow into existing data when an operator triggers a manual full re-extraction. New SEC filings typically appear within 24 hours of being posted by the company.
Historical backfill covers SEC filings since 2020.
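The incremental daily run described above reduces to a set-difference over filing URLs: anything seen in a prior successful run skips extraction. A sketch with placeholder identifiers:

```python
def urls_to_process(discovered, cache):
    """Daily run: extract only filing URLs not seen in a prior successful run."""
    return [u for u in discovered if u not in cache]

cache = {"edgar/0001-10K"}  # hypothetical identifiers for already-processed filings
todo = urls_to_process(["edgar/0001-10K", "edgar/0002-10Q"], cache)  # only the 10-Q remains
```

This is what keeps extraction cost proportional to new filings rather than total coverage; schema or prompt changes require the separate manual full re-extraction to reach cached documents.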
Current coverage
- 103 public B2B SaaS companies tracked
- 846 verified disclosures
- 12 disclosures pending verification
- 25 live benchmark cells
Found an error?
Every disclosure on this site links to its source URL. If the published value differs from the company's filing, please email us with:
- The cust.co URL where the bad value appears
- The source filing URL the company actually published
- The correct value
We typically re-verify within 24 hours.
Citing this data
The dataset is free to cite. Recommended attribution: "Data from cust.co, sourced from SEC filings and earnings call transcripts."
Per-company JSON is available at /api/companies/<name>.json and per-cell aggregations at /api/cells/<vertical>/<stage>/<acv-band>/. Bulk access on request.
Conflicts of interest
Cust is a customer-success product for VPs of CS. We benchmark public companies because their data is publicly disclosable and verifiable, and because doing so trains the same AI we use in our own product. We do not accept payment to include or exclude any company from this benchmark, nor to weight any disclosure favorably. Companies cannot opt out of being indexed (the underlying data is public). Companies CAN flag inaccuracies, which we re-verify against the source.
Author
Maintained by Laimonas Noreika, CEO and Co-founder of Cust. Reach out via LinkedIn for corrections or to flag a missing source.