Methodology
This page summarizes scope, cleanup, clustering, matching, and data-quality rules for the release bundle.
What the dataset is
The dataset separates observed CBS title rows, clustered public game records, enriched canonical metadata, and audit files.
What was excluded
Utilities, editors/SDKs, guide media, and disc-noise rows are removed from the canonical public game catalog but remain available through audit files.
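The exclusion step above can be sketched as a simple filter. This is only an illustration: the `content_class` field, the class labels, and the `split_catalog` helper are assumptions, not the bundle's actual schema or code.

```python
# Hypothetical content-class labels; the real bundle may name them differently.
EXCLUDED_CLASSES = {"utility", "editor_sdk", "guide_media", "disc_noise"}


def split_catalog(rows):
    """Split rows into (public catalog rows, audit rows for excluded items)."""
    public, audit = [], []
    for row in rows:
        if row.get("content_class") in EXCLUDED_CLASSES:
            audit.append(row)  # kept for auditing, not published as a game
        else:
            public.append(row)
    return public, audit


# Made-up sample rows, for illustration only.
rows = [
    {"title": "Example Game", "content_class": "game"},
    {"title": "Example Level Editor", "content_class": "editor_sdk"},
]
public, audit = split_catalog(rows)
```

Excluded rows are routed to a second list rather than dropped, which mirrors the rule that they remain available through audit files.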
Match states
- Matched: a canonical match survived the audit rules.
- Ambiguous: multiple plausible targets remain, or the match is too weak.
- Unmatched: no trustworthy canonical target is published.
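One way to picture the three states is as a function of candidate similarity scores. The sketch below is a guess at the shape of such a rule, not the audit rules themselves: the `classify_match` helper, the `min_score` and `min_margin` thresholds, and the choice to treat "no candidates" as unmatched are all assumptions.

```python
from enum import Enum


class MatchState(Enum):
    MATCHED = "matched"
    AMBIGUOUS = "ambiguous"
    UNMATCHED = "unmatched"


def classify_match(candidate_scores, min_score=0.8, min_margin=0.1):
    """Assign a match state from candidate scores (hypothetical thresholds)."""
    if not candidate_scores:
        return MatchState.UNMATCHED  # assumed: nothing to match against
    ranked = sorted(candidate_scores, reverse=True)
    best = ranked[0]
    if best < min_score:
        return MatchState.AMBIGUOUS  # the best match is too weak
    if len(ranked) > 1 and best - ranked[1] < min_margin:
        return MatchState.AMBIGUOUS  # several plausible targets remain
    return MatchState.MATCHED
```

For example, under these assumed thresholds `classify_match([0.95, 0.93])` is ambiguous because the top two candidates are too close, while `classify_match([0.95])` is matched.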
Data quality score
`data_quality_score` is a heuristic summary combining content class, merge confidence, canonical match state, link presence, and match demotions.
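The exact formula is not given here, so the following is only a sketch of how the five listed inputs could be combined into one number. The 0 to 100 scale, every weight, the label strings, and the function signature are assumptions made for illustration.

```python
def data_quality_score(content_class, merge_confidence, match_state,
                       has_links, demoted):
    """Illustrative 0-100 heuristic; all weights below are assumptions."""
    score = 0.0
    # Content class: only rows classed as games earn this component.
    score += 30.0 if content_class == "game" else 0.0
    # Merge confidence, clamped to the range [0, 1].
    score += 30.0 * max(0.0, min(1.0, merge_confidence))
    # Canonical match state.
    score += {"matched": 30.0, "ambiguous": 10.0}.get(match_state, 0.0)
    # Link presence.
    score += 10.0 if has_links else 0.0
    # A match demotion subtracts a fixed penalty.
    if demoted:
        score -= 20.0
    return max(0.0, min(100.0, score))
```

Keeping each input as a separate additive term makes it easy to see which factor lowered a given row's score, which suits a heuristic summary better than an opaque combined formula.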
- Game clusters: 1719
- Issue rows: 2191
- Excluded non-games: 95
- Match demotions: 14