Why Population Databases Can’t Fix the VUS Crisis

Population databases such as gnomAD are foundational to modern variant interpretation, enabling labs to exclude common benign variants and prioritize rare changes. However, as testing expands to exomes and genomes, a fundamental limitation becomes clear: allele frequency alone cannot resolve the functional or clinical significance of most rare variants. Even with growing sample sizes and improved diversity, most novel missense variants remain rare or private and are therefore classified as variants of uncertain significance (VUS). This is not a temporary data gap but a structural limitation of human-only population evidence, and overcoming it requires additional, orthogonal context.

Population databases such as gnomAD and other aggregated human variant repositories are foundational to current variant interpretation workflows. Their allele frequency data help distinguish common benign variation from variants that are rare and potentially disease-associated. However, despite increasing sample sizes and improving diversity, a high proportion of variants are still classified as variants of uncertain significance (VUS). This persistent uncertainty reflects an intrinsic limit of human-only population data: they record how often a variant occurs, not what it does, and so cannot supply functional or mechanistic evidence.

The VUS crisis persists because allele frequency alone does not capture the functional impact or clinical relevance of most rare variants. In large clinical gene panels and exome sequencing, most novel missense variants remain rare or unique to individuals or families. Without additional evidence that bears directly on functional consequence or evolutionary constraint, these variants cannot be confidently classified. As a result, reliance on population databases creates a bottleneck in interpretation workflows, where manual review, functional assays, and complementary evidence sources remain necessary.

Evidence

Population databases aggregate variant frequencies from tens to hundreds of thousands of individuals, enabling clinicians and researchers to identify variants that are too common to be highly penetrant causes of rare Mendelian disorders (Gudmundsson et al. 2022). These databases have grown significantly, with gnomAD incorporating data from roughly 800,000 exomes and genomes in its latest releases and contributing to variant reinterpretation efforts (Gudmundsson et al. 2025). However, even in cohorts of this size, many rare variants are observed too few times, or not at all, to provide definitive frequency evidence for classification.
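The sampling limit can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes an autosomal variant, independently sampled chromosomes, and complete sequencing coverage, simplifications that real cohorts do not fully satisfy. The function names are illustrative and are not part of any gnomAD tooling. It estimates how likely an allele is to be missing from a cohort entirely, and how high its true frequency could still be when it has never been observed.

```python
import math


def prob_unobserved(allele_freq: float, n_individuals: int) -> float:
    """Probability that an allele with the given true frequency is seen
    zero times in a cohort of diploid individuals (assumes independence)."""
    n_chromosomes = 2 * n_individuals
    return (1.0 - allele_freq) ** n_chromosomes


def upper_bound_if_unobserved(n_individuals: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the true allele frequency when
    the allele has never been observed in the cohort."""
    n_chromosomes = 2 * n_individuals
    # Solve (1 - f) ** n_chromosomes = 1 - confidence for f;
    # expm1 keeps precision when the result is very small.
    return -math.expm1(math.log(1.0 - confidence) / n_chromosomes)


if __name__ == "__main__":
    cohort = 800_000  # illustrative cohort size, in individuals
    for freq in (1e-5, 1e-6, 1e-7):
        print(f"true AF {freq:.0e}: P(absent) = {prob_unobserved(freq, cohort):.3f}")
    print(f"95% upper bound when absent: {upper_bound_if_unobserved(cohort):.2e}")
```

Under these assumptions, an allele with a true frequency of one in a million still has about a 20% chance of being absent from a cohort of 800,000 individuals, and absence only bounds its frequency below roughly 2 in a million. Absence from the database therefore says little about whether a very rare variant is benign or pathogenic.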

Sampling bias within these aggregated databases also limits interpretive power. Underrepresentation of many global populations results in a higher proportion of VUS in individuals from underrepresented ancestry groups, for whom allele frequency data remain sparse or absent (Lee et al. 2024; Assaf et al. 2025). Because of these demographic gaps, population frequency evidence does not discriminate between benign and pathogenic variants equally well across all human ancestries.

Thus, while population frequency data are valuable, they do not on their own provide the mechanistic or functional context needed to resolve most VUS.

Why Existing Approaches Fail

Limited Discriminatory Power of Allele Frequency

Allele frequency thresholds are effective for excluding common non-pathogenic variants but cannot differentiate rare benign variants from rare pathogenic ones without additional evidence. A variant observed only once or in a handful of individuals is compatible with either outcome, so its frequency cannot inform classification with confidence.
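The asymmetry described above can be illustrated with a small Python sketch. It derives a disorder-specific threshold in the style of a maximum credible allele frequency calculation for a dominant disorder, then triages variants against it. The prevalence, allelic contribution, and penetrance values, the variant names, and the counts are all hypothetical inputs chosen for illustration, and the comparison uses the point estimate of frequency rather than the confidence bound a production filter would normally apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str            # hypothetical label, not a real variant
    allele_count: int    # chromosomes carrying the variant
    allele_number: int   # chromosomes successfully genotyped

    @property
    def allele_freq(self) -> float:
        return self.allele_count / self.allele_number if self.allele_number else 0.0


def max_credible_af(prevalence: float, max_allelic_contribution: float,
                    penetrance: float) -> float:
    """Highest allele frequency plausible for a causal variant in a dominant
    disorder; the factor of 2 converts individuals to chromosomes."""
    return prevalence * max_allelic_contribution / (2.0 * penetrance)


def triage_by_frequency(variant: Variant, threshold: float) -> str:
    """Frequency can only rule variants out; it never rules them in."""
    if variant.allele_freq > threshold:
        return "too common for this disorder"
    return "unresolved by frequency"


if __name__ == "__main__":
    # Illustrative parameters: 1 in 500 prevalence, 2% allelic contribution, 50% penetrance.
    threshold = max_credible_af(1 / 500, 0.02, 0.5)
    variants = [
        Variant("common_missense", 1200, 1_600_000),
        Variant("singleton_missense", 1, 1_600_000),
        Variant("absent_missense", 0, 1_600_000),
    ]
    for v in variants:
        print(f"{v.name}: AF={v.allele_freq:.2e} -> {triage_by_frequency(v, threshold)}")
```

The singleton and the absent variant receive the same label whether they are truly benign or truly pathogenic, which is the point: below the threshold, frequency carries no discriminating information.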

Incomplete Representation of Global Genetic Diversity

Large databases have improved representation but significant gaps remain. Variants that are rare or absent in European-centric cohorts may be common in underrepresented populations, leading to spurious VUS designations or misclassification (Lee et al. 2024). Population frequency alone therefore cannot be universally informative.

Lack of Functional and Mechanistic Insight

Population databases record the occurrence of variants but do not directly assess their biological impact. A variant’s frequency does not indicate whether it alters protein function, gene regulation, or phenotypic outcome, leaving many variants unresolved until functional data are available.

Persistent Manual Curation Burden

Because frequency evidence is insufficient to classify most rare variants, expert review remains essential. Variant scientists and clinical geneticists must integrate population data with computational predictions, clinical observations, family segregation, and functional assays to assign clinical significance. This manual integration remains labor-intensive and slow.

Introducing Evolutionary Evidence

Evolutionary genomics provides an independent evidence axis that complements population frequency data. Genomic positions that have remained highly conserved across millions of years of natural selection are likely to be under functional constraint. Comparative analyses of primates and other vertebrates reveal which changes are tolerated and which are not. Evolutionary constraint metrics thus offer functional insight that population frequency alone cannot provide.

Human population data sample only recent genetic diversity, whereas evolutionary comparisons capture a much deeper history in which selection has repeatedly tested changes at each position. Sites that are invariant across diverse species are likely to be functionally important, whereas variants at positions that vary naturally across species can be deprioritized. This orthogonal evidence can help resolve uncertainty for variants that remain rare in human databases.
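As a minimal illustration of the idea, the Python sketch below scores each position of a toy protein alignment by the fraction of other species that share the human residue, and lists the residues seen at a position. The sequences are made-up fragments, the function names are hypothetical, and the identity score is a deliberately naive stand-in: established constraint metrics such as phyloP or GERP model phylogeny and substitution rates, which this sketch does not attempt.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy alignment: invented peptide fragments, not real proteins.
# All sequences are assumed to be pre-aligned and of equal length.
TOY_ALIGNMENT: Dict[str, str] = {
    "human":     "MKTAYL",
    "chimp":     "MKTAYL",
    "mouse":     "MKSAYL",
    "chicken":   "MRSAYL",
    "zebrafish": "MKQAFL",
}


def column_identity(alignment: Dict[str, str],
                    reference: str = "human") -> List[Optional[float]]:
    """Fraction of non-reference species matching the reference residue at
    each position; None where no other species has an aligned residue."""
    ref_seq = alignment[reference]
    others = [seq for name, seq in alignment.items() if name != reference]
    scores: List[Optional[float]] = []
    for i, ref_residue in enumerate(ref_seq):
        aligned = [seq[i] for seq in others if seq[i] != "-"]  # skip gaps
        if not aligned:
            scores.append(None)
            continue
        matches = sum(1 for residue in aligned if residue == ref_residue)
        scores.append(matches / len(aligned))
    return scores


def residues_seen(alignment: Dict[str, str], position: int) -> Set[str]:
    """Residues observed across all species at a 0-based position."""
    return {seq[position] for seq in alignment.values() if seq[position] != "-"}


if __name__ == "__main__":
    for pos, score in enumerate(column_identity(TOY_ALIGNMENT)):
        seen = "".join(sorted(residues_seen(TOY_ALIGNMENT, pos)))
        print(f"position {pos}: identity={score}, residues seen={seen}")
```

In this toy data, position 0 is invariant across all five species, so a human substitution there would be a candidate for closer scrutiny, while position 2 already tolerates several residues and a substitution to one of them would be a natural candidate for deprioritization.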

Cornerstone Genomic’s Solution (CodeXome)

Cornerstone Genomic’s platform, CodeXome, integrates deep evolutionary evidence with human population data, clinical annotations, and computational predictors to address the limitations of population-only approaches. By combining cross-species comparative genomics with large human reference databases, CodeXome prioritizes variants based on both evolutionary constraint and observed human variation. This evolutionary filtering reduces the candidate variant space, highlights biologically plausible functional impacts, and accelerates interpretation workflows that otherwise depend on manual curation and limited evidence.

What This Means for the Field

Clinical laboratories benefit from broader evidence integration that reduces VUS burden and improves interpretive confidence across diverse populations. Medical geneticists gain mechanistic insight into variant impact that is not discernible from allele frequency alone. Researchers obtain prioritized variant lists enriched for functionally relevant signals, guiding functional follow-up studies and disease gene discovery.

Population databases remain essential, but they are not sufficient on their own to resolve the VUS crisis. Integrating evolutionary evidence with population data strengthens interpretation, reduces uncertainty, and accelerates the translation of genomic data into clinical answers.

Conclusion

Population databases provide foundational frequency evidence for variant interpretation, but their limitations in sampling depth, demographic representation, and mechanistic insight ensure that many variants remain of uncertain significance. Expanding the evidence base beyond human populations is necessary to address the VUS crisis effectively. Evolutionary genomics offers an orthogonal and biologically grounded dimension of evidence that, when incorporated through platforms like CodeXome, can improve variant classification, reduce manual burden, and enable more confident clinical interpretation.

We invite clinical labs, geneticists, and researchers to explore evolutionary filtering frameworks, evaluate CodeXome in their workflows, and collaborate on complex variant interpretation challenges.

References
