Introduction
The CFTR gene is one of the most extensively studied disease genes, with over 2,000 reported variants and hundreds of well-characterized pathogenic substitutions. Its biological function is strictly regulated, and its protein domains exhibit established patterns of intolerance to change. This makes CFTR an ideal gene for evaluating whether deep-time evolutionary recurrence and evolutionary absence align with established clinical pathogenicity classifications. CFTR therefore serves as a clean, high-confidence benchmark for validating CodeXome’s Deep Time Ancestry approach across primate evolution.
Objective of This Study
This study aims to determine whether known pathogenic CFTR amino acid substitutions recur naturally across primates or remain entirely absent from the evolutionary record. If pathogenic variants were tolerated in other species, they would appear as natural substitutions in CodeXome’s primate alignment; their absence across 80 million years would instead indicate strong functional constraint. The analysis also evaluates whether naturally recurring substitutions align with known benign clinical classifications.
Methods
Using CodeXome’s Gene Profile module, the following analysis was performed:
Dataset
- Variants: Over 260 curated pathogenic CFTR amino acid substitutions compiled from clinical literature and variant curation efforts.
- Evolutionary Data: Recurrence/absence patterns across 55 primate genera.
- Metrics: Alignment depth, substitution class, and site-level evolutionary constraint (on a spectrum from highly conserved to highly variable), all derived from CodeXome’s Deep Time Ancestry dataset.
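CodeXome’s own site-level constraint metric is not specified here. As a rough illustration of how one alignment column can be placed on a conserved-to-variable spectrum, the sketch below uses Shannon entropy, a generic swapped-in measure and not CodeXome’s method. The function names and the binning cutoffs (`conserved_max`, `variable_min`) are hypothetical.

```python
import math
from collections import Counter

def column_entropy(residues):
    """Shannon entropy (bits) of one alignment column.

    residues: one amino acid per primate genus at a single site.
    Gaps ('-') are skipped. 0.0 means every genus carries the same residue.
    """
    observed = [aa for aa in residues if aa != "-"]
    if not observed:
        return 0.0
    total = len(observed)
    entropy = 0.0
    for count in Counter(observed).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

def constraint_label(entropy, conserved_max=0.0, variable_min=1.0):
    """Map entropy to a coarse bin; the cutoffs are illustrative assumptions."""
    if entropy <= conserved_max:
        return "highly conserved"
    if entropy >= variable_min:
        return "highly variable"
    return "intermediate"

# Toy columns for four hypothetical genera (not real CFTR data).
for column in ("FFFF", "FFLL"):
    h = column_entropy(column)
    print(column, h, constraint_label(h))
```

Entropy is used here only because it is a familiar, easily computed diversity measure; any per-site conservation score could take its place in the same binning step.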
Analysis Steps
- Variant Mapping: Each known pathogenic CFTR substitution was mapped to its orthologous site in the primate alignment. Natural variation was verified using the Gene Profile view (AA and codon modes) and the “Changes relative to human” filter, checking both for recurrence in any primate lineage and for lineage-specific substitutions.
- Clinical Comparison: Naturally occurring substitutions were compared to ClinVar benign and likely benign labels.
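The mapping and recurrence check can be approximated programmatically. The sketch below assumes the alignment is available as a dictionary of genus name to aligned protein sequence, and maps human residue numbers to alignment columns by skipping gaps in the human row. The function names, the `reference="Homo"` key, and the toy sequences are hypothetical; this is not CodeXome’s data format or its implementation of the “Changes relative to human” filter.

```python
import re

def parse_substitution(variant):
    """Parse a one-letter missense change such as 'G551D' into (ref, pos, alt)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if m is None:
        raise ValueError(f"not a simple missense substitution: {variant}")
    return m.group(1), int(m.group(2)), m.group(3)

def human_pos_to_column(human_row):
    """Map 1-based human residue numbers to 0-based alignment columns."""
    mapping, residue = {}, 0
    for col, aa in enumerate(human_row):
        if aa != "-":
            residue += 1
            mapping[residue] = col
    return mapping

def genera_with_substitution(alignment, variant, reference="Homo"):
    """Return non-reference genera whose aligned residue equals the variant's alt allele."""
    ref, pos, alt = parse_substitution(variant)
    human_row = alignment[reference]
    col = human_pos_to_column(human_row)[pos]
    if human_row[col] != ref:
        raise ValueError(f"reference residue mismatch for {variant}")
    return sorted(g for g, row in alignment.items()
                  if g != reference and row[col] == alt)

# Toy alignment (not CFTR): genus -> aligned protein sequence.
alignment = {
    "Homo": "MAG-LK",
    "Pan": "MAGQLK",
    "Macaca": "MVG-LR",
    "Callithrix": "MAG-IK",
}

for variant in ("A2V", "G3D", "L4I"):
    hits = genera_with_substitution(alignment, variant)
    status = "recurs in " + ", ".join(hits) if hits else "absent in primates"
    print(variant, status)
```

In this framing, a substitution found in at least one genus moves on to the benign comparison step, while one found in none contributes to the evolutionary-absence count.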
Key Findings
Finding 1 — Pathogenic CFTR variants are virtually absent in primates.
Of more than 260 curated pathogenic amino acid changes, CodeXome found only 2 that occur as natural substitutions in primates across 80 million years of evolution. This corresponds to greater than 99 percent concordance between evolutionary absence and curated pathogenicity, strongly supporting the principle that pathogenic CFTR variants are not tolerated as natural variation.
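The concordance figure follows from simple arithmetic. A minimal sketch, using 260 as a lower bound for the curated set and the 2 substitutions observed in primates; the function name is hypothetical:

```python
def absence_concordance(n_pathogenic, n_recurring):
    """Fraction of curated pathogenic substitutions that are absent in primates."""
    if not 0 <= n_recurring <= n_pathogenic or n_pathogenic == 0:
        raise ValueError("need 0 <= n_recurring <= n_pathogenic and n_pathogenic > 0")
    return (n_pathogenic - n_recurring) / n_pathogenic

# 260 is the stated lower bound ("more than 260"); 2 substitutions recur.
print(f"{absence_concordance(260, 2):.2%}")  # 99.23%
```

Because (n - 2) / n increases with n, a curated set larger than 260 can only push the concordance higher than this value.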
Finding 2 — Naturally recurring substitutions consistently match benign classifications.
CFTR displays dozens of tolerated amino acid substitutions across primates, and at these sites the recurring primate substitutions co-occur with ClinVar benign and likely benign variants. For the ClinVar benign and likely benign variants that are not observed in primates, our new predictive tool, the Deep Time Ancestry Score (AI/ML), matches their classification with 99% accuracy.
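The 99% figure is an accuracy: the share of variants whose predicted class matches the ClinVar class. The sketch below shows only that calculation, under two assumptions: benign and likely benign (and likewise pathogenic and likely pathogenic) are pooled before comparison, and predictions are already available as discrete labels. It does not model the Deep Time Ancestry Score itself; the helper names, variant IDs, and labels are toy placeholders.

```python
# Assumed pooling of ClinVar-style labels before comparison.
POOL = {"likely benign": "benign", "likely pathogenic": "pathogenic"}

def pooled(label):
    """Lower-case a label and collapse 'likely' classes into their parent class."""
    key = label.lower()
    return POOL.get(key, key)

def classification_accuracy(predicted, clinical):
    """Share of variants (present in both dicts) whose pooled labels agree."""
    shared = predicted.keys() & clinical.keys()
    if not shared:
        raise ValueError("no variants in common")
    matches = sum(pooled(predicted[v]) == pooled(clinical[v]) for v in shared)
    return matches / len(shared)

# Toy data, not CodeXome or ClinVar results.
clinical = {"v1": "Benign", "v2": "Likely benign", "v3": "Likely benign", "v4": "Benign"}
predicted = {"v1": "likely benign", "v2": "benign", "v3": "likely pathogenic", "v4": "benign"}

print(f"{classification_accuracy(predicted, clinical):.0%}")  # 75%
```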
Finding 3 — Evolutionary patterns are distinctive across functional domains.
CFTR’s overall rate of evolutionary change falls within the average range for a dataset of 19,244 coding genes. Its patterns of natural variation are nonetheless distinctive and correlate with motifs and regions within:
- Nucleotide-binding domains (NBD1, NBD2)
- Regulatory (R) domain
- Key transmembrane helices
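One simple way to compare variation patterns across domains is the fraction of positions in each domain that carry at least one primate substitution. A minimal sketch, assuming domain boundaries are supplied by the user; the function name, coordinates, and variable sites below are placeholders, not real CFTR domain boundaries or CodeXome results.

```python
def variable_fraction_by_domain(domains, variable_sites):
    """Fraction of positions in each domain with at least one primate substitution.

    domains: name -> (start, end), inclusive 1-based human coordinates.
    variable_sites: set of human positions that differ in any primate genus.
    """
    summary = {}
    for name, (start, end) in domains.items():
        length = end - start + 1
        hits = sum(1 for pos in range(start, end + 1) if pos in variable_sites)
        summary[name] = hits / length
    return summary

# Placeholder coordinates and sites for illustration only; substitute the
# domain annotation and variable sites from your own alignment.
domains = {"NBD1": (1, 10), "R": (11, 20), "TM helix": (21, 25)}
variable_sites = {3, 12, 14, 15, 18}

for name, frac in variable_fraction_by_domain(domains, variable_sites).items():
    print(f"{name}: {frac:.0%}")
```

A per-domain fraction is deliberately coarse; motif-level patterns would need finer windows or per-site scores.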
Finding 4 — Evolutionary evidence is highly concordant with clinical truth sets.
Comparing evolutionary data with curated clinical classifications:
- Pathogenic: Almost always absent in primates; Deep Time Ancestry predictions agree with clinical calls in 99% of cases.
- Likely pathogenic: Shows the same pattern as pathogenic variants.
- Benign: Recur as natural substitutions across primates; Deep Time Ancestry predictions agree with clinical calls in 99% of cases.
- VUS (Variants of Uncertain Significance): Mixed; 5% of CFTR VUS can be reclassified as likely benign in a single step, and the Deep Time Ancestry Score can confidently predict likely benign or likely pathogenic status for VUS.
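The comparison above amounts to tabulating, for each clinical class, how often variants are absent in primates. A minimal sketch, assuming each variant has already been reduced to a (clinical class, recurs in any primate genus) pair; the function name and records are toy placeholders, not CodeXome results.

```python
from collections import Counter

def absent_fraction_by_class(records):
    """For each clinical class, the fraction of variants absent in all primate genera."""
    totals, absent = Counter(), Counter()
    for cls, in_primates in records:
        totals[cls] += 1
        if not in_primates:
            absent[cls] += 1
    return {cls: absent[cls] / totals[cls] for cls in totals}

# Toy records: (clinical class, recurs in any primate genus).
records = [
    ("pathogenic", False), ("pathogenic", False), ("pathogenic", True),
    ("benign", True), ("benign", True),
    ("VUS", False), ("VUS", True),
]

for cls, frac in absent_fraction_by_class(records).items():
    print(f"{cls:<10} absent in primates: {frac:.0%}")
```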
This supports the idea that evolutionary recurrence is an independent biological truth set that can be used when clinical data are unavailable.
Interpretation
The CFTR case demonstrates that:
- Pathogenic variants are almost exclusively unique to humans.
- Benign variants frequently appear in other primate lineages.
- Protein domains show distinctive evolutionary patterns related to their function.
- Deep-time evidence aligns exceptionally well with curated human data.
In short, evolution provides a functional reference for CFTR that is consistent, biologically meaningful, and highly predictive of pathogenicity. CFTR serves as a foundational validation of CodeXome’s approach.
Impact for Researchers
Using CFTR as an anchor point, researchers can trust that:
- Deep-time recurrence reliably distinguishes tolerated substitutions from those that are not tolerated.
- CodeXome identifies constrained and variable regions that match clinical expectations.
- Absence across primates can serve as strong evolutionary evidence of functional constraint, backed by the Deep Time Ancestry predictive score.
- The Deep Time Ancestry Score reflects true biological constraint, not prediction heuristics.
This case helps establish CodeXome as a meaningful source of functional evidence across diverse disease genes.
