CFTR

A Deep-Time Evolutionary Analysis of CFTR Pathogenicity Using CodeXome

CFTR is one of the most extensively characterized disease genes in human genetics, with over 2,000 reported variants and hundreds of well-validated pathogenic substitutions. CodeXome cross-referenced more than 260 curated pathogenic CFTR amino acid substitutions against 80 million years of primate evolutionary data and found only 2 shared with primates — greater than 99% concordance between evolutionary absence and curated pathogenicity. Naturally recurring substitutions tracked ClinVar benign classifications, and the Deep Time Ancestry Score matched clinical calls at 99% accuracy across pathogenic, likely pathogenic, and benign categories.

CFTR encodes a tightly regulated chloride channel and is the causative gene for cystic fibrosis, with over 2,000 reported variants.

The finding

Of 260+ curated pathogenic CFTR substitutions, only 2 are shared across 80 million years of primate evolution — greater than 99% concordance between evolutionary absence and curated pathogenicity.

  • 99%: concordance with curated pathogenicity
  • 2 of 260+: pathogenic variants shared with primates
  • 99%: DTAS accuracy across clinical categories

The study

The CFTR gene is one of the most extensively studied disease genes in human genetics, with over 2,000 reported variants and hundreds of well-characterized pathogenic substitutions. Its biological function — a tightly regulated chloride channel — is strictly controlled, and its protein domains exhibit established patterns of intolerance to change. This combination of deep clinical characterization and well-defined functional architecture makes CFTR an ideal gene for evaluating whether deep-time evolutionary recurrence aligns with established clinical pathogenicity classifications. Across primate evolution, CFTR serves as a clean, high-confidence benchmark for validating CodeXome's Deep Time Ancestry approach.

Objective

This study evaluates whether known pathogenic CFTR amino acid substitutions recur naturally across primates or remain entirely absent across the evolutionary record. If pathogenic variants were tolerated in other species, they would appear as natural substitutions in CodeXome's primate alignment. Their absence across 80 million years would indicate strong functional constraint. The analysis also evaluates whether naturally recurring primate substitutions align with known benign clinical classifications, and whether the Deep Time Ancestry Score (DTAS) can predict clinical category for variants outside the directly observed primate record.

Methods

Using CodeXome's Gene Profile module, the analysis drew on three data layers: more than 260 curated pathogenic CFTR amino acid substitutions compiled from clinical literature and variant curation efforts; recurrence and absence patterns across 55 primate genera spanning 80 million years of evolution; and site-level evolutionary constraint metrics ranging from highly conserved to highly variable, derived from the Deep Time Ancestry dataset.

Each known pathogenic CFTR substitution was mapped to its orthologous site in the primate alignment. Natural variation was verified using the Gene Profile view (amino acid and codon modes), the "Changes relative to human" filter, and lineage-specific recurrence checks. Naturally occurring primate substitutions were then compared to ClinVar benign and likely benign labels to test concordance in both directions — pathogenic absence and benign recurrence.
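The recurrence check described above can be sketched in a few lines of Python. This is a minimal illustration, not CodeXome's implementation: the alignment is toy data with made-up sequences and genus names, the `Substitution` type and `genera_sharing` function are hypothetical, and the sketch assumes an ungapped alignment already expressed in human residue coordinates.

```python
from typing import Dict, List, NamedTuple


class Substitution(NamedTuple):
    position: int  # 1-based residue position in the human protein
    ref: str       # human reference amino acid
    alt: str       # substituted amino acid


# Toy alignment (hypothetical sequences, not real CFTR data).
# Every row has the same length as the human row and contains no gaps.
TOY_ALIGNMENT: Dict[str, str] = {
    "human":   "ACDEFGHIKL",
    "genus_a": "ACDEFGHIKL",
    "genus_b": "ACEEFGHIKM",
    "genus_c": "ACDEFAHIKM",
}


def genera_sharing(
    sub: Substitution,
    alignment: Dict[str, str],
    human_key: str = "human",
) -> List[str]:
    """Return the non-human rows that carry `sub.alt` at the orthologous site."""
    human_seq = alignment[human_key]
    idx = sub.position - 1
    # Guard against a substitution that does not match the human reference.
    if human_seq[idx] != sub.ref:
        raise ValueError(
            f"reference mismatch at {sub.position}: "
            f"expected {sub.ref}, found {human_seq[idx]}"
        )
    return sorted(
        name
        for name, seq in alignment.items()
        if name != human_key and seq[idx] == sub.alt
    )


shared = genera_sharing(Substitution(10, "L", "M"), TOY_ALIGNMENT)
absent = genera_sharing(Substitution(5, "F", "S"), TOY_ALIGNMENT)
print(shared)  # ['genus_b', 'genus_c']
print(absent)  # []
```

An empty result corresponds to "absent in primates" in this toy setting; a non-empty result corresponds to a substitution shared with at least one primate lineage. A real alignment would also need gap handling and coordinate mapping, which this sketch leaves out.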

Key Findings

Finding 1 — Pathogenic CFTR variants are virtually absent in primates. Of more than 260 curated pathogenic amino acid changes, only 2 substitutions are shared anywhere across 80 million years of primate evolution. This corresponds to greater than 99% concordance between evolutionary absence and curated pathogenicity, supporting the principle that pathogenic CFTR variants are not tolerated in natural variation.
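The arithmetic behind the concordance figure is straightforward. The sketch below uses 260 as a lower bound for the number of curated pathogenic substitutions, since the study reports "more than 260"; the function name `absence_concordance` is illustrative only.

```python
def absence_concordance(n_pathogenic: int, n_shared: int) -> float:
    """Fraction of pathogenic substitutions that are absent from primates."""
    if n_pathogenic <= 0 or not 0 <= n_shared <= n_pathogenic:
        raise ValueError("counts must satisfy 0 <= n_shared <= n_pathogenic, n_pathogenic > 0")
    return 1 - n_shared / n_pathogenic


# 260 is a lower bound on the denominator; a larger set with the same
# 2 shared substitutions would give a slightly higher concordance.
concordance = absence_concordance(n_pathogenic=260, n_shared=2)
print(f"{concordance:.1%}")  # 99.2%
```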

Finding 2 — Naturally recurring substitutions match benign classifications. CFTR displays dozens of tolerated amino acid substitutions across primates. At these sites, recurring primate substitutions co-occur with ClinVar benign and likely benign variants. For benign and likely benign variants that do not directly co-occur in primates, the Deep Time Ancestry Score matches their clinical classification at 99% accuracy.

Finding 3 — Evolutionary patterns are distinctive across functional domains. Compared against the full 19,244-gene CodeXome dataset, CFTR's overall evolutionary rate falls within the average range. Within the gene, however, patterns of natural variation are distinctive and correlated with specific functional architecture: the nucleotide-binding domains (NBD1 and NBD2), the regulatory domain, and key transmembrane helices each show characteristic constraint signatures.

Finding 4 — Evolutionary evidence is highly concordant with clinical truth sets across all categories. Pathogenic variants are almost always absent in primates, with DTAS predictions matching clinical calls at 99%. Likely pathogenic variants display an identical pattern. Benign variants recur across primates, with DTAS predictions matching clinical calls at 99%. VUS show a mixed pattern — 5% of CFTR VUS can be identified as likely benign in a single step through primate recurrence, and DTAS provides confident likely-benign or likely-pathogenic predictions for additional VUS in the unresolved pool.
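A per-category agreement rate of this kind can be computed as sketched below. The label pairs are toy data invented for illustration, not CFTR results, and the function name `agreement_by_category` is hypothetical. How DTAS maps scores to predicted labels is not described here, so the sketch simply assumes each variant already has a predicted label.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def agreement_by_category(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each clinical label to the fraction of variants whose
    predicted label matches it.

    Each pair is (clinical_label, predicted_label).
    """
    totals: Dict[str, int] = defaultdict(int)
    matches: Dict[str, int] = defaultdict(int)
    for clinical, predicted in pairs:
        totals[clinical] += 1
        if clinical == predicted:
            matches[clinical] += 1
    return {label: matches[label] / totals[label] for label in totals}


# Toy label pairs (hypothetical, for illustration only).
toy_pairs = [
    ("pathogenic", "pathogenic"),
    ("pathogenic", "pathogenic"),
    ("pathogenic", "benign"),
    ("pathogenic", "pathogenic"),
    ("benign", "benign"),
    ("benign", "benign"),
]

rates = agreement_by_category(toy_pairs)
print(rates)  # {'pathogenic': 0.75, 'benign': 1.0}
```

Reporting the rate separately for each clinical category, rather than as one pooled number, keeps a large category from masking weaker agreement in a small one.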

Interpretation

The CFTR case demonstrates four properties of evolutionary evidence on a well-characterized disease gene: pathogenic variants are almost entirely human-unique (only 2 of more than 260 are shared with primates); benign variants frequently appear in other primate lineages; protein domains have distinctive evolutionary patterns that track functional architecture; and deep-time evidence aligns with curated clinical data at near-complete concordance.

The 2-of-260 result is particularly informative. Pathogenic CFTR variants have been identified in patients across decades of clinical genetics work, yet across 55 primate genera and 80 million years of evolution, only 2 of those substitutions have ever been observed as natural variation. This is the empirical signature of strong purifying selection: deleterious changes do not persist in wild populations, regardless of how often they arise. The inverse pattern — benign recurrence — is equally clean. Tolerated substitutions appear in primate lineages and align with the variants human clinical genetics has independently classified as benign.

The 99% DTAS accuracy across pathogenic, likely pathogenic, and benign categories indicates that the predictive score generalizes the directly observed primate signal to variants without direct evolutionary recurrence. This matters most for the VUS pool, where 5% of CFTR VUS are clearable through primate recurrence alone and additional VUS receive confident DTAS predictions in either direction.
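The two-step VUS triage described above (primate recurrence first, then the predictive score) can be sketched as a simple decision function. This is an assumption-laden illustration: `triage_vus` and its return strings are hypothetical, and the score call is passed in as an already-confident label because the score's scale and thresholds are not specified here.

```python
from typing import Optional


def triage_vus(shared_in_primates: bool, score_call: Optional[str] = None) -> str:
    """Triage a variant of uncertain significance (VUS).

    shared_in_primates: True if the substitution recurs in the primate record.
    score_call: a confident score-based label ("likely benign" or
        "likely pathogenic"), or None when no confident call is available.
    """
    # Step 1: direct primate recurrence resolves the variant on its own.
    if shared_in_primates:
        return "likely benign (primate recurrence)"
    # Step 2: otherwise fall back to a confident score prediction, if any.
    if score_call in ("likely benign", "likely pathogenic"):
        return f"{score_call} (score prediction)"
    # Step 3: nothing confident to say; the variant stays unresolved.
    return "unresolved"


print(triage_vus(True))                        # likely benign (primate recurrence)
print(triage_vus(False, "likely pathogenic"))  # likely pathogenic (score prediction)
print(triage_vus(False))                       # unresolved
```

Checking recurrence before the score mirrors the ordering in the text: direct observation in primates is treated as the stronger signal, and the score is used only where no direct observation exists.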

In short, evolution provides a functional reference for CFTR that is consistent, biologically meaningful, and highly predictive of pathogenicity. CFTR serves as a foundational validation of the CodeXome approach.

Impact for Researchers

Using CFTR as an anchor point, researchers can rely on several properties of the CodeXome evidence layer:

  • Deep-time recurrence reliably distinguishes tolerated from non-tolerated substitutions in well-characterized disease genes.
  • CodeXome identifies constrained and variable regions that match clinical expectations and track known functional architecture.
  • Absence across primates is strong evolutionary evidence that a substitution is not functionally tolerated, backed by the Deep Time Ancestry predictive score.
  • DTAS reflects true biological constraint derived from primate evolution, not prediction heuristics trained on human-only data.

This case helps establish CodeXome as a meaningful source of functional evidence across diverse disease genes, with CFTR providing the anchor point against which performance on less-characterized genes can be calibrated.

The evidence at a glance.

[Figure: CFTR results]

CFTR variant classification compared against primate evolutionary recurrence. Pathogenic and likely pathogenic variants cluster almost entirely in the "absent in primates" column; benign variants cluster in the "shared with primates" column. For benign variants not directly observed in primates, the Deep Time Ancestry Score matches the clinical classification at 99% accuracy, and it provides likely-benign or likely-pathogenic predictions for additional VUS.

Run this kind of analysis on your own gene of interest.

The CodeXome platform lets you browse residue-level evolutionary evidence across 55 primate genera, live in your browser. No signup is required for the Gene Previewer.