At a glance

What CodeXome adds that other tools don't have.

Most variant interpretation stacks draw from two evidence types: human population frequency (gnomAD, LOEUF) and machine-learning predictions (REVEL, CADD, AlphaMissense). Both are useful — and increasingly correlated with each other. CodeXome adds a third, orthogonal evidence source: empirical evolutionary recurrence across 55 primate genera.

01 · Scope

19,244

Human genes, fully profiled

Every coding gene has residue-level constraint data across 55 primate genera, mapped to GRCh38 human coordinates.

02 · Resolution

Residue

Level, not gene-level

Not a LOEUF-style gene intolerance metric. CodeXome tells you which specific substitution at a specific position has been tolerated.

03 · Evolutionary window

80M yrs

Of primate selection

The intermediate window between human population data (tens to hundreds of thousands of years) and deep vertebrate conservation (hundreds of millions of years).

04 · Principle

Recurrence

Not prediction

Direct empirical observation, not statistical inference. A variant shared with primate species has already been tested by millions of generations of selection.

05 · Population bias

Zero

Population-agnostic

Equally informative regardless of patient ancestry. Particularly valuable where gnomAD coverage is sparse.

06 · Black box

None

Fully transparent evidence

Every classification traces back to visible alignment data. The Gene Profile Module shows alignments, annotations, and overlays side by side.


The core mechanic

Evolution clears the easy ones. Machine learning handles what's left.

A typical patient exome yields ~20,000 coding variants. Standard filtering trims that to a few hundred candidates — the vast majority still benign, hiding a small number that actually matter. CodeXome's two-stage model collapses that review burden by separating variants biology has already tested from the narrow pool of truly human-specific residuals.

Patient VCF: ~20,000 coding variants, trimmed by standard filtering to ~300 candidates (75–150 hours of review)

→ Stage 01 · Primate Recurrence Filter: variants shared with primates are cleared as benign (tested by 80M yrs of selection)

→ Stage 02 · DTAS Scoring: human-specific residuals are ML-ranked by evolutionary significance (trained on 80M yrs of alignments)

→ TSV: ~75 ranked candidates (50–75% reduction) → functional follow-up

Weeks of manual review → minutes of computational analysis.

Stage 01

Primate Recurrence Filter

The direct empirical evidence layer — no ML, no statistics, just recurrence.

Every incoming variant is cross-referenced against the proprietary primate exome database. If the same amino-acid substitution appears in one or more of the 55 non-human primate genera, it has been biologically tested: the proteins of closely related species have tolerated that exact change across millions of generations.

Input: amino-acid substitution at specific residue position (GRCh38-mapped)
Check: does the same substitution appear in primate alignments?
Output: recurrence count, lineage coverage, shared-with-primates flag
Effect: shared variants cleared; human-specific variants passed to Stage 02

Empirical, not predictive. Every variant is checked directly against the primate record — no statistical inference, no ML model, no training data required.
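Because the check is an exact match rather than a prediction, it can be pictured as a dictionary lookup. The Python below is a minimal sketch under stated assumptions: a hypothetical in-memory index keyed by (gene, residue position, alternate amino acid) in GRCh38-mapped coordinates, mapping to the set of non-human primate genera that carry that substitution. The real database schema, lineage-coverage and QC logic, and query interface are not described here, and all names in the code are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for the primate exome database:
# (gene, residue position, alternate amino acid) -> genera carrying that exact substitution.
PrimateIndex = Dict[Tuple[str, int, str], Set[str]]


@dataclass
class RecurrenceResult:
    gene: str
    position: int
    ref_aa: str
    alt_aa: str
    recurrence_count: int       # number of primate genera with the same substitution
    shared_with_primates: bool  # True if recurrence_count >= 1


def recurrence_filter(
    variants: Iterable[Tuple[str, int, str, str]], index: PrimateIndex
) -> Tuple[List[RecurrenceResult], List[RecurrenceResult]]:
    """Split (gene, position, ref_aa, alt_aa) variants into cleared and residual lists."""
    cleared: List[RecurrenceResult] = []
    residuals: List[RecurrenceResult] = []
    for gene, position, ref_aa, alt_aa in variants:
        # Exact-substitution match: same residue, same alternate amino acid.
        genera = index.get((gene, position, alt_aa), set())
        result = RecurrenceResult(gene, position, ref_aa, alt_aa, len(genera), bool(genera))
        (cleared if result.shared_with_primates else residuals).append(result)
    return cleared, residuals
```

For example, with `index = {("GENE_A", 10, "Q"): {"Pan", "Macaca"}}`, the variant `("GENE_A", 10, "R", "Q")` is cleared with a recurrence count of 2, while `("GENE_A", 42, "G", "D")` passes to Stage 02 as a human-specific residual. `GENE_A` is a placeholder, not a real gene.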

Stage 02

Deep Time Ancestry Score (DTAS)

A supervised ML layer for the human-specific residuals.

After the recurrence filter clears shared-benign variants, what remains is a mixed pool — human-specific variants with no primate record. DTAS ranks them using a supervised model trained on the proprietary primate dataset.

Model: supervised learning on residue-level constraint, domain context, primate substitution patterns
Output: per-variant functional-impact score, prioritization rank
Integration: stacks alongside REVEL, CADD, AlphaMissense as orthogonal evidence
Training: 80M years of primate alignments
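Once the residuals are scored, prioritization is a sort. DTAS itself is a proprietary model, so this Python sketch takes precomputed scores as input and only shows how a per-variant score becomes a prioritization rank. It assumes, hypothetically, that higher scores mean greater predicted functional impact. The variant IDs are arbitrary strings.

```python
from typing import Dict, List, Sequence, Tuple


def rank_residuals(
    residuals: Sequence[str], scores: Dict[str, float]
) -> List[Tuple[int, str, float]]:
    """Return (rank, variant_id, score) tuples, rank 1 = highest score.

    `scores` stands in for DTAS output (hypothetical: higher = more likely damaging).
    """
    scored = [(variant_id, scores[variant_id]) for variant_id in residuals]
    # Descending sort; Python's sort stays stable with reverse=True, so ties keep input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(rank, vid, score) for rank, (vid, score) in enumerate(scored, start=1)]
```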


Platform Modules

Four tools, one evidence layer.

Every module reads from the same proprietary primate exome database. What differs is how researchers interrogate it — browsing genes, analyzing variant sets, or scoring novel positions.

01 · The Database

Deep Time Ancestry Database

The proprietary foundation. Multi-species alignment across 55 primate genera, mapped residue-by-residue to human GRCh38. Every other module reads from this.

55 genera · Residue-level · GRCh38

02 · Browse

Gene Profile Module

Search any of 19,244 genes and see constraint, domains, motifs, intrinsically disordered regions (IDRs), and human-unique mutations in one view, with gnomAD, ClinVar, and UniProt overlays.

19,244 genes · 3 overlays

The shared evidence layer

~2B

Evolutionary datapoints, one database

80 million years of primate selection, consolidated into a single proprietary resource. Every module queries it.

03 · Analyze

Variant Analysis

Upload a VCF or paste variants. Two-stage analysis runs in minutes. Every result exports as TSV with recurrence, human_unique, DTAS, and QC fields.

VCF in · TSV out · Cohort-scale

04 · Score

Deep Time Ancestry Score

AI scoring for human-specific residuals, trained on 80M years of primate alignments. Orthogonal to REVEL, CADD, and AlphaMissense. 99.3% accuracy on a cystic fibrosis (CF) benchmark.

99.3% CF · Supervised ML

The Database

Deep Time Ancestry Database

The foundation layer — a unified multi-species alignment spanning 55 primate genera, mapped residue by residue to human GRCh38 coordinates.

Per-residue recurrence · Absence signals · Lineage coverage (QC-gated) · Domain aggregations

Coordinate system

GRCh38

Resolution

Amino-acid residue

Taxonomic breadth

55 primate genera

Evolutionary window

~80M years
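Domain aggregations can be pictured as summaries of per-residue recurrence over domain intervals. This is a minimal Python sketch under a hypothetical metric: the fraction of residues in a domain where any primate substitution is recorded, so lower values suggest stronger constraint. It assumes 1-based inclusive residue coordinates. It also assumes that residues missing from the input are invariant, which a real pipeline would need to distinguish from missing alignment coverage. The database's actual aggregation method is not specified here.

```python
from typing import Dict, Tuple


def aggregate_by_domain(
    residue_recurrence: Dict[int, int], domains: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Fraction of residues per domain with at least one recorded primate substitution.

    residue_recurrence: residue position -> number of genera with any substitution there;
    positions missing from the dict are treated as invariant (an absence signal).
    domains: domain name -> (start, end), 1-based and inclusive.
    """
    summary: Dict[str, float] = {}
    for name, (start, end) in domains.items():
        positions = range(start, end + 1)
        variable = sum(1 for pos in positions if residue_recurrence.get(pos, 0) > 0)
        summary[name] = variable / len(positions)
    return summary
```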

alignment · BRCA1 · residue 1755

gene profile · SCN5A

Gene Profile Module

Every gene, mapped across evolution.

The interactive research interface. Search any of 19,244 human genes and see the full evolutionary constraint profile — all in one unified view.

Domain boundaries · Motifs & active sites · Human-unique residues · gnomAD, ClinVar, UniProt overlays

Gene access

All 19,244 coding genes

Lookup

Symbol · Ensembl · RefSeq

Overlays

gnomAD · ClinVar · UniProt

Export

TSV · selection or full gene

Variant Analysis

Run the evolutionary filter on your own data.

Upload a VCF — or paste individual variants — and run the full two-stage analysis. Every result is backed by the same alignment evidence you can inspect in the Gene Profile Module.

Inputs

.vcf · .vcf.gz · cohort-scale
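Before uploading, it can help to check what a VCF actually contains. This is a minimal standard-library Python sketch that reads a plain or gzip-compressed VCF and yields one site per ALT allele. It does not normalize or left-align indels or map transcripts, which the platform is described as handling itself. A `.vcf.gz` is usually BGZF-compressed, which is gzip-compatible, so `gzip.open` can read it.

```python
import gzip
from typing import Iterator, Tuple


def read_vcf_sites(path: str) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, ref, alt) for each ALT allele in a .vcf or .vcf.gz file."""
    # BGZF (.vcf.gz) is a series of gzip members, which gzip.open reads transparently.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip "##" meta-information lines and the "#CHROM" header
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            for allele in alt.split(","):  # split multi-allelic records
                yield chrom, int(pos), ref, allele
```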

variant analysis · cohort upload

dtas · score distribution

Deep Time Ancestry Score

AI evolutionary scoring, trained on 80M years.

For variants Stage 01 couldn't clear, DTAS provides a per-variant ML score derived from the same proprietary primate dataset. It is the only ML score whose training data is an empirical evolutionary record.

Orthogonal to REVEL, CADD, AlphaMissense · Residue-aware, not just gene-aware · Periodically retrained · Stacks alongside your predictors

Model type

Supervised learning

Training data

Proprietary primate alignments

CF benchmark

99.3% accuracy · n=260

Retraining

Periodic · validated diagnostics

Your workflow

Research Platform. No install. TSV out.

CodeXome is designed to fit into the workflow you already run — not replace it. Every result comes out as a TSV that joins your existing analysis in R, Python, Excel, or your lab's annotation workflow.

Step 01 · Upload

VCF or gene list

GRCh38-aligned VCFs, gene symbols, Ensembl IDs, or cohort-scale uploads. No preprocessing required — the platform handles normalization and canonical transcript mapping.

Accepts: .vcf .vcf.gz gene_symbol ENSG*
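The accepted input types can be told apart by their form. This is a minimal Python sketch of how a client-side script might sort a mixed list of inputs before upload. The routing labels are hypothetical, and the platform's own parser is not documented here. Human Ensembl gene IDs take the form `ENSG` followed by 11 digits, optionally with a `.version` suffix.

```python
import re

# Human Ensembl gene IDs: "ENSG" + 11 digits, with an optional ".version" suffix.
ENSEMBL_GENE_ID = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def classify_input(token: str) -> str:
    """Guess the input type of one token: 'vcf', 'ensembl_gene', or 'gene_symbol'."""
    token = token.strip()
    if not token:
        raise ValueError("empty input")
    if token.lower().endswith((".vcf", ".vcf.gz")):
        return "vcf"
    if ENSEMBL_GENE_ID.match(token):
        return "ensembl_gene"
    # Anything else is treated as a gene symbol (not validated against HGNC here).
    return "gene_symbol"
```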

Step 02 · Annotate

Primate recurrence + DTAS

Every variant annotated with cross-primate recurrence, residue-level constraint, and Deep Time Ancestry Score. Browse results in the interface or push straight to export.

Adds: primate_recurrence human_unique dtas_score qc_coverage

Step 03 · Export

TSV, VCF, or Gene Profile

Export annotated variants as TSV or VCF, or pull full Gene Profile reports for structured review. Drops into R, Python, Excel — same pattern as dbNSFP or any other annotation source.

Outputs: .tsv .vcf
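Joining the export to an existing table is an ordinary keyed join. This is a minimal Python sketch that uses only the standard `csv` module. It assumes the export carries the four annotation columns listed for the annotation step, plus hypothetical `chrom`, `pos`, `ref`, `alt` key columns. Check the header of a real export before relying on either assumption. In pandas, the equivalent is `merge` with `how="left"` on the same keys.

```python
import csv
from typing import Dict, List, TextIO

# Hypothetical join key; confirm against the header of an actual export.
KEY_COLUMNS = ("chrom", "pos", "ref", "alt")
# Column names as listed for the annotation step.
ADDED_COLUMNS = ("primate_recurrence", "human_unique", "dtas_score", "qc_coverage")


def join_codexome(existing: TextIO, codexome: TextIO) -> List[Dict[str, str]]:
    """Left-join CodeXome columns onto rows of an existing TSV annotation table."""
    lookup = {
        tuple(row[c] for c in KEY_COLUMNS): row
        for row in csv.DictReader(codexome, delimiter="\t")
    }
    merged = []
    for row in csv.DictReader(existing, delimiter="\t"):
        hit = lookup.get(tuple(row[c] for c in KEY_COLUMNS), {})
        # Variants absent from the export get empty strings for the added columns.
        merged.append({**row, **{c: hit.get(c, "") for c in ADDED_COLUMNS}})
    return merged
```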

Validation

Measured against expert classifications and clinical ground truth.

CodeXome's evidence claims are concrete and reproducible. Each validation study was designed to test a specific assertion — not a vague demonstration of performance.

100%

ClinGen concordance

Across 52 genes and 2,689 expert-classified SNVs, no variant classified as pathogenic appeared in the primate database.

20%

Avg VUS reclassified

Across ~14,000 ClinVar genes, evolutionary evidence resolved an average of 20% of missense VUS as likely benign (range 10–40%).

99.3%

DTAS on CF benchmark

Predictive accuracy on 260 cystic fibrosis mutations with established clinical data on therapeutic response.

10,699

BRCA1/2 variants validated

ENIGMA/BRCA Exchange validation showed the same pattern: pathogenic variants are human-unique, while benign variants are shared with primates.
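The headline validation figures are simple proportions. This is a minimal Python sketch that assumes each variant is represented as a hypothetical (expert_label, shared_with_primates) pair. It does not reproduce the studies' actual protocols, and the toy inputs in any usage are illustrative, not study data.

```python
from typing import Iterable, Tuple


def pathogenic_concordance(variants: Iterable[Tuple[str, bool]]) -> float:
    """Fraction of pathogenic-labelled variants that are human-unique (not shared)."""
    shared_flags = [shared for label, shared in variants if label == "pathogenic"]
    if not shared_flags:
        raise ValueError("no pathogenic variants to evaluate")
    return sum(1 for shared in shared_flags if not shared) / len(shared_flags)


def vus_cleared_fraction(variants: Iterable[Tuple[str, bool]]) -> float:
    """Fraction of VUS-labelled variants shared with primates (candidates for likely benign)."""
    shared_flags = [shared for label, shared in variants if label == "VUS"]
    if not shared_flags:
        raise ValueError("no VUS to evaluate")
    return sum(shared_flags) / len(shared_flags)
```

Under this framing, 100% concordance means `pathogenic_concordance` returns 1.0: none of the pathogenic variants is shared with primates.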

Read the full validation studies →

Residue-level evolutionary constraint for every coding gene in the human genome.

See your gene through the lens of evolution.