
Residue-level evolutionary constraint for every coding gene in the human genome.

Input → Analysis → Output

  • IN: Upload VCF or gene list
  • S1: Stage 1 · Primate recurrence filter. Clears shared-benign variants in one step.
  • S2: Stage 2 · DTAS ML prioritization. Scores human-specific residual pool.
  • OUT: Annotated TSV export

CodeXome

A browser-based research workbench where scientists explore gene constraint profiles, run evolutionary variant annotation on their own data, and export TSV results to bring into their existing analysis workflows.

At a glance

What CodeXome adds that other tools don't have.

Most variant interpretation stacks draw from two evidence types: human population frequency (gnomAD, LOEUF) and machine-learning predictions (REVEL, CADD, AlphaMissense). Both are useful — and increasingly correlated with each other. CodeXome adds a third, orthogonal evidence source: empirical evolutionary recurrence across 55 primate genera.

01 · Scope
19,244
Human genes, fully profiled

Every coding gene has residue-level constraint data across 55 primate genera, mapped to GRCh38 human coordinates.

02 · Resolution
Residue
Level, not gene-level

Not a LOEUF-style gene intolerance metric. CodeXome tells you which specific substitution at a specific position has been tolerated.

03 · Evolutionary window
80M yrs
Of primate selection

The intermediate window between human population data (millennia) and deep vertebrate conservation (hundreds of millions of years).

04 · Principle
Recurrence
Not prediction

Direct empirical observation. A variant shared with other primate species has been tested by millions of generations of selection; that is an observed outcome, not a statistical inference.

05 · Population bias
Zero
Population-agnostic

Equally informative regardless of patient ancestry. Particularly valuable where gnomAD coverage is sparse.

06 · Black box
None
Fully transparent evidence

Every classification traces back to visible alignment data. The Gene Profile Module shows alignments, annotations, and overlays side by side.

The core mechanic

Evolution clears the easy ones. Machine learning handles what's left.

A typical patient exome yields ~20,000 coding variants. Standard filtering trims that to a few hundred candidates — the vast majority still benign, hiding a small number that actually matter. CodeXome's two-stage model collapses that review burden by separating variants biology has already tested from the narrow pool of truly human-specific residuals.

Stage 01

Primate Recurrence Filter

The direct empirical evidence layer — no ML, no statistics, just recurrence.

Every incoming variant is cross-referenced against the proprietary primate exome database. If the same amino-acid substitution appears in one or more of 55 non-human primate genera, it has been biologically tested — proteins of closely related species have tolerated that exact change across millions of generations.

  • Input: amino-acid substitution at specific residue position (GRCh38-mapped)
  • Check: does the same substitution appear in primate alignments?
  • Output: recurrence count, lineage coverage, shared-with-primates flag
  • Effect: shared variants cleared; human-specific variants passed to Stage 02
Empirical, not predictive. Every variant is checked directly against the primate record — no statistical inference, no ML model, no training data required.
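
As a rough illustration of the Stage 01 check described above, the sketch below splits variants by exact-substitution lookup. The `PRIMATE_RECURRENCE` table, its toy entries, the gene names, and the `Variant` type are hypothetical stand-ins for illustration; they are not CodeXome's database or API.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    gene: str
    position: int  # residue position in the human protein
    ref_aa: str
    alt_aa: str

# Hypothetical toy lookup: substitution -> primate genera carrying it.
PRIMATE_RECURRENCE = {
    Variant("GENE_A", 120, "A", "T"): {"Pan", "Gorilla"},
    Variant("GENE_B", 45, "R", "K"): {"Macaca"},
}

def stage1_filter(variants):
    """Split variants into (cleared, residual) by exact-substitution recurrence."""
    cleared, residual = [], []
    for v in variants:
        genera = PRIMATE_RECURRENCE.get(v, set())
        record = {
            "variant": v,
            "recurrence_count": len(genera),
            "shared_with_primates": bool(genera),
        }
        # Shared substitutions are cleared; the rest go on to Stage 02.
        (cleared if genera else residual).append(record)
    return cleared, residual

calls = [
    Variant("GENE_A", 120, "A", "T"),  # same substitution seen in two genera
    Variant("GENE_A", 120, "A", "V"),  # same position, different substitution
    Variant("GENE_C", 7, "G", "D"),    # no primate record at all
]
cleared, residual = stage1_filter(calls)
print(len(cleared), "cleared;", len(residual), "passed to Stage 02")
```

Note that the second call is not cleared even though its position matches a table entry: the lookup key is the full substitution, not just the residue position.
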
Stage 02

Deep Time Ancestry Score (DTAS)

A supervised ML layer for the human-specific residuals.

After the recurrence filter clears shared-benign variants, what remains is a mixed pool — human-specific variants with no primate record. DTAS ranks them using a supervised model trained on the proprietary primate dataset.

  • Model: supervised learning on residue-level constraint, domain context, primate substitution patterns
  • Training: primate alignments spanning ~80M years of evolution
  • Output: per-variant functional-impact score, prioritization rank
  • Integration: stacks alongside REVEL, CADD, AlphaMissense as orthogonal evidence
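
One way the prioritization rank could sit next to other predictors is sketched below. The rows, the score values, the `revel` column, and the assumption that a higher `dtas_score` means higher predicted impact are all illustrative; this page does not specify the score's scale or direction.

```python
# Hypothetical Stage 02 rows for human-specific residuals.
# Values are made up; "higher = more impact" is an assumption.
residuals = [
    {"variant": "GENE_A:p.A120V", "dtas_score": 0.42, "revel": 0.61},
    {"variant": "GENE_C:p.G7D", "dtas_score": 0.91, "revel": 0.55},
    {"variant": "GENE_D:p.L300P", "dtas_score": 0.67, "revel": 0.80},
]

# Rank by DTAS while leaving other predictor columns untouched,
# so each score stays visible as a separate line of evidence.
ranked = sorted(residuals, key=lambda r: r["dtas_score"], reverse=True)
for rank, row in enumerate(ranked, start=1):
    row["dtas_rank"] = rank

for row in ranked:
    print(row["dtas_rank"], row["variant"], row["dtas_score"], row["revel"])
```

Keeping the scores in separate columns, rather than merging them into one number, matches the idea of stacking independent evidence sources.
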
Platform Modules

Four tools, one evidence layer.

Every module reads from the same proprietary primate exome database. What differs is how researchers interrogate it — browsing genes, analyzing variant sets, or scoring novel positions.

01 · The Database

Deep Time Ancestry Database

The proprietary foundation. Multi-species alignment across 55 primate genera, mapped residue-by-residue to human GRCh38. Every other module reads from this.

55 genera | Residue-level | GRCh38
02 · Browse

Gene Profile Module

Search any of 19,244 genes and see constraint, domains, motifs, intrinsically disordered regions (IDRs), and human-unique mutations in one view, with gnomAD, ClinVar, and UniProt overlays.

19,244 genes | 3 overlays
The shared evidence layer
~2B

Evolutionary datapoints, one database

80 million years of primate selection, consolidated into a single proprietary resource. Every module queries it.

03 · Analyze

Variant Analysis

Upload a VCF or paste variants. Two-stage analysis runs in minutes. Every result exports as TSV with recurrence, human_unique, DTAS, and QC fields.

VCF in | TSV out | Cohort-scale
04 · Score

Deep Time Ancestry Score

AI scoring for human-specific residuals, trained on primate alignments spanning ~80M years. Orthogonal to REVEL, CADD, and AlphaMissense. 99.3% accuracy on a cystic fibrosis (CF) benchmark of 260 mutations.

99.3% CF | Supervised ML
The Database

Deep Time Ancestry Database

The foundation layer — a unified multi-species alignment spanning 55 primate genera, mapped residue by residue to human GRCh38 coordinates.

Per-residue recurrence | Absence signals | Lineage coverage · QC-gated | Domain aggregations

  • Coordinate system: GRCh38
  • Resolution: Amino-acid residue
  • Taxonomic breadth: 55 primate genera
  • Evolutionary window: ~80M years

alignment · BRCA1 · residue 1755
gene profile · SCN5A
Gene Profile Module

Every gene, mapped across evolution.

The interactive research interface. Search any of 19,244 human genes and see the full evolutionary constraint profile — all in one unified view.

Domain boundaries | Motifs & active sites | Human-unique residues | gnomAD · ClinVar · UniProt

  • Gene access: All 19,244 coding
  • Lookup: Symbol · Ensembl · RefSeq
  • Overlays: gnomAD · ClinVar · UniProt
  • Export: TSV · selection or full gene

Variant Analysis

Run the evolutionary filter on your own data.

Upload a VCF — or paste individual variants — and run the full two-stage analysis. Every result is backed by the same alignment evidence you can inspect in the Gene Profile Module.

Inputs
.vcf · .vcf.gz | gene lists | HGVS · Ensembl · chr:pos | cohort-scale
Output fields
@@PROTECT0@@@@PROTECT1@@@@PROTECT2@@@@PROTECT3@@@@PROTECT4@@@@PROTECT5@@
variant analysis · cohort upload
dtas · score distribution
Deep Time Ancestry Score

AI evolutionary scoring, trained on 80M years.

For variants Stage 01 couldn't clear, DTAS provides a per-variant ML score derived from the same proprietary primate dataset. The only ML scoring whose training data is an empirical evolutionary record.

Orthogonal to REVEL · CADD · AlphaMissense | Residue-aware, not just gene-aware | Periodically retrained | Stacks alongside your predictors

  • Model type: Supervised learning
  • Training data: Proprietary primate alignments
  • CF benchmark: 99.3% accuracy · n=260
  • Retraining: Periodic · validated diagnostics

Your workflow

Browser-based. No install. TSV out.

CodeXome is designed to fit into the workflow you already run — not replace it. Every result comes out as a TSV that joins your existing analysis in R, Python, Excel, or your lab's annotation workflow.

Step 01 · Upload

VCF or gene list

GRCh38-aligned VCFs, gene symbols, Ensembl IDs, or cohort-scale uploads. No preprocessing required — the platform handles normalization and canonical transcript mapping.

Accepts: .vcf .vcf.gz gene_symbol ENSG*
Step 02 · Annotate

Primate recurrence + DTAS

Every variant annotated with cross-primate recurrence, residue-level constraint, and Deep Time Ancestry Score. Browse results in the interface or push straight to export.

Adds: primate_recurrence human_unique dtas_score qc_coverage
Step 03 · Export

TSV, VCF, or Gene Profile

Export annotated variants as TSV or VCF, or pull full Gene Profile reports for structured review. Drops into R, Python, Excel — same pattern as dbNSFP or any other annotation source.

Outputs: .tsv .vcf gene_profile.pdf
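
A minimal sketch of bringing an exported TSV into an existing Python workflow, using only the standard library. The four annotation columns are the ones named on this page; the key columns (`chrom`, `pos`, `ref`, `alt`), the `lab_status` field, all values, and the empty `dtas_score` for a shared variant are assumptions about a layout this page does not document.

```python
import csv
import io

# Stand-in for an exported file (hypothetical layout and values).
EXPORT_TSV = (
    "chrom\tpos\tref\talt\tprimate_recurrence\thuman_unique\tdtas_score\tqc_coverage\n"
    "chr1\t1000\tA\tG\t3\tfalse\t\tpass\n"
    "chr2\t2000\tC\tT\t0\ttrue\t0.87\tpass\n"
)

# Existing lab annotations keyed by the same variant tuple (hypothetical).
lab_notes = {
    ("chr1", "1000", "A", "G"): {"lab_status": "VUS"},
    ("chr2", "2000", "C", "T"): {"lab_status": "VUS"},
}

def join_export(tsv_text, notes):
    """Attach existing annotations to exported rows by (chrom, pos, ref, alt)."""
    joined = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        key = (row["chrom"], row["pos"], row["ref"], row["alt"])
        # Rows with no matching note are kept unchanged.
        joined.append({**row, **notes.get(key, {})})
    return joined

rows = join_export(EXPORT_TSV, lab_notes)
human_unique = [r for r in rows if r["human_unique"] == "true"]
print(len(rows), "rows joined;", len(human_unique), "human-unique")
```

For a real export, the `io.StringIO(...)` wrapper would be replaced by a file opened with `open(path, newline="")`, and the column names checked against the actual header.
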
Validation

Measured against expert classifications and clinical ground truth.

CodeXome's evidence claims are concrete and reproducible. Each validation study was designed to test a specific assertion — not a vague demonstration of performance.

100%

ClinGen concordance

Across 52 genes and 2,689 expert-classified SNVs, no variant classified as pathogenic was also observed in the primate database.

20%

Avg VUS reclassified

Across ~14,000 ClinVar genes, evolutionary evidence resolved an average of 20% of VUS missense variants as likely benign (range 10–40%).

99.3%

DTAS on CF benchmark

Predictive accuracy against 260 cystic fibrosis therapeutic mutations with established clinical response data.

10,699

BRCA1/2 variants validated

ENIGMA/BRCA Exchange validation produced the same result: pathogenic variants are human-unique, and benign variants are shared with other primates.

See your gene through the lens of evolution.

Deeper evidence. Faster interpretation. Stronger conclusions.