
A browser-based research workbench where scientists explore gene constraint profiles, run evolutionary variant annotation on their own data, and export TSV results to bring into their existing analysis workflows.
Most variant interpretation stacks draw from two evidence types: human population frequency (gnomAD, LOEUF) and machine-learning predictions (REVEL, CADD, AlphaMissense). Both are useful — and increasingly correlated with each other. CodeXome adds a third, orthogonal evidence source: empirical evolutionary recurrence across 55 primate genera.
Every coding gene has residue-level constraint data across 55 primate genera, mapped to GRCh38 human coordinates.
Not a LOEUF-style gene intolerance metric. CodeXome tells you which specific substitution at a specific position has been tolerated.
The intermediate window between human population data (millennia) and deep vertebrate conservation (hundreds of millions of years).
Direct empirical observation, not a statistical inference. A variant shared with primate species has been tested by millions of generations of selection.
Equally informative regardless of patient ancestry. Particularly valuable where gnomAD coverage is sparse.
Every classification traces back to visible alignment data. Gene Profile Module shows alignments, annotations, and overlays side-by-side.
A typical patient exome yields ~20,000 coding variants. Standard filtering trims that to a few hundred candidates — the vast majority still benign, hiding a small number that actually matter. CodeXome's two-stage model collapses that review burden by separating variants biology has already tested from the narrow pool of truly human-specific residuals.
Every incoming variant is cross-referenced against the proprietary primate exome database. If the same amino-acid substitution appears in one or more of 55 non-human primate genera, it has been biologically tested — proteins of closely related species have tolerated that exact change across millions of generations.
After the recurrence filter clears shared-benign variants, what remains is a mixed pool: human-specific variants with no primate record. The Deep Time Ancestry Score (DTAS) ranks them using a supervised model trained on the proprietary primate dataset.
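The two stages described above can be sketched in a few lines of Python. This is an illustration only, not the platform's implementation: the `VariantKey` shape, the `two_stage_triage` function, the `score` callable standing in for the scoring model, and the assumption that a higher score means higher review priority are all hypothetical, and the toy variants and scores are invented.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical variant key: (gene symbol, protein position, reference aa, alternate aa).
VariantKey = tuple[str, int, str, str]


@dataclass(frozen=True)
class Triage:
    cleared: list[VariantKey]               # Stage 01: same substitution seen in primates
    ranked: list[tuple[VariantKey, float]]  # Stage 02: human-specific, highest score first


def two_stage_triage(
    variants: Iterable[VariantKey],
    primate_substitutions: set[VariantKey],
    score: Callable[[VariantKey], float],
) -> Triage:
    """Split variants by primate recurrence, then rank the remainder by score."""
    cleared, residual = [], []
    for v in variants:
        # Stage 01: an exact match in the primate set clears the variant.
        (cleared if v in primate_substitutions else residual).append(v)
    # Stage 02: score only the human-specific residuals (assumes higher = review first).
    ranked = sorted(((v, score(v)) for v in residual), key=lambda p: p[1], reverse=True)
    return Triage(cleared=cleared, ranked=ranked)


# Toy data, invented for illustration.
primate_db = {("GENE_A", 10, "A", "V")}
toy_scores = {("GENE_A", 20, "F", "S"): 0.40, ("GENE_B", 5, "R", "W"): 0.90}
result = two_stage_triage(
    [("GENE_A", 10, "A", "V"), ("GENE_A", 20, "F", "S"), ("GENE_B", 5, "R", "W")],
    primate_db,
    lambda v: toy_scores[v],
)
```

Keeping the recurrence check as a plain set lookup means the scoring model only ever runs on variants the first stage could not clear.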
Every module reads from the same proprietary primate exome database. What differs is how researchers interrogate it — browsing genes, analyzing variant sets, or scoring novel positions.
The proprietary foundation. Multi-species alignment across 55 primate genera, mapped residue-by-residue to human GRCh38. Every other module reads from this.
Search any of 19,244 genes and see constraint, domains, motifs, IDRs, and human-unique mutations in one view — with gnomAD, ClinVar, UniProt overlays.
80 million years of primate selection, consolidated into a single proprietary resource. Every module queries it.
Upload a VCF or paste variants. Two-stage analysis runs in minutes. Every result exports as TSV with recurrence, human_unique, DTAS, and QC fields.
AI scoring for human-specific residuals, trained on 80M years of primate alignments. Orthogonal to REVEL, CADD, AlphaMissense. 99.3% predictive accuracy on the cystic fibrosis (CF) benchmark.
The foundation layer — a unified multi-species alignment spanning 55 primate genera, mapped residue by residue to human GRCh38 coordinates.


The interactive research interface. Search any of 19,244 human genes and see the full evolutionary constraint profile — all in one unified view.
Upload a VCF — or paste individual variants — and run the full two-stage analysis. Every result is backed by the same alignment evidence you can inspect in the Gene Profile Module.

For variants Stage 01 couldn't clear, DTAS provides a per-variant ML score derived from the same proprietary primate dataset. The only ML scoring whose training data is an empirical evolutionary record.
CodeXome is designed to fit into the workflow you already run — not replace it. Every result comes out as a TSV that joins your existing analysis in R, Python, Excel, or your lab's annotation workflow.
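As a rough sketch of what picking up an export in Python could look like, the example below reads a TSV with the standard library only. The column names `primate_recurrence`, `human_unique`, `dtas_score`, and `qc_coverage` are the output fields named on this page; the `variant_id` column, the value encodings (integer counts, true/false strings, 0 to 1 floats), and the helper `human_unique_by_score` are assumptions, and the rows are invented.

```python
import csv
import io

# Toy stand-in for an exported TSV; every value here is invented.
# variant_id is a hypothetical identifier column.
EXPORT_TSV = (
    "variant_id\tprimate_recurrence\thuman_unique\tdtas_score\tqc_coverage\n"
    "var1\t3\tfalse\t\t0.98\n"
    "var2\t0\ttrue\t0.91\t0.95\n"
    "var3\t0\ttrue\t0.42\t0.99\n"
)


def human_unique_by_score(handle) -> list[dict]:
    """Return human-unique rows, highest dtas_score first."""
    rows = csv.DictReader(handle, delimiter="\t")
    # Assumes human_unique is encoded as the string "true" or "false".
    keep = [r for r in rows if r["human_unique"].lower() == "true"]
    for r in keep:
        r["dtas_score"] = float(r["dtas_score"])
    return sorted(keep, key=lambda r: r["dtas_score"], reverse=True)


candidates = human_unique_by_score(io.StringIO(EXPORT_TSV))
for row in candidates:
    print(row["variant_id"], row["dtas_score"])
```

For a real file, replace the `io.StringIO` wrapper with `open(path, newline="")`, and adjust the parsing to whatever encodings the actual export uses.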
GRCh38-aligned VCFs, gene symbols, Ensembl IDs, or cohort-scale uploads. No preprocessing required — the platform handles normalization and canonical transcript mapping.
.vcf .vcf.gz gene_symbol ENSG*
Every variant annotated with cross-primate recurrence, residue-level constraint, and Deep Time Ancestry Score. Browse results in the interface or push straight to export.
primate_recurrence human_unique dtas_score qc_coverage
Export annotated variants as TSV or VCF, or pull full Gene Profile reports for structured review. Drops into R, Python, Excel: same pattern as dbNSFP or any other annotation source.
.tsv .vcf gene_profile.pdf
CodeXome's evidence claims are concrete and reproducible. Each validation study was designed to test a specific assertion, not a vague demonstration of performance.
Across 52 genes and 2,689 expert-classified SNVs, no variant classified as pathogenic appeared in the primate database.
Across ~14,000 ClinVar genes, evolutionary evidence resolved 20% of VUS missense variants as likely benign (range 10–40%).
Predictive accuracy against 260 cystic fibrosis therapeutic mutations with established clinical response data.
ENIGMA/BRCA Exchange validation produced the same result: pathogenic variants are human-unique, benign variants are shared.
Deeper evidence. Faster interpretation. Stronger conclusions.