AlephConforma

Current protein structure prediction approaches include AI methods and template-based modeling, each with distinct tradeoffs in accuracy, interpretability, provenance, and computational cost and time. AlephConforma integrates both. It starts from a protein's 3D structure or, if none is provided, from an AI-predicted structure generated from the input sequence with ESMFold (Lin et al., 2023), which serves as a prior; the examples on this page use ESMFold v1. It then upgrades each domain with the best-matching experimentally determined structure from the Protein Data Bank (PDB), selected through systematic structural analysis and statistically principled template ranking. Each domain may draw from a different PDB entry, so a single model can combine experimental evidence from across the database. Domain regions built from experimental templates trace back to specific PDB structures with full provenance; non-domain regions remain as the user-provided scaffold or, when no scaffold is provided, as the AI-generated prior.

For proteins with intra-domain conformational diversity - such as Class A GPCRs, kinases, and other multi-state families - AlephConforma identifies distinct conformational states within each domain and delivers a separate atomic-resolution model for each, capturing the structural diversity that underlies distinct functional states.

Each output includes the 3D structure with per-residue confidence scores, a provenance report identifying the experimental or AI-predicted origin of each region, and named conformational state labels where applicable - typically delivered in under a minute.

How to Read the Case Studies

TM-score: a measure of structural similarity between a predicted model and the experimentally determined structure, on a scale from 0 to 1 where 1 indicates an identical match.

RMSD: root-mean-square deviation in angstroms between corresponding atoms.
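The two metrics above can be illustrated with a minimal sketch. It assumes the model and the native structure are already superposed and residue-aligned, so it evaluates both quantities for one fixed superposition only. The function names are hypothetical, and the published TM-score additionally maximizes over superpositions (Zhang and Skolnick, 2004), which this sketch does not do.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in angstroms between paired (x, y, z) points.

    Assumes the two structures are already superposed and that
    coords_a[i] corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_score_fixed(model, native):
    """TM-score sum for the superposition as given (no optimization).

    The published TM-score is the maximum of this sum over all
    superpositions, so this value is a lower bound on it.
    """
    if len(model) != len(native) or not native:
        raise ValueError("need two non-empty coordinate lists of equal length")
    n = len(native)
    # Length-dependent distance scale d0 from the TM-score definition.
    # The 0.5 angstrom floor for short chains is a simplification of this sketch.
    d0 = max(1.24 * (n - 15) ** (1 / 3) - 1.8, 0.5) if n > 15 else 0.5
    terms = (1 / (1 + (math.dist(p, q) / d0) ** 2) for p, q in zip(model, native))
    return sum(terms) / n


# Toy example: a 100-residue straight line, and a copy shifted by 1 angstrom.
native = [(float(i), 0.0, 0.0) for i in range(100)]
shifted = [(x + 1.0, y, z) for x, y, z in native]
print(rmsd(shifted, native), tm_score_fixed(shifted, native))
```

Because the TM-score terms are scaled by a length-dependent distance, a uniform 1 angstrom offset on a 100-residue chain still scores above 0.9, while RMSD reports the offset directly.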

In the case studies presented here, AlephConforma is built exclusively from experimental PDB structures released before July 1, 2022. All case studies below are from PDB entries released after that date - structures that AlephConforma has never seen during its construction. This ensures a fair evaluation with no data leakage.
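The date-based separation described above can be sketched as a simple filter on release dates. This is an illustration only: the function name, entry identifiers, and dates are hypothetical placeholders, and the handling of an entry released exactly on the cutoff date is an assumption, since the text only says "before" and "after".

```python
from datetime import date

CUTOFF = date(2022, 7, 1)  # cutoff date stated in the text


def split_by_release(entries, cutoff=CUTOFF):
    """Split (entry_id, release_date) pairs around the cutoff.

    Entries released before the cutoff may serve as templates; entries
    released after it are eligible for evaluation. An entry released
    exactly on the cutoff date is left out of both sets (assumption).
    """
    templates, held_out = [], []
    for entry_id, released in entries:
        if released < cutoff:
            templates.append(entry_id)
        elif released > cutoff:
            held_out.append(entry_id)
    return templates, held_out


# Hypothetical entries; IDs and dates are placeholders, not real PDB records.
entries = [
    ("entry_a", date(2021, 11, 3)),
    ("entry_b", date(2022, 7, 1)),
    ("entry_c", date(2023, 2, 15)),
]
templates, held_out = split_by_release(entries)
print(templates, held_out)
```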

Sequence identity (SID): a measure of similarity between query and template sequences. Domain SID is within the matched domain; chain SID is across the full chain.
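As a rough illustration of the difference between domain SID and chain SID, the sketch below computes percent identity over a chosen range of alignment columns. The function name and the toy sequences are hypothetical, and the denominator convention (columns where both sequences have a residue) is an assumption; the page does not state which convention AlephConforma uses.

```python
def sequence_identity(aligned_query, aligned_template, start=0, end=None):
    """Percent identity over alignment columns [start, end).

    Assumption: identity is counted over columns where both sequences
    have a residue (gap columns are skipped); other conventions exist.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_query, aligned_template))[start:end]
    paired = [(q, t) for q, t in columns if q != "-" and t != "-"]
    if not paired:
        return 0.0
    matches = sum(q == t for q, t in paired)
    return 100.0 * matches / len(paired)


# Toy alignment; "-" marks a gap. The first six columns stand in for a domain.
query = "MKT-AYIAKQR"
template = "MKTLAYLAKHR"
chain_sid = sequence_identity(query, template)
domain_sid = sequence_identity(query, template, start=0, end=6)
print(chain_sid, domain_sid)
```

Restricting the column range to the matched domain generally gives a different value than the full chain, which is why the case studies report both.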

Experimental structures are from the Protein Data Bank (PDB). ESMFold predictions were computed using ESMFold v1. Structures are visualized using Mol*.

Case Studies

Case 1: Hsp70 Chaperone - 4 Conformational States

PDB 8D1W | Plasmodium falciparum

GRP78, an Hsp70-family chaperone. AlephConforma detects four distinct conformational states across the Hsp70 ATP cycle and returns a model for each. Template-to-query sequence identity is moderate, so the template is not a near-duplicate of the query.

Templates: 5ey4 (domain 72%, chain 65%)

Method	TM-score	RMSD (Å)	vs native	State label
AlephConforma - State 0	0.972	1.25	matches this native	domain undocked
AlephConforma - State 1	0.864	2.96	alternative conformation	domain docked
AlephConforma - State 2	0.864	2.79	alternative conformation	allosteric intermediate
AlephConforma - State 3	0.873	2.76	alternative conformation	substrate stimulated
ESMFold	0.879	2.73	single model	no state information

This native was solved in one specific conformation (ADP-bound, domain undocked). State 0 matches it. States 1-3 are alternative Hsp70 conformations this particular structure does not capture.

AlephConforma identifies four distinct conformational states of this Hsp70 chaperone, each built from experimental templates. State 0 closely matches this particular native (TM 0.972), which was solved in the ADP-bound / domain-undocked conformation. States 1-3 are alternative conformations from the Hsp70 functional cycle - each represents a different known state of this protein family, and each is available as a separate atomic-resolution model. ESMFold produces a single model with no way to indicate that alternatives exist.

Predictions for Unsolved Receptors: TAAR5

Human TAAR5 (UniProt O14804), a trace amine receptor with no experimental structure in the PDB as of April 2026. AlephConforma generated six state-specific predictions from sequence.

State	Label	Download
0	inactive dark-state retinal-bound rhodopsin	taar5_human_state0.pdb
1	active opsin G-protein/arrestin-coupled	taar5_human_state1.pdb
2	active agonist-bound β2AR Gs-coupled	taar5_human_state2.pdb
3	inactive antagonist-bound β2AR crystal	taar5_human_state3.pdb
4	inactive antagonist or inverse-agonist bound 5-HT2AR	taar5_human_state4.pdb
5	active agonist-bound 5-HT2AR Gq-coupled	taar5_human_state5.pdb

Predictions and state labels are returned directly by the model. Biological relevance of individual states is for the user to evaluate.
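For readers who want to compare the downloadable state models themselves, the sketch below extracts C-alpha coordinates from PDB-format text using the fixed-column ATOM record layout. The function name is hypothetical, and the sketch assumes a single-model file; it ignores insertion codes and keeps only the first alternate location.

```python
def parse_ca_coords(lines):
    """Return [(chain_id, residue_number, (x, y, z)), ...] for CA atoms.

    Uses the fixed-column ATOM record layout of the PDB file format.
    Simplifications for this sketch: insertion codes are ignored and
    only the first alternate location (blank or "A") is kept.
    """
    atoms = []
    for line in lines:
        if not line.startswith("ATOM  "):
            continue
        if line[12:16].strip() != "CA" or line[16] not in (" ", "A"):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((line[21], int(line[22:26]), xyz))
    return atoms


# One ATOM record built with explicit column widths to show the layout.
# The coordinate values are arbitrary and do not come from any real file.
example = "ATOM  {:>5d} {:<4s}{:1s}{:>3s} {:1s}{:>4d}{:1s}   {:8.3f}{:8.3f}{:8.3f}".format(
    1, " CA ", " ", "MET", "A", 1, " ", 11.104, 6.134, -6.504
)
print(parse_ca_coords([example]))
```

The function only iterates over lines, so an open file handle for a downloaded file such as taar5_human_state0.pdb can be passed in place of the list.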

The 6 predicted structures are licensed under CC BY 4.0. Attribution: Aleph Motif LLC. Sequence input from UniProt (accession O14804) under CC BY 4.0.

Need predictions for your target portfolio? Reach out at contact@alephmotif.com for custom predictions and bulk access to the multi-state library.

GPCR Validation

AlephConforma models intra-domain conformational states - different 3D conformations of the same protein domain, such as the active and inactive states of Class A GPCRs. It does not currently model inter-domain orientational states, which involve different spatial arrangements of multiple domains and occur in Classes B and C of the GPCR family. Multi-state output is therefore currently limited to Class A; coverage is reported for all classes below, and structural quality metrics for the Class A multi-state chains.

AlephConforma was built from Protein Data Bank (PDB) coordinate files released before July 1, 2022. GPCRdb's activation state annotations and class assignments were not used as training input; they were held out for independent post-hoc validation. Among the GPCR chains catalogued in GPCRdb, 871 are from PDB entries released after our July 1, 2022 cutoff. These structures were not available during AlephConforma's build, making them a fully held-out validation set.

Per-class coverage

AlephConforma produced multi-state output for Class A chains only. For Classes B1, B2, C, F, and other subtypes, it returned single-state output across all chains - their conformational state differences involve domain orientation rather than the intra-domain mechanism AlephConforma currently captures. This is a scope limitation of the current release, not a per-chain failure. Class assignments were retrieved from the GPCRdb REST API; GPCRdb class assignment follows the official receptor classification nomenclature established by the IUPHAR/BPS Guide to PHARMACOLOGY (NC-IUPHAR; see references).

Class	Total chains	Multi-state output	Single-state output
Class A	708	679	29
Class B1	52	0	52
Class B2	21	0	21
Class C	52	0	52
Class F	12	0	12
Other	26	0	26
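The coverage table can be cross-checked with a few lines of arithmetic. The sketch below copies the counts from the table, confirms that they add up to the 871 held-out chains, and derives the multi-state rate per class; the variable and function names are illustrative only.

```python
# (multi-state, single-state) chain counts copied from the table above.
coverage = {
    "Class A": (679, 29),
    "Class B1": (0, 52),
    "Class B2": (0, 21),
    "Class C": (0, 52),
    "Class F": (0, 12),
    "Other": (0, 26),
}

# Total number of held-out chains across all classes.
total_chains = sum(multi + single for multi, single in coverage.values())


def multi_state_rate(gpcr_class):
    """Percentage of a class's chains that received multi-state output."""
    multi, single = coverage[gpcr_class]
    return 100.0 * multi / (multi + single)


print(total_chains, round(multi_state_rate("Class A"), 1))
```

On these counts, 679 of 708 Class A chains (95.9%) received multi-state output.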

Structural prediction quality - Class A multi-state chains

For the 679 Class A chains where AlephConforma returned multiple conformational states, the table below shows the best-state TM-score: the highest agreement between any of the model's predicted states and the native experimental structure. Because each experimental structure captures only one of the possible conformations, this metric shows whether at least one of the predicted states matches the state captured by the native. It is the score a user would see after selecting the relevant conformational state.

TM range	Count	Percentage
< 0.80	41	6.0%
0.80 - 0.95	461	67.9%
>= 0.95	177	26.1%
Total	679	100%

Mean TM: 0.902 | Median TM: 0.916 | N = 679
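The best-state metric and the binning used in the table can be sketched as follows. The function names are hypothetical, and the treatment of bin edges (lower bound included, upper bound excluded) is an assumption. The example reuses the four per-state TM-scores reported for Case 1 earlier on this page, purely to show the selection step.

```python
def best_state_tm(state_scores):
    """Highest TM-score to the native among a chain's predicted states."""
    return max(state_scores)


def tm_bin(tm):
    """Assign a TM-score to one of the table's three ranges.

    Assumption: each range includes its lower bound and excludes its
    upper bound.
    """
    if tm < 0.80:
        return "< 0.80"
    if tm < 0.95:
        return "0.80 - 0.95"
    return ">= 0.95"


# Per-state TM-scores of Case 1 (States 0-3) from the case study table.
case1 = [0.972, 0.864, 0.864, 0.873]
best = best_state_tm(case1)

# Bin counts from the table above, converted to percentages.
counts = {"< 0.80": 41, "0.80 - 0.95": 461, ">= 0.95": 177}
total = sum(counts.values())
shares = {label: round(100.0 * n / total, 1) for label, n in counts.items()}
print(best, tm_bin(best), shares)
```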

Inter-state conformational separation

For the same 679 Class A chains, the table below shows the TM-score between AlephConforma's two predicted states, a measure of how structurally distinct the predicted conformations are from each other. The native experimental structure is not involved in this metric.

TM range	Count	Percentage
< 0.70	10	1.5%
0.70 - 0.80	82	12.1%
0.80 - 0.85	107	15.8%
0.85 - 0.90	256	37.7%
0.90 - 0.95	155	22.8%
>= 0.95	69	10.2%
Total	679	100%

Mean: 0.870 | Median: 0.881 | SD: 0.065 | N = 679

Most chains (518 of 679, 76.3%) fall in the 0.80-0.95 range, which is consistent with the transmembrane-helix rearrangement observed between active and inactive Class A GPCR structures. Values in this range indicate that the model's two outputs represent clearly distinct conformations rather than minor variants of the same structure.

In the tails, 10 chains (1.5%) show pairs with TM < 0.70, indicating separations larger than the typical active-inactive difference, and 69 chains (10.2%) show pairs with TM >= 0.95, where the two predicted states are nearly identical. These edge cases reflect variability in the model's behavior for sparsely populated subfamilies and are tracked as ongoing areas of model improvement.

AlephState - state classification accuracy

AlephState is a separate component within the AlephConforma model that classifies the conformational state of an input 3D structure. Given a PDB chain as input, it identifies which conformational state the structure represents; for Class A GPCRs, this is a binary label: active or inactive. AlephState is currently trained for Class A only.

To evaluate AlephState, its predictions on Class A native structures were compared against independent expert annotations from GPCRdb. GPCRdb classifies GPCR structures into three categories: active, intermediate, and inactive. The comparison is reported below in two forms: all Class A chains with comparable labels, and the subset where AlephState reports high or moderate confidence (excluding cases where AlephState is uncertain).

All Class A chains (N = 695)

	GPCRdb: Active	GPCRdb: Intermediate	GPCRdb: Inactive
AlephState: Active	619	3	16
AlephState: Inactive	1	5	51

Accuracy: 96.4%

High + moderate confidence (N = 631)

	GPCRdb: Active	GPCRdb: Intermediate	GPCRdb: Inactive
AlephState: Active	605	2	1
AlephState: Inactive	0	0	23

Accuracy: 99.5%

Accuracy is the fraction of chains where AlephState's prediction agrees with the GPCRdb label (active with active, or inactive with inactive). AlephState is a binary classifier, so GPCRdb's "Intermediate" category has no corresponding AlephState output. Predictions on intermediate-labeled chains are included in the matrix for transparency and count toward the denominator, but they are never counted as correct. AlephState's confidence tiers derive from the TM-score margin between its first and second choices: high confidence corresponds to a margin above 0.05, moderate to 0.02-0.05, and ambiguous to below 0.02.
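The reported accuracies can be reproduced from the two matrices above. The sketch below is a minimal illustration with hypothetical function names; it keeps intermediate-labeled chains in the denominator, which is the reading that matches the reported 96.4% and 99.5%. The tier function follows the thresholds stated in the text, and treating margins of exactly 0.02 or 0.05 as moderate is an assumption.

```python
def accuracy(matrix):
    """Fraction of chains where the binary prediction matches GPCRdb.

    matrix[predicted][annotated] holds chain counts. Chains annotated
    "intermediate" stay in the denominator but can never be correct.
    """
    correct = matrix["active"]["active"] + matrix["inactive"]["inactive"]
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def confidence_tier(margin):
    """Map a first-vs-second-choice TM-score margin to a tier.

    Assumption: margins of exactly 0.02 or 0.05 count as moderate.
    """
    if margin > 0.05:
        return "high"
    if margin >= 0.02:
        return "moderate"
    return "ambiguous"


# Counts copied from the two matrices above (rows: AlephState, columns: GPCRdb).
all_chains = {
    "active": {"active": 619, "intermediate": 3, "inactive": 16},
    "inactive": {"active": 1, "intermediate": 5, "inactive": 51},
}
confident = {
    "active": {"active": 605, "intermediate": 2, "inactive": 1},
    "inactive": {"active": 0, "intermediate": 0, "inactive": 23},
}
print(round(100 * accuracy(all_chains), 1), round(100 * accuracy(confident), 1))
```

For the full set this is 670 correct out of 695 chains, and for the high and moderate confidence subset 628 out of 631.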

GPCRdb annotations were not involved in AlephConforma's development. This is an external validation, not an internal benchmark.

Benchmark Summary

AlephConforma has been evaluated on multiple independent datasets comprising over 10,000 protein chains, including structures released after the system's template cutoff date. Across all evaluations, AlephConforma matches or outperforms both AI-based and template-based prediction methods under fair comparison conditions.
