AlephConforma
Current protein structure prediction approaches include AI methods and template-based modeling, each with distinct tradeoffs in accuracy, interpretability, provenance, and computational cost and time. AlephConforma integrates both. It takes a protein's 3D structure as a prior - or, if none is provided, generates an AI-predicted structure from the input sequence (ESMFold; Lin et al., 2023) - and upgrades each domain with the best-matching experimentally determined structure from the Protein Data Bank (PDB), selected through systematic structural analysis and statistically principled template ranking. Each domain may draw from a different PDB entry, so a single model can combine experimental evidence from across the database. Domain regions with experimental templates trace back to specific PDB structures with full provenance; remaining regions retain the input structure or AI-generated scaffold.
For proteins with intra-domain conformational diversity - such as Class A GPCRs, kinases, and other multi-state families - AlephConforma identifies distinct conformational states within each domain and delivers a separate atomic-resolution model for each, capturing the structural diversity that underlies distinct functional states.
Each output includes the 3D structure with per-residue confidence scores, a provenance report identifying the experimental or AI-predicted origin of each region, and named conformational state labels where applicable - typically delivered in under a minute.
Launching May 1, 2026 on this website.
How to Read the Case Studies
TM-Score: a measure of structural similarity between a predicted model and the experimentally determined structure, on a scale from 0 to 1 where 1 indicates an identical match.
RMSD: root-mean-square deviation, in angstroms, between corresponding atoms of two superposed structures; lower values indicate a closer match.
In the case studies presented here, AlephConforma is built exclusively from experimental PDB structures released before July 1, 2022. All case studies below are from PDB entries released after that date - structures that AlephConforma has never seen during its construction. This ensures a fair evaluation with no data leakage.
Sequence identity (SID): a measure of similarity between query and template sequences. Domain SID is within the matched domain; chain SID is across the full chain.
Experimental structures are from the Protein Data Bank (PDB). ESMFold predictions were computed using ESMFold v1. Structures are visualized using Mol*.
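As an illustration of the two structural metrics defined above, here is a minimal Python sketch. It assumes the two structures are already superposed and residue-aligned (equal-length lists of C-alpha coordinates), so it is not a replacement for a full alignment tool; the function names are hypothetical. The TM-score uses the standard length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8, and the 0.5 Å floor on d0 for short chains is an assumed convention here.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def tm_score(coords_model, coords_native):
    """TM-score of a pre-superposed, pre-aligned model, normalized by the native length.

    A full TM-score search also optimizes the superposition; with a fixed
    superposition this value is a lower bound on that optimum.
    """
    if len(coords_model) != len(coords_native) or not coords_native:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    length = len(coords_native)
    # Length-dependent distance scale; floor of 0.5 Å is an assumption for short chains.
    d0 = 0.5
    if length > 15:
        d0 = max(1.24 * (length - 15) ** (1 / 3) - 1.8, 0.5)
    terms = (1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
             for p, q in zip(coords_model, coords_native))
    return sum(terms) / length

# Toy data: a straight 100-residue trace and a copy shifted by 1 Å along x.
native = [(3.8 * i, 0.0, 0.0) for i in range(100)]
shifted = [(x + 1.0, y, z) for x, y, z in native]
print(rmsd(native, shifted), tm_score(shifted, native))
```

Identical coordinates give an RMSD of 0 and a TM-score of 1; a uniform 1 Å offset gives an RMSD of 1 Å while the TM-score stays close to, but below, 1.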
Case Studies
Case 1: Hsp70 Chaperone - 4 Conformational States
PDB 8D1W | Plasmodium falciparum
GRP78, an Hsp70-family chaperone. AlephConforma detects four distinct conformational states across the Hsp70 ATP cycle and returns a model for each. Template-to-query sequence identity is moderate - no near-duplicate.
Templates: 5ey4 (domain 72%, chain 65%)
| Method | TM-score | RMSD (Å) | Relation to native | State label |
|---|---|---|---|---|
| AlephConforma - State 0 | 0.972 | 1.25 | matches this native | domain undocked |
| AlephConforma - State 1 | 0.864 | 2.96 | alternative conformation | domain docked |
| AlephConforma - State 2 | 0.864 | 2.79 | alternative conformation | allosteric intermediate |
| AlephConforma - State 3 | 0.873 | 2.76 | alternative conformation | substrate stimulated |
| ESMFold | 0.879 | 2.73 | single model | no state information |
This native was solved in one specific conformation (ADP-bound, domain undocked). State 0 matches it. States 1-3 are alternative Hsp70 conformations this particular structure does not capture.
AlephConforma identifies four distinct conformational states of this Hsp70 chaperone, each built from experimental templates. State 0 closely matches this particular native (TM 0.972), which was solved in the ADP-bound / domain-undocked conformation. States 1-3 are alternative conformations from the Hsp70 functional cycle - each represents a different known state of this protein family, and each is available as a separate atomic-resolution model. ESMFold produces a single model with no way to indicate that alternatives exist.
GPCR Validation (Class A)
AlephConforma models intra-domain conformational states - different 3D conformations of the same protein domain, such as the active and inactive states of Class A GPCRs. It does not currently model inter-domain orientational states, which involve different spatial arrangements of multiple domains and occur in Classes B and C of the GPCR family. The validation below concerns Class A only.
AlephConforma was built from Protein Data Bank (PDB) coordinate files released before July 1, 2022. GPCRdb's activation state annotations were not used as training input; they were held out for independent post-hoc validation of the model's state assignments. Among the GPCR chains catalogued in GPCRdb, 871 are from PDB entries released after our July 1, 2022 cutoff. These structures were not available when AlephConforma was built, making them a fully held-out validation set.
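The release-date split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the entry identifiers, and the dates in the example are hypothetical, and entries released exactly on the cutoff date are left out of both sets because the text only specifies "before" and "after".

```python
from datetime import date

CUTOFF = date(2022, 7, 1)  # template cutoff date stated in the text

def split_by_release(entries, cutoff=CUTOFF):
    """Split (entry_id, release_date) pairs into build and held-out ID lists.

    Entries released strictly before the cutoff go to the build set;
    entries released strictly after it go to the held-out set.
    """
    build = [entry_id for entry_id, released in entries if released < cutoff]
    held_out = [entry_id for entry_id, released in entries if released > cutoff]
    return build, held_out

# Hypothetical entries, not real PDB release dates.
example = [
    ("entry_a", date(2021, 3, 1)),
    ("entry_b", date(2022, 7, 1)),
    ("entry_c", date(2023, 1, 15)),
]
build_ids, held_out_ids = split_by_release(example)
print(build_ids, held_out_ids)  # ['entry_a'] ['entry_c']
```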
Per-class coverage
AlephConforma produced multi-state output for Class A chains only. For Classes B1, B2, C, F, and other subtypes, it returned single-state output across all chains - their conformational state differences involve domain orientation rather than the intra-domain mechanism AlephConforma currently captures. This is a scope limitation of the current release, not a per-chain failure.
| Class | Total chains | Multi-state output | Single-state output |
|---|---|---|---|
| Class A | 708 | 679 | 29 |
| Class B1 | 52 | 0 | 52 |
| Class B2 | 21 | 0 | 21 |
| Class C | 52 | 0 | 52 |
| Class F | 12 | 0 | 12 |
| Other | 26 | 0 | 26 |
Structural prediction quality - Class A multi-state chains
For the 679 Class A chains where AlephConforma returned multiple conformational states, the table below shows the best-state TM-score: the highest agreement between any of the model's predicted states and the native experimental structure. Because each experimental structure captures only one of the possible conformations, this metric shows whether at least one of the predicted states matches the state captured by the native. It is the score a user would see after selecting the relevant conformational state.
| TM range | Count | Percentage |
|---|---|---|
| < 0.80 | 41 | 6.0% |
| 0.80 - 0.95 | 461 | 67.9% |
| >= 0.95 | 177 | 26.1% |
| Total | 679 | 100% |
Mean TM: 0.902 | Median TM: 0.916 | N = 679
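The best-state selection described above amounts to taking the maximum TM-score over the predicted states. A minimal Python sketch follows; the function name is hypothetical, and because per-chain GPCR values are not listed here, the example reuses the four per-state TM-scores from Case 1 (Hsp70) purely for illustration.

```python
def best_state(tm_by_state):
    """Return (state_label, tm_score) for the state that best matches the native."""
    if not tm_by_state:
        raise ValueError("at least one predicted state is required")
    label = max(tm_by_state, key=tm_by_state.get)
    return label, tm_by_state[label]

# Per-state TM-scores against the native, taken from the Case 1 table.
case1 = {"State 0": 0.972, "State 1": 0.864, "State 2": 0.864, "State 3": 0.873}
print(best_state(case1))  # ('State 0', 0.972)
```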
Inter-state conformational separation
For the same 679 Class A chains, the table below shows the TM-score between AlephConforma's two predicted states - a measure of how structurally distinct the predicted conformations are with respect to each other (not to the native). The native experimental structure is not involved in these metrics.
| TM range | Count | Percentage |
|---|---|---|
| < 0.70 | 10 | 1.5% |
| 0.70 - 0.80 | 82 | 12.1% |
| 0.80 - 0.85 | 107 | 15.8% |
| 0.85 - 0.90 | 256 | 37.7% |
| 0.90 - 0.95 | 155 | 22.8% |
| >= 0.95 | 69 | 10.2% |
| Total | 679 | 100% |
Mean: 0.870 | Median: 0.881 | SD: 0.065 | N = 679
The bulk of the distribution (518 of 679 chains, 76.3%) falls in the 0.80-0.95 range, which is consistent with the transmembrane-helix rearrangement observed between active and inactive Class A GPCR structures. Values in this range indicate that the model's two outputs represent clearly distinct conformations rather than minor variants of the same structure.
In the tails, 10 chains (1.5%) show pairs with TM < 0.70, indicating separations larger than the typical active-inactive difference, and 69 chains (10.2%) show pairs with TM >= 0.95, where the two predicted states are nearly identical. These edge cases reflect variability in the model's behavior for sparsely populated subfamilies and are tracked as ongoing areas of model improvement.
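The percentages in the inter-state table, and the share of chains in the 0.80-0.95 range, follow directly from the bin counts. The short Python sketch below recomputes them; the variable names are illustrative, and the counts are copied from the table above.

```python
# Bin counts copied from the inter-state TM-score table.
counts = {
    "< 0.70": 10,
    "0.70 - 0.80": 82,
    "0.80 - 0.85": 107,
    "0.85 - 0.90": 256,
    "0.90 - 0.95": 155,
    ">= 0.95": 69,
}

total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Chains whose two predicted states have a pairwise TM-score in the 0.80-0.95 range.
bulk = sum(counts[label] for label in ("0.80 - 0.85", "0.85 - 0.90", "0.90 - 0.95"))
bulk_share = round(100 * bulk / total, 1)

print(total, shares, bulk, bulk_share)
```

Because each share is rounded to one decimal place independently, the rounded percentages sum to 100.1 rather than exactly 100.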
AlephState - state classification accuracy
AlephState is a separate component within the AlephConforma model that classifies the conformational state of an input 3D structure. For Class A GPCRs, it returns a binary label: active or inactive. Given a PDB chain as input, AlephState identifies which conformational state the structure represents. AlephState is currently trained for Class A only.
To evaluate AlephState, its predictions on Class A native structures were compared against independent expert annotations from GPCRdb. GPCRdb classifies GPCR structures into three categories: active, intermediate, and inactive. The comparison is reported below in two forms: all Class A chains with comparable labels, and the subset where AlephState reports high or moderate confidence (excluding cases where AlephState is uncertain).
All Class A chains (N = 695)
| | GPCRdb: Active | GPCRdb: Intermediate | GPCRdb: Inactive |
|---|---|---|---|
| AlephState: Active | 619 | 3 | 16 |
| AlephState: Inactive | 1 | 5 | 51 |
Accuracy: 96.4%
High + moderate confidence (N = 631)
| | GPCRdb: Active | GPCRdb: Intermediate | GPCRdb: Inactive |
|---|---|---|---|
| AlephState: Active | 605 | 2 | 1 |
| AlephState: Inactive | 0 | 0 | 23 |
Accuracy: 99.5%
The accuracy metric is the fraction of chains where AlephState's prediction agrees with the GPCRdb label (active with active, or inactive with inactive). GPCRdb's "Intermediate" category has no corresponding AlephState output, because AlephState is a binary classifier. Predictions on intermediate-labeled chains are included in the matrix for transparency and remain in the denominator, but they are never counted as correct. AlephState's confidence tiers derive from the TM-score margin between its first and second choices: high confidence corresponds to a margin above 0.05, moderate to 0.02-0.05, and ambiguous to a margin below 0.02.
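The accuracy figures and confidence tiers above can be reproduced with a short Python sketch. The function names and the nested-dictionary layout are hypothetical; the counts are copied from the two matrices. How a margin of exactly 0.02 or 0.05 is assigned is not stated, so treating both boundary values as "moderate" is an assumption.

```python
def accuracy(matrix):
    """Fraction of chains where prediction and GPCRdb label agree.

    matrix[predicted][gpcrdb_label] holds a count. Intermediate-labeled chains
    stay in the denominator but can never count as correct.
    """
    total = sum(sum(row.values()) for row in matrix.values())
    correct = matrix["active"]["active"] + matrix["inactive"]["inactive"]
    return correct / total

def confidence_tier(margin):
    """Map a first-vs-second-choice TM-score margin to a confidence tier."""
    if margin > 0.05:
        return "high"
    if margin >= 0.02:  # boundary handling is an assumption
        return "moderate"
    return "ambiguous"

all_chains = {
    "active": {"active": 619, "intermediate": 3, "inactive": 16},
    "inactive": {"active": 1, "intermediate": 5, "inactive": 51},
}
high_moderate = {
    "active": {"active": 605, "intermediate": 2, "inactive": 1},
    "inactive": {"active": 0, "intermediate": 0, "inactive": 23},
}

print(round(100 * accuracy(all_chains), 1))     # 96.4
print(round(100 * accuracy(high_moderate), 1))  # 99.5
```

In counts, this is 670 agreeing chains out of 695 for the full set and 628 out of 631 for the high and moderate confidence subset.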
GPCRdb annotations were not involved in AlephConforma's development. This is an external validation, not an internal benchmark.
Benchmark Summary
AlephConforma has been evaluated on multiple independent datasets comprising over 10,000 protein chains, including structures released after the system's template cutoff date. Across all evaluations, AlephConforma matches or outperforms both AI-based and template-based prediction methods under fair comparison conditions.
Detailed benchmark methodology and results will accompany the product launch.