The available models are described below. They fall into two categories:
- Regressor: a model that predicts an interpretable value for each molecule (for example a model that predicts the LogD)
- Scorer: a model that predicts a number correlated with the ADMET property. Let's consider the PPB Scorer for example: the higher the predicted score, the higher the PPB rate. But the predicted score is not a direct estimation of the PPB rate. This type of model does not return interpretable scores but is designed to rank a list of molecules, and select the top ones, with respect to an ADMET property.
To help you select good molecules, we implemented good / medium / bad molecule tags, based on thresholds we estimated on real drug discovery projects (see more details below).
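Because a scorer is designed for ranking, a typical use is to sort molecules by score and keep the best ones. The sketch below is illustrative only: the function name `select_top`, the dictionary keys, the molecule identifiers and the scores are hypothetical. Only the optimisation directions are taken from the model descriptions in this document.

```python
# Optimisation direction of each scorer, as stated in the model descriptions.
# The keys are hypothetical labels, not identifiers from the product.
SCORER_DIRECTION = {
    "permeability_ab": "maximise",
    "permeability_er": "minimise",
    "ppb": "minimise",
    "clearance": "minimise",
    "herg": "minimise",
}


def select_top(scores, model, k):
    """Return the k molecule identifiers ranked best by one scorer.

    scores: dict mapping a molecule identifier to its predicted score.
    """
    best_is_high = SCORER_DIRECTION[model] == "maximise"
    ranked = sorted(scores, key=scores.get, reverse=best_is_high)
    return ranked[:k]


# Made-up PPB scores for three placeholder molecules.
ppb_scores = {"mol_A": 0.8, "mol_B": -0.3, "mol_C": 0.1}
print(select_top(ppb_scores, "ppb", k=2))  # ['mol_B', 'mol_C']
```

The score values themselves are never interpreted here; only their order matters, which is how a scorer is meant to be used.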
Absorption
1. Permeability AB Score (Scorer)
Definition:
Permeability of compounds from the apical to the basolateral (AB) side of the intestinal membrane is an important determinant of a drug’s intestinal absorption and thus of its oral bioavailability.
Training:
The model is trained on:
- ~4,500 Caco-2 A->B measurements
- ~13,500 MDCK A->B measurements
- ~2,000 PAMPA measurements
How to interpret results:
The higher the prediction, the more the molecule permeates through the membrane. Therefore, the model should generally be maximised.
Test metrics:
Spearman = 0.37
Optimal MCC = 0.29
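The two metrics reported for scorers can be illustrated as follows. This is a sketch, not the evaluation code used for these models. It assumes that "Optimal MCC" means the highest Matthews correlation coefficient obtained over all possible score cut-offs when separating good from bad molecules, which this page does not state explicitly. All function names and data are hypothetical, and the example assumes that a higher score should correspond to a higher measured value.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def mcc(predicted, actual):
    """Matthews correlation coefficient for two lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def optimal_mcc(scores, is_good):
    """Assumed definition: best MCC over all cut-offs 'score >= threshold'."""
    return max(mcc([s >= t for s in scores], is_good) for t in sorted(set(scores)))


# Made-up scores and measurements for five molecules.
scores = [0.9, 0.4, 0.1, -0.2, -0.5]
measured = [30.0, 12.0, 15.0, 2.0, 1.0]
is_good = [m >= 10.0 for m in measured]  # hypothetical experimental cut-off

print(round(spearman(scores, measured), 2))  # 0.9
print(optimal_mcc(scores, is_good))          # 1.0
```

In this toy data the score order differs from the measured order for one pair of molecules, so the Spearman correlation is below 1, yet a single cut-off still separates good from bad molecules perfectly.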
2. Permeability ER (Scorer)
Definition:
Assessing transport of molecules in both directions, from apical to basolateral (AB) and basolateral to apical (BA), across the intestinal cell monolayer, enables an efflux ratio (ER) to be determined. It is an indicator of whether a compound undergoes active efflux (pumping of drugs from inside a cell to outside).
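As an illustration, the efflux ratio is conventionally computed as the apparent permeability in the BA direction divided by the apparent permeability in the AB direction. The sketch below assumes that convention. The function name, parameter names and values are hypothetical and do not come from the training data described here.

```python
def efflux_ratio(papp_ba, papp_ab):
    """Efflux ratio: B->A apparent permeability divided by A->B apparent permeability.

    Both values must be expressed in the same unit (for example 1e-6 cm/s).
    """
    if papp_ab <= 0:
        raise ValueError("papp_ab must be positive")
    return papp_ba / papp_ab


# Made-up measurements for one compound.
print(efflux_ratio(papp_ba=12.0, papp_ab=3.0))  # 4.0
```

A ratio close to 1 is consistent with passive diffusion, while a ratio well above 1 suggests that the compound is actively pumped back out.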
Training:
The model is trained on:
- ~4,000 Caco-2 ER measurements
- ~13,000 MDCK ER measurements
How to interpret results:
The lower the prediction, the lower the efflux ratio.
Therefore, the model should generally be minimised.
Test metrics:
Spearman = 0.34
Optimal MCC = 0.25
3. Aqueous Kinetic Solubility logS (Regressor)
Definition:
The logS reflects the aqueous solubility of a compound at 25°C. Water-soluble drugs are better absorbed than lipid-soluble ones, especially when passing through the intestine. Thus, logS helps determine oral bioavailability and intestinal absorption.
Training:
This model is trained on around 24,000 molecules with experimentally determined logS values.
How to interpret results:
For most compounds, a compromise has to be made between reasonable aqueous solubility and the hydrophobicity needed for membrane permeability.
As a rule of thumb, a target range of -5 to -1 is used for acceptable solubility when designing orally bioavailable drugs.
Molecules with logS values < -5 are increasingly insoluble, while those with values > -1 are increasingly soluble.
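To relate a logS value to a concentration, the sketch below assumes the usual convention that logS is the base-10 logarithm of the solubility in mol/L. This page does not state the unit, so treat that as an assumption. The function name and the example molecular weight are hypothetical.

```python
def logs_to_mg_per_ml(log_s, molecular_weight):
    """Convert logS to mg/mL.

    Assumes logS = log10(solubility in mol/L); molecular_weight is in g/mol.
    """
    molar = 10 ** log_s              # solubility in mol/L
    return molar * molecular_weight  # g/L, which is numerically equal to mg/mL


# A hypothetical compound of 300 g/mol with a predicted logS of -4.
print(f"{logs_to_mg_per_ml(-4.0, 300.0):.3f} mg/mL")  # 0.030 mg/mL
```

Under this assumption, each unit of logS corresponds to a tenfold change in solubility.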
Test metrics:
Testing was done on independent samples whose solubility values were experimentally determined.
Spearman = 0.62
MAE = 0.61
Distribution
1. Plasma Protein Binding PPB (Scorer)
Definition:
Upon administration, drugs may bind to plasma proteins, which can reduce bioavailability and cause undesirable drug-drug interactions. This adversely affects a drug’s efficacy: the more of it that is bound, the less it can cross cell membranes and reach its intended pharmacological target.
The affinity of lead compounds for plasma proteins such as human serum albumin (HSA) and α-1-acid-glycoprotein (AGP) is usually measured during the lead optimisation stage.
Training:
The model was trained on around 1,700 molecules whose PPB was experimentally determined.
How to interpret results:
The lower the prediction, the lower the PPB rate.
Therefore, the model should generally be minimised.
Test metrics:
Spearman = 0.68
Optimal MCC = 0.55 (for a PPB lower than 95%)
2. Water-Octanol Distribution Coefficient logD (Regressor)
Definition:
Lipophilicity is a major determinant of ADMET properties and overall drug-likeness. Hence, it is essential to accurately predict the water-octanol distribution coefficient (logD) of chemical compounds at the early stages of drug discovery. With a low logD, compounds are more water soluble but likely to exhibit poor membrane permeability and higher susceptibility to renal clearance. A high logD is associated with high plasma protein binding, vulnerability to CYP450 metabolism, target promiscuity and general toxicity.
Training:
This model was trained on around 60,000 molecules previously scored with a Simulations Plus logD model.
How to interpret results:
For optimal ADMET properties, a compound is considered to have a reasonable logD if it falls between 1 and 3.
Test metrics:
Spearman = 0.78
MAE = 0.57
Excretion
1. Clearance (Scorer)
Definition:
Clearance is the rate at which a drug is removed from the body. It occurs primarily through metabolism in the liver and excretion via the kidneys. In the liver, clearance is assessed in two different ways: 1) microsomal clearance is measured using liver microsomes alone and reflects how they enzymatically transform or break down drugs with enzymes such as cytochrome P450; 2) hepatocyte clearance is measured using isolated liver cells and estimates how quickly a drug is eliminated from them, which affects a drug’s bioavailability and toxicity.
Both methods have pros and cons, and they can be used together or separately depending on the purpose and characteristics of the drug.
Training:
The model was trained on:
- ~30,000 mouse microsomal clearance measurements
- ~7,700 mouse hepatocyte clearance measurements
- ~25,000 rat microsomal clearance measurements
- ~3,700 rat hepatocyte clearance measurements
- ~22,000 human microsomal clearance measurements
- ~5,000 human hepatocyte clearance measurements
How to interpret results:
The lower the prediction, the lower the clearance.
Therefore, this model should generally be minimised.
Test metrics:
Spearman = 0.31
Optimal MCC = 0.29
Toxicity
1. hERG (Scorer)
Definition:
The human Ether-a-go-go-Related Gene (hERG) encodes a potassium ion channel best known for its role in the electrical activity of the heart. Its inhibition as an off-target effect of a drug can cause acquired long QT syndrome (LQTS), which can lead to fatal ventricular arrhythmia. A number of clinically successful drugs have been withdrawn from the market in the past for this reason, making hERG inhibition an important “anti-target” that must be avoided during drug development.
Training:
The model was trained on:
- ~1,200 hERG IC50 measurements
- ~10,000 hERG PIN (inhibition percentage) measurements
How to interpret results:
The lower the prediction, the higher the hERG IC50.
Therefore, this model should generally be minimised.
Test metrics:
Spearman = 0.63
Optimal MCC (for hERG IC50 > 10 µM) = 0.57
How to interpret model predictions?
While regression models output values that are directly interpretable, the predictions of scorer models are only correlated with the actual ADMET values and are not directly interpretable.
To help you select good molecules, we implemented good / medium / bad molecule tags, based on thresholds we estimated on real drug discovery projects:
| Target | Type of model | Good range | Medium range | Bad range |
| Permeability AB | Scorer | ≥ 0.4 | [ 0 ; 0.4 [ | < 0 |
| Permeability ER | Scorer | < 0 | [0 ; 0.5 [ | ≥ 0.5 |
| Solubility LogS | Regressor | ≥ -4 | [ -5 ; -4 [ | < -5 |
| PPB | Scorer | < 0 | [0 ; 0.5 [ | ≥ 0.5 |
| LogD | Regressor | [ 1.5 ; 3.5 ] | [1 ; 1.5 [ or ] 3.5 ; 4 ] | < 1 or > 4 |
| Clearance | Scorer | < 0 | [ 0 ; 1 [ | ≥ 1 |
| hERG | Scorer | < 0.25 | [ 0.25 ; 0.75 [ | ≥ 0.75 |
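The thresholds in the table can be applied programmatically. The sketch below is one possible encoding, not the product's implementation: the function name `tag` and the two dictionaries are hypothetical, while the cut-off values and the open or closed interval bounds are copied from the table.

```python
# Targets where higher is better: (minimum for "good", minimum for "medium").
HIGHER_IS_BETTER = {
    "Permeability AB": (0.4, 0.0),
    "Solubility LogS": (-4.0, -5.0),
}

# Targets where lower is better: (upper bound for "good", upper bound for "medium"),
# both bounds exclusive.
LOWER_IS_BETTER = {
    "Permeability ER": (0.0, 0.5),
    "PPB": (0.0, 0.5),
    "Clearance": (0.0, 1.0),
    "hERG": (0.25, 0.75),
}


def tag(target, value):
    """Return 'good', 'medium' or 'bad' for one prediction."""
    if target == "LogD":
        # LogD is the only target with a two-sided good range.
        if 1.5 <= value <= 3.5:
            return "good"
        if 1.0 <= value <= 4.0:
            return "medium"
        return "bad"
    if target in HIGHER_IS_BETTER:
        good_min, medium_min = HIGHER_IS_BETTER[target]
        if value >= good_min:
            return "good"
        return "medium" if value >= medium_min else "bad"
    good_max, medium_max = LOWER_IS_BETTER[target]
    if value < good_max:
        return "good"
    return "medium" if value < medium_max else "bad"


print(tag("hERG", 0.3))              # medium
print(tag("Solubility LogS", -4.0))  # good
print(tag("LogD", 4.2))              # bad
```

Keeping the cut-offs in data rather than in branching logic makes it easy to compare them against the table when thresholds are revised.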
ADMET model predictions are coloured and labelled based on these thresholds: