In this example, we review why a good QSAR model matters, both for accurately predicting properties (e.g. activity, solubility, clearance) and during molecule generation. Some model metrics are covered here; a detailed discussion of how to interpret model scores can be found here: QSAR Model Scores Interpretation.
A critical element of building a good QSAR model is the target product profile (TPP). For a review of how to select a TPP, please refer to the following link: Creating a New QSAR by Defining a Target Product Profile (TPP).
When defining a TPP, it is necessary to provide a balance of compounds that meet and do not meet the desired threshold. A balance doesn't mean that the numbers of active and inactive compounds need to be perfectly equal, only that they should be of the same order of magnitude (we typically recommend at least 10% of molecules in each category). The two counts are shown as “Molecules In” and “Molecules Out” within the TPP.
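The 10% guideline can be checked on a dataset before a TPP threshold is set. The following is a minimal sketch, not part of Makya: the function names, the example pIC50 values, and the choice to count a value equal to the threshold as "in" are all assumptions made for illustration.

```python
def class_balance(values, threshold, higher_is_better=True):
    """Count molecules in and out of a threshold; return the minority fraction.

    Assumption: a value exactly equal to the threshold counts as "in".
    """
    if not values:
        raise ValueError("values must not be empty")
    if higher_is_better:
        n_in = sum(1 for v in values if v >= threshold)
    else:
        n_in = sum(1 for v in values if v <= threshold)
    n_out = len(values) - n_in
    minority_fraction = min(n_in, n_out) / len(values)
    return n_in, n_out, minority_fraction


def is_balanced(values, threshold, min_fraction=0.10, higher_is_better=True):
    """True when the smaller class holds at least min_fraction of the molecules."""
    _, _, minority_fraction = class_balance(values, threshold, higher_is_better)
    return minority_fraction >= min_fraction


# Hypothetical pIC50 values, for illustration only.
pic50 = [5.1, 5.8, 6.2, 6.9, 7.3, 7.4, 7.9, 8.2, 8.5, 9.0]

print(class_balance(pic50, threshold=7.0))  # (6, 4, 0.4)
print(is_balanced(pic50, threshold=7.0))    # True
print(is_balanced(pic50, threshold=9.5))    # False: no molecule meets the threshold
```

Moving the threshold from 7.0 to 9.5 leaves no molecule in the "in" category, which is the kind of imbalance the sliders can produce.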
Balanced QSAR
Once the QSAR model has been trained, note its performance. For both objectives, pKi IC50 Pi3K and pKi IC50 mTor, we see reasonable results for AUC, Precision, and Recall. These metrics indicate the model's ability to correctly distinguish between active and inactive molecules within the dataset.
Now we will intentionally select an imbalanced TPP to produce QSAR models that perform poorly. We can modify the TPP by adjusting the sliders. Note that there is a significant difference between “Molecules In” and “Molecules Out” for the objectives of interest, and that the risk matrix on the right shows moderate anticorrelation between the two pIC50 objectives.
Imbalanced QSAR
Once the imbalanced QSAR model has been trained, note its performance below. While we see what appear to be reasonable AUC scores for pKi IC50 Pi3K and pKi IC50 mTor, note the significant reduction in F1 score, Precision, and Recall. AUC can be a deceptive metric when the training dataset is imbalanced, so other metrics should be considered to get a more complete picture. The low precision and recall scores indicate the model's inability to correctly predict when a molecule is active.
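To see how AUC and precision/recall can disagree on an imbalanced set, consider the sketch below. It is an illustration only, not how Makya computes its metrics: the labels and scores are invented, and the 0.5 decision cutoff is an assumption. AUC is computed here as the fraction of (active, inactive) pairs in which the active molecule receives the higher score.

```python
def auc(labels, scores):
    """Pairwise AUC: probability that an active outscores an inactive (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_f1(labels, scores, cutoff=0.5):
    """Precision, recall and F1 for the active class at a given score cutoff."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    # Undefined ratios (no predicted or no true actives) are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Invented example: 2 actives and 18 inactives, all scored well below 0.5.
labels = [1, 1] + [0] * 18
scores = [0.40, 0.25, 0.30] + [0.10] * 17

print(round(auc(labels, scores), 3))        # 0.972
print(precision_recall_f1(labels, scores))  # (0.0, 0.0, 0.0)
```

The actives are ranked above almost every inactive, so AUC looks excellent, yet no molecule crosses the cutoff and not a single active is recovered.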
We can now rescore molecules to compare predictions based on these QSAR models. For a review on scoring individual molecules with the QSAR module, please refer to the following article: Applying QSAR Models on Custom Molecules.
The compound above was pulled from the initial training data set and is active on both pKi IC50 Pi3K and pKi IC50 mTor, for both TPPs.
Note: Makya QSARs are classification models and scores are between 0 and 1.
Balanced QSAR Rescoring
Although both models were trained on this compound, the imbalanced model fails to predict that the compound is active on mTOR and PI3K. Because its training set is imbalanced, the model resorts to shortcuts and is more likely to classify every molecule as inactive on all objectives.
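The appeal of this shortcut can be shown with simple arithmetic. The sketch below is a hypothetical illustration (the function name and the class counts are assumptions, not Makya output): a predictor that labels every molecule inactive is rewarded with high accuracy as soon as actives are rare, while never identifying an active.

```python
def always_inactive_baseline(n_active, n_inactive):
    """Accuracy and recall on actives for a predictor that always says 'inactive'."""
    if n_active <= 0 or n_inactive < 0:
        raise ValueError("need at least one active and a non-negative inactive count")
    accuracy = n_inactive / (n_active + n_inactive)
    recall_on_actives = 0.0  # no active is ever predicted, so none is recovered
    return accuracy, recall_on_actives


# Hypothetical class counts, for illustration only.
print(always_inactive_baseline(50, 50))  # (0.5, 0.0)
print(always_inactive_baseline(5, 95))   # (0.95, 0.0)
```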
Imbalanced QSAR Rescoring
Let's review the results from two Fine Tuning generators. One generator was run with the balanced QSAR models that perform well, and the other with the imbalanced QSAR models that display poor recall and precision. For both generations, we analyzed the first 10,000 molecules.
For a review on the use of the fine tuning generator, please see the following use case: Lead Optimization with the Fine Tuning Generator.
Balanced Fine Tuning Generator
Imbalanced Fine Tuning Generator
Note the significant difference between the top Iktos Ranking scores (1 vs 0.73) and between the structures of the generated molecules. Using the imbalanced QSAR models results in poorly scored molecules, due to inefficient exploration of the chemical space and inaccurate predictions of molecular properties.