When setting up a new QSAR model, Makya displays the risk matrix for the selected TPP. The risk matrix is a tool to avoid anti-correlations between target objectives.
What do we mean by anti-correlation?
Two target objectives are anti-correlated when an increase in the first one leads to a decrease in the second one. For example, in a dataset of compounds whose bioactivity has been enhanced at the cost of their lipophilicity, bioactivity and lipophilicity will be anti-correlated: active compounds will be too lipophilic, and compounds with good lipophilicity will not be active enough.
A generator that learns from a dataset with anti-correlations between objectives A and B will struggle to optimize both A and B together. This is because any QSAR model trained on the dataset will tend to learn as a rule that A and B cannot be simultaneously good, which deprives the generator of useful examples.
How to read the risk matrix to check for anti-correlations?
Each cell of the risk matrix shows a percentage P% which can be understood as follows:
When considering all the compounds that are active on the row objective, P% of these compounds are also active on the column objective.
In the example above, the risk matrix reads as follows: among all the compounds with good mTOR activity (i.e. those whose pKi IC50 mTor is higher than 8.5), 197 compounds also have a registered value for pIC50 on PI3K, and this value is good (i.e. pKi IC50 Pi3K higher than 7) for 40% of these 197 compounds.
In the example above, we increased the threshold on mTOR pIC50 so that only molecules whose pIC50 is above 9 are now considered active. The effect of this stricter TPP is to cause anti-correlations between the mTOR objective and the PI3K and Caco2 permeability objectives, as shown by the red cells and low percentages.
Makya typically flags correlation percentages lower than 25%; this does not mean that models can never work well with correlations lower than 25%, but simply that the problem will be more complex and that the user should be careful.
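The reading rule and the 25% flag described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Makya's actual implementation: the toy dataset, the objective names used as dictionary keys, and the helpers is_active, risk_cell, risk_matrix and is_flagged are all hypothetical. A cell is computed only over compounds that are active on the row objective and have a registered value for the column objective.

```python
from itertools import permutations

# Hypothetical toy dataset: one dict per compound; None means no registered value.
compounds = [
    {"mTOR": 9.1, "PI3K": 6.4},
    {"mTOR": 8.7, "PI3K": 7.2},
    {"mTOR": 8.9, "PI3K": None},
    {"mTOR": 7.0, "PI3K": 7.8},
    {"mTOR": 9.3, "PI3K": 6.5},
    {"mTOR": 8.6, "PI3K": 7.6},
    {"mTOR": 8.8, "PI3K": 6.9},
]

# Assumed TPP: a compound is active on an objective when its value exceeds the threshold.
thresholds = {"mTOR": 8.5, "PI3K": 7.0}


def is_active(compound, objective, thresholds):
    """True if the compound has a value for the objective and it exceeds the threshold."""
    value = compound.get(objective)
    return value is not None and value > thresholds[objective]


def risk_cell(compounds, row, col, thresholds):
    """Return (percentage, n) for one cell.

    n counts compounds active on `row` that also have a registered value for `col`;
    percentage is the share of those n compounds that are active on `col`.
    """
    measured = [
        c for c in compounds
        if is_active(c, row, thresholds) and c.get(col) is not None
    ]
    if not measured:
        return None, 0
    n_good = sum(is_active(c, col, thresholds) for c in measured)
    return 100.0 * n_good / len(measured), len(measured)


def risk_matrix(compounds, thresholds):
    """Compute every off-diagonal cell, keyed by (row objective, column objective)."""
    return {
        (row, col): risk_cell(compounds, row, col, thresholds)
        for row, col in permutations(thresholds, 2)
    }


def is_flagged(percentage, cutoff=25.0):
    """Mimic the flag described in the text: percentages lower than the cutoff."""
    return percentage is not None and percentage < cutoff


for (row, col), (pct, n) in risk_matrix(compounds, thresholds).items():
    label = "n/a" if pct is None else f"{pct:.0f}%"
    flag = " (flagged)" if is_flagged(pct) else ""
    print(f"active on {row} -> also active on {col}: {label} of {n} compounds{flag}")
```

Note that the matrix is not symmetric: the cell for row A and column B is computed on a different subset of compounds than the cell for row B and column A, so both directions should be read.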
When should anti-correlations be checked for?
If the user's goal is to optimize multiple molecular properties in one generation (multi-parametric optimization, MPO), by simultaneously selecting multiple QSAR objectives, then the objectives should be checked for anti-correlations which would hinder MPO efforts.
There is a strong anti-correlation between two objectives I care about: what should I do?
To start with, it is better to focus on the highest-priority objectives and remove low-priority objectives from the TPP until the addition of new data makes it possible to reconsider these objectives for optimization.
If no objective can be removed from the TPP, then TPP thresholds should be relaxed so as to give more examples of good molecules to the QSAR models.
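As a rough illustration of why relaxing a threshold helps, the sketch below recomputes a single risk-matrix percentage for a strict and a relaxed threshold on objective A. The data, the threshold values and the helper co_active_percentage are hypothetical and chosen only to make the effect visible; this is not how Makya computes or suggests thresholds.

```python
# Hypothetical (value_A, value_B) pairs, e.g. two pIC50-like objectives.
pairs = [
    (9.2, 6.1),
    (9.1, 6.4),
    (8.8, 7.3),
    (8.6, 7.5),
    (8.7, 6.8),
    (7.9, 7.9),
]


def co_active_percentage(pairs, threshold_a, threshold_b):
    """Among compounds active on A, return (percentage also active on B, count active on A)."""
    active_a = [(a, b) for a, b in pairs if a > threshold_a]
    if not active_a:
        return None, 0
    n_good = sum(1 for _, b in active_a if b > threshold_b)
    return 100.0 * n_good / len(active_a), len(active_a)


# Compare a strict and a relaxed threshold on A, keeping the threshold on B fixed.
for threshold_a in (9.0, 8.5):
    pct, n = co_active_percentage(pairs, threshold_a, threshold_b=7.0)
    label = "n/a" if pct is None else f"{pct:.0f}%"
    print(f"A > {threshold_a}: {n} compounds active on A, {label} also active on B")
```

In this toy dataset, lowering the threshold on A both increases the number of compounds counted as active and raises the share that is also active on B, which is the kind of change that gives the QSAR models more examples of good molecules.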