In Makya, users can train their own QSAR models on project data and use the predictions of these models to guide generation toward molecules with promising predicted properties.
However, to guide generators efficiently and in the right direction, models need to be performant, reliable, and applicable to the chemical space being explored. Moreover, when the user selects multiple QSAR models, these models need to be compatible with one another.
A generator that learns from weak models, from anti-correlated models that set contradictory objectives, or from models whose applicability domain does not cover the chemical space we wish to explore cannot efficiently solve its optimization problem.
Am I using good models?
How do I know if my models are weak? → QSAR model scores interpretation
What impact do weak models have on generations? → How QSAR model selection can affect performances
If your models perform poorly, you can either relax the thresholds or acquire more data.
Am I in the proper applicability domain?
This question requires a qualitative understanding of your project data. If you are working on optimizing a specific chemical series, ask yourself:
- How well represented is my chemical series in my training dataset? How many data points do I have with the same scaffold?
- What are the properties of these molecules? (If every molecule with your scaffold of interest in the training data has poor permeability, generators are unlikely to find compounds with the same scaffold and good permeability.)
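The two questions above can be answered with a quick count over the training data. The sketch below assumes each training record already carries a scaffold string computed beforehand (for example a Bemis-Murcko scaffold produced by a cheminformatics toolkit) and the measured property values; the function name `scaffold_coverage`, the record layout, the indole example and the cut-off of 10 are hypothetical and chosen only for illustration.

```python
from statistics import median


def scaffold_coverage(records, scaffold, prop, good_if):
    """Summarize how well one scaffold is represented in a training set.

    records:  list of dicts with a precomputed "scaffold" key and property values
    scaffold: scaffold of interest, in the same string form as in the records
    prop:     property to inspect, e.g. "permeability"
    good_if:  callable returning True when a property value is acceptable
    """
    same_scaffold = [r for r in records if r["scaffold"] == scaffold]
    # Only molecules with a measured value can tell us about the property.
    values = [r[prop] for r in same_scaffold if r.get(prop) is not None]
    return {
        "n_scaffold": len(same_scaffold),
        "n_measured": len(values),
        "n_acceptable": sum(1 for v in values if good_if(v)),
        "median": median(values) if values else None,
    }


# Hypothetical training records: three indoles, one pyridine.
data = [
    {"scaffold": "c1ccc2[nH]ccc2c1", "permeability": 0.8},
    {"scaffold": "c1ccc2[nH]ccc2c1", "permeability": 1.2},
    {"scaffold": "c1ccc2[nH]ccc2c1", "permeability": None},
    {"scaffold": "c1ccncc1", "permeability": 15.0},
]
# Illustrative cut-off: permeability >= 10 counts as acceptable.
print(scaffold_coverage(data, "c1ccc2[nH]ccc2c1", "permeability", lambda v: v >= 10))
```

Few measured examples, or zero acceptable ones, for the scaffold of interest is exactly the situation described above, in which generators are unlikely to find good compounds within that series.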
To help assess whether your generated molecules fall within the applicability domain of your models, we developed an in-house Confidence Score.
From: Sushko, I. (2011). Applicability domain of QSAR models (Doctoral dissertation, Technische Universität München).
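The internals of the Confidence Score are not described here. As a generic illustration of what an applicability-domain check can look like, the sketch below implements a common distance-based approach: a molecule is treated as in domain when its average Tanimoto similarity to its k most similar training molecules exceeds a threshold. It is a swapped-in technique, not Makya's method. Fingerprints are assumed to be given as sets of on-bit indices, and the names `tanimoto` and `in_domain`, the defaults `k=3` and `min_similarity=0.4`, and the toy fingerprints are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_domain(query, training, k=3, min_similarity=0.4):
    """Illustrative k-nearest-neighbour applicability-domain check.

    Returns (score, inside), where score is the mean Tanimoto similarity of the
    query to its k most similar training fingerprints and inside is True when
    that score reaches min_similarity (a hypothetical threshold).
    """
    if not training:
        raise ValueError("training set is empty")
    # Rank all training molecules by similarity to the query, most similar first.
    sims = sorted((tanimoto(query, fp) for fp in training), reverse=True)
    top = sims[:k]
    score = sum(top) / len(top)
    return score, score >= min_similarity


# Toy fingerprints: three related training molecules and one unrelated one.
train_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {2, 3, 4, 6}, {10, 11, 12}]
print(in_domain({1, 2, 3, 4}, train_fps))    # close to the training data
print(in_domain({20, 21, 22}, train_fps))    # shares no bits with the training data
```

A low score does not mean a prediction is wrong, only that the model has seen few similar molecules, so its prediction deserves less trust.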