The Fine Tuning generator has been designed to find optimal molecules that maximize many constraints simultaneously (models, descriptors, substructures, similarity...).
The Fine Tuning generator produces molecules close to the chemical space, because the goal is not to find something different or novel, but to find a molecule that satisfies the consensus of all constraints. The Fine Tuning generator is also often used with QSAR models, which by design require staying close to the training dataset because of applicability domain constraints.
Use the Fine Tuning generator when the search is complex and fragment-based generation is not applicable.
NOTE:
- Play with the chemical space: by changing it, you guide the Fine Tuning generator in a different direction, as described in this article.
- Do not hesitate to add your own ideas as starting points.
Create a new Fine Tuning
To access the Fine Tuning generator, click on the New Generator box in the Generator tab. Specify a name for this generator, and under "Generator Engine" select Fine Tuning.
Example Use-Case
Set-up
In theory, none of the set-up options are mandatory, but you are encouraged to use your best judgement from a chemistry perspective to obtain meaningful results within the applicability domain of the project:
- Chemical Space (Recommended): Select ≥1 datasets uploaded to this project, or specify SMILES that can be used to guide the generation
- Products: Define structural constraints on your final generated molecules. In the Substructures tab, add the substructures that you want to see (matching set) or to forbid (forbidden set); in the Pains/Tox tab, select Pains/Tox dictionaries.
- QSAR: Select among trained QSAR models to guide the generation around a defined target product profile (see QSAR)
- API: Plug in external scores or models that can be accessed to guide this generation (see Scoring APIs)
- Scorers: Select any scores that must be calculated during the generation (see Scorers)
Example of an output
Advanced
The Fine Tuning generator is a sequence-based generator, i.e. it generates SMILES strings. It is designed to reproduce the distribution of molecules in a training dataset. It has been pre-trained on large public databases (like ChEMBL) to learn the SMILES syntax. Without any further training it generates diverse molecules from a very broad chemical space, which is of little interest when working on specific series or chemotypes. That is why the chemical space is so crucial: the pre-trained generator is fine-tuned on the chemical space to learn its distribution and generate compounds within it. This initialization is a preliminary step to optimization. First, the global generator is focused on the chemical space; then it starts optimizing the given predicted properties.
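The fine-tune-then-optimize flow described above can be illustrated with a minimal toy sketch (pure Python, no real generative model; `pretrained_vocab`, `chemical_space`, and `score` are hypothetical stand-ins for the pre-trained model, the project datasets, and the predicted properties):

```python
import random

random.seed(0)

# Toy stand-ins: a "pre-trained" generator knows the broad SMILES vocabulary;
# the project chemical space is a small set of project-specific molecules.
pretrained_vocab = list("CNOc1()=")          # broad syntax knowledge
chemical_space = ["CCO", "CCN", "CCOC"]      # project-specific molecules

def sample(vocab, length=4):
    """Sample a random string from the current vocabulary (toy 'generation')."""
    return "".join(random.choice(vocab) for _ in range(length))

# Phase 1 (initialization): "fine-tuning" here simply restricts the
# generator's distribution to the characters seen in the chemical space.
finetuned_vocab = sorted({ch for smi in chemical_space for ch in smi})

# Phase 2 (optimization): rank candidates under a hypothetical score.
def score(smi):
    return smi.count("O")  # toy objective: prefer oxygen-rich strings

candidates = [sample(finetuned_vocab) for _ in range(50)]
best = max(candidates, key=score)
print(best, score(best))
```

In the real generator both phases act on a learned sequence model rather than a character vocabulary, but the order is the same: focus on the chemical space first, then optimize the objectives.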
During the initialization step, not one but multiple generators (called agents) each focus on a subset of the input chemical space. If the chemical space is diverse (e.g. containing multiple chemical series), each agent focuses on a different part of it, ensuring that the full potential of the chemical space is exploited. Keep this in mind when designing a chemical space: compounds of no actual interest should not be included, otherwise resources will be spent training agents on these molecules. A typical case is an initial dataset containing historical data from series that have been discarded. This data should be removed from the chemical space, or else Makya will generate within these series. The predictors, however, should still be trained on the whole dataset.
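The one-agent-per-subset idea can be sketched as a simple clustering step (an illustrative stand-in only: `bigrams` uses character bigrams as a crude proxy for structural fingerprints, whereas the real agents work on learned structural similarity):

```python
def bigrams(smi):
    """Character bigrams as a crude stand-in for structural fingerprints."""
    return {smi[i:i + 2] for i in range(len(smi) - 1)}

def jaccard(a, b):
    """Jaccard similarity between two fingerprint sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def assign_agents(chemical_space, threshold=0.3):
    """Greedily group similar molecules; each cluster gets its own agent."""
    clusters = []
    for smi in chemical_space:
        fp = bigrams(smi)
        for cluster in clusters:
            if jaccard(fp, bigrams(cluster[0])) >= threshold:
                cluster.append(smi)
                break
        else:
            clusters.append([smi])  # new cluster -> new agent
    return clusters

# Two toy series: an aliphatic one and an aromatic one -> two agents.
space = ["CCO", "CCOC", "c1ccccc1", "c1ccccc1C"]
print(assign_agents(space))
```

Running this on a space containing discarded historical series would, likewise, spawn agents for those series, which is exactly why such compounds should be removed from the chemical space.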