Problem: How can Makya be used to perform scaffold hopping on a given molecule, and thereby generate enough structural diversity to get around existing patents?
In this scenario, we work with a reference compound and a chemical series that includes measured activity data. The goal is to replace the scaffold (highlighted by the dotted line in the compound visualization) with new fragments that are:
- Commercially available as building blocks
- High in synthetic feasibility
- Structurally similar to the original series
You can download the dataset used in this example here.
Dataset Overview
The dataset focuses on compounds targeting PIM-1 kinase and includes:
- A column labeled "Molecule" with SMILES strings
- A corresponding "pIC50" column containing activity values
Step 1: Upload the project data
Upload the dataset by clicking on New Data Set and give it a name, such as “Data_pim1_pic50”.
Click on CREATE DATASET. This takes us back to the "Datasets" tab.
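Before uploading, it can help to confirm that the file has the two expected columns and that every activity value is numeric. The sketch below assumes the dataset is a comma-separated file with a header row; the file format is not stated above, and the two rows shown are placeholders rather than real PIM-1 data. The helper name `load_rows` is hypothetical and not part of Makya.

```python
import csv
import io

# Placeholder excerpt standing in for the downloaded file (values are NOT real data).
SAMPLE_CSV = """Molecule,pIC50
CCO,6.2
c1ccccc1,8.4
"""

REQUIRED_COLUMNS = ("Molecule", "pIC50")  # column names given in the Dataset Overview


def load_rows(handle):
    """Return a list of (smiles, pIC50) tuples, raising ValueError on bad input."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing column(s): {missing}")

    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        smiles = (row["Molecule"] or "").strip()
        if not smiles:
            raise ValueError(f"line {line_no}: empty SMILES")
        try:
            pic50 = float(row["pIC50"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: non-numeric pIC50") from None
        rows.append((smiles, pic50))
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(f"{len(rows)} rows look well-formed")
```

To check a real file, replace the `io.StringIO(SAMPLE_CSV)` argument with an open file handle, for example `open(path, newline="")`.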

Step 2: Train the QSAR model
Next, set up the QSAR model as was done in Use Case 1, using the dataset uploaded in Step 1. Use only the high-activity molecules (pIC50 > 8) as the target for this model. Name this predictor “pim1_pic50_fragment_linking”.
Click Save and go to the QSAR tab. There, you will see your QSAR model set up and ready to be trained. Click Run:
After some time, we see the following window:
As in Use Case 1, the predictor performance can be checked by clicking on See Results.
Note that this QSAR model guides the generator by biasing it towards molecules with higher predicted pIC50 values. The Fragment Linking generator can also be run without any QSAR model, so this step can be skipped.
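To get a feel for what the pIC50 > 8 target means, the sketch below splits a list of records at that threshold and converts the cutoff to an IC50 value, using the standard definition pIC50 = -log10(IC50 in mol/L). The records are placeholders, the function names are hypothetical, and how Makya applies the threshold internally is not described here.

```python
ACTIVITY_THRESHOLD = 8.0  # pIC50 cutoff used for the QSAR target in this use case


def pic50_to_ic50_nm(pic50):
    """Convert pIC50 (-log10 of IC50 in mol/L) to IC50 in nanomolar."""
    return 10 ** (9 - pic50)


def split_by_activity(records, threshold=ACTIVITY_THRESHOLD):
    """Split (smiles, pIC50) records into those strictly above the threshold and the rest."""
    actives = [(s, p) for s, p in records if p > threshold]
    others = [(s, p) for s, p in records if p <= threshold]
    return actives, others


# Placeholder records (NOT real PIM-1 measurements).
records = [("CCO", 6.2), ("c1ccccc1", 8.4), ("CCN", 8.0)]

actives, others = split_by_activity(records)
print(f"{len(actives)} above threshold, {len(others)} at or below")
print(f"pIC50 {ACTIVITY_THRESHOLD:g} corresponds to IC50 = "
      f"{pic50_to_ic50_nm(ACTIVITY_THRESHOLD):g} nM")
```

Counting the molecules above the threshold before training is a quick way to see whether the target class is large enough to learn from.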
Step 3: Create the generator
Go directly to the "Generators" tab of the project page and click on the New Generator box. Name the generator “scaffold_hopping”. Under "Generation Engine", select Fragment Linking and click on Next. On the following page, specify the various options to set up this generator.
- Chemical Space: Check the box next to the Data_pim1_pic50 dataset. This is the dataset that was uploaded in Step 1 (or in Use Case 1); it guides the generation towards molecules similar to this chemical space. This step is optional: the generator can also run without a chemical space (series), which yields molecules with high novelty that may lie further from the applicability domain of your QSAR models. The generator can also run with only the reference molecule specified as the chemical space.
- Exit Vectors: On this tab, you will see two windows. Each allows you to input one building block to which the scaffold will be attached.
Click on the pencil icon to input the SMILES of the reference molecule in the sketcher:
NC1CCCC(C1)Nc1ccc2cccc(-c3cc4c(CCNC4=O)[nH]3)c2n1
Once the above molecule appears, use the eraser tool to remove the scaffold such that only the following fragment remains. Attach a Br to the methyl cap of the fused heterobicyclic ring; this Br will serve as the exit vector (reaction centre) in the following step.
Click DONE and repeat the process in the other window for the following building block:
Then, the Exit Vectors tab should look like this:
Click on Set for each of the intermediates to define their exit vectors:
The scaffold will be attached only at these exit vectors, while the rest of each building block remains unchanged. Click Save.
- QSAR: Click on the blue box “pIC50” to add the QSAR model as the target.
The QSAR page should then look like this:
Click Save.
Go back to the "Generators" tab, and the newly set up generator will show up. Click on the Run button for this generator. The progress bar will indicate the status of generation.
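When the reference SMILES is copied into the sketcher in the Exit Vectors step, a truncated or mistyped string is easy to miss. The sketch below is a rough, pure-Python sanity check that only looks at balanced branches, closed bracket atoms, and paired ring-closure labels. It is not a SMILES parser and does not replace a cheminformatics toolkit; the function name `smiles_sanity_issues` is hypothetical.

```python
REFERENCE_SMILES = "NC1CCCC(C1)Nc1ccc2cccc(-c3cc4c(CCNC4=O)[nH]3)c2n1"


def smiles_sanity_issues(smiles):
    """Return a list of structural problems found in the string (empty if none)."""
    issues = []
    depth = 0           # current nesting level of '(' branches
    open_rings = set()  # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip the whole bracket atom so isotopes, H counts and charges
            # are not mistaken for ring-closure digits.
            end = smiles.find("]", i)
            if end == -1:
                issues.append("unclosed '[' bracket atom")
                break
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                issues.append("')' without matching '('")
            else:
                depth -= 1
        elif ch == "%":
            # Two-digit ring-closure label written as %nn.
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and label.isdigit():
                open_rings ^= {label}
                i += 3
                continue
            issues.append("malformed '%nn' ring label")
        elif ch.isdigit():
            open_rings ^= {ch}  # toggle: first sighting opens, second closes
        i += 1
    if depth > 0:
        issues.append("unclosed '(' branch")
    if open_rings:
        issues.append("unpaired ring-closure label(s): " + ", ".join(sorted(open_rings)))
    return issues


problems = smiles_sanity_issues(REFERENCE_SMILES)
print("no structural issues found" if not problems else problems)
```

A string that passes this check can still be chemically invalid (wrong valence, bad aromaticity), so the sketcher rendering remains the real confirmation.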
Step 4: Explore the generated molecules
Visualise the generated molecules by clicking the eye icon. Sort the grid of generated molecules by Retrosynthesis Score in descending order. Because scaffold hopping deliberately produces novel structures, this sorting brings the more synthesizable molecules to the top. The examples from this generation (see figure below) make it clear that the core of the reference compound has been replaced with novel scaffolds, while the rest of the molecule is kept intact. The generated molecules can be exported to a CSV file for further analysis or for a subsequent iteration of generation in Makya.
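The same descending sort can be reproduced on the exported CSV for offline analysis. The sketch below assumes the export has a header row with columns named "SMILES" and "Retrosynthesis Score"; the actual column names in a Makya export may differ, and the rows shown are placeholders, not generated results.

```python
import csv
import io

# Placeholder export (column names and values are assumptions, NOT real output).
EXPORT_CSV = """SMILES,Retrosynthesis Score
CCO,0.42
CCN,0.91
CCC,0.67
"""


def rank_by_score(handle, score_column="Retrosynthesis Score"):
    """Read CSV rows and return them sorted by score, highest first."""
    rows = list(csv.DictReader(handle))
    return sorted(rows, key=lambda row: float(row[score_column]), reverse=True)


ranked = rank_by_score(io.StringIO(EXPORT_CSV))
for row in ranked:
    print(row["SMILES"], row["Retrosynthesis Score"])
```

Passing a different `score_column` ranks by any other numeric column present in the export.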
Note:
This use case aims to demonstrate how scaffold hopping can be achieved and what to expect in terms of results. Scaffold hopping from a reference compound carries an inherent risk of the results falling outside the applicability domain of the problem. It is highly recommended that the user guide this generation with an external score relevant to the project (for example, 3D scores), and do so iteratively.