Problem: How can Makya be used to perform scaffold hopping on a given molecule, generating structural diversity that gets around existing patents?
In this scenario, we work with a reference compound and a chemical series that includes measured activity data. The goal is to replace the scaffold (highlighted by the dotted line in the compound visualization) with new fragments that are:
- Commercially available as building blocks
- High in synthetic feasibility
- Structurally similar to the original series
You can download the dataset used in this example here.
Dataset Overview
The dataset focuses on compounds targeting PIM-1 kinase and includes:
- A column labeled "Molecule" with SMILES strings
- A corresponding "pIC50" column containing activity values
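Before uploading, it can help to confirm that a file really has the two columns described above. The following is a minimal sketch using only the Python standard library; the sample rows are hypothetical placeholders (not values from the PIM-1 dataset), and `load_dataset` is an illustrative helper, not part of Makya.

```python
import csv
import io

# Hypothetical two-row sample in the layout described above; the SMILES and
# pIC50 values are placeholders, not taken from the real PIM-1 dataset.
SAMPLE_CSV = """Molecule,pIC50
CCO,6.5
c1ccccc1,8.2
"""

REQUIRED_COLUMNS = ("Molecule", "pIC50")


def load_dataset(handle):
    """Read rows and check that both expected columns are present and usable."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        smiles = (row["Molecule"] or "").strip()
        if not smiles:
            raise ValueError(f"line {line_no}: empty SMILES")
        try:
            pic50 = float(row["pIC50"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: pIC50 is not a number") from None
        rows.append({"Molecule": smiles, "pIC50": pic50})
    return rows


rows = load_dataset(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows loaded")
```

To check a real file, pass an open file handle instead of the in-memory sample, for example `load_dataset(open(path, newline=""))`.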
Step 1: Upload the project data
Upload the dataset by clicking on New Data Set. Click Start Cleaning and name the dataset, for example Data_pim1_pic50.
Click Upload. The uploaded dataset can then be confirmed in the dataset tab.
Step 2: Train the QSAR model
Set up the QSAR as was done in Use Case 1, using the dataset uploaded in Step 1. Use only the high-activity molecules (pIC50 > 8) as the target for this model. Name this predictor pim1_pic50_fragment_linking.
If you go back to the QSAR tab, the newly created predictor should appear on this page. Click on Run to train the models. Once training completes, the progress bar turns green and two new buttons, Test and See Results, appear.
As in Use Case 1, the predictor performance can be checked by clicking on See Results.
Note that this QSAR model guides the generator, biasing it towards molecules with higher predicted pIC50 values. The Fragment Linking generator can also be run without specifying any QSAR model, so this step is optional.
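To get a feel for how many molecules in a series fall above the pIC50 > 8 cutoff used for this predictor, the split can be reproduced offline. This is only a sketch of the thresholding idea, not of how Makya builds the model; the records and the `split_by_activity` helper are hypothetical.

```python
# Hypothetical records; in practice these would come from the uploaded dataset.
records = [
    {"Molecule": "CCO", "pIC50": 6.5},
    {"Molecule": "c1ccccc1", "pIC50": 8.2},
    {"Molecule": "CCN", "pIC50": 8.0},
]

PIC50_THRESHOLD = 8.0  # from the text: only molecules with pIC50 > 8 are targets


def split_by_activity(records, threshold=PIC50_THRESHOLD):
    """Return (high, rest), where high holds records strictly above threshold."""
    high = [r for r in records if r["pIC50"] > threshold]
    rest = [r for r in records if r["pIC50"] <= threshold]
    return high, rest


high, rest = split_by_activity(records)
print(f"{len(high)} high-activity, {len(rest)} other")
```

The comparison is strict, so a molecule with pIC50 exactly 8 is not counted as high activity here; whether the platform treats the boundary the same way is an assumption worth checking.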
Step 3: Create the generator
Go to the Generators tab of the project page and click on New Generator. Name the generator scaffold_hopping. Under "Generation type", select Fragment Linking and click on Next. On the following page, specify the various options to set up this generator.
- Chemical Space: Check the box next to the Data_pim1_pic50 dataset. This is the dataset that was uploaded in Step 1; it guides the generation towards molecules similar to this chemical space. This setting is optional: the generator can also run without a chemical space (series), which yields molecules with high novelty that may lie further from the applicability domain of your QSAR models. The generator can also run with the reference molecule alone as the chemical space.
- Exit Vectors: On this tab, you will see two windows. Each lets you input one building block to which the scaffold will be attached.
Click on the pencil icon to input the SMILES of the reference molecule in the sketcher:
NC1CCCC(C1)Nc1ccc2cccc(-c3cc4c(CCNC4=O)[nH]3)c2n1
Once the above molecule appears, use the eraser tool to remove the scaffold such that only the following fragment remains. Attach a Br to the methyl cap of the fused heterobicyclic ring; this Br will be used as the exit vector (reaction centre) in the following step.
Click DONE and repeat the process for the other window for the following building block:
Then, the Exit Vectors tab should look like this:
Click on the arrow next to the pencil icon for each of the intermediates to define their exit vectors:
It is only at these exit vectors that the scaffold will be attached while the rest of the building blocks will remain unchanged. Click Save.
- QSAR: Click on the blue box “pIC50” to add the QSAR model as the target.
The QSAR page then should look like this:
Click Save.
Go back to the "Generators" tab, and the newly set up generator will show up. Click on the Run button for this generator. The progress bar will indicate the status of generation.
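A copy-paste slip in the reference SMILES from the Exit Vectors step (a dropped parenthesis or ring-closure digit) is easy to miss before it reaches the sketcher. The sketch below is a cheap, standard-library-only syntax check; it is not a SMILES parser, does not validate chemistry, and `quick_smiles_check` is a hypothetical helper rather than anything provided by Makya.

```python
REFERENCE_SMILES = "NC1CCCC(C1)Nc1ccc2cccc(-c3cc4c(CCNC4=O)[nH]3)c2n1"

DIGITS = "0123456789"


def quick_smiles_check(smiles):
    """Cheap syntax check: balanced brackets and paired ring-closure labels."""
    depth = 0            # current nesting depth of branch parentheses
    in_bracket = False   # True while inside a [...] atom
    open_rings = set()   # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside [...] are isotopes, H counts, or charges: skip them.
            if ch == "[":
                return False
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring closure such as %12.
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or any(c not in DIGITS for c in label):
                return False
            open_rings ^= {int(label)}
            i += 2
        elif ch in DIGITS:
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not in_bracket and not open_rings


print(quick_smiles_check(REFERENCE_SMILES))
```

A string that passes this check can still be chemically invalid; the sketcher remains the authority on whether the structure is accepted.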
Step 4: Explore the generated molecules
Visualize the generated molecules by clicking the eye icon. Sort the grid of generated molecules by Retrosynthesis Score in descending order. Because scaffold hopping produces novel structures, this sorting is a useful way to bring the more synthesizable molecules to the top. The examples from this generation (see figure below) make it clear that the core of the reference compound has been replaced with novel scaffolds, while the rest of the molecule is kept intact. The generated molecules can be exported to a CSV file for further analysis or for a subsequent iteration of generation in Makya.
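The same descending sort can be reproduced on an exported CSV for offline analysis. This is a minimal sketch: the column names `SMILES` and `Retrosynthesis Score` and all values are assumptions made for illustration, so adjust them to match the header of the actual export.

```python
import csv
import io

# Hypothetical export; the real column names and values in a Makya CSV export
# may differ, so adjust SCORE_COLUMN to match the actual header.
EXPORT_CSV = """SMILES,Retrosynthesis Score
CCO,0.42
CCN,0.91
CCC,0.67
"""

SCORE_COLUMN = "Retrosynthesis Score"


def sort_by_score(handle, column=SCORE_COLUMN):
    """Return rows sorted by the given numeric column, highest first."""
    rows = list(csv.DictReader(handle))
    return sorted(rows, key=lambda r: float(r[column]), reverse=True)


ranked = sort_by_score(io.StringIO(EXPORT_CSV))
for row in ranked:
    print(row["SMILES"], row[SCORE_COLUMN])
```

Converting the score with `float` matters: sorting the raw strings would order values lexicographically rather than numerically.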
Note:
This use case aims to demonstrate how scaffold hopping can be achieved and what to expect in terms of results. Scaffold hopping from a reference compound carries an inherent risk of the results falling outside the applicability domain of the problem. It is highly recommended that the user guide this generation with an external score relevant to the project (for example 3D scores), and do so in an iterative manner.