Problem: Starting from a hit intermediate, how do you generate novel molecules with drug-like properties in Makya?
This guide walks through the step-by-step process of generating novel molecules from a fragment associated with strong PI3K activity.
You can download the dataset used in this example here.
Note:
If you already completed Use Case 1 and set up the PI3K QSAR model, you can skip ahead to Step 3. Otherwise, begin with Steps 1 and 2 to prepare the foundation.
Dataset Overview
The dataset used in this guide is based on a PI3K-mTOR chemical series. It includes:
- SMILES strings representing chemical structures
- Multiple objective columns, such as:
  - pKi IC50 PI3K
  - pKi IC50 mTOR
  - Water Solubility
  - Caco-2 Permeability
  - CYP1A2 inhibitor
  - CYP3A4 inhibitor
  - Total Clearance
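Before uploading, it can help to confirm that the CSV header contains the columns listed above. The following is a minimal sketch using only the Python standard library. The `SMILES` header name, the exact spelling of the objective headers, and the helper name `missing_columns` are assumptions for illustration; adjust them to match your file.

```python
import csv
import io

# Column names as listed in the dataset overview. The exact header spelling
# in your CSV is an assumption and may differ (for example "Caco2" vs "Caco-2").
EXPECTED_COLUMNS = [
    "SMILES",
    "pKi IC50 PI3K",
    "pKi IC50 mTOR",
    "Water Solubility",
    "Caco-2 Permeability",
    "CYP1A2 inhibitor",
    "CYP3A4 inhibitor",
    "Total Clearance",
]


def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = [name.strip() for name in next(reader, [])]
    return [name for name in expected if name not in header]


# Demo on an in-memory header that lacks two columns (no real data involved).
demo = io.StringIO(
    "SMILES,pKi IC50 PI3K,pKi IC50 mTOR,Water Solubility,"
    "Caco-2 Permeability,Total Clearance\n"
)
print(missing_columns(demo))
```

To check a real file, open it with `open(path, newline="")` and pass the file object to `missing_columns`; an empty list means every expected column was found.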
Step 1: Upload the project data
The following image shows a screen grab of the example dataset in MS Excel. This dataset is also available in the default project of your Makya installation.
On the project homepage in Makya, go to the "Datasets" tab and click on the New Data Set box. Name the dataset Data_pi3k_mtor (or any other name of your choice). Then click on the Choose File button, navigate to the location of the CSV file, and click Open. Once uploaded, the dataset will appear in the right-side panel, with the SMILES replaced by 2D chemical structures.
Click on Create Dataset to upload this data to the project. Click again on the "Datasets" tab, and the newly created dataset should now appear on the page.
Step 2: Train the QSAR model
Next, on the project homepage, go to the QSAR tab and click on the New QSAR box. On the following page, name the predictor “Lead_optimization_predictor” (or any other name) and click Next to go to the next page. Go to the "Targets" sub-tab and select the dataset Data_pi3k_mtor (created and uploaded in Step 1) in the Dataset dropdown menu. In the adjacent Target dropdown menu, select the following objectives, and after each selection, click Add Target.
- pKi IC50 Pi3K
- pKi IC50 mTor
- Water Solubility
- Caco2 permeability
- Total Clearance
A number of sliders corresponding to each objective will appear on the screen. Use these sliders to define the thresholds for each objective while ensuring that there are always molecules that fall in and out of the thresholds (indicated by Molecules In and Molecules Out). An alternative to using the sliders is directly entering the desired value in the min or max threshold boxes for each objective. For this example use case, set the following thresholds.
| Objective | Threshold |
| --- | --- |
| pKi IC50 Pi3K | 7 (min) |
| pKi IC50 mTor | 8.5 (min) |
| Water Solubility | -4.0 (max) |
| Caco2 permeability | 0.9 (max) |
| Total Clearance | 0.5 (min) |
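The Molecules In and Molecules Out counts can be reasoned about with a small sketch like the one below. It assumes that a molecule is "in" when its value satisfies the min or max bound inclusively; Makya's exact rule is not stated in this guide, and the function name `count_in_out` and the toy values are hypothetical.

```python
def count_in_out(values, minimum=None, maximum=None):
    """Count values that satisfy the bounds (in) and those that do not (out)."""
    inside = 0
    for value in values:
        above_min = minimum is None or value >= minimum
        below_max = maximum is None or value <= maximum
        if above_min and below_max:
            inside += 1
    return inside, len(values) - inside


# Made-up pKi values for illustration only; they are not from the dataset.
toy_pki = [6.2, 7.0, 7.8, 8.9, 5.5]

# Threshold from the example: pKi IC50 Pi3K with a minimum of 7.
molecules_in, molecules_out = count_in_out(toy_pki, minimum=7)
print(molecules_in, molecules_out)
```

If either count is zero for a given objective, the threshold leaves no molecules on one side, which is the situation the guide asks you to avoid.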
After setting the thresholds, click Save. Go back to the QSAR tab and the newly created predictor should appear. Click on Run to train the models. Once completed, the progress bar will turn green, and two new buttons Test and See Results will appear.
Clicking on the See Results button displays the metrics curves for each model trained on each objective. These can be used to assess the performance or quality of the trained models. If any of the models is deemed unsatisfactory, the process can be repeated, for instance with relaxed thresholds. Clicking on the Test button allows you to input an individual SMILES or a file of SMILES to be evaluated by the model.
Step 3: Create the generator
We will now set up and launch our generator. On the project homepage, go directly to the "Generators" tab and click on New Generator. Name the generator “Hit_growing_generator” (or any other name). Under "Generation Engine", select Fragment growing and click on Next.
On the following pages, specify the settings of the generator:
- Chemical Space: Check the box next to the Data_pi3k_mtor dataset. This is the dataset that was created and uploaded in Step 1 (or in Use Case 1); it guides the generation toward molecules similar to this chemical space.
- Exit Vectors: In the text field below "Enter SMILES", copy-paste the following SMILES:
Cc1ncnc(Cl)c1C#Cc1ccc(N)nc1
This is our hit intermediate.
We then need to select exit vector(s) for our intermediate. In the top panel showing the intermediate's 2D structure, click on Set. The intermediate’s atoms are numbered. In the text box, type “6” and hit Enter. This selects the C-Cl bond as the exit vector, the position at which the generator will attach fragments to generate new molecules while keeping the intermediate’s structure intact (in particular, no fragment will be attached to any of the other functional groups, which are considered unreactive). Click Save.
If desired, an additional exit vector could also be selected in the same manner.
- QSAR: On the right-hand side of the Lead_optimization_predictor QSAR, select the following objectives: pKi IC50 PI3K, Water Solubility, Caco2 permeability, and Total Clearance. Click on the Save button at the bottom of the page. The generation will be optimized on these objectives.
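For the Exit Vectors setting, it can be useful to see where the chlorine sits in the hit intermediate's SMILES. The sketch below lists the atoms in the order they are written, using a simplified regular-expression tokenizer rather than a full SMILES parser. Whether Makya numbers atoms in this same order, and whether it counts from 0 or 1, is not stated in this guide, so always rely on the numbering shown in the Makya structure panel.

```python
import re

# Simplified tokenizer: matches bracket atoms, two-letter halogens, and the
# common organic-subset atoms. It is not a full SMILES parser.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def atoms_in_written_order(smiles):
    """Return the atom symbols in the order they appear in the SMILES string."""
    return ATOM_PATTERN.findall(smiles)


# Hit intermediate from this guide.
hit_intermediate = "Cc1ncnc(Cl)c1C#Cc1ccc(N)nc1"
atoms = atoms_in_written_order(hit_intermediate)

# Print each atom with its 0-based position in the written order.
for index, symbol in enumerate(atoms):
    print(index, symbol)
```

In this written order, position 6 counted from 0 is the chlorine, and position 6 counted from 1 is the ring carbon bonded to it, so either convention points at the C-Cl bond described above.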
Go back to the "Generators" tab, and the newly set up generator will show up. Click on the Run button for this generator. The progress bar will indicate the status of generation.
Step 4: Explore the generated molecules
While the generation is still running, you can visualize the generated molecules. In the Generators tab, clicking on the eye icon below “In Blueprint” will directly show you the molecules matching your specified blueprint. Use the Parallel Coordinates to show the generated molecules’ properties and select, for example, the molecules whose Iktos ranking is higher than 0.7 (this is done by dragging the mouse from 0.7 to 1 on the Iktos ranking scale).
You can then select the molecules, and export them as a .csv or add them to the cart to share them easily. You can also create a dataset from these molecules; such a dataset can then be used as a chemical space for the fine-tuning generator.
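If you export the molecules as a .csv, the same ranking filter can be reproduced offline. The sketch below uses only the Python standard library. The column names `SMILES` and `Iktos ranking` are assumptions about the export format, and the rows are placeholders invented for the demo; check the header of your actual export before using it.

```python
import csv
import io


def select_by_ranking(csv_file, column="Iktos ranking", minimum=0.7):
    """Return the rows whose ranking value is at least `minimum`."""
    selected = []
    for row in csv.DictReader(csv_file):
        if float(row[column]) >= minimum:
            selected.append(row)
    return selected


# Placeholder export: the column names and values are invented for the demo.
demo_export = io.StringIO(
    "SMILES,Iktos ranking\n"
    "placeholder_1,0.91\n"
    "placeholder_2,0.42\n"
    "placeholder_3,0.70\n"
)
kept = select_by_ranking(demo_export)
print([row["SMILES"] for row in kept])
```

This version treats a ranking of exactly 0.7 as selected; change `>=` to `>` if you want a strict cutoff.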