Problem: Starting from a hit intermediate, how do you generate novel molecules with drug-like properties in Makya?
This guide walks through the step-by-step process of generating novel molecules from a fragment associated with strong PI3K activity.
You can download the dataset used in this example here.
Note:
If you already completed Use Case 1 and set up the PI3K QSAR model, you can skip ahead to Step 3. Otherwise, begin with Steps 1 and 2 to prepare the foundation.
Dataset Overview
The dataset used in this guide is based on a PI3K-mTOR chemical series. It includes:
- SMILES strings representing chemical structures
- Multiple objective columns, such as:
  - pKi IC50 PI3K
  - pKi IC50 mTOR
  - Water Solubility
  - Caco-2 Permeability
  - CYP1A2 inhibitor
  - CYP3A4 inhibitor
  - Total Clearance
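Before uploading, it can help to confirm that the CSV header contains the columns listed above. The following is a minimal sketch using only the Python standard library. The column spellings are assumptions copied from the list in this guide, and `missing_columns` is a hypothetical helper, not part of Makya; adjust the names to match the header of your actual file.

```python
import csv
import io

# Assumed column names, copied from the list in this guide.
# The header of the real file may be spelled differently.
EXPECTED_COLUMNS = [
    "SMILES",
    "pKi IC50 PI3K",
    "pKi IC50 mTOR",
    "Water Solubility",
    "Caco-2 Permeability",
    "CYP1A2 inhibitor",
    "CYP3A4 inhibitor",
    "Total Clearance",
]


def missing_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    return [name for name in expected if name not in header]
```

To check a file on disk, read it into a string first and pass it to `missing_columns`; an empty list means every expected column was found.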
Step 1: Upload the project data
The following image shows a screenshot of the example dataset in MS Excel. This dataset is available in the default project of your Makya installation.
Upload this dataset to Makya by clicking New Dataset, then either drop your file or click Click to select one and navigate to the location of the CSV file. Once uploaded, the dataset appears with the SMILES replaced by 2D chemical structures. Click Start Cleaning and, once cleaning is done, name the dataset Data_pi3k_mtor (or any other name of your choice) and save it.
If you go back to the Datasets tab, you will see the newly uploaded dataset, as shown below.
Step 2: Train the QSAR model
Now that the data is uploaded, you can build predictive models, which can be used to score a new idea you may have or to guide the generation process. To do so, go to the QSAR tab and click New QSAR. On the following page, select the dataset Data_pi3k_mtor that you uploaded in the previous step.
Seven objectives will appear on the screen. Use the sliders next to each objective to define the thresholds for the five objectives listed below. Make sure that the number of molecules falling inside and outside each threshold (indicated by Molecules In and Molecules Out) stays balanced. As an alternative to the sliders, you can enter the desired value directly in the min or max threshold box for each objective. For this example use case, set the following thresholds.
| Objective | Threshold |
| --- | --- |
| pKi IC50 PI3K | 7 (min) |
| pKi IC50 mTOR | 8.5 (min) |
| Water Solubility | -4.0 (max) |
| Caco2 permeability | 0.9 (max) |
| Total Clearance | 0.5 (min) |
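The balance check described above (Molecules In versus Molecules Out) can also be estimated outside Makya before choosing a threshold. The sketch below is an illustration only: `split_counts` and `is_balanced` are hypothetical helpers, treating the boundary value as "in" is an assumption about how Makya counts, and the 0.3 cutoff is an arbitrary illustrative choice rather than a Makya setting.

```python
def split_counts(values, threshold, kind):
    """Count how many values fall inside and outside a threshold.

    kind="min": a value is "in" when value >= threshold.
    kind="max": a value is "in" when value <= threshold.
    Inclusive boundaries are an assumption, not documented Makya behavior.
    """
    if kind not in ("min", "max"):
        raise ValueError("kind must be 'min' or 'max'")
    if kind == "min":
        n_in = sum(1 for v in values if v >= threshold)
    else:
        n_in = sum(1 for v in values if v <= threshold)
    return n_in, len(values) - n_in


def is_balanced(n_in, n_out, min_fraction=0.3):
    """True when the smaller class holds at least min_fraction of all molecules.

    The 0.3 default is an arbitrary illustrative cutoff.
    """
    total = n_in + n_out
    return total > 0 and min(n_in, n_out) / total >= min_fraction


# Made-up placeholder values, not taken from the PI3K-mTOR dataset.
toy_pki = [6.2, 6.8, 7.0, 7.4, 8.1]
n_in, n_out = split_counts(toy_pki, 7, "min")
print(n_in, n_out, is_balanced(n_in, n_out))
```

If a threshold leaves one side nearly empty, moving it toward the middle of the value distribution gives the classifier more examples of both classes.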
After setting the thresholds, click Train QSAR and name the model. If you go back to the QSAR tab, the newly created predictor should appear on this page. Click Run to train the models. Once training is complete, the progress bar turns green, and two new buttons, Test and See Results, appear.
Clicking on the See Results button will display the ROC curves and associated metrics for each model trained on each objective. These can be used to assess the performance or quality of the trained models.
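For readers who want to relate the ROC curves to a single number, the area under the ROC curve (AUC) equals the probability that a randomly chosen molecule inside the threshold receives a higher score than a randomly chosen molecule outside it. The following is a minimal sketch of that definition in plain Python. This guide does not state which metrics Makya reports or how it computes them, so `roc_auc` is a hypothetical helper for intuition only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for molecules inside the threshold, 0 for those outside.
    scores: model scores, higher meaning more likely to be inside.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC close to 0.5 means the model ranks molecules no better than chance, while a value close to 1.0 means it separates the two classes almost perfectly.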
NOTE: If any of the models is deemed unsatisfactory, either (1) improve the quality of the provided data or (2) rerun the QSAR process with relaxed thresholds.
Clicking the Test button lets you input an individual SMILES or a file of SMILES to be evaluated by the model.
Step 3: Create the generator
We will now set up and launch our generator. On the project homepage, go directly to the "Generators" tab and click New Generator. Name the generator “Hit_growing_generator” (or any other name). Under "Generation type", select Fragment growing and click Next.
On the following pages, specify the settings of the generator:
- Chemical Space: Check the box next to the Data_pi3k_mtor dataset. This is the dataset that was uploaded in Step 1 (or in Use Case 1); the generation is guided by similarity to this chemical space.
- Exit Vectors: In the text field below "Enter SMILES", copy-paste the following SMILES:
Cc1ncnc(Cl)c1C#Cc1ccc(N)nc1
This is our hit intermediate.
We then need to select one or more exit vectors for the intermediate. To do so, click on the arrow next to the pencil icon. The intermediate’s atoms are numbered. In the text box, type “6” and press Enter. This selects the C-Cl bond as the exit vector, the position where the generator will attach fragments to build new molecules while keeping the intermediate’s structure intact (in particular, no fragment will be attached to any of the other functional groups, which are treated as unreactive). Click Save.
If desired, an additional exit vector could also be selected in the same manner.
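To see which atom a given number might refer to, you can list the heavy atoms of the intermediate in the order they appear in the SMILES string. The sketch below uses a simple regular expression from the standard library; it is not a full SMILES parser, and `atoms_in_order` is a hypothetical helper. Whether Makya numbers atoms in SMILES order, and whether it counts from 0 or 1, is an assumption here, so always confirm against the numbers displayed in the Makya editor.

```python
import re

# Matches bracket atoms, two-letter halogens, then single-letter atoms of the
# organic subset (aliphatic and aromatic). Hydrogens are implicit and skipped.
_ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atoms_in_order(smiles):
    """Return the heavy-atom symbols of a SMILES string in written order."""
    return _ATOM.findall(smiles)


hit = "Cc1ncnc(Cl)c1C#Cc1ccc(N)nc1"
for index, symbol in enumerate(atoms_in_order(hit)):
    print(index, symbol)
```

With zero-based counting, index 6 in this listing is the chlorine atom and index 5 is the ring carbon that carries it, so under either counting convention the number 6 lands on one end of the C-Cl bond.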
- QSAR: On the right-hand side of the Lead_optimization_predictor QSAR, select the following objectives: pKi IC50 PI3K, Water Solubility, Caco2 permeability, and Total Clearance. Click the Save button at the bottom of the page. The generation will be optimized on these objectives.
Go back to the "Generators" tab, and the newly set up generator will show up. Click on the Run button for this generator. The progress bar will indicate the status of generation.
Step 4: Explore the generated molecules
While the generation is still running, you can already visualize the generated molecules. In the Generators tab, click on the eye icon. Use the Parallel Coordinates view to display the properties of the generated molecules and select, for example, the molecules whose Iktos ranking is higher than 0.7 (drag the mouse from 0.7 to 1 on the Iktos ranking scale).
You can then select the molecules, and export them as a .csv or add them to the cart to share them easily. You can also create a dataset from these molecules; such a dataset can then be used as a chemical space for the fine-tuning generator.
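The same ranking filter can be reapplied to an exported .csv file outside Makya. The following is a minimal sketch using only the Python standard library. The column name "Iktos ranking" and the layout of the export are assumptions, since this guide does not describe the exported file, and `select_by_ranking` is a hypothetical helper; check the header of your own export and adjust the `column` argument accordingly.

```python
import csv
import io


def select_by_ranking(csv_text, column="Iktos ranking", low=0.7, high=1.0):
    """Return the rows whose ranking lies between low and high (inclusive).

    The column name is an assumption about the exported file.
    Rows with a missing or non-numeric ranking are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    selected = []
    for row in reader:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, missing field, or non-numeric value
        if low <= value <= high:
            selected.append(row)
    return selected


# Placeholder rows for illustration only, not real generator output.
example = "SMILES,Iktos ranking\nCCO,0.65\nCCN,0.82\nCCC,\n"
for row in select_by_ranking(example):
    print(row["SMILES"], row["Iktos ranking"])
```

The returned rows are plain dictionaries, so they can be written back out with `csv.DictWriter` if you want a filtered file to upload as a new dataset.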