Problem: How can you fine-tune and optimize an existing chemical series with promising activity, ADME, or biological endpoints using Makya?
The following guide walks through this process step by step. You can download the dataset referenced here.
Dataset Overview
The dataset used in this guide is based on a PI3K-mTOR chemical series. It includes:
- SMILES strings representing chemical structures
- Multiple objective columns, such as:
  - pKi IC50 PI3K
  - pKi IC50 mTOR
  - Water Solubility
  - Caco-2 Permeability
  - CYP1A2 inhibitor
  - CYP3A4 inhibitor
  - Total Clearance
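Before uploading, it can help to sanity-check the CSV offline. The sketch below uses only the Python standard library and is not part of Makya. It assumes the SMILES column is headed `SMILES` (a hypothetical name; adjust it to the actual header of the downloaded file) and treats every other column as an objective. It counts rows, rows with an empty SMILES cell, and numeric values per objective column.

```python
import csv
import io

# Hypothetical header name: adjust to match the downloaded file.
SMILES_COLUMN = "SMILES"


def summarize_dataset(csv_text: str) -> dict:
    """Count rows, empty SMILES cells, and numeric values per objective column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: every column other than SMILES_COLUMN is an objective.
    objective_columns = [c for c in reader.fieldnames if c != SMILES_COLUMN]
    summary = {
        "rows": 0,
        "missing_smiles": 0,
        "numeric_values": {c: 0 for c in objective_columns},
    }
    for row in reader:
        summary["rows"] += 1
        if not (row.get(SMILES_COLUMN) or "").strip():
            summary["missing_smiles"] += 1
        for column in objective_columns:
            try:
                float(row[column])
            except (TypeError, ValueError):
                continue  # empty or non-numeric cell: not counted
            summary["numeric_values"][column] += 1
    return summary


if __name__ == "__main__":
    # Toy rows for illustration only, not taken from the PI3K-mTOR dataset.
    toy = "SMILES,pKi IC50 PI3K,Water Solubility\nCCO,7.2,-3.1\n,8.0,\nc1ccccc1,,-4.5\n"
    print(summarize_dataset(toy))
    # {'rows': 3, 'missing_smiles': 1, 'numeric_values': {'pKi IC50 PI3K': 2, 'Water Solubility': 2}}
```

Objective columns with few numeric values give the corresponding QSAR model little data to learn from, which is worth knowing before setting thresholds.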
Step 1: Upload the project data
Go to the "Datasets" tab and upload the previously downloaded dataset as a CSV file. The following image shows a screenshot of the example PI3K-mTOR dataset in MS Excel. This dataset will be available in the default project of your Makya installation. The dataset contains one column of SMILES and further columns with values for objectives such as pKi IC50 PI3K, pKi IC50 mTOR, Water Solubility, Caco-2 Permeability, etc.
To upload the dataset to Makya, click New Dataset, then either drag and drop your file or click Click to select one and navigate to the location of the CSV file. Once uploaded, the dataset is displayed with the SMILES replaced by 2D chemical structures. Click Start Cleaning; once cleaning is done, name the dataset Data_pi3k_mtor (or any other name of your choice) and save it.
If you go back to the Datasets tab, you will see the newly uploaded dataset, as shown below.
Step 2: Train the QSAR model
With the data uploaded, you can now build predictive models that can be used to score a new idea or to guide the generation process. To do so, go to the QSAR tab and click New QSAR. On the following page, select the dataset Data_pi3k_mtor that you uploaded in the previous step.
Seven objectives will appear on the screen. Use the slider next to each objective to define thresholds for the five objectives listed below. For each objective, make sure the threshold leaves a balanced number of molecules inside and outside it (shown as Molecules In and Molecules Out). Instead of using the sliders, you can also type the desired value directly into the min or max threshold box for each objective. For this example, set the following thresholds:
| Objective | Threshold |
|---|---|
| pKi IC50 PI3K | 7 (min) |
| pKi IC50 mTOR | 8.5 (min) |
| Water Solubility | -4.0 (max) |
| Caco-2 Permeability | 0.9 (max) |
| Total Clearance | 0.5 (min) |
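The balance check that the Molecules In and Molecules Out counters support can be sketched offline as follows. This is an illustrative Python sketch, not Makya code: it assumes a molecule counts as "in" when it satisfies every bound that is set, with inclusive bounds (Makya's exact boundary handling is not stated here), and the `min_fraction` cut-off in `is_balanced` is a hypothetical rule of thumb, not a Makya setting.

```python
from typing import Optional, Sequence


def split_by_threshold(values: Sequence[float],
                       minimum: Optional[float] = None,
                       maximum: Optional[float] = None) -> tuple[int, int]:
    """Return (molecules_in, molecules_out) for one objective.

    Assumption: bounds are inclusive and a molecule is "in" only if it
    satisfies every bound that is set.
    """
    molecules_in = 0
    for value in values:
        if minimum is not None and value < minimum:
            continue
        if maximum is not None and value > maximum:
            continue
        molecules_in += 1
    return molecules_in, len(values) - molecules_in


def is_balanced(molecules_in: int, molecules_out: int,
                min_fraction: float = 0.3) -> bool:
    """Hypothetical rule of thumb: the smaller class must make up at
    least min_fraction of all molecules."""
    total = molecules_in + molecules_out
    if total == 0:
        return False
    return min(molecules_in, molecules_out) / total >= min_fraction


if __name__ == "__main__":
    # Toy pKi values, not from the guide's dataset; threshold 7 (min).
    counts = split_by_threshold([6.5, 7.0, 7.4, 8.1, 6.2, 7.9], minimum=7)
    print(counts, is_balanced(*counts))  # (4, 2) True
```

A heavily one-sided split gives a classifier very few examples of the minority class, which is one reason relaxing a threshold can help when a model performs poorly.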
After setting the thresholds, click Train QSAR and name the model (Step 3 of this guide refers to it as Lead_optimization_predictor). If you go back to the QSAR tab, the newly created predictor appears on this page. Click Run to train the models. Once training is complete, the progress bar turns green and two new buttons, Test and See Results, appear.
Clicking See Results displays the ROC curves and associated metrics for the model trained on each objective. Use them to assess the quality of each trained model.
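To make the ROC metrics concrete: for a model that classifies molecules as in or out of a threshold, the area under the ROC curve (AUC) equals the probability that a randomly chosen "in" molecule is scored higher than a randomly chosen "out" molecule. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation. The Python sketch below computes AUC from that pairwise definition; it is a general illustration and does not reproduce exactly which metrics Makya reports or how it computes them.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive ("in") molecule gets the higher score; ties count as one half.

    O(n_pos * n_neg): adequate for small validation sets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy labels (1 = in, 0 = out) and model scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```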
NOTE: If any of the models is deemed unsatisfactory, either (1) improve the quality of the provided data or (2) rerun the QSAR process with relaxed thresholds.
Clicking Test lets you enter a single SMILES string, or a file of SMILES, to be evaluated by the models.
Step 3: Create the generator
Finally, to generate molecules guided by the QSAR model, go to the "Generators" tab of the project page and click New Generation. Name the generator Lead_optimization_generator, and under "Generation type" select Fine Tuning.
On the following page, specify the options for this generator:
- Chemical Space: Check the box next to the Data_pi3k_mtor dataset. This is the dataset created and uploaded in Step 1; it defines the chemical space used to guide this generation.
- Products: In the "Substructures" tab, click the Add button for "Molecules must match at least one of these substructures (matching set)", enter the following SMARTS string, and click Save:
  `[#6]-[#6]-c1ncnc(c1C#Cc1ccc(-[#7])nc1)-*:1:*:*:*(:*:*:1)-[#6](-[#7])=O`
- QSAR: Check the box next to Lead_optimization_predictor. This should also check the boxes for all the individual predictor models trained on the objectives in Step 2.
Go back to the "Generators" tab, where the newly configured generator now appears. Click the Run button for this generator. The progress bar indicates the status of the generation.
Step 4: Explore the generated molecules
While the generation is still running, you can visualize the generated molecules by clicking the eye icon. On the page that appears, open the Parallel Coordinates view. Click Display Scores (highlighted in yellow in the picture below) and add Confidence. Restrict the Confidence metric to values above 0.7 by clicking and holding on this metric's axis and dragging down to 0.7, as shown below.
Sort the generated molecules by Iktos Ranking. Click the eye icon on the first molecule in the grid of generated molecules. The window that pops up shows the predicted values of all the objectives for that molecule. The same analysis can be done for any of the generated molecules.
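The filter-then-sort triage described in this step can be sketched offline in Python, for example on scores copied out of Makya. This is an illustration under assumptions, not a Makya API: each molecule is assumed to be a dict with `Confidence` and `Iktos Ranking` keys (hypothetical field names mirroring the labels in the interface), and because this guide does not state whether a higher Iktos Ranking is better, the sort direction is a parameter.

```python
from typing import Iterable


def shortlist(molecules: Iterable[dict],
              min_confidence: float = 0.7,
              rank_key: str = "Iktos Ranking",
              higher_is_better: bool = True) -> list[dict]:
    """Keep molecules with Confidence strictly above min_confidence,
    then sort them by rank_key.

    Assumptions: the dict keys are hypothetical export field names, and
    the ranking direction is not stated in the guide, hence the flag.
    """
    kept = [m for m in molecules if m["Confidence"] > min_confidence]
    return sorted(kept, key=lambda m: m[rank_key], reverse=higher_is_better)


if __name__ == "__main__":
    # Toy records with a hypothetical "id" field, not real Makya output.
    generated = [
        {"id": "A", "Confidence": 0.90, "Iktos Ranking": 0.5},
        {"id": "B", "Confidence": 0.65, "Iktos Ranking": 0.9},
        {"id": "C", "Confidence": 0.75, "Iktos Ranking": 0.8},
    ]
    print([m["id"] for m in shortlist(generated)])  # ['C', 'A']
```

Filtering before sorting keeps low-confidence molecules from rising to the top of the list, however well they are ranked.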