Problem: How can you fine-tune and optimize an existing chemical series with promising activity, ADME, or biological endpoints using Makya?
The following guide walks through this process step-by-step. You can download the dataset referenced here.
Dataset Overview
The dataset used in this guide is based on a PI3K-mTOR chemical series. It includes:
- SMILES strings representing chemical structures
- Multiple objective columns, such as:
  - pKi IC50 PI3K
  - pKi IC50 mTOR
  - Water Solubility
  - Caco-2 Permeability
  - CYP1A2 inhibitor
  - CYP3A4 inhibitor
  - Total Clearance
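Before uploading, it can be useful to confirm that the CSV file has the columns you expect and that no row is missing its SMILES string. The following is a minimal sketch using only the Python standard library; the column names in `EXPECTED_COLUMNS` and the in-memory sample are assumptions for illustration and should be adapted to the header of your own file.

```python
import csv
import io

# Assumed column names; adjust to match the header of your CSV file.
EXPECTED_COLUMNS = ["SMILES", "pKi IC50 PI3K", "pKi IC50 mTOR"]


def check_dataset(handle, expected=EXPECTED_COLUMNS):
    """Return (missing_columns, line_numbers_with_empty_smiles)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in expected if name not in header]
    empty_smiles = []
    if "SMILES" in header:
        for row in reader:
            if not (row["SMILES"] or "").strip():
                # line_num is the file line of the row just read.
                empty_smiles.append(reader.line_num)
    return missing, empty_smiles


# Tiny in-memory sample standing in for the downloaded file.
sample = io.StringIO(
    "SMILES,pKi IC50 PI3K,pKi IC50 mTOR\n"
    "CCO,7.2,8.1\n"
    ",6.5,7.9\n"
)
missing, empty_smiles = check_dataset(sample)
print("Missing columns:", missing)
print("Lines with empty SMILES:", empty_smiles)
```

To check a real file, pass `open("your_file.csv", newline="")` instead of the in-memory sample.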
Step 1: Upload the project data
Go to the "Datasets" tab and upload the previously downloaded dataset as a CSV file. The following image shows a screenshot of the example PI3K-mTOR dataset in MS Excel. This dataset is available in the default project of your Makya installation. It contains one column of SMILES and further columns with values for objectives such as pKi IC50 Pi3K, pKi IC50 mTor, Water Solubility, and Caco2 permeability.
Upload this dataset to Makya by clicking on the New Data Set box, and name the dataset Data_pi3k_mtor (or any other name of your choice). Then click on the Choose File button, navigate to the CSV file, and click Open. Once loaded, the dataset will appear in the right-side panel with the SMILES replaced by 2D chemical structures. Click on Create Dataset to add this data to the project. Click again on the "Datasets" tab, and the newly created dataset should appear on the page.
Step 2: Train the QSAR model
The data is now uploaded, so you can build predictive models that can be used to score new ideas or to guide the generation process. To do so, go to the QSAR tab and click on the New QSAR box. On the following page, name the predictor “Lead_optimization_predictor” (or any other name) and click Next. Go to the "Targets" sub-tab and select the dataset Data_pi3k_mtor (created and uploaded in Step 1) in the "Dataset" dropdown menu. In the adjacent Target dropdown menu, select each of the following targets, clicking Add Target after each selection.
- pKi IC50 Pi3K
- pKi IC50 mTor
- Water Solubility
- Caco2 permeability
- Total Clearance
A number of sliders corresponding to each of the objectives (in the initial dataset) will appear on the screen. Use these sliders to define the thresholds for each objective while ensuring that there is always a balanced number of molecules that fall in and out of the thresholds (indicated by Molecules In and Molecules Out). An alternative to using the sliders is directly entering the desired value in the min or max threshold box for each objective. For this example use case, set the following thresholds.
| Objective | Threshold |
|---|---|
| pKi IC50 Pi3K | 7 (min) |
| pKi IC50 mTor | 8.5 (min) |
| Water Solubility | -4.0 (max) |
| Caco2 permeability | 0.9 (max) |
| Total Clearance | 0.5 (min) |
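The balance between Molecules In and Molecules Out can also be estimated outside Makya before choosing thresholds. The sketch below encodes the thresholds listed above and counts how many values of one objective fall inside and outside its threshold. It assumes that the bounds are inclusive, which may differ from how Makya treats values exactly at the threshold, and the sample values are made up for illustration.

```python
# Thresholds listed above, as (bound type, value) pairs.
THRESHOLDS = {
    "pKi IC50 Pi3K": ("min", 7.0),
    "pKi IC50 mTor": ("min", 8.5),
    "Water Solubility": ("max", -4.0),
    "Caco2 permeability": ("max", 0.9),
    "Total Clearance": ("min", 0.5),
}


def count_in_out(values, kind, bound):
    """Count values satisfying the threshold (in) and the rest (out).

    Assumption: bounds are inclusive (>= for "min", <= for "max").
    """
    if kind == "min":
        inside = sum(1 for v in values if v >= bound)
    elif kind == "max":
        inside = sum(1 for v in values if v <= bound)
    else:
        raise ValueError(f"unknown bound type: {kind}")
    return inside, len(values) - inside


# Made-up values for one objective, for illustration only.
pi3k_values = [6.2, 7.0, 7.8, 8.4, 6.9, 7.5]
kind, bound = THRESHOLDS["pKi IC50 Pi3K"]
molecules_in, molecules_out = count_in_out(pi3k_values, kind, bound)
print(f"Molecules In: {molecules_in}, Molecules Out: {molecules_out}")
```

A strongly lopsided split (for example, almost all molecules out) suggests relaxing the threshold before training.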
After setting the thresholds, click Save. Go back to the QSAR tab, and the newly created predictor should appear on this page. Click on Run to train the models. Once training completes, the progress bar turns green, and two new buttons, Test and See Results, appear.
Clicking on the See Results button will display the ROC curves and associated metrics for each model trained on each objective. These can be used to assess the performance or quality of the trained models.
NOTE: If any of the models is deemed unsatisfactory, either (1) improve the quality of the provided data or (2) rerun the QSAR process with relaxed thresholds.
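As background for reading the ROC metrics, the area under the ROC curve (AUC) equals the probability that a randomly chosen molecule inside the threshold receives a higher score than a randomly chosen molecule outside it. The sketch below computes AUC from that pairwise definition. It is a generic illustration with made-up labels and scores, not a description of how Makya computes its metrics.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for molecules inside the threshold, 0 otherwise.
    scores: predicted scores (higher means more likely inside).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count correctly ordered positive/negative pairs; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC near 0.5 indicates a model that ranks molecules no better than chance, while values closer to 1.0 indicate better separation.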
Clicking on the Test button allows you to input an individual SMILES or a file of SMILES to be evaluated by the model.
Step 3: Create the generator
Finally, to generate molecules guided by the QSAR model, go to the "Generators" tab of the project page and click on the New Generator box. Name the generator “Lead_optimization_generator”, select Fine Tuning under "Generation Engine", and click on Next.
On the following page specify the various options to set up this generator.
- Chemical Space: Check the box next to the Data_pi3k_mtor dataset. This is the dataset that was created and uploaded in Step 1; it will guide the generation as the chemical space.
- Products: In the "Substructures" tab, click the Add button under "Molecules must match at least one of these substructures (matching set)", enter the following SMARTS string, and click Save:
[#6]-[#6]-c1ncnc(c1C#Cc1ccc(-[#7])nc1)-*:1:*:*:*(:*:*:1)-[#6](-[#7])=O
- QSAR: Check the box next to Lead_optimization_predictor. This should check the boxes for all the individual predictor models trained on the objectives in Step 2.
Go back to the "Generators" tab, and the newly set up generator will show up. Click on the Run button for this generator. The progress bar will indicate the status of generation.
Step 4: Explore the generated molecules
While the generation is still running, you can visualize the generated molecules that match the specified blueprint by clicking on the eye icon under IN BLUEPRINT. On the page that appears, go to the Parallel Coordinates panel and restrict the Confidence metric to values above 0.7 by holding the mouse button down on this metric and dragging down to 0.7, as shown in the image below.
Sort the generated molecules by Iktos Ranking. On the first molecule of the generated molecules grid, click on See Details. In the window that pops up, compare the generated molecule with the molecule most similar to it from the chemical space, and notice the subtle differences between the two (see example images below). Also note the predicted values for all the objectives for the generated molecule in this window. Such analysis can be done for all the molecules that are generated.
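The same shortlisting logic can be reproduced on a table of results outside the interface. The sketch below is purely illustrative: the record layout, the field names `id`, `confidence`, and `iktos_ranking`, and the convention that a smaller ranking value is better are all assumptions, not a documented Makya export format.

```python
# Hypothetical records; field names and values are illustrative only.
molecules = [
    {"id": "mol_A", "confidence": 0.82, "iktos_ranking": 3},
    {"id": "mol_B", "confidence": 0.65, "iktos_ranking": 1},
    {"id": "mol_C", "confidence": 0.91, "iktos_ranking": 2},
    {"id": "mol_D", "confidence": 0.70, "iktos_ranking": 4},
]


def shortlist(records, min_confidence=0.7):
    """Keep records strictly above the confidence cutoff, best-ranked first."""
    kept = [r for r in records if r["confidence"] > min_confidence]
    # Assumption: a smaller ranking value means a better molecule.
    return sorted(kept, key=lambda r: r["iktos_ranking"])


selected = shortlist(molecules)
print([r["id"] for r in selected])
```

Here `mol_D` is excluded because its confidence is exactly 0.7 and the filter keeps only values strictly greater than the cutoff.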