US20230245729A1 - Accounting for induced fit effects - Google Patents
- Publication number
- US20230245729A1 (application US18/132,936)
- Authority
- US
- United States
- Prior art keywords
- ligand
- biomolecule
- template
- candidate
- pharmacophore
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B30/00—Methods of screening libraries
- C40B30/04—Methods of screening libraries by measuring the ability to specifically bind a target molecule, e.g. antibody-antigen binding, receptor-ligand binding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/60—In silico combinatorial chemistry
- G16C20/64—Screening of libraries
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/40—Searching chemical structures or physicochemical data
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
Definitions
- This application relates generally to using a computer to assist in predicting a docked position of a target ligand in a binding site of a biomolecule, and relates more specifically to using a computer to assist in predicting a docked position of a target ligand in a binding site of a biomolecule that is capable of undergoing an induced fit.
- Biomolecules often serve particular functions and the ability to modulate the functionality of a biomolecule can be useful for treating diseases and for engineering industrial biomolecular applications.
- The functionality of a biomolecule is sometimes modulated by whether and how one or more ligands are bound to the biomolecule.
- Biomolecules often have regions (e.g., an “active site”) where one or more ligands can bind to the biomolecule and thereby modulate the functionality of the biomolecule.
- Competitive antagonists are compounds that can bind to an active site in a biomolecule, thereby inhibiting the natural ligand from binding.
- Competitive antagonists prevent a biomolecule from performing its biological function, since the biological function requires the natural ligand to be bound in the active site.
- Non-competitive antagonists also prevent a biomolecule from performing its biological function, but do so by binding to the biomolecule and changing it in some way (such as by changing its three-dimensional conformational ensemble) so that the biomolecule can no longer perform its biological function (e.g., changing the biomolecule's conformation such that it can no longer accommodate the binding of the natural ligand).
- An agonist can bind to a biomolecule and activate a particular function of the biomolecule (rather than inhibit the function).
- It is often useful to know the three-dimensional structure of the ligand-biomolecule complex (the structure of both the ligand and the biomolecule when the ligand is bound to the biomolecule).
- The three-dimensional structure can provide information about which interactions between the ligand and the biomolecule are important for binding, thereby informing rational drug design.
- The three-dimensional structure can also be used to calculate the free energy of binding. Unfortunately, it is sometimes difficult to predict the three-dimensional structure of a ligand-biomolecule complex, especially when the biomolecule undergoes an induced fit effect.
- One aspect features a method for predicting a docked position of a target ligand in a binding site of a biomolecule.
- The method involves receiving a template ligand-biomolecule structure that has a template ligand docked in the binding site of the biomolecule and comparing a pharmacophore model of the template ligand to a pharmacophore model of the target ligand.
- The pharmacophore model of the target ligand is overlapped with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule.
- A docked position is predicted for the target ligand in the binding site of the biomolecule based on a position of the pharmacophore model of the target ligand when overlapped with the pharmacophore model of the template ligand.
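The overlap step can be illustrated with a deliberately simplified sketch. It assumes the pharmacophore points of both ligands are already paired up and only translates the target ligand so that the two sets of points share a centroid; a realistic overlap would also optimize rotation and the choice of conformation. The function names and the coordinate representation are hypothetical and do not come from the application.

```python
Point = tuple[float, float, float]


def centroid(points: list[Point]) -> Point:
    """Arithmetic mean of a non-empty list of 3D points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def place_by_translation(
    template_points: list[Point],
    target_points: list[Point],
    target_atoms: list[Point],
) -> list[Point]:
    """Shift the target ligand so its pharmacophore centroid coincides with
    the template's pharmacophore centroid (translation only, no rotation)."""
    tc = centroid(template_points)
    gc = centroid(target_points)
    shift = tuple(t - g for t, g in zip(tc, gc))
    # Apply the same shift to every atom of the target ligand.
    return [tuple(a + s for a, s in zip(atom, shift)) for atom in target_atoms]
```

Because the template pharmacophore points are expressed in the frame of the binding site, the shifted atom coordinates serve as a first guess at the docked position of the target ligand.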
- Another aspect features a computer system that has at least one processor, a preparation module, a pharmacophore matcher module, and a docking module.
- The preparation module is stored in memory and coupled to at least one processor, and is programmed to receive information identifying a target ligand and a template ligand-biomolecule structure comprising a template ligand and a biomolecule.
- The pharmacophore matcher module is stored in memory and coupled to at least one processor, and is programmed to identify a pharmacophore match between the template ligand and the target ligand by comparing the pharmacophore model of the template ligand to the pharmacophore model of the target ligand.
- The docking module is stored in memory and coupled to at least one processor, and is programmed to predict a docked ligand position of the target ligand in the template ligand-biomolecule structure by overlapping the pharmacophore model of the target ligand with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule.
- Another aspect features a non-transitory computer readable storage medium having a computer readable program that when executed on a computer causes the computer to predict a docked position of a target ligand in a binding site of a biomolecule.
- Predicting the docked position of the target ligand in the binding site of the biomolecule involves several steps.
- One step involves receiving information identifying the target ligand and a template ligand-biomolecule structure, using a preparation module stored in memory and coupled to at least one processor.
- The template ligand-biomolecule structure has a template ligand docked in the binding site of the biomolecule.
- Another step involves identifying a pharmacophore match between the template ligand and the target ligand, using a pharmacophore matcher module stored in memory and coupled to at least one processor.
- The process of identifying the pharmacophore match involves comparing a pharmacophore model of the template ligand to a pharmacophore model of the target ligand.
- Another step involves predicting a docked ligand position of the target ligand, using a docking module stored in memory and coupled to at least one processor.
- The docking module predicts the docked position of the target ligand in the binding site of the biomolecule based on a position of the pharmacophore model of the target ligand when overlapped with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule.
- The target ligand is selected from a plurality of ligand candidates, each of the ligand candidates being different from the template ligand. Selecting the target ligand involves comparing the pharmacophore model of the template ligand to a pharmacophore model of each respective one of the plurality of ligand candidates.
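As a rough illustration of selecting a target ligand from several candidates, the sketch below compares only the pharmacophore types present in each model and picks the candidate that shares the most instances with the template. This similarity measure, the type labels, and the function names are assumptions made for the example; the text does not specify how the comparison is scored, and a fuller comparison would also consider feature geometry.

```python
from collections import Counter


def type_overlap(template_types: list[str], candidate_types: list[str]) -> int:
    """Number of pharmacophore instances shared by type (multiset intersection)."""
    shared = Counter(template_types) & Counter(candidate_types)
    return sum(shared.values())


def select_target(template_types: list[str], candidates: dict[str, list[str]]) -> str:
    """Return the name of the candidate whose model shares the most typed
    instances with the template model (hypothetical selection rule)."""
    return max(candidates, key=lambda name: type_overlap(template_types, candidates[name]))
```

For example, a template with two aromatic groups, one acceptor, and one hydrophobic group would prefer a candidate with two aromatic groups and one acceptor over a candidate with one aromatic and one hydrophobic group.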
- A plurality of template ligand-biomolecule structures is received, each template ligand-biomolecule structure having a different template ligand docked in the binding site of the biomolecule.
- The pharmacophore model of the template ligand is generated by combining information from each of the template ligands from the plurality of template ligand-biomolecule structures.
- The target ligand has more than one structural conformation in its unbound state.
- The docked position of the target ligand in the binding site of the biomolecule is predicted by enumerating a set of potential target ligand conformations and overlapping a respective pharmacophore model of the target ligand for each of the potential target ligand conformations with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule.
- Predicting the docked position of the target ligand in the binding site of the biomolecule involves ignoring at least one clash between the target ligand conformation's atomic coordinates and the biomolecule's atomic coordinates.
- The atomic coordinates of the biomolecule are modified to reduce clashes between the docked target ligand conformation's atomic coordinates and the biomolecule's atomic coordinates, thereby creating an altered ligand-biomolecule structure comprising the docked target ligand and an altered biomolecule.
- A re-docked position of each target ligand conformation is predicted by predicting each target ligand conformation's position in the binding site of the altered biomolecule.
- The atomic coordinates of the altered biomolecule are modified to reduce clashes between the atomic coordinates of the target ligand conformation's re-docked position and the atomic coordinates of the altered biomolecule, thereby creating a re-altered ligand-biomolecule structure comprising a re-docked target ligand and a re-altered biomolecule.
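The clash-handling steps presuppose some way of detecting clashes. A minimal sketch follows, assuming a clash is any ligand-biomolecule atom pair closer than a fixed distance cutoff. The 2.0 Å default and the function name are illustrative assumptions; the text does not define a clash criterion.

```python
import math

Point = tuple[float, float, float]


def find_clashes(
    ligand_atoms: list[Point],
    biomolecule_atoms: list[Point],
    min_distance: float = 2.0,  # assumed cutoff in angstroms, not from the source
) -> list[tuple[int, int]]:
    """Return (ligand_index, biomolecule_index) pairs closer than min_distance."""
    clashes = []
    for i, ligand_atom in enumerate(ligand_atoms):
        for j, biomolecule_atom in enumerate(biomolecule_atoms):
            if math.dist(ligand_atom, biomolecule_atom) < min_distance:
                clashes.append((i, j))
    return clashes
```

An initial placement could tolerate the pairs this function reports, and a later refinement step could move the listed biomolecule atoms and re-run the check to confirm that the number of clashes has dropped.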
- Each altered and re-altered ligand-biomolecule structure is ranked using a scoring function.
- A subset of high-ranking target ligands corresponding to target ligands having a threshold value for an empirical activity is identified.
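Ranking by a scoring function can be sketched as a simple sort. The example assumes each structure already carries a numeric score and that lower scores are better, as is common for energy-like scores; both the convention and the function name are assumptions, since the text does not specify the scoring function.

```python
def rank_structures(scored: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sort (label, score) pairs from best to worst, assuming lower is better."""
    return sorted(scored, key=lambda pair: pair[1])
```

The highest-ranking entries of the returned list would then be the natural candidates for further evaluation.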
- FIG. 1 is a block/flow diagram showing a method of predicting a docked position of a target ligand in a binding site of a biomolecule.
- FIG. 2 is a block diagram showing a prediction system for predicting a docked position of a target ligand in a binding site of a biomolecule.
- FIG. 3 is a block/flow diagram showing one component of the prediction system shown in FIG. 2 (the pharmacophore matcher module).
- FIG. 4 is a block diagram showing one component of the prediction system shown in FIG. 2 (the preparation module).
- FIG. 5 is a block diagram showing one component of the prediction system shown in FIG. 2 (the biomolecule modification module).
- FIG. 6 is a block diagram showing one component of the prediction system shown in FIG. 2 (the docking module).
- FIG. 7A is a cartoon diagram illustrating the process of a ligand binding to a biomolecule.
- FIG. 7B is a cartoon diagram illustrating the process of induced fit binding for both a template ligand and a target ligand.
- FIG. 8A illustrates a pharmacophore model for a template ligand and a target ligand.
- FIG. 8B illustrates an overlap between the pharmacophore model of the template ligand and the target ligand.
- FIG. 9 illustrates an example of how multiple pharmacophore models can be created for a single ligand.
- FIG. 10 illustrates an overlap between the template ligand and the target ligand illustrated in FIG. 9B while the template ligand is in the active site of a biomolecule.
- FIG. 11 is a flow chart illustrating steps in an exemplary drug design method that includes induced fit docking computations.
- FIG. 12 is a diagram of a computer system.
- Scientists and engineers sometimes know the three-dimensional structure of a template ligand 704 that binds to a biomolecule 700 (i.e., the structure of a template ligand-biomolecule complex 224), but either know or suspect that a different target ligand 706 also binds to the same biomolecule 700 (see FIG. 7B).
- Scientists and engineers may be interested in the target ligand 706 because it may, for example, (i) have higher binding affinity than the template ligand 704, (ii) be more commercially viable than the template ligand 704, (iii) be metabolized in a safer way than the template ligand 704, or (iv) not be covered by the same intellectual property rights as the template ligand 704.
- It is useful to know the three-dimensional structure of the target ligand 706 when bound to biomolecule 700 because the three-dimensional structure can provide information about which interactions between the target ligand 706 and the biomolecule 700 are important for binding (thereby informing rational drug design). Additionally, the three-dimensional structure can be used to calculate the free energy of binding of target ligand 706.
- Computers can help reduce the cost and time involved in obtaining a three-dimensional structure; in some cases, computers are the only viable option because empirical techniques (e.g., x-ray crystallography and NMR) can fail to determine a three-dimensional structure, especially when the biomolecule has flexible/floppy regions.
- The three-dimensional structure of a template ligand 704 bound to a biomolecule 700 can be used to predict the three-dimensional structure of a target ligand 706 bound to the same (or similar) biomolecule 700.
- When a ligand binds to a particular biomolecule, the biomolecule does not always keep its original three-dimensional conformation.
- As illustrated in FIG. 7A, there are generally two different modes of ligand binding: (i) the “lock and key” mode 712, and (ii) the “induced fit” mode 716.
- Binding can occur through the “lock and key” mode 712, in which case the biomolecule may not need to undergo significant conformational changes.
- The prediction system and methods disclosed herein describe how to predict conformational changes that result from the induced fit effect.
- The system and methods describe how computational methods can be used to predict the three-dimensional structure of a target ligand-biomolecule complex 230 (comprising target ligand 706 bound to biomolecule 701, where biomolecule 701 is biomolecule 700 after undergoing conformational changes), given a template ligand-biomolecule structure 224 (comprising template ligand 704 and biomolecule 700).
- In some cases, more than one target ligand 706 is analyzed, and each one is ranked based on a scoring function.
- The top-ranking target ligands 706 can be chemically synthesized for empirical testing.
- The structure of the biomolecule in the predicted ligand-biomolecule complex 230 can be used as a modified biomolecule in rigid-receptor docking and other drug discovery techniques.
- FIG. 1 shows a block/flow diagram illustratively depicting one embodiment of a method for predicting a docked position of a target ligand 706 in a binding site of a biomolecule 700, where blocks 100 through 110 (outlined in bold) represent steps of the method.
- The prediction system 200 shown in FIG. 2 can implement steps of the method shown in FIG. 1.
- The prediction system 200 receives input 222 from a user or in an automated fashion (e.g., automatically downloading the input 222 from a server).
- The input 222 includes at least one three-dimensional atomic structure of the template ligand-biomolecule complex 224 and also includes information identifying at least one target ligand 706.
- The template ligand-biomolecule complex 224 includes a biomolecule 700 and a template ligand 704 that is bound to the biomolecule 700.
- The template ligand 704 can be bound to binding site 702 (e.g., an active site or allosteric site) of the biomolecule 700.
- The at least one template ligand-biomolecule structure 224 can be obtained empirically (e.g., using NMR or x-ray crystallography) or computationally (e.g., using a biomolecule structure prediction system, such as CHARMM, AMBER, or GROMACS).
- The template ligand-biomolecule complex 224 can be an incomplete structure; e.g., some empirical techniques are incapable of resolving the myriad three-dimensional structures adopted by floppy/flexible regions of a biomolecule. In these situations, the unresolved regions of the incomplete template ligand-biomolecule complex 224 can be resolved using the molecular dynamics module 504 of the prediction system 200, or using any other biomolecular structure prediction module or system.
- The ligand-biomolecule complex 224 can also be incomplete for other reasons, e.g., because a contiguous set of atomic coordinates may be undesirable or not needed, such as in the case where distant atoms not significantly involved in the complexation may be ignored to save computational resources, or in the case where regions of the template ligand make contacts with the biomolecule and such contacts are unlikely to be shared by the target ligand.
- The prediction system 200 can also receive other input, such as information about physical conditions 226 (e.g., pH, temperature, and salt concentration).
- The target ligand 706 is sometimes provided as input 222 by a user.
- A user may know that a particular ligand (different from the template ligand 704) binds more strongly to biomolecule 700 than the template ligand 704 or has better ADME properties than the template ligand 704.
- The known ligand can be the target ligand 706 that is provided as input 222 by a user seeking to know the three-dimensional structure of the target ligand 706 when bound to a biomolecule 700.
- The target ligand 706 can be selected from a plurality of ligand candidates stored in a target ligand database 214.
- The first step 100 of the method shown in FIG. 1 involves comparing at least one pharmacophore model of the template ligand 704 with at least one pharmacophore model of the target ligand 706.
- Pharmacophore generator 300 can be used to identify pharmacophores of different types (e.g., aromatic type, hydrophobic type, etc.).
- A pharmacophore model comprises one or more pharmacophores and can include information about the relative location of the pharmacophores and the directionality of the pharmacophores (when applicable).
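One way to picture such a model is as a small data structure holding typed features, each with a position and an optional direction. The class and field names below are hypothetical and chosen only for illustration; the text does not prescribe a representation.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

Vector = tuple[float, float, float]


@dataclass(frozen=True)
class Pharmacophore:
    kind: str                           # e.g. "aromatic", "hydrophobic", "hbond_acceptor"
    position: Vector                    # location of the feature in 3D space
    direction: Optional[Vector] = None  # set only for types deemed directional


@dataclass
class PharmacophoreModel:
    features: list[Pharmacophore] = field(default_factory=list)

    def count(self, kind: str) -> int:
        """Number of instances of a given pharmacophore type in the model."""
        return sum(1 for f in self.features if f.kind == kind)

    def distance(self, i: int, j: int) -> float:
        """Relative location expressed as the distance between two features."""
        return math.dist(self.features[i].position, self.features[j].position)
```

Storing positions per feature makes the relative-location information available as pairwise distances, and leaving `direction` as `None` covers types for which directionality does not apply.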
- The pharmacophore models used in step 100 can either be generated by the prediction system 200 (e.g., using pharmacophore generator 300) or provided as input 222 to the prediction system 200.
- The pharmacophore models used in step 100 need not be generated from the same source (e.g., the pharmacophore model of the target ligand 706 can be provided as input 222, while the pharmacophore model of the template ligand 704 can be generated by the prediction system 200).
- FIG. 8 illustrates example pharmacophore models for a specific template ligand 704 and a specific target ligand 706.
- The template ligand 704 has nine distinct pharmacophores, comprising three types: aromatic groups 804 represented by orange rings, hydrogen-bond acceptors 802 represented by red spheres, and hydrophobic groups 800 represented by green spheres. Together, all nine pharmacophores, or a subset thereof, can make up the pharmacophore model 806 for template ligand 704.
- The target ligand 706 also has nine distinct pharmacophores, comprising the same three types.
- The template ligand 704 and target ligand 706 may, but need not, have the same number of pharmacophores.
- The pharmacophore generator 300 (see FIG. 3) can be used to generate pharmacophores like those in FIG. 8.
- The pharmacophore generator 300 can have an aromatic detector 310 to detect aromatic groups 804, a hydrophobe detector 312 to detect hydrophobic groups 800, and a hydrogen-bond acceptor detector 318 to detect hydrogen-bond acceptors 802.
- A pharmacophore model can comprise more than one instance of a pharmacophore type, e.g., pharmacophore type 800 (hydrophobic groups represented by green spheres in FIGS. 8A-9) has three pharmacophore instances 810 in target ligand 706, all of which could form part of a pharmacophore model of the target ligand 706.
- pharmacophore type 800 hydrophobic groups represented by green spheres in FIGS. 8 A- 9
- pharmacophore models like those shown in FIG. 8 can be generated by pharmacophore generator 300 using a number of different techniques.
- Each pharmacophore type within a pharmacophore model can be identified using pre-determined criteria. For example, instances of a hydrogen bond acceptor type 802 can be identified by searching for any surface-accessible atom that has one or more donatable lone electron pairs. Similarly, instances of a hydrogen bond donor type (detected by hydrogen bond donor detector 320 ) can be identified by searching for donatable hydrogen atoms.
- instances for a hydrophobic group type 800 can be identified by searching for rings, isopropyl groups, t-butyl groups, various halogenated moieties, and chains as long as four carbons (using this scheme for identifying hydrophobic group instances, chains of more than four carbons can be divided up into smaller fragments having between two to four carbons).
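- The chain-splitting rule above can be sketched in a few lines. This is a minimal illustration, assuming a linear chain described only by its carbon count; the function name `split_chain` and the greedy strategy are hypothetical and are not taken from the disclosure.

```python
def split_chain(n_carbons):
    """Split a linear chain into fragment sizes of two to four carbons.

    Chains of four carbons or fewer are returned whole.
    """
    if n_carbons <= 4:
        return [n_carbons]
    sizes = []
    remaining = n_carbons
    while remaining > 4:
        # Take four carbons unless that would strand a single carbon;
        # in that case take three so the remainder is two.
        take = 4 if remaining - 4 >= 2 else 3
        sizes.append(take)
        remaining -= take
    sizes.append(remaining)
    return sizes
```

- Under these assumptions a nine-carbon chain would be divided into fragments of four, three, and two carbons; other partitions that respect the two-to-four range would be equally valid.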
- pharmacophore generator 300 can be used to create a more detailed pharmacophore model by characterizing each of the pharmacophore instances based on their location within the molecule and their directionality (if applicable). There are various methods for identifying the location of a particular instance of a pharmacophore type. As one example, the location of an instance of a hydrophobic group type 800 can be defined as the weighted average of the positions of the non-hydrogen atoms in the identified instance.
- the location of negative and positive ionizable sites can be defined as a single point located on a formally charged atom, or at the centroid of a group of atoms over which the ionic charge is shared.
- the location of an instance of an aromatic type 804 can be defined as the centroid of the aromatic ring.
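- The location definitions above reduce to simple coordinate averages. The following sketch assumes atoms are given as (x, y, z) tuples; the helper names and the choice of weights (for example atomic masses) are assumptions, since the text does not specify the weighting scheme.

```python
def centroid(points):
    """Unweighted mean position, e.g. of the atoms in an aromatic ring."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def weighted_average(points, weights):
    """Weighted mean position, e.g. of the non-hydrogen atoms of a hydrophobic group.

    The weights are supplied by the caller; which weights to use is an assumption.
    """
    total = sum(weights)
    return tuple(
        sum(w * p[i] for p, w in zip(points, weights)) / total for i in range(3)
    )
```

- With equal weights, `weighted_average` gives the same point as `centroid`.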
- Whether a pharmacophore type has directionality can be a pre-determined setting of pharmacophore generator 300 .
- the hydrophobic group type 800 can be deemed to have no directionality component because hydrophobic interactions are frequently directionless, while the hydrogen bond donor/acceptor types (e.g., hydrogen-bond acceptors 802 ) can be deemed to have directionality because an interaction between this type and a biomolecule 700 frequently requires directional polar interactions along the hydrogen bond axis.
- Directionality of a type can be represented as a vector, as symbolized by the arrows 812 associated with the hydrogen-bond acceptor type 802 in FIG. 8 B .
- the directionality of the aromatic group type 804 can be defined as a two-headed vector normal to the plane of the aromatic ring (to correctly describe ring-stacking interactions).
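- One way to picture the two-headed normal vector of an aromatic ring is to compute a unit normal from three ring atoms and to compare normals without regard to sign. This is only a sketch under the assumption that the ring is planar and that three non-collinear ring atoms are available; the function names are hypothetical.

```python
import math


def ring_normal(a, b, c):
    """Unit vector normal to the plane through three ring atoms a, b, c."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)


def axis_alignment(n1, n2):
    """Return |cos(angle)| between two normals.

    The sign is dropped because a two-headed vector has no preferred direction.
    """
    return abs(sum(x * y for x, y in zip(n1, n2)))
```

- Dropping the sign means that two parallel rings score as aligned whichever face of each ring is used to define its normal.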
- more than one pharmacophore model can be generated for any particular molecule.
- the two snapshots shown in FIG. 9 illustrate the same fused-ring molecule, but with different pharmacophore models.
- the difference between the pharmacophore model shown in snapshot 900 and the pharmacophore model shown in snapshot 902 is that in snapshot 900 , the 5-membered ring is represented as an aromatic pharmacophore type 804 , while in snapshot 902 the 5-membered ring is represented as having a hydrogen bond acceptor pharmacophore type 802 .
- Both pharmacophore models (model 904 for snapshot 900 , and model 906 for snapshot 902 ) are acceptable models.
- Another situation in which more than one pharmacophore model can be generated for a particular molecule is the case where the molecule exists in multiple different three-dimensional conformations, e.g., when the target ligand 706 has a cyclohexane ring-structure that can exist in either a chair conformation or a boat conformation.
- a pharmacophore model 808 can be created for each conformation of the target ligand 706 , and the method shown in FIG. 1 can be performed on each conformation of the target ligand 706 .
- a pharmacophore model can be based on pharmacophores perceived in more than just one molecule.
- more than one template ligand-biomolecule structure 224 can be received as input 222 .
- each of the structures 224 can have a different template ligand 704 docked in the binding site 702 of the biomolecule 700 .
- step 100 can involve generating a pharmacophore model 806 of the template ligands 704 by combining information from each of the respective template ligands 704 from the plurality of template ligand-biomolecule structures 224 .
- Pharmacophores common to each of the respective template ligands 704 can be used to create a combined pharmacophore model. Additionally, more than one pharmacophore model 806 can be generated from the plurality of template ligands 704 . In such cases, if the template ligand-biomolecule structures 224 have known binding affinities of the associated template ligands 704 , then the binding affinities can be provided as input 222 and pharmacophore models of template ligands 704 can be given greater weight in the pharmacophore model if they belong to a template ligand 704 with higher binding affinity.
- step 100 of FIG. 1 next involves comparing the at least one pharmacophore model 806 of the template ligand 704 with the at least one pharmacophore model 808 of the target ligand 706 .
- the objective of the comparison is to identify pharmacophore types common to both the pharmacophore model 806 of the template ligand 704 and the pharmacophore model 808 of the target ligand 706 .
- the pharmacophore match detector 306 can be used to identify common pharmacophores between the template ligand 704 and target ligand 706 (e.g., FIG. 8 B shows a pharmacophore match 816 where the aromatic group type 804 is found in both the template ligand 704 and the target ligand 706 ).
- various techniques can be used for comparing pharmacophore models, with the underlying goal being the identification of pharmacophores common to both molecules being compared (e.g., common to both template ligand 704 and target ligand 706 ), and especially the identification of pharmacophores with similar topological arrangements and directionality.
- the pharmacophore types common to both the template ligand 704 and the target ligand 706 can be superimposed. More than one superimposed option may be possible (e.g., when more than one instance 810 of a particular pharmacophore type is present in the template ligand 704 or the target ligand 706 or both), in which case various techniques can be used to rank the superimposition options.
- the RMSD between the superimposed common pharmacophores can be calculated—superimposition options with lower RMSD can be more highly ranked, and the highest-ranking superimposition option (e.g., superimposition option 814 shown in FIG. 8 B ) can be chosen first for the implementation of steps 102 - 110 in FIG. 1 .
- the output of step 100 can be at least one superimposition of the pharmacophore model of target ligand 706 and the pharmacophore model of template ligand 704 (e.g., superimposition 814 ).
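- The RMSD-based ranking of superimposition options can be sketched as follows. The sketch assumes each option is already expressed as a list of matched (template point, target point) coordinate pairs after superimposition; the dictionary layout and function names are illustrative assumptions.

```python
import math


def rmsd(pairs):
    """Root-mean-square deviation over matched pharmacophore point pairs."""
    squared = [sum((a[i] - b[i]) ** 2 for i in range(3)) for a, b in pairs]
    return math.sqrt(sum(squared) / len(squared))


def rank_superimpositions(options):
    """Return option names ordered from lowest RMSD (best) to highest.

    options maps a name to a list of (template_point, target_point) pairs.
    """
    return sorted(options, key=lambda name: rmsd(options[name]))
```

- The first name in the returned list corresponds to the highest-ranking superimposition option, which would be carried forward first.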
- each pharmacophore model of the template ligand 704 is compared (step 100 ) to each pharmacophore model of the target ligand 706 .
- Such a comparison can be done serially or in parallel using the pharmacophore match detector 306 .
- step 102 involves docking the target ligand 706 into a binding site of biomolecule 700 (e.g., into the active site 702 of the biomolecule 700 ).
- Step 102 can be accomplished using docking module 208 .
- Docking the target ligand 706 into the active site 702 involves overlapping the pharmacophore model 808 of the target ligand 706 with the pharmacophore model 806 of the template ligand 704 while the template ligand 704 is in the binding site 702 of the biomolecule 700 .
- Such an overlap can be achieved by selecting the highest-ranking superimposition option (e.g., superimposition option 814 ) resulting from the comparison in step 100 .
- the highest-ranking superimposition option (e.g., superimposition option 814 ) can then be overlapped/superimposed in the active site 702 of the biomolecule 700 , as shown in FIG. 10 .
- Other lower-ranking superimposition options can also be docked, either serially or in parallel to the highest-ranking option.
- Step 102 may result in energetically unfavorable interactions (“clashes,” e.g., clash 710 shown in FIG. 7 A ) between the atoms in the target ligand 706 and the biomolecule 700 .
- some or all of such clashes can be ignored during step 102 . While it is acceptable to ignore all clashes in some implementations, in other implementations some clashes may be deemed too severe to ignore. Whether a clash is deemed too severe to ignore can be determined by analyzing pre-set criteria (e.g., default criteria of docking module 208 , or criteria provided as user input 222 ).
- a clash between an atom of target ligand 706 and a backbone atom of biomolecule 700 may be deemed too severe to ignore. If a clash is deemed too severe to ignore in the pre-set criteria, then the method shown in FIG. 1 can either be terminated at step 102 for the particular superimposition option being analyzed, or the prediction system 200 can output a message to the user indicating that the particular superimposition option being analyzed may result in highly unfavorable interactions requiring major modifications of the biomolecule 700 .
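- A simple distance-based check illustrates how clashes might be detected and how a backbone clash could be flagged as too severe to ignore. The 2.5 Angstrom cutoff, the data layout, and the function names are assumptions made for illustration; the disclosure leaves the actual criteria to pre-set or user-supplied settings.

```python
import math

# Standard protein backbone atom names; treating these as "severe" is an assumed criterion.
BACKBONE_NAMES = {"N", "CA", "C", "O"}


def find_clashes(ligand_xyz, protein_atoms, cutoff=2.5):
    """List ligand/protein atom pairs closer than cutoff (Angstroms, assumed value).

    ligand_xyz: list of (x, y, z); protein_atoms: list of (atom_name, (x, y, z)).
    """
    clashes = []
    for i, lig in enumerate(ligand_xyz):
        for name, xyz in protein_atoms:
            d = math.dist(lig, xyz)
            if d < cutoff:
                clashes.append(
                    {
                        "ligand_atom": i,
                        "protein_atom": name,
                        "distance": d,
                        "severe": name in BACKBONE_NAMES,
                    }
                )
    return clashes


def can_ignore_all(clashes):
    """True when no clash involves a backbone atom."""
    return not any(c["severe"] for c in clashes)
```

- If `can_ignore_all` returns False for a superimposition option, a caller could either stop processing that option or report a warning to the user, mirroring the two outcomes described above.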
- step 104 involves modifying the biomolecule 700 in response to the presence of the target ligand 706 (e.g., in response to clashes between the target ligand 706 and the biomolecule 700 ).
- Step 104 models the “induced fit” effect.
- Biomolecule modification module 206 can be used to accomplish step 104 .
- the atoms of the template ligand 704 can be deleted or ignored (i.e., treated as “dummy” atoms).
- biomolecule 700 can undergo conformational modification (i.e., the movement of the atomic coordinates of the biomolecule 700 ) in response to the presence of target ligand 706 .
- clashes 710 can be resolved using minimizer 404 to perform molecular mechanics minimization of the clashing atoms in the biomolecule 700 while restraining the atoms of the target ligand 706 (e.g., using a harmonic restraint).
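- As an illustration of a harmonic restraint, the energy minimized in such a step could take the following form. The symbols are assumptions introduced here: $E_{\mathrm{MM}}$ is the molecular mechanics energy of the complex, $\mathbf{r}_i$ is the current position of ligand atom $i$, $\mathbf{r}_i^{0}$ is its docked position from step 102, and $k$ is a user-chosen force constant.

```latex
E_{\mathrm{total}}
  = E_{\mathrm{MM}}\!\left(\mathbf{r}_{\mathrm{biomolecule}}, \mathbf{r}_{\mathrm{ligand}}\right)
  + \frac{k}{2} \sum_{i \in \mathrm{ligand}} \left\lVert \mathbf{r}_i - \mathbf{r}_i^{0} \right\rVert^{2}
```

- A larger $k$ holds the target ligand closer to its docked pose, so that the clashing biomolecule atoms, rather than the ligand, move during minimization.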
- molecular mechanics minimization can be followed by molecular dynamics simulation using molecular dynamics module 504 .
- clashes 710 can be resolved by Monte Carlo conformational searches to explore non-clashing positions of the side-chains of biomolecule 700 (e.g., rotamer optimization) using conformation explorer 502 .
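- A Monte Carlo search over side-chain rotamers is commonly driven by the Metropolis acceptance rule. The sketch below is a generic illustration of that rule, not the conformation explorer of the disclosure; the energy callback, the rotamer list, the step count, and the value kT = 0.6 (roughly kcal/mol near room temperature) are all assumptions.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def rotamer_search(energy, rotamers, n_steps=200, kT=0.6, seed=0):
    """Return the lowest-energy rotamer visited during a Metropolis walk."""
    rng = random.Random(seed)
    current = rng.choice(rotamers)
    best = current
    for _ in range(n_steps):
        trial = rng.choice(rotamers)
        if metropolis_accept(energy(trial) - energy(current), kT, rng):
            current = trial
            if energy(current) < energy(best):
                best = current
    return best
```

- Here `energy` would typically return a clash-sensitive energy for the side chain in the presence of the docked target ligand, so that non-clashing rotamers are favored.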
- the clashing sidechains of biomolecule 700 can also be computationally mutated to residues larger than alanine but smaller than the clashing residues in biomolecule 700 , e.g., a leucine could be mutated to a valine, a tyrosine or tryptophan could be mutated to phenylalanine, a glutamine could be mutated to asparagine, a glutamic acid could be mutated to an aspartic acid, etc.
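- The example substitutions listed above can be captured in a small lookup table. The table contains only the pairs named in the text; the three-letter codes, the function name, and the fallback to alanine for unlisted residues are assumptions for illustration.

```python
# Substitutions named in the text: each clashing residue maps to a smaller analogue.
SMALLER_RESIDUE = {
    "LEU": "VAL",
    "TYR": "PHE",
    "TRP": "PHE",
    "GLN": "ASN",
    "GLU": "ASP",
}


def soften_residue(resname, fallback="ALA"):
    """Return a smaller replacement residue, or the fallback if none is listed."""
    return SMALLER_RESIDUE.get(resname.upper(), fallback)
```

- Keeping the original residue name alongside the replacement would make it straightforward to restore the side chain later.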
- the output of step 104 is the predicted structure of the target ligand-biomolecule complex 230 , which comprises target ligand 706 and altered biomolecule 701 .
- step 106 involves ranking the target ligand-biomolecule complexes 230 that are output from step 104 .
- Each complex 230 output from step 104 comprises a target ligand 706 and altered biomolecule 701 .
- the complexes 230 can be ranked according to any number of scoring functions, which can be used to calculate the affinity between the target ligand 706 and altered biomolecule 701 .
- Scoring functions can generally be force-field-based (using classical molecular mechanics energy functions), knowledge-based (using a potential created from statistical probability distributions of interatomic distances in known ligand-biomolecule complexes), and/or empirical-based (i.e., weighting structural moieties based on experimental binding affinities from a training set of known biomolecule-ligand complexes).
- all complexes 230 can be ranked together using a scoring function that is a function of interactions between the target ligand 706 and altered biomolecule 701 .
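- As a toy example of a force-field-based term, the Lennard-Jones 12-6 potential can be summed over ligand/biomolecule atom pairs. This is a deliberately simplified sketch and not the scoring function of the disclosure; the parameter values (epsilon = 0.2, sigma = 3.4) are placeholders, and real scoring functions include further terms such as electrostatics and solvation.

```python
import math


def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """Lennard-Jones 12-6 pair energy at distance r (placeholder parameters)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def pair_score(ligand_xyz, protein_xyz, epsilon=0.2, sigma=3.4):
    """Sum the pair energy over all ligand/protein atom pairs; lower is better."""
    return sum(
        lennard_jones(math.dist(a, b), epsilon, sigma)
        for a in ligand_xyz
        for b in protein_xyz
    )
```

- Complexes could then be ordered by ascending score, on the assumption that a lower energy corresponds to a more favorable interaction.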
- mutated sidechains can be restored to the original sidechain (by using mutator 506 and then preparation module 210 for minimization and/or sampling) after the modification step 104 of the process shown in FIG. 1 .
- the mutated residues can be restored to the original sidechain either before or after the ranking step 106 .
- All complexes can be scored together in ranking step 106 under the assumption that mutating non-interacting residues (i.e., those residues that do not form significant contacts with the biomolecule 700 ) will not affect scoring, but mutating interacting residues (e.g., residues forming a salt bridge with biomolecule 700 , residues involved in pi-stacking with biomolecule 700 , etc.) would negatively impact scoring since those interacting residues are presumably key for binding.
- a subset of the top-ranking complexes listed in step 108 of FIG. 1 can be synthesized for empirical structural analysis (e.g., using x-ray crystallography or NMR, etc.) or empirical activity analysis (e.g., using calorimetry, electrophoresis, ELISA, fluorescence changes, etc.).
- the subset of top-ranking complexes listed in step 108 can be chosen using a pre-determined cut-off, e.g., the top 10%, which can be ultimately provided as a list of ranked complexes 232 .
- the pre-determined cut-off could also represent a threshold value for an empirical activity, where the threshold value can be specified as user input 222 (e.g., activity in the nanomolar range or better).
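- Applying a percentage cut-off to a scored list is straightforward. The sketch assumes each complex is represented as a (name, score) pair and that a lower score is better; both the representation and the convention are assumptions, as is the rule of always keeping at least one complex.

```python
import math


def select_top(scored, fraction=0.10):
    """Return the best-scoring fraction of (name, score) pairs, at least one.

    Lower scores are treated as better (assumed convention).
    """
    ranked = sorted(scored, key=lambda item: item[1])
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]
```

- A threshold-based cut-off would replace the slice with a filter on the score or on a measured activity value.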
- the output 228 of the method shown in FIG. 1 includes the structure of each target ligand-biomolecule complex 230 (where the target ligand-biomolecule complex 230 comprises the target ligand 706 and the altered biomolecule 701 ), which can be used to create a list of ranked complexes 232 (step 108 ) and/or used for the visualization of ranked complexes (step 110 ). Whether a list of ranked complexes 232 (step 108 ) or a visualization of them (step 110 ) is produced (or both), the output can include information about atomic coordinates of each of the three-dimensional structures of the target ligand-biomolecule complex 230 .
- the output 228 may be visualized on one or more displays 218 that are coupled to one or more graphical user interfaces 220 .
- the three-dimensional structures of the ranked complexes can be shown on display 218 and the three-dimensional structures can be manipulated and modified by a user via graphical user interface 220 .
- steps 102 - 110 can be repeated.
- step 102 can be performed on the list of ranked complexes 232 (from step 108 ) in order to predict a re-docked position of each target ligand 706 (including all three-dimensional conformations of each target ligand 706 ) by predicting the position of each target ligand 706 in the binding site 702 of the altered biomolecule 701 .
- step 102 can be performed on the predicted complexes 230 that were output from modification step 104 (without ranking those complexes 230 ).
- re-docking can be done by optimizing interactions between the target ligand 706 and the active site 702 of biomolecule 701 (e.g., optimizing hydrogen bonding interactions, salt-bridges, hydrophobic interactions, etc.), using the interaction optimizer 604 of docking module 208 .
- steps 104 - 110 can be performed on the re-docked target ligand 706 and altered biomolecule 701 (yielding the structure of a target ligand 706 bound to a re-altered version of altered biomolecule 701 ).
- step 106 (involving ranking of the predicted structure of each target ligand-biomolecule complex 230 ) can comprise ranking all target ligand-biomolecule complexes 230 , including those that have an altered biomolecule 701 and those that have a re-altered biomolecule structure (where the re-altered biomolecule structure is the result of repeating steps 102 - 104 in FIG. 1 ), using a scoring function.
- the step of ranking complexes 230 (step 106 ) may not be performed.
- a computer prediction system 200 can be used for predicting a target ligand-biomolecule structure 230 after receiving as input one or more template ligand-biomolecule complex structures 224 and one or more target ligands 706 .
- the prediction system 200 can include one or more processors 216 that are able to receive computer program instructions from a general purpose computer, special purpose computer, or any other programmable data processing apparatus.
- the one or more processors 216 are responsible for executing the received computer program instructions, e.g., instructions provided by modules stored in memory 202 .
- the output 228 may be visualized on one or more displays 218 that are coupled to one or more graphical user interfaces 220 .
- the three-dimensional structure of a predicted target ligand-biomolecule complex 230 can be shown on display 218 and can also be manipulated and modified by a user via graphical user interface 220 .
- the prediction system 200 can have a memory 202 that stores information and/or instructions.
- the memory 202 can store a preparation module 210 that is coupled to at least one processor 216 .
- the preparation module 210 can be programmed to receive physical parameters, e.g., pH, temperature, and salt concentration; such parameters can be used by the preparation module 210 and can also ultimately be used by other modules, such as molecular dynamics module 504 .
- the physical parameters can be provided by a user as input 222 to the prediction system 200 .
- the physical parameters can inform when to make preliminary modification to the template ligand-biomolecule structure 224 and/or the target ligand 706 , e.g., using the hydrogen completer 400 described below.
- the preparation module 210 can be programmed to include a hydrogen completer 400 .
- the hydrogen completer 400 can covalently add hydrogen atoms to appropriate locations of a template ligand-biomolecule structure 224 or target ligand 706 , e.g., depending on the pH provided as user input 222 .
- Hydrogen atom addition is also sometimes performed because experimental techniques (e.g., NMR and x-ray crystallography) are sometimes incapable of resolving all hydrogen atoms in the template ligand-biomolecule structure 224 .
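- The pH dependence of hydrogen placement can be illustrated with the Henderson-Hasselbalch relation, which gives the fraction of a titratable group that is protonated at a given pH. This is a textbook relation offered as a sketch, not the hydrogen completer's actual logic; the pKa supplied by the caller is an assumption, and real pKa values shift with the local environment of the group.

```python
def protonated_fraction(pH, pKa):
    """Fraction of a titratable group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


def is_protonated(pH, pKa):
    """Simple majority rule: add the hydrogen when more than half is protonated."""
    return protonated_fraction(pH, pKa) > 0.5
```

- For example, a group with an assumed pKa of 4 would be treated as deprotonated at pH 7, while a group with an assumed pKa of 10.5 would be treated as protonated.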
- the preparation module 210 can also include a missing coordinate completer 402 which can be used to predict the unknown coordinates of certain atoms when the template ligand-biomolecule structure 224 is an incomplete structure, or when restoring previously mutated residues (e.g., after modification step 104 but before performing the ranking step 106 ) to their original residue.
- the template ligand-biomolecule structure 224 can be incomplete because some empirical techniques are incapable of resolving the myriad structures adopted by floppy/flexible regions of a biomolecule, and so the input 222 of the template ligand-biomolecule complex 224 may be missing atomic coordinates for certain residues.
- the unresolved regions of the incomplete structure can be resolved using the missing coordinate completer 402 , which can communicate with other modules, e.g., the molecular dynamics module 504 of the prediction system 200 , to predict the unknown atomic coordinates.
- the preparation module 210 can also include a minimizer 404 that is capable of performing energetic minimization using classical molecular mechanics forcefields.
- the minimizer 404 can be used to energetically relax the template ligand-biomolecule structure 224 after using the hydrogen completer 400 and the missing coordinate completer 402 .
- the minimizer 404 can also be useful when performing step 104 of the method shown in FIG. 1 , where the minimizer 404 can be used to partially or completely alleviate clashes 710 .
- the preparation module 210 can also include a conformational sampling module 406 .
- the conformational sampling module 406 can be used to sample other viable three-dimensional conformations of the template ligand-biomolecule complex 224 , besides the conformation provided as input 222 .
- the conformational sampling module 406 can contain or be coupled to molecular dynamics module 504 , conformation explorer 502 , and/or any other module capable of identifying alternative three-dimensional conformations of the template ligand-biomolecule complex 224 .
- Such sampling can be especially useful when the template ligand-biomolecule structure 224 is known or suspected to be floppy/flexible but the experimental technique used to generate the template ligand-biomolecule structure 224 was only capable of resolving one or some of the myriad of potential structures.
- the memory 202 can also store a pharmacophore matcher module 204 that is coupled to at least one processor 216 .
- the pharmacophore matcher module 204 can be programmed to generate pharmacophores for a template ligand 704 and a target ligand 706 using pharmacophore generator 300 .
- Pharmacophore generator 300 can include various detectors that are capable of identifying pharmacophores in a molecule; the detectors can be either default detectors pre-set in prediction system 200 or can be supplied as input 222 by a user.
- An aromatic detector 310 can detect pharmacophores of the aromatic group type 804 .
- Hydrophobe detector 312 can detect pharmacophores of the hydrophobic group type 800 .
- Positive ionizable detector 314 can detect pharmacophore groups that can become positively ionized; similarly, negative ionizable detector 316 can detect pharmacophore groups that can become negatively charged.
- Hydrogen bond acceptor detector 318 can detect hydrogen bond acceptor pharmacophores 802 ; similarly, hydrogen bond donor detector 320 can detect hydrogen bond donor pharmacophores.
- the pharmacophore detectors shown in FIG. 3 are only some examples of pharmacophore detectors; other types of pharmacophore detectors besides those shown in FIG. 3 can also be used, e.g., a user can define a pharmacophore as input 222 .
- the pharmacophore matcher module 204 can also be programmed to identify one or more pharmacophore matches 816 between the pharmacophore model 806 of template ligand 704 and the pharmacophore model 808 of the target ligand 706 , using pharmacophore match detector 306 .
- Pharmacophore match detector 306 can use any number of algorithms to detect common pharmacophores. Matches (common pharmacophores and/or superimpositions) between the pharmacophore model 806 of template ligand 704 and the pharmacophore model 808 of the target ligand 706 can be communicated to the pharmacophore overlapper 602 of the docking module 208 .
- the target ligand 706 that is analyzed by the pharmacophore matcher module 204 can be selected from a plurality of ligand candidates stored in a target ligand database 214 , where the target ligand database can be stored in memory 202 and coupled to at least one processor 216 .
- Selection of the target ligand 706 from target ligand database 214 can comprise comparing a pharmacophore model 806 of the template ligand 704 to a pharmacophore model of each respective one of the plurality of ligand candidates in the target ligand database 214 and choosing a ligand candidate based on the RMSD of the superimposition of the pharmacophore model of the ligand candidate and the template ligand 704 (lower RMSD would indicate a better ligand candidate).
- the pharmacophore matcher module 204 can be used to create pharmacophore models for each ligand candidate in the target ligand database 214 , and pharmacophore match detector 306 can be used to perceive common pharmacophores and create superimposition options.
- the memory 202 can also store a docking module 208 that is coupled to at least one processor 216 .
- the docking module 208 can be programmed to predict a docked ligand position of the target ligand 706 in the template ligand-biomolecule structure 224 by overlapping the pharmacophore model 808 of the target ligand 706 with the pharmacophore model 806 of the template ligand 704 while the template ligand 704 is in the binding site 702 of the biomolecule 700 (step 102 in FIG. 1 ), using the pharmacophore overlapper 602 .
- the docking module 208 can also be programmed to predict a re-docked ligand position of the target ligand 706 in the altered biomolecule 701 (e.g., after step 104 of the method in FIG. 1 is performed to yield an altered biomolecule 701 reflecting induced fit conformational changes), using interaction optimizer 604 .
- interaction optimizer 604 can predict a re-docked position of target ligand 706 by optimizing interactions between the target ligand 706 and the active site 702 of altered biomolecule 701 (e.g., optimizing hydrogen bonding interactions, salt-bridges, hydrophobic interactions, etc.). It will be understood that interaction optimizer 604 is one example of how non-pharmacophore-based docking can be accomplished—other modules in addition to interaction optimizer 604 can also be incorporated into docking module 208 , each module having a different docking technique.
- the memory 202 can also store a biomolecule modification module 206 that is coupled to at least one processor 216 .
- the biomolecule modification module 206 can be programmed to achieve an induced fit effect by modifying the atomic coordinates of the biomolecule 700 to reduce clashes 710 between the docked target ligand 706 and the biomolecule 700 , thereby creating an altered ligand-biomolecule structure 230 having an altered biomolecule 701 and a docked target ligand 706 .
- Biomolecule modification module 206 can include a clash identifier 500 that can identify energetically unfavorable interactions between biomolecule 700 and target ligand 706 ; the regions of the biomolecule 700 that have energetically unfavorable interactions (e.g., clash 710 ) are the regions of the biomolecule 700 that are most likely to undergo conformational changes due to the induced fit effect.
- the biomolecule modification module 206 can also include various modules that are capable of resolving energetically unfavorable interactions (e.g., clash 710 ).
- minimizer 404 can alleviate clashes 710 by performing energetic minimization using classical molecular mechanics forcefields to move the specific atoms in biomolecule 700 that clash with target ligand 706 (thereby creating an altered biomolecule 701 ).
- biomolecule modification module 206 can include conformation explorer 502 , which can use Monte Carlo conformational searches to explore non-clashing positions of the side-chains of biomolecule 700 (e.g., rotamer optimization).
- biomolecule modification module 206 can include molecular dynamics module 504 that can typically be used after minimizer 404 has been used; molecular dynamics module 504 can use a typical molecular mechanics forcefield to simulate the biomolecule 700 with the docked target ligand 706 in the binding site 702 , thereby exploring the conformational space of biomolecule 700 when target ligand 706 is docked in its active site 702 .
- Molecular dynamics module 504 can include various sampling techniques besides simple simulation, e.g., the replica exchange technique.
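- In replica exchange, copies of the system are simulated at different temperatures and neighboring replicas periodically attempt to swap configurations. The standard acceptance probability for such a swap can be sketched as below; the function name and the use of kT values in energy units are assumptions, and nothing here is specific to the module described in the disclosure.

```python
import math


def swap_probability(e_i, e_j, kT_i, kT_j):
    """Probability of swapping replicas i and j in temperature replica exchange.

    e_i, e_j are the potential energies of the replicas currently at kT_i, kT_j.
    Uses P = min(1, exp[(1/kT_i - 1/kT_j) * (e_i - e_j)]).
    """
    delta = (1.0 / kT_i - 1.0 / kT_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)
```

- A swap is always accepted when it moves the lower-energy configuration to the colder replica, which is what lets high-temperature replicas feed newly found conformations down to the temperature of interest.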
- biomolecule modification module 206 can include mutator 506 that can resolve clashes 710 between target ligand 706 and specific sidechains of biomolecule 700 by computationally mutating the clashing sidechains, e.g., by truncating the clashing sidechains of biomolecule 700 to alanine (alanine is a smaller amino acid that is less likely to sterically clash with a target ligand 706 ), thereby yielding an altered biomolecule 701 .
- the modules shown in FIG. 5 are only some of the options for achieving an induced fit effect using biomolecule modification module 206 ; other modules not shown in FIG. 5 may also be included in biomolecule modification module 206 .
- One or all of the above-mentioned modules can be used to resolve clashes 710 and ultimately achieve an induced fit effect.
- mutator 506 may be first used, then minimizer 404 , and finally molecular dynamics module 504 .
- conformation explorer 502 may be first used, then minimizer 404 , and finally molecular dynamics module 504 .
- Mutator 506 can be used at various steps in the process, e.g., mutator 506 can be used to mutate a clashing residue to a smaller residue (e.g., alanine) during modification step 104 , and mutator 506 can also be used to restore a mutated residue (e.g., alanine) to its original residue after performing modification step 104 but before performing the ranking step 106 or before repeating step 104 (after such restoration, preparation module 210 can be used to minimize and/or sample the complex 230 ).
- the output of the biomolecule modification module 206 can be one or more predicted structures for target ligand-biomolecule complex 230 , where the target ligand-biomolecule complex 230 comprises the target ligand 706 and the altered biomolecule 701 .
- the memory 202 can also store a ranking module 212 that is coupled to at least one processor 216 .
- the ranking module 212 can be programmed to receive the structure of each target ligand-biomolecule complex 230 from the biomolecule modification module 206 , and rank each target ligand-biomolecule structure 230 (comprising the altered biomolecule 701 and target ligand 706 ) using a scoring function.
- the ranking module 212 can be useful in instances where (i) the target ligand 706 has more than one structural conformation and the method shown in FIG. 1 is performed on each structural conformation, and/or (ii) more than one pharmacophore model is created for the target ligand 706 or the template ligand 704 , etc.
- the prediction system 200 represents only one embodiment of a computer prediction system within the scope of this disclosure; other embodiments may include more or less input 222 , more or less output 228 , and more or fewer modules and components within the software and hardware of the prediction system.
- although FIG. 2 shows individual separate modules, any of the shown modules could in fact be a sub-module of any of the other shown modules.
- the molecular dynamics module 504 could be part of or coupled to the preparation module 210 .
- the minimizer 404 can be part of or coupled to the molecular dynamics module 504 .
- the preparation module 210 could be a sub-module of the biomolecule modification module 206 , and vice-versa.
- the induced fit docking calculations can be used to evaluate compounds in drug discovery.
- the computational approaches described above can be used as a virtual filter for screening compounds for their suitability as a candidate for new pharmaceutical applications.
- in FIG. 11 , an exemplary drug design protocol 1101 that incorporates these computational approaches is illustrated as a flow chart.
- the process begins by identifying one or more target ligands 706 for bonding to a biomolecular target 700 (step 910 ).
- the biomolecular target 700 is a protein, nucleic acid, or some other biological macromolecule involved in a particular metabolic or signaling pathway associated with a specific disease condition or pathology or to the infectivity or survival of a microbial pathogen.
- the target ligands 706 are selected small molecules that are complementary to a binding site of the target.
- target ligands 706 can be molecules that are expected to serve as: receptor agonists, antagonists, inverse agonists, or modulators; enzyme activators or inhibitors; or ion channel openers or blockers. In some studies, a large number of target ligands 706 (e.g., hundreds or thousands) are identified.
- prediction system 200 can be used to predict target ligand-biomolecule complex structures 230 using generally the techniques described above, e.g., inter alia, using pharmacophore matcher 204 and docking module 208 (step 920 ).
- the prediction calculations described above may be performed across a computer network.
- the calculations may be performed using one or more servers that a researcher accesses via a network, such as the internet.
- the predicted target ligand-biomolecule complex structures 230 are then screened (step 930 ), e.g. using ranking module 212 to provide a ranked list 232 , in order to identify candidates for chemical analysis, which involves first synthesizing the target ligands 706 (step 940 ) and then assaying the synthesized target ligands 706 (steps 950 and 960 ). Screening molecules can be performed as described above in step 108 , e.g. by using a scoring function.
- Synthesis typically includes several steps including choosing a reaction pathway to make the compound, carrying out the reaction or reactions using suitable apparatus, separating the reaction product from the reaction mixture, and purifying the reaction product.
- in step 950 , multiple different assays can be performed on each target ligand 706 .
- primary assays can be performed on all synthesized target ligands 706 (step 960 ).
- the primary assays can be high throughput assays that provide a further screen for the target ligands 706 rather than performing every necessary assay on every target ligand 706 selected from the computational screening step.
- Secondary assays are performed on those molecules that demonstrate favorable results from the primary assays.
- Secondary assays can include in vitro and/or in vivo assays to assess, e.g., selectivity and/or liability. Both the primary and secondary assays can provide information useful for identifying additional target ligands 706 for further computational screening.
- Target ligands 706 with favorable results from the secondary assays can be identified as suitable candidates for further preclinical evaluation (step 970 ).
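The screening stages above behave like a sequence of filters, which can be sketched as follows. This is a simplified illustration only: the function name, the `top_n` cutoff, and the assay callables (stand-ins for laboratory results) are hypothetical, and a lower computed score is assumed to be better.

```python
def screening_funnel(scores, top_n, primary_assay, secondary_assay):
    """Rank ligands, pick candidates for synthesis, then apply two assay filters.

    scores          -- dict mapping ligand id to a computed score (assumed: lower is better)
    top_n           -- number of top-ranked ligands taken forward (hypothetical cutoff)
    primary_assay   -- callable returning a truthy value if a ligand passes the primary assay
    secondary_assay -- callable returning a truthy value if a ligand passes the secondary assay
    """
    ranked = sorted(scores, key=scores.get)   # computational screen (step 930)
    synthesized = ranked[:top_n]              # candidates to synthesize (step 940)
    primary_hits = [lig for lig in synthesized if primary_assay(lig)]
    # Ligands passing the secondary assays are candidates for preclinical evaluation (step 970).
    return [lig for lig in primary_hits if secondary_assay(lig)]


# Made-up scores and assay outcomes, for illustration only.
scores = {"lig1": -9.1, "lig2": -5.0, "lig3": -8.4, "lig4": -7.7}
primary_results = {"lig1": True, "lig3": True, "lig4": False}
secondary_results = {"lig1": False, "lig3": True}
candidates = screening_funnel(scores, 3, primary_results.get, secondary_results.get)
```

In practice the assay outcomes come from experiments rather than lookups, and results at each stage can feed back into the choice of further target ligands for computational screening.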
- Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
- Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
- the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
- the computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
- data processing apparatus refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
- the apparatus can also be or further include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
- the apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
- a computer program which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- a computer program may, but need not, correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code.
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- the processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
- the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
- Computers suitable for the execution of a computer program include, by way of example, general or special purpose microprocessors or both, or any other kind of central processing unit.
- a central processing unit will receive instructions and data from a read-only memory or a random access memory or both.
- the essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- a computer need not have such devices.
- a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
- Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) or LED (light emitting diode) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
- a computer can interact with a user by sending documents to and receiving
- Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
- the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
- the computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the user device, which acts as a client.
- Data generated at the user device e.g., a result of the user interaction, can be received from the user device at the server.
- FIG. 12 shows a schematic diagram of a generic computer system 1200 .
- the system 1200 can be used for the operations described in association with any of the computer-implemented methods described previously, according to one implementation.
- the system 1200 includes a processor 1210 , a memory 1120 , a storage device 1230 , and an input/output device 1240 .
- Each of the components 1210 , 1120 , 1230 , and 1240 is interconnected using a system bus 1250 .
- the processor 1210 is capable of processing instructions for execution within the system 1200 .
- In some implementations, the processor 1210 is a single-threaded processor.
- In other implementations, the processor 1210 is a multi-threaded processor.
- the processor 1210 is capable of processing instructions stored in the memory 1120 or on the storage device 1230 to display graphical information for a user interface on the input/output device 1240 .
- the memory 1120 stores information within the system 1200 .
- the memory 1120 is a computer-readable medium.
- In some implementations, the memory 1120 is a volatile memory unit.
- In other implementations, the memory 1120 is a non-volatile memory unit.
- the storage device 1230 is capable of providing mass storage for the system 1200 .
- the storage device 1230 is a computer-readable medium.
- the storage device 1230 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device.
- the input/output device 1240 provides input/output operations for the system 1200 .
- the input/output device 1240 includes a keyboard and/or pointing device.
- the input/output device 1240 includes a display unit for displaying graphical user interfaces.
Abstract
A system, device, and method for predicting a docked position of a target ligand in a binding site of a biomolecule are disclosed. The prediction makes use of a template ligand-biomolecule complex structure in order to predict a target ligand-biomolecule complex structure. The system and device contain modules allowing for the prediction of a target ligand-biomolecule complex structure. A preparation module can receive information identifying a target ligand and a template ligand-biomolecule structure. A pharmacophore matcher module can identify common pharmacophores between the template ligand and the target ligand. A docking module can predict a docked ligand position of the target ligand by overlapping the pharmacophore models of the target ligand and template ligand while the template ligand is in the binding site of the biomolecule. A biomolecule modification module can modify the biomolecule to reduce clashes between the docked target ligand and the biomolecule.
Description
- This application is a continuation application of and claims the benefit of priority to U.S. application Ser. No. 16/757,267, filed on Apr. 17, 2020, which is a National Phase Application under 35 U.S.C. § 371 of International Application No. PCT/US2018/056494, filed on Oct. 18, 2018, which claims priority to U.S. Provisional Application No. 62/574,364, filed on Oct. 19, 2017.
- This application relates generally to using a computer to assist in predicting a docked position of a target ligand in a binding site of a biomolecule, and relates more specifically to using a computer to assist in predicting a docked position of a target ligand in a binding site of a biomolecule that is capable of undergoing an induced fit.
- Biomolecules often serve particular functions and the ability to modulate the functionality of a biomolecule can be useful for treating diseases and for engineering industrial biomolecular applications. The functionality of a biomolecule is sometimes modulated by whether and how one or more ligands are bound to the biomolecule. Biomolecules often have regions (e.g., an “active site”) where one or more ligands can bind to the biomolecule and thereby modulate the functionality of the biomolecule. For example, competitive antagonists are compounds that can bind to an active site in a biomolecule, thereby inhibiting the natural ligand from binding. Competitive antagonists prevent a biomolecule from performing its biological function, since the biological function requires the natural ligand to be bound in the active site. Similarly, non-competitive antagonists also prevent a biomolecule from performing its biological function, but do so by binding to the biomolecule and changing the biomolecule in some way (such as by changing its three-dimensional conformational ensemble) so that the biomolecule can no longer perform its biological function (e.g., changing the biomolecule's conformation such that it can no longer accommodate the binding of the natural ligand). In contrast to antagonists, an agonist can bind to a biomolecule and activate a particular function of the biomolecule (rather than inhibit the function).
- When a ligand binds to a biomolecule, it is useful to know the three-dimensional structure of the ligand-biomolecule complex (the structure of both the ligand and the biomolecule when the ligand is bound to the biomolecule). The three-dimensional structure can provide information about which interactions between the ligand and the biomolecule are important for binding, thereby informing rational drug design. The three-dimensional structure can also be used to calculate the free energy of binding. Unfortunately, it is sometimes difficult to predict the three-dimensional structure of a ligand-biomolecule complex, especially when the biomolecule undergoes an induced fit effect.
- One aspect features a method for predicting a docked position of a target ligand in a binding site of a biomolecule. The method involves receiving a template ligand-biomolecule structure that has a template ligand docked in the binding site of the biomolecule and comparing a pharmacophore model of the template ligand to a pharmacophore model of the target ligand. The pharmacophore model of the target ligand is overlapped with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule. A docked position is predicted for the target ligand in the binding site of the biomolecule based on a position of the pharmacophore model of the target ligand when overlapped with the pharmacophore model of the template ligand.
- Another aspect features a computer system that has at least one processor, a preparation module, a pharmacophore matcher module, and a docking module. The preparation module is stored in memory and coupled to at least one processor, and is programmed to receive information identifying a target ligand and a template ligand-biomolecule structure comprising a template ligand and a biomolecule. The pharmacophore matcher module is stored in memory and coupled to at least one processor, and is programmed to identify a pharmacophore match between the template ligand and the target ligand by comparing the pharmacophore model of the template ligand to the pharmacophore model of the target ligand. The docking module is stored in memory and coupled to at least one processor, and is programmed to predict a docked ligand position of the target ligand in the template ligand-biomolecule structure by overlapping the pharmacophore model of the target ligand with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule.
- Another aspect features a non-transitory computer readable storage medium having a computer readable program that when executed on a computer causes the computer to predict a docked position of a target ligand in a binding site of a biomolecule. Making the prediction as to the docked position of the target ligand in the binding site of the biomolecule involves performing various steps. One step involves receiving information identifying the target ligand and a template ligand-biomolecule structure, using a preparation module stored in memory and coupled to at least one processor. The template ligand-biomolecule structure has a template ligand docked in the binding site of the biomolecule. Another step involves identifying a pharmacophore match between the template ligand and the target ligand, using a pharmacophore matcher module stored in memory and coupled to at least one processor. The process of identifying the pharmacophore match involves comparing a pharmacophore model of the template ligand to a pharmacophore model of the target ligand. Another step involves predicting a docked ligand position of the target ligand, using a docking module stored in memory and coupled to at least one processor. The docking module predicts the docked position of the target ligand in the binding site of the biomolecule based on a position of the pharmacophore model of the target ligand when overlapped with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule.
- In some implementations, the target ligand is selected from a plurality of ligand candidates, each of the ligand candidates being different from the template ligand. Selecting the target ligand involves comparing the pharmacophore model of the template ligand to a pharmacophore model of each respective one of the plurality of ligand candidates.
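One way to picture selecting a target ligand by pharmacophore comparison is sketched below. The matching criterion used here (same feature type, with inter-feature distances agreeing within a tolerance) is an assumption chosen for illustration; the text above does not prescribe it. The data layout, `best_match`, and the tolerance value are likewise hypothetical.

```python
from itertools import combinations
from math import dist


def is_consistent(mapping, template, candidate, tol):
    """True if every pair of mapped features keeps its inter-feature distance within tol."""
    for (t1, c1), (t2, c2) in combinations(mapping, 2):
        d_template = dist(template[t1][1], template[t2][1])
        d_candidate = dist(candidate[c1][1], candidate[c2][1])
        if abs(d_template - d_candidate) > tol:
            return False
    return True


def best_match(template, candidate, tol=1.0):
    """Exhaustively search for the largest list of (template_index, candidate_index) pairs.

    Each model is a list of (feature_type, (x, y, z)) entries. Exhaustive search is
    only practical for small models such as the ones in this example.
    """
    best = []

    def extend(t_index, mapping, used):
        nonlocal best
        if len(mapping) > len(best):
            best = list(mapping)
        if t_index == len(template):
            return
        # Option 1: leave this template feature unmatched.
        extend(t_index + 1, mapping, used)
        # Option 2: pair it with an unused candidate feature of the same type.
        for c_index, (c_type, _) in enumerate(candidate):
            if c_index in used or c_type != template[t_index][0]:
                continue
            trial = mapping + [(t_index, c_index)]
            if is_consistent(trial, template, candidate, tol):
                extend(t_index + 1, trial, used | {c_index})

    extend(0, [], frozenset())
    return best


# Made-up coordinates (angstroms) for a template model and two ligand candidates.
template = [("aromatic", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
            ("hydrophobic", (0.0, 4.0, 0.0))]
candidates = {
    "cand_1": [("aromatic", (1.0, 1.0, 1.0)), ("acceptor", (4.0, 1.0, 1.0)),
               ("hydrophobic", (1.0, 5.0, 1.0))],
    "cand_2": [("aromatic", (0.0, 0.0, 0.0)), ("acceptor", (9.0, 0.0, 0.0))],
}
match_sizes = {name: len(best_match(template, feats)) for name, feats in candidates.items()}
selected = max(match_sizes, key=match_sizes.get)
```

Because only internal distances are compared, a candidate whose features are translated or rotated relative to the template can still match fully, as `cand_1` does here.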
- In some implementations, a plurality of template ligand-biomolecule structures is received, each template ligand-biomolecule structure having a different template ligand docked in the binding site of the biomolecule. The pharmacophore model of the template ligand is generated by combining information from each of the template ligands from the plurality of template ligand-biomolecule structures.
- In some implementations, the target ligand has more than one structural conformation in its unbound state, and the docked position of the target ligand in the binding site of the biomolecule is predicted by enumerating a set of potential target ligand conformations and overlapping a respective pharmacophore model of the target ligand for each of the potential target ligand conformations with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule.
- In some implementations, predicting the docked position of the target ligand in the binding site of the biomolecule involves ignoring at least one clash between the target ligand conformation's atomic coordinates and the biomolecule's atomic coordinates. In some instances of these implementations, for each target ligand conformation, the atomic coordinates of the biomolecule are modified to reduce clashes between the docked target ligand conformation's atomic coordinates and the biomolecule's atomic coordinates, thereby creating an altered ligand-biomolecule structure comprising the docked target ligand and an altered biomolecule.
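A clash between a docked ligand and the biomolecule can be illustrated with a simple distance test. The sketch below assumes a single distance cutoff applied to every atom pair, which is a simplification introduced for this example; the function name and the 2.0 angstrom default are hypothetical.

```python
from math import dist


def find_clashes(ligand_atoms, biomolecule_atoms, cutoff=2.0):
    """Return (ligand_index, biomolecule_index, distance) for each pair closer than cutoff.

    Coordinates are (x, y, z) tuples in angstroms. The uniform cutoff is an
    assumption of this sketch, not a criterion taken from the text.
    """
    clashes = []
    for i, lig_xyz in enumerate(ligand_atoms):
        for j, bio_xyz in enumerate(biomolecule_atoms):
            d = dist(lig_xyz, bio_xyz)
            if d < cutoff:
                clashes.append((i, j, d))
    return clashes


# Made-up coordinates: two ligand atoms and three biomolecule atoms.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
biomolecule = [(0.0, 1.0, 0.0), (6.0, 0.0, 0.0), (1.5, 0.0, 1.9)]
clashes = find_clashes(ligand, biomolecule)
clashing_pairs = [(i, j) for i, j, _ in clashes]
```

A list like `clashing_pairs` could be tolerated while the ligand is first placed and then used to decide which biomolecule coordinates to modify afterwards.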
- In some implementations, a re-docked position of each target ligand conformation is predicted by predicting each target ligand conformation's position in the binding site of the altered biomolecule. For each target ligand conformation, the atomic coordinates of the altered biomolecule are modified to reduce clashes between the atomic coordinates of the target ligand conformation's re-docked position and the atomic coordinates of the altered biomolecule, thereby creating a re-altered ligand-biomolecule structure comprising a re-docked target ligand and a re-altered biomolecule.
- In some implementations, each altered and re-altered ligand-biomolecule structure is ranked using a scoring function. In some instances of these implementations, a subset of high-ranking target ligands corresponding to target ligands having a threshold value for an empirical activity is identified.
-
FIG. 1 is a block/flow diagram showing a method of predicting a docked position of a target ligand in a binding site of a biomolecule. -
FIG. 2 is a block diagram showing a prediction system for predicting a docked position of a target ligand in a binding site of a biomolecule. -
FIG. 3 is a block/flow diagram showing one component of the prediction system shown inFIG. 2 (the pharmacophore matcher module). -
FIG. 4 is a block diagram showing one component of the prediction system shown inFIG. 2 (the preparation module). -
FIG. 5 is a block diagram showing one component of the prediction system shown inFIG. 2 (the biomolecule modification module). -
FIG. 6 is a block diagram showing one component of the prediction system shown inFIG. 2 (the docking module). -
FIG. 7A is a cartoon diagram illustrating the process of a ligand binding to a biomolecule. -
FIG. 7B is a cartoon diagram illustrating the process of induced fit binding for both a template ligand and a target ligand. -
FIG. 8A illustrates a pharmacophore model for a template ligand and a target ligand. -
FIG. 8B illustrates an overlap between the pharmacophore model of the template ligand and the target ligand. -
FIG. 9 illustrates an example of how multiple pharmacophore models can be created for a single ligand. -
FIG. 10 illustrates an overlap between the template ligand and the target ligand illustrated inFIG. 9B while the template ligand is in the active site of a biomolecule. -
FIG. 11 is a flow chart illustrating steps in an exemplary drug design method that includes induced fit docking computations. -
FIG. 12 is a diagram of a computer system. - Frequently, scientists and engineers are aware of the structure of a
template ligand 704 that binds to a biomolecule 700 (i.e., the structure of a template ligand-biomolecule complex 224), but either know or suspect that adifferent target ligand 706 also binds to the same biomolecule 700 (seeFIG. 7B ). In general, scientists and engineers may be interested in thetarget ligand 706 because it may (i) have higher binding affinity than thetemplate ligand 704, (ii) be more commercially viable than thetemplate ligand 704, (iii) be metabolized in a safer way than thetemplate ligand 704, (iv) not be covered by the same intellectual property rights as thetemplate ligand 704, etc. In such situations, scientists and engineers would like to know the three-dimensional structure of thetarget ligand 706 when bound tobiomolecule 700 because the three-dimensional structure can provide information about which interactions between thetarget ligand 706 and thebiomolecule 700 are important for binding (thereby informing rational drug design). Additionally, the three-dimensional structure can also be used to calculate the free energy of binding oftarget ligand 706. Computers can help reduce the cost and time involved in obtaining a three-dimensional structure; sometimes, computers are the only viable option because empirical techniques (e.g., x-ray crystallography and NMR) are sometimes unsuccessful at determining a three-dimensional structure, especially when the biomolecule has flexible/floppy regions. - As described herein, the three-dimensional structure of a
template ligand 704 bound to abiomolecule 700 can be used to predict the three-dimensional structure of atarget ligand 706 bound to the same (or similar)biomolecule 700. Unfortunately, when a ligand binds to a particular biomolecule, the biomolecule does not always keep its original three-dimensional conformation. As shown inFIG. 7A , there are generally two different modes of ligand binding: (i) the “lock and key”mode 712, and (ii) the “induced fit”mode 716. When a ligand's shape and properties complement a biomolecule's shape and physical properties, binding can occur through the “lock and key”mode 712 and the biomolecule may not need to undergo significant conformation changes. However, when a ligand's shape and properties do not complement a biomolecule's shape or physical properties, then binding will occur through the “induced fit”mode 716 and thebiomolecule 700 will change its conformation into an alteredbiomolecule 701 in order to avoid clashes (e.g., clash 710). Consequently, the conformation ofbiomolecule 700 when bound totemplate ligand 704 may not accurately represent the conformation ofbiomolecule 700 when bound to targetligand 706, due to conformational changes associated with the induced fit effect. - Among other advantages, the prediction system and methods disclosed herein describe how to predict conformational changes that result from the induced fit effect. In particular, the system and methods describe how computational methods can be used to predict the three-dimensional structure of a target ligand-biomolecule complex 230 (comprising
target ligand 706 bound tobiomolecule 701, wherebiomolecule 701 isbiomolecule 700 after undergoing conformational changes), given a template ligand-biomolecule structure 224 (comprisingtemplate ligand 704 and biomolecule 700). In some implementations, more than onetarget ligand 706 is analyzed, and each one is ranked based on a scoring function. The top-rankingtarget ligands 706 can be chemically synthesized for empirical testing. Another advantage is that in some implementations, the structure of the biomolecule in the predicted ligand-biomolecule complex 230 can be used as a modified biomolecule in rigid-receptor docking and other drug discovery techniques. -
FIG. 1 shows a block/flow diagram illustratively depicting one embodiment of a method for predicting a docked position of atarget ligand 706 in a binding site of abiomolecule 700, whereblocks 100 through 110 (outlined in bold) represent steps of the method. Theprediction system 200 shown inFIG. 2 can implement steps of the method shown inFIG. 1 . - Before performing the
first step 100 of the method shown inFIG. 1 , the prediction system 200 (seeFIG. 2 ) receivesinput 222 from a user or in an automated fashion (e.g., automatically downloading theinput 222 from a server). Referring toFIG. 2 , theinput 222 includes at least one three-dimensional atomic structure of the template ligand-biomolecule complex 224 and also includes information identifying at least onetarget ligand 706. The template ligand-biomolecule complex 224 includes abiomolecule 700 and atemplate ligand 704 that is bound to thebiomolecule 700. Thetemplate ligand 704 can be bound to binding site 702 (e.g., an active site or allosteric site) of thebiomolecule 700. The at least one template ligand-biomolecule structure 224 can be obtained empirically (e.g., using NMR or x-ray crystallography) or computationally (e.g., using a biomolecule structure prediction system, such as CHARMM, AMBER, or GROMACS). The template ligand-biomolecule complex 224 can be an incomplete structure—e.g., some empirical techniques are incapable of resolving the myriad three-dimensional structures adopted by floppy/flexible regions of a biomolecule. In these situations, the unresolved regions of the incomplete template ligand-biomolecule complex 224 can be resolved using themolecule dynamics module 504 of theprediction system 200, or using any other biomolecular structure prediction module or system. The ligand-biomolecule complex 224 can also be incomplete for other reasons, e.g., because a contiguous set of atomic coordinates may be undesirable or not needed, such as in the case where distant atoms not significantly involved in the complexation may be ignored to save computational resources, or in the case where regions of the template ligand make contacts with the biomolecule and such contacts are unlikely to be shared by the target ligand. Theprediction system 200 can also receive other input, such as information about physical conditions 226 (e.g., pH, temperature, and salt concentration). 
- The
target ligand 706 is sometimes provided asinput 222 by a user. For example, a user may know that a particular ligand (different from the template ligand 704) binds more strongly tobiomolecule 700 than thetemplate ligand 704 or has better ADME properties than thetemplate ligand 704. In such a case, the known ligand can be thetarget ligand 706 that is provided asinput 222 by a user seeking to know the three-dimensional structure of thetarget ligand 706 when bound to abiomolecule 700. Alternatively, thetarget ligand 706 can be selected from a plurality of ligand candidates stored in atarget ligand database 214. - Referring to
FIGS. 1-2 , thefirst step 100 of the method shown inFIG. 1 involves comparing at least one pharmacophore model of thetemplate ligand 704 with at least one pharmacophore model of thetarget ligand 706.Pharmacophore generator 300 can be used to identify pharmacophores of different types (e.g., aromatic type, hydrophobic type, etc.). A pharmacophore model comprises one or more pharmacophores and can include information about the relative location of the pharmacophores and the directionality of the pharmacophores (when applicable). - The pharmacophore models used in
step 100 can either be generated by the prediction system 200 (e.g., using pharmacophore generator 300) or provided asinput 222 to theprediction system 200. The pharmacophore models used instep 100 need not be generated from the same source (e.g., the pharmacophore model of thetarget ligand 706 can be provided asinput 222, while the pharmacophore model of thetemplate ligand 704 can be generated by the prediction system 200). -
FIG. 8 illustrates example pharmacophore models for aspecific template ligand 704 and aspecific target ligand 706. As shown inFIG. 8A , thetemplate ligand 704 has nine distinct pharmacophores, comprising three types:aromatic groups 804 represented by orange rings, hydrogen-bond acceptors 802 represented by red spheres, andhydrophobic groups 800 represented by green spheres. Together, all nine pharmacophores, or a subset thereof, can make up thepharmacophore model 806 fortemplate ligand 704. Similarly, thetarget ligand 706 also has nine distinct pharmacophores, comprising the same three types. Together, all nine pharmacophores, or a subset thereof, can make up thepharmacophore model 808 oftarget ligand 706. Thetemplate ligand 704 andtarget ligand 706 may, but need not, have the same number of pharmacophores. The pharmacophore generator 300 (seeFIG. 3 ) can be used to generate pharmacophores like those inFIG. 8 . For example, thepharmacophore generator 300 can have anaromatic detector 310 to detectaromatic groups 804, ahydrophobe detector 312 to detecthydrophobic groups 800, and a hydrogen-bond acceptor detector 318 to detecthydrogen bond acceptors 802. A pharmacophore model can comprise more than one instance of a pharmacophore type, e.g., pharmacophore type 800 (hydrophobic groups represented by green spheres inFIGS. 8A-9 ) has threepharmacophore instances 810 intarget ligand 706, all of which could form part of a pharmacophore model of thetarget ligand 706. - If not provided as
input 222, pharmacophore models like those shown in FIG. 8 can be generated by pharmacophore generator 300 using a number of different techniques. Each pharmacophore type (e.g., aromatic groups 804, hydrogen-bond acceptors 802, and hydrophobic groups 800) within a pharmacophore model can be identified using pre-determined criteria. For example, instances of a hydrogen bond acceptor type 802 can be identified by searching for any surface-accessible atom that has one or more donatable lone electron pairs. Similarly, instances of a hydrogen bond donor type (detected by hydrogen bond donor detector 320) can be identified by searching for donatable hydrogen atoms. As another example, instances of a hydrophobic group type 800 can be identified by searching for rings, isopropyl groups, t-butyl groups, various halogenated moieties, and chains as long as four carbons (using this scheme for identifying hydrophobic group instances, chains of more than four carbons can be divided up into smaller fragments having between two and four carbons). - Once every instance of a pharmacophore type is identified (e.g.,
instances 810 of the hydrophobic group type 800) in a molecule,pharmacophore generator 300 can be used to create a more detailed pharmacophore model by characterizing each of the pharmacophore instances based on their location within the molecule and their directionality (if applicable). There are various methods for identifying the location of a particular instance of a pharmacophore type. As one example, the location of an instance of ahydrophobic group type 800 can be defined as the weighted average of the positions of the non-hydrogen atoms in the identified instance. As another example, the location of negative and positive ionizable sites (identified using negativeionizable detector 316 and positiveionizable detector 314, respectively) can be defined as a single point located on a formally charged atom, or at the centroid of a group of atoms over which the ionic charge is shared. As yet another example, the location of an instance of anaromatic type 804 can be defined as the centroid of the aromatic ring. - Various methods also exist for identifying the directionality of particular instances of pharmacophore types. Whether a pharmacophore type has directionality can be a pre-determined setting of
pharmacophore generator 300. For example, thehydrophobic group type 800 can be deemed to have no directionality component because hydrophobic interactions are frequently directionless, while the hydrogen bond donor/acceptor types (e.g., hydrogen-bond acceptors 802) can be deemed to have directionality because an interaction between this type and abiomolecule 700 frequently requires directional polar interactions along the hydrogen bond axis. Directionality of a type can be represented as a vector, as symbolized by thearrows 812 associated with the hydrogen-bond acceptor type 802 inFIG. 8B . As another example of how directionality can be associated with a particular pharmacophore type, the directionality of thearomatic group type 804 can be defined as a two-headed vector normal to the plane of the aromatic ring (to correctly describe ring-stacking interactions). - Referring to
FIG. 9, more than one pharmacophore model can be generated for any particular molecule. For example, the two snapshots shown in FIG. 9 (snapshot 900 and snapshot 902) illustrate the same fused-ring molecule, but with different pharmacophore models. The difference between the pharmacophore model shown in snapshot 900 and the pharmacophore model shown in snapshot 902 is that in snapshot 900, the 5-membered ring is represented as an aromatic pharmacophore type 804, while in snapshot 902 the 5-membered ring is represented as having a hydrogen bond acceptor pharmacophore type 802. Both pharmacophore models (model 904 for snapshot 900, and model 906 for snapshot 902) are acceptable models. Another situation in which more than one pharmacophore model can be generated for a particular molecule is the case where the molecule exists in multiple different three-dimensional conformations, e.g., when the target ligand 706 has a cyclohexane ring structure that can exist in either a chair conformation or a boat conformation. When the target ligand 706 has more than one structural conformation in its unbound state, a pharmacophore model 808 can be created for each conformation of the target ligand 706, and the method shown in FIG. 1 can be performed on each conformation of the target ligand 706. - A pharmacophore model can be based on pharmacophores perceived in more than just one molecule. For example, more than one template ligand-biomolecule structure 224 can be received as input 222. When more than one template ligand-biomolecule structure 224 is received, each of the structures 224 can have a different template ligand 704 docked in the binding site 702 of the biomolecule 700. In such cases, step 100 can involve generating a pharmacophore model 806 of the template ligands 704 by combining information from each of the respective template ligands 704 from the plurality of template ligand-biomolecule structures 224. Pharmacophores common to each of the respective template ligands 704 can be used to create a combined pharmacophore model. Additionally, more than one pharmacophore model 806 can be generated from the plurality of template ligands 704. In such cases, if the template ligand-biomolecule structures 224 have known binding affinities of the associated template ligands 704, then the binding affinities can be provided as input 222, and pharmacophores of template ligands 704 can be given greater weight in the combined pharmacophore model if they belong to a template ligand 704 with higher binding affinity. - Once at least one
pharmacophore model 806 of the template ligand 704 and at least one pharmacophore model 808 of the target ligand 706 have been generated by pharmacophore generator 300 (or received as input 222), step 100 of FIG. 1 next involves comparing the at least one pharmacophore model 806 of the template ligand 704 with the at least one pharmacophore model 808 of the target ligand 706. The objective of the comparison is to identify pharmacophore types common to both the pharmacophore model 806 of the template ligand 704 and the pharmacophore model 808 of the target ligand 706. The pharmacophore match detector 306 can be used to identify common pharmacophores between the template ligand 704 and target ligand 706 (e.g., FIG. 8B shows a pharmacophore match 816 where the aromatic group type 804 is found in both the template ligand 704 and the target ligand 706). - Various techniques can be used for comparing pharmacophore models, with the underlying goal being the identification of pharmacophores common to both molecules being compared (e.g., common to both
template ligand 704 and target ligand 706), and especially the identification of pharmacophores with similar topological arrangements and directionality. In general, the pharmacophore types common to both thetemplate ligand 704 and thetarget ligand 706 can be superimposed. More than one superimposed option may be possible (e.g., when more than oneinstance 810 of a particular pharmacophore type is present in thetemplate ligand 704 or thetarget ligand 706 or both), in which case various techniques can be used to rank the superimposition options. For example, the RMSD between the superimposed common pharmacophores can be calculated—superimposition options with lower RMSD can be more highly ranked, and the highest-ranking superimposition option (e.g.,superimposition option 814 shown inFIG. 8B ) can be chosen first for the implementation of steps 102-110 inFIG. 1 . The output ofstep 100 can be at least one superimposition of the pharmacophore model oftarget ligand 706 and the pharmacophore model of template ligand 704 (e.g., superimposition 814). - When a
target ligand 706 and/or a template ligand 704 has more than one potential pharmacophore model, each pharmacophore model of the template ligand 704 is compared (step 100) to each pharmacophore model of the target ligand 706. Such a comparison can be done serially or in parallel using the pharmacophore match detector 306. - The next step shown in
FIG. 1 isstep 102, which involves docking thetarget ligand 706 into a binding site of biomolecule 700 (e.g., into theactive site 702 of the biomolecule 700). Step 102 can be accomplished usingdocking module 208. Docking thetarget ligand 706 into theactive site 702 involves overlapping thepharmacophore model 808 of thetarget ligand 706 with thepharmacophore model 806 of thetemplate ligand 704 while thetemplate ligand 704 is in thebinding site 702 of thebiomolecule 700. Such an overlap can be achieved by selecting the highest-ranking superimposition option (e.g., superimposition option 814) resulting from the comparison instep 100. The highest-ranking superimposition option (e.g., superimposition option 814) can then be overlapped/superimposed in theactive site 702 of thebiomolecule 700, as shown inFIG. 10 . Other lower-ranking superimposition options can also be docked, either serially or in parallel to the highest-ranking option. - Step 102 may result in energetically unfavorable interactions (“clashes”) between the atoms in the
target ligand 706 and thebiomolecule 700. Clashes (e.g., clash 710 shown inFIG. 7A ) indicate which portions of thebiomolecule 700 are likely to undergo an induced fit effect. Importantly, in the methods disclosed here, some or all of such clashes can be ignored duringstep 102. While it is acceptable to ignore all clashes in some implementations, in other implementations some clashes may be deemed too severe to ignore. Whether a clash is deemed too severe to ignore can be determined by analyzing pre-set criteria (e.g., default criteria ofdocking module 208, or criteria provided as user input 222). For example, in some implementations, a clash between an atom oftarget ligand 706 and a backbone atom of biomolecule 700 (as opposed to a side-chain atom of biomolecule 700) may be deemed too severe to ignore. If a clash is deemed too severe to ignore in the pre-set criteria, then the method shown inFIG. 1 can either be terminated atstep 102 for the particular superimposition option being analyzed, or theprediction system 200 can output a message to the user indicating that the particular superimposition option being analyzed may result in highly unfavorable interactions requiring major modifications of thebiomolecule 700. - The next step shown in
FIG. 1 isstep 104, which involves modifying thebiomolecule 700 in response to the presence of the target ligand 706 (e.g., in response to clashes between thetarget ligand 706 and the biomolecule 700). Step 104 models the “induced fit” effect.Biomolecule modification module 206 can be used to accomplishstep 104. When performingstep 104, the atoms of thetemplate ligand 704 can be deleted or ignored (i.e., treated as “dummy” atoms). There are many techniques by which biomolecule 700 can undergo conformational modification (i.e., the movement of the atomic coordinates of the biomolecule 700) in response to the presence oftarget ligand 706. For example, clashes 710 can be resolved usingminimizer 404 to perform molecular mechanics minimization of the clashing atoms in thebiomolecule 700 while restraining the atoms of the target ligand 706 (e.g., using a harmonic restraint). For better sampling of conformational space, molecular mechanics minimization can be followed by molecular dynamics simulation usingmolecular dynamics module 504. As another example, clashes 710 can be resolved by Monte Carlo conformational searches to explore non-clashing positions of the side-chains of biomolecule 700 (e.g., rotamer optimization) usingconformation explorer 502. - Other modifications besides conformational modifications are also possible. For example, if
biomolecule 700 is a protein, then clashes 710 that are betweentarget ligand 706 and specific sidechains ofbiomolecule 700 may be resolved by computationally mutating the clashing sidechains, e.g., by truncating the clashing sidechains ofbiomolecule 700 to alanine (alanine is a relatively small amino acid that is less likely to sterically clash with a target ligand 706). The clashing sidechains ofbiomolecule 700 can also be computationally mutated to residues larger than alanine but smaller than the clashing residues inbiomolecule 700, e.g., a leucine could be mutated to a valine, a tyrosine or tryptophan could be mutated to phenylalanine, a glutamine could be mutated to asparagine, a glutamic acid could be mutated to an aspartic acid, etc. - One or all of the above-mentioned techniques can be used to resolve
clashes 710 and ultimately achieve an induced fit effect. By modifying thebiomolecule 700, an alteredbiomolecule 701 is created that has a different three-dimensional structure (and possibly a different chemical make-up) than thebiomolecule 700. The output ofstep 104 is the predicted structure of the target ligand-biomolecule complex 230, which comprisestarget ligand 706 and alteredbiomolecule 701. - The next step shown in
FIG. 1 is step 106, which involves ranking the target ligand-biomolecule complexes 230 that are output from step 104. Each complex 230 output from step 104 comprises a target ligand 706 and altered biomolecule 701. The complexes 230 can be ranked according to any number of scoring functions, which can be used to calculate the affinity between the target ligand 706 and altered biomolecule 701. Scoring functions can generally be force-field-based (using classical molecular mechanics energy functions), knowledge-based (using a potential created from statistical probability distributions of interatomic distances in known ligand-biomolecule complexes), and/or empirical-based (i.e., weighting structural moieties based on experimental binding affinities from a training set of known biomolecule-ligand complexes). - When some predicted target ligand-biomolecule complexes 230 are resolved by mutational modification using mutator 506, but others are resolved by only conformational modification (e.g., using only minimizer 404), all complexes 230 can be ranked together using a scoring function that is a function of interactions between the target ligand 706 and altered biomolecule 701. Mutated sidechains can be restored to the original sidechain (by using mutator 506 and then preparation module 210 for minimization and/or sampling) after the modification step 104 of the process shown in FIG. 1. The mutated residues can be restored to the original sidechain either before or after the ranking step 106. All complexes can be scored together in ranking step 106 under the assumption that mutating non-interacting residues (i.e., those residues that do not form significant contacts with the target ligand 706) will not affect scoring, but mutating interacting residues (e.g., residues forming a salt bridge with target ligand 706, residues involved in pi-stacking with target ligand 706, etc.) would negatively impact scoring since those interacting residues are presumably key for binding. - In some implementations, a subset of the top-ranking complexes listed in
step 108 of FIG. 1 can be synthesized for empirical structural analysis (e.g., using x-ray crystallography or NMR, etc.) or empirical activity analysis (e.g., using calorimetry, electrophoresis, ELISA, fluorescence changes, etc.). The subset of top-ranking complexes listed in step 108 can be chosen using a pre-determined cut-off, e.g., the top 10%, which can be ultimately provided as a list of ranked complexes 232. The pre-determined cut-off could also represent a threshold value for an empirical activity, where the threshold value can be specified as user input 222 (e.g., activity in the nanomolar range or better). When using a threshold value for an empirical activity as the pre-determined cut-off, it is important that step 106 uses a scoring function that is capable of closely approximating the binding free energy ΔG of a target ligand 706, in order to accurately derive a dissociation constant Kd (representing the activity) for each target ligand 706. The dissociation constant associated with the binding of a target ligand 706 can be calculated using the following equation: ΔG = −kT ln Kd, where ΔG is the binding free energy, k is the Boltzmann constant, T is the temperature, and Kd is the dissociation constant. Based on the calculated dissociation constant Kd, a subset of top-ranking complexes listed in step 108 can be created (e.g., a subset having a predicted activity in the nanomolar range or better) and provided as a list of ranked complexes 232. - The
output 228 of the method shown inFIG. 1 includes the structure of each target ligand-biomolecule complex 230 (where the target ligand-biomolecule complex 230 comprises thetarget ligand 706 and the altered biomolecule 701), which can be used to create a list of ranked complexes 232 (step 108) and/or used for the visualization of ranked complexes (step 110). Whether a list of ranked complexes 232 (step 108) or a visualization of them (step 110) is produced (or both), the output can include information about atomic coordinates of each of the three-dimensional structures of the target ligand-biomolecule complex 230. Theoutput 228 may be visualized on one ormore displays 218 that are coupled to one or moregraphical user interfaces 220. For example, the three-dimensional structures of the ranked complexes can be shown ondisplay 218 and the three-dimensional structures can be manipulated and modified by a user viagraphical user interface 220. - In some implementations, steps 102-110 can be repeated. For example, step 102 can be performed on the list of ranked
complexes 232 (from step 108) in order to predict a re-docked position of each target ligand 706 (including all three-dimensional conformations of each target ligand 706) by predicting the position of each target ligand 706 in the binding site 702 of the altered biomolecule 701. Alternatively, step 102 can be performed on the predicted complexes 230 that were output from modification step 104 (without ranking those complexes 230). Instead of using pharmacophore overlapper 602 to predict the re-docked position of the target ligand 706 in altered biomolecule 701, re-docking can be done by optimizing interactions between the target ligand 706 and the active site 702 of altered biomolecule 701 (e.g., optimizing hydrogen bonding interactions, salt-bridges, hydrophobic interactions, etc.), using the interaction optimizer 604 of docking module 208. Given a re-docked position, steps 104-110 can be performed on the re-docked target ligand 706 and altered biomolecule 701 (yielding the structure of a target ligand 706 bound to a re-altered version of altered biomolecule 701). In cases where clashing residues were mutated during step 104, the original residues can be restored using mutator 506 before repeating step 104. In some implementations, this re-docking procedure can lead to more accurate structural predictions of the target ligand-biomolecule complex 230. When steps 102-110 are repeated, step 106 (involving ranking of the predicted structure of each target ligand-biomolecule complex 230) can comprise ranking all target ligand-biomolecule complexes 230, including those that have an altered biomolecule 701 and those that have a re-altered biomolecule structure (where the re-altered biomolecule structure is the result of repeating steps 102-104 in FIG. 1), using a scoring function. - A number of embodiments of the claimed methods have been described. Nevertheless, it will be understood that various modifications can be made without departing from the spirit and scope of the claims.
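The pooled ranking of first-pass and re-docked complexes described for step 106, together with a threshold on predicted activity, can be sketched as follows. This is only an illustrative sketch: the `ScoredComplex` record, the function names, and the example scores are hypothetical. It assumes the scoring function returns a binding free energy in kcal/mol with negative values being favorable, for which the commonly used relation is Kd = exp(ΔG/kT) relative to a 1 M standard state; if ΔG is instead defined with the opposite sign, the sign of the exponent flips. On a per-mole basis the Boltzmann constant takes the value of the gas constant.

```python
from dataclasses import dataclass
from math import exp
from typing import List, Optional

# Boltzmann constant on a per-mole basis (gas constant), in kcal/(mol*K).
K_B_KCAL_PER_MOL_K = 0.0019872041


@dataclass(frozen=True)
class ScoredComplex:
    """Hypothetical record for one predicted target ligand-biomolecule complex."""
    label: str
    delta_g: float    # assumed: binding free energy in kcal/mol, negative = favorable
    pass_index: int   # 0 = first pass, 1 = after re-docking, and so on


def dissociation_constant(delta_g: float, temperature: float = 298.15) -> float:
    """Kd in mol/L (1 M standard state) under the sign convention assumed above."""
    return exp(delta_g / (K_B_KCAL_PER_MOL_K * temperature))


def rank_pooled(
    complexes: List[ScoredComplex],
    kd_threshold: Optional[float] = None,
    temperature: float = 298.15,
) -> List[ScoredComplex]:
    """Rank complexes from all passes together, most favorable first.

    If kd_threshold is given, keep only complexes whose predicted Kd is at or
    below it (e.g., 1e-8 mol/L for roughly nanomolar activity or better).
    """
    ranked = sorted(complexes, key=lambda c: c.delta_g)
    if kd_threshold is not None:
        ranked = [
            c for c in ranked
            if dissociation_constant(c.delta_g, temperature) <= kd_threshold
        ]
    return ranked


# Made-up scores for two ligands; ligand A appears before and after re-docking.
pool = [
    ScoredComplex("A-pass0", -9.0, 0),
    ScoredComplex("A-pass1", -12.5, 1),
    ScoredComplex("B-pass0", -11.5, 0),
]
all_ranked = [c.label for c in rank_pooled(pool)]
potent_only = [c.label for c in rank_pooled(pool, kd_threshold=1e-8)]
```

Because the threshold is applied to a derived Kd, the usefulness of such a filter depends on how closely the scoring function approximates the true binding free energy.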
For example, greater or fewer steps can be performed than are shown in
FIG. 1 , and the steps ofFIG. 1 do not necessarily need to be performed in a particular order. For instance, the pharmacophore models generated instep 100 could first be visualized usingdisplay 218 andgraphical user interface 220 before actually being compared usingpharmacophore matcher 204. As another example, in cases of only one template ligand-biomolecule structure 224 and only one fairlyinflexible target ligand 706, the step of ranking complexes 230 (step 106) may not be performed. - Referring to
FIG. 2, a computer prediction system 200 can be used for predicting a target ligand-biomolecule structure 230 after receiving as input one or more template ligand-biomolecule complex structures 224 and one or more target ligands 706. The prediction system 200 can include one or more processors 216 that are able to receive computer program instructions from a general purpose computer, special purpose computer, or any other programmable data processing apparatus. The one or more processors 216 are responsible for executing the received computer program instructions, e.g., instructions provided by modules stored in memory 202. The output 228 may be visualized on one or more displays 218 that are coupled to one or more graphical user interfaces 220. For example, the three-dimensional structure of a predicted target ligand-biomolecule complex 230 can be shown on display 218 and can also be manipulated and modified by a user via graphical user interface 220. - The
prediction system 200 can have a memory 202 that stores information and/or instructions. The memory 202 can store a preparation module 210 that is coupled to at least one processor 216. The preparation module 210 can be programmed to receive physical parameters, e.g., pH, temperature, and salt concentration; such parameters can be used by the preparation module 210 and can also ultimately be used by other modules, such as molecular dynamics module 504. The physical parameters can be provided by a user as input 222 to the prediction system 200. The physical parameters can inform when to make preliminary modifications to the template ligand-biomolecule structure 224 and/or the target ligand 706, e.g., using the hydrogen completer 400 described below. - Referring to
FIG. 4, the preparation module 210 can be programmed to include a hydrogen completer 400. The hydrogen completer 400 can covalently add hydrogen atoms to appropriate locations of a template ligand-biomolecule structure 224 or target ligand 706, e.g., depending on the pH provided as user input 222. Hydrogen atom addition is also sometimes performed because experimental techniques (e.g., NMR and x-ray crystallography) are sometimes incapable of resolving all hydrogen atoms in the template ligand-biomolecule structure 224. - The
preparation module 210 can also include a missing coordinate completer 402, which can be used to predict the unknown coordinates of certain atoms when the template ligand-biomolecule structure 224 is an incomplete structure, or when restoring previously mutated residues (e.g., after modification step 104 but before performing the ranking step 106) to their original residue. The template ligand-biomolecule structure 224 can be incomplete because some empirical techniques are incapable of resolving the myriad structures adopted by floppy/flexible regions of a biomolecule, and so the input 222 of the template ligand-biomolecule complex 224 may be missing atomic coordinates for certain residues. In these situations, the unresolved regions of the incomplete structure can be resolved using the missing coordinate completer 402, which can communicate with other modules, e.g., the molecular dynamics module 504 of the prediction system 200, to predict the unknown atomic coordinates. - The
preparation module 210 can also include a minimizer 404 that is capable of performing energetic minimization using classical molecular mechanics forcefields. For example, the minimizer 404 can be used to energetically relax the template ligand-biomolecule structure 224 after using the hydrogen completer 400 and the missing coordinate completer 402. The minimizer 404 can also be useful when performing step 104 of the method shown in FIG. 1, where the minimizer 404 can be used to partially or completely alleviate clashes 710. - The
preparation module 210 can also include a conformational sampling module 406. The conformational sampling module 406 can be used to sample other viable three-dimensional conformations of the template ligand-biomolecule complex 224, besides the conformation provided as input 222. The conformational sampling module 406 can contain or be coupled to molecular dynamics module 504, conformation explorer 502, and/or any other module capable of identifying alternative three-dimensional conformations of the template ligand-biomolecule complex 224. Such sampling can be especially useful when the template ligand-biomolecule structure 224 is known or suspected to be floppy/flexible but the experimental technique used to generate the template ligand-biomolecule structure 224 was only capable of resolving one or some of the myriad potential structures. - The
memory 202 can also store a pharmacophore matcher module 204 that is coupled to at least one processor 216. The pharmacophore matcher module 204 can be programmed to generate pharmacophores for a template ligand 704 and a target ligand 706 using pharmacophore generator 300. Pharmacophore generator 300 can include various detectors that are capable of identifying pharmacophores in a molecule; the detectors can be either default detectors pre-set in prediction system 200 or can be supplied as input 222 by a user. An aromatic detector 310 can detect pharmacophores of the aromatic group type 804. Hydrophobe detector 312 can detect pharmacophores of the hydrophobic group type 800. Positive ionizable detector 314 can detect pharmacophore groups that can become positively ionized; similarly, negative ionizable detector 316 can detect pharmacophore groups that can become negatively charged. Hydrogen bond acceptor detector 318 can detect hydrogen bond acceptor pharmacophores 802; similarly, hydrogen bond donor detector 320 can detect hydrogen bond donor pharmacophores. The pharmacophore detectors shown in FIG. 3 are only some examples of pharmacophore detectors; other types of pharmacophore detectors besides those shown in FIG. 3 can also be used, e.g., a user can define a pharmacophore as input 222. - The
pharmacophore matcher module 204 can also be programmed to identify one or more pharmacophore matches 816 between thepharmacophore model 806 oftemplate ligand 704 and thepharmacophore model 808 of thetarget ligand 706, usingpharmacophore match detector 306.Pharmacophore match detector 306 can use any number of algorithms to detect common pharmacophores. Matches (common pharmacophores and/or superimpositions) between thepharmacophore model 806 oftemplate ligand 704 and thepharmacophore model 808 of thetarget ligand 706 can be communicated to thepharmacophore overlapper 602 of thedocking module 208. - The
target ligand 706 that is analyzed by thepharmacophore matcher module 204 can be selected from a plurality of ligand candidates stored in atarget ligand database 214, where the target ligand database can be stored inmemory 202 and coupled to at least oneprocessor 216. Selection of thetarget ligand 706 fromtarget ligand database 214 can comprise comparing apharmacophore model 806 of thetemplate ligand 704 to a pharmacophore model of each respective one of the plurality of ligand candidates in thetarget ligand database 214 and choosing a ligand candidate based on the RMSD of the superimposition of the pharmacophore model of the ligand candidate and the template ligand 704 (lower RMSD would indicate a better ligand candidate). Thepharmacophore matcher module 204 can be used to create pharmacophore models for each ligand candidate in thetarget ligand database 214, andpharmacophore match detector 306 can be used to perceive common pharmacophores and create superimposition options. - The
memory 202 can also store a docking module 208 that is coupled to at least one processor 216. The docking module 208 can be programmed to predict a docked ligand position of the target ligand 706 in the template ligand-biomolecule structure 224 by overlapping the pharmacophore model 808 of the target ligand 706 with the pharmacophore model 806 of the template ligand 704 while the template ligand 704 is in the binding site 702 of the biomolecule 700 (step 102 in FIG. 1), using the pharmacophore overlapper 602.
- The docking module 208 can also be programmed to predict a re-docked ligand position of the target ligand 706 in the altered biomolecule 701 (e.g., after step 104 of the method in FIG. 1 is performed to yield an altered biomolecule 701 reflecting induced fit conformational changes), using interaction optimizer 604. Instead of using pharmacophore overlap for docking, interaction optimizer 604 can predict a re-docked position of target ligand 706 by optimizing interactions between the target ligand 706 and the active site 702 of altered biomolecule 701 (e.g., optimizing hydrogen bonding interactions, salt-bridges, hydrophobic interactions, etc.). It will be understood that interaction optimizer 604 is one example of how non-pharmacophore-based docking can be accomplished; other modules in addition to interaction optimizer 604 can also be incorporated into docking module 208, each module having a different docking technique.
- The memory 202 can also store a biomolecule modification module 206 that is coupled to at least one processor 216. The biomolecule modification module 206 can be programmed to achieve an induced fit effect by modifying the atomic coordinates of the biomolecule 700 to reduce clashes 710 between the docked target ligand 706 and the biomolecule 700, thereby creating an altered ligand-biomolecule structure 230 having an altered biomolecule 701 and a docked target ligand 706. Biomolecule modification module 206 can include a clash identifier 500 that can identify energetically unfavorable interactions between biomolecule 700 and target ligand 706; the regions of the biomolecule 700 that have energetically unfavorable interactions (e.g., clash 710) are the regions of the biomolecule 700 that are most likely to undergo conformational changes due to the induced fit effect.
- The biomolecule modification module 206 can also include various modules that are capable of resolving energetically unfavorable interactions (e.g., clash 710). For example, minimizer 404 can alleviate clashes 710 by performing energy minimization using classical molecular mechanics forcefields to move the specific atoms in biomolecule 700 that clash with target ligand 706 (thereby creating an altered biomolecule 701). As another example, biomolecule modification module 206 can include conformation explorer 502, which can use Monte Carlo conformational searches to explore non-clashing positions of the side-chains of biomolecule 700 (e.g., rotamer optimization). As yet another example, biomolecule modification module 206 can include molecular dynamics module 504, which can typically be used after minimizer 404 has been used; molecular dynamics module 504 can use a typical molecular mechanics forcefield to simulate the biomolecule 700 with the docked target ligand 706 in the binding site 702, thereby exploring the conformational space of biomolecule 700 when target ligand 706 is docked in its active site 702. Molecular dynamics module 504 can include various sampling techniques besides simple simulation, e.g., the replica exchange technique. As yet another example, if biomolecule 700 is a protein (or another biomolecule with sidechains), biomolecule modification module 206 can include mutator 506, which can resolve clashes 710 between target ligand 706 and specific sidechains of biomolecule 700 by computationally mutating the clashing sidechains, e.g., by truncating the clashing sidechains of biomolecule 700 to alanine (alanine is a smaller amino acid that is less likely to sterically clash with a target ligand 706), thereby yielding an altered biomolecule 701.
- The modules shown in FIG. 5 are only some of the options for achieving an induced fit effect using biomolecule modification module 206; other modules not shown in FIG. 5 may also be included in biomolecule modification module 206. One, several, or all of the above-mentioned modules can be used to resolve clashes 710 and ultimately achieve an induced fit effect. For example, mutator 506 may be used first, then minimizer 404, and finally molecular dynamics module 504. As another example, conformation explorer 502 may be used first, then minimizer 404, and finally molecular dynamics module 504. Mutator 506 can be used at various steps in the process, e.g., mutator 506 can be used to mutate a clashing residue to a smaller residue (e.g., alanine) during modification step 104, and mutator 506 can also be used to restore a mutated residue (e.g., alanine) to its original residue after performing modification step 104 but before performing the ranking step 106 or before repeating step 104 (after such restoration, preparation module 210 can be used to minimize and/or sample the complex 230). Ultimately, the output of the biomolecule modification module 206 can be one or more predicted structures for target ligand-biomolecule complex 230, where the target ligand-biomolecule complex 230 comprises the target ligand 706 and the altered biomolecule 701.
- The memory 202 can also store a ranking module 212 that is coupled to at least one processor 216. The ranking module 212 can be programmed to receive the structure of each target ligand-biomolecule complex 230 from the biomolecule modification module 206, and rank each target ligand-biomolecule structure 230 (comprising the altered biomolecule 701 and target ligand 706) using a scoring function. The ranking module 212 can be useful in instances where (i) the target ligand 706 has more than one structural conformation and the method shown in FIG. 1 is performed on each structural conformation, and/or (ii) more than one pharmacophore model is created for the target ligand 706 or the template ligand 704, etc.
- The prediction system 200 represents only one embodiment of a computer prediction system within the scope of this disclosure; other embodiments may include more or fewer inputs 222, more or fewer outputs 228, and more or fewer modules and components within the software and hardware of the prediction system. In addition, it will be understood that while FIG. 2 shows individual separate modules, any of the shown modules could in fact be a sub-module of any of the other shown modules. For example, as previously described, the molecular dynamics module 504 could be part of or coupled to the preparation module 210. Similarly, the minimizer 404 can be part of or coupled to the molecular dynamics module 504. As another example, the preparation module 210 could be a sub-module of the biomolecule modification module 206, and vice-versa.
- In some embodiments, the induced fit docking calculations can be used to evaluate compounds in drug discovery. For example, the computational approaches described above can be used as a virtual filter for screening compounds for their suitability as a candidate for new pharmaceutical applications. Referring to FIG. 11, an exemplary drug design protocol 1101 that incorporates these computational approaches is illustrated as a flow chart. Here, the process begins by identifying one or more target ligands 706 for bonding to a biomolecular target 700 (step 910). Typically, the biomolecular target 700 is a protein, nucleic acid, or some other biological macromolecule involved in a particular metabolic or signaling pathway associated with a specific disease condition or pathology or with the infectivity or survival of a microbial pathogen. In some cases, the target ligands 706 are selected small molecules that are complementary to a binding site of the target. Examples of target ligands 706 can be molecules that are expected to serve as: receptor agonists, antagonists, inverse agonists, or modulators; enzyme activators or inhibitors; or ion channel openers or blockers. In some studies, a large number of target ligands 706 (e.g., hundreds or thousands) are identified.
- Once target ligands 706 are identified, prediction system 200 can be used to predict target ligand-biomolecule complex structures 230 using generally the techniques described above, e.g., inter alia, using pharmacophore matcher 204 and docking module 208 (step 920). Generally, the prediction calculations described above may be performed across a computer network. For example, the calculations may be performed using one or more servers that a researcher accesses via a network, such as the internet.
- The predicted target ligand-biomolecule complex structures 230 are then screened (step 930), e.g., using ranking module 212 to provide a ranked list 232, in order to identify candidates for chemical analysis, which involves first synthesizing the target ligands 706 (step 940) and then assaying the synthesized target ligands 706 (steps 950 and 960). Screening molecules can be performed as described above in step 108, e.g., by using a scoring function.
- Synthesis typically includes several steps, including choosing a reaction pathway to make the compound, carrying out the reaction or reactions using suitable apparatus, separating the reaction product from the reaction mixture, and purifying the reaction product.
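- As an illustration only, the computational screen of step 930 can be pictured as sorting predicted complexes by score and keeping the head of the list. The sketch below assumes that a lower score is better and that each ligand is represented by its best-scoring predicted structure; the `PredictedComplex` record, the `ranked_list` helper, and the example scores are hypothetical and are not taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedComplex:
    ligand_id: str   # which target ligand this predicted structure belongs to
    conformer: int   # index of the ligand conformation (or pharmacophore model) used
    score: float     # scoring-function value; lower is ASSUMED to be better


def ranked_list(complexes, top_n=None):
    """Keep the best-scoring structure per ligand, then sort ligands by that score."""
    best = {}
    for c in complexes:
        current = best.get(c.ligand_id)
        if current is None or c.score < current.score:
            best[c.ligand_id] = c
    ranking = sorted(best.values(), key=lambda c: c.score)
    return ranking if top_n is None else ranking[:top_n]


# Made-up scores for three ligands, two of which have two predicted structures.
complexes = [
    PredictedComplex("lig_A", 0, -7.2),
    PredictedComplex("lig_A", 1, -8.9),
    PredictedComplex("lig_B", 0, -6.1),
    PredictedComplex("lig_C", 0, -9.4),
    PredictedComplex("lig_C", 1, -5.0),
]

for rank, c in enumerate(ranked_list(complexes, top_n=2), start=1):
    print(rank, c.ligand_id, c.conformer, c.score)
```

- In such a scheme, only the ligands at the head of the list would proceed to synthesis (step 940); the cut-off `top_n` is a free choice of the researcher.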
- Chemical composition and purity can be checked to ensure the correct compounds are assayed.
- Generally, multiple different assays can be performed on each target ligand 706. For example, in step 950, primary assays can be performed on all synthesized target ligands 706. The primary assays can be high throughput assays that provide a further screen for the target ligands 706 rather than performing every necessary assay on every target ligand 706 selected from the computational screening step. Secondary assays (step 960) are performed on those molecules that demonstrate favorable results from the primary assays. Secondary assays can include in vitro and in vivo assays to assess, e.g., selectivity and/or liability. Both the primary and secondary assays can provide information useful for identifying additional target ligands 706 for further computational screening.
- Target ligands 706 with favorable results from the secondary assays can be identified as suitable candidates for further preclinical evaluation (step 970).
- Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively, or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
- The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be or further include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
- A computer program, which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
- Computers suitable for the execution of a computer program include, by way of example, general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
- Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) or LED (light emitting diode) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser.
- Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
- The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the user device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received from the user device at the server.
- An example of one such type of computer is shown in FIG. 12, which shows a schematic diagram of a generic computer system 1200. The system 1200 can be used for the operations described in association with any of the computer-implemented methods described previously, according to one implementation. The system 1200 includes a processor 1210, a memory 1120, a storage device 1230, and an input/output device 1240. Each of the components 1210, 1120, 1230, and 1240 is interconnected using a system bus 1250. The processor 1210 is capable of processing instructions for execution within the system 1200. In one implementation, the processor 1210 is a single-threaded processor. In another implementation, the processor 1210 is a multi-threaded processor. The processor 1210 is capable of processing instructions stored in the memory 1120 or on the storage device 1230 to display graphical information for a user interface on the input/output device 1240.
- The memory 1120 stores information within the system 1200. In one implementation, the memory 1120 is a computer-readable medium. In one implementation, the memory 1120 is a volatile memory unit. In another implementation, the memory 1120 is a non-volatile memory unit.
- The storage device 1230 is capable of providing mass storage for the system 1200. In one implementation, the storage device 1230 is a computer-readable medium. In various different implementations, the storage device 1230 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device.
- The input/output device 1240 provides input/output operations for the system 1200. In one implementation, the input/output device 1240 includes a keyboard and/or pointing device. In another implementation, the input/output device 1240 includes a display unit for displaying graphical user interfaces.
- While this specification contains many specific implementation details, these should not be construed as limitations on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
- Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
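- By way of a non-limiting illustration of the pharmacophore-overlap docking and clash identification described earlier, the sketch below superposes matched pharmacophore feature points of a target ligand onto those of a template ligand and then lists ligand-biomolecule atom pairs that come too close. Several choices here are assumptions rather than details of this disclosure: the overlap is computed with a least-squares (Kabsch) superposition using NumPy, a clash is approximated by a plain distance cutoff of 2.5 angstroms instead of an energetic criterion, and the names `kabsch_transform`, `dock_by_overlap`, `find_clashes`, and all coordinates are hypothetical.

```python
import numpy as np


def kabsch_transform(mobile, reference):
    """Least-squares rigid transform (R, t) such that mobile @ R.T + t ~= reference."""
    mob_c, ref_c = mobile.mean(axis=0), reference.mean(axis=0)
    H = (mobile - mob_c).T @ (reference - ref_c)       # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ref_c - R @ mob_c
    return R, t


def dock_by_overlap(target_features, template_features, target_atoms):
    """Place the target ligand atoms using the transform that overlaps its
    pharmacophore features onto the template's features. Clashes with the
    biomolecule are deliberately ignored at this stage."""
    R, t = kabsch_transform(target_features, template_features)
    return target_atoms @ R.T + t


def find_clashes(ligand_atoms, biomolecule_atoms, cutoff=2.5):
    """Return (ligand_index, biomolecule_index) pairs closer than `cutoff`.
    The distance cutoff is an ASSUMED stand-in for an energetic clash test."""
    diff = ligand_atoms[:, None, :] - biomolecule_atoms[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return [(int(i), int(j)) for i, j in np.argwhere(dist < cutoff)]


# Template pharmacophore features, expressed in the binding-site frame.
template_features = np.array([[0.0, 0.0, 0.0],
                              [3.0, 0.0, 0.0],
                              [0.0, 4.0, 0.0],
                              [0.0, 0.0, 2.0]])

# Target ligand in an arbitrary frame: same feature geometry, rotated 90 degrees
# about z and shifted, plus one extra atom that carries no pharmacophore feature.
rot_z = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
shift = np.array([10.0, -5.0, 1.0])
target_features = template_features @ rot_z.T + shift
extra_atom = np.array([1.5, 2.0, 0.0]) @ rot_z.T + shift
target_atoms = np.vstack([target_features, extra_atom])

docked = dock_by_overlap(target_features, template_features, target_atoms)

# Two biomolecule atoms: one sits 1.0 angstrom from the docked extra atom.
biomolecule_atoms = np.array([[1.5, 2.0, 1.0],
                              [8.0, 8.0, 8.0]])
clashes = find_clashes(docked, biomolecule_atoms)
print(clashes)  # [(4, 0)]
```

- The biomolecule atoms flagged in this way mark the region where coordinates would subsequently be modified, for example by minimization or by truncating a clashing sidechain, before the ligand is re-docked.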
Claims (21)
1. Canceled
2. A rational drug design method, comprising:
identifying a plurality of candidate ligands for bonding to a biomolecular target, the target ligands being candidates for a drug associated with modifying a function of the biomolecular target;
predicting, using a computer system, a plurality of target ligand-biomolecule structures each comprising a corresponding candidate ligand of the plurality of candidate ligands and the biomolecular target with the corresponding candidate ligand being in a docked position in a binding site of the biomolecular target, each prediction comprising:
receiving, by the computer system, a template ligand-biomolecule structure, the template ligand-biomolecule structure comprising a template ligand docked in the binding site of the biomolecular target;
comparing, using the computer system, a pharmacophore model of the template ligand to a pharmacophore model of the corresponding candidate ligand;
overlapping, using the computer system, the pharmacophore model of the corresponding candidate ligand with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecular target; and
predicting the docked position of the corresponding candidate ligand in the binding site of the biomolecular target based on a position of the pharmacophore model of the corresponding candidate ligand when overlapped with the pharmacophore model of the template ligand; and
providing, using the computer system, a ranked list of the plurality of candidate ligands.
3. The method of claim 2, further comprising receiving a plurality of template ligand-biomolecule structures, each template ligand-biomolecule structure having a different template ligand docked in the binding site of the biomolecule, and generating the pharmacophore model of the template ligand by combining information from each of the template ligands from the plurality of template ligand-biomolecule structures.
4. The method of claim 2, wherein at least one of the plurality of candidate ligands has more than one structural conformation in its unbound state, and the docked position of the corresponding candidate ligand in the binding site of the biomolecule is predicted by enumerating a set of potential candidate ligand conformations and overlapping a respective pharmacophore model of the candidate ligand for each of the potential candidate ligand conformations with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule.
5. The method of claim 4, wherein predicting the docked position of the corresponding candidate ligand in the binding site of the biomolecule comprises ignoring at least one clash between the corresponding candidate ligand conformations' atomic coordinates and the biomolecule's atomic coordinates.
6. The method of claim 5, further comprising, for each candidate ligand conformation, modifying atomic coordinates of the biomolecule to reduce clashes between the docked candidate ligand conformations' atomic coordinates and the biomolecule's atomic coordinates, thereby creating an altered ligand-biomolecule structure comprising the docked candidate ligand and an altered biomolecule.
7. The method of claim 6, further comprising predicting a re-docked position of each candidate ligand conformation by predicting each candidate ligand conformation's position in the binding site of the altered biomolecule; and
for each candidate ligand conformation, modifying atomic coordinates of the altered biomolecule to reduce clashes between the atomic coordinates of the candidate ligand conformation's re-docked position and the atomic coordinates of the altered biomolecule, thereby creating a re-altered ligand-biomolecule structure comprising a re-docked candidate ligand and a re-altered biomolecule.
8. The method of claim 7, wherein providing the ranked list comprises ranking each altered and re-altered ligand-biomolecule structure using a scoring function.
9. The method of claim 8, wherein the providing the ranked list comprises identifying, using the computer system, a subset of high-ranking candidate ligands corresponding to candidate ligands having a threshold value for an empirical activity.
10. The method of claim 9, wherein the ranked list of target ligands that includes the target ligand based on the predicted dock position and synthesizing one or more target ligands from the ranked list.
11. The method of claim 3, further comprising selecting, based on the ranked list, one or more of the plurality of candidate ligands for synthesis and assaying.
12. The method of claim 11, further comprising synthesizing the one or more selected candidate ligands to provide one or more synthesized candidate ligands.
13. The method of claim 12, further comprising performing at least one assay of the one or more synthesized candidate ligands.
14. The method of claim 13, further comprising identifying a clinical candidate from the ranked list of candidate ligands based on the at least one assay.
15. A computer system, comprising:
at least one computer processor and a computer memory coupled to the at least one computer processor;
a preparation module, stored in the computer memory, wherein the preparation module is programmed to receive information identifying a plurality of candidate ligands and a template ligand-biomolecule structure comprising a template ligand and a biomolecule;
a pharmacophore matcher module, stored in the computer memory, wherein the pharmacophore matcher module is programmed to identify a pharmacophore match between the template ligand and each of the plurality of candidate ligands by comparing the pharmacophore model of the template ligand to the pharmacophore model of a corresponding candidate ligand of the plurality of candidate ligands; and
a docking module, stored in computer memory, wherein the docking module is programmed to predict, for the corresponding candidate ligand, a docked ligand position of the corresponding candidate ligand in the template ligand-biomolecule structure by overlapping the pharmacophore model of the corresponding candidate ligand with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule; and
a ranking module, stored in the computer memory, wherein the ranking module is programmed to rank each altered ligand-biomolecule structure using a scoring function and output the ranked list.
16. The computer system recited in claim 15, wherein the docking module is programmed to ignore at least one clash between the corresponding candidate ligand's atomic coordinates and the biomolecule's atomic coordinates when predicting the docked ligand position.
17. The computer system recited in claim 15, further comprising a biomolecule modification module, stored in the computer memory, wherein the biomolecule modification module is programmed to modify atomic coordinates of the biomolecule to reduce clashes between the docked ligand position's atomic coordinates and the biomolecule's atomic coordinates, thereby creating an altered ligand-biomolecule structure having an altered biomolecule and a docked candidate ligand.
18. The computer system recited in claim 17, wherein at least one of the candidate ligands has more than one structural conformation, and wherein the preparation module is programmed to enumerate a plurality of potential candidate ligand structural conformations for the at least one candidate ligand, and each of the enumerated potential candidate ligand structural conformations is processed by the docking module and the biomolecule modification module.
19. A non-transitory computer readable storage medium comprising a computer readable program, wherein the computer readable program when executed on a computer causes the computer to rank a plurality of candidate ligands for selection for synthesis and assaying in a rational drug design method, the ranking being based on predicting a docked position for each of the plurality of candidate ligands in a binding site of a biomolecule, each prediction comprising causing the computer to perform the steps of:
receiving information identifying a corresponding ligand of the plurality of candidate ligands and a template ligand-biomolecule structure, using a preparation module stored in computer memory and coupled to at least one computer processor, the template ligand-biomolecule structure comprising a template ligand docked in the binding site of the biomolecule;
identifying a pharmacophore match between the template ligand and the corresponding candidate ligand, using a pharmacophore matcher module stored in the computer memory and coupled to the at least one computer processor, wherein the identifying of the pharmacophore match further comprises comparing a pharmacophore model of the template ligand to a pharmacophore model of the corresponding candidate ligand; and
predicting a docked ligand position of the target ligand, using a docking module stored in the computer memory and coupled to the at least one computer processor, wherein the docking module predicts the docked position of the corresponding candidate ligand in the binding site of the biomolecule based on a position of the pharmacophore model of the corresponding candidate ligand when overlapped with the pharmacophore model of the template ligand while the template ligand is in the binding site of the biomolecule.
20. The computer readable storage medium as recited in claim 19, wherein the plurality of candidate ligands are selected from a candidate ligand database, each of the plurality of candidate ligands being different from the template ligand, and wherein selecting the plurality of candidate ligands comprises comparing the pharmacophore model of the template ligand to a pharmacophore model of each respective one of the plurality of candidate ligands.
21. The computer readable storage medium as recited in claim 19, wherein the step of predicting an initial docked position comprises ignoring at least one clash between the corresponding candidate ligand's atomic coordinates and the biomolecule's atomic coordinates.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/132,936 US20230245729A1 (en) | 2017-10-19 | 2023-04-10 | Accounting for induced fit effects |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762574364P | 2017-10-19 | 2017-10-19 | |
PCT/US2018/056494 WO2019079585A1 (en) | 2017-10-19 | 2018-10-18 | Accounting for induced fit effects |
US202016757267A | 2020-04-17 | 2020-04-17 | |
US18/132,936 US20230245729A1 (en) | 2017-10-19 | 2023-04-10 | Accounting for induced fit effects |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/757,267 Continuation US11651840B2 (en) | 2017-10-19 | 2018-10-18 | Accounting for induced fit effects |
PCT/US2018/056494 Continuation WO2019079585A1 (en) | 2017-10-19 | 2018-10-18 | Accounting for induced fit effects |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230245729A1 true US20230245729A1 (en) | 2023-08-03 |
Family
ID=66173483
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/757,267 Active 2039-11-02 US11651840B2 (en) | 2017-10-19 | 2018-10-18 | Accounting for induced fit effects |
US18/132,936 Pending US20230245729A1 (en) | 2017-10-19 | 2023-04-10 | Accounting for induced fit effects |
Country Status (4)
Country | Link |
---|---|
US (2) | US11651840B2 (en) |
EP (1) | EP3697947A4 (en) |
JP (1) | JP7260535B2 (en) |
WO (1) | WO2019079585A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112786122B (en) * | 2021-01-21 | 2023-12-29 | 北京晶泰科技有限公司 | Molecular screening method and computing equipment |
CN114882940B (en) * | 2022-03-28 | 2022-11-08 | 北京玻色量子科技有限公司 | Molecular docking method and device based on coherent Icin machine |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020025535A1 (en) | 2000-06-15 | 2002-02-28 | Diller David J. | Prioritization of combinatorial library screening |
WO2004075021A2 (en) | 2003-02-14 | 2004-09-02 | Vertex Pharmaceuticals, Inc. | Molecular modeling methods |
JP2005018447A (en) | 2003-06-26 | 2005-01-20 | Ryoka Systems Inc | Method for searcing acceptor-ligand stable complex structure |
WO2006081658A1 (en) | 2005-02-01 | 2006-08-10 | The University Of British Columbia | In silico screening for shbg binding ligands via pharmacophore models |
WO2007090084A2 (en) | 2006-01-30 | 2007-08-09 | Schrodinger, Llc. | Determining pharmacophore features from known target ligands |
JP5093110B2 (en) | 2006-09-21 | 2012-12-05 | アステラス製薬株式会社 | Ligand search method |
US20100250217A1 (en) | 2007-08-31 | 2010-09-30 | University Of Florida Research Foundation, Inc. | Docking Pose Selection Optimization via NMR Chemical Shift Perturbation Analysis |
EP2427769A4 (en) | 2009-05-04 | 2016-08-03 | Univ Maryland | Method for binding site identification by molecular dynamics simulation (silcs: site identification by ligand competitive saturation) |
US8175860B2 (en) | 2009-05-27 | 2012-05-08 | National Tsing Hua University | Method of inhibiting the growth of Helicobacter pylori |
WO2012028962A2 (en) * | 2010-09-01 | 2012-03-08 | Bioquanta Sa | Pharmacophore toxicity screening |
US20120252687A1 (en) | 2011-04-04 | 2012-10-04 | Friesner Richard A | Scoring Function Penalizing Compounds Which Desolvate Charged Protein Side Chains Structure |
US9914954B1 (en) * | 2014-03-18 | 2018-03-13 | Naviga Llc | Method for measurement of bioavailable testosterone |
2018
- 2018-10-18 WO PCT/US2018/056494 (WO2019079585A1), status unknown
- 2018-10-18 JP JP2020521864A (JP7260535B2), active
- 2018-10-18 EP EP18868238.9A (EP3697947A4), pending
- 2018-10-18 US US16/757,267 (US11651840B2), active
2023
- 2023-04-10 US US18/132,936 (US20230245729A1), pending
Also Published As
Publication number | Publication date |
---|---|
US20210193273A1 (en) | 2021-06-24 |
EP3697947A4 (en) | 2021-01-13 |
US11651840B2 (en) | 2023-05-16 |
WO2019079585A1 (en) | 2019-04-25 |
EP3697947A1 (en) | 2020-08-26 |
JP2021500661A (en) | 2021-01-07 |
JP7260535B2 (en) | 2023-04-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Axelrod et al. | GEOM, energy-annotated molecular conformations for property prediction and molecular generation | |
Tran-Nguyen et al. | LIT-PCBA: an unbiased data set for machine learning and virtual screening | |
Smith et al. | Prediction of protein–protein interactions by docking methods | |
US20230245729A1 (en) | Accounting for induced fit effects | |
Krasowski et al. | DrugPred: a structure-based approach to predict protein druggability developed using an extensive nonredundant data set | |
Berry et al. | Practical considerations in virtual screening and molecular docking | |
Pierce et al. | A combination of rescoring and refinement significantly improves protein docking performance | |
Baxter et al. | New approach to molecular docking and its application to virtual screening of chemical databases | |
Nantasenamat et al. | Maximizing computational tools for successful drug discovery | |
US20230317214A1 (en) | Methods for predicting an active set of compounds having alternative cores, and drug discovery methods involving the same | |
Mizutani et al. | Effective handling of induced‐fit motion in flexible docking | |
Voet et al. | Combining in silico and in cerebro approaches for virtual screening and pose prediction in SAMPL4 | |
Hogues et al. | ProPOSE: Direct exhaustive protein–protein docking with side chain flexibility | |
Kynast et al. | Evaluation of the coarse-grained OPEP force field for protein-protein docking | |
Krumrine et al. | Principles and methods of docking and ligand design | |
Rognan | Proteome-scale docking: myth and reality | |
Vilseck et al. | Overcoming challenging substituent perturbations with multisite λ-dynamics: a case study targeting β-secretase 1 | |
Li et al. | Annotating mutational effects on proteins and protein interactions: designing novel and revisiting existing protocols | |
Xia et al. | Integrated molecular modeling and machine learning for drug design | |
Maheshwari et al. | Across-proteome modeling of dimer structures for the bottom-up assembly of protein-protein interaction networks | |
Hall et al. | Computational solvent mapping in structure-based drug design | |
Rifai et al. | Binding free energy predictions of farnesoid X receptor (FXR) agonists using a linear interaction energy (LIE) approach with reliability estimation: application to the D3R Grand Challenge 2 | |
Ghemtio et al. | Recent trends and applications in 3D virtual screening | |
Maheshwari et al. | Predicted binding site information improves model ranking in protein docking using experimental and computer-generated target structures | |
Laeeq et al. | An overview of the computer aided drug designing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | AS | Assignment | Owner name: SCHROEDINGER, INC., NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MILLER, EDWARD BLAKE; REEL/FRAME: 067557/0111. Effective date: 20171108 |