US20070078605A1 - Molecular docking technique for screening of combinatorial libraries - Google Patents


Info

Publication number
US20070078605A1
Authority
US
United States
Prior art keywords
ligand
conformations
protein
binding site
conformation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/408,901
Inventor
David Diller
Kenneth Merz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US11/408,901
Publication of US20070078605A1
Abandoned (current legal status)

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K1/00 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • C07K1/04 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length on carriers
    • C07K1/047 Simultaneous synthesis of different peptide species; Peptide libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G16C20/60 In silico combinatorial chemistry
    • G16C20/62 Design of libraries
    • G16C20/64 Screening of libraries

Definitions

  • the present invention relates in general to screening combinatorial libraries by identification of binding ligands and ultimately pharmaceutical compounds, and more particularly, to a high throughput molecular docking technique for screening of combinatorial libraries.
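  • molecular docking can be a useful tool for prioritizing screening efforts (reference: Charifson, P. S., ed. Practical Application of Computer-aided Drug Design 1997, Marcel Dekker: New York. 551; Knegtel, R. M. A. and M. Wagener, “Efficacy and Selectivity in Flexible Database Docking,” PROTEINS: Structure, Function and Genetics, 1999, Vol. 37, p. 334-335; and Debnath, A. K., L. Radigan, and S. Jiang, “Structure-based Identification of Small Molecule Antiviral Compounds Targeted to the gp41 Core Structure of the Human Immunodeficiency Virus Type 1,” Journal of Medicinal Chemistry, 1999, Vol. 42(17), p. 3202-3209).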
  • the ultimate goal of this invention is to use molecular docking as a way to prioritize combinatorial library screening efforts, i.e., rather than ranking individual compounds, combinatorial libraries of compounds are ranked.
  • Compounds synthesized through combinatorial methods are often quite flexible when compared to typical databases of compounds used for molecular docking studies.
  • it should be able to handle fairly flexible compounds (as many as 10-20 rotatable bonds), and it should be extremely fast (on the order of one million compounds a week).
  • a method of docking a ligand to a protein includes: performing a pre-docking conformational search to generate multiple solution conformations of the ligand; generating a binding site image of the protein, the binding site image comprising multiple hot spots; matching hot spots of the binding site image to atoms in at least one solution conformation of the multiple solution conformations of the ligand to obtain at least one ligand position relative to the protein; and optimizing the at least one ligand position while allowing translation, orientation and rotatable bonds of the ligand to vary, and while holding the protein itself fixed.
  • a system for docking a ligand to a protein includes means for performing a pre-docking conformational search to generate multiple solution conformations of the ligand.
  • the system includes means for generating a binding site image of the protein, with the binding site image comprising multiple hot spots; and means for matching hot spots of the binding site image to atoms in at least one solution conformation of the multiple solution conformations of the ligand to obtain at least one ligand position relative to the protein.
  • An optimization mechanism is also provided for optimizing the at least one ligand position while allowing translation, orientation and rotatable bonds of the ligand to vary, and while holding the protein fixed.
  • the invention comprises at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform a method of docking a ligand to a protein.
  • the method includes: performing a pre-docking conformational search to generate multiple solution conformations of the ligand; generating a binding site image of the protein, the binding site image comprising multiple hot spots; matching hot spots of the binding site image to atoms in at least one solution conformation of the multiple solution conformations of the ligand to obtain at least one ligand position relative to the protein; and optimizing the at least one ligand position while allowing translation, orientation and rotatable bonds of the ligand to vary, and while holding the protein fixed.
  • the docking method presented herein has several advantages. First, it is built from several independent pieces. This allows one to better take advantage of scientific breakthroughs. For example, when a better conformational search procedure (in the present context this means more biologically relevant conformers) becomes available, it can be used to replace the current conformational search procedure by generating new 3-D databases. Second, this approach to ligand flexibility is better suited for the class of compounds synthesized through combinatorial methods. Compounds from combinatorial libraries frequently do not have a clear anchor fragment. Because finding and docking an anchor fragment from the ligand are key steps in the incremental construction algorithms, these algorithms may encounter difficulties with compounds commonly found in combinatorial libraries.
  • FIGS. 1A-1C conceptually depict protein-ligand complex formation
  • FIG. 2 is a flowchart of one embodiment of a molecular docking approach in accordance with the principles of the present invention
  • FIG. 3 is a flowchart of one embodiment of a molecular conformational search procedure which can be employed by the docking approach of FIG. 2, in accordance with the principles of the present invention
  • FIG. 4 is a flowchart of one embodiment of establishing a binding site image for use with the molecular docking approach of FIG. 2, in accordance with the principles of the present invention
  • FIG. 5 is a flowchart of one embodiment of a matching procedure for use with the molecular docking approach of FIG. 2, in accordance with the principles of the present invention
  • FIG. 6 is a flowchart of one embodiment of an optimization stage for optimizing ligand positions within identified matches for use with the molecular docking approach of FIG. 2, in accordance with the principles of the present invention
  • FIG. 7 graphically depicts a hydrogen bonding potential and a steric potential for use in atom pairwise scoring in accordance with the principles of the present invention.
  • FIG. 8 depicts one embodiment of a computer environment providing and/or using the capabilities of the present invention.
  • the docking procedure discussed below is based on a conceptual picture of protein-ligand complex formation (see FIGS. 1A-1C).
  • the ligand (L) adopts many conformations in solution.
  • the protein (P) recognizes one or several of these conformations.
  • the ligand, protein and solvent follow the local energy landscape to form the final complex.
  • the recognition stage is modeled by matching atoms of the ligand to interaction “hot spots” in the binding site.
  • the final complex formation is modeled using a gradient based optimization technique with a simple energy function. During this final stage, the translation, orientation, and rotatable bonds of the ligand are allowed to vary, while the protein and solvent are held fixed.
  • the stochastic methods, while often providing more accurate results, are typically too slow to search large databases.
  • the method presented herein falls into the combinatorial group. This approach is analogous to FlexX and HammerHead in that it attempts to match interactions between the ligand and receptor. It differs significantly from these and most other docking techniques in how it handles the flexibility of the ligand. Most current combinatorial docking techniques handle flexibility using an incremental construction approach, whereas the technique described herein uses an initial conformational search followed by a gradient based minimization in the presence of the target protein.
  • A generalized technique of one embodiment of the present invention is depicted in FIG. 2.
  • a conformational search procedure 210 is performed for an entire library or collection, with the resulting conformations stored for future use.
  • a binding site image is then created using the protein structure 220.
  • a matching procedure is performed to form an initial complex by initially positioning a given conformation of a ligand as a rigid body into the binding site 230.
  • a flexible optimization is performed wherein the matches are pruned and then optimized to attain the final result 240.
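The four stages of FIG. 2 can be sketched as a small driver in which each stage is a replaceable function, in keeping with the "independent pieces" design described above. This is a hypothetical outline, not the patented implementation: the names `dock_library`, `match`, `optimize`, `score` and `top_n` are illustrative, lower scores are assumed to be better, and the stored conformer lists are assumed to come from a conformational search run once per library.

```python
def dock_library(ligands, conformer_db, site_image, match, optimize, score, top_n=25):
    """Dock every ligand against one precomputed binding site image (step 220).

    conformer_db maps ligand name -> stored solution conformations (step 210).
    match(conf, site_image) returns rigid-body placements (step 230).
    optimize(pose) returns a flexibly optimized pose (step 240).
    score(pose) returns a number; lower is assumed to be better.
    """
    results = {}
    for name in ligands:
        placements = []
        for conf in conformer_db.get(name, []):
            placements.extend(match(conf, site_image))
        # Prune: only the best-scoring rigid placements reach the costly flexible stage.
        placements.sort(key=score)
        optimized = [optimize(p) for p in placements[:top_n]]
        # Report the best pose after flexible optimization (None if nothing matched).
        results[name] = min(optimized, key=score) if optimized else None
    return results
```

Because `site_image` is built once per protein and `conformer_db` once per library, substituting a better conformer generator only requires regenerating `conformer_db`.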
  • a conformational search is performed once for an entire library or a collection, with the resulting conformations stored for future use. If desired, the conformational searching can be periodically repeated.
  • uniformly distributed random conformations are generated allowing only rotatable bonds to vary 310.
  • 1,000 uniformly distributed random conformations can be generated varying only the rotatable bonds.
  • the internal energy of each conformation is then minimized, again allowing only rotatable bonds to vary 320.
  • Internal energy can be estimated, for example, using van der Waals potentials and a dihedral angle term (reference: Diller, D. J. and C. L. M. J. Verlinde, “A Critical Evaluation of Several Global Optimization Algorithms for the Purpose of Molecular Docking,” Journal of Computational Chemistry, 1999, Vol. 20(16), p. 1740-1751, which is hereby incorporated herein by reference in its entirety).
  • Each conformation can be minimized using, for example, a BFGS (Broyden-Fletcher-Goldfarb-Shanno) optimization algorithm (reference: Press, W. H., et al., Numerical Recipes in C, 2nd ed., 1997, Cambridge: Cambridge University Press. 994, which is hereby incorporated herein by reference in its entirety).
  • Conformations whose internal energy exceeds that of the lowest-energy conformation by more than a selected cut-off are eliminated 330. For example, any conformation with an internal energy more than 15 kcal/mol above the conformation with the lowest internal energy is eliminated.
  • the remaining conformations are scored and ranked 340.
  • any conformation within an rms deviation of 1.0 Å of a higher ranked (i.e., better) conformation can be removed.
  • This clustering is a means to remove redundant conformations.
  • a maximum number of desired conformations, for example, 50 conformations, are retained at the end of the conformational analysis step 360.
  • the lowest ranked conformations can be removed until the desired number of conformations remain.
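The post-processing of the conformational search (energy cut-off 330, ranking 340, RMSD clustering, truncation 360) can be sketched as below using the example values from the text (15 kcal/mol, 1.0 Å, 50 conformations). Assumptions, not stated in the source: the `Conformer` record and function names are hypothetical, the ranking score is supplied by the caller with lower taken as better, and RMSD is computed without superposition, i.e. conformers are assumed to already share a common frame.

```python
import math
from dataclasses import dataclass

Coord = tuple[float, float, float]


@dataclass
class Conformer:
    coords: list[Coord]  # heavy-atom coordinates, same atom order in every conformer
    energy: float        # minimized internal energy (kcal/mol)
    score: float         # ranking score; lower is assumed to be better


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Coordinate RMSD without superposition (conformers assumed to share a frame)."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def select_conformers(confs, energy_window=15.0, cluster_radius=1.0, max_keep=50):
    """Energy cut-off (330), ranking (340), RMSD clustering, and truncation (360)."""
    if not confs:
        return []
    e_min = min(c.energy for c in confs)
    # Drop conformations more than energy_window above the lowest-energy one.
    survivors = [c for c in confs if c.energy - e_min <= energy_window]
    # Rank best first.
    survivors.sort(key=lambda c: c.score)
    kept = []
    for c in survivors:
        # Skip c if it lies within cluster_radius of an already kept, better-ranked conformer.
        if all(rmsd(c.coords, k.coords) > cluster_radius for k in kept):
            kept.append(c)
    # Keep at most max_keep; the lowest-ranked conformers are the ones cut.
    return kept[:max_keep]
```

The text does not say whether a conformer is compared against every better-ranked conformer or only against those already retained; the sketch uses the latter (leader-style clustering).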
  • the process of a small molecule binding to a protein target is a balance between “solvation” by water versus “solvation” by the protein.
  • the solvent accessible surface area term can be chosen in analogy with simple aqueous solvation models, e.g., reference Eisenberg, D. and A. D. McLachlan, “Solvation Energy in Protein Folding and Binding,” Nature, 1986, Vol. 319, p. 199-203; Ooi, T., et al., “Accessible Surface Areas as a Measure of the Thermodynamic Parameters of Hydration of Peptides,” Proceedings of the National Academy of Sciences, 1987, Vol. 84, p.
  • the binding site image comprises a list of apolar hot spots, i.e., points in the binding site that are favorable for an apolar atom to bind, and a list of polar hot spots, i.e., points in the binding site that are favorable for a hydrogen bond donor or acceptor to bind.
  • a grid is placed around the binding site 410.
  • the grid may be at least 20 Å × 20 Å × 20 Å, with at least 5 Å of extra space in each direction.
  • a 0.2 Å spacing can be used for the grid.
  • a “hot spot search volume” is determined 420. This is accomplished by eliminating any grid point inside the protein. Any point contained in, for example, a 6.0 Å or larger sphere not touching the protein can also be eliminated. The largest remaining connected piece becomes the “hot spot search volume.”
  • the hot spots can then be determined using a grid-like search of the hot spot search volume 430.
  • a grid-like search is described in Goodford, P. J., “A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules,” Journal of Medicinal Chemistry, 1985, Vol. 28(7), p. 849-857, which is hereby incorporated herein by reference in its entirety.
  • To find the apolar hot spots, an apolar probe is placed at each grid point in the hot spot search volume, and the probe score is calculated and stored. The process is repeated for the polar hot spots.
  • the grid points are clustered and a desired number of the best clustered grid points is retained 440. For example, the top 30 clustered grid points may be retained.
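Two pieces of the binding site image construction (steps 420 and 440) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the grid points inside the protein and those in large empty spheres are assumed to have been removed already, so the input is the set of remaining integer grid indices; the GRID-style probe scoring itself is not reproduced; lower probe scores are taken as more favourable; and the 1.0 Å `cluster_radius` is a hypothetical value, since the text does not say how grid points are clustered. The 30-spot limit follows the example above.

```python
import math
from collections import deque

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def largest_connected_region(points):
    """Largest face-connected (6-neighbour) component of a set of (i, j, k) grid indices."""
    remaining = set(points)
    best = set()
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:  # breadth-first flood fill
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBOURS:
                nb = (i + di, j + dj, k + dk)
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return best


def pick_hot_spots(probe_scores, spacing, cluster_radius=1.0, max_spots=30):
    """Greedy clustering of scored grid points.

    probe_scores maps (i, j, k) -> probe score (lower assumed more favourable).
    Take the best remaining point, skip any point within cluster_radius (Å) of a
    spot already chosen, and stop after max_spots spots.
    """
    chosen = []
    for idx, score in sorted(probe_scores.items(), key=lambda kv: kv[1]):
        pos = tuple(spacing * n for n in idx)
        if all(math.dist(pos, p) > cluster_radius for p, _ in chosen):
            chosen.append((pos, score))
            if len(chosen) == max_spots:
                break
    return chosen
```

The same two functions would be run once with apolar probe scores and once with polar probe scores to build the two hot spot lists.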
  • the atoms of the ligand are matched to the appropriate hot spots 510. More precisely, in one example, a triplet of atoms, A1, A2, A3, is considered a match to a triplet of hot spots, H1, H2, H3, if:
  • a match occurs, in one example, when three hot spots forming a triangle and three atoms of the ligand forming a triangle substantially match. That is, a match occurs when the triangles are sufficiently similar with the vertices of each triangle being the same type and the corresponding edges of similar length.
  • the matching algorithm finds all matches between atoms of a given conformation and the hot spots. Each match then determines a unique rigid body transformation. The rigid body transformation is then used to bring the conformation into the binding site to form the initial protein-ligand complex.
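The triangle test and the rigid-body placement it implies can be sketched as follows. Assumptions: atom and hot spot types are plain strings such as "apolar", "donor", "acceptor" (hypothetical labels); the 0.5 Å default tolerance mirrors the high-quality match tolerance reported for the test runs; enumeration of all candidate triplets is not shown. The transform is built by aligning orthonormal frames constructed on the two triangles, which is one standard way a non-collinear triplet determines a unique rigid transformation; a least-squares (Kabsch) superposition is a common alternative, and which one the inventors used is not stated.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def rotate(R, p):
    return tuple(dot(row, p) for row in R)


def triangles_match(atoms, spots, tol=0.5):
    """atoms, spots: three (type, (x, y, z)) pairs each, in corresponding order.
    Match if corresponding vertices share a type and every corresponding edge
    length agrees to within tol (Å)."""
    if any(a_type != h_type for (a_type, _), (h_type, _) in zip(atoms, spots)):
        return False
    for i, j in ((0, 1), (1, 2), (0, 2)):
        d_atoms = math.dist(atoms[i][1], atoms[j][1])
        d_spots = math.dist(spots[i][1], spots[j][1])
        if abs(d_atoms - d_spots) > tol:
            return False
    return True


def frame(p1, p2, p3):
    """Right-handed orthonormal frame built from three non-collinear points."""
    e1 = unit(sub(p2, p1))
    e3 = unit(cross(e1, sub(p3, p1)))
    e2 = cross(e3, e1)
    return e1, e2, e3


def rigid_transform(src, dst):
    """Rotation R (3x3, row-major) and translation t taking the src triangle onto dst:
    src[0] lands on dst[0] and the src frame axes land on the dst frame axes."""
    fs, fd = frame(*src), frame(*dst)
    # R = sum_k outer(fd[k], fs[k]) sends each source axis to the matching target axis.
    R = [[sum(fd[k][r] * fs[k][c] for k in range(3)) for c in range(3)] for r in range(3)]
    t = sub(dst[0], rotate(R, src[0]))
    return R, t


def place(coords, R, t):
    """Apply the rigid-body transform to every atom of a conformation."""
    return [tuple(x + y for x, y in zip(rotate(R, p), t)) for p in coords]
```

In use, the three matched atom positions serve as `src`, the three hot spot positions as `dst`, and `place` moves the whole conformation into the binding site to form the initial complex.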
  • FIG. 6 depicts one such strategy.
  • a predetermined percentage, e.g., 10%
  • the remaining matches are ranked 620 using an atom pairwise score described below, with an atom score cutoff of, for example, 1.0.
  • Use of a cutoff allows matches that fit reasonably well with a few steric clashes to survive to the final round, and the choice of 1.0 is merely exemplary.
  • the matches are clustered, and the top N matches are selected to move into the final stage 630, where N may comprise, for instance, a number in the range of 25-100.
  • the score can be modeled after the Piecewise Linear Potential (see, Gehlhaar, D. K., et al., “Molecular Recognition of the Inhibitor AG-1343 by HIV-1 Protease: Conformationally Flexible Docking by Evolutionary Programming,” Chemistry & Biology, 1995, Vol. 2, p. 317-324, which is hereby incorporated herein by reference in its entirety) with a difference being that the score used herein is preferably differentiable. For this score, all hydrogens are ignored, and all non-hydrogen atoms are classified into one of four categories:
  • R_min is the position of the score minimum
  • ε is the depth of the minimum
  • α is a softening factor
  • f(r; r1, r0) is a differentiable cutoff function of r (the distance between the two atoms). Each potential, steric and hydrogen bonding, is assigned its own set of parameters.
  • the parameters for these potentials can be chosen by one skilled in the art via intuition and subsequent testing, but they do not need to be fully optimized.
  • Table 2 contains example parameters for the pairwise potentials.
    TABLE 2
    Parameter          Hydrogen Bonding Potential   Steric Potential
    ε (depth)          2.0                          0.4
    α (softening)      0.5                          1.5
    R_min              3.0 Å                        4.05 Å
    r1                 3.0 Å                        5.0 Å
    r0                 4.0 Å                        6.0 Å
  • the softening factor, α, makes the potentials significantly softer than the typical 12-6 van der Waals potentials (see FIG. 7), i.e., mild steric clashes common in docking runs are tolerated by this potential.
  • the softening factor implicitly models small induced fit effects of the protein which can be important (see, Murray, C. W., C. A. Baxter, and D. Frenkel, “The Sensitivity of The Results of Molecular Docking to Induced Fit Effects: Application to Thrombin, Thermolysin and Neuraminidase,” Journal of Computer-Aided Molecular Design, 1999, Vol.
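The exact equation of the differentiable pairwise score did not survive in this text, so the sketch below is a stand-in, not the patent's formula. It uses the Table 2 parameters, writing ε for the depth of the minimum and α for the softening factor (symbol names assumed), with a Morse-shaped well (minimum of −ε at R_min, width set by α, so larger α gives a softer wall) multiplied by a smooth cubic switch that is 1 below r1 and 0 beyond r0. Both functional forms are assumptions chosen only because they are differentiable and have the properties described above; the four-way atom typing that selects `kind` is also not reproduced.

```python
import math

# Table 2 example parameters (distances in Å); keys are hypothetical names.
PARAMS = {
    "hbond":  {"eps": 2.0, "alpha": 0.5, "r_min": 3.0,  "r1": 3.0, "r0": 4.0},
    "steric": {"eps": 0.4, "alpha": 1.5, "r_min": 4.05, "r1": 5.0, "r0": 6.0},
}


def cutoff(r, r1, r0):
    """Differentiable switch: 1 for r <= r1, 0 for r >= r0, smooth cubic in between
    (zero slope at both ends, so the product with the well stays differentiable)."""
    if r <= r1:
        return 1.0
    if r >= r0:
        return 0.0
    x = (r - r1) / (r0 - r1)
    return 1.0 - x * x * (3.0 - 2.0 * x)


def pair_energy(r, kind):
    """Stand-in pairwise term: Morse-shaped well with minimum -eps at r_min."""
    p = PARAMS[kind]
    well = p["eps"] * ((1.0 - math.exp(-(r - p["r_min"]) / p["alpha"])) ** 2 - 1.0)
    return well * cutoff(r, p["r1"], p["r0"])
```

A ligand score under this sketch would be the sum of `pair_energy` over ligand-protein heavy-atom pairs, with the potential type chosen from the atom categories.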
  • the GOLD test set was used (see Jones, G., et al., “Development and Validation of a Genetic Algorithm for Flexible Docking,” Journal of Molecular Biology, 1997, Vol. 267, p. 727-748, which is hereby incorporated herein by reference in its entirety). Any covalently bound ligand or any ligand bound to a metal ion was removed because it cannot, at present, be modeled by the scoring function described herein. In addition, any “surface sugars” were removed as they are not typical of the problems encountered. This left a total of 103 cases (see Table 1 below). No further individual processing of the test cases was performed.
  • the rms deviation between the bound conformation (X-ray) and the closest computationally generated conformation increases with the number of rotatable bonds.
  • at least one conformation was generated by the conformational search within 1.5 Å rms deviation of the bound conformation.
  • the most interesting aspect of the conformational search results is that for some of the more rigid ligands, the minimum rms deviation was large. For example, there are several ligands with fewer than five rotatable bonds, but with a minimum rms deviation near 1.0 Å. This occurs for two reasons. First, a clustering radius of 1.0 Å was used in all cases. This prevented the conformational space of small ligands from being sufficiently sampled.
  • the match tolerance ranges from 0.5 Å for the high quality searches to 0.25 Å for the rapid searches. Note that the larger the tolerance, the more matches will be found. Thus, a larger tolerance means a more thorough search, while a smaller tolerance denotes a less thorough but faster search.
  • for the high quality searches, a maximum of 100 matches per ligand were optimized for 100 steps, compared to 25 matches per ligand for 20 steps for the rapid searches.
  • the first problem is to generate at least one docked position within a given rms deviation cutoff.
  • terminology is adopted that a ligand that is docked to within X Å of the crystallographically observed position of the ligand is referred to as an X Å hit.
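The "X Å hit" bookkeeping can be sketched as below. Assumptions: docked and crystallographic atom lists are identically ordered, symmetry-equivalent atoms are not handled, "within X Å" is read as RMSD ≤ X, and the function names are hypothetical. No superposition is applied, because the docked pose and the crystal pose are both expressed in the protein's frame and the placement error is exactly what is being measured.

```python
import math


def pose_rmsd(docked, reference):
    """RMSD between a docked pose and the crystal pose, with no superposition."""
    if not docked or len(docked) != len(reference):
        raise ValueError("need equal-length, identically ordered, non-empty atom lists")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(docked, reference))
    return math.sqrt(total / len(docked))


def is_hit(docked, reference, x):
    """True if the pose is an 'X Å hit' (docked to within x Å of the observed pose)."""
    return pose_rmsd(docked, reference) <= x


def count_hits(best_rmsd_per_case, thresholds=(2.0, 1.5, 1.0, 0.5)):
    """Number of test cases whose best (or top-scoring) pose is a hit at each threshold."""
    return {x: sum(r <= x for r in best_rmsd_per_case) for x in thresholds}
```

Passing the per-case minimum RMSD to `count_hits` gives counts of the first kind (at least one hit); passing the RMSD of the top-scoring pose gives counts of the second kind.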
  • the rms deviations are shown for the high quality runs in Table 1.
  • 89 of the 103 cases produce at least one 2.0 Å hit.
  • the numbers drop to 80 at 1.5 Å, 63 at 1.0 Å and 26 at 0.5 Å.
  • 75 of the 103 cases produce a 2.0 Å hit
  • 65 produce a 1.5 Å hit
  • 42 produce a 1.0 Å hit and 16 produce a 0.5 Å hit.
  • PROTEINS: Structure, Function, and Genetics, 1999, Vol. 34, p. 17-28; Rarey, M., B. Kramer, and T. Lengauer, “Docking of Hydrophobic Ligands With Interaction-based Matching Algorithms,” Bioinformatics, 1999, Vol. 15(3), p. 243-250; and Kramer, B., M. Rarey, and T. Lengauer, “Evaluation of the FlexX Incremental Construction Algorithm for Protein-Ligand Docking,” PROTEINS: Structure, Function, and Genetics, 1999, Vol. 37, p. 228-241).
  • the second problem is to correctly rank the docked compounds, i.e., is the top ranked conformation reasonably close to the crystallographically observed position for the ligand. This is a significantly more difficult problem than the first.
  • the rms deviation between the top scoring docked position and the observed position for the high quality runs is given in Table 1. In this case, there is little difference between the two sets of parameters.
  • 48 of the 103 cases produce a 2.0 Å hit as the top scoring docked position. This number drops to 41 at 1.5 Å, 34 at 1.0 Å and 10 at 0.5 Å.
  • 45 of the 103 cases produce a 2.0 Å hit as the top scoring docked position, with 41 at 1.5 Å, 34 at 1.0 Å and 10 at 0.5 Å.
  • the utility of the scoring function used in this study lies less as a tool to absolutely rank the docked conformations than as an initial filter to select only a few docked conformations.
  • the average CPU time per test case (e.g., using a Silicon Graphics Incorporated (SGI) R12000 computer) is approximately 4.5 seconds. At this rate, screening one million compounds with one CPU would take about 50 days. For the rapid searches, the average CPU time drops to approximately 1.1 seconds per test case. At this rate, screening one million compounds with one CPU would take about 12 days. Because database docking is a highly parallel job, multiple CPUs could easily cut this to a reasonable amount of time (for example, a day or so).
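The throughput figures above follow from simple arithmetic, sketched here so the same estimate can be redone for other timings. The helper names are hypothetical, and perfect parallel scaling across CPUs is assumed, which is reasonable for independent docking jobs but ignores I/O and scheduling overhead.

```python
import math

SECONDS_PER_DAY = 86_400


def screening_days(n_compounds, sec_per_compound, n_cpus=1):
    """Wall-clock days, assuming independent jobs that parallelize perfectly."""
    return n_compounds * sec_per_compound / (n_cpus * SECONDS_PER_DAY)


def cpus_needed(n_compounds, sec_per_compound, days):
    """Smallest number of CPUs that finishes the screen within the given days."""
    return math.ceil(n_compounds * sec_per_compound / (days * SECONDS_PER_DAY))
```

For one million compounds, `screening_days` gives about 52 days at 4.5 s per compound and about 12.7 days at 1.1 s. Against the stated goal of one million compounds a week, `cpus_needed(1_000_000, 4.5, days=7)` is 8 CPUs for the high quality settings and `cpus_needed(1_000_000, 1.1, days=7)` is 2 CPUs for the rapid settings.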
  • the first case is the dipeptide Ile-Val from the PDB entry 3tpi (see, Marquart, M., et al., “The Geometry of the Reactive Site and of the Peptide Groups in Trypsin, Trypsinogen and Its Complexes With Inhibitors,” Acta Crystallographica, 1983, Vol. B39, p. 480, which is hereby incorporated herein by reference in its entirety).
  • This case has no clear anchor fragment and as a result, the incremental construction approach to docking might have difficulties with this ligand.
  • Our conformational search procedure produced a conformation within 0.42 Å of the observed conformation. The rms deviation between the best scoring docked position and the observed position is 0.53 Å.
  • the second example, with a ligand having 15 rotatable bonds, is a much more difficult case. It is an HIV protease inhibitor from the PDB entry 1ida (see, Tong, L., et al., “Crystal Structures of HIV-2 Protease In Complex With Inhibitors Containing Hydroxyethylamine Dipeptide Isostere,” Structure, 1995, Vol. 3(1), p. 33-40, which is hereby incorporated herein by reference in its entirety).
  • the conformational search procedure was able to generate a conformation with an rms deviation of 0.96 Å from the bound conformation.
  • the rms deviation for the top scoring docked position is 1.38 Å.
  • the top 13 scoring docked positions are all within 2.0 Å of the observed position, with the closest near 1.32 Å.
  • the final case is an HIV protease inhibitor from the PDB entry 4phv (see, Bone, R., et al., “X-ray Crystal Structure of The HIV Protease Complex With L-700,417, An Inhibitor With Pseudo C2 Symmetry,” Journal of the American Chemical Society, 1991, Vol. 113(24), p. 9382-9384, which is hereby incorporated herein by reference in its entirety).
  • the ligand in this case has 12 rotatable bonds. This clearly demonstrates the value of including the final flexible gradient optimization step of the ligand.
  • the closest conformation produced from the conformational search procedure is 1.32 Å from the crystallographically observed conformation.
  • the top scoring docked position is also the closest to the observed position.
  • the smallest rms deviation that could have been obtained without the flexible optimization is that of the closest conformation generated by the conformational search procedure, i.e., 1.32 Å.
  • the flexible optimization decreased the final rms deviation by at least 1.0 Å.
  • the ligand was taken as bound to the protein and a BFGS optimization was performed. If the resulting score was significantly less than the best score found from the docking runs, the failure is classified as a search failure. Every other failure is classified as a scoring failure.
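The failure analysis rule above can be written as a one-line decision. In this sketch, lower scores are better (consistent with "significantly less" meaning "better"), `relaxed_xray_score` is the score after the BFGS optimization of the crystallographic pose, and `margin` is a hypothetical threshold standing in for "significantly", which the text does not quantify. The function is only meant to be applied to cases that were already judged to be failures.

```python
def classify_failure(best_docked_score, relaxed_xray_score, margin=1.0):
    """Classify a docking failure as a search failure or a scoring failure.

    Lower scores are better; margin (hypothetical) quantifies 'significantly less'.
    """
    if relaxed_xray_score < best_docked_score - margin:
        # The score prefers the true pose, but the search never reached it.
        return "search failure"
    # The search found poses the score rates at least about as well as the true one.
    return "scoring failure"
```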
  • the sulfur atom in the X-ray position is accepting a hydrogen bond from the OH of a tyrosine and the carboxylic acid is involved in a salt bridge with a lysine. Neither of these interactions was recognized by the scoring function described herein.
  • the conformer generation, while reasonably successful, should treat small relatively rigid molecules and large flexible molecules differently. Since the conformational space of very large flexible molecules is too large to explore thoroughly, a Monte Carlo search algorithm is used. In addition, the score used to rank the conformations is certainly simplistic and can be improved. For example, variations of solvation models (see, Eisenberg, D. and A. D. McLachlan, “Solvation Energy in Protein Folding and Binding,” Nature, 1986, Vol. 319, p. 199-203; Still, W.
  • the algorithm used to find the polar hot spots tends to find any hydrogen bond donor and acceptor rather than those buried in the binding site. Improving the hot spot search routine will not only increase the quality of the technique, but will also decrease the number of hot spots needed and, thus, make the technique more efficient.
  • Some available programs, such as GRID see, Goodford, P. J., “A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules,” Journal of Medicinal Chemistry, 1985, Vol. 28(7), p. 849-857; and Still, W.
  • the capability of the present invention can readily be automated by creating a suitable program, in software, hardware, microcode, firmware or any combination thereof. Further, any type of computer or computer environment can be employed to provide, incorporate and/or use the capability of the present invention. One such environment is depicted in FIG. 8 and described in detail below.
  • a computer environment 800 includes, for instance, at least one central processing unit 810, a main storage 820, and one or more input/output devices 830, each of which is described below.
  • central processing unit 810 is the controlling center of computer environment 800 and provides the sequencing and processing facilities for instruction execution, interruption action, timing functions, initial program loading and other machine related functions.
  • the central processing unit executes at least one operating system, which, as is known, is used to control the operation of the computing unit by controlling the execution of other programs, controlling communication with peripheral devices and controlling use of the computer resources.
  • Central processing unit 810 is coupled to main storage 820, which is directly addressable and provides for high-speed processing of data by the central processing unit.
  • Main storage may be either physically integrated with the CPU or constructed in stand-alone units.
  • Main storage 820 is also coupled to one or more input/output devices 830. These devices include, for instance, keyboards, communications controllers, teleprocessing devices, printers, magnetic storage media (e.g., tape, disk), direct access storage devices, and sensor-based equipment. Data is transferred from main storage 820 to input/output devices 830, and from the input/output devices back to main storage.
  • the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media.
  • the media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention.
  • the articles of manufacture can be included as part of a computer system or sold separately.
  • At least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Organic Chemistry (AREA)
  • Library & Information Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

A high-throughput molecular docking facility is presented for screening combinatorial libraries to identify binding ligands and ultimately pharmaceutical compounds. The facility employs a pre-docking conformational search to generate multiple solution conformations of a ligand. The molecular docking facility includes: generating a binding site image of the protein, the binding site image having multiple hot spots; matching hot spots of the binding site image to atoms in at least one solution conformation of the multiple solution conformations of the ligand to obtain at least one ligand position relative to the protein in a ligand-protein complex formation; and optimizing the at least one ligand position while allowing translation, orientation and rotatable bonds of the ligand to vary, and while holding the protein fixed.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 09/595,096, now U.S. Pat. No. ______, filed on Jun. 15, 2000. The entire disclosure of this application is hereby incorporated by reference in its entirety.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates in general to screening combinatorial libraries by identification of binding ligands and ultimately pharmaceutical compounds, and more particularly, to a high throughput molecular docking technique for screening of combinatorial libraries.
  • 2. Description of the Related Art
  • With the advent of combinatorial chemistry and the resulting ability to synthesize large collections of compounds for a broad range of targets, it has become apparent that the capability to effectively prioritize screening efforts is crucial to the rapid identification of the appropriate region of chemical space for a given target. Since it has been generally observed that hits obtained against a given target are clustered in a finite region of chemical space, there is reason to believe that given the right computational tools it is possible to prioritize screening efforts such that only libraries containing active compounds are interrogated. Effective prioritization tools would allow scientists both to obtain leads in a cost effective and efficient manner and to test virtual libraries against novel targets prior to active synthesis and bioanalysis, thereby reducing synthesis costs. With the expected flood of new targets becoming available in the coming decade, it will be critical to focus screening efforts on target appropriate regions of chemical space.
  • There are many challenges to overcome prior to being able to develop appropriate library prioritization tools. At one extreme are the screens for which there is no structural data for the target. In these cases, QSAR or other data mining tools are typically the method of choice for screening prioritization. At the opposite extreme are the structure-based approaches that rely on the availability of X-ray structures of the target. Unfortunately, in most cases, a crystal structure is not available. With the advent of proteomics and high-throughput protein crystallography, however, it is likely that for a given target, a structure of a related protein will be available. In these cases, a homology model can be built starting from the structure of a related protein, and structure-based tools could be utilized in conjunction with QSAR or other data mining tools.
  • When structural information for a target protein is available, molecular docking can be a useful tool for prioritizing screening efforts (reference: Charifson, P. S., ed. Practical Application of Computer-aided Drug Design 1997, Marcel Dekker: New York. 551; Knegtel, R. M. A. and M. Wagener, "Efficacy and Selectivity in Flexible Database Docking," PROTEINS: Structure, Function and Genetics, 1999, Vol. 37, p. 334-335; and Debnath, A. K., L. Radigan, and S. Jiang, "Structure-based Identification of Small Molecule Antiviral Compounds Targeted to the gp41 Core Structure of the Human Immunodeficiency Virus Type 1," Journal of Medicinal Chemistry, 1999, Vol. 42(17), p. 3202-3209). Operationally, this means that rather than assaying an entire collection of compounds, the compounds are first docked and ranked via some scoring function, and then only a subset of the compounds, usually the highest ranked, are assayed. This approach to prioritizing screening efforts usually increases the number of active compounds identified by a factor of 1-10 relative to a randomly selected subset of compounds (see, Charifson, P. S., et al., "Consensus Scoring: A Method for Obtaining Improved Hit Rates From Docking Databases of Three-dimensional Structures Into Proteins," Journal of Medicinal Chemistry, 1999, Vol. 42(25), p. 5100-5109).
  • The ultimate goal of this invention is to use molecular docking as a way to prioritize combinatorial library screening efforts, i.e., rather than ranking individual compounds, combinatorial libraries of compounds are ranked. Compounds synthesized through combinatorial methods are often quite flexible when compared to typical databases of compounds used for molecular docking studies. Thus, for a docking procedure to be useful, it should be able to handle fairly flexible compounds (as many as 10-20 rotatable bonds), and it should be extremely fast (on the order of one million compounds a week). With these constraints in mind, a new docking technique has been developed and validated, as presented hereinbelow.
  • SUMMARY OF THE INVENTION
  • To briefly summarize, presented herein in one aspect is a method of docking a ligand to a protein. The method includes: performing a pre-docking conformational search to generate multiple solution conformations of the ligand; generating a binding site image of the protein, the binding site image comprising multiple hot spots; matching hot spots of the binding site image to atoms in at least one solution conformation of the multiple solution conformations of the ligand to obtain at least one ligand position relative to the protein; and optimizing the at least one ligand position while allowing translation, orientation and rotatable bonds of the ligand to vary, and while holding the protein itself fixed.
  • In another aspect, a system for docking a ligand to a protein is provided. The system includes means for performing a pre-docking conformational search to generate multiple solution conformations of the ligand. In addition, the system includes means for generating a binding site image of the protein, with the binding site image comprising multiple hot spots; and means for matching hot spots of the binding site image to atoms in at least one solution conformation of the multiple solution conformations of the ligand to obtain at least one ligand position relative to the protein. An optimization mechanism is also provided for optimizing the at least one ligand position while allowing translation, orientation and rotatable bonds of the ligand to vary, and while holding the protein fixed.
  • In a further aspect, the invention comprises at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform a method of docking a ligand to a protein. The method includes: performing a pre-docking conformational search to generate multiple solution conformations of the ligand; generating a binding site image of the protein, the binding site image comprising multiple hot spots; matching hot spots of the binding site image to atoms in at least one solution conformation of the multiple solution conformations of the ligand to obtain at least one ligand position relative to the protein; and optimizing the at least one ligand position while allowing translation, orientation and rotatable bonds of the ligand to vary, and while holding the protein fixed.
  • The docking method presented herein has several advantages. First, it is built from several independent pieces. This allows one to better take advantage of scientific breakthroughs. For example, when a better conformational search procedure (in the present context this means more biologically relevant conformers) becomes available, it can be used to replace the current conformational search procedure by generating new 3-D databases. Second, this approach to ligand flexibility is better suited for the class of compounds synthesized through combinatorial methods. Compounds from combinatorial libraries frequently do not have a clear anchor fragment. Because finding and docking an anchor fragment from the ligand are key steps in the incremental construction algorithms, these algorithms may encounter difficulties with compounds commonly found in combinatorial libraries. (Incremental construction algorithms work roughly as follows: the ligand is divided into rigid fragments; the largest of these fragments is docked into the binding site of the protein; and the ligand is then rebuilt in the binding site by attaching the appropriate fragments and systematically searching around the rotatable bonds. The procedure is described further in: M. Rarey, B. Kramer, T. Lengauer, & G. Klebe, “A fast flexible docking method using an incremental construction algorithm,” J. Molecular Biology, 261 (1996), pp. 470-489; and S. Makino & I. Kuntz, “Automated flexible ligand docking method and its application to database search,” J. Computational Chemistry, 18 (1997), pp. 1812-1825.) Docking entire conformations overcomes this difficulty. In addition, including an efficient flexible optimization step removes a significant burden from the conformational search procedure. Further improvements in energy minimization algorithms can also be taken advantage of, as they become available.
  • The approach herein to ligand flexibility could be viewed as a liability because it relies on an initial conformational search. As indicated previously, in order to achieve maximum efficiency, the conformational search should be performed once for an entire library or collection and the resulting conformations stored for future use. For large collections, this would be a considerable investment in both computer time and disk space. Because a database will typically be used many times, the initial computer time for the conformational search can easily be justified. Moreover, with the availability of parallel computers and faster CPUs, the conformational search can be completed, or occasionally redone, in a reasonable amount of time. Since disk sizes are now approaching the terabyte level, storing the conformations for millions of compounds presents no problem.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The above-described objects, advantages and features of the present invention, as well as others, will be more readily understood from the following detailed description of certain preferred embodiments of the invention, when considered in conjunction with the accompanying drawings in which:
  • FIGS. 1A-1C conceptually depict protein-ligand complex formation;
  • FIG. 2 is a flowchart of one embodiment of a molecular docking approach in accordance with the principles of the present invention;
  • FIG. 3 is a flowchart of one embodiment of a molecular conformational search procedure which can be employed by the docking approach of FIG. 2, in accordance with the principles of the present invention;
  • FIG. 4 is a flowchart of one embodiment of establishing a binding site image for use with the molecular docking approach of FIG. 2, in accordance with the principles of the present invention;
  • FIG. 5 is a flowchart of one embodiment of a matching procedure for use with the molecular docking approach of FIG. 2, in accordance with the principles of the present invention;
  • FIG. 6 is a flowchart of one embodiment of an optimization stage for optimizing ligand positions within identified matches for use with the molecular docking approach of FIG. 2, in accordance with the principles of the present invention;
  • FIG. 7 graphically depicts a hydrogen bonding potential and a steric potential for use in atom pairwise scoring in accordance with the principles of the present invention; and
  • FIG. 8 depicts one embodiment of a computer environment providing and/or using the capabilities of the present invention.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
  • The docking procedure discussed below is based on a conceptual picture of protein-ligand complex formation (see FIGS. 1A-1C). Initially, the ligand (L) adopts many conformations in solution. The protein (P) recognizes one or several of these conformations. Upon recognition, the ligand, protein and solvent follow the local energy landscape to form the final complex.
  • This simple picture of protein/ligand complex formation is converted into an efficient computational model in accordance with an aspect of the present invention, as follows. The initial solution conformations are generated using a straightforward conformational search procedure. One might view the conformational search part of this technique as part of the entire docking process, but since it involves only the ligand, it can be decoupled from the purely docking steps. This is justified since 3-D databases of conformations for a collection of molecules can readily be generated and stored for use in numerous docking studies (for example, using Catalyst, see A. Smellie, S. D. Kahn, S. L. Teig, "Analysis of Conformational Coverage. 1. Validation and Estimation of Coverage," J. Chem. Inf. Comput. Sci. (1995), Vol. 35, pp. 285-294; and A. Smellie, S. D. Kahn, S. L. Teig, "Analysis of Conformational Coverage. 2. Application of Conformational Models," J. Chem. Inf. Comput. Sci. (1995), Vol. 35, pp. 295-304). The recognition stage is modeled by matching atoms of the ligand to interaction "hot spots" in the binding site. The final complex formation is modeled using a gradient based optimization technique with a simple energy function. During this final stage, the translation, orientation, and rotatable bonds of the ligand are allowed to vary, while the protein and solvent are held fixed.
  • Most docking methods can be classified into one of two loosely defined categories: (1) stochastic, such as AutoDock (Goodford, P. J., "A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules," Journal of Medicinal Chemistry, 1985, Vol. 28(7), p. 849-857; Goodsell, D. S. and A. J. Olson, "Automated Docking of Substrates to Proteins by Simulated Annealing," PROTEINS: Structure, Function and Genetics, 1990, Vol. 8, p. 195-202), GOLD (Jones, G., et al., "Development and Validation of a Genetic Algorithm for Flexible Docking," Journal of Molecular Biology, 1997, Vol. 267, p. 727-748), TABU (Westhead, D. R., D. E. Clark, and C. W. Murray, "A Comparison of Heuristic Search Algorithms for Molecular Docking," Journal of Computer-Aided Molecular Design, 1997, Vol. 11, p. 209-228; and Baxter, C. A. et al., "Flexible Docking Using Tabu Search and an Empirical Estimate of Binding Affinity," PROTEINS: Structure, Function, and Genetics, 1998, Vol. 33, p. 367-382), and Stochastic Approximation with Smoothing (SAS) (Diller, D. J. and C. L. M. J. Verlinde, "A Critical Evaluation of Several Global Optimization Algorithms for the Purpose of Molecular Docking," Journal of Computational Chemistry, 1999, Vol. 20(16), p. 1740-1751); or (2) combinatorial, for example, DOCK (Kuntz, I. D., et al., "A Geometric Approach to Macromolecular-ligand Interactions," Journal of Molecular Biology, 1982, Vol. 161, p. 269-288; Kuntz, I. D., "Structure-based Strategies for Drug Design and Discovery," Science, 1992, Vol. 257, p. 1078-1082; Makino, S. and I. D. Kuntz, "Automated Flexible Ligand Docking Method and Its Application for Database Search," Journal of Computational Chemistry, 1997, Vol. 18(4), p. 1812-1825), FlexX (Rarey, M., et al., "A Fast Flexible Docking Method Using an Incremental Construction Algorithm," Journal of Molecular Biology, 1996, Vol. 261, p. 470-489; Rarey, M., B. Kramer, and T. Lengauer, "The Particle Concept: Placing Discrete Water Molecules During Protein-ligand Docking Predictions," PROTEINS: Structure, Function, and Genetics, 1999, Vol. 34, p. 17-28; Rarey M., B. Kramer, and T. Lengauer, "Docking of Hydrophobic Ligands With Interaction-based Matching Algorithms," Bioinformatics, 1999, Vol. 15(3), p. 243-250), and HammerHead (Welch, W., J. Ruppert, and A. N. Jain, "Hammerhead: Fast Fully Automated Docking of Flexible Ligands to Protein Binding Sites," Chemistry & Biology, 1996, Vol. 3(6), p. 449-462).
  • The stochastic methods, while often providing more accurate results, are typically too slow to search large databases. The method presented herein falls into the combinatorial group. It is analogous to FlexX and HammerHead in that it attempts to match interactions between the ligand and receptor, but it differs significantly from these and most other docking techniques in how it handles ligand flexibility. Most current combinatorial docking techniques handle flexibility using an incremental construction approach, whereas the technique described herein uses an initial conformational search followed by a gradient based minimization in the presence of the target protein.
  • A generalized technique of one embodiment of the present invention is depicted in FIG. 2. Initially, a conformational search procedure 210 is performed for an entire library or collection, with the resulting conformations stored for future use. A binding site image is then created using the protein structure 220. A matching procedure is performed to form an initial complex by initially positioning a given conformation of a ligand as a rigid body into the binding site 230. Finally, a flexible optimization is performed wherein the matches are pruned and then optimized to attain the final result 240. Each of these steps of a docking approach, in accordance with the present invention, is described in greater detail below with reference to FIGS. 3-6, respectively.
  • The Conformational Search Procedure
  • For one aspect of the present invention, a straightforward yet effective conformational search procedure is preferred. A conformational search is performed once for an entire library or a collection, with the resulting conformations stored for future use. If desired, the conformational searching can be periodically repeated.
  • Referring to FIG. 3, uniformly distributed random conformations are generated allowing only rotatable bonds to vary 310. For example, 1,000 uniformly distributed random conformations can be generated varying only the rotatable bonds. The internal energy of each conformation is then minimized, again allowing only rotatable bonds to vary 320. Internal energy can be estimated, for example, using van der Waals potentials and a dihedral angle term (reference: Diller, D. J. and C. L. M. J. Verlinde, "A Critical Evaluation of Several Global Optimization Algorithms for the Purpose of Molecular Docking," Journal of Computational Chemistry, 1999, Vol. 20(16), p. 1740-1751, which is hereby incorporated herein by reference in its entirety). Each conformation can be minimized using, for example, a BFGS (Broyden-Fletcher-Goldfarb-Shanno) optimization algorithm (reference: Press, W. H., et al., Numerical Recipes in C, 2nd ed., 1997, Cambridge: Cambridge University Press. 994, which is hereby incorporated herein by reference in its entirety).
  • Conformations whose internal energy exceeds that of the lowest-energy conformation by more than a selected cut-off are eliminated 330. For example, any conformation with an internal energy more than 15 kcal/mol above that of the lowest-energy conformation is eliminated. The remaining conformations are scored and ranked 340. Conformations can be ranked by a score defined as:
    Score=Strain−0.1×SASA
    where SASA is the "solvent accessible surface area" of a particular conformation; and "strain" of a given conformation of a given molecule is the internal energy of the given conformation minus the internal energy of the conformation of the given molecule with the lowest internal energy. Conformations within a pre-defined rms deviation of a better conformation are removed 350. For example, any conformation within an rms deviation of 1.0 Å of a higher ranked (i.e., better) conformation can be removed. This clustering is a means to remove redundant conformations. A maximum number of desired conformations, for example, 50 conformations, are retained at the end of the conformational analysis step 360.
  • If more than the desired number of conformations remain after clustering, then the lowest ranked conformations can be removed until the desired number of conformations remain.
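Steps 330-360 (energy window, ranking, clustering and truncation) can be sketched as follows. This is a minimal illustration, not the procedure's actual implementation. The `Conformer` record, the function names, and the use of a plain coordinate RMSD without superposition (which assumes every conformer is expressed in a common reference frame) are all hypothetical. The 15 kcal/mol window, the 0.1 surface-area weight, the 1.0 Å clustering radius and the 50-conformer limit are the example values given above.

```python
import math
from dataclasses import dataclass


@dataclass
class Conformer:
    """Hypothetical record for one minimized conformation."""
    coords: list   # heavy-atom (x, y, z) tuples, same atom order in every conformer
    energy: float  # minimized internal energy, kcal/mol
    sasa: float    # solvent accessible surface area


def rmsd(a, b):
    """Coordinate RMSD with no superposition (assumes a shared reference frame)."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def select_conformers(confs, energy_window=15.0, sasa_weight=0.1,
                      cluster_rms=1.0, max_keep=50):
    e_min = min(c.energy for c in confs)
    # 330: drop conformers more than energy_window above the lowest-energy one.
    kept = [c for c in confs if c.energy - e_min <= energy_window]
    # 340: rank by Score = Strain - 0.1 x SASA, with Strain = E - E_min (lower is better).
    kept.sort(key=lambda c: (c.energy - e_min) - sasa_weight * c.sasa)
    # 350: greedy clustering; skip any conformer within cluster_rms of a better one.
    selected = []
    for c in kept:
        if all(rmsd(c.coords, s.coords) > cluster_rms for s in selected):
            selected.append(c)
    # 360: keep at most max_keep conformers, dropping the lowest ranked.
    return selected[:max_keep]
```

Because the list is already sorted by score when clustering runs, each surviving conformer is the best-ranked member of its neighborhood. Truncating the list afterwards therefore removes the lowest ranked conformers first, which matches the rule stated above.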
  • The process of a small molecule binding to a protein target is a balance between "solvation" by water versus "solvation" by the protein. With this in mind, the solvent accessible surface area term can be chosen in analogy with simple aqueous solvation models, e.g., reference Eisenberg, D. and A. D. McLachlan, "Solvation Energy in Protein Folding and Binding," Nature, 1986, Vol. 319, p. 199-203; Ooi, T., et al., "Accessible Surface Areas as a Measure of the Thermodynamic Parameters of Hydration of Peptides," Proceedings of the National Academy of Sciences, 1987, Vol. 84, p. 3086-3090; and Vajda, S., et al., "Effect of Conformational Flexibility and Solvation on Receptor-ligand Binding Free Energies," Biochemistry, 1994, Vol. 33, p. 13977-13988, each of which is hereby incorporated herein by reference in its entirety. The key difference between protein and water "solvation" is that water competes for polar interactions only, while a protein effectively competes for both polar and hydrophobic interactions. Therefore, for purposes of this invention, polar and apolar surface areas are treated identically. The choice of 0.1 as a weight for the surface area term is somewhat arbitrary, but is comparable to the weights chosen for surface area based solvation models. Ultimately, conformations with more solvent accessible surface area are going to be able to interact more extensively with a target protein and can, therefore, be of somewhat higher strain and still bind tightly. A more refined ranking system could be used with the present invention, but this approach to ranking conformations supplies reasonable conformations.
  • The Binding Site Image—Locating the Hot Spots
  • The binding site image comprises a list of apolar hot spots, i.e., points in the binding site that are favorable for an apolar atom to bind, and a list of polar hot spots, i.e., points in the binding site that are favorable for a hydrogen bond donor or acceptor to bind. One procedure for creating these two lists is depicted in FIG. 4. First, a grid is placed around the binding site 410. By way of example, the grid may be at least 20 Å×20 Å×20 Å, with at least 5 Å of extra space in each direction, and a grid spacing of 0.2 Å can be used. Next, a "hot spot search volume" is determined 420 by eliminating any grid point inside the protein. Any point contained in, for example, a 6.0 Å or larger sphere not touching the protein can also be eliminated. The largest remaining connected piece becomes the "hot spot search volume."
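Step 420 can be sketched as three grid operations: remove points inside the protein, remove points in open solvent, then keep the largest connected region. The sketch below is a hypothetical illustration under several stated assumptions:

- A point counts as "inside the protein" when it lies within `inside_radius` of an atom centre. This is a crude stand-in for per-atom radii.
- The 6.0 Å sphere is treated as a radius (`open_radius`).
- Connectivity uses the six face neighbors of each grid point.

The brute-force distance loops are far too slow for a 0.2 Å grid around a real protein and would need spatial indexing in practice.

```python
import itertools
import math
from collections import deque


def hot_spot_search_volume(protein_atoms, lo, hi, spacing,
                           inside_radius=1.6, open_radius=6.0):
    """Return grid points of the largest connected pocket region.

    protein_atoms: list of (x, y, z); lo/hi: opposite box corners; spacing in Angstrom.
    inside_radius and open_radius are hypothetical parameters.
    """
    shape = [int(round((hi[d] - lo[d]) / spacing)) + 1 for d in range(3)]

    def point(idx):
        return tuple(lo[d] + idx[d] * spacing for d in range(3))

    # Distance from every grid point to the nearest protein atom centre.
    dist = {idx: min(math.dist(point(idx), a) for a in protein_atoms)
            for idx in itertools.product(*(range(n) for n in shape))}

    # 1. Drop points inside the protein.
    free = {idx for idx, d in dist.items() if d > inside_radius}

    # 2. Drop points lying in an open_radius sphere that does not touch the protein.
    centres = [point(idx) for idx, d in dist.items() if d > open_radius + inside_radius]
    free = {idx for idx in free
            if not any(math.dist(point(idx), c) <= open_radius for c in centres)}

    # 3. Keep the largest 6-connected component (breadth-first flood fill).
    best, seen = set(), set()
    for start in sorted(free):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            cur = queue.popleft()
            component.add(cur)
            for axis in range(3):
                for step in (-1, 1):
                    nb = list(cur)
                    nb[axis] += step
                    nb = tuple(nb)
                    if nb in free and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        if len(component) > len(best):
            best = component
    return [point(idx) for idx in sorted(best)]
```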
  • The hot spots can then be determined using a grid-like search of the hot spot search volume 430. By way of example, a grid-like search is described in Goodford, P. J., "A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules," Journal of Medicinal Chemistry, 1985, Vol. 28(7), p. 849-857, which is hereby incorporated herein by reference in its entirety. To find the apolar hot spots, an apolar probe is placed at each grid point in the hot spot search volume, and the probe score is calculated and stored. The process is repeated for polar hot spots. For each type of hot spot, the grid points are clustered and a desired number of the best clustered grid points is retained 440. For example, the top 30 clustered grid points may be retained.
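Step 440 (cluster the probe-scored grid points and keep the best ones) can be illustrated with a greedy selection. The text does not specify the clustering criterion. This sketch therefore assumes the following:

- Lower probe scores are more favorable.
- A point joins the hot spot list only if it lies at least `min_separation` (a hypothetical parameter) from every point already kept.

The probe scoring itself is not shown.

```python
import math


def pick_hot_spots(scored_points, n_keep=30, min_separation=1.0):
    """scored_points: iterable of ((x, y, z), score); lower score assumed better.

    Greedily keeps the best-scoring points that are at least min_separation
    apart, stopping once n_keep hot spots have been chosen.
    """
    kept = []
    for p, s in sorted(scored_points, key=lambda ps: ps[1]):
        if all(math.dist(p, q) >= min_separation for q, _ in kept):
            kept.append((p, s))
            if len(kept) == n_keep:
                break
    return kept
```

The function would be called once with the apolar probe scores and once with the polar probe scores, giving the two hot spot lists of the binding site image.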
  • The Matching Procedure—Forming an Initial Complex
  • Referring to FIG. 5, in order to initially position a given conformation of a ligand as a rigid body into the binding site, the atoms of the ligand are matched to the appropriate hot spots 510. More precisely, in one example, a triplet of atoms, A1, A2, A3 is considered a match to a triplet of hot spots, H1, H2, H3, if:
    • i. The type of Aj matches the type of Hj for each j=1, 2, 3, that is, apolar hot spots match apolar atoms and polar hot spots match polar atoms.
    • ii. D(Aj, Ak)=D(Hj, Hk)±δ for all j, k=1, 2, 3, where D(Aj, Ak) and D(Hj, Hk) are the distances from Aj to Ak and from Hj to Hk, respectively, and δ is some allowable amount of error, e.g., between 0.25 Å and 0.5 Å.
  • To restate, a match occurs, in one example, when three hot spots forming a triangle and three atoms of the ligand forming a triangle substantially match. That is, a match occurs when the triangles are sufficiently similar with the vertices of each triangle being the same type and the corresponding edges of similar length. The matching algorithm finds all matches between atoms of a given conformation and the hot spots. Each match then determines a unique rigid body transformation. The rigid body transformation is then used to bring the conformation into the binding site to form the initial protein-ligand complex.
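Conditions i and ii can be checked directly by brute force, as in the sketch below. This is deliberately not the geometric hashing approach that the text reports as most efficient. It scales as the cube of both the number of atoms and the number of hot spots, so it only illustrates what counts as a match. The `'polar'`/`'apolar'` labels and the function name are hypothetical.

```python
import itertools
import math


def find_matches(atoms, hot_spots, tol=0.5):
    """atoms, hot_spots: lists of (type, (x, y, z)), type 'polar' or 'apolar'.

    Returns (atom_indices, hot_spot_indices) pairs where atom_indices[j]
    corresponds to hot_spot_indices[j] and conditions i and ii both hold.
    """
    matches = []
    edges = ((0, 1), (0, 2), (1, 2))
    # Ordered atom triples against unordered hot spot triples enumerate each
    # one-to-one correspondence exactly once.
    for a_idx in itertools.permutations(range(len(atoms)), 3):
        for h_idx in itertools.combinations(range(len(hot_spots)), 3):
            # Condition i: corresponding vertices have the same type.
            if any(atoms[a][0] != hot_spots[h][0] for a, h in zip(a_idx, h_idx)):
                continue
            # Condition ii: corresponding triangle edges agree within tol.
            if all(abs(math.dist(atoms[a_idx[j]][1], atoms[a_idx[k]][1])
                       - math.dist(hot_spots[h_idx[j]][1], hot_spots[h_idx[k]][1])) <= tol
                   for j, k in edges):
                matches.append((a_idx, h_idx))
    return matches
```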
  • In step 520, each match determines a unique rigid body transformation that minimizes
    $$ I(R,T) = \sum_{j=1}^{3} \left\lVert H_j - R A_j - T \right\rVert^2 \qquad (1) $$
    • where R is, for instance, a 3×3 rotation matrix and T is a translation vector, so that a point X (the position of an atom of the conformation) is transformed to RX+T. Each rigid body transformation, which can be determined analytically, is then used to place the ligand conformation into the binding site 530. Several algorithms for finding all matches were tested for this aspect of the calculation. The geometric hashing algorithm developed for FlexX (see: Rarey, M., S. Wefing, and T. Lengauer, "Placement of Medium-sized Molecular Fragments Into Active Sites of Proteins," Journal of Computer-Aided Molecular Design, 1996, Vol. 10, p. 41-54, which is hereby incorporated herein by reference in its entirety) proved to be the most efficient.
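The analytic minimizer of equation (1) over proper rotations is the standard SVD-based (Kabsch) solution. The text does not name the method it uses, so the sketch below is one well-known way to obtain R and T, not necessarily the one used here.

```python
import numpy as np


def rigid_body_transform(atoms, hot_spots):
    """Return (R, T) minimizing sum_j ||H_j - R A_j - T||^2 over proper rotations R.

    atoms, hot_spots: (n, 3) array-likes of matched positions (n >= 3, not collinear).
    """
    A = np.asarray(atoms, dtype=float)
    H = np.asarray(hot_spots, dtype=float)
    a0, h0 = A.mean(axis=0), H.mean(axis=0)
    # 3x3 cross-covariance: sum_j (H_j - h0)(A_j - a0)^T
    C = (H - h0).T @ (A - a0)
    U, _, Vt = np.linalg.svd(C)
    # Flip the last singular direction if needed so that det(R) = +1 (no reflection).
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    # The optimal translation maps the atom centroid onto the hot spot centroid.
    T = h0 - R @ a0
    return R, T
```

To apply X → RX + T to every atom of a conformation stored as an (n, 3) array, use `placed = coords @ R.T + T`. The reflection check is needed because three matched points are always coplanar, which leaves the sign of one singular direction undetermined.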
  • The Optimization Stage
  • A single conformation can produce up to 10,000 matches. In the interest of efficiency, not all of these matches can be optimized, so a pruning/scoring strategy is desired. FIG. 6 depicts one such strategy.
  • Referring to FIG. 6, initially all matches for which more than a predetermined percentage (e.g., 10%) of the ligand atoms have a steric clash can be eliminated 610. The remaining matches are ranked using an atom pairwise score described below, with an atom score cutoff of, for example, 1.0 620. Use of a cutoff allows matches that fit reasonably well but have a few steric clashes to survive to the final round; the choice of 1.0 is merely exemplary. After being ranked, the matches are clustered, and the top N matches are selected to move into the final stage 630, where N may be, for instance, a number in the range of 25-100.
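Steps 610-630 can be sketched as a filter, a sort, and a greedy clustering pass. The clash test, the pairwise score and the distance between two placements are passed in as caller-supplied callables. Their names, the `cluster_radius` value and the assumption that lower scores are better are hypothetical, since the text does not define how matches are clustered.

```python
def prune_matches(matches, clash_fraction, score, distance,
                  max_clash=0.10, cluster_radius=1.0, top_n=25):
    """Steps 610-630: clash filter, rank by score, greedy clustering, keep top_n.

    clash_fraction(m), score(m) and distance(m1, m2) are hypothetical
    caller-supplied callables; lower scores are assumed to be better.
    """
    # 610: discard placements in which too many ligand atoms clash with the protein.
    survivors = [m for m in matches if clash_fraction(m) <= max_clash]
    # 620: rank the remaining placements by their pairwise score.
    survivors.sort(key=score)
    # 630: keep a placement only if it is not within cluster_radius of a better one.
    kept = []
    for m in survivors:
        if all(distance(m, k) > cluster_radius for k in kept):
            kept.append(m)
            if len(kept) == top_n:
                break
    return kept
```

With `top_n` between 25 and 100, only a small fraction of the up to 10,000 matches per conformation reaches the BFGS stage.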
  • Each remaining match is optimized using a BFGS optimization algorithm with a simple atom pairwise score 640. In one embodiment, the score can be modeled after the Piecewise Linear Potential (see, Gehlhaar, D. K., et al., “Molecular Recognition of the Inhibitor AG-1343 by HIV-1 Protease: Conformationally Flexible Docking by Evolutionary Programming,” Chemistry & Biology, 1995, Vol. 2, p. 317-324, which is hereby incorporated herein by reference in its entirety) with a difference being that the score used herein is preferably differentiable. For this score, all hydrogens are ignored, and all non-hydrogen atoms are classified into one of four categories:
    • i. Apolar—anything that cannot form a hydrogen bond.
    • ii. Acceptor—any atom that can act as a hydrogen bond acceptor, but not as a donor.
    • iii. Donor—any atom that can act as a hydrogen bond donor, but not as an acceptor.
    • iv. Donor/Acceptor—any atom that can act as both a hydrogen bond donor and an acceptor.
      The score between two atoms is calculated using either a hydrogen bonding potential or a steric potential. The two potentials, shown in FIG. 7, have the mathematical form
    $$ F(r) = \varepsilon \left[ \left( \frac{(1-\sigma) R_{\min}^2}{r^2 + \sigma R_{\min}^2} \right)^{6} - 2 \left( \frac{(1-\sigma) R_{\min}^2}{r^2 + \sigma R_{\min}^2} \right)^{3} \right] \Phi\left(r^2 : r_1^2, r_0^2\right) \qquad (2) $$
  • where Rmin is the position of the score minimum, ε is the depth of the minimum, σ is a softening factor, and Φ(r:r1,r0) is a differentiable cutoff function of r (the distance between the two atoms). Each potential, steric and hydrogen bonding, is assigned its own set of parameters. The parameters for these potentials can be chosen by one skilled in the art via intuition and subsequent testing, but they do not need to be fully optimized. Table 2 contains example parameters for the pairwise potentials.
    TABLE 2
    Parameter   Hydrogen Bonding Potential   Steric Potential
    ε           2.0                          0.4
    σ           0.5                          1.5
    Rmin        3.0 Å                        4.05 Å
    r1          3.0 Å                        5.0 Å
    r0          4.0 Å                        6.0 Å
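The pairwise potential can be sketched using the Table 2 parameters. Two assumptions are made here:

- The numerator is written as (1 + σ)·Rmin². This is a hypothetical reading chosen because it puts the minimum exactly at r = Rmin with depth ε, as the parameter definitions state. Equation (2) as given shows (1 − σ) instead, so this choice should be checked against the original.
- The text does not give the form of the cutoff Φ. Here it is taken to be a cubic smoothstep in r² between r1² and r0², which keeps the score differentiable.

```python
# Example parameters from Table 2 (distances in Angstrom).
PARAMS = {
    "hbond":  {"eps": 2.0, "sigma": 0.5, "r_min": 3.0,  "r1": 3.0, "r0": 4.0},
    "steric": {"eps": 0.4, "sigma": 1.5, "r_min": 4.05, "r1": 5.0, "r0": 6.0},
}


def cutoff(s, s1, s0):
    """Hypothetical cutoff Phi(s: s1, s0) on s = r^2: 1 for s <= s1, 0 for s >= s0,
    joined by a cubic smoothstep so the first derivative is continuous."""
    if s <= s1:
        return 1.0
    if s >= s0:
        return 0.0
    t = (s - s1) / (s0 - s1)
    return 1.0 - t * t * (3.0 - 2.0 * t)


def pair_score(r, kind):
    """Softened 12-6 style pair score at atom-atom distance r (Angstrom)."""
    p = PARAMS[kind]
    rm2 = p["r_min"] ** 2
    # Assumed (1 + sigma) numerator: x == 1 at r == r_min, and x stays finite as r -> 0.
    x = (1.0 + p["sigma"]) * rm2 / (r * r + p["sigma"] * rm2)
    return p["eps"] * (x ** 6 - 2.0 * x ** 3) * cutoff(r * r, p["r1"] ** 2, p["r0"] ** 2)
```

Under these assumptions, two fully overlapping atoms give `pair_score(0.0, "steric")` of about 4.87, whereas an unsoftened 12-6 term diverges at r = 0. This illustrates why mild steric clashes are tolerated.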
  • These potentials are very similar to the 12-6 van der Waals potentials used in many force fields, with two differences. First, the softening factor, σ, makes the potentials significantly softer than the typical 12-6 van der Waals potentials (see FIG. 7), i.e., mild steric clashes common in docking runs are tolerated by this potential. In spirit, the softening factor implicitly models small induced fit effects of the protein, which can be important (see, Murray, C. W., C. A. Baxter, and D. Frenkel, "The Sensitivity of The Results of Molecular Docking to Induced Fit Effects: Application to Thrombin, Thermolysin and Neuraminidase," Journal of Computer-Aided Molecular Design, 1999, Vol. 12, p. 547-562, which is hereby incorporated herein by reference in its entirety), and in practice makes the potential much more error tolerant. The second difference is the cutoff function. This function guarantees that the potential is zero beyond a finite distance, usually between 5.0 Å and 6.0 Å. This, along with some organization of the protein atoms, significantly speeds up the direct calculation of the score.
  • An attempt was made to calculate the scores both directly and through precalculated grids. The advantage of using the grids is that the score can be calculated very rapidly: grids were found to be 5-10 times faster than the direct calculation. The advantage of the direct calculation is that effects such as protein flexibility and solvent mobility can be accommodated more easily. Since using the grids did not seem to cause any deterioration in the quality of the docking results, and since protein flexibility and solvent mobility are currently not included, the scores for the results presented hereinbelow were calculated through precalculated grids. For the purpose of the BFGS optimization algorithm, all derivatives were calculated analytically, including those with respect to the rotatable bonds (see, Haug, E. J. and M. K. McCullough, "A Variational-Vector Calculus Approach to Machine Dynamics," Journal of Mechanisms, Transmissions, and Automation in Design, 1986, Vol. 108, p. 25-30, which is hereby incorporated herein by reference in its entirety).
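The speed advantage of precalculated grids comes from evaluating the expensive score only at grid nodes and then reading values back cheaply during optimization. The text does not say how grid values are interpolated, so this sketch assumes trilinear interpolation. In practice `score_fn` might sum a pairwise score over all protein atoms for one ligand atom type. That usage, and the class name, are hypothetical.

```python
import itertools
import math


class ScoreGrid:
    """Scores precomputed on a regular grid and read back by trilinear interpolation.

    Assumes every axis has at least two nodes.
    """

    def __init__(self, origin, spacing, shape, score_fn):
        self.origin, self.spacing, self.shape = origin, spacing, shape
        # One expensive pass: score_fn is evaluated only at the grid nodes.
        self.values = {
            idx: score_fn(tuple(origin[d] + idx[d] * spacing for d in range(3)))
            for idx in itertools.product(*(range(n) for n in shape))
        }

    def __call__(self, p):
        # Fractional grid coordinates, clamped so all 8 surrounding nodes exist.
        f = [min(max((p[d] - self.origin[d]) / self.spacing, 0.0), self.shape[d] - 1 - 1e-9)
             for d in range(3)]
        base = [int(math.floor(c)) for c in f]
        frac = [c - b for c, b in zip(f, base)]
        total = 0.0
        for corner in itertools.product((0, 1), repeat=3):
            weight = 1.0
            for d in range(3):
                weight *= frac[d] if corner[d] else 1.0 - frac[d]
            total += weight * self.values[tuple(base[d] + corner[d] for d in range(3))]
        return total
```

Each lookup touches only eight stored values, independent of the number of protein atoms. The interpolant is continuous but only piecewise smooth, which is one trade-off relative to the analytic derivatives of the direct calculation.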
  • Test Results
  • To test the docking procedure of the present invention, the GOLD test set was used (see Jones, G., et al., "Development and Validation of a Genetic Algorithm for Flexible Docking," Journal of Molecular Biology, 1997, Vol. 267, p. 727-748, which is hereby incorporated herein by reference in its entirety). Any covalently bound ligand, or any ligand bound to a metal ion, was removed because such ligands cannot at present be modeled by the scoring function described herein. In addition, any "surface sugars" were removed, as they are not typical of the problems encountered. This left a total of 103 cases (see Table 1 below). No further individual processing of the test cases was performed. (Note that the "Protein Data Bank" (PDB) is a database where protein structures are deposited. The "PDB Code" is a four-character code that allows a given structure to be found and extracted from the PDB.)
    TABLE 1
    PDB Code   Number of Rot. Bonds   Minimum RMSD (Å)   RMSD of Top Score (Å)
    1aaq 17 1.35 1.4
    1abe 0 0.31 0.31
    1acj 0 0.59 0.71
    1ack 2 0.45 0.46
    1acm 6 0.31 0.31
    1aha 0 0.25 0.53
    1apt 18 1.10 1.63
    1atl 9 1.05 4.24
    1azm 1 1.40 2.33
    1baf 7 0.76 7.10
    1bbp 11 1.45 1.55
    1cbs 5 0.70 12.63
    1cbx 5 0.53 2.30
    1cil 3 1.07 5.94
    1com 3 0.76 0.76
    1coy 0 0.52 0.70
    1cps 5 0.85 0.97
    1dbb 1 0.72 0.85
    1dbj 0 0.64 5.90
    1did 2 2.76 3.65
    1die 1 2.24 2.30
    1drl 2 1.02 1.61
    1dwd 9 0.75 7.98
    1eap 10 0.79 3.95
    1eed 19 3.41 3.41
    1epb 5 0.75 2.86
    1eta 5 5.48 7.29
    1etr 9 2.70 7.06
    1fen 4 0.98 2.45
    1fkg 10 1.68 1.72
    1fki 0 0.30 0.54
    1frp 6 0.67 1.13
    1ghb 4 0.90 0.94
    1glp 10 1.45 8.92
    1glg 13 1.91 9.96
    1hdc 6 1.52 11.25
    1hef 19 3.63 5.29
    1hfc 10 1.37 7.77
    1hri 9 1.49 3.29
    1hsl 3 0.76 2.21
    1hyt 5 0.79 1.56
    1icn 15 1.78 9.43
    1ida 15 1.32 1.38
    1igj 3 0.90 7.46
    1imb 2 1.64 4.48
    1ive 2 2.55 6.63
    1lah 4 0.71 0.77
    1lcp 3 0.53 4.65
    1ldm 1 0.80 5.24
    1lic 15 1.32 4.39
    1lmo 6 5.00 8.40
    1lna 6 1.35 1.46
    1lst 5 0.58 1.43
    1mcr 5 3.92 5.41
    1mdr 2 0.41 0.78
    1mmq 7 0.55 0.60
    1mrg 0 0.45 3.42
    1mrk 2 0.94 2.91
    1mup 2 1.74 4.40
    1nco 8 2.88 8.50
    1pbd 1 0.29 0.38
    1poc 23 2.81 8.62
    1rne 21 8.83 10.14
    1rob 4 0.83 1.17
    1snc 5 1.17 5.60
    1srj 3 0.48 0.58
    1stp 5 0.33 0.48
    1tdb 4 1.33 7.09
    1tka 8 1.44 1.44
    1tng 1 0.35 0.42
    1tnl 1 0.45 4.25
    1tph 3 0.63 1.44
    1ukz 4 0.43 6.20
    1ulb 0 1.22 4.19
    1wap 3 0.29 0.34
    1xid 2 0.79 4.23
    1xie 1 0.34 3.89
    2ada 2 0.53 0.58
    2ak3 4 1.91 3.24
    2cgr 7 0.61 3.46
    2cht 2 0.18 0.40
    2cmd 5 0.50 2.36
    2ctc 3 0.36 4.15
    2dbl 6 0.40 0.96
    2gbp 1 0.17 0.17
    2lgs 4 0.71 5.48
    2phh 1 0.51 0.51
    2plv 5 1.98 7.40
    2r07 15 1.17 2.45
    2sim 8 0.92 1.37
    2yhx 3 1.07 6.99
    3aah 3 0.48 0.68
    3cpa 5 0.92 1.40
    3hvt 1 0.27 0.56
    3ptb 0 0.22 0.28
    3tpi 6 0.42 0.53
    4cts 3 0.73 0.77
    4dfr 9 2.05 8.72
    4fab 2 2.52 4.45
    4phv 12 0.38 0.38
    6abp 0 0.34 0.34
    7tim 3 0.40 0.98
    8gch 7 1.70 4.45
  • As expected, the rms deviation between the bound (X-ray) conformation and the closest computationally generated conformation increases with the number of rotatable bonds. In all but 5 cases, the conformational search generated at least one conformation within 1.5 Å rms deviation of the bound conformation. The most interesting aspect of the conformational search results is that for some of the more rigid ligands, the minimum rms deviation was large. For example, several ligands with fewer than five rotatable bonds have a minimum rms deviation near 1.0 Å. This occurs for two reasons. First, a clustering radius of 1.0 Å was used in all cases, which prevented the conformational space of small ligands from being sufficiently sampled. However, it is within the scope of the present invention to use a clustering radius that depends on molecule size to alleviate this particular problem. Second, a bond between two sp2 atoms was always treated as conjugated, so whenever this type of bond was encountered, it was strongly restrained to be planar. While bonds between two sp2 atoms are often conjugated, this is clearly an oversimplification. In accordance with the invention, this may be addressed by allowing the dihedral angles about bonds between two sp2 atoms to deviate from planarity and penalizing the deviation according to the degree of conjugation. The penalty could be chosen crudely based on the types of the sp2 atoms (see, S. L. Mayo, B. D. Olafson, and W. A. Goddard, “DREIDING: A Generic Force Field for Molecular Simulations,” J. Phys. Chem. 1990, Vol. 94, p. 8897).
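  • The graded planarity penalty proposed above can be sketched with a twofold cosine torsion term of the kind used in generic force fields. This is a minimal sketch, assuming the form E = (V/2)(1 - cos 2φ); the bond categories and barrier heights in `BARRIER_BY_KIND` are hypothetical placeholders, not values taken from DREIDING or from the exemplified program, and `planarity_penalty` is an illustrative name.

```python
import math

# Hypothetical barrier heights (kcal/mol) for rotation about a bond between two
# sp2 atoms, keyed by an assumed degree of conjugation. Placeholder values for
# illustration only; a real implementation would take them from a force field.
BARRIER_BY_KIND = {
    "double": 45.0,     # true double bond: effectively locked planar
    "resonance": 25.0,  # e.g. an amide or aromatic bond
    "single": 5.0,      # weakly conjugated single bond between sp2 atoms
}


def planarity_penalty(dihedral_deg: float, bond_kind: str) -> float:
    """Twofold torsion penalty E = (V / 2) * (1 - cos(2 * phi)).

    Zero for planar geometries (phi = 0 or 180 degrees) and maximal (= V) at
    phi = +/-90 degrees, so a smaller V lets the dihedral leave planarity at a
    lower strain cost.
    """
    v = BARRIER_BY_KIND[bond_kind]
    phi = math.radians(dihedral_deg)
    return 0.5 * v * (1.0 - math.cos(2.0 * phi))


if __name__ == "__main__":
    for kind in ("double", "resonance", "single"):
        # Strain of twisting each bond type 80 degrees away from planarity.
        print(kind, round(planarity_penalty(80.0, kind), 2))
```

  • Compared with a hard planarity restraint, a weakly conjugated bond twisted far out of plane then costs only a few kcal/mol, which leaves room for conformations like the observed ones to be generated and scored.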
  • The Docking Results
  • For the docking runs, two different sets of parameters were tested to see their effects on the quality and speed of docking: one for high-quality docking and one for rapid searches. The key differences between the two sets are the match tolerance and the number and length of the BFGS optimization runs. The match tolerance is 0.5 Å for the high-quality runs and 0.25 Å for the rapid searches. Note that the larger the tolerance, the more matches are found. Thus, a larger tolerance means a more thorough search, while a smaller tolerance gives a less thorough but faster search. For the high-quality runs, a maximum of 100 matches per ligand were optimized for 100 steps, compared with 25 matches per ligand for 20 steps for the rapid searches.
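  • The effect of the match tolerance can be illustrated with a simplified distance-compatibility test. This is a sketch under an assumption: it supposes that a match requires ligand interatomic distances to agree with hot spot distances within the tolerance, which is a common ingredient of geometric matching but is not stated here as the exact procedure of the invention. `DockingParams` and `count_pair_matches` are hypothetical names; the parameter values are the two sets described above.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DockingParams:
    match_tolerance: float  # distance tolerance in Å
    max_matches: int        # matches per ligand passed to BFGS
    bfgs_steps: int         # BFGS steps per optimized match


# The two parameter sets described in the text.
HIGH_QUALITY = DockingParams(match_tolerance=0.5, max_matches=100, bfgs_steps=100)
RAPID = DockingParams(match_tolerance=0.25, max_matches=25, bfgs_steps=20)


def count_pair_matches(ligand_xyz, hotspot_xyz, tol):
    """Count (ligand atom pair, hot spot pair) combinations whose internal
    distances agree to within tol Å (a hypothetical, simplified test)."""
    lig_d = [math.dist(a, b) for a, b in combinations(ligand_xyz, 2)]
    hot_d = [math.dist(a, b) for a, b in combinations(hotspot_xyz, 2)]
    return sum(1 for dl in lig_d for dh in hot_d if abs(dl - dh) <= tol)


ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0)]
hotspots = [(0.0, 0.0, 0.0), (1.8, 0.0, 0.0), (0.0, 2.6, 0.0)]
for name, p in (("high quality", HIGH_QUALITY), ("rapid", RAPID)):
    print(name, count_pair_matches(ligand, hotspots, p.match_tolerance))
```

  • With these toy coordinates, the 0.5 Å tolerance accepts three distance pairs and the 0.25 Å tolerance accepts two, mirroring the trade-off between thoroughness and speed.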
  • The first problem is to generate at least one docked position within a given rms deviation cutoff. Here, a ligand that is docked to within X Å of its crystallographically observed position is referred to as an X Å hit. The rms deviations for the high-quality runs are shown in Table 1. For the high-quality runs, 89 of the 103 cases produce at least one 2.0 Å hit. The numbers drop to 80 at 1.5 Å, 63 at 1.0 Å and 26 at 0.5 Å. For the rapid searches, 75 of the 103 cases produce a 2.0 Å hit, 65 produce a 1.5 Å hit, 42 produce a 1.0 Å hit and 16 produce a 0.5 Å hit. In both cases, these numbers compare favorably with similar statistics from other docking packages that have been tested on the GOLD or similar test sets (see, Jones, G., et al., “Development and Validation of a Genetic Algorithm for Flexible Docking,” Journal of Molecular Biology, 1997, Vol. 267, p. 727-748; Baxter, C. A., et al., “Flexible Docking Using Tabu Search and an Empirical Estimate of Binding Affinity,” PROTEINS: Structure, Function, and Genetics, 1998, Vol. 33, p. 367-382; Rarey, M., B. Kramer, and T. Lengauer, “The Particle Concept: Placing Discrete Water Molecules During Protein-ligand Docking Predictions,” PROTEINS: Structure, Function, and Genetics, 1999, Vol. 34, p. 17-28; Rarey, M., B. Kramer, and T. Lengauer, “Docking of Hydrophobic Ligands With Interaction-based Matching Algorithms,” Bioinformatics, 1999, Vol. 15(3), p. 243-250; and Kramer, B., M. Rarey, and T. Lengauer, “Evaluation of the FlexX Incremental Construction Algorithm for Protein-Ligand Docking,” PROTEINS: Structure, Function, and Genetics, 1999, Vol. 37, p. 228-241).
  • The second problem is to rank the docked positions correctly, i.e., to determine whether the top-ranked position is reasonably close to the crystallographically observed position of the ligand. This is a significantly more difficult problem than the first. The rms deviations between the top-scoring docked position and the observed position for the high-quality runs are given in Table 1. In this case, there is little difference between the two sets of parameters. For the high-quality runs, 48 of the 103 cases produce a 2.0 Å hit as the top-scoring docked position. This number drops to 41 at 1.5 Å, 34 at 1.0 Å and 10 at 0.5 Å. For the rapid searches, 45 of the 103 cases produce a 2.0 Å hit as the top-scoring docked position, with 41 at 1.5 Å, 34 at 1.0 Å and 10 at 0.5 Å.
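  • The hit statistics for both problems reduce to counting cases whose rms deviation falls within each cutoff. The sketch below applies that count to the first eleven rows of Table 1 (1aaq through 1bbp), once for the closest docked position and once for the top-scoring position; `count_hits` is an illustrative name, and the same function applied to all 103 rows yields statistics of the kind quoted above.

```python
# Minimum and top-scoring rms deviations (Å) for the first eleven entries of
# Table 1 (high-quality runs).
MIN_RMSD = {
    "1aaq": 1.35, "1abe": 0.31, "1acj": 0.59, "1ack": 0.45, "1acm": 0.31,
    "1aha": 0.25, "1apt": 1.10, "1atl": 1.05, "1azm": 1.40, "1baf": 0.76,
    "1bbp": 1.45,
}
TOP_RMSD = {
    "1aaq": 1.4, "1abe": 0.31, "1acj": 0.71, "1ack": 0.46, "1acm": 0.31,
    "1aha": 0.53, "1apt": 1.63, "1atl": 4.24, "1azm": 2.33, "1baf": 7.10,
    "1bbp": 1.55,
}

CUTOFFS = (2.0, 1.5, 1.0, 0.5)


def count_hits(rmsd_by_case, cutoff):
    """Number of cases docked to within `cutoff` Å, i.e. `cutoff` Å hits."""
    return sum(1 for r in rmsd_by_case.values() if r <= cutoff)


for c in CUTOFFS:
    print(f"{c:.1f} Å: any position {count_hits(MIN_RMSD, c)}, "
          f"top scoring {count_hits(TOP_RMSD, c)}")
```

  • Even on this small subset, the gap between the two columns (for example, 6 versus 5 cases at 1.0 Å, and 11 versus 8 at 2.0 Å) illustrates why ranking is harder than generating a good pose.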
  • The utility of the scoring function used in this study lies less in ranking the docked conformations absolutely than in serving as an initial filter that selects only a few docked conformations. Most of the well-docked positions, i.e., those with low rms deviations, survive the 10% score cutoff. Most of the docked positions, however, do not. For the high-quality runs, on average 74 positions are found, but only 8 remain after the cutoff. For the rapid searches, on average nearly 21 positions are found, but only 5 remain after the cutoff. The docked positions that survive the 10% score cutoff could then be further optimized, visually screened, or passed to a more accurate but less efficient scoring function.
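  • A score cutoff used as a filter can be sketched as follows. The section does not define the 10% cutoff precisely, so this sketch assumes it keeps positions whose score lies within 10% of the best score, with lower (more negative) scores being better; `apply_score_cutoff` is a hypothetical name.

```python
def apply_score_cutoff(scored_positions, fraction=0.10):
    """Keep docked positions whose score lies within `fraction` of the best.

    Assumes lower scores are better; `scored_positions` is a list of
    (position_id, score) pairs. Survivors are returned best first.
    """
    if not scored_positions:
        return []
    best = min(score for _, score in scored_positions)
    limit = best + fraction * abs(best)
    survivors = [(pid, s) for pid, s in scored_positions if s <= limit]
    return sorted(survivors, key=lambda item: item[1])


docked = [("p1", -100.0), ("p2", -95.0), ("p3", -89.0), ("p4", -60.0)]
print(apply_score_cutoff(docked))
```

  • Here the best score is -100, so the limit is -90 and only `p1` and `p2` survive; the short list would then go on to further optimization, visual inspection, or a more expensive score.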
  • For the high-quality runs, the average CPU time per test case (for example, on a Silicon Graphics (SGI) computer with an R12000 processor) is approximately 4.5 seconds. At this rate, screening one million compounds on one CPU would take about 50 days. For the rapid searches, the average CPU time drops to approximately 1.1 seconds per test case, and screening one million compounds on one CPU would take about 13 days. Because database docking is a highly parallel job, multiple CPUs could easily cut this to a reasonable amount of time (for example, a day or so).
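  • The throughput estimates are simple arithmetic: 10^6 compounds at 4.5 s each is 4.5 × 10^6 s, or about 52 days, and at 1.1 s each about 12.7 days. The sketch below reproduces those figures and the number of CPUs needed to finish in one day, assuming perfect parallel scaling (an idealization; the function names are illustrative).

```python
import math

SECONDS_PER_DAY = 86_400


def screening_days(n_compounds, seconds_per_compound, n_cpus=1):
    """Wall-clock days to dock a library, assuming perfect parallel scaling."""
    return n_compounds * seconds_per_compound / (SECONDS_PER_DAY * n_cpus)


def cpus_for_deadline(n_compounds, seconds_per_compound, days):
    """Smallest CPU count that finishes within `days`, same assumption."""
    return math.ceil(n_compounds * seconds_per_compound / (SECONDS_PER_DAY * days))


for label, sec in (("high quality", 4.5), ("rapid", 1.1)):
    print(f"{label}: {screening_days(1_000_000, sec):.1f} days on 1 CPU, "
          f"{cpus_for_deadline(1_000_000, sec, 1)} CPUs for one day")
```

  • Under this assumption, about 53 CPUs would complete a high-quality screen of one million compounds in a day, and about 13 CPUs a rapid screen.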
  • Some Specific Successful Cases
  • In this section, a few of the successful cases are presented to demonstrate the strengths of the docking approach described herein. In all of these cases, the results shown are from the medium quality docking runs. The first case is the dipeptide Ile-Val from PDB entry 3tpi (see, Marquart, M., et al., “The Geometry of the Reactive Site and of the Peptide Groups in Trypsin, Trypsinogen and Its Complexes With Inhibitors,” Acta Crystallographica, 1983, Vol. B39, p. 480, which is hereby incorporated herein by reference in its entirety). This ligand has no clear anchor fragment; as a result, incremental construction approaches to docking might have difficulty with it. Our conformational search procedure produced a conformation within 0.42 Å of the observed conformation. The rms deviation between the best-scoring docked position and the observed position is 0.53 Å.
  • The second example, a ligand with 15 rotatable bonds, is much more difficult. It is an HIV protease inhibitor from PDB entry 1ida (see, Tong, L., et al., “Crystal Structures of HIV-2 Protease in Complex With Inhibitors Containing Hydroxyethylamine Dipeptide Isostere,” Structure, 1995, Vol. 3(1), p. 33-40, which is hereby incorporated herein by reference in its entirety). In this case, the conformational search procedure generated a conformation with an rms deviation of 0.96 Å from the bound conformation. The rms deviation of the top-scoring docked position is 1.38 Å. In fact, the top 13 scoring docked positions are all within 2.0 Å of the observed position, with the closest at about 1.32 Å.
  • The final case is an HIV protease inhibitor from PDB entry 4phv (see, Bone, R., et al., “X-ray Crystal Structure of the HIV Protease Complex With L-700,417, an Inhibitor With Pseudo C2 Symmetry,” Journal of the American Chemical Society, 1991, Vol. 113(24), p. 9382-9384, which is hereby incorporated herein by reference in its entirety). The ligand in this case has 12 rotatable bonds, and the case clearly demonstrates the value of including the final flexible gradient optimization of the ligand. The closest conformation produced by the conformational search procedure is 1.32 Å from the crystallographically observed conformation. With an rms deviation of 0.38 Å, the top-scoring docked position is also the closest to the observed position. Without the flexible optimization, the smallest achievable rms deviation would have been that of the closest conformation generated by the conformational search procedure, i.e., 1.32 Å. Thus, in this case, the flexible optimization decreased the final rms deviation by at least 0.94 Å (1.32 Å minus 0.38 Å).
  • An Analysis of the Errors and Avenues for Improvement
  • It is often assumed that when a docking simulation fails, the score has failed, i.e., the global minimum of the scoring function does not correspond to the crystallographically determined position of the ligand. Since the docking problem involves many degrees of freedom, it is reasonable to believe that in many cases the failure can instead be attributed to insufficient search. The goal of this section is to identify the cause of failure in the cases in which the procedure described herein performed poorly.
  • To classify docking failures as either scoring failures or search failures, the ligand was taken in its crystallographically observed position bound to the protein, and a BFGS optimization was performed. If the resulting score was significantly lower (better) than the best score found in the docking runs, the failure was classified as a search failure. All other failures were classified as scoring failures.
  • The vast majority of the failures qualify as moderate scoring errors, i.e., the global minimum appears not to correspond to the crystallographic position of the ligand, but the percent difference between the global minimum and the best score near the crystallographic position is less than 10%. In these cases, it is difficult to decide which aspects of the score are failing, but it is reasonable to believe that many of them can be corrected simply by including more detail in the scoring function, such as angular constraints on the hydrogen bonding term or a solvation model. There are, however, a few cases with dramatic scoring errors. These cases provide some insight into the weaknesses of the score and the complexities of protein/ligand interactions.
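  • The classification scheme above can be written as a small decision function. This is a sketch under stated assumptions: lower scores are better, the refined X-ray score comes from the BFGS optimization started at the crystallographic pose, and the margin that counts as "significantly" better is not given in the text, so `search_margin` is a hypothetical parameter. The 10% boundary between moderate and dramatic scoring errors is taken from the text; `classify_failure` is an illustrative name.

```python
def classify_failure(best_docked_score, refined_xray_score,
                     search_margin=0.01, moderate_cutoff=0.10):
    """Classify a docking failure as a search or a scoring failure.

    Assumes lower scores are better. `search_margin` (a fraction of
    |best_docked_score|) is a hypothetical threshold for "significantly".
    """
    scale = abs(best_docked_score) or 1.0  # avoid dividing by zero
    if refined_xray_score < best_docked_score - search_margin * scale:
        # The score prefers the X-ray pose, but the docking search never found it.
        return "search failure"
    # Otherwise the global minimum lies away from the X-ray pose: grade the
    # scoring failure by the relative gap between the two scores.
    gap = (refined_xray_score - best_docked_score) / scale
    if gap < moderate_cutoff:
        return "moderate scoring failure"
    return "dramatic scoring failure"


for docked, xray in ((-100.0, -120.0), (-100.0, -95.0), (-100.0, -70.0)):
    print(docked, xray, classify_failure(docked, xray))
```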
  • The case 1glq (see, Garcia-Saez, I., et al., “Molecular Structure at 1.8 Å of Mouse Liver Class pi Glutathione S-Transferase Complexed With S-(p-Nitrobenzyl)Glutathione and Other Inhibitors,” Journal of Molecular Biology, 1994, Vol. 237, p. 298-314) exposes the main weakness of the score used in this study: hydrogen bonding patterns. This is a polar ligand. The top-ranked position for this ligand scores very well largely because there are many “perceived” hydrogen bonds. In reality, these hydrogen bonds would be extremely weak because their angular geometry is poor. Moreover, in the X-ray position the sulfur atom accepts a hydrogen bond from the OH of a tyrosine, and the carboxylic acid forms a salt bridge with a lysine. Neither of these interactions is recognized by the scoring function described herein.
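  • One way to keep poorly oriented contacts from counting as full hydrogen bonds is to multiply a distance term by an angular factor. The sketch below assumes a linear distance ramp and a cos² factor on the donor-hydrogen-acceptor angle; the functional form, `ideal`, `cutoff` and `strength` values, and the names `angle_deg` and `hbond_score` are all hypothetical and are not the scoring function used in this study.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with the vertex at b."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def hbond_score(donor, hydrogen, acceptor, ideal=1.9, cutoff=2.6, strength=-1.0):
    """Hypothetical H-bond term: a distance ramp times a cos^2 angular factor.

    The radial part is 1 for H...A <= `ideal` Å and falls linearly to 0 at
    `cutoff` Å. The angular part is cos^2(180 deg - theta) for a D-H...A angle
    theta above 90 deg and 0 otherwise, so a linear bond counts fully and a
    strongly bent one counts little or nothing.
    """
    r = math.dist(hydrogen, acceptor)
    if r >= cutoff:
        return 0.0
    radial = 1.0 if r <= ideal else (cutoff - r) / (cutoff - ideal)
    theta = math.radians(angle_deg(donor, hydrogen, acceptor))
    angular = math.cos(math.pi - theta) ** 2 if theta > math.pi / 2 else 0.0
    return strength * radial * angular


d, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(hbond_score(d, h, (2.9, 0.0, 0.0)))  # linear D-H...A geometry
print(hbond_score(d, h, (1.0, 1.9, 0.0)))  # 90-degree geometry
```

  • With such a term, the many "perceived" contacts in a pose like the top-ranked 1glq position would be down-weighted unless their geometry is close to linear.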
  • In the case 1ive (see, Jedrzejas, M. J., et al., “Structures of Aromatic Inhibitors of Influenza Virus Neuraminidase,” Biochemistry, 1995, Vol. 34, p. 3144-3151), the correct position receives a relatively poor score, largely due to the estimated strain of the observed conformation. The present invention treats certain bonds as conjugated and applies a stiff penalty when these bonds are not planar. In the observed conformation, the corresponding dihedral angles are all nearly 80° from planar. If these dihedral angles are forced to be near 0°, the conformation is no longer compatible with the observed interactions between the ligand and the protein. It would be difficult for any docking algorithm to predict these values for the dihedral angles.
  • The case 1hef (see, Murthy, K. H. M., et al., “The Crystal Structures at 2.2-Å Resolution of Hydroxyethylene-Based Inhibitors Bound to Human Immunodeficiency Virus Type 1 Protease Show That the Inhibitors Are Present in Two Distinct Orientations,” Journal of Biological Chemistry, 1992, Vol. 267, p. 22770-22778), an HIV protease inhibitor, is perhaps the most interesting of the dramatic scoring errors. The binding pocket is at the interface of a dimer whose protein monomers are related through a crystallographic symmetry operation. At the C-terminus of the ligand, a methyl group is within 2.0 Å. These interactions would be extremely difficult to predict. Our program did, however, find an interesting alternate conformation for the C-terminus of the ligand. This conformation eliminates both the internal and external steric clashes and forms an additional hydrogen bond with the protein.
  • There are two cases that can be classified as conformational search failures: 1hef and 1poc. In these cases, the best conformations produced are 2.1 Å and 2.3 Å from the bound conformation, respectively. The ligand in 1poc has 23 rotatable bonds, so it is very difficult to cover its conformational space fully with only 50 conformers. While the ligand in 1hef is also very flexible (18 rotatable bonds), its observed conformation, as described above, also has a serious steric clash. Thus, as should be expected, this is a very difficult challenge for any conformational search procedure.
  • Conclusions
  • In this application, a new rapid technique for docking flexible ligands into the binding sites of proteins is presented. The method is based on a pre-generated set of ligand conformations and a final flexible, gradient-based optimization of the ligand in the binding site of the protein. The results indicate that this is a robust approach to handling ligand flexibility. With relatively few conformations (fewer than 50 per molecule), a conformation within 1.5 Å of the bound conformation can usually be generated. Applying the flexible optimization as the final step reduces the number of conformations required while maintaining high-quality final docked positions.
  • There are opportunities to improve the exemplified docking technique, and such improvements also fall within the scope of the present invention. For example, the conformer generation, while reasonably successful, should treat small, relatively rigid molecules and large, flexible molecules differently. For very large flexible molecules, whose conformational space is too large to explore thoroughly, a Monte Carlo search algorithm is used. In addition, the score used to rank the conformations is simplistic and can be improved. For example, incorporating a solvation model (see, Eisenberg, D. and A. D. McLachlan, “Solvation Energy in Protein Folding and Binding,” Nature, 1986, Vol. 319, p. 199-203; Still, W. C., et al., “Semianalytical Treatment of Solvation For Molecular Mechanics and Dynamics,” Journal of the American Chemical Society, 1990, Vol. 112, p. 6127-6129, both of which are hereby incorporated herein by reference in their entirety) would likely give better conformations. Finally, a better treatment of strain, particularly for rotation about bonds between two sp2 atoms, might lead to improved results.
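  • A Monte Carlo conformational search samples torsion space stochastically instead of enumerating it. The sketch below is a generic Metropolis search over dihedral angles, not the exemplified program's procedure: `torsion_energy` is a hypothetical stand-in strain model (a real implementation would use the strain and solvation-based score), and the temperature `kt`, step size and step count are illustrative choices.

```python
import math
import random


def torsion_energy(torsions_deg):
    """Hypothetical strain model: an independent threefold term per torsion,
    zero at staggered angles (60, 180 and -60 degrees)."""
    return sum(1.0 + math.cos(math.radians(3.0 * t)) for t in torsions_deg)


def monte_carlo_search(n_torsions, n_steps=5000, kt=0.6, max_step=30.0, seed=0):
    """Metropolis Monte Carlo over dihedral angles (degrees).

    Perturbs one randomly chosen torsion per step, accepts the move with
    probability min(1, exp(-dE / kT)), and returns the lowest-energy
    conformation seen together with its energy.
    """
    rng = random.Random(seed)
    current = [rng.uniform(-180.0, 180.0) for _ in range(n_torsions)]
    e_cur = torsion_energy(current)
    best, e_best = list(current), e_cur
    for _ in range(n_steps):
        trial = list(current)
        i = rng.randrange(n_torsions)
        # Perturb one torsion, then wrap it back into [-180, 180).
        trial[i] = (trial[i] + rng.uniform(-max_step, max_step) + 180.0) % 360.0 - 180.0
        e_trial = torsion_energy(trial)
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / kt):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


conformer, energy = monte_carlo_search(5, seed=1)
print([round(t, 1) for t in conformer], round(energy, 3))
```

  • In practice, several distinct low-energy conformations would be collected along the trajectory and clustered, rather than keeping only the single best one.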
  • In the embodiment exemplified, the algorithm used to find the polar hot spots tends to place hot spots at any hydrogen bonding position rather than only at positions buried in the binding site. Improving the hot spot search routine will not only increase the quality of the technique but also decrease the number of hot spots needed and thus make the technique more efficient. Some available programs, such as GRID (see, Goodford, P. J., “A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules,” Journal of Medicinal Chemistry, 1985, Vol. 28(7), p. 849-857; and Still, W. C., et al., “Semianalytical Treatment of Solvation For Molecular Mechanics and Dynamics,” Journal of the American Chemical Society, 1990, Vol. 112, p. 6127-6129, both of which are hereby incorporated herein by reference in their entirety), the LUDI binding site description (see, Bohm, H. J., “LUDI: Rule-based Automatic Design of New Substituents For Enzyme Inhibitor Leads,” Journal of Computer-Aided Molecular Design, 1992, Vol. 6, p. 593-606, which is hereby incorporated herein by reference in its entirety), or a documented method (see, Mills, J. E. J., T. D. J. Perkins, and P. M. Dean, “An Automated Method For Predicting The Positions of Hydrogen-bonding Atoms In Binding Sites,” Journal of Computer-Aided Molecular Design, 1997, Vol. 11, p. 229-242, which is hereby incorporated herein by reference in its entirety) would likely show some improvement. In addition, separating the polar hot spots into donor, acceptor, ionic, etc., hot spots might improve the results. Finally, in a practical application, most users would be willing to spend some time refining the binding site image, i.e., removing bad hot spots by hand and adding hot spots where needed. In practice, this will significantly improve docking runs.
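  • Both suggested improvements, keeping only buried hot spots and separating hot spots by type, can be sketched together. This sketch assumes a simple buriedness measure, the number of protein atoms within a fixed radius of the hot spot, which is a common proxy but not a method described in the text; the radius, the neighbor threshold, the type labels, and the names `HotSpot`, `buriedness` and `filter_buried` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class HotSpot:
    xyz: tuple
    kind: str  # hypothetical labels: "donor", "acceptor", "ionic", "apolar", ...


def buriedness(point, protein_xyz, radius=8.0):
    """Number of protein atoms within `radius` Å of a point."""
    return sum(1 for atom in protein_xyz if math.dist(point, atom) <= radius)


def filter_buried(hot_spots, protein_xyz, min_neighbors=40, radius=8.0):
    """Keep only sufficiently buried hot spots, grouped by hot spot type."""
    kept = {}
    for hs in hot_spots:
        if buriedness(hs.xyz, protein_xyz, radius) >= min_neighbors:
            kept.setdefault(hs.kind, []).append(hs)
    return kept


# Toy "protein": a 7 x 7 x 7 block of atoms centered on the origin.
protein = [(x, y, z) for x in range(-3, 4) for y in range(-3, 4) for z in range(-3, 4)]
spots = [HotSpot((0.0, 0.0, 0.0), "donor"), HotSpot((20.0, 0.0, 0.0), "acceptor")]
print(filter_buried(spots, protein))
```

  • The exposed acceptor hot spot far from the toy protein is dropped, while the buried donor hot spot is kept under its own type, so later matching can pair donors only with ligand donor atoms.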
  • In any docking program, a good score should be efficient, error tolerant, and accurate. The score used here satisfies the first two criteria. These two qualities, however, are usually difficult to reconcile with the third. This score nonetheless appears useful as an initial screen, after which a more accurate score can be applied. Accuracy can be improved by introducing geometric constraints for the hydrogen bonding term, recognition of ionic interactions and solvation effects, and terms for dealing with metals.
  • Nonetheless, when a crystal structure is available, the molecular docking approach of the present invention is useful for prioritizing libraries for screening. Even with lower-quality structural information, such as a homology model, the technique described herein will still provide useful information.
  • The capability of the present invention can readily be automated by creating a suitable program, in software, hardware, microcode, firmware or any combination thereof. Further, any type of computer or computer environment can be employed to provide, incorporate and/or use the capability of the present invention. One such environment is depicted in FIG. 8 and described in detail below.
  • In one embodiment, a computer environment 800 includes, for instance, at least one central processing unit 810, a main storage 820, and one or more input/output devices 830, each of which is described below.
  • As is known, central processing unit 810 is the controlling center of computer environment 800 and provides the sequencing and processing facilities for instruction execution, interruption action, timing functions, initial program loading and other machine-related functions. The central processing unit executes at least one operating system, which, as is known, is used to control the operation of the computing unit by controlling the execution of other programs, communication with peripheral devices and use of the computer resources.
  • Central processing unit 810 is coupled to main storage 820, which is directly addressable and provides for high-speed processing of data by the central processing unit. Main storage may be either physically integrated with the CPU or constructed in stand-alone units.
  • Main storage 820 is also coupled to one or more input/output devices 830. These devices include, for instance, keyboards, communications controllers, teleprocessing devices, printers, magnetic storage media (e.g., tape, disk), direct access storage devices, and sensor-based equipment. Data is transferred from main storage 820 to input/output devices 830, and from the input/output devices back to main storage.
  • The present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer-usable media. The media have embodied therein, for instance, computer-readable program code means for providing and facilitating the capabilities of the present invention. The articles of manufacture can be included as part of a computer system or sold separately.
  • Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.
  • The flow diagrams depicted herein are just exemplary. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
  • Although preferred embodiments have been depicted and described in detail herein, it will be apparent to those skilled in the relevant art that various modifications, additions, substitutions, and the like can be made without departing from the spirit of the invention and these are therefore considered to be within the scope of the invention as defined by the following claims.

Claims (16)

1. A computer-aided method of docking a ligand to a protein so as to determine ligand conformations likely to bind to said protein, said method comprising:
performing a pre-docking conformational search and generating multiple solution conformations of a ligand therefrom;
generating a binding site image of a protein, said binding site image comprising multiple hot spots;
matching hot spots of the binding site image to atoms in at least one conformation of the multiple solution conformations of the ligand to initially position said at least one conformation of said ligand as a rigid body into said binding site so as to obtain at least one position of the ligand relative to the protein in a protein-ligand complex;
optimizing the at least one position of the ligand while allowing translation, orientation and rotatable bonds of the ligand to vary, and while holding the protein fixed;
calculating a score for the optimized position of the ligand using one or more potential functions; and
selecting one or more optimized ligand positions based on said score.
2. The method of claim 1, additionally comprising, after performing the pre-docking conformational search and generating multiple solution conformations, creating a database of the multiple solution conformations of the ligand and storing said three-dimensional database for subsequent use by said matching.
3. The method of claim 2, wherein said database of the multiple solution conformations of the ligand comprises a conformational database of a combinatorial library.
4. The method of claim 1, wherein said performing the pre-docking conformational search and generating multiple solution conformations of the ligand comprises:
randomly generating a plurality of conformations of the ligand;
minimizing the strain of each conformation of the plurality of conformations;
using the strain and the solvent accessible surface area of each conformation to rank the conformations; and
clustering the conformations and retaining a desired top number of clusters of conformations, said retained top number of clusters of conformations comprising said multiple conformations of the ligand in solution.
5. The method of claim 1, wherein said generating the binding site image includes at least one of creating a list of apolar hot spots identifying points in the binding site that are favorable for an apolar atom to bind, and generating a list of polar hot spots identifying points in the binding site that are favorable for a hydrogen bond donor or acceptor to bind.
6. The method of claim 5, wherein said generating the binding site image further comprises:
placing a grid around the binding site of the protein;
determining a hot spot search volume using said grid;
determining hot spots using a grid-like search of the hot spot search volume; and
for each type of hot spot, clustering the hot spots and retaining a desired number of top clusters of hot spots, said desired number of top clusters comprising said multiple hot spots to be employed by said matching.
7. The method of claim 1, wherein said matching comprises:
matching atoms of the at least one solution conformation of the ligand to appropriate hot spots of the protein by positioning the at least one solution conformation of the ligand as a rigid body into the binding site image;
defining a match, said match determining a unique rigid body transformation; and
using the unique rigid body transformation to place the at least one solution conformation of the ligand into the binding site of the protein.
8. At least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform a method of docking a ligand to a protein so as to determine ligand conformations likely to bind to said protein, said method comprising:
performing a pre-docking conformational search and generating multiple solution conformations of a ligand therefrom;
generating a binding site image of a protein, said binding site image comprising multiple hot spots;
matching hot spots of the binding site image to atoms in at least one conformation of the multiple solution conformations of the ligand to initially position said at least one conformation of said ligand as a rigid body into said binding site so as to obtain at least one position of the ligand relative to the protein in a protein-ligand complex;
optimizing the at least one position of the ligand while allowing translation, orientation and rotatable bonds of the ligand to vary, and while holding the protein fixed;
calculating a score for the optimized position of the ligand using one or more potential functions; and
selecting one or more optimized ligand positions based on said score.
9. The at least one program storage device of claim 8, additionally comprising, after performing the pre-docking conformational search and generating multiple solution conformations of the ligand, creating a database of the multiple solution conformations of the ligand and storing said three-dimensional database for subsequent use by said matching.
10. The at least one program storage device of claim 9, wherein said database of the multiple solution conformations of the ligand comprises a conformational database of a combinatorial library.
11. The at least one program storage device of claim 8, wherein said performing the pre-docking conformational search and generating multiple solution conformations of the ligand comprises:
randomly generating a plurality of conformations of the ligand;
minimizing the strain of each conformation of the plurality of conformations;
using the strain and the solvent accessible surface area of each conformation to rank the conformations; and
clustering the conformations and retaining a desired top number of clusters of conformations, said retained top number of clusters of conformations comprising said multiple solution conformations of the ligand.
12. The at least one program storage device of claim 8, wherein said generating the binding site image includes at least one of creating a list of apolar hot spots identifying points in the binding site that are favorable for an apolar atom to bind, and generating a list of polar hot spots identifying points in the binding site that are favorable for a hydrogen bond donor or acceptor to bind.
13. The at least one program storage device of claim 12, wherein said generating the binding site image further comprises:
placing a grid around the binding site of the protein;
determining a hot spot search volume using said grid;
determining hot spots using a grid-like search of the hot spot search volume; and
for each type of hot spot, clustering the hot spots and retaining a desired number of top clusters of hot spots, said desired number of top clusters comprising said multiple hot spots to be employed by said matching.
14. The at least one program storage device of claim 8, wherein said matching comprises:
matching atoms of the at least one solution conformation of the ligand to appropriate hot spots of the protein by positioning the at least one solution conformation of the ligand as a rigid body into the binding site image;
defining a match, said match determining a unique rigid body transformation; and
using the unique rigid body transformation to place the at least one solution conformation of the ligand into the binding site of the protein.
15. The at least one program storage device of claim 8, wherein multiple positions of the ligand are obtained, and said optimizing step comprises:
eliminating each position of the ligand having a predetermined percentage of atoms with a steric clash;
ranking remaining positions of the ligand using an atom pairwise score with a desired atom score cutoff, said atom pairwise score comprising a hydrogen bonding potential score or a steric potential score;
after ranking, clustering the positions of the ligand and selecting a top number n of positions; and
optimizing each of the n positions, allowing the translation, orientation and rotatable bonds of the ligand to vary.
16. The at least one program storage device of claim 15, wherein said optimizing comprises optimizing each position of the n positions using a Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm with said atom pairwise score, allowing the translation, orientation and rotatable bonds of the ligand to vary.
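Claims 7 and 14 recite that a match of ligand atoms to hot spots determines a unique rigid body transformation that places the solution conformation in the binding site. The claims do not say how the transformation is computed; the sketch below uses the Kabsch least-squares superposition as one standard way to obtain it, assuming at least three non-collinear matched pairs and that NumPy is available. The names `rigid_transform` and `place` are hypothetical.

```python
import numpy as np


def rigid_transform(ligand_pts, hotspot_pts):
    """Least-squares rotation r and translation t with r @ p + t ≈ q (Kabsch).

    ligand_pts, hotspot_pts: (N, 3) arrays of matched ligand atoms and hot
    spots, with N >= 3 and the points not collinear, so the fit is unique.
    """
    p = np.asarray(ligand_pts, dtype=float)
    q = np.asarray(hotspot_pts, dtype=float)
    p_c, q_c = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_c).T @ (q - q_c)              # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = q_c - r @ p_c
    return r, t


def place(conformer_xyz, r, t):
    """Apply the rigid-body transform to every atom of a conformer."""
    return np.asarray(conformer_xyz, dtype=float) @ r.T + t


# Usage: recover a known rotation about z plus a translation.
theta = np.radians(30.0)
r_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
ligand = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.4, 0.0], [0.5, 0.5, 1.2]])
hotspots = ligand @ r_true.T + np.array([1.0, 2.0, 3.0])
r, t = rigid_transform(ligand, hotspots)
print(np.round(place(ligand, r, t) - hotspots, 6))
```

Once the transform is known, it is applied to all atoms of the conformation, not only to the matched ones, which gives the initial rigid-body position that is subsequently optimized.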
US11/408,901 2000-06-15 2006-04-21 Molecular docking technique for screening of combinatorial libraries Abandoned US20070078605A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/408,901 US20070078605A1 (en) 2000-06-15 2006-04-21 Molecular docking technique for screening of combinatorial libraries

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/595,096 US7065453B1 (en) 2000-06-15 2000-06-15 Molecular docking technique for screening of combinatorial libraries
US11/408,901 US20070078605A1 (en) 2000-06-15 2006-04-21 Molecular docking technique for screening of combinatorial libraries

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/595,096 Continuation US7065453B1 (en) 2000-06-15 2000-06-15 Molecular docking technique for screening of combinatorial libraries

Publications (1)

Publication Number Publication Date
US20070078605A1 true US20070078605A1 (en) 2007-04-05

Family

ID=24381710

Family Applications (3)

Application Number Title Priority Date Filing Date
US09/595,096 Expired - Lifetime US7065453B1 (en) 2000-06-15 2000-06-15 Molecular docking technique for screening of combinatorial libraries
US10/320,752 Abandoned US20030228624A1 (en) 2000-06-15 2002-12-16 Molecular docking methods for assessing complementarity of combinatorial libraries to biotargets
US11/408,901 Abandoned US20070078605A1 (en) 2000-06-15 2006-04-21 Molecular docking technique for screening of combinatorial libraries

Country Status (1)

Country Link
US (3) US7065453B1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101020933B1 (en) 2010-02-19 2011-03-09 포항공과대학교 산학협력단 (POSTECH Academy-Industry Foundation) Method for searching ideal protein structures for inhibition of target molecules with simulation from electronic library
WO2012162320A1 (en) * 2011-05-23 2012-11-29 Schrodinger, Llc Binding affinity scoring with penalty for breaking conjugation between aromatic ligand groups

Families Citing this family (13)

Publication number Priority date Publication date Assignee Title
US7065453B1 (en) 2000-06-15 2006-06-20 Accelrys Software, Inc. Molecular docking technique for screening of combinatorial libraries
JP4730684B2 (en) * 2004-03-16 2011-07-20 イマジニアリング株式会社 (Imagineering, Inc.) Database system with advanced user interface and web browser using the database system
RS52854B (en) * 2009-06-11 2013-12-31 Abbvie Bahamas Limited Hepatitis c virus inhibitors
US8937150B2 (en) 2009-06-11 2015-01-20 Abbvie Inc. Anti-viral compounds
US20110137633A1 (en) * 2009-12-03 2011-06-09 Abbott Laboratories Anti-viral compounds and methods of identifying the same
NZ605440A (en) 2010-06-10 2014-05-30 Abbvie Bahamas Ltd Solid compositions comprising an hcv inhibitor
US10201584B1 (en) 2011-05-17 2019-02-12 Abbvie Inc. Compositions and methods for treating HCV
US9034832B2 (en) 2011-12-29 2015-05-19 Abbvie Inc. Solid compositions
US11484534B2 (en) 2013-03-14 2022-11-01 Abbvie Inc. Methods for treating HCV
WO2015103490A1 (en) 2014-01-03 2015-07-09 Abbvie, Inc. Solid antiviral dosage forms
JP6845157B2 (en) * 2015-05-01 2021-03-17 シュレーディンガー エルエルシー (Schrodinger, Llc) Physics-based calculation method for predicting the solubility of a compound
US10880754B1 (en) 2020-05-13 2020-12-29 T-Mobile Usa, Inc. Network planning tool for retention analysis in telecommunications networks
US11223960B2 (en) 2020-05-13 2022-01-11 T-Mobile Usa, Inc. Network planning tool for forecasting in telecommunications networks

Citations (8)

Publication number Priority date Publication date Assignee Title
US5463564A (en) * 1994-09-16 1995-10-31 3-Dimensional Pharmaceuticals, Inc. System and method of automatically generating chemical compounds with desired properties
US5495423A (en) * 1993-10-25 1996-02-27 Trustees Of Boston University General strategy for vaccine and drug design
US5642292A (en) * 1992-03-27 1997-06-24 Akiko Itai Methods for searching stable docking models of biopolymer-ligand molecule complex
US5854992A (en) * 1996-09-26 1998-12-29 President And Fellows Of Harvard College System and method for structure-based drug design that includes accurate prediction of binding free energy
US5889528A (en) * 1996-07-31 1999-03-30 Silicon Graphics, Inc. Manipulation of branching graphic structures using inverse kinematics
US20020025535A1 (en) * 2000-06-15 2002-02-28 Diller David J. Prioritization of combinatorial library screening
US20020099506A1 (en) * 2000-03-23 2002-07-25 Floriano Wely B. Methods and apparatus for predicting ligand binding interactions
US20030228624A1 (en) * 2000-06-15 2003-12-11 Pharmacopeia, Inc. Molecular docking methods for assessing complementarity of combinatorial libraries to biotargets

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
ES2271971T3 (en) 1996-07-25 2007-04-16 Biogen Idec Ma Inc. INHIBITORS OF CELL ADHESION.

Cited By (3)

Publication number Priority date Publication date Assignee Title
KR101020933B1 (en) 2010-02-19 2011-03-09 포항공과대학교 산학협력단 (POSTECH Academy-Industry Foundation) Method for searching ideal protein structures for inhibition of target molecules with simulation from electronic library
WO2012162320A1 (en) * 2011-05-23 2012-11-29 Schrodinger, Llc Binding affinity scoring with penalty for breaking conjugation between aromatic ligand groups
US9858395B2 (en) 2011-05-23 2018-01-02 Schrodinger, Llc Binding affinity scoring with penalty for breaking conjugation between aromatic ligand groups

Also Published As

Publication number Publication date
US20030228624A1 (en) 2003-12-11
US7065453B1 (en) 2006-06-20

Similar Documents

Publication Publication Date Title
US20070078605A1 (en) Molecular docking technique for screening of combinatorial libraries
Diller et al. High throughput docking for library design and library prioritization
Clark et al. Flexible ligand docking without parameter adjustment across four ligand–receptor complexes
Jones et al. Development and validation of a genetic algorithm for flexible docking
Muegge et al. Small molecule docking and scoring
Lyne Structure-based virtual screening: an overview
Marrone et al. Structure-based drug design: computational advances
Gillet et al. SPROUT: a program for structure generation
Lorber et al. Hierarchical docking of databases of multiple ligand conformations
Taylor et al. A review of protein-small molecule docking methods
Jones et al. Docking small-molecule ligands into active sites
Lewis et al. Current methods for site-directed structure generation
Wallqvist et al. Docking enzyme‐inhibitor complexes using a preference‐based free‐energy surface
US20020025535A1 (en) Prioritization of combinatorial library screening
Kroemer Molecular modelling probes: docking and scoring
US20070166760A1 (en) Ligand searching device, ligand searching method, program, and recording medium
Kapoor et al. Discovery of novel nonactive site inhibitors of the prothrombinase enzyme complex
Zhao et al. Protein–ligand docking with multiple flexible side chains
Beierlein et al. Quantum mechanical/molecular mechanical (QM/MM) docking: an evaluation for known test systems
Knegtel et al. Comparison of two implementations of the incremental construction algorithm in flexible docking of thrombin inhibitors
AU780941B2 (en) System and method for searching a combinatorial space
Beavers et al. Structure-based combinatorial library design: methodologies and applications
Trosset et al. Flexible docking simulations: Scaled collective variable Monte Carlo minimization approach using Bezier splines, and comparison with a standard Monte Carlo algorithm
Hanessian et al. A comparative docking study and the design of potentially selective MMP inhibitors
JP4314206B2 (en) Ligand search device, ligand search method, program, and recording medium

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION