EP1140737A2 - System and method for structure-based drug design with accurate prediction of binding free energies - Google Patents

System and method for structure-based drug design with accurate prediction of binding free energies

Info

Publication number
EP1140737A2
EP1140737A2 EP99967640A EP99967640A EP1140737A2 EP 1140737 A2 EP1140737 A2 EP 1140737A2 EP 99967640 A EP99967640 A EP 99967640A EP 99967640 A EP99967640 A EP 99967640A EP 1140737 A2 EP1140737 A2 EP 1140737A2
Authority
EP
European Patent Office
Prior art keywords
molecule
free energy
protein
receptor site
grown
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99967640A
Other languages
English (en)
French (fr)
Inventor
Robert S. Dewitte
Eugene I. Shakhnovich
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harvard University
Original Assignee
Harvard University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harvard University

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Definitions

  • The present invention generally relates to systems and methods for de novo structure-based drug design. More specifically, the present invention relates to systems and methods for de novo structure-based drug design that include a method for accurately predicting the binding free energy in building novel therapeutic molecules or ligands.
  • Finding leads and optimizing the leads that are found are the goals in the development of molecules or ligands that are useful in combating disease and other abnormal body conditions.
  • This has involved using, among other things, a molecular similarity method of drug design.
  • This is just one of a number of methods for determining useful leads in the search for new or improved bioactive, therapeutic molecules or ligands.
  • The molecular similarity method of drug design is based on evaluating a large range of chemical structures to find those that show either similarity with each other or complementarity with a biochemical target structure. Chemical structures identified in this manner usually have a reasonably high probability of binding at the target structure.
  • This search for appropriate chemical structures is enhanced and improved by computer technology which uses search algorithms to search large databases of chemical structures.
  • Other methods for developing desired leads or ligands use high-throughput screening or combinatorial chemistry. These conventional methods have been proven, in certain circumstances, to yield desired leads and ligands.
  • Structure-based molecular design is yet another method to identify lead molecules for drug design. This method is based on the premise that good inhibitors possess significant structural and chemical complementarity with their target receptors. This design method can create molecules with specific properties that make them conducive to binding to the target site.
  • The molecular structures that are designed by the structure-based design process are meant to interact with biochemical targets, for example, targets whose three-dimensional structures are known.
  • The structure-based drug design method has distinct advantages over prior art methods.
  • One important advantage that the structure-based drug design method provides over traditional methods of lead discovery and optimization is an awareness of the intermolecular interactions that are possible.
  • One goal of the structure-based drug design method is to identify lead molecules. This may be accomplished in a variety of ways. However, the principal generic steps in most structure-based drug design methods are: (1) to identify and determine the structure of the receptor site; (2) to use theoretical principles and experimental data to propose a series of putative ligands that will bind to the receptor sites (which ligands are synthesized and tested for their complementarity); (3) to make a determination of the structure of the receptor/ligand complexes that are successful in binding at low free energy levels; and (4) to iterate steps 2 and 3 in an effort to further enhance binding.
  • A structure-based drug design method may be enhanced through the use of advanced methods of computation, which expedite the identification of key molecular fragments (which may then be joined to form molecules) or whole molecules (either from a database of existing compounds or through a molecular growth algorithm). These computational advances enhance the ability to develop molecules or ligands which will successfully bind to macromolecular receptor sites.
  • Docking of ligands has three components: (1) site/ligand description, (2) juxtaposition of the ligand and the site frames of reference, and (3) the evaluation of the complementarity.
  • For site/ligand description, the atomic coordinates of the receptor macromolecules are obtained through methods such as X-ray crystallography, nuclear magnetic resonance (NMR), or homology modeling.
  • Site descriptions simply may be the atomic coordinates of the receptor site; however, some notion of the chemical properties of the atoms is needed if there are hopes to measure chemical complementarity, and not just spatial complementarity.
  • Site volume also may be defined if a site boundary is not identifiable. Ligand descriptions closely parallel site descriptions.
  • The general desire in molecular docking is to obtain the lowest free energy structure for the receptor-ligand complex. Moreover, this search for the lowest free energy selects the best fit at each position of the structure with regard to the lowest binding free energy and selection criteria.
  • (1) Databases of putative ligands are searched, and the identified ligands are ranked according to their respective interactive energies with a particular receptor site; and (2) computational studies are made of the geometry of particular complexes.
  • Of interest for ligand design is the free energy of binding ($\Delta G_{\text{binding}}$).
  • A first method is the geometric method, which matches ligand and receptor site descriptors.
  • A second method is to align the ligand and receptor by minimizing the ligand and receptor interactive energy.
  • Energy-driven searching may be based on molecular dynamics ("MD") and traditional Monte Carlo ("MC") simulations. These methods, however, require a tremendous amount of computational time. Finding the lowest energy state of a given ligand-receptor complex using either of these methods is a fundamental problem.
  • In descriptor matching methods, an analysis is made of the proposed receptor region at which binding is to take place. Ligand atoms are then positioned at the best locations at the site. This gives an approximated ligand-receptor configuration that may then be refined by optimization. Descriptor matching methods are reasonably fast and provide a good sampling of the region of interest at the receptor site. Many of the descriptor matching methods use algorithms that employ combinatorial search strategies. As such, small changes in parameter values can cause the computational time required to become unreasonably long.
  • DOCK is one of the earliest descriptor matching programs. DOCK software was developed at the University of California at San Francisco and approaches the drug development problem by creating a negative image of the target site, searching a database for a similar ligand, placing putative ligands into the site, and evaluating the quality of the fit.
  • DOCK uses spheres locally complementary to the receptor surface to create a space-filling negative image of the receptor site. Several ligand atoms are matched with receptor spheres to generate chiral orientations of the ligand in the site. Databases of small molecules are searched for candidates that complement the structure of the receptor site.
  • DOCK is unable to suggest any novel structures; it can only search for what is in a database.
  • CAVEAT software, also a descriptor matching method, is based on directional characterization of ligands.
  • CAVEAT was developed at the University of California at Berkeley. This program searches for ligands with atoms located along specified vectors. The vectors are derived from structural information from known complexes. CAVEAT focuses on searching ligand databases to find templates as starting points for chemical structures.
  • FOUNDATION software provides a descriptor matching method that attempts to combine models of crucial ligand atoms and structure-based models.
  • In using FOUNDATION, the searcher identifies atom and binding types that the candidate molecule must possess.
  • FOUNDATION relies heavily on the detailed atom-type, bond-type, chain length, and topology constraints provided by the searcher to restrict the search.
  • FOUNDATION only considers the steric component of the active site and relies on the matching information to find chemically complementary ligands. The tight constraints that are required by the FOUNDATION program restrict the candidates to one orientation at the receptor site.
  • CLIX software, developed at CSIRO, an Australian research organization, resembles DOCK in using the receptor site to define possible binding configurations.
  • CLIX relies on an elaborate chemical description of the receptor site. This program uses fewer receptor-ligand matches than does DOCK. CLIX also evaluates interaction energies at the receptor sites.
  • Grid search methods are used to sample the six degrees of freedom of the orientation space. These methods identify an approximate solution; an exact solution cannot be guaranteed with discrete sampling methods. Accuracy is limited by the step size used in the search of the various positions. The step size also determines the time of the search, i.e., the greater the number of incremental steps, the greater the search time. Methods that use additional sampling in regions of high complementarity can overcome this problem.
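The trade-off between step size and search time can be illustrated with a small sketch. It counts the poses in a uniform grid over three translational and three rotational degrees of freedom. The function name `count_grid_poses`, the cubic box, the step sizes, and the choice to sample every rotation angle over a full 360 degrees are assumptions made for illustration and are not taken from any of the programs discussed here.

```python
import math


def count_grid_poses(box_length, trans_step, rot_step_deg):
    """Count poses in a uniform grid over six rigid-body degrees of freedom.

    Simplifying assumptions: a cubic search box, and three rotation angles
    that are each sampled over a full 360 degrees.
    """
    # Grid points along each translational axis (both ends of the box included).
    n_trans = math.floor(box_length / trans_step) + 1
    # Distinct angles per rotational axis (0 and 360 degrees coincide).
    n_rot = math.ceil(360.0 / rot_step_deg)
    return n_trans ** 3 * n_rot ** 3


# Halving both step sizes multiplies the number of poses many times over.
coarse = count_grid_poses(box_length=10.0, trans_step=1.0, rot_step_deg=30.0)
fine = count_grid_poses(box_length=10.0, trans_step=0.5, rot_step_deg=15.0)
print(coarse, fine, fine / coarse)
```

Because the pose count grows with the third power of the number of steps along each axis, a finer grid improves accuracy only at a steep cost in evaluations, which is why adaptive sampling in regions of high complementarity is attractive.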
  • A first type of grid search method is a side chain spheres method. This method explores protein-protein complexes using simplified sphere representations of side chain atoms and a grid search of four rigid degrees of freedom. This program uses surface evaluation algorithms, full molecular force-field evaluations of complexes, and simulated annealing to refine initial docking structures.
  • A second type of grid search method is a soft docking method. According to this method, receptor and ligand surfaces are divided into cubes to generate the translational part of the search. A pure rotational grid search is conducted on the sample ligand at orientations in discrete angular increments. The accuracy is limited by size. Run-time scaling is the cube of the rotational step size and is a product of the number of the receptor-ligand surface points.
  • Fragment-joining methods identify regions of high complementarity by docking functional groups independently into receptors. These methods are not particularly bothered by rigid ligand issues because of the added combinatorial search. Fragment-joining methods suggest unsynthesized compounds, but connecting the fragments in sensible, synthetically accessible patterns is difficult. These methods must also connect functional groups to form complete molecules while maintaining the fragments at the geometric positions of lowest energy.
  • GROW is a fragment-joining method which has been used to design peptides complementary to proteins of a known structure.
  • The software was developed at Upjohn Laboratories, Kalamazoo, Michigan. In operation, a seed amino acid is placed in the receptor site followed by iterative additions of amino acids. Conformations are chosen from a library of precalculated low-energy forms. At each addition of a peptide, the peptide-receptor complex is minimized and evaluated. Only the best 10-100 low energy structures are kept at any stage.
  • GROWMOL software generates molecules by evaluating each new atom added to molecules according to the chemical complementarity of the atom to nearby atoms on the molecule.
  • A Boltzmann weighting factor is used to bias the probability of selection towards atoms with a high complementarity score.
  • The chemical complementarity is determined by calculating the number of hydrophobic contacts (i.e. the number of ligand carbons other than carbonyl carbons which occupy a predefined "hydrophobic zone") and the number of hydrogen bonds (i.e. the number of ligand hydrogens in a pre-defined "hydrogen acceptor zone" plus the number of ligand oxygens found in a pre-defined "hydrogen bond donor zone").
  • GROWBUILD software grows molecules by the addition of fragments from a library consisting of a single functional group such as a hydroxy, a carbonyl, or a benzene ring. At each step, possible fragment additions are evaluated according to the molecular mechanics energy and one of the best is randomly chosen. No information about critical binding regions is used in the beginning to identify disconnected regions of the active site which must be filled.
  • HOOK, developed at Harvard University, Cambridge, Massachusetts, is a fragment-joining method that finds hot spots in receptor sites by looking for low energy locations for functional groups. HOOK uses random placement of many copies of several functional fragments followed by molecular dynamics.
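The Boltzmann-weighted selection described for GROWMOL can be sketched as follows. This is not GROWMOL's code: the function names, the `temperature` parameter, and the toy scores (standing in for counts of hydrophobic contacts plus hydrogen bonds) are hypothetical and serve only to show how an exponential weight biases a random choice toward high complementarity scores.

```python
import math
import random


def boltzmann_weights(scores, temperature=1.0):
    """Turn complementarity scores into selection probabilities.

    Higher scores get exponentially larger weights; `temperature` is a
    hypothetical tuning parameter controlling how strong the bias is.
    """
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    raw = [math.exp((s - top) / temperature) for s in scores]
    total = sum(raw)
    return [w / total for w in raw]


def pick_candidate(candidates, scores, temperature=1.0, rng=random):
    """Randomly pick one candidate atom, biased toward high scores."""
    weights = boltzmann_weights(scores, temperature)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy scores: hydrophobic contacts plus hydrogen bonds for each candidate atom.
candidates = ["C", "N", "O"]
scores = [3, 1, 2]
probabilities = boltzmann_weights(scores)
print(dict(zip(candidates, probabilities)))
print(pick_candidate(candidates, scores, rng=random.Random(0)))
```

A lower `temperature` makes the choice nearly greedy, while a higher one approaches uniform random selection.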
  • MCSS-HOOK-DLD methods involve the location of favorable interaction sites for molecular fragments by performing a multiple copy simultaneous search (MCSS).
  • The protein is subject to the average potential field of the ligands using the CHARMM empirical force field.
  • The resulting interaction sites, unlike with GRID, contain orientation information and can be linked together with bonding force fields and linker sp³ and sp² carbon atoms via DLD (dynamic ligand design) or molecular fragments in a database (HOOK).
  • BUILDER software uses a family of docked structures to provide an irregular lattice of controllable density. This lattice can be searched for paths that link molecular fragments.
  • LUDI is a fragment-joining method that proposes inhibitors by connecting fragments that dock into microsites on the receptor.
  • LUDI was developed at BASF, Ludwigshafen, Germany.
  • The fragments are from a predetermined list of molecular fragments.
  • The microsites are defined by hydrogen bonding and hydrophobic groups.
  • Ligand pseudoatom positions are generated within microsites on the basis of appropriate angle and distance minima for various interactions.
  • The fragments identified are connected using linear chains composed of one or more of 12 functional groups.
  • GRID software is a hybrid grid/fragment-joining method that places small fragment probes at many regularly spaced grid points within an active receptor site. This program, developed at Oxford University, England, has been found to reproduce the positions of important hydrogen bonding groups. GRID uses empirical hydrogen bonding interaction potential and spherical representations of functional groups to generate affinity contours for various molecular fragments. This identifies regions of high and low affinity. The contours may be used to guide chemical intuition or as an input for other analysis programs. GRID is limited by its representation of the fragments since it does not allow prediction of fragment orientation. A related program is HSITE which generates a map of the hydrogen-bonding regions of an enzyme active site, including the probability of hydrogen bond formation at each point.
  • $T\Delta S_{\text{complex formation}}$: The change in entropy due to the reduction in flexibility in both the ligand and the protein upon complex formation.
  • Energies of solvation: The energetic factors from the transfer of hydrophilic and lipophilic groups from an aqueous solvent to the more lipophilic region of the protein binding site.
  • The present invention provides a system and method that enhances the ability to conduct searches.
  • A quick and accurate free energy estimation method is used for testing only the most meaningful combinations.
  • The scoring is based solely on the interaction energy between the ligand and protein in the complex as the single most important contributor to free energy.
  • Ranking may be based more on spatial complementarity than chemical complementarity.
  • The solvation contributions are taken as being an approximation of the surface area.
  • The scoring is incomplete, which adversely affects the accuracy of the rankings of the candidate molecules or ligands.
  • An object of the present invention is to provide a novel system and method for structure-based drug design.
  • The invention provides a method of de novo designing molecules that bind to a receptor site on a protein comprising the steps of:
  • (a) building a molecule in the receptor site comprising: adding successive random molecular fragments to an initial molecular fragment that is loaded into the receptor site, estimating the free energy of the molecule being grown after each addition of a molecular fragment, and orienting each successive molecular fragment as it is added to the receptor site such that the free energy estimate for the molecule may be higher than a lowest free energy estimate possible for the molecule;
  • (b) repeating step (a) to generate a collection of molecules grown in the receptor site, and ranking the collection of molecules according to increasing free energy estimates to identify high-ranking molecules;
  • (c) selecting one or more functional groups of a high-ranking molecule identified in step (b) as a single restart fragment and using the restart fragment to build a second-generation of molecules according to steps (a) and (b),
  • (f) modifying high-ranking molecules from step (f) based on qualitative analysis of the molecules including determination of chemical viability, synthetic feasibility, solubility, and effect of the molecule on the structure of the protein, wherein such modification comprises: atomic and/or functional substitutions, initiating growth from a specific receptor site, inclusion of salt bridges or hydrogen bonds, and solubility-enhancing measures.
  • The receptor site is selected from the group consisting of the following: Src-homology-3 domain, Src-homology-2 domain, MDM2 protein, CD4 protein, and carbonic anhydrase protein (particularly, human carbonic anhydrase II protein).
  • The empirical interaction energy comprises CHARMM interaction energy and the empirical force field comprises CHARMM.
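Steps (a) through (c) of the method summarized above can be pictured with a deliberately simplified sketch. It is not the patented implementation: the fragment library, the per-fragment score contributions, and the function names are hypothetical, fragment orientation is omitted entirely, and the additive score merely stands in for the free energy estimate.

```python
import random

# Hypothetical fragment library with made-up per-fragment score contributions
# (more negative = more favorable). Real scores depend on the receptor site.
FRAGMENT_SCORES = {"CH3": -0.4, "OH": -0.9, "C=O": -0.7, "NH2": -0.8, "C6H5": -1.5}


def grow_molecule(start, n_additions, rng):
    """Step (a), simplified: add random fragments and re-estimate the score."""
    fragments = list(start)
    score = sum(FRAGMENT_SCORES[f] for f in fragments)
    for _ in range(n_additions):
        new = rng.choice(sorted(FRAGMENT_SCORES))
        fragments.append(new)
        score += FRAGMENT_SCORES[new]  # estimate updated after each addition
    return fragments, score


def generate_and_rank(start, n_molecules, n_additions, seed=0):
    """Step (b): repeat the growth and rank by increasing score estimate."""
    rng = random.Random(seed)
    molecules = [grow_molecule(start, n_additions, rng) for _ in range(n_molecules)]
    return sorted(molecules, key=lambda m: m[1])


first_gen = generate_and_rank(start=["CH3"], n_molecules=20, n_additions=4)
best_fragments, best_score = first_gen[0]

# Step (c), simplified: reuse part of a high-ranking molecule as restart fragment.
restart = best_fragments[:2]
second_gen = generate_and_rank(start=restart, n_molecules=20, n_additions=4, seed=1)
print(best_fragments, round(best_score, 2))
print(second_gen[0])
```

Ranking in ascending order puts the lowest (most favorable) estimates first, so the head of each list holds the high-ranking molecules from which a restart fragment would be taken.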
  • Another object of the present invention is to provide a novel system and method for structure-based drug design that has a more accurate method for predicting the binding free energy at the protein-ligand complex.
  • A yet further object of the present invention is to provide a novel system and method for building candidate molecules or ligands for binding at a receptor site that uses a more accurate method for predicting binding free energy at the protein-ligand complex.
  • Another object of the invention is to provide libraries of ligand candidates that bind to a receptor site of interest that have been generated using a de novo structure-based design method.
  • Figure 1 shows a block diagram of the method employed by the system of the present invention.
  • Figure 2 is a block diagram for the method of estimating binding free energy that may be used in the molecule growth method employed by the system of the present invention.
  • Figure 3 is a flow diagram of the molecular growth method of the system of the present invention.
  • Figure 4 shows an example of a protein receptor site with at least an H₂ molecule loaded in it.
  • Figure 5 shows an example of a protein receptor site with a molecule being built in it.
  • Figure 6 shows first generation molecules as ligand candidates for the specificity pocket of Src SH3 domain.
  • Figure 7 shows second- and third-generation molecules as ligand candidates for the specificity pocket of Src SH3 domain.
  • Figure 8 shows a candidate ligand, 7e, from Figure 7 that is able to form three hydrogen bonds as well as a significant π-stack with both Tyr55 and Trp42: (a) molecular structure of the candidate; (b) licorice diagram of the ligand in the binding site showing the residues with which a strong ligand should make interactions; (c) space-filling model showing the π-stacking with Tyr55 and Trp42; and (d) another view of the space-filling model.
  • Figure 9 shows first- (b-e), second- (f-k) and third- (l-n) generation molecules as ligand candidates for the LP pocket of Src SH3 domain.
  • The peptide PLPP that occupies the LP pocket is represented by "a."
  • The novel peptide molecule is represented by "o"; various side chains, R, are shown in Figure 10.
  • Figure 10 shows molecule 9o of Figure 9 with various side chains as candidate ligands for the LP pocket of Src SH3 domain.
  • Figure 11 shows a candidate ligand for the LP pocket of Src SH3 domain: (a) molecular structure of the ligand; (b) licorice diagram of the ligand in the binding site showing the residues with which a strong ligand should make interactions; (c) space-filling model; and (d) another view of the space-filling model.
  • Figure 12 shows the correlation between coarse-grained knowledge-based potential data and experimental binding constants in a series of ligands for the specificity pocket of Src SH3 domain.
  • The experimental binding constants are plotted on the log scale since the logarithm of the binding constant is proportional to the experimental binding free energy.
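The proportionality between the logarithm of a binding constant and the binding free energy follows from the standard thermodynamic relation ΔG = RT ln(Kd). The sketch below assumes that the binding constant is expressed as a dissociation constant in molar units, a 1 M standard state, and a temperature of 298.15 K; these choices and the function name are illustrative assumptions, since the text does not state which convention the figure uses.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd / 1 M); the 1 M standard state and the default
    temperature are assumptions of this sketch.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


# Each thousandfold improvement in Kd lowers dG by the same fixed amount.
for kd in (1e-3, 1e-6, 1e-9):
    print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")
```

Because the relation is linear in ln(Kd), plotting binding constants on a log scale places them on the same footing as free energy estimates.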
  • Figure 13 shows first-generation molecules as ligand candidates for CD4. Based on the free energy estimate and empirical interaction energies, these seven molecules are the best of 1000 molecules generated in the binding site.
  • Figure 14 shows second-generation molecules as ligand candidates for CD4.
  • The molecules generated were manually manipulated to improve qualitative and quantitative characteristics, including increasing π-stacking interaction, adding a bridge from the flexible chain connecting to the pyridine group to the sugar-like ring of carbon, and substitution of carbon for the oxygen atom on the seven-membered ring.
  • Figure 15 shows a candidate ligand for the Phe43 binding pocket of CD4: (a) molecular structure of the candidate; (b) licorice diagram of the ligand in the binding site showing the residues with which a strong ligand should make interactions; (c) space-filling model of the ligand; (d) diagram showing the protein as a space-filling model and the ligand as a licorice diagram.
  • The present invention is a system and method for computational de novo structure-based drug design that employs a novel method for discovery and building of ligands, and a more accurate method for predicting binding free energy. Accordingly, the system and method of the present invention provide a better predictive de novo structure-based drug design tool by using a coarse-graining model with corresponding knowledge-based potential data. Moreover, in light of the use of the coarse-graining model, the novel molecular growth method of the present invention uses a Metropolis Monte Carlo selection process for molecule growth that builds the molecules or ligands that result in a low free energy structure, but not necessarily the lowest free energy structure. Yet such a structure that is grown has a more accurate prediction of binding free energy and can be an acceptable drug design candidate.
  • The molecular growth method of the present invention uses the Metropolis Monte Carlo method to quickly search and sample the configuration space of the binding site. This is done with knowledge of the interactive potential for the fragments that are part of a database.
  • The Metropolis Monte Carlo method also gives the system and method of the present invention the ability to identify very quickly fragments that will be useful in building the molecules or ligands.
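The standard Metropolis acceptance rule explains why such a search settles on low, but not necessarily the lowest, free energy structures. The sketch below is a generic textbook illustration rather than the invention's growth code: the four energies are made-up stand-ins for free energy estimates of candidate fragment placements, and `temperature` is expressed in the same arbitrary energy units.

```python
import math
import random


def metropolis_accept(old_energy, new_energy, temperature, rng=random):
    """Standard Metropolis criterion.

    Moves that lower the energy are always accepted; moves that raise it are
    accepted with probability exp(-(new - old) / temperature).
    """
    delta = new_energy - old_energy
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def sample(energies, n_steps, temperature, seed=0):
    """Random walk over discrete states using the Metropolis criterion."""
    rng = random.Random(seed)
    state = rng.randrange(len(energies))
    visits = [0] * len(energies)
    for _ in range(n_steps):
        trial = rng.randrange(len(energies))
        if metropolis_accept(energies[state], energies[trial], temperature, rng):
            state = trial
        visits[state] += 1
    return visits


# Hypothetical free energy estimates for four candidate fragment placements.
energies = [-2.0, -1.5, 0.0, 1.0]
print(sample(energies, n_steps=10_000, temperature=0.5))
```

The walk spends most of its time in the two lowest-energy states while still occasionally visiting higher ones, which is the behavior that lets a growth procedure escape locally favorable but globally poor choices.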
  • Coarse-graining is a procedure commonly used in statistical mechanics to allow one to focus attention on events at an intermediate length scale so that one can deduce general trends without being overwhelmed by the variations in the most minute details.
  • Coarse-graining results in an averaging of the interaction potential within a space of a particular size. The size of these spaces should correspond to some physical distances in the system so that one can be assured of the essence of the details subsumed in the averaged potential.
  • Coarse-graining entails choosing a length scale that corresponds roughly with the distance over which a molecule can induce order within an aqueous solvent.
  • A radius of contact between atoms of a protein and a ligand is defined and, when examining a database of crystal structures of protein-ligand complexes or evaluating the binding interactions in the course of design, atoms within a radius of contact (otherwise called an interaction radius) are considered to be in contact with one another.
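Contact counting within an interaction radius can be sketched in a few lines. The atom types, coordinates, and the 5.0 angstrom radius below are placeholders chosen for illustration; the text does not specify the radius or the atom typing scheme in this passage.

```python
import math
from collections import Counter


def count_contacts(protein_atoms, ligand_atoms, radius):
    """Count protein-ligand contacts by atom-type pair.

    Each atom is a (type, (x, y, z)) tuple. Two atoms are "in contact" when
    their distance is at most `radius` (the interaction radius).
    """
    contacts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            if math.dist(p_xyz, l_xyz) <= radius:
                contacts[(p_type, l_type)] += 1
    return contacts


# Toy coordinates in angstroms; atom types and the 5.0 radius are placeholders.
protein = [("C", (0.0, 0.0, 0.0)), ("O", (4.0, 0.0, 0.0)), ("N", (13.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("N", (7.0, 0.0, 0.0))]
print(count_contacts(protein, ligand, radius=5.0))
```

The resulting table of contact counts per atom-type pair is the raw material both for collecting database statistics and for scoring a candidate during design.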
  • One advantage of the coarse-graining approach is that it integrates the solvent entropy terms of ligand binding into its potential surface.
  • Another advantage of coarse-graining is that the potential surface is smoothed by the local averaging. This allows the space of possible molecules to be searched very efficiently by a Monte Carlo growth algorithm.
  • A knowledge-based potential is a set of interaction parameters that measure the contribution of various types of contacts to the free energy estimate. These parameters are derived from a database of structures by collecting statistics on the frequencies with which contacts are formed between all the various atom types. In combination with coarse graining, the knowledge-based potential provides a system for estimating binding free energies based on physical statistical inference.
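One common way to turn contact frequencies into interaction parameters is an inverse Boltzmann relation, E = -kT ln(observed / expected). The sketch below uses that form with a reference state in which contacts form independently of atom type. Both the formula and the reference state are assumptions made here for illustration, since this passage does not give the exact expression, and the counts are made up.

```python
import math
from collections import Counter


def contact_potential(contact_counts, kT=1.0):
    """Derive pairwise interaction parameters from observed contact counts.

    `contact_counts` maps (protein_type, ligand_type) to a positive number of
    contacts seen in a structure database. The reference state assumes
    contacts form independently of atom type, so the expected fraction of a
    pair is the product of the marginal fractions of its two types.
    """
    total = sum(contact_counts.values())
    protein_totals = Counter()
    ligand_totals = Counter()
    for (p_type, l_type), n in contact_counts.items():
        protein_totals[p_type] += n
        ligand_totals[l_type] += n

    potential = {}
    for (p_type, l_type), n in contact_counts.items():
        observed = n / total
        expected = (protein_totals[p_type] / total) * (ligand_totals[l_type] / total)
        # Pairs seen more often than expected get a negative (favorable) energy.
        potential[(p_type, l_type)] = -kT * math.log(observed / expected)
    return potential


def score(contacts, potential):
    """Free energy estimate: sum of pair parameters over a complex's contacts."""
    return sum(potential[pair] * n for pair, n in contacts.items())


# Made-up counts standing in for statistics from a database of complexes.
counts = {("C", "C"): 60, ("C", "O"): 10, ("O", "C"): 10, ("O", "O"): 20}
potential = contact_potential(counts)
print(potential)
print(score({("C", "C"): 3, ("O", "O"): 1}, potential))
```

In this toy data set, like-with-like contacts are over-represented and therefore receive favorable parameters, while mixed contacts are penalized.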
  • The coarse-graining model with knowledge-based potential data follows from the application of the principles of canonical statistical mechanics to subsets of proteins.
  • The model includes the determination that small subsets of a folded protein are in thermal equilibrium with each other.
  • The present invention employs these principles.
  • The information present in the crystal structures of proteins and crystal structures of protein-ligand complexes may be disassembled into constituent parts and the contribution of each part to the binding free energy may be assigned on the basis of probability. This permits the present invention to achieve more accurate results for binding free energy predictions and to apply them in the building of molecules or ligands.
  • The identification of candidate molecules is not just a search for the lowest free energy complex, but is a search to identify candidate molecules or ligands that form low free energy complexes at the receptor site and give the best lead for drug design. This is accomplished using the growth method that employs a Metropolis Monte Carlo selection process. The molecular growth method results in low free energy candidates generated of a desired length.
  • The present invention is a system and method for molecular growth through structure-based drug design that uses a novel method for building candidate molecules or ligands, and libraries of ligand candidates, and which employs a more accurate method to predict the binding free energy of the molecules or ligands as they are grown.
  • The system and method of the present invention also allow the growth of candidate molecules or ligands in which there is a more accurate prediction of the binding free energy in a reasonable amount of computational time.
  • The method of the present invention allows one to quickly assess interactions to build a strong binding ligand and continuously provides suggestions for alterations and extensions of molecules that result in excellent chemical and spatial complementarity with the protein binding site.
  • The method of the present invention provides a quantitative score based on the coarse-graining model with knowledge-based potential data (hereinafter such a quantitative score will be referred to as "free energy estimate score") which is related directly to and is an approximation of the experimentally found binding free energy so that changes to a candidate molecule can be assessed quantitatively.
  • The structure-based design allows for the use of partially grown molecules as restart fragments and for the insertion of a specific interaction known to exist in naturally-occurring ligands (such as a salt bridge or π-π interactions), both of which affect the direction in which a candidate molecule is grown.
  • Other advantages over methods of the prior art include efficiency of time, ability to generate and evaluate whole molecules rather than separate fragments which later require linkage, and ability to correlate the scoring method to known free energies of binding.
  • Binding is a physical event in which a ligand is associated with a receptor site in a stable configuration.
  • Docking is a computational procedure whose goal is to determine the configuration that will permit binding.
  • De novo structure-based drug design is meant to refer to a process of dynamically forming a molecule or ligand which is conducive to binding with a particular receptor site using knowledge of the protein structure.
  • Heavy Atom refers to any non-hydrogen atom within a molecule.
  • Ligand is a molecule that will bind with a target receptor.
  • Ligand candidate is a ligand proposed as a potential ligand. When a ligand candidate has been demonstrated experimentally to bind with a target receptor, it is redesignated a "ligand."
  • Lead compound is a ligand that has been demonstrated experimentally to possess potential drug, therapeutic, and/or pharmaceutical uses.
  • Molecule refers to a combination of atoms bound together to form the smallest unit of matter of a molecular compound.
  • Retainer in the context of structure-based design is a molecule that has been oriented in a binding site by torsional rotation of the bond between the molecule and the binding site.
  • Fragment is an atom or functional group used in the construction of a ligand or molecule.
  • Restart fragment is an atom or functional group from a partially or wholly grown molecule, rather than a fragment selected at random from a library of fragments, that is used as the starting input for construction of a ligand or molecule using the method of the present invention.
  • Interaction radius is the maximum distance within which a protein and ligand atoms interact to bind to one another.
  • Quasichemical approximation is a mathematical method for converting probabilities to energies based on the principles of canonical statistical mechanics.
  • Empirical interaction energy refers to a formulation, in terms of mathematical equations and parameters, of the physical energy of interaction between a ligand candidate and a protein, wherein the specific form of the equations is adapted from studies of simpler interacting systems.
  • alkyl refers to straight and branched chain aliphatic groups having from 1 to 12 carbon atoms, preferably 1-8 carbon atoms, which may be optionally substituted with one, two or three substituents. Unless otherwise specified, the alkyl group may be saturated, unsaturated, or partially unsaturated. As used herein, therefore, the term “alkyl” is specifically intended to include alkenyl and alkynyl groups, as well as saturated alkyl groups.
  • Preferred alkyl groups include, without limitation, methyl, ethyl, propyl, isopropyl, butyl, tert-butyl, isobutyl, pentyl, hexyl, vinyl, allyl, isobutenyl, ethynyl, and propynyl.
  • a "substituted" alkyl, cycloalkyl, aryl, or heterocyclic group is one having between one and about four, preferably between one and about three, more preferably one or two, non-hydrogen substituents.
  • Suitable substituents include, without limitation, halo, hydroxy, nitro, haloalkyl, alkyl, alkaryl, aryl, aralkyl, alkoxy, amino, alkylcarboxamido, arylcarboxamido, aminoalkyl, alkoxycarbonyl, carboxy, hydroxyalkyl, alkanesulfonyl, arenesulfonyl, alkanesulfonamido, arenesulfonamido, aralkylsulfonamido, phosphorylalkylcarbonyl, cyano, and alkylaminocarbonyl groups.
  • cycloalkyl as employed herein includes saturated and partially unsaturated cyclic hydrocarbon groups having 3 to 12, preferably 3 to 8 carbons, wherein one or two ring positions may be substituted with an oxo group, and wherein the cycloalkyl group additionally may be optionally substituted.
  • Preferred cycloalkyl groups include, without limitation, cyclopropyl, cyclobutyl, cyclopentyl, cyclopentenyl, cyclohexyl, cyclohexenyl, cyclohexanone, cycloheptyl, and cyclooctyl.
  • aryl group is a C6-C14 aromatic moiety comprising one to three aromatic rings, which may be optionally substituted.
  • the aryl group is a C6-C10 aryl group.
  • Preferred aryl groups include, without limitation, phenyl, naphthyl, anthracenyl, and fluorenyl.
  • An "arylalkyl” group comprises an aryl group covalently linked to an alkyl group, either of which may independently be optionally substituted or unsubstituted.
  • the arylalkyl group is (C1-C6)alk(C6-C10)aryl, including, without limitation, benzyl, phenethyl, and naphthylmethyl.
  • alkaryl or “alkylaryl” group is an aryl group having one or more alkyl substituents.
  • alkaryl groups include, without limitation, tolyl, xylyl, mesityl, ethylphenyl, and methylnaphthyl.
  • a “heterocyclic” group is a ring structure having from about 3 to about 8 atoms, wherein one or more atoms are selected from the group consisting of N, O, and S.
  • the heterocyclic group may be optionally substituted on carbon with oxo or with one of the substituents listed above.
  • the heterocyclic group may also independently be substituted on nitrogen with alkyl, aryl, aralkyl, alkylcarbonyl, alkylsulfonyl, arylcarbonyl, arylsulfonyl, alkoxycarbonyl, aralkoxycarbonyl, or on sulfur with oxo or lower alkyl.
  • Preferred heterocyclic groups include, without limitation, epoxy, aziridinyl, tetrahydrofuranyl, pyrrolidinyl, piperidinyl, piperazinyl, thiazolidinyl, oxazolidinyl, oxazolidinonyl, and morpholino.
  • the heterocyclic group is a heteroaryl group.
  • heteroaryl refers to groups having 5 to 14 ring atoms, preferably 5, 6, 9, or 10 ring atoms; having 6, 10, or 14 π electrons shared in a cyclic array; and having, in addition to carbon atoms, between one and about three heteroatoms selected from the group consisting of N, O, and S.
  • Preferred heteroaryl groups include, without limitation, thienyl, benzothienyl, furyl, benzofuryl, pyrrolyl, imidazolyl, pyrazolyl, pyridyl, pyrazinyl, pyrimidinyl, indolyl, quinolyl, isoquinolyl, quinoxalinyl, tetrazolyl, oxazolyl, thiazolyl, and isoxazolyl.
  • the heterocyclic group is fused to an aryl or heteroaryl group.
  • fused heterocycles include, without limitation, tetrahydroquinoline and dihydrobenzofuran.
  • acyl refers to an alkylcarbonyl or arylcarbonyl substituent.
  • acyloxy refers to an alkyloxycarbonyl or aryloxycarbonyl group.
  • amido refers to formylamino, alkylcarbonylamino, or arylcarbonylamino.
  • amino is meant to include NH 2 , alkylamino, arylamino, and cyclic amino groups.
  • Figure 1, generally at 100, provides a general schematic for the method of the present invention for building and ranking lead candidates for de novo structure-based drug design.
  • molecular growth method 108 receives inputs from 102 and 104.
  • information regarding the 1) protein structure and 2) its binding site is provided. This information includes the coordinate of the binding site, protein atom coordinates, and protein atom types in a standard Brookhaven Protein Data Bank ("PDB") format or notation.
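  • As an illustration of the input described above, the following minimal sketch reads atom names, element symbols, and coordinates from ATOM/HETATM records using the fixed column layout of the PDB format. The names `Atom` and `parse_pdb_atoms` are hypothetical and are not taken from the patent; the fallback that guesses the element from the atom name is likewise an assumption.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    element: str   # chemical element symbol, e.g. "C"
    x: float
    y: float
    z: float


def parse_pdb_atoms(text: str) -> List[Atom]:
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()            # columns 13-16
        # Columns 77-78 hold the element; fall back to the first letter
        # of the atom name when the column is blank (an assumption).
        element = line[76:78].strip() or name[:1]
        x = float(line[30:38])                # columns 31-38
        y = float(line[38:46])                # columns 39-46
        z = float(line[46:54])                # columns 47-54
        atoms.append(Atom(name, element, x, y, z))
    return atoms
```

  • The binding-site coordinate itself would be supplied separately, since it is a user-chosen point rather than a record in the file.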
  • the free energy estimate method at 104 provides the estimate of free energy of the molecule or ligand that is being built.
  • the free energy method receives input from energy table 106 which is developed according to Figure 3, which will be described subsequently. Once a number of molecules or ligands have been built, they are ranked at 110, usually based on their binding free energy, as lead candidates for drug design.
  • the second input to molecular growth method 108 is the free energy estimate, 104, for the molecule being built.
  • this free energy estimate method uses the information from the calculated free energy database developed according to Figure 3. This prediction is based on proper selection of an interaction model and reference state, selection of an appropriate interaction radius, knowledge of the atom types at issue, information with regard to known structures of protein-ligand complexes (knowledge-based potential data), and quasichemical approximations.
  • the molecular growth method at 108 is set forth in detail in Figure 2, generally at 200, and will be described subsequently.
  • This method employs a Metropolis Monte Carlo ("MMC") selection process to control the acceptance of intermediate and finished molecules or ligands as conditions based on their estimated binding free energy.
  • the energy table (based on Figure 3) is calculated. This table is calculated once for a given set of parameters. When the parameters are changed, the table is recalculated. For example, if the interaction radius is changed from 5 Å to 3 Å, the table may be recalculated. It is necessary to calculate the table at or near the beginning of the method of the present invention so that it may be accessed in building molecules or ligands.
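  • The "calculate once per parameter set" behavior can be sketched with a memoized function, so that the table is rebuilt only when the interaction radius or the set of atom types changes. The function name `energy_table`, the counter `BUILD_COUNT`, and the placeholder zero values are hypothetical; a real table would be derived from the statistics of known complexes.

```python
from functools import lru_cache

BUILD_COUNT = {"n": 0}  # counts how often the table is actually rebuilt


@lru_cache(maxsize=None)
def energy_table(interaction_radius, atom_types):
    """Return the table for one parameter set.

    `atom_types` must be a tuple so that the arguments are hashable.
    The table is rebuilt only when a new parameter combination is seen.
    """
    BUILD_COUNT["n"] += 1
    # Placeholder values standing in for the real statistics.
    return {(a, b): 0.0 for a in atom_types for b in atom_types}
```

  • Calling `energy_table(5.0, ("C", "O"))` twice builds the table once; changing the radius to `3.0` triggers one rebuild.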
  • at step 102, the protein with a target receptor site is loaded.
  • at step 102, the information about the protein that has the target receptor is input from 108 of Figure 1.
  • This information describes the protein structure and the binding site at issue. In describing the binding site, its relative position is given in the form of a coordinate. From this point, the molecular growth method will grow a molecule or ligand that is a simple organic molecule which consists of fragments joined with single bonds.
  • a hydrogen molecule ("H₂") is positioned randomly in the binding site of the protein at the coordinate. This H₂ molecule is considered to be the existing molecule to satisfy step 206. According to step 206, one of the H atoms is randomly selected to be the site of the new bond.
  • Step 208 is performed by selecting at random a fragment from a library of fragments.
  • a library could include the fragments set forth in Table 1:
  • at step 210, at least one H atom on the randomly selected fragment is itself selected at random.
  • the selected H atom from the H₂ molecule and the H atom from the fragment will form the first fragment bond for the molecule being built at the protein binding site.
  • the first (and new) bond is formed between the first fragment and the remaining H atom from the H₂ molecule loaded at step 204. As this bond is formed, the two H atoms that were selected are eliminated. By following the method of the present invention, it is assured that the new bond angles and bond lengths are reasonable approximations.
  • the next step is at 214 where the new fragment is oriented with respect to the bond just created and binding site by torsional rotation about the new bond so that the new fragment is properly situated at the binding site at a low energy level.
  • the orientation is performed in fixed increments. These fixed increments are the smallest that will still permit reasonable computational times. Most preferably, the torsional orientation takes place in 60° increments. Orientations that are not sterically hindered are evaluated for their free energy value at step 216. Most preferably, an orientation is treated as not sterically hindered when no atom pair approaches closer than 70% of the sum of their van der Waals radii. The position of the fragment or rotamer that yields a low energy or the lowest energy is considered as a candidate for the molecule that is being grown.
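  • The torsional scan described above can be sketched as follows: the new fragment is rotated about the new bond in 60° steps using Rodrigues' rotation formula, and an orientation is kept only if no fragment atom comes closer to a protein atom than 70% of the sum of their van der Waals radii. The function names and data layouts are hypothetical, and reading the 70% figure as a minimum approach distance is an interpretation of the text.

```python
import math


def rotate_about_axis(point, origin, axis, angle_deg):
    """Rotate `point` about the line through `origin` along `axis`
    using Rodrigues' rotation formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]
    v = [p - o for p, o in zip(point, origin)]
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    return tuple(
        o + vi * c + cr * s + ki * k_dot_v * (1.0 - c)
        for o, vi, cr, ki in zip(origin, v, k_cross_v, k)
    )


def torsion_candidates(fragment, origin, axis, step_deg=60):
    """Yield (angle, rotated coordinates) for each fixed increment."""
    for angle in range(0, 360, step_deg):
        yield angle, [rotate_about_axis(p, origin, axis, angle) for p in fragment]


def is_sterically_allowed(frag_atoms, protein_atoms, min_fraction=0.70):
    """Atoms are ((x, y, z), vdw_radius) pairs. Reject the orientation if
    any pair is closer than `min_fraction` of the sum of the radii."""
    for p, rp in frag_atoms:
        for q, rq in protein_atoms:
            if math.dist(p, q) < min_fraction * (rp + rq):
                return False
    return True
```

  • Each orientation that passes `is_sterically_allowed` would then be scored, and a low-energy rotamer kept as the candidate for the Monte Carlo step.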
  • the molecular growth method advances to step 218.
  • the system and method of the present invention evaluate the combinatorial search space (the binding site of the protein), which is a rough energy landscape, to develop and identify candidate lead molecules that form a complex with a low, though not necessarily the lowest, free energy.
  • the system and method of the present invention overcome the multiple minimum problem by using an MMC selection process at step 218.
  • the MMC selection process at 218 compares the energy per atom before the current growth step with the energy per atom after the growth step at the optimal orientation. If the energy per atom decreases with the newly built step, that orientation is accepted as a condition. If the energy per atom increases, the orientation is still accepted as a condition, but only with the probability defined by Expression (2), an exponential of the Metropolis form $\exp(-\Delta g / T)$, where $\Delta g$ is the increase in energy per atom and $T$ is the selected temperature.
  • the present invention allows the energy per atom to increase occasionally. This is needed when a small molecule has been grown into a sterically tight region of the binding site and must then be grown into the solvent or another unoccupied region, where it interacts only marginally with the protein. Allowing such an increase may provide an opportunity for a subsequent larger decrease in the free energy of binding.
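  • A minimal sketch of this acceptance rule follows, assuming the standard Metropolis criterion: a growth step that lowers the energy per atom is always accepted, and a step that raises it by Δ is accepted with probability exp(-Δ/T). The function name, the argument names, and the use of a temperature expressed in the same units as the energy are assumptions, since the source expression is truncated.

```python
import math
import random


def metropolis_accept(energy_before, energy_after, temperature, rng=random):
    """Decide whether to keep a growth step.

    energy_before / energy_after: energy per atom before and after the step.
    temperature: Monte Carlo temperature in the same units as the energies.
    rng: any object with a random() method returning a float in [0, 1).
    """
    delta = energy_after - energy_before
    if delta <= 0:
        return True  # downhill or neutral moves are always accepted
    # Uphill moves are accepted with the Boltzmann-like probability.
    return rng.random() < math.exp(-delta / temperature)
```

  • A higher temperature accepts uphill steps more often, which is how the method can escape tight regions of the binding site at the cost of a temporarily worse score.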
  • the step at 220 is to determine if the molecule that has been built is large enough at this point in time. If it is, then the method is directed to step 222. At step 222, it is determined if another molecule is to be built at this same binding site of the protein. If no additional molecules are built, the method will cease generating molecules for this protein binding site and the method will go to step 230 to wait for a next protein for which a candidate molecule or ligand is to be built. If the answer at step 222 is "yes" and another molecule is to be generated, then the method is directed back to step 204 where H₂ is added to the binding site of the protein where the new molecule or ligand is to be built.
  • if the molecule is not yet large enough, the method of the present invention returns to step 206, where another fragment is randomly selected for addition to the existing molecule in the described manner.
  • the temperature that is selected is one that generates the largest number of low energy structures per unit of time.
  • the nearest approach of two atoms is a percentage of the sum of their van der Waals radii since this will provide a good correlation with the nearest approaches in the database.
  • the selected incremental amount for torsional orientation of the fragments can be further refined to smaller increments. However, such smaller increments may result in significantly more computational time to obtain results.
  • the items at 302 and 304 are combined with the items at 306 to generate the statistics of atomic interactions of known protein-ligand complexes, which is shown at 310.
  • the information at 310 forms the knowledge-based potential interaction data that is used for estimating the binding free energy of the particular molecule being built.
  • a large interaction radius for the atoms that are to be bound in the protein-ligand complex is selected.
  • a large interaction radius is selected because it will permit the solvation entropy effects to be accounted for.
  • the most feasible length is selected for the interaction radius.
  • Table 2 provides examples of information that is included at 304:
  • Item 306, therefore, includes at least information about the atom types that are expected to be present in the protein-ligand complex.
  • this database includes a listing of structures of protein-ligand complexes that are known, their coordinates, and corresponding chemical elements. This database is one that can be continually updated as more information is obtained.
  • the method of the present invention was applied to protein-ligand complexes for which structural and binding information has been previously determined experimentally.
  • a sample of the protein-ligand complexes for which information is available in the Brookhaven Protein Data Bank (“PDB”) is shown in Table 3.
  • PNP purine nucleoside phosphorylase
  • LST amino acid binding protein
  • the information at 310 is the statistics of atomic interactions in known protein-ligand complexes which will permit a more accurate prediction of binding free energy for the molecule or ligand being built.
  • the next step is to compile the statistics at 310 into a set of interaction parameters that constitute the free energy contribution of interactions between specific atom types. The probability of atomic interactions in known protein-ligand complexes from 310 is combined with a reference state for the molecule or ligand being built. This reference state is selected such that it accounts for the solvent energy and configurational entropy effects.
  • the third item that is combined to generate the estimate of binding free energy is an approximation, at 312, of the protein-ligand complex of the molecule or ligand being built.
  • the approximation utilized is a quasichemical approximation.
  • the result is the energy table at 106, which is used to provide the estimated free energy contributions for each type of interaction possible for the molecule or ligand being built.
  • An example of the data provided in the free energy table at 106 is shown in Table 4:
  • Before discussing the statistical support for the present invention, Figures 4 and 5 will be described. Those Figures graphically demonstrate aspects of Figures 1, 2, and 3. Referring to Figures 2 and 4, when protein 402, shown generally at 400, is loaded, it is defined and the coordinate of the binding site, such as coordinate "X" at 404, is determined. Next, H₂ molecules, such as H₂ molecule 406, are added to the binding site. As would be understood, more than one H₂ molecule may be added to a single binding site.
  • H atom 408 or 410 is selected for forming the new bond.
  • H atom 410 is selected. Once the selection is made, a fragment is randomly chosen from the fragment library and an H atom of that fragment is selected for establishing the new bond.
  • one H atom of fragment 502 is selected for forming the first bond.
  • the selected H atoms of H₂ molecule 406 (Figure 4) and fragment 502 are eliminated.
  • once new bond 504 is formed, the first fragment has been added in building the molecule.
  • the added fragment is now incrementally rotated about bond 504 to obtain the best fit and the free energy estimation is made based on the different orientations.
  • An evaluation of the new molecule is made based on the MMC selection method described previously, then the remainder of the method of Figure 2 is carried out.
  • the present invention implements a coarse-graining model with corresponding knowledge-based potential data.
  • This model is used because of the ability to apply the principles of canonical mechanics to subsets of folded proteins that are in thermal equilibrium with one another.
  • the crystal structures of proteins and crystal structures of protein-ligand complexes may be disassembled into constituent parts. The contribution of each part may then be assigned on the basis of probability.
  • $p^{e}_{ij}$: The energetic probability of an interaction between a protein atom of type i and ligand atom of type j.
  • $\varepsilon_{ij}$: The energy of the interaction of a protein atom of type i and ligand atom of type j.
  • $p^{s}_{ij}$: The sampling probability of an interaction between a protein atom of type i and ligand atom of type j.
  • the interaction model that is chosen is comprised of an interaction radius and a set of eligible atom types.
  • Expression (5) can be changed so that it is solved for $g_{ij}$, the free energy as set forth in Expression (6). This comprises the quasichemical approximation.
  • Expression (6) is determined from the frequency of observed interactions:
  • the Normalization Constant can be eliminated.
  • the reference state contributes a free energy of g to every gross free
  • $\bar{p}$: The average probability of an interaction between protein and ligand atoms.
  • $p_{ij}$: The total probability of an interaction between a protein atom of type i and ligand atom of type j.
  • $T_e$: Experimental temperature at which the database information is obtained.
  • $k$: Boltzmann constant.
  • Expression (9) relates the statistical information about interatomic interactions in the crystal structures of the protein-ligand complex to term by term contributions to binding free energy.
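  • The conversion from contact statistics to per-pair free energy contributions can be sketched as below. The sketch assumes the quasichemical form g_ij = -k·T_e·ln(p_ij / p̄), which matches the symbols defined above but is not reproduced verbatim from the source, whose expressions did not survive extraction. It also assumes that p_ij is estimated as observed contacts divided by the number of possible i-j pairs, and that p̄ is the same ratio over all types. All names are hypothetical.

```python
import math


def quasichemical_table(contacts, opportunities, kT=1.0):
    """Build {(protein_type, ligand_type): g_ij} from counts.

    contacts:      observed contact counts per type pair
    opportunities: number of possible pairs per type pair (assumed input)
    kT:            k * T_e; with kT=1.0 the energies are in units of kT
    """
    p_avg = sum(contacts.values()) / sum(opportunities.values())
    table = {}
    for pair, n_possible in opportunities.items():
        n_seen = contacts.get(pair, 0)
        if n_seen == 0 or n_possible == 0:
            continue  # no statistics for this pair, so no entry
        p_ij = n_seen / n_possible
        # Pairs seen more often than average get a negative (favorable) energy.
        table[pair] = -kT * math.log(p_ij / p_avg)
    return table
```

  • For example, with 30 C-C contacts and 10 C-O contacts out of 100 possible pairs each, the average probability is 0.2, so C-C scores -ln(1.5) and C-O scores +ln(2) in units of kT.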
  • the $g_{ij}$'s, when summed, will approximate the
  • the selection of the large interaction radius, between the protein and ligand, should be the correlation length of solvent ordering.
  • the probability of the specific contacts observed occurring will include the average effect of the contribution of solvation entropy to the prediction of free energy.
  • the system and method of the present invention will select a large interaction radius for the interaction model.
  • the interaction model will define a ligand atom to be in contact with a protein atom if they are within the interaction radius of one another.
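  • The contact definition above reduces to a distance test between every protein atom and every ligand atom. A minimal sketch follows, assuming atoms are given as (atom type, coordinates) pairs and using a default radius of 5 Å taken from the example value mentioned elsewhere in the text; the function name is hypothetical.

```python
import math
from collections import Counter


def count_contacts(protein_atoms, ligand_atoms, interaction_radius=5.0):
    """Count protein-ligand contacts per (protein_type, ligand_type) pair.

    Each atom is (atom_type, (x, y, z)). Two atoms are in contact when
    their distance does not exceed the interaction radius.
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            if math.dist(p_xyz, l_xyz) <= interaction_radius:
                counts[(p_type, l_type)] += 1
    return counts
```

  • The all-pairs loop is quadratic in the number of atoms; a spatial grid would be the usual optimization for whole proteins, though that is outside what the text describes.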
  • each contact formed involves an energy loss based on desolvation. This must be accounted for in the reference state.
  • the specificity of the loss due to desolvation of a specific contact becomes a general element that is factored in and the remaining energetic contributions in the interaction model with a large interaction radius take into account simply that a loss due to desolvation has taken place.
  • the reference state there effectively is unrestricted spatial sampling of the ligand with respect to the protein. As such, the reference state has no perceived notion of the chemical structure.
  • the difference between the free energy estimate of a specific structure and that of the reference state accounts for the loss of configurational entropy in the collection of atoms upon formation of a specific, largely rigid chemical structure.
  • the system and method of the present invention score, and then rank, the candidate structures based, in large part, on the determination of the total binding free energy that is defined by Expression (10):
  • ΔG is an approximation to the complete change in free energy in the complex formation.
  • a closer look is taken at the specific atom types and their effect on the energy contributions that actually arise in complex formation, rather than at a generalization of the atom types.
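  • Scoring a candidate then amounts to summing the tabulated contribution of every contact it makes. The sketch below assumes the additive form ΔG ≈ Σ n_ij · g_ij over type pairs, which is consistent with the term-by-term description above but is not copied from Expression (10). Treating pairs that are missing from the table as contributing zero is also an assumption, and both function names are hypothetical.

```python
def free_energy_estimate(contact_counts, energy_table):
    """Sum the per-contact contributions for one candidate molecule.

    contact_counts: {(protein_type, ligand_type): number of contacts}
    energy_table:   {(protein_type, ligand_type): g_ij}
    """
    return sum(
        n * energy_table.get(pair, 0.0)  # unknown pairs contribute nothing
        for pair, n in contact_counts.items()
    )


def energy_per_atom(contact_counts, energy_table, n_ligand_atoms):
    """Normalize the estimate by ligand size, as used in the growth step."""
    return free_energy_estimate(contact_counts, energy_table) / n_ligand_atoms
```

  • Ranking a collection of grown molecules is then a sort on this value, with the most negative estimates first.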
  • the system of the present invention may be operated in one of three modes: automatic, directed, or assisted.
  • in automatic mode, all that is necessary is to provide the starting protein structure and a coordinate on the protein to specify the vicinity of the binding site. Based on this input, the system generates ligands with at least one atom within one interaction radius of the specified coordinate.
  • the directed mode is an interactive method in which the user specifies the molecular fragments which are selected and where they are to bind.
  • the assisted mode begins with the user specifying a fragment. Then, the assisted mode proceeds automatically. This mode allows the user to incorporate a specific molecular fragment into the molecule being grown.
  • the method of the present invention was applied to several protein-ligand complex systems for which structural and binding information has been previously determined experimentally, including the following: purine nucleoside phosphorylase (“PNP”), and human immunodeficiency virus-1 protease (“HIV-1 protease”).
  • R and X comprise the following groups listed in Table 5, respectively.
  • Each molecule contains a guanine or 9-deazaguanine fragment, which was held fixed at the coordinates in the 1ulb crystal structure of guanine.
  • the binding mode of the balance of the structure was determined by conformational search on the potential surface provided by the coarse-grained knowledge-based potential of the present invention.
  • the molecules that were marked as having low phosphate sensitivity are those whose binding constant changes by a factor of 15 or less upon increase of the concentration of phosphate to 50 mM.
  • the highly sensitive molecules are affected in some instances by a factor of 140.
  • the knowledge-based potential data correlates well with the experimental binding free energy for over five orders of magnitude in the binding constants.
  • the strong correlations that were found for the binding free energy predictions according to the system and method of the present invention indicate an ability to effect de novo drug design of lead molecules.
  • HIV-1 protease has been the target of a wealth of structure-based drug design efforts. See Abdel-Meguid, S. S. et al., Biochemistry 1994, 33:11671-11677; Thompson, S. K.; Murthy, K. H. M.; Zhaong, B.; Winborne, E.; Green, D. W.; and Fisher, S. M. et al., J. Med. Chem. 1994, 37:3100-3107. However, in choosing a system of ligands for testing the correlation between the coarse-grained knowledge-based potential and experimentally determined binding free energies, several considerations needed to be applied. First, the experimental determinations had to have been performed under identical conditions among the members in the system.
  • HIV-1 protease ligands were tested by the method of the present invention:
  • X and R comprise the following groups listed in Table 10.
  • the probability of random occurrence was defined as the probability that a random selection of the same number of points would have the given correlation constant.
  • the confidence levels that the observed correlations were systematic and not the result of sparse sampling are 99.8%, 88.9%, and 95.0% for the PNP, SH3, and HIV-1 systems, respectively.
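  • One way to estimate such a probability is by simulation: draw many random, uncorrelated point sets of the same size and count how often their correlation is at least as strong as the observed one. The source does not say how its values were computed, so the Monte Carlo approach, the Gaussian random points, the two-sided comparison on |r|, and the function names below are all assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chance_probability(r_observed, n_points, trials=5000, seed=0):
    """Fraction of random point sets whose |r| reaches |r_observed|."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        if abs(pearson_r(xs, ys)) >= abs(r_observed):
            hits += 1
    return hits / trials
```

  • The estimate falls quickly as the number of points grows, which is why a moderate correlation over few ligands carries less weight than the same correlation over many.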
  • Quantitative Analysis: Use an empirical force field such as CHARMM to minimize the energy of the complex formed with each of the best molecules from the preceding stage. Those molecules that score well with both the free energy estimate provided by the method of the present invention (otherwise referred to herein as the coarse-grained potential) and the empirical interaction energy are scrutinized further.
  • step (b) repeating step (a) to generate a collection of molecules grown in the receptor site, and ranking the collection of molecules according to increasing free energy estimates to identify high-ranking molecules;
  • step (c) selecting one or more functional groups of a high-ranking molecule identified in step (b) as a single restart fragment and using the restart fragment to build a second-generation of molecules according to steps (a) and (b);
  • step (f) modifying high-ranking molecules from step (e) based on qualitative analysis of the molecules including determination of chemical viability, synthetic feasibility, solubility, and effect of the molecule on the structure of the protein, wherein such modification comprises: atomic and/or functional substitutions, initiating growth from a specific receptor site, inclusion of salt bridges or hydrogen bonds, and solubility-enhancing measures.
  • the invention provides a method of de novo designing molecules for binding to a receptor site present on a substrate, wherein the substrate is preferably selected from the group consisting of: Src-homology-3 domain, Src-homology-2 domain, MDM2 protein, CD4 protein, and carbonic anhydrase protein.
  • step (c) building a molecule for binding to the receptor site using the outputs from steps (a) and (b), with the building step including building the molecule by selecting molecular fragments at orientations that will result in free energy estimates for the molecule that may be higher than a lowest free energy estimate possible for the molecule.
  • libraries of ligand candidates are built which bind to a receptor site on the following substrates: Src-homology-3 domain, Src-homology-2 domain, MDM2 protein, CD4 protein, and carbonic anhydrase protein.
  • the CD4 protein is an immunoglobulin-family transmembrane coreceptor expressed in the helper T-cells. It participates in contact between the T-cells and antigen-presenting cells by binding to the nonpolymorphic part of the class II major histocompatibility complex (MHC-II) protein, which is followed by the activation of the bound Lck kinase which leads to downstream activation events in T-cells.
  • the human immunodeficiency virus (HIV) gains entry into a T-cell by binding its protein gp120 to the CD4 receptor. This gp120 binding site in the vicinity of Phe43 of CD4 was the target for ligand design.
  • FIG. 14 shows second-generation molecules as ligand candidates for CD4.
  • Molecule 41b was created by manually altering the point of attachment of the sugar-like ring structure of molecule 41, thus improving the π-stacking interaction with Phe42.
  • Molecules 41c, 41d, and 41g were derived from 41b through ring substituents generated by the claimed design model.
  • Molecule 41e was generated from 41b by shortening and saturating the flexible chain connecting it to the pyridine group, which also improved the π-stacking.
  • Molecule 41f follows from 41e via manual alteration suggested by the geometry of the binding site.
  • Molecule 41h was derived from 41e by adding a bridge from the flexible chain to the sugar-like ring which preserved the binding conformation of the molecule, thereby enhancing its rigidity.
  • Molecule 41i was derived from 41h by manual substitution of carbon for the oxygen atom on the seven-membered ring; this substitution weakens the π-stacking due to its effects on various angles in molecule 41i. Table 13 describes the quantitative and qualitative analysis of these second-generation molecules.
  • the strain energy is calculated as the difference in internal energy between the bound conformation and the conformation resulting from gas phase minimization to convergence using the adopted-basis Newton-Raphson method.
  • the net CHARMM energy is the interaction energy plus the strain energy.
  • Figure 15 shows the three-dimensional structure of molecule 41h in the gp120 binding site of CD4.
  • Figure 15(a) shows the ligand candidate that binds to a receptor site on the CD4 protein generated de novo using a structure-based drug design method which comprises the following Formula V:
  • the interactions present within the ligand candidate include partial π-stacking with Phe43, as well as four intermolecular hydrogen bonds with Lys46 and Asp56 and one intramolecular hydrogen bond which stabilizes the orientation of the pyridine group.
  • the seven-membered fused-ring bridge gives the molecule a great deal of rigidity in its bound conformation.
  • the Src-homology-3 (SH3) domain is a conserved domain found in a variety of intracellular signal transduction mediators such as PI3K, Grb2, Crk, etc. It participates in the diverse protein-protein interactions that mediate the signaling pathways eventually leading to cell responses such as cell growth, differentiation, and migration. Irregularities in these processes may contribute to the cause of several common diseases, thus making it important to consider the SH3 domain as a candidate for therapeutic intervention.
  • because the acylated monomer provided the opportunity for growth in the pocket, it was used as a restart fragment, and the method of the present invention was used to grow ligands into the specificity pocket by insisting that the growth proceed only from the acyl H atom on this monomer, thus preserving the peptide-like nature of the molecule.
  • in stage I, it was apparent that two characteristics of high-scoring molecules were of special importance.
  • First, the formation of a large number of hydrophobic contacts with Tyr55 and Trp42.
  • Second, the formation of hydrogen bonds with the donors and acceptors on Asp23 and Thr20.
  • the first-generation ligands are shown in Figure 6.
  • the molecules shown in Figure 6 are the best 6 of 100 molecules generated in the binding site using the acylated monomer as the restart fragment. These molecules possessed the following qualitative traits: a glucose-like ring that forms hydrogen bonds with residues in the RT loop of the pocket, and an unsaturated ring system with hydrophobic contacts with the tryptophan and tyrosine residues in the binding pocket.
  • Molecule 3 scored well quantitatively (see Table 6) and also provided suggestions for improved hydrophobic interactions with Tyr55 and Trp42.
  • One basic template was selected for further optimization (molecule 3), in which a sugar group made the hydrogen-bonding interactions and the remainder of the molecule left a rich potential for enhancing the hydrophobic interactions. This selection was based predominantly on opportunities for enhancing the scoring of molecule 3 using the coarse-grained knowledge-based potential molecular growth method of the present invention, rather than CHARMM interaction energy, which, though strong, was far weaker than for other first-generation candidates, as shown in Table 6.
  • Molecule 3a is derived from molecule 3 of Figure 7 by removing one substituent from the pyrrole ring. The considerable strain energy of molecule 3a was relieved by saturating the five-membered ring such that the conformation of the glucose was altered as little as possible. Saturation of the pyrrole group led to molecule 3b, the restart fragment for subsequent design, whose internal strain energy was greatly reduced relative to molecule 3a, as shown in Table 7. Using a few hydrogen atoms on this molecule as sites for potential growth, the method of the present invention was applied through two generations of optimization.
  • Molecule 3b was used as the restart fragment, with only the H atoms on the central five-membered ring as eligible attachment points.
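  • The restriction of growth to selected hydrogen atoms can be pictured with a minimal sketch. The Python fragment below is illustrative only: the `Atom` record, the `in_central_ring` flag, and the `eligible_attachment_points` helper are hypothetical names introduced here, not part of the disclosed system, and the four-atom fragment is a toy stand-in rather than molecule 3b.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    index: int
    element: str
    bonded_to: tuple               # indices of covalently bonded atoms
    in_central_ring: bool = False  # hypothetical annotation set by the user


def eligible_attachment_points(atoms):
    """Return indices of H atoms bonded to an atom flagged as part of the central ring."""
    by_index = {a.index: a for a in atoms}
    return [
        a.index
        for a in atoms
        if a.element == "H"
        and any(by_index[j].in_central_ring for j in a.bonded_to)
    ]


# Toy fragment: ring carbon 0 carries H 1; off-ring carbon 2 carries H 3.
fragment = [
    Atom(0, "C", (1, 2), in_central_ring=True),
    Atom(1, "H", (0,)),
    Atom(2, "C", (0, 3)),
    Atom(3, "H", (2,)),
]
growth_sites = eligible_attachment_points(fragment)
```

  • In this sketch only the hydrogen on the flagged ring atom is returned, so a growth routine given `growth_sites` would leave the rest of the fragment untouched.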
  • The best-scoring candidate, 3c, was used as a restart fragment to create molecule 3d, whose phenyl ring forms a π-stacking configuration with Tyr55.
  • Molecule 3e was derived from molecule 3d by manual alteration after noting that the arrangement of the terminal amide group could form part of a phenyl group that made a partial π-stack with Trp42. Also, the joining chain was made more flexible by eliminating one carbonyl group, converting the carbon from sp2 to sp3 and thus reducing internal strain energy.
  • The resulting molecule, 3e, shown in Figure 8, is able to form two π-stacking interactions and three hydrogen bonds with the protein.
  • The strain energy is calculated as the difference in internal energy between the bound conformation and the conformation resulting from gas-phase minimization to convergence using the adopted-basis Newton-Raphson method.
  • The net CHARMM energy is the interaction energy plus the strain energy.
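  • The two definitions above amount to simple bookkeeping, sketched here under stated assumptions. The class name `LigandEnergies`, its field names, and the numerical values are hypothetical and chosen only for illustration; they are not taken from Table 7, and no energy unit is implied beyond the three terms sharing one.

```python
from dataclasses import dataclass


@dataclass
class LigandEnergies:
    interaction: float       # protein-ligand interaction energy
    internal_bound: float    # ligand internal energy in the bound conformation
    internal_relaxed: float  # ligand internal energy after gas-phase minimization

    @property
    def strain(self) -> float:
        # Strain = internal energy (bound) - internal energy (gas-phase minimized)
        return self.internal_bound - self.internal_relaxed

    @property
    def net(self) -> float:
        # Net energy = interaction energy + strain energy
        return self.interaction + self.strain


# Made-up numbers, for illustration only.
example = LigandEnergies(interaction=-50.0, internal_bound=12.0, internal_relaxed=4.0)
```

  • With these made-up inputs the strain term is positive and offsets part of the favorable interaction energy, which is why a strongly interacting but strained ligand can rank below a less strained one.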
  • The design effort for the LP pocket faced additional challenges from the desire to replace LP in positions 2 and 3 of the biasing element with a mimetic. The new ligand was required to possess amide bonds with the proline residues (1 and 4 of the biasing element) at each boundary of the pocket, a goal that severely constrained the geometry of reasonable structures.
  • The method of the present invention successfully designed candidate ligands for the LP pocket by using prolines 1 and 4 as restart fragments, such that molecular growth proceeded inward toward the pocket from each bounding proline.
  • Stage I design: In place of Pro 3, the method of the present invention demonstrated a strong preference for a seven-membered hydrophobic ring (Figure 9b) grown from Pro 4, which makes hydrophobic contacts with the Tyr 52, Arg 11, Tyr 8, and Pro 19 side chains. In place of Leu 2, the present method suggested several candidates grown from Pro 1, the best of which are shown in Figure 9c-e. These first-generation molecules revealed that in the region where Pro 3 was bound, the preference is mainly for hydrophobic fragments, whereas the Leu 2 site prefers fragments that make both hydrophobic contacts (with Trp 34) and hydrogen-bonding interactions (with residues Asn 51 and Ser 50). This last feature is absent in the purely hydrophobic leucine side chain.
  • The strain energy is calculated as the difference in internal energy between the bound conformation with Pro 1 and Pro 4 fixed, and the conformation resulting from gas-phase minimization to convergence using the adopted-basis Newton-Raphson method, also holding Pro 1 and Pro 4 fixed.
  • The strain energy is the energy difference upon binding the portion of the helical-substituted biasing element in consideration to the protein.
  • The net CHARMM energy is the interaction energy plus the strain energy.
  • Molecule 10d of Figure 10 is further described in Figure 11 as an example of the best LP pocket ligand candidates designed using the method of the present invention (molecules 10d-g). As shown in Figure 11, this molecule is able to form three hydrogen bonds and possesses significant hydrophobic and electrostatic complementarity while bridging the bounding proline residues of the biasing element.
  • The Src homology 2 (SH2) domain is a modular component present in many signal transduction proteins. It allows rapid formation of stable protein complexes and may also regulate protein function through intramolecular binding events. SH2 domains recognize phosphotyrosyl residues in a specific sequence context and thereby mediate protein-protein interactions with high specificity, while SH3 domains recognize a PxxP motif and additional residues that mediate binding specificity.
  • The design method of the present invention was used in automatic mode to generate ligand candidates for the SH2 domain: the only inputs were the starting protein structure and a coordinate on the protein specifying the vicinity of the binding site.
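  • One way to picture how a single coordinate can specify the vicinity of a binding site is a distance cutoff around that point. The sketch below is an assumption made for illustration, not the disclosed procedure: the helper `atoms_near`, the 6.0 cutoff, and the toy coordinates are all hypothetical.

```python
import math


def atoms_near(atoms, center, radius):
    """Return the names of atoms whose coordinates lie within `radius` of `center`.

    `atoms` is a list of (name, (x, y, z)) pairs.
    """
    return [name for name, xyz in atoms if math.dist(xyz, center) <= radius]


# Toy coordinates standing in for a protein structure.
protein_atoms = [
    ("A", (0.0, 0.0, 0.0)),
    ("B", (3.0, 4.0, 0.0)),   # 5.0 from the origin
    ("C", (10.0, 0.0, 0.0)),  # 10.0 from the origin
]
site = atoms_near(protein_atoms, center=(0.0, 0.0, 0.0), radius=6.0)
```

  • Only atoms A and B fall inside the cutoff here; a growth procedure restricted to `site` would ignore the rest of the structure.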
  • The invention provides a library of ligand candidates for the SH2 domain, generated using the method and system of the present invention, having Formula I, II, or III:
  • R1 is alkyl, aryl, heteroaryl, alkylaryl, arylalkyl, cycloalkenyl, cycloalkyl, cycloalkylamido, or arylalkylamido; and R2 is independently at each occurrence selected from the group consisting of hydrogen, alkyl, aryl, heteroaryl, (heteroaryl)alkyl, alkylaryl, arylalkyl, cycloalkenyl, cycloalkyl, acyl, acyloxy, amino, amido, and alkoxy, wherein one or more groups may be optionally substituted.
  • R1 is preferably selected from the group consisting of (C6-10)ar(C1-6)alkyl, preferably (C6-10)ar(C1-3)alkyl, more preferably benzyl; (C3-8)cycloalkyl(C1-6)alkyl, preferably (C3-6)cycloalkyl(C1-3)alkyl; (C6-
  • The substituent is a phosphoryl group.
  • R1 is phosphoarylalkyl or phosphoarylalkylamido. Most preferably, R1 is selected from the group consisting of phosphobenzyl and phosphobenzylamido.
  • R2 is preferably selected from the group consisting of hydrogen; C1-8 alkyl, preferably C1-6 alkyl, more preferably C4-6 alkyl; C6-14 aryl, preferably C6-10 aryl; heteroaryl; (C6-10)ar(C1-6)alkyl, preferably (C6-10)ar(C1-3)alkyl; (heteroaryl)alkyl; (C3-8)cycloalkyl, preferably (C⁇)cycloalkyl, more preferably cyclopropyl, cyclopentyl, cyclopentenyl, cyclopentadienyl, cyclohexyl, or cyclohexenyl; C2-8 alkoxycarbonyl, preferably C2-6 alkoxycarbonyl, more preferably methoxycarbonyl, ethoxycarbonyl, or benzyloxycarbonyl; C2-8 acyloxy, preferably C2-6 acyloxy, more preferably C2-4 acyloxy, most preferably butyryloxy, butenoyloxy, propionyloxy, or propenoyloxy; C1-6 alkoxy, preferably C1-4 alkoxy; C⁇ alkylamido, preferably C⁇ alkylamido; and C1-6 alkylamino, preferably C⁇ alkylamino; any of which groups may be optionally substituted.
  • R2 is selected from the group consisting of pentyl, pentenyl, butyl, butenyl, phenyl, cyclopentyl, cyclopentenyl, cyclopentadienyl, cyclohexyl, butyryloxy, butenoyloxy, propionyloxy, propenoyloxy, propylamido, propenylamido, ethylamido, and ethenylamido.
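  • A substituent-defined library of this kind can be enumerated as a Cartesian product of the allowed groups. The sketch below uses the most preferred R1 groups and the R2 groups named above; the function `enumerate_library` and the parameter `n_r2_sites` are hypothetical, and because Formulas I to III are not reproduced here, the number of R2 positions is an assumption (one by default).

```python
from itertools import product

R1_OPTIONS = ["phosphobenzyl", "phosphobenzylamido"]
R2_OPTIONS = [
    "pentyl", "pentenyl", "butyl", "butenyl", "phenyl",
    "cyclopentyl", "cyclopentenyl", "cyclopentadienyl", "cyclohexyl",
    "butyryloxy", "butenoyloxy", "propionyloxy", "propenoyloxy",
    "propylamido", "propenylamido", "ethylamido", "ethenylamido",
]


def enumerate_library(r1_options, r2_options, n_r2_sites=1):
    """Yield (R1, R2, ...) tuples; each R2 site is chosen independently."""
    for r1 in r1_options:
        for r2_combo in product(r2_options, repeat=n_r2_sites):
            yield (r1, *r2_combo)


# Assumes a single R2 position on the scaffold.
library = list(enumerate_library(R1_OPTIONS, R2_OPTIONS))
```

  • Under the single-site assumption this gives 2 x 17 = 34 substituent combinations; each additional independent R2 site multiplies the count by 17.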
  • The tumor suppressor p53 is a short-lived protein that is maintained at low, often undetectable levels in normal cells.
  • The protein is expressed at very low levels in normal cells but accumulates in response to DNA-damaging agents such as ultraviolet radiation. This increase is accompanied by transcriptional upregulation of a number of proteins, including MDM2, which can in turn inhibit p53-dependent transcriptional activation, resulting in down-regulation of p53 activity.
  • The MDM2 protein binds the transcriptional activation domain of p53 and blocks its ability to regulate target genes and to exert antiproliferative effects. Because p53 in turn activates expression of the MDM2 gene, the two form an autoregulatory feedback loop.
  • The invention provides a library of ligand candidates, generated using the method and system of the present invention, which bind to the p53-binding pocket on the MDM2 protein and have Formula IV:
  • R1 is selected from the group consisting of hydrogen, alkyl, cycloalkyl, arylalkyl, aryl, heteroaryl, and (heteroaryl)alkyl, any of which groups may be optionally substituted;
  • R2 is independently at each occurrence selected from the group consisting of hydrogen, alkyl, cycloalkyl, arylalkyl, aryl, heteroaryl, (heteroaryl)alkyl, OH, O(R3), and amino, wherein the aryl or heteroaryl group may be optionally substituted; and
  • R3 is selected from the group consisting of alkyl, cycloalkyl, aryl, aralkyl, (heteroaryl)alkyl, and heteroaryl, wherein the aryl or heteroaryl group may be optionally substituted.
  • R1 is preferably selected from the group consisting of H; C1-8 alkyl, preferably C M alkyl, most preferably methyl, ethyl, propyl, isopropyl, isobutyl, or butyl; C3-8 cycloalkyl, preferably cyclopropyl, cyclopentyl, or cyclohexyl; (C6-10)ar(C1-6)alkyl, preferably (C6-10)ar(C1-
  • heterocyclic having one or more, preferably between one and about three, more preferably one or two, ring atoms independently selected from the group consisting of N, O, and S; heterocyclic(C1-6)alkyl, preferably heterocyclic(C1-3)alkyl; and C6-10 aryl, preferably phenyl; any of which groups may be optionally substituted.
  • R2 independently at each occurrence is preferably selected from the group consisting of H; hydroxy; amino; C1-8 alkyl, preferably C M alkyl, more preferably C M alkyl; C3-8 cycloalkyl, preferably cyclopropyl, cyclopentyl, or cyclohexyl; (C6-10)ar(C1-6)alkyl, preferably (C6-10)ar(C1-
  • R2 is selected from the group consisting of hydroxy and amine.
  • R1 is selected from the group consisting of methyl, ethyl, and isopropyl.
  • Carbonic anhydrase is an enzyme that catalyzes the reaction between water and carbon dioxide to produce bicarbonate and hydrogen ions. Seven isozymes are known, and of these, human carbonic anhydrase II is of particular interest.
  • Human carbonic anhydrase II protein causes increased intraocular pressure in the aqueous humour, which has been correlated with the development of the ocular disease glaucoma.
  • The enzyme comprises 260 amino acids and has a 15 angstrom deep pocket with a Zn2+ ion at its base, coordinated by three histidine residues. To date, several strong inhibitors of human carbonic anhydrase II have been developed.

EP99967640A 1998-12-24 1999-12-22 System and method for structure-based drug design that includes accurate prediction of binding free energy Withdrawn EP1140737A2 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US22036398A 1998-12-24 1998-12-24
US220363 1998-12-24
PCT/US1999/030948 WO2000039751A2 (en) 1998-12-24 1999-12-22 System and method for structure-based drug design that includes accurate prediction of binding free energy

Publications (1)

Publication Number Publication Date
EP1140737A2 (de) 2001-10-10

Family

ID=22823258

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99967640A Withdrawn EP1140737A2 (de) 1998-12-24 1999-12-22 System and method for structure-based drug design that includes accurate prediction of binding free energy

Country Status (3)

Country Link
EP (1) EP1140737A2 (de)
JP (1) JP2002533477A (de)
WO (1) WO2000039751A2 (de)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MXPA03005436A (es) * 2000-12-20 2004-05-05 Bristol Myers Squibb Co Polinucleotidos novedosos que codifican fosfastasas humanas.
JP2003053197A (ja) * 2001-08-09 2003-02-25 Inst Fr Petrole 化学結合のデスクリプタを用いて前記化学結合を生じさせる使用法を有する新規物質の設計
EP2427769A4 (de) 2009-05-04 2016-08-03 Univ Maryland Verfahren zur identifikation von bindungsstandorten durch moleküldynamikensimulation (standortidentifikation durch kompetitive ligandensättigung)
WO2011066655A1 (en) * 2009-12-02 2011-06-09 Zymeworks Inc. Combined on-lattice/off-lattice optimization method for rigid body docking
KR101739323B1 (ko) 2015-04-29 2017-05-24 숙명여자대학교산학협력단 단백질 폴딩 열역학을 이용한 단백질 안정도 분석법
JP6610182B2 (ja) * 2015-11-09 2019-11-27 富士通株式会社 結合自由エネルギー計算の前処理方法、結合自由エネルギーの算出方法、及び装置、並びにプログラム
US10726946B2 (en) * 2017-08-22 2020-07-28 Schrödinger, Inc. Methods and systems for calculating free energy differences using an alchemical restraint potential
US11710543B2 (en) 2017-10-19 2023-07-25 Schrödinger, Inc. Methods for predicting an active set of compounds having alternative cores, and drug discovery methods involving the same
JP7029098B2 (ja) * 2018-07-27 2022-03-03 富士通株式会社 集団座標の決定方法、及び決定装置、並びにプログラム
EP3852112A4 (de) * 2018-09-14 2021-10-20 FUJIFILM Corporation Verfahren zur erzeugung einer zusammengesetzten struktur, programm zur erzeugung einer zusammengesetzten struktur und vorrichtung zur erzeugung einer zusammengesetzten struktur
JP7234690B2 (ja) 2019-02-27 2023-03-08 富士通株式会社 化合物探索方法、化合物探索装置、及び化合物探索プログラム

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4672401A (en) * 1983-10-14 1987-06-09 Nippon Steel Chemical Co., Ltd. Heat-sensitive recording materials
US5434796A (en) * 1993-06-30 1995-07-18 Daylight Chemical Information Systems, Inc. Method and apparatus for designing molecules with desired properties by evolving successive populations
US5741666A (en) * 1994-08-23 1998-04-21 Millennium Pharmaceuticals, Inc. Compositions and methods, for the treatment of body weight disorders, including obesity
GB9616105D0 (en) * 1996-07-31 1996-09-11 Univ Kingston TrkA binding site of NGF
US5854992A (en) * 1996-09-26 1998-12-29 President And Fellows Of Harvard College System and method for structure-based drug design that includes accurate prediction of binding free energy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0039751A2 *

Also Published As

Publication number Publication date
WO2000039751A2 (en) 2000-07-06
JP2002533477A (ja) 2002-10-08
WO2000039751A3 (en) 2001-01-04


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010719

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

RIN1 Information on inventor provided before grant (corrected)

Inventor name: SHAKHNOVICH, EUGENE, I.

Inventor name: DEWITTE, ROBERT, S.

17Q First examination report despatched

Effective date: 20021216

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20051121