WO2001039098A2 - System and method for searching a combinatorial space - Google Patents
- Publication number: WO2001039098A2 (PCT/IL2000/000779)
- Authority: WO, WIPO (PCT)
- Prior art keywords: value, combinations, protein, combination, energy
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/60—In silico combinatorial chemistry
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
Definitions
- the present invention discloses a system and method for searching through a combinatorial space, in particular one in which one or more combinations of basic elements having a desired property can be located.
- the desired property should have a numerical basis, or at the very least should be translatable into some type of numerical measurement and/or equivalent.
- the present invention enables the combinatorial space to be searched rapidly and efficiently according to the property, without a combinatorial explosion.
- the present invention accomplishes these tasks by examining each value of a basic element at least once, and preferably a plurality of times, during the search process. Therefore, each value can be said to be searched in an exhaustive search, yet not every combination of the values for the basic elements needs to be exhaustively searched.
- the present invention combines the efficiency of non-exhaustive, stochastic search processes with the efficacy of exhaustive searches.
- This section has been divided into a number of sub-sections for ease of explanation. Briefly, the first section describes the general problem of combinatorial spaces and of searching within them. The subsequent sections describe previous attempts to solve several biological problems, which illustrate the inadequacy of background-art solutions for handling combinatorial search spaces. These problems include the placement of polar protons in biological molecules such as proteins, the placement of amino acid side chains in proteins, and the prediction of loop structures in proteins.
- a combinatorial space is defined as having multiple combinations of basic elements. These combinations may differ in the values of different types of these elements, in the structure of the resultant combination of elements, or in both.
- each combination may be considered to consist of variables, each of which may assume more than a single value.
- each variable preferably assumes one value of a set of discrete values, although alternatively, each variable may assume one value out of a range of continuous values or out of a function, for example.
- Combinatorial spaces often occur in biology, as many elementary biological materials are themselves produced through combinations of relatively basic building blocks, yet are highly complex in their resultant structure and/or function.
- Examples of combinatorial spaces include, but are not limited to, proteins, which are produced from combinations of amino acids as building blocks and eventually fold into a spatial structure known as a "tertiary structure", which is one set of values in the combinatorial space.
- a search for this single structure through such a combinatorial space may also be termed a "combinatorial search".
- the folded protein in its biological environment is not fixed in a single "tertiary structure" but may exist in many conformational substates that are in equilibrium. Thus, searching through combinatorial space should preferably find more than a single solution.
- the third method suggested by Bass et al. is based on dividing the system into networks of interacting hydrogen bond donors and acceptors.
- the algorithm tries to maximize the number of hydrogen bonds that can be formed in each network and to minimize the total distance between donors and acceptors. Because each network is rigorously examined for the best possible set of hydrogen bonds, the number of comparisons (between options) scales with the factorial of the number of elements in the network, a fact that limits the calculation to small networks (Bass et al., Proteins 1992; 12: 266-277). No energy evaluations are employed to choose the best structure; as a result, the output might contain high-energy interactions between the placed hydrogens and their environment.
- Another example of a problem requiring a search through combinatorial space is the placement of amino acid side chains. This problem itself requires a combinatorial search, yet it is only one part of the general problem of protein structure prediction; even so, it has so far proved intractable for currently available methods of predicting side-chain locations.
- X-ray crystallography usually supplies a single structure characterized by an R-factor.
- a crystal structure reflects the biomolecule in the highly ordered crystal lattice, as opposed to the more physiologically relevant solution environment reflected by an NMR structure.
- the former might be biased toward specific conformational substates in the crystal, which may not be among the ensemble of conformations in solution (Brunger, Nat. Struct. Biol. 1997; 4 suppl: 862-865). Observation of alternate rotamers is beyond the detection limits of conventional X-ray crystallographic techniques, except at the very highest resolution. At least 10% of all side chains in proteins adopt multiple, discrete conformations in carefully refined crystal structures (Smith et al., Biochemistry, 1986; 25: 5018-5027).
- the time required to reach an equilibrium between different conformers of a protein by MD is prohibitive for such simulations, so only a glimpse of the protein's behavior in its surroundings may be obtained.
- Current strategies for side chain addition differ with respect to three categories. The first is the conformational space of each side chain. In continuous space methods (Eisenmenger, J. Mol. Biol. 1993; 231: 849-860; Roitberg & Elber, J. Chem. Phys. 1991; 95: 9277-9287), any side-chain torsion angle may be sampled.
- Discrete space methods are based on the assumption that side-chains exist in energetically preferred conformations called rotamers, which are local minima conformers that have been sampled by statistical analysis of known structures (Chandrasekaran & Ramachandran, Int. J. Protein Res. 1970; 2: 223-233; Sasisekharan & Ponnuswamy, Biopolymers 1970; 9; 1249-1256; Sasisekharan & Ponnuswamy, Biopolymers 1971; 10: 583-592; Ponder & Richards J. Mol. Biol. 1987; 193: 775-791; Gelin & Karplus, Biochemistry 1979; 18: 1256-1268; Dunbrack & Karplus, Nat. Struct. Biol.
- the second category is the cost function for evaluating solutions.
- Energy based methods rely on non-bonded terms (Laughton, J. Mol. Biol. 1994; 235: 1088-1097; Vasquez, Biopolymers 1995; 36: 53-70; Wilson et al., J. Mol. Biol. 1993; 229: 996-1006; Vasquez, Curr. Opin. Struct. Biol. 1996; 6: 217-221).
- the assumption is that the lower the energy, the more accurate the prediction.
- Knowledge based methods were also proposed: Sutcliffe et al. (Protein Eng. 1987; 1 :
- the third category is the search strategy.
- the search strategies employed vary: Metropolis Monte Carlo methods (Holm & Sander, Proteins 1992; 14: 213-223), Gibbs sampling Monte Carlo (Vasquez, Biopolymers 1995; 36: 53-70), Neural networks (Hwang & Liao, Protein Eng. 1995; 8: 363-370), Genetic Algorithms (Tuffery et al., J. Biomol. Struct. Dynam. 1991; 8: 1267-1289; Tuffery et al., J. Comput. Chem. 1993; 14: 790-798), Simulated Annealing (Lee & Subbiah, J. Mol. Biol.
- the A* algorithm finds the optimal path from the root node to a goal node in a search tree using a cost function labeled f* (Leach & Lemon, Proteins 1998; 33: 227-239).
- Each node has its own f* value, composed of the cost of reaching the node from the start node and the estimated cost of reaching the goal node from it. f* is optimized iteratively: the node with the smallest value of f* is expanded, and new values of f* are calculated for its successor nodes.
- the best method known so far for identifying the low-energy side chain conformations of proteins is a combination of DEE with the A* algorithm, which has been employed for constructing partition functions.
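The f* expansion loop described above can be illustrated with a minimal, generic Python sketch. The function name `a_star` and its callable parameters are hypothetical. In a side-chain setting, a node would be a partial rotamer assignment and `h` a lower bound on the energy still to be added. That mapping is an assumption made here, not the cited implementation (Leach & Lemon). The search is guaranteed to return an optimal path only when `h` never overestimates the remaining cost.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Node = Hashable


def a_star(start: Node,
           is_goal: Callable[[Node], bool],
           successors: Callable[[Node], Iterable[Tuple[Node, float]]],
           h: Callable[[Node], float]) -> Optional[Tuple[float, List[Node]]]:
    """Hypothetical, generic A* sketch (not the cited side-chain implementation).

    f* = g + h, where g is the cost from the start node and h estimates the
    remaining cost to a goal. Returns (cost, path), or None if no goal is reachable.
    """
    tie = 0  # tie-breaker so heapq never has to compare nodes directly
    open_heap = [(h(start), tie, 0.0, start)]  # entries: (f*, tie, g, node)
    g_best: Dict[Node, float] = {start: 0.0}
    parent: Dict[Node, Node] = {}
    while open_heap:
        _, _, g, node = heapq.heappop(open_heap)  # node with the smallest f*
        if g > g_best[node]:
            continue  # stale entry: a cheaper route to this node was found later
        if is_goal(node):
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return g, path[::-1]
        for succ, step_cost in successors(node):
            g_new = g + step_cost
            if g_new < g_best.get(succ, float("inf")):
                g_best[succ] = g_new
                parent[succ] = node
                tie += 1
                heapq.heappush(open_heap, (g_new + h(succ), tie, g_new, succ))
    return None
```

With `h` returning 0, the search reduces to Dijkstra's algorithm. A tighter admissible `h` expands fewer nodes, which is why the quality of the cost estimate matters for the side-chain problem.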
- the A* algorithm approach may find the best N solutions, but it is restricted to relatively small proteins.
- the largest protein solved by this algorithm so far contained 68 amino acids, which comprise about 10^43 combinations (depending on the complexity of the rotamer library), while proteins with a much larger number of combinations are common.
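For a rough sense of scale, assume purely for illustration that each of the 68 residues has the same number $r$ of rotamers. The figure of 10^43 combinations then corresponds to only about four rotamers per residue, and one extra rotamer per residue adds roughly six orders of magnitude:

$$r^{68} = 10^{43} \;\Rightarrow\; r = 10^{43/68} \approx 10^{0.63} \approx 4.3, \qquad 5.3^{68} \approx 10^{49}$$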
- the A* algorithm reaches a maximum of 10 combinations.
- in addition, the A* algorithm must have a good estimate of the cost of reaching a goal node, which is problematic because of interactions between residues that have not yet been assigned. These limitations raise the need for a novel, robust algorithm that finds the global minimum and the lowest energy conformations in larger systems. Unfortunately, such an algorithm is not currently available.
- prediction of the structure of proteins requires a search in combinatorial space, which currently has no suitable solution.
- the prediction of protein structure can itself be divided into a number of smaller problems, which in themselves also require searches in combinatorial space.
- One example of such a problem is the very complicated prediction of loop structure.
- Structural genomics projects are employed to provide an experimental structure or a good model for newly discovered sequences that emerge from the various genome projects.
- Brenner & Levitt (Protein Sci 2000; 9: 197-200) suggest, based on an analysis of sequence similarity databases, that the number of new folds is diminishing gradually, so that most common folds may soon be known.
- Homology modeling may thus become a prevailing tool for predicting the 3-dimensional structure of a protein sequence, if the sequence shows reasonably high similarity to another protein (a "template") with a known tertiary structure. In that case, secondary structure elements are transferred from the template to the target protein.
- stretches of "loops" or "coils" remain undetermined and must be predicted.
- the bond scaling-relaxation procedure meets the geometric and energy requirements simultaneously (Zheng et al., J. Comp. Chem. 1993; 14: 556-565). Random initial conformations are generated with standard bond lengths and angles. Bond lengths for each initial conformation are scaled to meet the loop-constrained distance, and the systems are relaxed to a local energy minimum. This method was later enhanced by combining it with a multiple-copy sampling method (Zheng et al., Protein Sci. 1993; 2: 1242-1248; Zheng et al., Protein Sci. 1994; 3: 493-506). The improved method was employed to handle loops with up to 12
- a fundamental assumption for rational drug design is that drug activity is obtained through the molecular binding of one molecule (the ligand) to the pocket of another, usually larger, molecule (the receptor, commonly a protein).
- in their active, or binding, conformations, the molecules exhibit geometric and chemical complementarity, both of which are essential for successful drug activity.
- drugs may modulate signal pathways, for example by altering sensitivity to hormonal action, or by altering metabolism, for example by interfering with the catalytic activity of the enzyme. Most commonly, this is achieved by binding in the specific cavity of the enzyme (the active site) which catalyses the reaction, thus preventing access of the natural substrate(s).
- an "antagonist" may be designed in order to prevent the binding of an "agonist" (the natural molecule that activates the signal transduction) or, in the case of a reduced biological response, a stronger-binding agonist may be required as a drug.
- the modeling of molecular structure is a complex task, in particular because most molecules are flexible, being able to adopt a number of different conformations that are of similar or close energy.
- the modeling of the binding process is also a difficult task, as the characteristics of the receptor, the ligand, and the solvent in which these are found have to be taken into account.
- although chemists strive to obtain models that are as accurate as possible, several approximations have to be made in practice. It is clear that the more accurate the model used, the better the chances chemists stand in predicting molecular interactions. Nevertheless, a large number of predictions made with approximate models have been confirmed by experimental observations. Recently, a few drugs have been designed using theoretical computational methods. This has encouraged researchers to build tools that use approximate models and to investigate the extent to which these tools can be useful. These approximate models pose difficult algorithmic questions. More accurate molecular modeling, gained through better theoretical understanding or increased computational power, can only improve the techniques developed with simpler models.
- the problems that arise can be classified into two broad categories. If the receptor is known, chemists are interested in finding if a ligand can be placed inside the binding pocket of the receptor in a conformation that results in a low energy for the complex. This problem is referred to as the docking problem. It has several variations: an accurate description of the binding interaction may be desired, or an approximate estimate may be sought of which ligands, from those contained in a huge database, are likely to fit inside the receptor. Very often the binding pocket is unknown. In fact, the 3D structure of relatively few large molecules (or macromolecules) has been determined by X-ray crystallography or NMR techniques, although this number is increasing rapidly.
- chemists attempt to infer information about the receptor.
- chemists are interested in identifying the pharmacophore present in these ligands.
- the pharmacophore is a set of features in a specific 3D arrangement contained in all the active conformations of the considered molecules.
- a prevailing hypothesis is that the pharmacophore is the part, or parts, of the molecule that is responsible for drug activity, while the rest of the molecule is a scaffold for the pharmacophore's features. If the pharmacophore is determined, by examining the different activities, relative shapes, and chemical structures of the initial molecules, chemists can use it to design a more potent pharmaceutical drug.
- the techniques that have been used so far in computer-aided drug design include robotics (kinematics and planning), graphics algorithms (visualization of molecules), geometric calculations (surface computation), numerical methods (energy minimization), graph theoretic methods (invariant identification), randomized algorithms (conformational search), computer vision methods (docking), and a variety of other techniques like genetic algorithms and simulated annealing.
- a number of tools for performing complex geometric and energy calculations are now available and the success of these computer-aided methods is under evaluation.
- Protein structure prediction can be shown to be an NP-hard problem; the number of conformations grows exponentially with the number of residues. The native conformations of proteins occupy a very small subset of these, hence an exploratory, robust search algorithm is required.
- Simulated Annealing (SA) is suitable for large-scale optimization problems (Holm & Sander, Proteins 1992; 14: 213; Lee & Subbiah, J. Mol. Biol. 1991; 217: 373; Hwang & Liao, Protein Eng. 1995; 8: 363; Press et al., Numerical Recipes, Cambridge University Press, New York, NY, 1986; 326), especially ones where a desired global minimum is hidden among many, much poorer, local minima.
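A minimal Python sketch of simulated annealing over discrete variables, to make the description concrete. The following are illustrative assumptions, not taken from the source:

- the energy is supplied as a callable on a tuple of value indices;
- each move re-draws the value of one variable;
- acceptance follows the Metropolis rule;
- the temperature decreases geometrically from `t_start` to `t_end`.

The name `simulated_annealing` and all defaults are hypothetical.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Combo = Tuple[int, ...]


def simulated_annealing(domain_sizes: Sequence[int],
                        energy: Callable[[Combo], float],
                        t_start: float = 10.0,
                        t_end: float = 0.01,
                        steps: int = 5000,
                        seed: int = 0) -> Tuple[Combo, float]:
    """Hypothetical Metropolis SA sketch over discrete variables (illustration only)."""
    rng = random.Random(seed)
    x = [rng.randrange(n) for n in domain_sizes]  # random initial combination
    e = energy(tuple(x))
    best_x, best_e = tuple(x), e
    cooling = (t_end / t_start) ** (1.0 / max(steps - 1, 1))  # geometric schedule
    t = t_start
    for _ in range(steps):
        i = rng.randrange(len(domain_sizes))
        old = x[i]
        x[i] = rng.randrange(domain_sizes[i])  # propose: re-draw one variable
        e_new = energy(tuple(x))
        delta = e_new - e
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e = e_new  # accept: always downhill, uphill with Boltzmann probability
            if e < best_e:
                best_x, best_e = tuple(x), e
        else:
            x[i] = old  # reject: restore the previous value
        t *= cooling
    return best_x, best_e
```

Tracking the best combination seen, rather than only the final state, means an occasional late uphill move cannot lose a good solution found earlier.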
- Each iteration of Genetic Algorithms (GAs) involves a competitive selection that weeds out poor solutions.
- the solutions with high "fitness" are "recombined" with other solutions by swapping parts of one solution with another. Solutions are also "mutated" by making a small change to a single element of the solution.
- GAs are simple, tend not to get "stuck" in local minima, and can often find a globally optimal solution. No derivatives or other problem-specific calculations need to be done. However, there is no guarantee that they will converge to a valid solution, and many iterations are needed to satisfy convergence criteria.
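To make selection, recombination and mutation concrete, here is a minimal genetic algorithm over discrete variables. The following choices are illustrative assumptions, not a specific published GA:

- the name `genetic_algorithm`;
- tournament selection;
- one-point crossover;
- elitism (carrying the best member forward);
- all default parameters.

```python
import random
from typing import Callable, List, Sequence, Tuple

Combo = Tuple[int, ...]


def genetic_algorithm(domain_sizes: Sequence[int],
                      energy: Callable[[Combo], float],
                      pop_size: int = 30,
                      generations: int = 60,
                      mutation_rate: float = 0.1,
                      seed: int = 0) -> Tuple[Combo, float]:
    """Hypothetical GA sketch: tournament selection, one-point crossover, point mutation."""
    rng = random.Random(seed)
    n = len(domain_sizes)
    pop: List[Combo] = [tuple(rng.randrange(k) for k in domain_sizes)
                        for _ in range(pop_size)]

    def select() -> Combo:
        # Competitive selection: the fitter (lower-energy) of two random members wins.
        a, b = rng.sample(pop, 2)
        return a if energy(a) <= energy(b) else b

    for _ in range(generations):
        children: List[Combo] = [min(pop, key=energy)]  # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n) if n > 1 else 0
            child = list(p1[:cut] + p2[cut:])  # recombination: swap parts of two parents
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(domain_sizes[i])  # mutation: change one element
            children.append(tuple(child))
        pop = children
    best = min(pop, key=energy)
    return best, energy(best)
```

Elitism is one way to soften the convergence caveat above: the best solution found is never lost. It does not, however, guarantee that the global optimum is ever reached.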
- Taboo Search (TBS) (Glover, Computers and Operations Research 1986; 5: 533) is superior to SA both in the time required to obtain a solution and in the quality of that solution (Cvijovic & Klinowski, Science 1995; 267: 664).
- TBS is problem independent and can be applied to a wide range of tasks. It is very easy to implement, and the entire procedure occupies only a few lines of code. It is conceptually much simpler than SA and GA. However, it cannot guarantee a solution to the multiple minima problem in a finite number of steps, and it may require long computing times.
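As the text says, a tabu search fits in a few lines. The sketch below is one common variant with three features:

- single-variable moves;
- a fixed-length list of recently abandoned (variable, value) pairs;
- an aspiration rule that lifts the tabu when a move beats the best solution found so far.

The name `tabu_search` and its parameters are hypothetical, and Glover's original formulation may differ in detail.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

Combo = Tuple[int, ...]


def tabu_search(domain_sizes: Sequence[int],
                energy: Callable[[Combo], float],
                iterations: int = 200,
                tenure: int = 5,
                seed: int = 0) -> Tuple[Combo, float]:
    """Hypothetical tabu search sketch with short-term memory and aspiration."""
    rng = random.Random(seed)
    x: Combo = tuple(rng.randrange(k) for k in domain_sizes)
    best_x, best_e = x, energy(x)
    tabu: Deque[Tuple[int, int]] = deque(maxlen=tenure)  # recently abandoned (variable, value)
    for _ in range(iterations):
        moves: List[Tuple[float, Combo, int]] = []
        for i, k in enumerate(domain_sizes):
            for v in range(k):
                if v == x[i]:
                    continue
                y = x[:i] + (v,) + x[i + 1:]
                e_y = energy(y)
                # Aspiration: a tabu move is still allowed if it beats the best so far.
                if (i, v) not in tabu or e_y < best_e:
                    moves.append((e_y, y, i))
        if not moves:
            break
        e_y, y, i = min(moves, key=lambda m: m[0])  # best admissible neighbour, even if uphill
        tabu.append((i, x[i]))  # forbid moving variable i back to its old value for a while
        x = y
        if e_y < best_e:
            best_x, best_e = y, e_y
    return best_x, best_e
```

Accepting the best admissible neighbour even when it is uphill is what lets the search walk off local minima and plateaus. The tabu list then stops it from immediately stepping back.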
- Conformational space annealing (Lee et al., J. Comput. Chem. 1997; 18: 1222) narrows the search over the full conformational space to regions of low energy. It starts with a "pool" of minimized conformations, which are later modified by picking random variations from the pool. This method is also limited to a small number of variables.
- Dead End Elimination (DEE) is based on identifying solutions that are absolutely incompatible with the global minimum (Desmet et al., Nature 1992; 356: 539; Lasters et al., J. Prot. Chem. 1997; 16: 449). Solutions that cannot contribute to local energy minima of a certain or higher order are eliminated. The energy (cost) function must be written as a sum of terms, each of which is a function of at most two variables. A value $x_i$ of the i-th variable cannot be consistent with the globally optimal solution if another value for the same variable, $x'_i$, can be found such that:
$$E(x_i) + \sum_{j \neq i} \min_{x_j} E(x_i, x_j) > E(x'_i) + \sum_{j \neq i} \max_{x_j} E(x'_i, x_j)$$
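A minimal Python sketch of iterated singles DEE, under the requirement above that the energy is a sum of terms of at most two variables. The energies are assumed to be given as tables:

- `e1[i][a]` is the self energy of value `a` of variable `i`;
- `e2[(i, j)][a][b]`, with `i < j`, is the pair energy of value `a` of variable `i` with value `b` of variable `j`.

These table names, and the function `dead_end_elimination`, are hypothetical. A value is eliminated when it is still worse than another value of the same variable even though it is given its most favourable pair energies and the other value its least favourable ones. Only the surviving values of the other variables enter the min/max, which keeps the criterion valid as elimination proceeds.

```python
from typing import Dict, List, Sequence, Tuple

PairTable = Dict[Tuple[int, int], Sequence[Sequence[float]]]


def dead_end_elimination(e1: Sequence[Sequence[float]], e2: PairTable) -> List[List[int]]:
    """Hypothetical sketch: iterate the singles DEE criterion until nothing changes.

    Returns the surviving value indices for each variable.
    """
    n = len(e1)
    alive = [list(range(len(row))) for row in e1]

    def pair(i: int, a: int, j: int, b: int) -> float:
        return e2[(i, j)][a][b] if i < j else e2[(j, i)][b][a]

    def bound(i: int, a: int, others: List[int], agg) -> float:
        # Self energy plus, for each interacting variable, the min (or max) pair energy.
        return e1[i][a] + sum(agg(pair(i, a, j, b) for b in alive[j]) for j in others)

    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i and (min(i, j), max(i, j)) in e2]
            for r in list(alive[i]):
                # r is a dead end if some other value t beats r's best case in t's worst case.
                if any(t != r and bound(i, r, others, min) > bound(i, t, others, max)
                       for t in alive[i]):
                    alive[i].remove(r)
                    changed = True
    return alive
```

The criterion only removes values that cannot belong to the global minimum. The surviving values can therefore be handed to an exhaustive or A* search over a much smaller space.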
- Statistical methods (SMs) employ a model of the objective function to bias the selection of new sample points. These methods are justified with Bayesian arguments that suppose that the particular objective function to be optimized comes from a class of functions that are modeled by a particular stochastic function (Mockus, J. Global Optim. 1994; 4: 347). Information from previous samples of the objective function can be used to estimate parameters, and this refined model can subsequently be used to bias the selection of points in the search domain. The difficulty in using SMs is determining whether the statistical model is appropriate for a given problem. Additionally, it is difficult to write computer code for high-dimensional optimization problems because of the mathematical complexity. SMs often rely on dividing the search region into partitions, which limits these methods to problems with a moderate number of dimensions.
- each combination may be considered to consist of variables, each of which may assume at least one value.
- These variables interact with each other in a manner which is known for each individual interaction.
- individual interactions can be described for pairs of variables, such that the interactions are pairwise interactions.
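Under this pairwise assumption, the quantitative measurement of a combination $\mathbf{x} = (x_1, \dots, x_N)$ can be written in the general form below. Here there are $N$ variables and variable $i$ has $n_i$ allowed values; the notation is introduced for illustration:

$$E(\mathbf{x}) = \sum_{i=1}^{N} E(x_i) + \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E(x_i, x_j), \qquad \text{number of combinations} = \prod_{i=1}^{N} n_i$$

Evaluating one combination therefore costs on the order of $N^2$ term look-ups, whereas the number of combinations grows as the product of the $n_i$. This is why scoring any single combination is cheap while enumerating all of them explodes.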
- the search is performed by sampling one value of each variable to obtain a combination. This process is then repeated, typically many times. Each combination is evaluated by a quantitative measurement.
- the quantitative measurement is preferably a cost function, for which the desired outcome is generally maximized or at least increased during the process of determining which combinations best fulfill the cost function.
- if the cost function is an energy minimization function, then the combinations having lower energy costs or values are preferably selected.
- the present invention attempts to determine which elements do not contribute to combinations which provide at least some minimum desired value for the quantitative measurement, and/or which contribute to combinations which provide a value for the quantitative measurement which is below some cut-off or threshold for desirable values. In other words, these elements do not contribute toward the "best" or most satisfactory combinations for the system. These elements are then preferably eliminated or at least segregated from the remaining possible elements for forming the combinations. The process of evicting values of variables is preferably repeated until a predetermined number of combinations remain, which consist of the elements which have not been eliminated and/or segregated. At this point, an exhaustive search is most preferably performed, according to the quantitative measurement and/or according to some other measurement parameter or parameters.
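The passage above does not fix the sampling scheme or the eviction rule. The following Python sketch therefore makes explicit, hypothetical choices to show the overall flow. Each iteration:

- samples combinations in which every surviving value appears at least `repeats` times (echoing the requirement that every value is examined);
- records, for each value, the lowest energy of any sampled combination containing it;
- evicts values whose best energy lies above a quantile cutoff of the sampled energies.

The loop stops once the number of remaining combinations falls to `target`, and the survivors are then enumerated exhaustively. The function `evict_then_enumerate` and all of its parameters are illustrative. Unlike DEE, this heuristic eviction rule is not guaranteed to retain the global minimum.

```python
import itertools
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Combo = Tuple[int, ...]


def evict_then_enumerate(domain_sizes: Sequence[int],
                         energy: Callable[[Combo], float],
                         target: int = 1000,
                         repeats: int = 3,
                         evict_quantile: float = 0.5,
                         max_iter: int = 100,
                         n_best: int = 10,
                         seed: int = 0) -> List[Tuple[float, Combo]]:
    """Hypothetical sample-evaluate-evict loop followed by an exhaustive search."""
    rng = random.Random(seed)
    alive = [list(range(n)) for n in domain_sizes]  # surviving values per variable
    for _ in range(max_iter):
        if math.prod(len(a) for a in alive) <= target:
            break  # few enough combinations left for an exhaustive search
        # Sample combinations so that every surviving value is used >= `repeats` times.
        n_samples = repeats * max(len(a) for a in alive)
        columns = []
        for a in alive:
            col = [a[k % len(a)] for k in range(n_samples)]
            rng.shuffle(col)  # random pairing of values across variables
            columns.append(col)
        samples = [tuple(col[s] for col in columns) for s in range(n_samples)]
        energies = [energy(c) for c in samples]
        # Best (lowest) energy of any sampled combination containing each value.
        best: Dict[Tuple[int, int], float] = {}
        for c, e in zip(samples, energies):
            for i, v in enumerate(c):
                best[(i, v)] = min(e, best.get((i, v), math.inf))
        cutoff = sorted(energies)[int(evict_quantile * (len(energies) - 1))]
        evicted = False
        for i, a in enumerate(alive):
            keep = [v for v in a if best[(i, v)] <= cutoff]
            if not keep:  # never empty a variable: retain its best-scoring value
                keep = [min(a, key=lambda v: best[(i, v)])]
            evicted = evicted or len(keep) < len(a)
            alive[i] = keep
        if not evicted:
            break  # no progress; fall through to the exhaustive step
    # Exhaustive search over the combinations formed by the surviving values.
    scored = sorted((energy(c), c) for c in itertools.product(*alive))
    return scored[:n_best]
```

Setting `target` at or above the full size of the space skips eviction entirely, so the function reduces to a plain exhaustive search. This is a convenient way to check the sketch against brute force on small problems.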
- a method for searching through a combinatorial space, the space featuring multiple combinations, each combination being composed of at least one element
- the steps of the method being performed by a data processor, the method comprising the steps of: (a) providing a quantitative parameter for determining success of a result of a search through the combinatorial space, said quantitative parameter being measurable for each combination; (b) dividing the combinations in the combinatorial space into ensembles, each ensemble featuring at least one combination; (c) calculating a value for said quantitative parameter for at least one combination of each ensemble; (d) determining an effect of each element on said value of said quantitative parameter; and (e) retaining at least one combination according to said effect, to provide a result of searching through the combinatorial space.
- amino acid refers to both natural and synthetic molecules which are capable of forming a peptide bond with another such molecule.
- FIG. 1 is a flowchart of an exemplary method according to the present invention
- FIG. 2 is a schematic block diagram of an exemplary system according to the present invention
- FIG. 3 shows a flow chart for the hydrogen positioning algorithm
- FIG. 4 shows a molecule that contains two carbonyls, one sp2 amide and one hydroxyl, which together form a single ensemble.
- the two carbonyls (1, 2) act as acceptors, the hydroxyl donates one non-trivial hydrogen (3) and two non-trivial lone pairs (4, 5), and the amide donates one trivial hydrogen (6).
- Atom 3 and lone pairs 4, 5 are one segment because they are bonded to the same oxygen;
- FIG. 5A shows an exemplary initial 2D matrix for the system in Figure 4.
- the hydroxyl hydrogen (3) can form a hydrogen bond to any of the carbonyls (1,2) and the hydroxyl lone pairs (4, 5) can form a hydrogen bond to the trivial hydrogen (6).
- FIG. 5B shows the refined 2D matrix.
- the hydroxyl two lone pairs are degenerate, therefore one of them can be omitted (5->6).
- the omitted lone pair is automatically added after the hydrogen and first lone pair are located.
- FIG. 5C: using the 2D matrix, a 3D matrix is formed to hold all the possible combinations. Each combination is evaluated, and the best combination is the result;
- FIG. 6 shows an example of a "big" system.
- the initial 2D matrix in case of a large biological system (for example, a protein).
- An attempt to create the 3D matrix will exceed the computer capabilities. Therefore, the 2D matrix is refined by evicting high energy components;
- FIG. 7 shows a "test" protein with 1186 amino acids: 13 serines (shown as CPK models; 13 segments) and 1173 glycines (0 segments).
- the stochastic search began with a total of 5.02 × 10^10 combinations and reached 2.7 × 10^3 combinations after 204 iterations, which were then evaluated exhaustively. The global minimum for the hydrogen positions was found;
- FIG. 8 shows a graph of the natural log of (total number of possible combinations) vs. the iteration number in the pure "stochastic approach". Five proteins are presented;
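FIG. 8 plots the natural logarithm of the number of combinations. The FIG. 7 run quoted above (5.02 × 10^10 combinations reduced to 2.7 × 10^3 in 204 iterations) can therefore be read as an average per-iteration contraction. This is simple arithmetic on the stated figures; the actual reduction per iteration need not be uniform:

$$\frac{\ln(5.02\times10^{10}) - \ln(2.7\times10^{3})}{204} \approx \frac{24.64 - 7.90}{204} \approx 0.082, \qquad e^{-0.082} \approx 0.92$$

That is, on average each iteration removed roughly 8% of the remaining combinations.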
- FIG. 9 shows a graph of the energy distribution in the 1st and 4th iterations for 5PTI (A), 5RSA (B), 2MB5 (C), 1NTP (D);
- FIG. 10 shows a Ribbon display of trypsin (1NTP) and its polar residues. Many polar hydrogens create hydrogen bonds with water molecules. However, no water molecules' coordinates are included in the PDB file;
- FIG. 11 shows a model of crambin (46 amino acid residues) as a test case for comparison of a full exhaustive search to a stochastic search in finding the 10,000 lowest energy conformations.
- the backbone of crambin is presented as a ribbon.
- the non hydrogen atoms are presented by ball and stick models;
- FIG. 12 shows a comparison of stochastic and exhaustive searches in finding lowest energy conformations for 1-10,000 conformers. The % deviation between the two searches is on the lowest curve;
- FIG. 13A shows a percentage of angles in E. coli ribonuclease HI that may be detected: Out of 115 dihedral angles, 7 angles are missing from the rotamer library;
- Figure 13B shows a percentage of angles in E. coli ribonuclease HI that were detected by the stochastic algorithm;
- FIG. 14 shows values of ⁇ for 2 to 29 possible rotamers of a single residue that lead to elimination with high probability.
- Each number of rotamers has an associated value of ⁇ (triangles). The larger the number of rotamers, the smaller is ⁇ . For each given number of rotamers and ⁇ , the % certainty is calculated (squares);
- FIG. 15 shows an example of a 6-residue (0-5) loop. Residues 0 and 5 are part of the transmembrane helix. A search is performed for the conformation of residues 1-4. The method of the present invention is employed to explore the conformational space of the loop to find all possible loop closure conformations defined by equation 2;
- FIG. 16 shows the dihedral angles definition: ⁇ of a residue n, in the construction strategy, is the ⁇ of the previous residue toward the N-terminal;
- FIG. 17 shows the 10,000 "lowest cost function" conformations in a 4-residue test case. A stochastic and an exhaustive search reached the same global minimum. The first 66 conformations are identical.
- the present invention discloses a system and method for searching through combinatorial space, without a combinatorial explosion.
- the search is performed for various combinations of basic elements, according to at least one desired property of the combination, which is translatable into a quantitative measurement of the success of the search.
- the present invention attempts to determine which elements do not contribute to combinations which provide at least some minimum desired value for the quantitative measurement, and/or which contribute to combinations which provide a value for the quantitative measurement which is below some cut-off or threshold for desirable values.
- those elements are selected which only contribute to combinations which fail to meet the minimum threshold for desirable values. In other words, these elements do not contribute toward the "best" or most satisfactory combinations for the system.
- These elements are then preferably eliminated or at least segregated from the remaining possible elements for forming the combinations.
- the process of sorting through the elements is preferably repeated until a predetermined number of combinations remain, which consist of the elements which have not been eliminated and/or segregated.
- a predetermined number is optionally and more preferably an actual numerical value for the total number of combinations, but alternatively may be a threshold for the minimum desired value for the quantitative measurement which the combination must satisfy to be included in the remaining combinations.
- an exhaustive search is most preferably performed, according to the quantitative measurement and/or according to some other measurement parameter or parameters.
- the quantitative measurement of the combinations of variables is preferably a cost function, for which the desired outcome is generally maximized or at least increased during the process of determining which combinations best fulfill the cost function. For example, if the cost function is an energy minimization function, then the combinations are preferably selected which have lower energy costs or values.
- the cost function could optionally be the energy minimization of the combination, such that the selected structure would represent an energy minimum or near-minimum.
- Such an energy cost function is also useful for more specific or "sub" problems within the larger problem of protein structure prediction. For example, energy minimization over the predicted locations of polar protons and of amino acid side chains also provides a useful quantitative parameter for these types of combinatorial searches. It should be noted that in this case, maximization of the desired quantitative parameter is actually achieved through minimization of the value of the energy calculation for the combination.
- any other type of cost function could optionally be used with the method of the present invention.
- the cost function would not even necessarily need to be related to a biological problem, but could instead be related to other types of problems, such as optimization of a cost function for monetary value (for a literal, financial "cost"), for example.
- the present invention accomplishes these tasks by examining each value of a basic element at least once, and preferably a plurality of times, during the search process. Therefore, each value can be said to be searched in an exhaustive search, yet not every combination of the values for the basic elements needs to be exhaustively searched.
- the present invention combines the efficiency of non-exhaustive, stochastic search processes with the efficacy of exhaustive searches.
- an additional exhaustive search may optionally be performed after the execution of the present invention, for example in order to identify the absolute minimum as well as a plurality of local minima.
- Such an additional exhaustive search is particularly preferred when the initial search process according to the present invention includes a stochastic search and/or comparison component, which is the preferred embodiment of the present invention.
- the present invention is clearly distinguished from background art search methods in a number of respects.
- the present invention is not based upon, nor is it a modification of, any of the known methods in the art.
- each value of every variable in the combinatorial search space must be probed to determine whether it should be evicted from the search space, unlike other stochastic search methods, which cannot guarantee the probing of each and every value in the combinatorial search space.
- the present invention is also optionally and preferably able to obtain a population of local minima in addition to the global minimum.
- the present invention is able to accomplish these goals with a stochastic search, while providing the efficacy of the exhaustive search, as proven below by a comparison of the results of the present invention with the results of full exhaustive searches used alone.
- the principles and operation of the present invention may be better understood with reference to the drawings and the accompanying description, which are provided through several sections.
- the first part of the description (in this section) centers around an exemplary general method according to the present invention, and a basic exemplary system for implementation thereof.
- the subsequent sections refer to specific biological problems, and are labeled with the name of each type of problem. These sections are intended to describe examples for suitable implementations and applications of the present invention, and are not otherwise intended to be limiting in any way.
- FIG. 1 is a flowchart of an exemplary but preferred general method according to the present invention for searching through combinatorial space.
- the combinatorial space is provided.
- Such a combinatorial space features multiple combinations of basic elements.
- the combinatorial space is optionally created, for example by creating multiple structures having the basic elements according to some pattern, plan and/or scheme.
- the combinatorial space may optionally have been previously defined.
- the combinatorial space may already be defined according to the type of biological structure which is to be analyzed.
- each combination is optionally and preferably constructed from variables, each of which may assume at least one value.
- each variable more preferably assumes one value of a set of discrete values, although alternatively, each variable may optionally assume one value out of a continuous range of values, or a value given by a function, for example.
- These variables interact with each other in a manner which is known for each individual interaction.
- individual interactions can be described for pairs of variables, such that the interactions are pairwise interactions.
- the quantitative parameter is determined, according to which the success of the search is measured.
- the quantitative parameter must be measurable for each combination of the combinatorial space.
- the quantitative parameter is calculated according to the basic elements of each combination, optionally with the additional consideration of the effect of structural features and/or interactions on this measurement.
- the type of quantitative parameter for examining the particular problem may already be known.
- the best quantitative parameter is preferably the energy minimization for the combination, determined according to equations which are known in the art and which are described in greater detail below with regard to Section 1.
- the quantitative measurement of the combinations of variables is preferably a cost function, for which the desired outcome is generally maximized or at least increased during the process of determining which combinations best fulfill the cost function.
- if the cost function is an energy minimization function, then the combinations are preferably selected which have lower energy costs or values.
- in step 4, the contribution of each element or variable is evaluated, to determine the effect of particular elements, or of particular values of variables, of each combination on the quantitative parameter or cost function.
- Such an effect is preferably determined through both the values of the variables, and the interaction between these variables, as assessed through the cost function.
- the preferred effect is for consistent maximization of the cost function. Consistent maximization is optionally measured according to the distribution of values of the cost function for a large group of combinations or "configurations" of the whole set of variables. According to preferred embodiments of the present invention, particularly if large numbers of variables are involved, preferably the effect of these different values is determined through a stochastic analysis, since an exhaustive analysis could prove to be prohibitively inefficient and time-consuming.
- the stochastic analysis is preferably performed by randomly selecting values for each variable in order to form a combination, more preferably in order to form a plurality of different combinations. Most preferably, a predetermined number of such combinations are formed as part of a sampling process. The outcome or value of the cost function for each combination is then calculated, according to both the values of the variables and the interaction between these variables.
- in step 5, those elements or values of variables which do not contribute to consistent maximization of the desired outcome of the cost function, as previously described, are optionally and preferably removed. More preferably, those values of variables are removed which contribute only to less desirable outcomes of the cost function, or outcomes which fall below a certain minimum threshold, and not to any outcomes which are above a certain threshold for desirable outcomes. For example, for a cost function involving energy minimization, those values for variables are preferably removed which are found only in combinations for which the energy cost is higher than a certain threshold (less desirable outcome), but not in combinations for which the energy cost is below another, low energy threshold (more desirable outcome). Alternatively, rather than being removed, these values can optionally be "marked" and/or segregated, for example for further analysis.
- in step 6, if the total number of combinations has reached some minimum value, then these combinations are optionally and more preferably further analyzed according to the cost function and/or some other parameter, to determine the results of the combinatorial search.
- an exhaustive search could even be performed within the minimum number of combinations for the combination(s) of interest, again as evaluated according to the cost function, and/or some other parameter of interest.
- Such a group of combinations can also optionally be viewed as a population of combinations having a particular minimum value for a desirable outcome of the quantitative measurement. Otherwise, steps 4 and 5 are preferably repeated, until this minimum number of combinations is reached.
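Steps 4 to 6 of FIG. 1 can be sketched in Python as follows. This is a hedged illustration, not the patented C++ implementation. It assumes uniform random sampling of one value per variable, takes the H and L sets as the k = αn highest- and lowest-energy samples (the rule described in Section 1), evicts a value only if it occurs in H but never in L, and switches to an exhaustive search once the product of the remaining domain sizes is at most `max_combinations`. The function name and all default parameters are hypothetical.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence, Tuple

Combo = Tuple[int, ...]


def stochastic_eviction_search(
    domains: Sequence[Sequence[int]],          # values still allowed for each variable
    energy: Callable[[Sequence[int]], float],  # cost function; lower energy is better
    n_samples: int = 1000,                     # samples per iteration (assumed default)
    alpha: float = 0.01,                       # fraction forming the H and L sets (assumed)
    max_combinations: int = 10_000,            # switch to exhaustive search below this (assumed)
    max_iterations: int = 500,
    seed: int = 0,
) -> List[Tuple[float, Combo]]:
    rng = random.Random(seed)
    doms = [list(d) for d in domains]
    k = max(1, int(alpha * n_samples))
    for _ in range(max_iterations):
        if math.prod(len(d) for d in doms) <= max_combinations:
            break
        # Step 4: sample random combinations and evaluate the cost function.
        samples = [tuple(rng.choice(d) for d in doms) for _ in range(n_samples)]
        scored = sorted((energy(c), c) for c in samples)
        low, high = scored[:k], scored[-k:]
        # Step 5: evict values seen in high-energy samples but never in low-energy ones.
        for v, d in enumerate(doms):
            in_low = {c[v] for _, c in low}
            in_high = {c[v] for _, c in high}
            keep = [x for x in d if not (x in in_high and x not in in_low)]
            if keep:  # never empty a variable's domain
                doms[v] = keep
    if math.prod(len(d) for d in doms) > max_combinations:
        raise RuntimeError("search space still too large; increase max_iterations")
    # Step 6: exhaustive evaluation of the surviving combinations, best first.
    return sorted((energy(c), c) for c in itertools.product(*doms))


# Toy problem: six variables with values 0..2, energy = sum of the values.
results = stochastic_eviction_search([[0, 1, 2]] * 6, energy=sum, max_combinations=50)
print(results[0])
```

Because eviction decisions rest on random samples, a value belonging to the global minimum could in principle be evicted by chance. The statistical choice of n and α discussed in Section 1 is what keeps that probability low.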
- FIG. 2 shows an exemplary system according to the present invention, for implementation of the method of Figure 1.
- a system 10 features a computational device 12.
- computational device 12 operates a number of functional modules, which collectively enable the method of Figure 1 to be executed.
- These functional modules are optionally and preferably implemented as software modules, but alternatively may be implemented as hardware, firmware or a combination thereof.
- one such module is a combination storage module 14, which holds the combinations currently under consideration in their respective ensembles.
- a quantitative parameter calculation module 16 then calculates the value of the quantitative parameter for at least one combination in each ensemble from combination storage module 14.
- An evaluation module 18 creates a plurality of samples of combinations from the elements of the combinations, and evaluates the effect of each element on the value of the quantitative parameter for the combination, such that certain elements are preferably retained as consistently contributing toward maximized values for the quantitative parameter for the combination. These modules preferably interact until a certain minimum number of combinations are held in combination storage module 14, which represent the results of the search in the combinatorial space.
- The following sections describe specific model systems which are handled by the present invention as specific problems, for which the present invention is able to provide a solution.
- These Sections include descriptions of searching through combinatorial space to locate polar protons (Section 1); locating amino acid side chains in proteins (Section 2); prediction of loop structure in proteins (Section 3); and other miscellaneous biological problems which are solved by the present invention (Section 4).
- Section 1: Location of Polar Protons
- the present invention is useful for solving the problem of correctly locating the polar protons within a biological molecule, such as a protein molecule or DNA, for example.
- the location of such polar protons in turn determines the location of hydrogen bonding, either within the biological molecule itself, or alternatively between the biological molecule and another molecule.
- This specific implementation of the present invention thus solves an important scientific problem.
- the specific implementation of the present invention which is described in this section under "Methods" was also tested against other methods known in the art, as described under "Results". It should be noted that these methods and results are presented for the purposes of illustration only, and are not intended to be limiting in any way. The interpretation of these results is then discussed under "Discussion".
- the method of the present invention has been implemented as a computer software program, written in C++. It operates as illustrated in the flow chart of Figure 3.
- the program optionally reads the Protein Data Bank coordinate file format (a PDB file), or alternatively receives the input information from another source. It uses auxiliary ASCII files which serve as databases to parametrize the system atoms. Those files contain the connectivity of all atoms, their charges, A and B parameters for the Lennard-Jones function, and bond lengths between hydrogens and heavy atoms. The user may add, delete and modify residue types easily by editing these files. These values are read from the files, or alternatively are input from another source, in order to parametrize the atoms in step 2.
- in step 3, the hydrogens and lone pairs which are about to be added are divided into three categories: (1) trivial hydrogens, which may be located using the coordinates and hybridization of heavy atoms, such as aliphatic and aromatic hydrogens; (2) non-trivial hydrogens, which are polar hydrogens with rotational degrees of freedom, such as those of serine, threonine and tyrosine hydroxyls; and (3) non-trivial lone pairs, which have the same geometrical properties as non-trivial hydrogens.
- Trivial hydrogens are added first, in step 4. Their coordinates are calculated using the coordinates of the heavy atoms, the bond lengths and angles from the database, and the standard dihedral angles.
- non-trivial hydrogens and lone pairs are divided into ensembles, and their coordinates are not yet calculated.
- An ensemble is defined as a group of non-trivial hydrogens or lone pairs which interact among themselves.
- the ensemble cutoff is user defined. The user can assign a large ensemble cutoff value, and force the system to run as one big ensemble.
- the ensemble cutoff is measured from the coordinates of the heavy atom bonded to the non-trivial atom, because the non-trivial atom has not been located yet.
- Ensembles are composed of "segments". Each segment includes a rotation around a bond connecting two heavy atoms, one of which is bonded to a polar proton. Each segment may adopt various positions in space to fulfill H-bonding conditions.
- an energy cutoff may also be applied, in the usual sense of its use in non-bonding energy calculations; the default is no cutoff.
- Another cutoff is used for locating hydrogen bonding partners around a rotatable segment (vide infra). This cutoff may be smaller or larger than the "ensemble cutoff", but it should always be > 3 Å, to allow the inclusion of all close partners for H-bonding and to avoid the risk of missing solutions for a segment. Increasing this cutoff above 4.5 Å creates many unrealistic candidate partners and extends the time for searching solutions.
- the ensemble cutoff is employed for creating a group of relevant heavy atoms (hydroxyl oxygen, water oxygen, NH3+, amine, etc.) whose mutual relations must all be resolved together.
- for example, if the cutoff is 4 Å, it may well be that the distances between atoms A and B, and between A and C, are each smaller than 4 Å, while R(B,C) > 4 Å; all three atoms are nevertheless part of the same ensemble.
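Because ensemble membership is transitive (A joins B and C even when B and C are farther apart than the cutoff), ensembles are the connected components of a "within cutoff" graph over the relevant heavy atoms. A minimal Python sketch, assuming plain 3D coordinates for the heavy atoms and using a union-find structure; the function name `build_ensembles` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def build_ensembles(atoms: Sequence[Point], cutoff: float) -> List[List[int]]:
    """Group atom indices into ensembles: connected components under the cutoff."""
    parent = list(range(len(atoms)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Link every pair of heavy atoms closer than the ensemble cutoff.
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i], atoms[j]) < cutoff:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(atoms)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# A, B, C as in the 4 Å example (R_AB, R_AC < 4 Å but R_BC = 7 Å), plus a distant atom.
atoms = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0), (-3.5, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(build_ensembles(atoms, cutoff=4.0))  # [[0, 1, 2], [3]]
```

Setting a very large cutoff merges everything into one ensemble, matching the user option of running the system as one big ensemble.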
- Each ensemble is preferably treated separately.
- a two-dimensional matrix is formed in step 6. It is a list of all hydrogen bonds that may be formed between donors and acceptors.
- the ensemble displayed in Figure 4 contains only two carbonyls (1, 2), one amide and one hydroxyl, which together form a single ensemble.
- the hydroxyl contributes one non-trivial hydrogen (3) and two non-trivial lone pairs (4, 5), and the amide contributes one trivial hydrogen (6).
- a segment is defined as a group of non-trivial hydrogens and lone pairs bonded to a single heavy atom. For example, hydrogen 3 and lone pairs 4 and 5 form one segment because they are connected to the same oxygen.
- the full 2D matrix will have the form illustrated in Figure 5A.
- the two lone pairs are degenerate, therefore one of them can be omitted for forming the initial alternative combinations of the 2D matrix (4->6 or 5->6).
- the omitted lone pair is automatically added after the hydrogen and first lone pair are located. Therefore, the initial 2D matrix will have the form illustrated in Figure 5B.
- the module refines the 2D matrix: a location that yields a high energy value ("bump") is deleted.
- the energy threshold is user defined, and non bonding energy expressions are employed.
- a 3D matrix is formed in step 7, in which all combinations in an ensemble are uniquely defined, i.e. in any combination there is only a single option for any non-trivial (rotatable) hydrogen and non-trivial lone pair.
- the 3D matrix has the form illustrated in Figure 5C. Each pair of lines constitutes one contribution. Each combination is evaluated, and the best combination is the result for the ensemble. In case of more than one ensemble the process is repeated for each ensemble.
- the energy criterion used to evaluate the quality of each combination is a pairwise
- the code is flexible, and the force field can easily be modified to any desired force field.
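The passage from the 2D matrix (candidate H-bond options per segment) to the 3D matrix (fully specified combinations) can be sketched as: filter each segment's options by an energy "bump" threshold, enumerate the Cartesian product of the survivors, and keep the lowest-energy combination. This is a sketch under stated assumptions: the option labels, the threshold value and both energy callables are hypothetical stand-ins for the force-field terms, and the enumeration is exhaustive within one ensemble.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Option = str  # hypothetical label for one H-bond position of a segment


def best_ensemble_combination(
    segments: Dict[str, List[Option]],             # 2D matrix: options per segment
    option_energy: Callable[[Option], float],      # energy of an option on its own
    pair_energy: Callable[[Option, Option], float],
    bump_threshold: float,
) -> Tuple[float, Tuple[Option, ...]]:
    # Refine the 2D matrix: delete options whose own energy is a "bump".
    refined = {s: [o for o in opts if option_energy(o) < bump_threshold]
               for s, opts in segments.items()}
    if any(not opts for opts in refined.values()):
        raise ValueError("a segment has no option below the bump threshold")
    # 3D matrix: each combination takes exactly one option per segment.
    best: Optional[Tuple[float, Tuple[Option, ...]]] = None
    for combo in itertools.product(*refined.values()):
        e = sum(option_energy(o) for o in combo)
        e += sum(pair_energy(a, b) for a, b in itertools.combinations(combo, 2))
        if best is None or e < best[0]:
            best = (e, combo)
    return best


# Toy ensemble with two segments; option "d" is a bump and is filtered out.
own = {"a": -1.0, "b": 0.0, "c": 0.5, "d": 50.0, "e": -0.2}
best = best_ensemble_combination(
    {"S1": ["a", "b"], "S2": ["c", "d", "e"]},
    own.__getitem__,
    lambda x, y: 2.0 if {x, y} == {"a", "e"} else 0.0,
    bump_threshold=10.0,
)
print(best)  # (-0.5, ('a', 'c'))
```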
- $\mathcal{E} = \{E_1, \ldots, E_n\}$ (n is of the order of $10^3$) is an assembly of energies that corresponds to n sampled configurations for the full protein.
- H contains all configurations satisfying $E_i > \mathcal{E}(1-\alpha)$, where $\mathcal{E}(\alpha)$ is the α-th percentile of $\mathcal{E}$, while L contains all configurations satisfying $E_i < \mathcal{E}(\alpha)$.
- H stands for the 10 highest-energy systems (Figure 6B).
- L stands for the 10 systems with the lowest energy.
- n and α were chosen according to statistical formulae that deal specifically with the probability of justified and unjustified eviction of configurations from a large set of combinations.
- a minimization of incorrectly ruled out cases may be achieved by increasing α and n.
- the expected number of correctly ruled out cases also decreases, though, with a smaller slope.
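The H/L partition can be illustrated directly. The sketch below assumes the α-th percentile cut is taken as the k = ⌊αn⌋ lowest and highest sampled energies (a simple order-statistic choice; the source does not specify an interpolation rule). With n = 1000 and α = 0.01 it yields the 10 highest- and 10 lowest-energy configurations described above. The function name and the Gaussian test energies are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def high_low_sets(energies: Sequence[float], alpha: float) -> Tuple[List[int], List[int]]:
    """Return indices of the H (highest-energy) and L (lowest-energy) configurations."""
    k = int(alpha * len(energies))
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    return order[len(order) - k:], order[:k]


rng = random.Random(1)
energies = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in sampled energies
high, low = high_low_sets(energies, alpha=0.01)
print(len(high), len(low))  # 10 10
```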
- The method was tested on five Protein Data Bank (Bernstein et al., J. Mol. Biol. 1977; 112: 535-542) files: Bovine Pancreatic Trypsin Inhibitor (5PTI), RNase A (5RSA), Trypsin (1NTP) and carbonmonoxymyoglobin (2MB5), for which neutron diffraction coordinates are available for proton positions, and phosphate-binding protein (1IXH), for which very high resolution results have been reported by X-rays. All hydrogen atoms were removed from the PDB files and the algorithm was activated to reconstruct their locations, assuming them to be in optimal positions in the crystal.
- 5PTI: Bovine Pancreatic Trypsin Inhibitor
- 1NTP: Trypsin
- 2MB5: carbonmonoxymyoglobin
- 1IXH: phosphate-binding protein
- an imaginary protein was constructed. It has 1186 amino acids, as illustrated in Figure 7, of which 13 are serines (presented as CPK models; 13 segments) and 1173 are glycines (0 segments). It has a globular shape with dimensions of 64 Å × 64 Å × 61 Å.
- the serine hydroxyl oxygens were positioned to be at least 10 Å apart. In this case, the interactions between the hydroxyls can be neglected, and each segment can be treated as a separate ensemble. All possible combinations in this ensemble may be evaluated to obtain the global minimum for the system.
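Why treating well-separated segments as separate ensembles is both cheap and exact: with no cross terms, the total energy is a sum of per-segment terms, so the global minimum is the sum of per-segment minima. This needs Σ V_s evaluations instead of Π V_s. A small Python check, assuming random per-position energies and 3 positions per segment (hypothetical numbers, not the patent's data):

```python
import itertools
import math
import random
from typing import List, Sequence


def separable_minimum(tables: Sequence[Sequence[float]]) -> float:
    """Global minimum when each segment is its own ensemble (no cross terms)."""
    return sum(min(t) for t in tables)


def brute_force_minimum(tables: Sequence[Sequence[float]]) -> float:
    """Exhaustive minimum over every combination of positions."""
    return min(sum(combo) for combo in itertools.product(*tables))


rng = random.Random(7)
tables: List[List[float]] = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(5)]
assert math.isclose(separable_minimum(tables), brute_force_minimum(tables))

# Evaluation counts for 13 independent serine segments with 3 positions each.
print(13 * 3, 3 ** 13)  # 39 1594323
```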
- Bovine pancreatic trypsin inhibitor (5PTI, 1.8 Å resolution)
- the structure of bovine pancreatic trypsin inhibitor was determined by joint X-ray (1.0 Å resolution) and neutron diffraction (1.8 Å resolution) (Wlodawer et al. J Mol. Biol. 1987;193:145-156).
- This PDB file contains 58 amino acid residues and coordinates for 63 water molecules.
- a 2.5 Å water layer containing 54 water molecules was included in this calculation.
- a potassium ion and a PO₄³⁻ ion from the PDB file were also included in the calculation.
- the atoms in the side chains of residues GLU 7 and MET 52 were found to occupy two major sites.
- the *A* form was chosen for the calculation. Groups of rotatable atoms at a distance lower than 4.5 Å were defined as one ensemble. The total was 21 ensembles and 256 possible locations.
- Figure 8 depicts ln(total number of possible combinations) vs. the iteration number. The initial number of combinations is 1.19* 10 , of those, only 2690 remain for the exhaustive calculation after 443 iterations.
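The quantity plotted in Figure 8 follows from the values remaining for each variable: the number of combinations is the product of the domain sizes, so its natural logarithm is a sum of logarithms. That sum is convenient for plotting and for comparing counts spanning many orders of magnitude. A minimal sketch; the domain sizes are hypothetical.

```python
import math
from typing import Sequence


def log_combinations(domain_sizes: Sequence[int]) -> float:
    """Natural log of the number of combinations: sum of ln(values per variable)."""
    if any(n < 1 for n in domain_sizes):
        raise ValueError("every variable needs at least one remaining value")
    return sum(math.log(n) for n in domain_sizes)


# Hypothetical domain sizes before and after one round of eviction.
before = [12, 9, 3, 27, 6]
after = [5, 9, 2, 10, 6]
print(log_combinations(before), log_combinations(after))
```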
- Figure 9a depicts the energy distribution in the 1st and 4th iterations. The x-axis does not hold the same energy values for all iterations, because the average energy of the samples taken decreases in progressive iterations. Therefore, the samples are divided among 30 columns: the lowest energy samples are in column 1 and the highest in column 30. The number of samples taken in all iterations is constant. It can be seen that the algorithm eliminates energy bumps along the iterative process; therefore, the energy distribution becomes more bell shaped as the iterations progress.
- The 5RSA PDB file contains 124 amino acid residues, a PO₄³⁻ ion and coordinates of 128 water molecules. A 2.5 Å water layer containing 90 water molecules was included in this calculation. The four histidine residues of 5RSA were retained in the calculation in their protonated form, as found in the PDB file.
- Groups of rotatable atoms at a distance lower than 4.5 Å were defined as one ensemble. A total of 37 ensembles and 485 possible locations (Table II) was obtained. The "combined ensemble-stochastic approach" was employed. Ensembles 2, 7, 10,
- Figure 9b depicts the energy distribution in the 1st and 4th iterations. Due to the absence of energy bumps, the energy distribution remains bell shaped during the minimization.
- the structure of myoglobin was determined by neutron diffraction (1.8 Å resolution).
- This PDB file contains 153 amino acid residues and coordinates for 89 water molecules (including their protons). It contains protoporphyrin with Fe, an ammonium ion and a sulfate ion. All waters, ions and the protoporphyrin moiety were included in the calculation. The HEM CO atoms are disordered; the *A* form was chosen for the calculation. The "combined ensemble-stochastic approach" was employed, as illustrated in Table III. Groups of rotatable atoms at a distance lower than 4.5 Å were defined as one ensemble. A total of 43 ensembles was obtained.
- Trypsin (1NTP, 1.8 Å resolution): the structure of trypsin was determined by neutron diffraction (1.8 Å resolution) (Kossiakoff, Basic Life Sci 1984; 27:281-304). The enzyme is inhibited by a monoisopropylphosphoryl derivative, which was taken into account in the calculation. A calcium ion with a 2+ charge was added according to the indications in the PDB file and was positioned close to GLU 70, ASN 72, VAL 75 and GLU 80. This structure does not contain any water of crystallization. Groups of rotatable atoms at a distance lower than 4.5 Å were defined as one ensemble.
- Table IV lists a total of 33 ensembles, with a minimal energy of 483.9 kcal/mol.
- Phosphate-binding protein has been determined by X-ray diffraction (Wang et al., Nat. Struct. Biol. 1997;4:519-522).
- the PDB file contains 321 amino acid residues. No water molecules' coordinates are reported.
- the protein is complexed with a PO₄ phosphate ion with a charge of −3. The ion was included in the calculations.
- This entry contains six disordered residues: Glu 1, Ser 3, Thr 162, Pro 216, Ser 234, Lys 245. The *A* form was chosen for all of them.
- the "combined ensemble-stochastic approach" was employed, as illustrated in Table V. Groups of rotatable atoms at a distance lower than 4.5 A were defined as one ensemble. A total number of 45 ensembles was obtained.
- the five systems should be divided into two categories. The first are systems that lack experimental data for the coordinates of water molecules: trypsin (1NTP) and the phosphate-binding protein (1IXH). Figure 10 shows a ribbon display of 1NTP and its polar residues. Many polar hydrogens should form hydrogen bonds to water molecules; however, no water coordinates are included in this PDB entry.
- the method of the present invention lacks, in this case, essential data for correct positioning of polar protons for residues on the protein's surface.
- 5RSA, 5PTI and 2MB5 are systems with much experimental data regarding water positions. Those are the three most important for this study, and a good algorithm is expected to yield accurate proton predictions for them.
- the results of the methods for locating protons in biomolecular structures should be evaluated by a few criteria.
- the quality of the results should be examined in comparison to previously described methods as well as with respect to the ultimate goal, which is to achieve a negligible RMS for theoretical proton coordinates compared to experimental ones.
- the "combined ensemble-stochastic approach" and pure "stochastic approach" results were compared to experiment, to a CVFF minimization using the MSI Discover/InsightII software package, to the method of Brunger and Karplus, and to that of Bass et al., as shown in Table VI.
- the CVFF minimization employed the "steepest descents" algorithm for the first 100 iterations, followed by conjugate gradients until convergence with a maximum derivative lower than 0.001 kcal/Å was achieved.
- the present invention has two additional improvements over Bass et al.
- a bell shaped distribution in the first iterations indicates that there are no bumps between rotatable hydrogens.
- the "regular" bell shape of energy distributions for rotatable protons' positions obtained after a few iterations, may be an expression of the proteins' density in the vicinity of those protons: a "dense" protein should increase the barriers for rotations. Thus, its energies should be skewed towards the high end of the energy spectrum.
- the bell shape may be a demonstration of relative "free rotation" of those protons in a less dense surrounding.
- the present invention is also particularly useful for solving the problem of correctly determining the locations of amino acid side chains within a protein.
- This specific implementation of the present invention solves a difficult problem, by enabling such locations to be determined with some accuracy, without undue assumptions but also without a combinatorial explosion.
- the code uses a backbone-dependent rotamer library.
- the August 1997 update of the rotamer library of Dunbrack & Karplus was used in the tests described below.
- a united atom model is employed (Weiner et al., J Amer. Chem. Soc. 1984; 106: 765-784).
- the torsion energy term is calculated for all dihedral angles of each residue's rotamers. If the non-bonded energy term exceeds 10 kcal/mol for a given pair of atoms, it is truncated to 10 kcal/mol.
- Every rotamer is given a local energy based on its probability in the backbone-dependent rotamer library.
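The two energy rules just stated can be sketched as follows: each atom-pair non-bonded term is capped at 10 kcal/mol, and each rotamer receives a local energy from its library probability. The source does not give the probability-to-energy conversion at this point. The sketch ASSUMES the common Boltzmann-style form E = −RT ln p with RT ≈ 0.593 kcal/mol (about 298 K); that form, the constant and the function names are illustrative assumptions, not the patent's equation.

```python
import math
from typing import Iterable

NONBONDED_CAP = 10.0  # kcal/mol, truncation value stated in the text
RT = 0.593            # kcal/mol at ~298 K; ASSUMED conversion constant


def truncated_pair_energy(e: float, cap: float = NONBONDED_CAP) -> float:
    """Cap a single atom-pair non-bonded energy at `cap`."""
    return min(e, cap)


def rotamer_local_energy(probability: float) -> float:
    """ASSUMED form -RT ln p: likelier rotamers receive lower energy."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -RT * math.log(probability)


def rotamer_energy(pair_energies: Iterable[float], probability: float) -> float:
    """Capped non-bonded sum plus the (assumed) probability-derived local term."""
    capped = sum(truncated_pair_energy(e) for e in pair_energies)
    return capped + rotamer_local_energy(probability)
```

Capping keeps a single severe clash from dominating the sum, so a rotamer with one bad contact is penalized but still comparable to its alternatives.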
- the search strategy includes several steps:
- the input for the calculation is the backbone (N, Cα, C, O) coordinates of a protein with known structure. These, together with the φ and ψ angles of the backbone, are used to create the initial placement of possible rotamers for each residue. Possible disulfide bonds between cysteine residues are determined from the distance between sulfur atoms. All rotamers that clash with the backbone are excluded. If all rotamers of a residue clash with the backbone, the rotamer with the lowest "clash energy" remains.
- the algorithm treats residues left with a single rotamer as part of the backbone, i.e. other rotamers that clash with those residues will also be excluded.
- the algorithm also searches for all side chain clashes between rotamer i of amino acid j and rotamer k of amino acid l.
- the algorithm excludes such pairs from being part of the solution, and therefore they are not sampled in the stochastic stage (vide infra).
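The deterministic pre-filter described above can be sketched in Python. It mirrors the three rules in the text. First, it drops rotamers that clash with the backbone, keeping the least-clashing one if all clash. Second, it treats residues left with a single rotamer as backbone, repeating until nothing changes. Third, it records the remaining clashing rotamer pairs so they are never sampled together. The clash callables, the threshold and the function name are hypothetical, and leaving a residue unchanged when every rotamer clashes with a fixed residue is an assumption (the text does not cover that case).

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Choice = Tuple[int, int]  # (residue number, rotamer index)


def prefilter_rotamers(
    rotamers: Dict[int, List[int]],
    backbone_clash_energy: Callable[[int, int], float],
    rotamers_clash: Callable[[int, int, int, int], bool],
    clash_threshold: float,
) -> Tuple[Dict[int, List[int]], Set[Tuple[Choice, Choice]]]:
    # Rule 1: drop rotamers that clash with the backbone, but never empty a residue.
    kept: Dict[int, List[int]] = {}
    for res, rots in rotamers.items():
        ok = [r for r in rots if backbone_clash_energy(res, r) < clash_threshold]
        kept[res] = ok or [min(rots, key=lambda r: backbone_clash_energy(res, r))]

    # Rule 2: residues left with one rotamer act as backbone; repeat until stable.
    changed = True
    while changed:
        changed = False
        fixed = [(res, rots[0]) for res, rots in kept.items() if len(rots) == 1]
        for res, rots in kept.items():
            if len(rots) == 1:
                continue
            ok = [r for r in rots
                  if not any(rotamers_clash(res, r, fr, fk) for fr, fk in fixed)]
            if ok and len(ok) < len(rots):  # if every rotamer clashes, leave as is
                kept[res] = ok
                changed = True

    # Rule 3: remaining clashing pairs are forbidden in the stochastic stage.
    forbidden: Set[Tuple[Choice, Choice]] = set()
    for (ra, rots_a), (rb, rots_b) in combinations(sorted(kept.items()), 2):
        for a in rots_a:
            for b in rots_b:
                if rotamers_clash(ra, a, rb, b):
                    forbidden.add(((ra, a), (rb, b)))
    return kept, forbidden


# Toy data: residue 2 clashes with the backbone in every rotamer, so it keeps its best one.
clashes = {((1, 1), (2, 0)), ((3, 0), (1, 0))}


def pair_clash(ra: int, a: int, rb: int, b: int) -> bool:
    return ((ra, a), (rb, b)) in clashes or ((rb, b), (ra, a)) in clashes


bb = {(1, 2): 50.0, (2, 0): 20.0, (2, 1): 30.0}
kept, forbidden = prefilter_rotamers(
    {1: [0, 1, 2], 2: [0, 1], 3: [0, 1]},
    lambda res, r: bb.get((res, r), 0.0),
    pair_clash,
    clash_threshold=10.0,
)
print(kept, forbidden)
```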
- Stochastic stage: in the case of a large biological system such as a protein, a very large combinatorial problem results.
- the novel stochastic algorithm is employed.
- H contains all variable values satisfying $E_i > \mathcal{E}(1-\alpha)$, where $\mathcal{E}(\alpha)$ is the α-th percentile of $\mathcal{E}$, while L contains all variable values satisfying $E_i < \mathcal{E}(\alpha)$.
- $P_{\text{error}} = \ldots$, where $V_m$ is the number of values (rotamers) of variable $m$.
- $P_{\text{error}} \to 0$.
- $P_{\text{error}} \to 0$, but the odds of evicting any variable value are very low.
- the stochastic algorithm was applied to 10 proteins of various sizes (46 to 263 residues) and complexity ($1.04 \times 10^{14}$ to $2.29 \times 10^{105}$ possible combinations after elimination of rotamers that clash with the backbone), which were chosen to cover a range of protein fold families.
- six of the proteins are small (46-68 residues).
- These proteins are: Crambin (PDB entry 1crn) (Teeter et al, J Mol Biol.
- the remaining proteins selected were larger (129-263 residues), with high-resolution X-ray structures (resolution ≤ 1.5 Å, R factor ≤ 0.17): Lysozyme (2ihl), Ribosomal protein (1whi) (Davies et al., Structure 1996; 4:55-66), Endonuclease (2end) (Morikawa et al., Science 1992;256:523-526) and Hydrolase (1arb) (Tsunasawa et al., J. Biol. Chem. 1989; 264:3832-3839). Table VII summarizes the results of applying the stochastic algorithm to the 10 proteins.
- Average RMS values for the 1000 low energy conformers are somewhat larger than for the global minimum, but for each protein, conformations that are higher in energy than the global minimum are found, that have a lower RMS than that minimum.
- the range of energy values for the 1000 lowest energy conformers extends up to 5.52 kcal/mol above the global minimum.
- the average energy gap of the 1000 lowest energy conformers from the global minimum is always small (2.20 kcal/mol for all the proteins).
- the first question is whether the stochastic search achieves the results that could be obtained by an exhaustive search, given a specific rotamer library.
- the second question is whether such a search can identify the crystallographic structure of a protein if the rotamer library includes the original X-ray rotamers.
- the first question requires a test on a relatively small protein, for which such an exhaustive search may be carried out. Given the constraints of the energy function and the rotamer library, our stochastic algorithm was used to find the lowest energy combinations in a test protein, and these were compared to the results of an exhaustive search.
- the entry contains 46 amino acid residues (see Figure 11) and coordinates for an ethanol molecule. There are 8 disordered residues (Thr 1, Thr 2, Ile 7, Val 8, Arg 10, Asn 12, Ile 34, Thr 39). In order to evaluate this protein in a reasonable time period, Arg 10 (the A form of this disordered residue), Arg 17, Glu 23, Ile 33 and Ile 35 were kept fixed in their original positions. The initial number of combinations (following the elimination step for steric clashes) was 6.79* 10 . In Figure 12, the results of the stochastic and exhaustive searches for a range of N low energy conformations are compared.
- the stochastic algorithm was employed with an extended rotamer library into which the crystal rotamers of 1crn were introduced. No residues were fixed during this search. Energies were computed by equation 1 without the probability term, which is not available for the crystal coordinates. The following residues were not included: four Gly (no side chain), five Ala (only one possible rotamer) and six Cys (no rotamers, because all of them form S-S bonds). Therefore, out of 46 amino acids in the sequence, 31 remained for this comparison. The energy of the protein in its crystal structure coordinates was 3.41 kcal/mol higher than the global minimum found by the stochastic algorithm.
- each rotamer was located as close as possible to the relevant side chain in the crystal structure.
- the RMS value obtained was 1.15.
- the RMS value between the global energy minimum in the stochastic search and the crystal structure was found to be 1.97.
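A minimal sketch of the usual root-mean-square deviation between matched atoms, assuming both coordinate sets are already in the same frame. Side chains are built on the crystal backbone, so no superposition step is included. The source does not state which atoms enter its RMS values, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rms_deviation(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """sqrt(mean squared distance) over matched atom pairs, no superposition."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(pred, ref))
    return math.sqrt(sq / len(pred))


print(rms_deviation([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]))
```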
- The order parameter of the dihedral angle of residue i is $S_i = \frac{1}{N}\left|\sum_{j=1}^{N} \mathbf{u}_{ij}\right|$, where N is the total number of structures in the ensemble, $\mathbf{u}_{ij}$ (j = 1, ..., N) is a 2D unit vector with phase equal to the dihedral angle $\chi_{ij}$, i represents the residue number, and j is the index of the structure in the ensemble. If the angle is the same in all structures, then S has a value of 1, whereas a value of S much smaller than 1 indicates a disordered region of the structure. Philippopoulos & Lim limited their classification to S values greater than 0.8.
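The order parameter is the length of the average of unit vectors placed at each structure's dihedral angle, S = |(1/N) Σ_j u_j|. The sketch below, with a hypothetical function name and angles assumed to be in degrees, shows why this measure copes with the periodicity of dihedral angles: 175° and −175° give S close to 1, while 0° and 180° give S close to 0.

```python
import math
from typing import Sequence


def dihedral_order_parameter(angles_deg: Sequence[float]) -> float:
    """S = |(1/N) * sum_j u_j|, with u_j the 2D unit vector at angle chi_j."""
    n = len(angles_deg)
    if n == 0:
        raise ValueError("need at least one structure")
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return math.hypot(c, s)


print(dihedral_order_parameter([175.0, -175.0]))  # close to cos(5 deg), about 0.996
print(dihedral_order_parameter([0.0, 180.0]))     # close to 0
```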
- Table VIII contains a comparison between the stochastic algorithm, and the results of X-ray crystallography, NMR and MD. This table focuses on residues adopting highly probable conformations according to the following assumptions: In some cases torsion angles assumed a single conformation in the MD ensemble and multiple conformations in the NMR ensemble, while in others the reverse was obtained. We assume a high probability for an experimental rotamer if it obeys one or more of the following rules: (1) It appears in the high resolution crystal structure (2rn2). (2) It is found in at least two out of the three: low resolution crystal structure (lrnh), a NMR model and the MD simulation.
- a "hit" was defined as any result of the stochastic algorithm that deviates by up to ±30° from the "correct" conformer. Each such hit is marked by a "+" in the table. In some cases, angles such as χ1 of Met47 are represented by a single rotamer in the table and marked by "(+)". Such angles have additional values that do not obey the above two rules; those other values are considered to have low probability and do not appear in Table VIII. Of the 115 dihedral angles in Table VIII, 7 are missing from the rotamer library (see Figure 13A), and two others deviate by about 40° and were therefore excluded from the evaluation. Thus, a maximum of 106 "hits" may be expected in comparison with X-ray, NMR and MD. The stochastic algorithm correctly predicts 87 angles (see Figure 13B), i.e., 82%.
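The hit criterion and the Table VIII bookkeeping can be sketched as follows. This code assumes that a hit means a circular angular difference of at most 30° from any accepted experimental conformer. All function names and sample angles are hypothetical.

```python
from typing import Sequence, Tuple


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def is_hit(predicted: float, accepted: Sequence[float], tolerance: float = 30.0) -> bool:
    """True if the prediction lies within +/-tolerance of any accepted conformer."""
    return any(angular_difference(predicted, ref) <= tolerance for ref in accepted)


def count_hits(predicted: Sequence[float],
               accepted: Sequence[Tuple[float, ...]],
               tolerance: float = 30.0) -> int:
    """Number of predicted dihedrals that hit one of their accepted conformers."""
    return sum(is_hit(p, acc, tolerance) for p, acc in zip(predicted, accepted))


# Hypothetical data: three dihedrals, the second with two accepted conformers
predicted = [-65.0, 175.0, 60.0]
accepted = [(-60.0,), (-170.0, 60.0), (100.0,)]
print(count_hits(predicted, accepted))

# Bookkeeping used for Table VIII
total_angles = 115
missing_from_library = 7
excluded_deviations = 2
max_hits = total_angles - missing_from_library - excluded_deviations
print(max_hits, f"{87 / max_hits:.0%}")
```

The modular difference is needed because a prediction of 175° and a reference of -170° are only 15° apart, even though their numeric difference is 345°.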
- Leach & Lemon (Proteins 1998; 33: 227-239) explored the conformational space with the DEE/A* algorithm on a set of 8 proteins chosen to cover a range of protein fold families. The method of the present invention was then applied to 6 of those proteins (1crn, 1ctf, 1hcc, 2ovo, 3ebx, 5rxn).
- Snake venom neurotoxin (1nxb) (Tsernoglou et al., Mol Pharmacol. 1978; 14:710-716) was excluded due to an unknown residue type (residue 59).
- Bovine pancreatic trypsin inhibitor (5pti) (Wlodawer et al. J Mol. Biol.
- the previous description concerns the application of a novel stochastic search technique to explore the conformational space of protein side chains. It extends and refines the example in the previous section, which searched for the positions of polar protons in proteins.
- the algorithm successfully explores the conformational space of proteins of various sizes and can handle a large number of combinations after eliminating rotamers that clash with the backbone.
- of the angles of E. coli ribonuclease HI listed in Table VIII, 106 were expected to be detected by comparison with X-ray, NMR or MD data.
- the algorithm correctly detected 87 angles, i.e., 82% of the total. Part of the deviation from 100% accuracy may be due to the quality of the rotamer library, but a greater part is due to the energy function.
- Mendes et al. (Proteins 1999; 37: 530-543) presented a rotamer as a continuous ensemble of conformations that cluster around the classic rigid rotamer. Such a technique may increase the rotamer library's efficiency.
- X-ray crystallography usually suggests a single structure, which might be biased toward specific conformational substates in the crystal (Brunger, Nat. Struct. Biol. 1997; 4 suppl: 862-865). Observing different conformations may be possible only at the highest resolution.
- the advantage of our algorithm is straightforward: it extends the single conformation into a family of viable conformations.
- Unlike X-ray crystallography, NMR suggests alternative conformations by deciphering 2D and 3D coupling maps. However, NMR does not reveal the shape of the energy minima on the potential energy surface. NMR of proteins is a long and tedious experiment, limited by the time scale of conformational variations, especially in large proteins. In this case, the method of the present invention may serve as an additional tool for suggesting alternative conformations. When NMR structures are available, the method of the present invention may be employed to extend this information by determining the conformations' energy weights, thus enabling an assessment of their contribution to the overall population at equilibrium.
- MD simulations of biomolecules require extensive CPU time, which prohibits full exploration of the conformational space.
- MD suggests conformations that may not be detected by NMR or by X-ray crystallography.
- MD time scales and barrier crossing ability are not yet reliable enough for detecting the global minimum or the population of lowest energy conformations in large biomolecules.
- the reliability of our stochastic algorithm in finding both has been demonstrated in this paper.
- whereas MD trajectories imply a mechanism of conformational interconversion, the stochastic approach concentrates on products and not on pathways. Dill and Chan (Nature Struct. Biol. 1997; 4: 10-19; Chan & Dill, Proteins 1998, 30,
- the present stochastic search offers, in addition to finding the global minimum, the next N best solutions for rotamers in large proteins without any mean field approximation and is unique in that sense. It may thus be employed for studying thermodynamic properties of complex molecular systems.
- the stochastic algorithm can treat more than 250 residues (the maximum at this stage is 2.29 105 combinations).
- the DEE/A* algorithm treated a maximum of 68 residues, and the maximal number of combinations (before backbone clash exclusion) was 10^44. Following the application of the DEE algorithm, the size of the remaining space to be explored by the A* algorithm may be reduced to a maximum of 10^21.
- the quality of the method of the present invention is compared to the results of the combined DEE/A* algorithm (Leach & Lemon, Proteins 1998; 33: 227-239), with a different energy expression and with two different libraries.
- a comparison of each technique with experiment by RMS is limited, because the result is affected by the rotamer library:
- an RMS value of 2.0 with a rotamer library whose lowest achievable RMS value for a protein is 1.9 reflects a better search technique than an RMS value of 1.5 obtained with a library whose optimal RMS is 0.1.
- the RMS values should be compared to the optimal RMS value that could be achieved within the constraints for the rotamer library.
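One way to express this comparison is to compute the best RMS the library allows and report the excess of the search result above it. Following the procedure described earlier, each rotamer is placed as close as possible to the crystal side chain. The sketch assumes a fixed, shared backbone, so no superposition step is applied. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
SideChain = List[Vec]   # side-chain atom coordinates in a fixed atom order


def squared_deviation(a: SideChain, b: SideChain) -> float:
    """Sum of squared atom-atom distances between two matched side chains."""
    return sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
               for p, q in zip(a, b))


def rms(model: Dict[str, SideChain], reference: Dict[str, SideChain]) -> float:
    """RMS deviation over all side-chain atoms (no superposition: shared backbone)."""
    total = sum(squared_deviation(model[res], reference[res]) for res in reference)
    n_atoms = sum(len(atoms) for atoms in reference.values())
    return math.sqrt(total / n_atoms)


def library_optimal_rms(library: Dict[str, List[SideChain]],
                        reference: Dict[str, SideChain]) -> float:
    """Lowest RMS reachable with the library: pick, per residue, the closest rotamer.

    Because the total squared deviation is a sum over residues, choosing the
    closest rotamer independently for each residue minimizes the overall RMS.
    """
    best = {res: min(library[res],
                     key=lambda rot, r=res: squared_deviation(rot, reference[r]))
            for res in reference}
    return rms(best, reference)


def excess_rms(search_rms: float, optimal_rms: float) -> float:
    """How far a search result lies above what the library could achieve."""
    return search_rms - optimal_rms


# The two cases discussed above
print(f"{excess_rms(2.0, 1.9):.1f}")   # search with a coarse library
print(f"{excess_rms(1.5, 0.1):.1f}")   # search with a fine library
```

On this measure, the first search comes within 0.1 of its library's limit, while the second falls 1.4 short of its own.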
- the present invention is also particularly useful for solving the problem of correctly predicting the structure of loops within a protein.
- This specific implementation of the present invention solves a difficult problem by enabling such predictions to be made with some accuracy, without undue assumptions but also without a combinatorial explosion.
- the specific implementation of the present invention described in this section under "Methods" was also tested against other methods known in the art, as described under "Results and Discussion". It should be noted that these methods and results are presented for the purposes of illustration only and are not intended to be limiting in any way. The interpretation of these results is then discussed.
- loop modeling may be achieved by several strategies. Most of them employ standard bond lengths and bond angles while varying dihedral angles only. This particular implementation of the method of the present invention follows this general path, while deviating from it in several steps.
- Geometric premises. Figure 15 depicts an example of 6 residues (0-5). Residues 0 and 5 are in the invariable part of the protein, and a search is performed for the conformations of residues 1-4. The loop is constructed simultaneously from both the N- and C-termini (Moult & James, Proteins 1986; 1: 146-163), and loop closure is tested between residues 2 and 3. This construction strategy reduces the accumulation of errors: when a loop is built by dihedrals from one terminus toward the other, an inaccuracy in the first residues leads to increasing deviations in subsequent residues.
- Figure 16 depicts the dihedral angle definitions for a given residue: ψ of residue n, in this construction strategy, is the ψ of the previous residue toward the N-terminus. The rationale behind this definition is that φn and ψn together define the locations of the N and C atoms of residue n.
- the nitrogen of residue 1, the first to be predicted, is located according to the ψ angle of the preceding residue.
- the exemplary method of the present invention assumes a trans (180°) conformation for Cα-C-N-Cα.
- Cα is located according to this premise.
- the carbonyl carbon of residue 1 is located according to φ1, which is extracted from the search (vide infra).
- the nitrogen of residue 2 is located according to ψ2 (which is conventionally defined as ψ1), and so on.
- the carbonyl carbon of residue 4 is located by φ5.
- Cα of residue 4 is placed trans (180°) to the Cα of residue 5.
- the N of residue 4 is located according to ψ5.
- residue 3 is located on the basis of φ4 and ψ4 as defined in Figure 16. Thus, the values of φ3 and ψ3 are not required.
- the method of the present invention searches SWISS-PROT (Bairoch & Apweiler, Nucleic Acids Res. 2000; 28: 45-48) for overlapping 3-residue segments of each loop. Given a protein with the sequence ...ACGDEIL..., where 'A' is residue 0 in Figure 15 and CGDE is the loop, the method of the present invention searches for the segments ACG, CGD, GDE, DEI and EIL. The Brookhaven Protein Data Bank (Bernstein et al., J.
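The segment enumeration can be sketched in a few lines. The sketch makes two assumptions that the source does not state. First, the input string runs from residue 0 to one residue past the C-terminal anchor, as in the ACGDEIL example. Second, each triplet contributes (φ, ψ) candidates for its central residue. The fragment table below is made up, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

PhiPsi = Tuple[float, float]


def overlapping_triplets(flanked: str) -> List[str]:
    """All overlapping 3-residue windows of a sequence, left to right."""
    return [flanked[i:i + 3] for i in range(len(flanked) - 2)]


def candidate_angles(flanked: str,
                     fragment_db: Dict[str, List[PhiPsi]]) -> List[Tuple[int, List[PhiPsi]]]:
    """Map each window to its central residue index and the (phi, psi) pairs
    recorded for that window in a hypothetical fragment database."""
    return [(i + 1, fragment_db.get(tri, []))
            for i, tri in enumerate(overlapping_triplets(flanked))]


# 'A' is residue 0, CGDE is the loop (residues 1-4), 'I' is residue 5
segments = overlapping_triplets("ACGDEIL")
print(segments)

toy_db = {"CGD": [(-63.0, -41.0), (-75.0, 145.0)]}   # made-up entries
print(candidate_angles("ACGDEIL", toy_db))
```

Under this reading, the windows are centred on residues 1 through 5. This covers the four loop residues plus the C-terminal anchor, whose φ and ψ are used when building from the C-terminus.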
- the purpose of the stochastic stage is to generate a population of loops that could potentially close.
- loops which remain open are evicted.
- the method of the present invention explores the conformational space using the cost function in equation 2.
- the k_θ parameter controls the stiffness of the angle spring, while θ0 defines its equilibrium angle.
- Unique angle-bending parameters are assigned to each bonded triplet of atoms based on their atom types. Two triplets were employed. The first was the C⁇-N-C triplet (d2 in Figure 15), where C⁇ is part of the previous residue. The second was the Cα-C-N triplet, where Cα and C are part of the previous residue (d3 in Figure 15).
- the torsion energy is modeled by a periodic function (equation 6):
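Equations 2 and 6 are not reproduced in this excerpt. The sketch below assumes common force-field forms for the two terms: a harmonic angle spring, E = k_θ(θ - θ0)², and an AMBER-style periodic torsion, E = (V_n/2)[1 + cos(nφ - γ)]. These functional forms, the symbols V_n, n and γ, and all parameter values are assumptions, not the source's definitions.

```python
import math


def angle_bending_energy(theta_deg: float, k_theta: float, theta0_deg: float) -> float:
    """Harmonic angle spring k_theta * (theta - theta0)^2.

    Angles are given in degrees and converted to radians, so k_theta is
    in energy units per rad^2.
    """
    d = math.radians(theta_deg - theta0_deg)
    return k_theta * d * d


def torsion_energy(phi_deg: float, v_n: float, n: int, gamma_deg: float) -> float:
    """Periodic torsion term (V_n / 2) * (1 + cos(n * phi - gamma))."""
    return 0.5 * v_n * (1.0 + math.cos(math.radians(n * phi_deg - gamma_deg)))


# Hypothetical parameters, for illustration only
print(angle_bending_energy(120.0, 50.0, 116.0))
print(torsion_energy(180.0, 2.0, 1, 0.0))
print(torsion_energy(0.0, 2.0, 1, 0.0))
```

In a loop cost function, terms like these would be summed over the d2 and d3 triplets and the varied dihedrals of each candidate loop.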
- the above tests were intended to verify whether the novel stochastic search method may be applicable also to loop construction and whether it may be employed for the reconstruction of structurally known loops of varying size.
- the example used was a transmembrane protein.
- the only extensive experimental example is bacteriorhodopsin, which contains 7 transmembrane helices and was recently studied by high-resolution crystallography (Luecke et al., J. Mol. Biol. 1999; 291: 899-911).
- the search was applied to this structure (X-ray results at 1.55 Å resolution, PDB file 1c3w).
- the six loops of bacteriorhodopsin are listed in Table X.
- Loops 3 (CD, intracellular) and 4 (DE, extracellular) contain 2 and 1 residues, respectively, and are therefore not interesting test cases.
- loop 5 (EF, intracellular)
- the 1c3w.pdb entry was not included in creating the residues' (φ, ψ) angle database that is employed for the stochastic search.
- the RMS values ranged from 0.28 to 2.46 (Table XI), with an average value of 1.35.
- This fragment set is scored and sorted using an RMS fit to the anchor regions and a knowledge-based energy function.
- Van Vlijmen and Karplus employed a search on a database composed of 130 loops from 21 proteins. The best loops among the large number of candidates were determined by a CHARMM
- the method of the present invention was employed on the first loop of bacteriorhodopsin (vide infra).
- the RMS value between predicted and experimental backbones was 0.280.
- the real experimental dihedral angles were added to the angles' database, and the rest of the dihedral angles were deleted.
- the only option for the method of the present invention was then to construct the system according to the experimental dihedral angles. If the remaining angles and bond lengths were identical to the experimental ones, one might expect to obtain an RMS value of 0. However, an RMS value of 0.204 resulted. This indicates that the standard-geometry approximation has a minor but not negligible effect, which must be taken into consideration, especially when building large loops, where the accumulation of errors might skew the results.
- the second question concerned the accuracy of the approximation of evicting (φ, ψ) angle pairs that differ by less than 2° (in both angles) from another pair of the same residue.
- the previous test was repeated with a slight change: all the experimental dihedral angles were increased by 2°. Surprisingly, an RMS value of 0.198 resulted. Repeating the same test with a 2° decrease for all dihedral angles resulted in an RMS value of 0.220. Given such minor differences, the approximation can be considered appropriate.
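The eviction rule can be sketched as a single pass over a residue's observed (φ, ψ) pairs. A pair is dropped when it lies within 2° of an already kept pair in both angles, with circular differences so that 179° and -179.5° count as close. The "first occurrence wins" order is an assumption, and the names and data are hypothetical.

```python
from typing import List, Tuple

PhiPsi = Tuple[float, float]


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def evict_near_duplicates(pairs: List[PhiPsi], tol: float = 2.0) -> List[PhiPsi]:
    """Keep a (phi, psi) pair only if, against every pair already kept, it
    differs by at least `tol` degrees in phi or in psi. A pair closer than
    `tol` in both angles to a kept pair is evicted (first occurrence wins)."""
    kept: List[PhiPsi] = []
    for phi, psi in pairs:
        if all(angular_difference(phi, kphi) >= tol or angular_difference(psi, kpsi) >= tol
               for kphi, kpsi in kept):
            kept.append((phi, psi))
    return kept


# Made-up (phi, psi) observations for one residue
observed = [(-60.0, -40.0), (-61.0, -41.0), (-60.0, -43.0), (179.0, 150.0), (-179.5, 151.0)]
print(evict_near_duplicates(observed))
```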
- Section 4 Examples of Other Biological Problems
- Homology modeling, i.e., the construction of unknown protein structures on the basis of proteins known from X-ray or NMR studies, requires "insertions" and "deletions" of peptide fragments, as well as mutations, relative to the known structure.
- the homologous parts of the target (to be constructed) are superimposed, residue by residue, over those of the known protein.
- Other parts may differ in length; these are typically encountered in loops, beta-hairpins and random coil parts of the known protein.
- Each such operation requires a re-evaluation of the backbone coordinates in those non-homologous parts, due to length differences ("insertions" and "deletions"), as well as of the side chain positions, at least in the vicinity of the modified part of the structure.
- Any planning of mutations in known protein structures may be aided by constructing models with an initial intact rigid backbone. Substantial progress in solving this acute problem has already been achieved by the method of the present invention.
- Cyclization of active peptides and other linear molecules is one of the methods of choice for increasing their binding to biological receptors (owing to the expected reduction in entropy loss), increasing their stability to digestion, and strengthening their specificity and selectivity, among other benefits.
- the design of such cyclic structures may be aided considerably by preliminary modeling of the alternatives for ring closure. This is a function of many variables such as ring size, bond lengths, bond angles, and other factors. This problem is quite similar to that of loop structure prediction with regard to the present invention.
- cyclic peptides are smaller than loops, and so less "freedom" may be introduced into the conformational flexibility of the backbone and of side chains. Also, relatively small increments for phi and psi (backbone) angles are required for a thorough search for ring closure options.
- this is an extension of the problem of side chain positioning and also of determining a structure of a loop of a protein, differing from it in the need to move the drug by six degrees of freedom (translational + rotational) with respect to the biomolecular active site.
- the present invention must handle both the location of the side chains and the loops (backbone variations) predictions described earlier, but with the optimization applied to both a biomolecular target and a ligand at once, with the additional need to optimize their relative positions.
- Those additional degrees of freedom may optionally be introduced as variables, but with special requirements.
- the problem is analyzed according to the method of the present invention with an additional variable, namely the relative distance between the entities (for example, the drug and the biomolecular active site).
- the variables thus include variables for distance and angles, for a total of six additional variables for translations and rotations.
- the variables thus preferably include variables for distance and angles, for a total of six such variables: three translations along the X, Y and Z coordinate axes and three rotations about the same axes.
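The six rigid-body variables can be applied to a ligand's coordinates as sketched below. The sketch makes two assumptions that the source does not specify: rotations are applied about the ligand centroid, and they are applied in the order X, then Y, then Z. The function names and the two-atom ligand are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Matrix = List[List[float]]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(rx: float, ry: float, rz: float) -> Matrix:
    """Rotation by rx, ry, rz degrees about the X, Y and Z axes (applied X, then Y, then Z)."""
    cx, sx = math.cos(math.radians(rx)), math.sin(math.radians(rx))
    cy, sy = math.cos(math.radians(ry)), math.sin(math.radians(ry))
    cz, sz = math.cos(math.radians(rz)), math.sin(math.radians(rz))
    r_x = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    r_y = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    r_z = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(r_z, _matmul(r_y, r_x))


def apply_pose(coords: Sequence[Vec], pose: Sequence[float]) -> List[Vec]:
    """Apply the six rigid-body variables (tx, ty, tz, rx, ry, rz) to a ligand.

    Rotations are taken about the ligand centroid, so the three translations
    and the three rotations act independently.
    """
    tx, ty, tz, rx, ry, rz = pose
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    r = rotation_matrix(rx, ry, rz)
    moved = []
    for p in coords:
        local = [p[i] - centroid[i] for i in range(3)]
        rotated = [sum(r[i][k] * local[k] for k in range(3)) for i in range(3)]
        moved.append((rotated[0] + centroid[0] + tx,
                      rotated[1] + centroid[1] + ty,
                      rotated[2] + centroid[2] + tz))
    return moved


ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]     # hypothetical two-atom ligand
print(apply_pose(ligand, (0.0, 0.0, 5.0, 0.0, 0.0, 90.0)))
```

Rotating about the centroid keeps the translation variables meaningful on their own. A pose search can then treat the six values as independent variables alongside the side-chain and loop variables.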
- Such comparisons enable the assessment of the possibility that different molecules may be attached to the same biomolecular site/target.
- Two different molecules may display similar binding affinities to enzyme active sites or to a receptor.
- the method of the present invention enables the structural differences between such molecules to be optimized, in order to find candidates for a "bioactive conformation" of both.
- This problem presents another conformational search for the present invention, but the function or quantitative parameter to be minimized in this case would be the RMS difference between spatial positions of selected atoms in the two molecules.
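The quantity to be minimized here can be sketched as an RMS over matched atom pairs. The sketch assumes the two molecules are already expressed in a common frame, for example because the relative placement is itself part of the search. It therefore performs no superposition. The names and coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def rms_selected(mol_a: Sequence[Vec], mol_b: Sequence[Vec],
                 pairs: List[Tuple[int, int]]) -> float:
    """RMS distance between matched atoms, given as (index in A, index in B).

    Coordinates are compared as given; no superposition is performed here.
    """
    if not pairs:
        raise ValueError("at least one atom pair is required")
    sq = sum(math.dist(mol_a[i], mol_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(sq / len(pairs))


# Hypothetical selected atoms: two matched atoms, one unmatched atom in B
mol_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mol_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (9.0, 9.0, 9.0)]
print(rms_selected(mol_a, mol_b, [(0, 0), (1, 1)]))
```

Only the listed pairs enter the sum, so unmatched atoms such as the third atom of `mol_b` do not affect the score.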
- Protein folding has been a central problem of biophysics in the last two decades.
- the method of the present invention may be applied to a set of proteins which have a relatively small number of residues, in the range of 50-80, depending on their primary structure.
- this approach can produce many other low energy conformations that are in the energy vicinity of the global minimum and contribute to the total character of the protein.
- the variables will be the phi and psi angles along the backbone (each with 6 or 12 values spaced 60° or 30° apart, respectively), as well as rotamers for the side chains.
- the size of the problem is 6 ⁇ 0 or ⁇ l()66. with the additional rotamers that should be positioned simultaneously, it increases to about lO ⁇ O.
- although the resultant calculations may be complex, they can be performed with the method of the present invention.
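The size of such a discretized search space is easiest to handle as a base-10 exponent. The sketch below assumes two backbone angles (φ, ψ) per residue, each taking 6 or 12 values, plus optional per-residue rotamer counts. It is illustrative only and does not attempt to reproduce the specific figures quoted above. The function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def log10_search_space(n_residues: int, values_per_angle: int,
                       angles_per_residue: int = 2,
                       rotamers_per_side_chain: Optional[Sequence[int]] = None) -> float:
    """log10 of the number of combinations for a discretized backbone
    (phi and psi per residue) plus optional side-chain rotamer choices."""
    exponent = n_residues * angles_per_residue * math.log10(values_per_angle)
    if rotamers_per_side_chain:
        exponent += sum(math.log10(r) for r in rotamers_per_side_chain)
    return exponent


for n in (50, 80):
    for k in (6, 12):   # 60-degree or 30-degree grid
        print(f"{n} residues, {k} values per angle: ~10^{log10_search_space(n, k):.0f}")
```

Working in logarithms keeps the numbers printable and lets side-chain rotamer counts be added as simple sums.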
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002391987A CA2391987A1 (en) | 1999-11-22 | 2000-11-22 | System and method for searching a combinatorial space |
IL14961200A IL149612A0 (en) | 1999-06-06 | 2000-11-22 | System and method for searching a combinatorial space |
EP00977840A EP1266337A2 (en) | 1999-11-22 | 2000-11-22 | System and method for searching a combinatorial space |
AU15469/01A AU780941B2 (en) | 1999-11-22 | 2000-11-22 | System and method for searching a combinatorial space |
JP2001540691A JP2003524831A (en) | 1999-11-22 | 2000-11-22 | System and method for exploring combinatorial space |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16674499P | 1999-11-22 | 1999-11-22 | |
US60/166,744 | 1999-11-22 | ||
US20980600P | 2000-06-07 | 2000-06-07 | |
US60/209,806 | 2000-06-07 |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2001039098A2 true WO2001039098A2 (en) | 2001-05-31 |
WO2001039098A3 WO2001039098A3 (en) | 2002-09-12 |
WO2001039098A8 WO2001039098A8 (en) | 2004-04-29 |
Family
ID=26862535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IL2000/000779 WO2001039098A2 (en) | 1999-06-06 | 2000-11-22 | System and method for searching a combinatorial space |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1266337A2 (en) |
JP (1) | JP2003524831A (en) |
AU (1) | AU780941B2 (en) |
CA (1) | CA2391987A1 (en) |
WO (1) | WO2001039098A2 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI413020B (en) | 2008-12-31 | 2013-10-21 | Ind Tech Res Inst | Method and system for searching global minimum |
AU2013302283B2 (en) * | 2012-08-17 | 2015-04-09 | Zymeworks Inc. | Systems and methods for sampling and analysis of polymer conformational dynamics |
LT2951579T (en) * | 2013-01-31 | 2024-05-27 | Codexis, Inc. | Methods, systems, and software for identifying bio-molecules using models of multiplicative form |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1998047089A1 (en) * | 1997-04-11 | 1998-10-22 | California Institute Of Technology | Apparatus and method for automated protein design |
-
2000
- 2000-11-22 WO PCT/IL2000/000779 patent/WO2001039098A2/en not_active Application Discontinuation
- 2000-11-22 CA CA002391987A patent/CA2391987A1/en not_active Abandoned
- 2000-11-22 AU AU15469/01A patent/AU780941B2/en not_active Ceased
- 2000-11-22 JP JP2001540691A patent/JP2003524831A/en active Pending
- 2000-11-22 EP EP00977840A patent/EP1266337A2/en not_active Withdrawn
Non-Patent Citations (4)
Title |
---|
CORNO F ET AL: "A New Evolutionary Algorithm Inspired by the Selfish Gene Theory" IEEE INTERNATIONAL CONFERENCE ON EVOLUTIONARY COMPUTATION - ICEC'98, [Online] 1998, pages 575-580, XP002199150 Retrieved from the Internet: <URL:http://citeseer.nj.nec.com/corno98new.html> [retrieved on 2002-05-08] * |
GORDON D B ET AL: "BRANCH-AND-TERMINATE: A COMBINATORIAL OPTIMIZATION ALGORITHM FOR PROTEIN DESIGN" STRUCTURE, CURRENT BIOLOGY LTD., PHILADELPHIA, PA, US, vol. 7, no. 9, 1999, pages 1089-1097, XP001028197 ISSN: 0969-2126 * |
LEACH AR AND LEMON AP: "Exploring the Conformational Space of Protein Side Chains Using Dead-End Elimination and the A* Algorithm" PROTEINS: STRUCTURE, FUNCTION, AND GENETICS, vol. 33, no. 2, 1 November 1998 (1998-11-01), pages 227-239, XP002199152 cited in the application * |
LIWO A ET AL: "Protein structure prediction by global optimization of a potential energy function" PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, [Online] vol. 96, May 1999 (1999-05), pages 5482-5485, XP002199151 Retrieved from the Internet: <URL:http://www.pnas.org/cgi/reprint/96/10/5482.pdf> [retrieved on 2002-05-08] * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7315786B2 (en) | 1998-10-16 | 2008-01-01 | Xencor | Protein design automation for protein libraries |
US7379822B2 (en) | 2000-02-10 | 2008-05-27 | Xencor | Protein design automation for protein libraries |
EP1510959A2 (en) * | 2001-08-10 | 2005-03-02 | Xencor | Protein design automation for protein libraries |
EP1510959A3 (en) * | 2001-08-10 | 2006-07-26 | Xencor, Inc. | Protein design automation for protein libraries |
WO2004013324A1 (en) * | 2002-08-01 | 2004-02-12 | Mochida Pharmaceutical Co., Ltd. | Novel crystalline tryptase and utilization thereof |
US8726100B2 (en) | 2011-02-03 | 2014-05-13 | Fujitsu Limited | Non-transitory computer-readable recording medium in which a failure analyzing program is recorded, failure analyzing apparatus, and method for analyzing failure |
CN112649802A (en) * | 2020-12-01 | 2021-04-13 | 中国人民解放军海军航空大学 | Tracking method before weak and small multi-target detection of high-resolution sensor |
CN114694759A (en) * | 2020-12-28 | 2022-07-01 | 富士通株式会社 | Stable structure search method, storage medium, and stable structure search apparatus |
Also Published As
Publication number | Publication date |
---|---|
AU780941B2 (en) | 2005-04-28 |
WO2001039098A3 (en) | 2002-09-12 |
AU1546901A (en) | 2001-06-04 |
CA2391987A1 (en) | 2001-05-31 |
EP1266337A2 (en) | 2002-12-18 |
WO2001039098A8 (en) | 2004-04-29 |
JP2003524831A (en) | 2003-08-19 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
WWE | Wipo information: entry into national phase | Ref document number: 15469/01 Country of ref document: AU |
WWE | Wipo information: entry into national phase | Ref document number: 149612 Country of ref document: IL |
WWE | Wipo information: entry into national phase | Ref document number: 2391987 Country of ref document: CA |
ENP | Entry into the national phase | Ref country code: JP Ref document number: 2001 540691 Kind code of ref document: A Format of ref document f/p: F |
WWE | Wipo information: entry into national phase | Ref document number: 2000977840 Country of ref document: EP |
AK | Designated states | Kind code of ref document: A3 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A3 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
WWP | Wipo information: published in national office | Ref document number: 2000977840 Country of ref document: EP |
CFP | Corrected version of a pamphlet front page | |
CR1 | Correction of entry in section i | Free format text: IN PCT GAZETTE 22/2001 DUE TO A TECHNICAL PROBLEM AT THE TIME OF INTERNATIONAL PUBLICATION, SOME INFORMATION WAS MISSING UNDER (81). THE MISSING INFORMATION NOW APPEARS IN THE CORRECTED VERSION |
WWG | Wipo information: grant in national office | Ref document number: 15469/01 Country of ref document: AU |
WWW | Wipo information: withdrawn in national office | Ref document number: 2000977840 Country of ref document: EP |