WO1997036252A1 - Computer-based method for designing chemical structures sharing common functional characteristics

Publication number
WO1997036252A1
Authority
WO
WIPO (PCT)
Prior art keywords
affinity
receptor
character
maximal
sum
Prior art date
Application number
PCT/CA1996/000166
Other languages
English (en)
Inventor
Jonathan M. Schmidt
Original Assignee
University Of Guelph
Priority date
Filing date
Publication date
Application filed by University Of Guelph
Priority to AU49350/96A (patent AU712188C)
Priority to PCT/CA1996/000166 (patent WO1997036252A1)
Priority to NZ332332A (patent NZ332332A)
Priority to EP96905638A (patent EP0888591A1)
Priority to EA199800843A (patent EA001095B1)
Priority to JP09533880A (patent JP2000507940A)
Publication of WO1997036252A1

Classifications

    • C: CHEMISTRY; METALLURGY
        • C07: ORGANIC CHEMISTRY
            • C07K: PEPTIDES
                • C07K1/00: General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • G: PHYSICS
        • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
            • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
                • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
                    • G16B15/30: Drug targeting using structural data; Docking or binding prediction
                • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
                    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
                    • G16B20/30: Detection of binding sites or motifs
                    • G16B20/50: Mutagenesis
            • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
                • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
                    • G16C20/50: Molecular design, e.g. of drugs

Definitions

  • The present invention relates to computer-based methods for designing chemical structures sharing common useful, functional properties based on specific combinations of steric configuration and binding affinity. More particularly, the present invention provides a method for producing computer-simulated receptors which functionally mimic biological receptors. The simulated receptors are designed to exhibit optimized selective affinity for known target molecules. Chemical structures are then generated and evolved to exhibit selective affinity for the simulated receptors.
  • Biological receptors are linear polymers of either amino acids or nucleotides that are folded to create three-dimensional envelopes for substrate binding.
  • the specific three-dimensional arrangements of these linear arrays, and the placement of charged sites on the envelope surface are the products of evolutionary selection on the basis of functional efficacy.
  • the selectivity of biological receptors depends upon differences in the strength of attractive and repulsive forces generated between the receptor and the substrate. The magnitude of these forces varies in part with the magnitude and proximity of charged sites on the receptor and substrate surfaces.
  • binding affinity can vary with substrate structure.
  • Substrates with similar binding affinities for the same receptor have a high likelihood of sharing a common spatial arrangement of at least some of their induced and fixed charged sites. If the function of the receptor is correlated with binding affinity, then substrates with similar binding affinities will also be functionally similar in their effects. It is in this sense the receptor can be said to recognize or quantify similarities between the substrates.
  • the present invention provides a method for identifying non-trivial similarities between different chemical structures which are both necessary and sufficient to account for their shared functional properties.
  • the process also provides a method of generating novel chemical structures that display similar functional properties.
  • the basic concept underlying the present invention is the use of a two-step computational process to design or discover chemical structures with useful functional properties based on specific combinations of steric configuration and binding affinity.
  • an algorithmic emulation of antibody formation is used to create a population of computer-generated simulated receptors that mimic biological receptors with optimized binding affinity for selected target substrates.
  • the simulated or virtual receptors are used to evaluate the binding affinity of existing compounds or to design novel substrates with optimal binding.
  • the method described herein provides simulated receptors which mimic selected features of biological receptors, including the evolutionary processes that optimize their binding selectivity.
  • the mimics or simulated receptors generated by the method can be used to recognize specific similarities between molecules.
  • the simulated receptors generated by this invention are feature extraction mechanisms: they can be used to identify or recognize common or similar structural features of target substrates. Binding affinity between the receptors and the target substrates is used as a metric for feature recognition. Target substrates can be quantitatively categorized on the basis of binding affinity with a specific simulated receptor. Compounds sharing specific structural features will also share similar binding affinities for the same virtual receptor.
  • Binding affinity between biological receptors and substrates is determined by the steric goodness of fit between the adjacent receptor and substrate surfaces, the exclusion of water between non-polar regions of the two surfaces and the strength of electrostatic forces generated between neighbouring charged sites. In some cases the formation of covalent bonds between the substrate and the receptor may also contribute to binding affinity.
  • the simulated receptors generated by this process mimic the binding mechanisms of their biological counterparts. Average proximity of the receptor and target surfaces and the strength of electrostatic attractions developed between charged sites on both surfaces are used to calculate a measurement of binding affinity. The resulting values for binding affinity are used to evaluate substrate molecular similarities.
  • Binding affinity can be globally determined, that is, dependent upon interactions between the entire substrate surface and a closed receptor or receptor envelope that completely surrounds the substrate. In this case analysis of global similarities between substrates is appropriate as a basis for developing useful quantitative structure-activity relationships. However, in most, if not all, biological systems, affinity is locally rather than globally determined. Interactions between substrate molecules and biological receptors are generally limited to contacts between isolated fragments of the receptor and the substrate surface. In this situation, analysis of global similarities between substrates is inappropriate as a method of developing structure-activity relationships, since only fragments of the substrate are directly involved in the generation of binding affinity.
  • Locally similar structures share similar structural fragments in similar relative positions and orientations. Locally similar structures are not necessarily globally similar. Sampling of molecular properties may be achieved by a total sampling strategy involving evaluation of global similarity; a fragment sampling strategy involving evaluation of local similarity; and multiple fragments sampling strategies involving evaluation of both local and global similarity.
  • the analysis of local similarities relies on sampling discrete regions of substrates for similar structures and charge distributions.
  • Localized sampling arises due to the irregularity or bumpiness of the adjacent substrate and receptor surfaces. Interactions between closely opposed surfaces will predominate over interactions between more separated regions in the determination of binding affinity. The proximity of the adjacent surfaces will also determine the strength of hydrophobic binding.
  • the effective simulated receptors generated by the present method must exploit discrete local sampling of target substrates (molecules) in order to evaluate functionally relevant similarities between compounds.
  • the part of the present method directed to the generation of simulated receptors capable of categorizing similarities between chemical substrates is essentially a search for receptors that sample the relevant fragments of the substrates at the relevant locations in space.
  • the optimization process relies on four features of simulated receptors: 1) generality: wherein the receptors are able to bind with more than one substrate; 2) specificity: the binding affinity of the receptors varies with substrate structure; 3) parsimony: the receptors differentiate among substrates on the basis of a minimal set of local structural features; and 4) mutability: alteration of the structure of a receptor can change its binding affinity for a specific substrate.
  • Encoding of the receptor phenotype in the form of a linear genotype represented by a character string facilitates the processes of mutation, recombination and inheritance of the structural characteristics of the simulated receptors.
  • Simulated receptors that satisfy these fundamental criteria can be optimized to obtain specific binding affinities for locally similar substrates using evolutionary selective breeding strategies. This is accomplished by encoding the spatial configuration and charge site distribution of the receptor in an inheritable format that can undergo alterations or mutations. Like biological receptors, the simulated receptors generated by this method define a three-dimensional exclusion space. Such a three-dimensional space can be outlined to an arbitrary degree of resolution by a one-dimensional path of sufficient length and tortuosity. Proteins formed from linear polymers of amino acids are examples of such structures. Similarly the three-dimensional structure of simulated receptors can be encoded as a linear array of turning instructions. This one-dimensional encoded form of the receptor constitutes its genotype.
  • the decoded form used to assess binding affinity constitutes its phenotype.
  • alterations are made to the receptor genotype.
  • the effects of these changes on the binding affinity of the phenotype are subsequently evaluated.
  • Genotypes that generate phenotypes with desirable binding affinities are retained for further alteration, until, by iteration of the mutation and selection process, a selected degree of optimization of the phenotype is achieved.
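  • The mutate-evaluate-retain cycle described above can be sketched as a simple loop. This is only an illustrative sketch: the function names, the `fitness` callback and the stopping threshold `target` are assumptions, and the toy fitness in the usage note stands in for the affinity-based evaluation described elsewhere in the text.

```python
import random


def point_mutate(genotype, alphabet, rng):
    """Return a copy of the genotype with one character replaced by a different one."""
    i = rng.randrange(len(genotype))
    choices = [c for c in alphabet if c != genotype[i]]
    return genotype[:i] + rng.choice(choices) + genotype[i + 1:]


def optimize(genotype, fitness, alphabet, target, max_iter=10000, rng=None):
    """Iterate mutation and selection until the fitness reaches `target`.

    `fitness` is a caller-supplied function (hypothetical here) that maps a
    genotype string to a number; larger means better.
    """
    rng = rng or random.Random()
    best, best_fit = genotype, fitness(genotype)
    for _ in range(max_iter):
        if best_fit >= target:
            break
        candidate = point_mutate(best, alphabet, rng)
        candidate_fit = fitness(candidate)
        # Retain the variant only if its phenotype scores better.
        if candidate_fit > best_fit:
            best, best_fit = candidate, candidate_fit
    return best, best_fit
```

  • As a usage illustration with a toy fitness (the fraction of "S" characters), `optimize("LLLLLLLL", lambda g: g.count("S") / len(g), "SLRUD", 1.0)` gradually converts the string to all "S". A population-based genetic algorithm with recombination would replace the single retained genotype with a pool.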
  • a variety of evolutionary strategies including classical genetic algorithms, may be used to generate populations of simulated receptors with optimal binding characteristics.
  • Receptors generated by this method are then used to generate or identify novel chemical structures (compounds) which share the specific, useful properties of the molecular target species used as selection criteria in producing the simulated receptors.
  • novel chemical structures are evolved to optimally fit the receptors. Because these structures must meet the necessary and sufficient requirements for receptor selectivity, they are likely to also possess biological activity similar to that of the original molecular targets.
  • the population of simulated receptors with enhanced selectivity may also be used to screen existing chemical structures for compounds with high affinity that may share these useful properties. The same process may also be used to screen for compounds with selected toxicological or immunological properties.
  • step (g) altering the chemical structure to produce a variant of the chemical structure and repeating step (f);
  • step b) mutating the receptor genotype and repeating step b) and retaining and mutating those receptors exhibiting increased fitness coefficients until a population of receptors with preselected fitness coefficients are obtained; thereafter
  • a method of designing simulated receptors mimicking biological receptors exhibiting selective affinity for compounds with similar functional characteristics comprising the steps of: a) producing a simulated receptor genotype by generating a receptor linear character sequence which codes for spatial occupancy and charge;
  • step c) mutating the genotype and repeating step b) and retaining and mutating those receptors exhibiting increased fitness coefficients until a population of receptors with preselected fitness coefficients are obtained.
  • step (g) altering the chemical structure to produce a variant of the chemical structure and repeating step (f);
  • a method of encoding chemical structures comprising atomic elements, the method comprising providing a linear character sequence which codes for spatial occupancy and charge for each atom of said chemical structure.
  • Figure 1 is a flow chart showing relationship between genotype code creation and translation to produce a corresponding phenotype forming part of the present invention
  • Figure 2 is a flow chart showing an overview of the steps in the optimization of a receptor for selectively binding to a set of substrates using point mutations forming part of the present invention
  • Figure 3 is a flow chart showing an overview of the steps in the process of producing a population of related receptors with optimized selective binding affinity for a set of chemical substrates and using these optimized receptors for producing a set of novel chemical substrates with common shared functional characteristics;
  • Figure 4a shows several chemical compounds used in the example relating to examples of ligand generation
  • Figure 4b shows ligands 1.1 to 1.4 generated by the method of the present invention in the example of ligand generation wherein each ligand has at least one orientation wherein it is structurally similar to benzaldehyde;
  • Figure 4c shows ligands 2.1 to 2.4 generated by the method of the present invention in the example of ligand generation relating to the design of chemical structures exhibiting efficacy for repelling mosquitoes.
  • the method can be broken into two parts: (A) evolution of a population of simulated receptors with selective affinity for compounds with shared functional characteristics and (B) generation of novel chemical structures having the shared functional characteristics.
  • Part (A) comprises several steps including 1) receptor genotype and phenotype generation; 2) presentation of the known chemical structure(s) to the receptor; 4) evaluation of affinity of the receptor for the chemical structure(s); 5) assessing the selectivity of the receptor for the chemical structure(s); 6) stochastically evolving a family of related receptors with optimized selective affinity for the chemical structure(s); screening chemical substrates for toxicological and pharmacological activity and using the optimized receptors to design novel chemical structure(s) with selective binding affinity for the receptors.
  • Both the simulated receptor genotypes and phenotype are computational objects.
  • The phenotypes of the simulated receptors consist of folded, unbranched polymers of spherical subunits whose diameter is equal in length to the van der Waals radius of atomic hydrogen (approximately 100 pm).
  • Subunits can be connected to each other at any two of the six points corresponding to the intercepts of the spheres with each of their principal axes.
  • connections between subunits cannot be stretched or rotated and the centers of two connected subunits are always separated by a distance equal to the length of their sides (i.e. 1 hydrogen radius). Turns occur when two subunits are not attached to the opposite faces of their common neighbour.
  • Four kinds of orthogonal turns are possible: left, right, up and down. Turns must be made parallel to one of the principal axes. For computational simplicity, if turns result in intersection with other subunits in the polymer, subunits are permitted to occupy the same space with other subunits.
  • a complete simulated receptor consists of one or more discrete polymers.
  • the individual polymers can originate at different points in space.
  • each polymer is encoded as a sequential set of turning instructions.
  • the instructions identify individual turns with respect to an internal reference frame based on the initial orientation of the first subunit in each polymer. Hydration of the receptor and substrate are not treated explicitly in the current implementation, instead, it is assumed that any water molecules present at the binding site are attached permanently to the receptor surface and comprise an integral part of its structure. This is an arbitrary approximation and those skilled in the art will appreciate that it could be replaced by a more exact treatment (see, for example, VanOss, 1995, Molecular Immunology 32:199-211).
  • the code creation module generates random strings of characters.
  • Each character either represents a turning instruction or determines the charge characteristics or reactivity of a point in the three-dimensional shape comprising the virtual receptor.
  • A minimum of five different characters is required to create a string describing the three-dimensional shape of a receptor based on a Cartesian (rectangular) coordinate framework.
  • Other frameworks, e.g. tetrahedral structures, can also be constructed using different sets of turning instructions.
  • The characters represent turning instructions which are defined with respect to the current path of the virtual receptor structure in three-dimensional space (i.e. the instructions refer to the intrinsic reference frame of the virtual receptor and not an arbitrary external reference frame).
  • genotype code creation and phenotype expression will be understood by those skilled in the art to be illustrative only. In this example the following conventions are employed.
  • Subunits are of two types: charged or uncharged. All charged subunits are assumed to carry a unitary positive or negative charge. The uniform magnitude of charges is an arbitrary convention.
  • the receptors comprise 15 discrete polymers.
  • the length of the complete code is always a multiple of fifteen.
  • the length of each polymer is equal to the total code length divided by fifteen. It will be understood that receptors can be constructed from any number of discrete polymers of varying or constant length.
  • Module 1 gives a flowchart of a sample of genotype code creation.
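  • A minimal sketch of genotype code creation under the conventions above (random character strings, 15 polymers of equal length). The specific symbols are assumptions: the text requires at least five shape characters but does not list them in this excerpt, so "S", "L", "R", "U", "D" and the charge markers "+" and "-" are hypothetical.

```python
import random

# Hypothetical alphabet (not given in the source text).
SHAPE_CHARS = "SLRUD"   # straight, left, right, up, down
CHARGE_CHARS = "+-"     # assumed markers for unit positive / negative charge sites
N_POLYMERS = 15         # number of discrete polymers in this example


def create_genotype(subunits_per_polymer, rng=None):
    """Generate a random code string whose length is a multiple of 15."""
    if subunits_per_polymer < 1:
        raise ValueError("each polymer needs at least one subunit")
    rng = rng or random.Random()
    alphabet = SHAPE_CHARS + CHARGE_CHARS
    length = N_POLYMERS * subunits_per_polymer
    return "".join(rng.choice(alphabet) for _ in range(length))


def split_polymers(genotype):
    """Split the complete code into 15 equal-length polymer codes."""
    if not genotype or len(genotype) % N_POLYMERS != 0:
        raise ValueError("code length must be a non-zero multiple of 15")
    size = len(genotype) // N_POLYMERS
    return [genotype[i:i + size] for i in range(0, len(genotype), size)]
```

  • For example, `split_polymers(create_genotype(10))` yields 15 strings of 10 characters each, one per polymer.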
  • Each genotype code is translated to create the three-dimensional description of its corresponding phenotype or virtual receptor. From a predefined starting point a translation algorithm is used to convert the turning instructions into a series of coordinate triplets which describe the position in space of the successive subunits comprising the receptor polymers. The starting coordinates for each polymer must be given prior to translation. The translation assumes that centers of successive subunits are separated by a distance equal to the covalent diameter of a hydrogen atom.
  • the translation algorithm reads the code string sequentially to generate successive turns and straight path sections. The interpretation of successive turns with respect to an external coordinate system depends upon the preceding sequence of turns. For each polymer comprising the receptor, the initial orientation is assumed to be the same. In the current implementation, the translation algorithm is described by TABLE
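  • The following is a hedged sketch of one way to translate a string of turning instructions into coordinate triplets on a cubic lattice, tracking an intrinsic heading and "up" direction so that each turn is interpreted relative to the preceding path. It is not the patent's own table-driven algorithm: the symbols "S", "L", "R", "U", "D", the default starting frame, the unit step length, and the treatment of any other character as a straight step are all assumptions.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def neg(v):
    return (-v[0], -v[1], -v[2])


def translate(code, start=(0, 0, 0), heading=(1, 0, 0), up=(0, 0, 1)):
    """Convert turning instructions into subunit centre coordinates.

    Each character places one subunit one unit step along the (possibly
    updated) heading. Turns are relative to the current heading/up frame.
    """
    pos = start
    coords = [pos]
    for ch in code:
        right = cross(heading, up)
        if ch == "L":
            heading = neg(right)
        elif ch == "R":
            heading = right
        elif ch == "U":
            heading, up = up, neg(heading)
        elif ch == "D":
            heading, up = neg(up), heading
        # Any other character (e.g. "S" or a charge marker) continues straight.
        pos = tuple(p + h for p, h in zip(pos, heading))
        coords.append(pos)
    return coords
```

  • Because the frame is carried along the path, the same character can map to different external directions: `translate("LL")` first steps along +y and then along -x.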
  • Targets are represented as molecules consisting of spherical atoms.
  • the atoms are considered to be hard spheres with fixed radii characteristic for each atomic species.
  • the hard sphere radius at which the repulsive force between the target atoms and the virtual receptor is considered to be infinite is approximated by the exposed van der Waals radius given in TABLE
  • The affinity of each target for the simulated receptor(s) is tested for several orientations of the target relative to the upper surface of the receptor.
  • the upper surface is defined by the translation algorithm.
  • Prior to the evaluation of binding affinity, the target and receptor must be brought into contact. Contact occurs when the distance between the centers of at least one subunit of the receptor and at least one atom of the target is equal to their combined van der Waals radii.
  • In order to determine the relative positions of the target and receptor at the point of contact, the target is shifted incrementally towards the receptor surface along a path perpendicular to the surface and passing through the geometric centers of both the receptor and the target.
  • the target has reached its collision position relative to the receptor.
  • the translated positions of the target atoms when the collision position is reached are used to calculate distances between the atoms of the target and the subunits of the receptor. These distances are used to calculate the strength of electrostatic interactions and proximity.
  • the target is assumed to travel in a straight line towards the receptor, and to retain its starting orientation at the time of contact.
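  • A minimal sketch of finding the collision position by incremental shifting. Several simplifications are assumed: the receptor surface is taken to be perpendicular to the z axis, the target starts above the receptor and moves along -z, all receptor subunits share one radius, and contact is detected as a centre-to-centre distance at or below the combined radii (the discrete step makes exact equality unlikely). The function names, the step size and the data layout are hypothetical.

```python
import math


def in_contact(target, receptor, receptor_radius):
    """True if any target atom touches any receptor subunit.

    target:   list of (x, y, z, radius) tuples, one per atom
    receptor: list of (x, y, z) subunit centres
    """
    for tx, ty, tz, tr in target:
        for rx, ry, rz in receptor:
            if math.dist((tx, ty, tz), (rx, ry, rz)) <= tr + receptor_radius:
                return True
    return False


def collision_position(target, receptor, receptor_radius=1.0,
                       step=0.05, max_steps=100000):
    """Shift the target along -z, keeping its orientation, until contact."""
    current = list(target)
    for _ in range(max_steps):
        if in_contact(current, receptor, receptor_radius):
            return current
        current = [(x, y, z - step, r) for x, y, z, r in current]
    raise RuntimeError("no contact within max_steps; check the starting position")
```

  • The returned atom positions can then be used to compute the target-receptor distances needed for the electrostatic and proximity terms.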
  • An alternative approach would allow the target to incrementally change its orientation as it approached the receptor so that the maximal affinity position was achieved at the point of contact.
  • Although this method is functionally similar to that implemented, it is much more computationally complex.
  • multiple orientations are tested at lower computational effort.
  • the current implementation allows for adjustable displacement of the path along the x and/or y axis of the receptor to accommodate larger molecules. This feature is required to enhance selectivity when molecules differing in size are tested on the same receptor.
  • Prior to the calculation of the collision position, the orientation of the target is randomized by random rotation in 6° increments around each of the x, y, and z axes. Larger or smaller increments of rotation may be used. Each of these random orientations of the target is unique in a given test series. The reliability of the optimization process is dependent upon the number of target orientations tested as well as the number of target compounds evaluated. A sample process for target presentation is given in Module 3.
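  • The orientation sampling can be sketched as follows. This is an illustration only: the function names are hypothetical, rotations are applied about the coordinate origin in the order x, y, z (the text does not state the order or the centre of rotation), and the caller is assumed to have centred the target first.

```python
import math
import random


def unique_orientations(n, increment_deg=6, rng=None):
    """Draw n distinct (x, y, z) rotation triplets in multiples of increment_deg."""
    rng = rng or random.Random()
    steps = 360 // increment_deg
    if n > steps ** 3:
        raise ValueError("more orientations requested than exist")
    seen = set()
    result = []
    while len(result) < n:
        angles = tuple(rng.randrange(steps) * increment_deg for _ in range(3))
        if angles not in seen:      # each orientation is unique in a test series
            seen.add(angles)
            result.append(angles)
    return result


def rotate(point, angles_deg):
    """Rotate a point about the x, then y, then z axis (angles in degrees)."""
    x, y, z = point
    ax, ay, az = (math.radians(a) for a in angles_deg)
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    return (x, y, z)
```

  • Generating the list once and reusing it for every receptor keeps the test series comparable across receptors.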
  • the current implementation is based on a simplified approximation that evaluates the principal components of affinity with relatively little computational effort.
  • the approximation is developed in the following sections. However, it will be appreciated by those skilled in the art that more exact affinity calculation procedures may be utilized which give a more exact affinity value. Known computational packages for calculating more accurate affinity values may be used directly in the present process.
  • It is possible to demonstrate in crown ethers that the major components of electrostatic interactions are determined by local rather than global transfers of charge between atoms. Charge distribution is mainly determined by short range effects due to different chemical bonds. In particular, non-neighbouring atoms contribute little to atomic dipole moments. In addition, although charge transfer between atoms is also influenced by the electrostatic field of the whole molecule, calculations for crown ethers show only a very small influence on the charge distribution.
  • the method of the present invention incorporates an approximation of affinity between the target ligand and the simulated receptor(s) and between the simulated receptor(s) and chemical structure(s) being designed based on two measures.
  • the chemical substrate targets evaluated by the current implementation are assumed to be neutral (i.e. not ionized) molecules. This is an arbitrary limitation, and an implementation applicable to charged and uncharged targets can be developed using the same methodology.
  • The environment surrounding the virtual receptor is assumed to be a solvent system in which the target occurs as a solute.
  • The target is effectively partitioned between the solvent and the virtual receptor.
  • The target and receptor are assumed to be stationary with respect to each other, and in a specific, fixed orientation.
  • the targets are assumed to interact with only two types of site on the receptor surface: fixed charge sites (either negatively or positively charged) and non-polar sites.
  • Hydrophobic Strength and Water Exclusion Contribution are important considerations in the generation of binding affinity. For example, hydrophobic bond formation relies upon the close spatial association of non-polar, hydrophobic groups so that contact between the hydrophobic regions and water molecules is minimized. Hydrophobic bond formation may contribute as much as half of the total strength of antibody-antigen bonds. Hydration of the receptor and substrate surfaces is also a significant factor. Water bound to polar sites of either the receptor or substrate surface can interfere with binding or increase affinity by forming cross-bridges between the surfaces.
  • The hydrophobic interaction describes the strong attraction between hydrophobic molecules in water.
  • In receptor-target interactions, it is taken to refer to the attraction between the non-polar fragments of the target and adjacent domains of non-polar receptor subunits. The effect arises primarily from entropic effects resulting in rearrangements of the surfaces so that water is excluded between adjacent non-polar domains. Exact theoretical treatments of the hydrophobic interaction are unavailable; however, it is estimated that hydrophobic forces contribute as much as 50% of the total attraction between antibodies and antigens.
  • the present implementation evaluates the proportion of the receptor that is effectively shielded from solvation by binding with the target. All non-polar (uncharged) subunits that are within a fixed distance of non-polar atoms on the target are considered to be shielded from solvation by solvent molecules of diameter equal to or greater than the limiting distance.
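  • A minimal sketch of the shielding estimate described above: the fraction of all receptor subunits that are uncharged and lie within a limiting distance of a non-polar target atom. The data layout and function name are hypothetical, distances are assumed to be measured between centres, and how this proportion enters the combined affinity calculation is not reproduced here.

```python
import math


def shielded_fraction(receptor, target, d):
    """Proportion of receptor subunits shielded from solvation.

    receptor: list of ((x, y, z), charge) with charge in {-1, 0, +1}
    target:   list of ((x, y, z), is_nonpolar)
    d:        limiting distance, in the same units as the coordinates
    """
    if not receptor:
        raise ValueError("receptor must contain at least one subunit")
    nonpolar_atoms = [xyz for xyz, is_nonpolar in target if is_nonpolar]
    shielded = 0
    for xyz, charge in receptor:
        if charge != 0:
            continue  # only uncharged (non-polar) subunits are counted
        if any(math.dist(xyz, atom) <= d for atom in nonpolar_atoms):
            shielded += 1
    return shielded / len(receptor)
```

  • Increasing `d` counts more subunits as shielded, which is one way the relative weight of the proximity term can be tuned.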
  • the combined affinity calculation used in the current implementation combines two measures of interaction: the summed strengths of the charge-dipole interactions and a proximity measure. These affinities are assumed in the current implementation to be isotropic. It will be appreciated by those skilled in the art that greater discriminatory power may be obtained if anisotropic calculations of affinity are used, although these are computationally more complex.
  • D is the dipole moment of the ith atom of the target
  • r_ij is the distance between the ith atom and the jth charge site on the receptor
  • d can range from 1 to 4 subunit diameters (this approximates the van der Waals radius of water).
  • N is the total number of subunits comprising the receptor.
  • P in the equation serves two roles. In the first instance it is a weighting factor. As a measure of 'goodness of fit' it is used to bias the affinity value in favour of those configurations in which the non-polar regions of the target and receptor are in close contact. Under these conditions, hydrophobic interactions and non-polar interaction energies will be large and will contribute significantly to the stability and strength of the bond. Under these conditions the target has fewer possible trajectories to escape from the receptor and its retention time will be prolonged.
  • P is used to estimate the contribution of the dispersion energy to the strength of the interaction. It is assumed that the dispersion energy will only be significant for uncharged, non-polar regions, and that it is only significant when the target and receptor are close to each other (i.e. within d of each other).
  • the values of k and d can be adjusted to alter the relative contribution of P and D. In general, P dominates for non-polar targets, whereas D is more significant for targets with large local dipoles. Hydrogen bonding is approximated by paired negatively and positively charged receptor units interacting simultaneously with target hydroxyl, carboxylic or amine functional groups.
  • the affinity approximation used in the current implementation could be replaced by functionally similar computations that preserve the relationship between local charges, dispersion energy and target-receptor separation.
  • affinity measures for charged targets could be constructed.
  • the present implementation evaluates only non- covalent interactions, however, the method could be expanded by including in the virtual receptor subunits capable of specific covalent bond-forming reactions with selected target functional groups.
  • Module 5 provides a sample flowchart of the preferred effective affinity calculation used in the present invention.
  • (5) Assessment of Selective Affinity
  • Known values can be any index known or suspected to be dependent upon binding affinity, including (but not limited to) ED50, ID50, binding affinity, and cohesion measures. The values tested must be positive. Logarithmic transformation of the data may be required. Unweighted rank data cannot be used.
  • the optimal orientation of the targets for maximal binding affinity is unknown prior to testing.
  • In order to obtain a representative measure of the range of receptor-target affinity, each target must be tested repeatedly using different random orientations relative to the receptor surface.
  • Each test uses Module 4 to evaluate affinity. In general, the reliability of the maximal affinity values obtained depends upon the sample size, since it becomes increasingly likely that the sample will contain the true maximal value. The same set of target orientations is used for testing each receptor.
  • Two techniques are employed in the current implementation to circumvent the need for large sample sets for the generation of optimized receptors: 1) the use of a measure combining average (or sum) affinity and maximal affinity to select for receptors with higher selectivity; and 2) incremental increases in the number of orientations tested with successive iterations of the optimization process (optimization begins with a small set of target orientations; as receptors of greater fitness are generated, more orientations are tested).
  • the sum is calculated for the affinity values obtained for all the tested orientations of each target.
  • This sum affinity score is a measure of the average affinity between the receptor and the target.
  • the maximal affinity value is also determined.
  • Correlations between the known values and both the sum affinity (r_SA^2) and the maximal affinity (r_MA^2) are calculated.
  • the origin (0,0) is included in the correlation, based on the assumption that target compounds showing no activity should have little or no affinity for the virtual receptor. This assumption may not always be valid, and other intercept values may be required in some tests.
  • the correlation using sum affinity is a measure of the average goodness of fit. If this correlation is large, but the correlation between maximal affinity and known affinity is weak, the result suggests that the virtual receptor is not selective, i.e. multiple orientations of the target can interact effectively with the receptor. Conversely, if the maximal affinity is highly correlated with known affinity values and the correlation with sum affinity is weak, the virtual receptor may be highly selective. If both sum affinity and maximal affinity are highly correlated with known affinity, it is probable that the orientations sampled have identified the response characteristics of the receptor with limited error (the likelihood of both type I and type II errors, i.e. false positive and false negative results, is reduced).
  • a joint correlation value is used as the basis for receptor selection. This value is calculated as the square root of the product of the sum affinity and maximal affinity correlations.
  • Module 5 provides a flowchart of a sample goodness of fit calculation.
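The selection measure described above can be illustrated with a minimal sketch. It assumes that the per-orientation affinity values have already been computed (for example by the affinity calculation of Module 4), that "correlation" means the squared Pearson coefficient, and that including the origin means appending the point (0, 0) to the data. The function names `pearson_r2` and `joint_fitness` are hypothetical and are not taken from the modules.

```python
import math


def pearson_r2(xs, ys):
    """Squared Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def joint_fitness(known_values, affinities_by_target, include_origin=True):
    """Return (r2_sum, r2_max, joint) for one receptor.

    affinities_by_target[i] holds the affinity values obtained for all
    tested orientations of target i (assumed input format).
    """
    known = list(known_values)
    sums = [sum(a) for a in affinities_by_target]      # sum affinity per target
    maxima = [max(a) for a in affinities_by_target]    # maximal affinity per target
    if include_origin:
        # Assumption: inactive compounds have no affinity, so (0, 0) is added.
        known, sums, maxima = [0.0] + known, [0.0] + sums, [0.0] + maxima
    r2_sum = pearson_r2(known, sums)
    r2_max = pearson_r2(known, maxima)
    return r2_sum, r2_max, math.sqrt(r2_sum * r2_max)
```

Because the joint value is a geometric mean, a receptor scores highly only when both the sum affinity and the maximal affinity track the known values.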
  • the objective of the optimization process is to evolve a virtual receptor that has selective affinity for a set of target receptors.
  • a highly efficient mechanism for finding solutions is required, since the total number of possible genotypes containing 300 instructions is 7^300 or about 10^253.
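The size of this search space can be checked with a short calculation, assuming (as the figure implies) seven possible characters at each of the 300 instruction positions:

```latex
\log_{10}\left(7^{300}\right) = 300 \log_{10} 7 \approx 300 \times 0.8451 \approx 253.5,
\qquad\text{so}\qquad 7^{300} \approx 3.4 \times 10^{253}.
```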
  • the following four phases summarize the steps in the optimization process. Each phase is then discussed in more detail and example calculations are given.
  • PHASE 1 Generate a set of random genotypes and screen for a minimal level of activity. Use selected genotype as basis for further optimization using genetic algorithm (recombination) and unidirectional mutation techniques.
  • PHASE 2 Mutate selected genotype to generate a breeding population of distinct but related genotypes for recombination. Choose the most selective mutants from the population for recombination.
  • PHASE 3 Generate new genotypes by recombination of selective mutants. Select from the resulting genotypes those with the highest affinity fitness. Use this subpopulation for the next recombinant or mutation generation.
  • PHASE 4 Take best recombination products and apply repeated point mutations to enhance selectivity.
  • the objective of the first stage in the optimization process is to generate a genotype with a minimal level of affinity for the target set. This genotype is subsequently used to generate a population of related genotypes.
  • a flowchart of a sample process for generation of a genotype with a minimum level of affinity is given in Module 6.
  • Mutation of the genotype comprises changing one or more characters in the code. Mutations in the current implementation do not alter the number of subunits comprising the receptor polymers and do not affect the length of the genotype. It will be appreciated that these conventions are arbitrary, and it will be understood that variants may have utility in some systems. Mutations can alter the folding pattern of the phenotype, with resulting changes in the receptor shape space and the location or exposure of binding sites. Mutations that affect the configuration of peripheral regions of the phenotype can result in shifts of the receptor center relative to the target center.
  • the objective of the second phase of the evolutionary process is the generation of a population of distinct but related genotypes derived from the primary genotype. Members of this population are subsequently used to generate recombinants.
  • This breeding population is created by multiple mutation of the primary genotype. The resulting genotypes are translated and screened for selectivity. The most selective products are retained for recombination.
  • Module 7 gives a flowchart for a sample process for multiple mutation of a genotype.
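The generation of a breeding population by multiple mutation may be sketched as follows. This is an illustrative sketch only: the seven-character alphabet `"0123456"` is a hypothetical stand-in for the actual instruction set, and the `selectivity` argument stands in for the translation and screening steps, which are not reproduced here.

```python
import random

ALPHABET = "0123456"  # hypothetical: seven instruction characters


def mutate(genotype, n_mutations, rng):
    """Change n_mutations distinct positions; the genotype length is preserved."""
    chars = list(genotype)
    for pos in rng.sample(range(len(chars)), n_mutations):
        alternatives = [c for c in ALPHABET if c != chars[pos]]
        chars[pos] = rng.choice(alternatives)
    return "".join(chars)


def breeding_population(primary, size, n_mutations, selectivity, keep, rng):
    """Create related mutants of the primary genotype and keep the most selective."""
    mutants = {mutate(primary, n_mutations, rng) for _ in range(size)}
    return sorted(mutants, key=selectivity, reverse=True)[:keep]
```

All mutants are derived from the same primary genotype, so the retained genotypes are distinct but closely related, as required for the subsequent recombination phase.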
  • Module 8 provides a flowchart for a sample process for recombination of a genotype.
  • the current implementation retains the population used for recombination for testing in step 7 of Module 8. This ensures that genotypes with high selectivity are not replaced by genotypes with lower selectivity.
  • mutations are applied to 50% of the recombinant genotypes prior to testing (Step 7-Module 8). This step increases the variability within the recombinant population.
  • the test populations used in the current implementation range in size from 10 to 40 genotypes. This is a relatively small population size. Under some conditions, larger populations may be required.
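One recombinant generation, with the parental population retained for testing and mutations applied to half of the recombinants, might look like the sketch below. The single-point crossover, the alphabet, and the function names are assumptions made for illustration; the actual recombination procedure is the one given in Module 8.

```python
import random

ALPHABET = "0123456"  # hypothetical instruction characters


def point_mutation(genotype, rng):
    """Replace one randomly chosen character with a different one."""
    pos = rng.randrange(len(genotype))
    new_char = rng.choice([c for c in ALPHABET if c != genotype[pos]])
    return genotype[:pos] + new_char + genotype[pos + 1:]


def crossover(a, b, rng):
    """Single-point crossover of two equal-length genotypes (assumed operator)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def next_generation(parents, fitness, rng, n_offspring=20):
    """Recombine parents, mutate 50% of the recombinants, then select."""
    offspring = []
    for i in range(n_offspring):
        a, b = rng.sample(parents, 2)
        child = crossover(a, b, rng)
        if i % 2 == 0:  # mutate every second recombinant, i.e. 50% of them
            child = point_mutation(child, rng)
        offspring.append(child)
    # Parents compete with their offspring, so a highly selective parent
    # is never displaced by a less selective recombinant.
    pool = list(parents) + offspring
    pool.sort(key=fitness, reverse=True)
    return pool[:len(parents)]
```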
  • the final stage in the optimization process mimics the maturation of antibodies in the mammalian immune system.
  • a series of single point mutations are applied to the genotype, and the effect on phenotypic fitness is evaluated. Unlike recombination, this process generally results in only small incremental changes to the selectivity of the phenotype.
  • the maturation process uses a Rechenberg (1+1) evolutionary strategy (Rechenberg, I. (1973), Evolutionsstrategie. F. Frommann. Stuttgart).
  • the fitness of the parental genotype is compared to that of its mutation product, and the genotype with the greater selectivity is retained for the next generation. As a result, this process is strictly unidirectional, since less selective mutants do not replace their parents.
  • Module 9 shows a flowchart for a non-limiting sample of maturation of a genotype.
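The unidirectional character of the maturation step can be sketched as a (1+1) loop. The alphabet and the `selectivity` callable are hypothetical placeholders; in the method itself, selectivity is obtained by translating the genotype and testing the resulting receptor.

```python
import random

ALPHABET = "0123456"  # hypothetical instruction characters


def point_mutation(genotype, rng):
    """Change a single character of the genotype."""
    pos = rng.randrange(len(genotype))
    new_char = rng.choice([c for c in ALPHABET if c != genotype[pos]])
    return genotype[:pos] + new_char + genotype[pos + 1:]


def mature(genotype, selectivity, generations, rng):
    """(1+1) strategy: one parent, one mutant per generation, keep the better."""
    best, best_score = genotype, selectivity(genotype)
    for _ in range(generations):
        candidate = point_mutation(best, rng)
        score = selectivity(candidate)
        if score > best_score:  # less selective mutants never replace the parent
            best, best_score = candidate, score
    return best, best_score
```

Since a mutant is accepted only when it is strictly better, the selectivity of the retained genotype can never decrease from one generation to the next.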
  • the process of the present invention can be used in several areas including: 1) screening for compounds with selected pharmacological or toxicological activity; and 2) development of novel chemical structures with selected functional characteristics. Both applications and examples are provided hereinafter.
  • a population of receptors that have been evolved for selective affinity for a specific group of compounds sharing similar pharmacological properties can be used as probes for the identification of other compounds with similar activity, provided this activity is dependent upon binding affinity.
  • a population of receptors could be evolved to display specific affinity for salicylates. If the affinity of these receptors for salicylates closely correlates with the affinity of cyclooxygenase for salicylates, the receptors must at least partially mimic functionally relevant features of the binding site of the cyclooxygenase molecule. These receptors can therefore be used to screen other compounds for possible binding affinity with cyclooxygenase.
  • receptors could be evolved that mimic the specific binding affinity of steroid hormone receptors. These receptors could then be used to evaluate the affinity of pesticides, solvents, food additives and other synthetic materials for possible binding affinity prior to in vitro or in vivo testing. Simulated receptors may also be constructed to detect affinity for alternate target sites, transport proteins or non-target binding.
  • compounds with high affinity may have deleterious side effects or may be unsuitable for chronic administration. In this case, compounds with lower binding affinity may be required. Techniques such as combinatorial synthesis do not readily generate or identify such compounds. In contrast, simulated receptors could be used to effectively screen for structures that display binding affinity of any specified level.
  • the selectivity of the simulated receptors can be used as a quantitative measure of molecular similarity.
  • fictitious test values of target affinities were chosen to demonstrate the ability of the receptor generation program to construct simulated receptors mimicking any arbitrarily chosen pattern of activity.
  • all receptors consist of 15 polymers.
  • Width, Length, and Depth values specify origin coordinates of the 15 polymers relative to the center of the receptor.
  • a simulated receptor was generated with the following specifications:
  • the affinity score for the optimized receptor was 0.9358, which is relatively low.
  • the target substrates used to optimize the receptor were benzene, phenol, benzoic acid and o-salicylic acid.
  • the aspirin precursor o-salicylic acid is an inhibitor of prostaglandin synthesis by cyclooxygenase. Benzoic acid and phenol have much lower affinity for the same site.
  • the target affinity values and the scores for the receptor are shown in Table A below, which shows that the simulated receptor has maximal affinity for o-salicylic acid.
  • a population of simulated receptors evolved for selective affinity to a set of target compounds with similar functional characteristics can be used to devise novel compounds with similar characteristics, provided these characteristics are closely correlated with the structure or binding affinity of the model compounds.
  • novel chemical structures can be evolved to optimally fit the receptors. Because these compounds must meet the necessary and sufficient requirements for receptor selectivity, these novel compounds are likely to also possess activity similar to that of the original molecular targets.
  • Step 3 is repeated until a compound with suitable affinity characteristics is obtained.
  • Encoding of the ligand phenotype (molecular structure) in the form of a linear genotype represented by a character string facilitates the processes of mutation, recombination and inheritance of the structural characteristics of the ligand during the evolutionary process.
  • the ligands evolved by the current implementation consist of substituted carbon skeletons.
  • Each code consists of three character vectors.
  • the primary code vector contains the turning instructions for the generation of the carbon skeleton and determines the position of each carbon atom in the skeleton.
  • the secondary code vector identifies the functional groups attached to each carbon atom.
  • the tertiary code vector specifies the position of the functional group relative to the host carbon.
  • Molecular skeletons combining atoms other than carbon e.g. ethers, amides and heterocycles
  • the carbon skeleton is constructed from a series of points which form the nodes of a three-dimensional tetrahedral coordinate system. During initial skeleton construction, the distance between nearest points is equal to the mean bond length between alkyl carbon atoms.
  • the primary code vector consists of characters identifying turning direction relative to the current atom position. Each turning direction specifies the coordinates of the next atom in the tetrahedral matrix.
  • Four directions (1,2,3,4) can be taken from each atom, corresponding to the unfilled valences of sp3 carbon.
  • Each of the carbon atoms belongs to one of four possible states (A, B, C, D). These states correspond to the number of distinct nodes in the tetrahedral coordinate system.
  • the relationship between turn direction and the new coordinates for the next atom in the skeleton is given by the following tables.
  • the two tables B1 and B2 below embody the two turning conventions required to construct the ligands.
  • the boat convention results in the generation of a tetrahedral matrix in which closed 6-member rings (cyclohexanes) assume the boat configuration.
  • the chair convention results in the generation of a matrix in which cyclohexyl rings assume the chair configuration. It is possible to combine both conventions during code generation. Only the boat convention is used in the examples discussed here.
  • a secondary code vector of the same length as the primary code vector, is used to allocate the type of substituent attached to the carbon atom specified by the primary code vector.
  • Each substituent is identified by a single character. Substituents are added singly to the carbon skeleton. A single carbon atom can have more than one substituent, but only if it is specified more than once by the primary code.
  • a tertiary code vector of the same length as the primary code vector, is used to allocate the valence used for the attachment of the substituent specified by the secondary code vector.
  • the tertiary code consists of the characters 1, 2, 3, and 4, each of which refers to the turn directions specified for the primary code. Substituents are only allocated if the valence is not already occupied by either a carbon atom specified by the primary code vector or another previously allocated substituent. Alternatively, successive substituents could replace previously allocated substituents.
  • 2) Code Creation
  • the primary code is constructed by creating a random sequence of characters belonging to the set {"1", "2", "3", "4"}.
  • the creation of heterocyclic structures, ethers, amides, imides and carboxylic compounds is accomplished by substituting a carbon atom in the skeleton by a different atom specified by the secondary code.
  • the secondary code is generated from a random sequence of characters identifying substituent types.
  • the frequency of the characters can be random or fixed prior to code generation.
  • the tertiary code consists of characters belonging to the set {"1", "2", "3", "4"}.
  • Ring structures can be deliberately constructed (as opposed to random generation) by adding specific character sequences to the primary code. For example "431413" codes for a cyclohexyl ring. A total of 24 strings code for all possible orientations of cyclohexyl rings in the tetrahedral matrix.
  • Secondary and tertiary code vectors for the ring primary codes are generated as described previously.
  • Module 10 provides a flowchart of an example creation of code generating carbon skeletons with rings.
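Code creation as described above may be sketched as follows. The substituent symbols in `SUBSTITUENTS` are hypothetical placeholders, since the actual substituent characters are not listed here; the ring string "431413" is the example given in the text. The optional `weights` argument illustrates fixing the character frequencies prior to code generation.

```python
import random

TURNS = "1234"          # turn directions for the primary and tertiary codes
SUBSTITUENTS = "HONX"   # hypothetical single-character substituent symbols
RING = "431413"         # example cyclohexyl ring sequence from the text


def create_code(length, rng, ring_at=None, weights=None):
    """Return (primary, secondary, tertiary) code vectors of equal length.

    If ring_at is given, the ring sequence is inserted into the random
    primary code at that index before the other vectors are generated.
    """
    primary = [rng.choice(TURNS) for _ in range(length)]
    if ring_at is not None:
        primary[ring_at:ring_at] = RING  # inserts the six ring characters
    n = len(primary)
    # Substituent frequencies are uniform unless weights are supplied.
    secondary = rng.choices(SUBSTITUENTS, weights=weights, k=n)
    tertiary = [rng.choice(TURNS) for _ in range(n)]
    return "".join(primary), "".join(secondary), "".join(tertiary)
```

For example, `create_code(10, random.Random(0), ring_at=4)` yields three 16-character vectors whose primary code contains the ring sequence at positions 4 to 9.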
  • the relative positions of the entry and exit points from a ring comprising part of the carbon atom skeleton are determined by the length of the character sequence used to generate the ring. Specifically, if the sequence contains six characters, for example 431413, then the entry and exit point will be the same member of the ring. If the sequence is partially repeated and appended to the initial six characters, the entry point and exit point will not be the same member of the ring. For example, the sequences 4314134 and 43141343141 will generate rings with exit points at the members of the rings adjacent to the entry points.
  • rings are added to the skeleton by adding sequences of 6 or more characters to the code.
  • the conventions presented for creating a novel ligand genotype can be used to encode other chemical structures in a linear format, either for storage or for introduction into the ligand evolutionary process.
  • a known pharmacophore can be encoded in linear format and used as the starting point for evolving novel ligands with similar or enhanced functional properties.
  • sets of pharmacophores interacting with a common target site can be encoded in linear format and used for recombination.
  • the code vectors are converted into three-dimensional representations of ligands in a translation process consisting of three discrete steps.
  • the carbon atom skeleton is constructed using the primary code.
  • substituents are added to the carbon skeleton using the instructions from the secondary and tertiary code vectors.
  • Instructions from the secondary and tertiary code vectors may also specify replacement of carbon atoms in the skeleton with different atoms.
  • Instructions from the secondary and tertiary codes may also change the number and orientation of available valences present on a carbon or other atom forming part of the primary skeleton. For example, addition of carbonyl oxygen occupies two empty valences.
  • all valences not filled by substituents during the second step are filled with hydrogen atoms (unless otherwise specified).
  • Primary decoding uses the turning instructions from the primary code vector to specify the positions of each carbon atom.
  • the first atom is assumed to be located at the origin of the coordinate system.
  • the first atom is assumed to occupy state A in the matrix.
  • Decoding proceeds sequentially.
  • the result of the primary decoding process is a 3 x n matrix containing the x, y, and z coordinates of each of the n carbon atoms in the skeleton. Because loops and reversals are permitted, the same position in space may be occupied by more than one carbon. In these cases, only one carbon atom is assumed to occupy the position. As a result, the number of carbon atoms forming the completed skeleton may be less than the number of characters in the primary code vector.
  • a list is constructed from the secondary code that identifies the substituents attached to each carbon position.
  • a parallel list is constructed using the tertiary code to specify the valence occupied by each substituent.
  • Substituents are added sequentially to each carbon atom based on the list generated from the secondary code during primary decoding.
  • the corresponding value from the tertiary code is used to specify the valence position of the substituent relative to the host carbon. If the position is already occupied by either an adjacent carbon atom, or a previously specified substituent, the substitution is not carried out. Alternatively, a decoding process could be constructed in which the substitution is carried out at the next unoccupied position or the substitution replaces a previously specified substituent.
  • the distance between the substituent and the carbon atom is calculated from look up tables of bond lengths. The position data and bond lengths are used to calculate the coordinates of the substituent. In the case of multi-component substituents, such as hydroxyl, nitro, and amino groups, the coordinates for each atom in the substituent are calculated relative to the host carbon.
  • a single carbon atom can have more than one non-hydrogen substituent. This can occur if the same position is specified more than once by the primary code vector.
  • the current implementation does not incorporate multiple substitutions using the secondary code directly, although this can be readily implemented.
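The allocation rule for substituents (skip a substitution when the valence is already occupied, then fill the remaining valences with hydrogen) can be sketched as follows. The input format and the function name are assumptions of this sketch, and the multi-character labels such as "OH" are used only for readability in place of the single-character symbols of the secondary code.

```python
def allocate_substituents(carbon_for_position, secondary, tertiary, skeleton_valences):
    """Assign substituents to free valences and fill the remainder with hydrogen.

    carbon_for_position[i]: index of the carbon specified by code position i
    secondary[i]:           substituent label for that position
    tertiary[i]:            valence ("1".."4") requested for the substituent
    skeleton_valences:      set of (carbon, valence) pairs used by C-C bonds
    """
    occupied = set(skeleton_valences)
    attached = {}
    for carbon, substituent, valence in zip(carbon_for_position, secondary, tertiary):
        key = (carbon, valence)
        if key in occupied:
            continue  # valence taken by a carbon or an earlier substituent
        occupied.add(key)
        attached[key] = substituent
    # All valences left unfilled are occupied by hydrogen atoms.
    for carbon in set(carbon_for_position):
        for valence in "1234":
            if (carbon, valence) not in occupied:
                attached[(carbon, valence)] = "H"
    return attached
```

A carbon that is specified twice by the primary code appears twice in `carbon_for_position` and can therefore receive two non-hydrogen substituents, provided the requested valences are free.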
  • a list is compiled of the type, radius, and position of all the atoms comprising the ligand. This list is the basis for subsequent target generation.
  • the feasibility of the structure generated from the code sequence is not evaluated.
  • the atomic coordinates may be entered into energy minimization programs to create more realistic structures.
  • no assumptions are made concerning the configuration of the ligand during binding.
  • the current implementation preserves the structural uniqueness of specific configurations of the same molecule. For example, the current implementation distinguishes between three rotational isomers of butane, and treats each isomer as a unique molecule.
  • the code vectors constitute the genotype of the corresponding ligand, and can be subjected to mutation and recombination with resulting changes in ligand structure.
  • the ligand structure itself is the phenotype used to evaluate binding affinity with a selected population of virtual receptors.
  • Chemical structures or target ligands are initially constructed from randomly generated codes. Following decoding, the coordinates, radii, dipole moments and polarizabilities of each atom in the target ligand are obtained from look up tables of values and used to evaluate the binding affinity between the ligand and a selected population of virtual receptors. The affinity of the target for each of the virtual receptors is tested for many orientations of the target relative to the receptor surfaces. No assumptions are made concerning the relative orientations of the ligand and simulated receptor. Prior to the evaluation of binding affinity, the target and receptor must be brought into contact. The method of target presentation and calculation of affinity between the chemical structures and simulated receptors is essentially the same as discussed above in Module 4 between known target molecules and the simulated receptors.
  • the binding affinity of the target ligand for each of the simulated receptors used for fitness evaluation is calculated using the same effective affinity calculation method described for simulated receptor generation using the target molecules.
  • affinity calculations using other criteria can be incorporated into the fitness testing process but the efficacy and computational efficiency of the present invention relies in part on using the same effective affinity calculation for virtual receptor generation and generation of the chemical structures using the simulated receptor populations.
  • Goodness of fit between a selected population of simulated receptors and a novel ligand or chemical structure is evaluated by comparing the target activity or affinity values for the ligand with those obtained for the simulated receptor-ligand complexes.
  • the maximal affinities of an optimally selective virtual receptor should be strongly correlated with the target affinity measures. Successive iterations of the evolutionary process are used to enhance this correlation.
  • the target values can be set to any level of binding affinity. It is not required that the ligand have the same binding affinity for all the virtual receptors used in the selection process.
  • the maximal binding affinities of the optimized virtual receptors for known substrates are used to calculate target binding affinities.
  • the target affinities may be set to 90% of the binding affinity of each member of the virtual receptor population for a specific substrate.
  • the target binding affinity may be set to zero if the interaction between the ligand and the virtual receptor is to be minimised.
  • Ligand fitness measures the match between calculated ligand binding affinities and the target affinity values. The optimization process maximizes ligand fitness.
  • each novel ligand must be tested repeatedly using different random orientations relative to the receptor surface.
  • Each test uses Module 4 discussed in Part A to evaluate affinity.
  • the reliability of the maximal affinity values obtained depends upon the sample size, since it becomes increasingly likely that the sample will contain the true maximal value.
  • the sum is calculated for the affinity values obtained for all the tested orientations of each ligand.
  • This sum affinity score is a measure of the average affinity between the receptor and the ligand.
  • the maximal affinity value is also determined.
  • target max affinity = f × maximal affinity of the most potent substrate used for virtual receptor generation
  • target sum affinity = f × sum affinity of the most potent substrate used for virtual receptor generation
  • f = a scaling factor
  • the fitness scores of each ligand-simulated receptor pair are summed.
  • fitness is maximized when the sum of the fitness scores is zero.
  • fitness is also maximized when the sum of the fitness scores is zero.
  • Other methods, for example the use of a geometric mean, could also be used to measure the total fitness of a ligand tested against a series of simulated receptors.
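A ligand fitness measure consistent with the description above can be sketched as follows. The fitness formula itself is not reproduced in this text, so the sketch assumes an absolute-deviation score, which is zero for a perfect match; the function name and input format are likewise hypothetical. The default scaling factor of 0.9 corresponds to the 90% example given earlier, and a factor of zero corresponds to minimising the interaction.

```python
def ligand_fitness(ligand_affinities, substrate_affinities, f=0.9):
    """Sum of per-receptor deviations from the target affinities.

    ligand_affinities[k]:    affinities of the ligand for receptor k over
                             all tested orientations
    substrate_affinities[k]: the same for the most potent substrate used to
                             generate receptor k
    A return value of zero means the ligand matches every target exactly.
    """
    total = 0.0
    for ligand, substrate in zip(ligand_affinities, substrate_affinities):
        target_max = f * max(substrate)   # target max affinity
        target_sum = f * sum(substrate)   # target sum affinity
        # Assumed score: absolute deviation from both targets.
        total += abs(max(ligand) - target_max) + abs(sum(ligand) - target_sum)
    return total
```

With this convention the optimization process seeks ligands whose summed score approaches zero across the whole receptor population.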
  • PHASE 1 Generate a set of random genotypes coding for ligands and screen against a set of simulated receptors to select ligands exceeding a threshold level of fitness.
  • PHASE 3 Generate new genotypes by recombination of selective mutants. Select from the resulting genotypes those with the highest affinity fitness. Use this subpopulation for the next recombinant (repeat PHASE 3) or mutation (repeat PHASE 4) generation.
  • the objective of the first stage in the optimization process is to generate a genotype and corresponding ligand phenotype with a minimal level of fitness.
  • This genotype is subsequently used to generate a population of related genotypes.
  • the Genetic Algorithm developed by Holland can be used to search for optimal solutions to a variety of problems. Normally this technique is applied using large, initially random sets of solutions. In the present implementation the technique is significantly modified in order to reduce the number of tests and iterations required to find ligands with high selective affinity. This has been accomplished by using a set of closely related genotypes as the initial population and the application of high rates of mutation at each iteration. For any set of target compounds it is possible to develop distinct ligands with optimal affinity characteristics.
  • receptors may bind optimally to the same targets but in different orientations.
  • the use of an initial population of closely related genotypes increases the likelihood that the optimization process is converging on a single solution. Recombination of unrelated genotypes, although it may generate novel genotypes of increased fitness, is more likely to result in divergence.
  • the objective of the second phase of the evolutionary process is the generation of a population of distinct but related genotypes derived from the primary genotype. Members of this population are subsequently used to generate recombinants.
  • This breeding population is created by multiple mutation of the primary genotype. The resulting genotypes are translated and screened for selectivity. The most selective products are retained for recombination.
  • Ligands are subjected to mutation by changing characters in the genotypes (code vectors) encoding their structures. These mutations change the shape of the ligand, as well as functional group placement and functional group types present on the ligand. Mutations in the current implementation can alter the number of carbons comprising the ligand skeleton. Module 11 is a flowchart of a sample process for multiple point mutation. Mutations can alter the folding pattern of the ligand phenotype, with resulting changes in shape and the location or exposure of functional groups. Mutations that affect the configuration of peripheral regions of the ligand phenotype can result in shifts in position relative to the receptor center.
  • Sequence mutations do not change code characters directly. Instead the sequence of characters in the code is rearranged. Sequence mutations can alter the size of the ligand, the structural configuration and presence and location of functional groups. Four types of sequence mutation are used in the current implementation:
  • Module 12 provides a flowchart of a sample sequence mutation.
  • PHASE 3 Generation of Recombinant Code
  • Module 13 provides a flowchart for a sample procedure for recombination.
  • the current implementation retains the population used for recombination for testing. This ensures that genotypes with high selectivity are not replaced by genotypes with lower fitness.
  • multiple mutations are applied to 50% of the recombinant genotypes prior to testing. This process increases the variability within the recombinant population.
  • the test populations used in the current implementation range in size from 10 to 40 genotypes. This is a relatively small population size. Under some conditions, larger populations may be required.
  • the final stage in the optimization process mimics the maturation of antibodies in the mammalian immune system.
  • a series of single point mutations are applied to the genotype, and the effect on phenotypic fitness is evaluated. Unlike recombination, this process generally results in only small incremental changes to the selectivity of the phenotype.
  • the maturation process uses a Rechenberg (1+1) evolutionary strategy. At each generation the fitness of the parental genotype is compared to that of its mutation product, and the genotype with the greater selectivity is retained for the next generation. As a result, this process is strictly unidirectional, since less selective mutants do not replace their parents. During each iteration of the maturation process, only a single instruction in the code is changed in the present implementation.
  • Module 14 provides a flowchart for a sample maturation process.
  • PHASE 2 multiple mutations
  • the mosquito Aedes aegypti is repelled by benzaldehyde and, to a much smaller degree, by benzene and toluene (Table 1). This species is not repelled significantly by cyclohexane or hexane (Table 1).
  • the method is used to generate, ab initio, compounds that will be similar in repellent activity to benzaldehyde.
  • simulated receptors were constructed with high affinity for benzaldehyde and low affinity for benzene.
  • ligands are evolved with binding affinities for the simulated receptors similar to that of benzaldehyde.
  • Mosquitoes were lab-reared, 7-14 days post-emergence and unfed. Experiments were conducted over six day periods at 20°C under fluorescent lighting. Tests were run between 12:00 and 17:00 EDT. The test populations in the four sets of trials consisted of 200, 175, 105 and 95 females. Mosquitoes were provided with drinking water. The tests were conducted in a 35 x 35 x 35 cm clear Plexiglas box with two screened sides forming opposite walls. The screening consisted of two layers: an inner layer of coarse plastic mesh and an outer layer of fine nylon mesh. The box was placed in a fumehood such that air entered one of the screened sides and exited through the opposite side. Air flow was ⁇ 0.5cm/s.
  • The mosquitoes landed on the walls of the box, oriented head upwards. Triangular pieces (4 x 4 x 1 mm) of Whatman #1 filter paper were used to present the stimulant compounds. The tips were dipped into the test solution to a depth of 0.5 cm and used immediately. Responses to the test solutions were determined as follows:
  • The treated filter paper tip was placed against the outside of the screen and positioned opposite the mesothoracic tarsus of the mosquito. In all cases the initial approach was made from below the position of the mosquito.
  • The tip was held in position for a maximum of 3 s and the response of the mosquito was noted.
  • Contralateral leg lifting: the mosquito raised the mesothoracic leg on the side opposite the stimulus source.
  • Two simulated receptors were generated using the same selection criteria. Each receptor was used independently to generate a set of ligands.
  • A receptor was evolved with selective affinity for benzaldehyde.
  • The training targets were benzene and benzaldehyde. Fifteen orientations of each target were used to calculate affinity values.
  • The affinity score for the receptor was 0.992.
  • The optimized simulated receptor was used as a template for the evolution of novel ligands.
  • Four different ligands were assembled by random mutation and selection. Ligands were selected for similarity with benzaldehyde.
  • Evolved ligands 1.1 to 1.4 are shown in Figure 4b. At least one orientation of each ligand was structurally similar to benzaldehyde.
  • A 25 x 6 x 7 receptor was evolved with selective affinity for benzaldehyde.
  • The training targets were benzene and benzaldehyde. Fifteen orientations of each target were used to calculate affinity values.
  • The affinity score for the receptor was 0.996.
  • The optimized simulated receptor was used as a template for the evolution of novel ligands.
  • Four different ligands were assembled by random mutation and selection. Ligands were selected for similarity with benzaldehyde.
  • Evolved ligands 2.1 to 2.4 are shown in Figure 4c. At least one orientation of each ligand was structurally similar to benzaldehyde.
  • Ligand 2.2 is 5-chloro-2,7-nonadione and ligand 2.3 is 2-cyano-5-hexanone.
  • Ligand 1.4 contains a fragment corresponding in structure to methyl cyclohexyl ketone. Experiments testing the repellency of cyclohexanone, menthone, methyl cyclohexyl ketone and 2-octanone (see Figure 4a) suggest that these ligands will also be repellent to mosquitoes (Table E2).
  • The method disclosed herein of designing new chemical structures exhibiting preselected functional characteristics or properties has been described by example only.
  • The method may be readily practised using other known or acceptable values for polarizabilities, dipole moments, covalent radii and the like.
  • The flowcharts giving process calculation steps in the modules are meant to be illustrative only.
  • The calculation of affinity may be carried out using available computational packages that use fewer approximations than those used herein.
  • The method of generating new chemical structures relies on first generating one or more simulated receptors exhibiting a preselected affinity for known target compounds with similar functional characteristics, and then using these receptors to generate novel structures exhibiting these characteristics to whatever degree is desired.
  • The receptors themselves may be used for applications other than generating novel chemical structures, for example as a means of screening for pharmaceutical or toxicological properties of known compounds.
  • Van der Waals 110 140 150 150 170 180 180 190 190 200
  • Each target atom is described fully by a set of eight values {x_i, y_i, z_i, r_i, br_i, cr_i, d_i, α_i}
  • x_i, y_i and z_i are the positional coordinates relative to the geometric center of the molecule
  • r_i the van der Waals radius
  • br_i the bond or covalent radius
  • α_i the polarizability
  • d_i the effective dipole moment value.
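The eight-value atom description above maps naturally onto a small record type. The following is only a sketch: `TargetAtom` and `center_atoms` are hypothetical names, the sixth value (cr) is carried without interpretation because it is not defined in this excerpt, and the geometric center is assumed here to be the mean of the atomic positions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TargetAtom:
    x: float      # position; relative to the geometric center once centered
    y: float
    z: float
    r: float      # van der Waals radius
    br: float     # bond (covalent) radius
    cr: float     # sixth value; meaning not given in this excerpt
    d: float      # effective dipole moment value
    alpha: float  # polarizability


def center_atoms(atoms):
    """Shift coordinates so the mean atomic position sits at the origin (assumption)."""
    n = len(atoms)
    cx = sum(a.x for a in atoms) / n
    cy = sum(a.y for a in atoms) / n
    cz = sum(a.z for a in atoms) / n
    return [replace(a, x=a.x - cx, y=a.y - cy, z=a.z - cz) for a in atoms]


# Placeholder numbers for illustration only, not values from the patent tables.
molecule = [
    TargetAtom(0.0, 0.0, 0.0, r=1.0, br=0.5, cr=0.0, d=0.0, alpha=1.0),
    TargetAtom(2.0, 4.0, 6.0, r=1.0, br=0.5, cr=0.0, d=0.0, alpha=1.0),
]
centered = center_atoms(molecule)
```

Keeping the record immutable means centering returns new atoms and leaves the input coordinates untouched.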
  • MODULE 1 CODE GENERATION FOR SIMULATED RECEPTORS
  • Step 1 Input code generation parameters: i) code length;
  • Step 2 Initialize empty character string to store code.
  • Step 3 Generate random number.
  • Step 4 Based on random number and instruction frequency, select a character {'0', '1', ..., '6'} to
  • Step 5 Output code.
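The code-generation steps above amount to drawing characters one at a time with probabilities set by the instruction frequencies. A minimal sketch follows, assuming the frequencies are supplied as relative weights per character; `generate_code` and the example weights are hypothetical, since the full parameter list is cut off in this excerpt.

```python
import random


def generate_code(length, frequencies, rng=None):
    """Build a random receptor code from characters '0'..'6'.

    frequencies maps each character to a relative weight (assumed input format).
    """
    rng = rng or random.Random()
    total = sum(frequencies.values())
    code = ""                                   # Step 2: empty string
    for _ in range(length):
        u = rng.random() * total                # Step 3: random number
        cumulative = 0.0
        for char, weight in frequencies.items():
            cumulative += weight                # Step 4: pick by cumulative frequency
            if u < cumulative:
                code += char
                break
        else:
            code += char                        # guard against floating-point round-off
    return code                                 # Step 5: output code


# Illustrative weights only; a weight of 0 removes an instruction from the code.
weights = {"0": 1, "1": 1, "2": 0, "3": 2, "4": 1, "5": 1, "6": 1}
example_code = generate_code(50, weights, random.Random(1))
```

The standard library call `random.choices(chars, weights=..., k=length)` gives the same kind of weighted draw in one line; the explicit loop is used here only to mirror the numbered steps.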
  • Step 1 Input origin coordinates for polymers comprising receptor.
  • Step 2 Input code for polymer.
  • Step 3 Read first character from code.
  • Step 4 If character is a turning instruction, use of
  • Step 5 Store subunit coordinates. Assign a charge value of 0 to subunit
  • Step 6 If character is not the last character in code, repeat step 3 otherwise step.
  • Step 7 If character is a charge instruction, use of
  • Step 8 Store subunit coordinates. Assign charge value of
  • Step 9 If character is not the last character in code, repeat step 3 otherwise step.
  • Step 10 Repeat steps 2 to 9 for each of the polymers
  • Step 11 Output coordinates and charge values of subunits.
  • MODULE 3 TARGET PRESENTATION
  • Step 1 Input coordinates and radii of target atoms (xt_i, yt_i, zt_i, radius_i)
  • Step 2 Generate random angular ( ⁇ , ⁇ )and translation values (k x ,k y ).
  • Step 3 Rotate and translate atomic coordinates by random amounts.
  • Step 3a Convert target coordinates to polar form
  • Step 3c Convert to rectangular coordinates
  • Step 4 Center target coordinates on origin (0,0,0).
  • Step 4a Find maximum and minimum values of xn_i, yn_i and zn_i.
  • Step 4c Calculate centered coordinates:
  • Step 5 Use atomic radii and transformed coordinates
  • Step 5b For each atom (i) set the g(x_g, y_g) (height) value of each grid point (x_g, y_g) according to the
  • Step 6 Center receptor coordinates on origin (0,0).
  • Step 6a Find maximum and minimum values of xr_j, yr_j and zr_j.
  • Step 6b Find geometric center of receptor:
  • xr_center = (xr_maximum - xr_minimum) / 2
  • yr_center = (yr_maximum - yr_minimum) / 2
  • Step 6c Calculate centered receptor coordinates:
  • Step 8 Find minimal separation between collision surface of receptor and collision surface of the target. Calculate difference matrix d(x_g, y_g) as follows for all
  • d_min is the minimal separation distance
  • Step 9 Transform target and receptor coordinates for
  • Step 10 Use (xtarget_i, ytarget_i, ztarget_i) and
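Steps 2 and 3 of the target presentation module (random rotation and translation of the target atoms) can be illustrated as below. This is a sketch under stated assumptions: the source converts to polar form and back, whereas this example applies two equivalent plane rotations directly; the choice of rotation axes, the angle ranges and the `max_shift` bound are all hypothetical, and the centering and collision-surface steps are not covered.

```python
import math
import random


def present_target(atoms, rng=None, max_shift=1.0):
    """Rotate (x, y, z) tuples by two random angles and shift them in x and y."""
    rng = rng or random.Random()
    theta = rng.uniform(0.0, 2.0 * math.pi)   # rotation about the z axis (assumed)
    phi = rng.uniform(0.0, 2.0 * math.pi)     # rotation about the x axis (assumed)
    kx = rng.uniform(-max_shift, max_shift)   # translation values (k_x, k_y)
    ky = rng.uniform(-max_shift, max_shift)
    moved = []
    for x, y, z in atoms:
        # Rotate in the x-y plane by theta.
        x1 = x * math.cos(theta) - y * math.sin(theta)
        y1 = x * math.sin(theta) + y * math.cos(theta)
        # Rotate in the y-z plane by phi.
        y2 = y1 * math.cos(phi) - z * math.sin(phi)
        z2 = y1 * math.sin(phi) + z * math.cos(phi)
        moved.append((x1 + kx, y2 + ky, z2))
    return moved
```

Rotation and translation are rigid motions, so interatomic distances are unchanged; only the face of the target presented to the receptor differs between orientations.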
  • Step 1 Input collision coordinates of target and receptor
  • Step 3 Input threshold value for proximity calculation:
  • Step 4b Calculate the sum of e(i,j) for all combinations of i and j with charge (j) ⁇ 0 .
  • Step 5 Calculate proximity value (this step could be
  • Step 5a For each target atom with
  • Step 5b Calculate the sum of prox(i,j) for all
  • PROXIMITY = Σ prox(i,j)
  • Step 6 Calculate affinity value for target substrate
  • Step 1 Input known target efficacy or affinity values
  • Step 2 Input collision coordinates of targets and
  • Step 3 Input number of target orientations to be tested
  • Step 5 Determine maximum affinity (MA_k) and sum affinity
  • Step 6 Calculate correlation coefficients r_MA^2 for maximum affinity (MA_k) vs known target efficacy or affinity values (y_k) and r_SA^2 for sum affinity
  • Step 6' Calculate correlation coefficients r_MA^2 for maximum affinity (MA_k) vs known target efficacy or affinity values (y_k) and r_(SA-MA)^2 for sum affinity (SA_k) - maximal affinity vs known target efficacy or affinity values (y_k).
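The correlation step above can be made concrete with a short example. It assumes that each target k has a list of affinity values, one per tested orientation, and that the coefficients are squared Pearson correlations; the function names and the numbers are illustrative only, and how the coefficients are combined into a final fitness is not shown in this excerpt.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable correlation
    return (sxy * sxy) / (sxx * syy)


def affinity_correlations(affinities, known):
    """affinities[k] holds the affinity values over all orientations of target k."""
    ma = [max(a) for a in affinities]          # maximal affinity MA_k
    sa = [sum(a) for a in affinities]          # sum affinity SA_k
    diff = [s - m for s, m in zip(sa, ma)]     # SA_k - MA_k, as in Step 6'
    return r_squared(ma, known), r_squared(sa, known), r_squared(diff, known)


# Made-up affinities for three targets, three orientations each.
affinities = [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [3.0, 0.0, 0.0]]
known = [1.0, 2.0, 3.0]
r2_ma, r2_sa, r2_diff = affinity_correlations(affinities, known)
```

In this toy data the maximal affinities track the known values exactly, while the sum affinities do so only partially, which is the kind of distinction the two coefficients are meant to expose.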
  • MODULE 6 GENERATE GENOTYPE WITH MINIMAL LEVEL OF AFFINITY
  • Step 1 Set minimal fitness threshold
  • Step 2 Generate random genotype (Module 1)
  • Step 3 Translate genotype to construct phenotype (Module
  • Step 5 If the fitness of the phenotype exceeds the
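Module 6 reads as a generate-and-test loop. The sketch below assumes that a genotype is accepted as soon as its fitness exceeds the threshold and that generation is otherwise repeated; Step 5 is truncated in this excerpt, so that stopping rule is an assumption. `seed_genotype`, `random_code` and the toy fitness are hypothetical, and translation plus scoring are collapsed into a single `fitness` call.

```python
import random

ALPHABET = "0123456"


def random_code(length, rng):
    """Uniform random code; the source weights characters by instruction frequency."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def seed_genotype(fitness, threshold, length, rng=None, max_tries=100_000):
    """Draw random genotypes until one exceeds the minimal fitness threshold."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        code = random_code(length, rng)    # Step 2: generate random genotype
        score = fitness(code)              # Steps 3-4: build phenotype and score it
        if score > threshold:              # Step 5 (assumed acceptance rule)
            return code, score
    raise RuntimeError("no genotype reached the fitness threshold")


def toy_fitness(code):
    """Placeholder for the receptor fitness: fraction of '1' characters."""
    return code.count("1") / len(code)
```

The `max_tries` cap is a safeguard added for the example; it is not taken from the source.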
  • Step 1 Input primary code (from phase 1).
  • Step 4 Select a position in the genotype at random.
  • Step 5 Replace the code character at that position with a different character chosen at random.
  • Step 6 Repeat steps 4 and 5 q times.
  • Step 7 Repeat steps 4-6 to generate a total of p new codes.
  • Step 8 Apply Modules 1-6 to test fitness of mutant
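Steps 4 to 7 above describe building a mutant population by applying q point mutations to each of p copies of the primary code. A minimal sketch, assuming the '0' to '6' alphabet and a hypothetical function name:

```python
import random

ALPHABET = "0123456"


def mutate_population(primary, p, q, rng=None):
    """Return p new codes, each derived from primary by q point mutations."""
    rng = rng or random.Random()
    mutants = []
    for _ in range(p):                           # Step 7: p new codes in total
        chars = list(primary)
        for _ in range(q):                       # Step 6: q mutations per code
            pos = rng.randrange(len(chars))      # Step 4: random position
            # Step 5: replace with a different character chosen at random.
            chars[pos] = rng.choice([c for c in ALPHABET if c != chars[pos]])
        mutants.append("".join(chars))
    return mutants
```

Because the same position can be drawn more than once, a mutant differs from the primary code in at most q positions rather than exactly q.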
  • Step 2 Select two codes at random from population
  • Step 3 Select a position in the genotype at random.
  • Step 4 Generate a random number for the number of
  • Step 5 Swap characters between codes beginning at
  • Step 6 Repeat steps 2-5 until P new genotypes have been generated.
  • Step 7 Apply Modules 2-6 to test fitness of mutant
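The recombination steps above can be sketched as a segment swap between two randomly chosen codes. The example assumes that all codes have equal length and that the random number in Step 4 is the number of characters exchanged, which the truncated text does not confirm; `recombine` is a hypothetical name.

```python
import random


def recombine(population, n_new, rng=None):
    """Produce n_new codes by swapping segments between random pairs of codes."""
    rng = rng or random.Random()
    offspring = []
    while len(offspring) < n_new:                  # Step 6: until P new genotypes
        a, b = rng.sample(population, 2)           # Step 2: two codes at random
        pos = rng.randrange(len(a))                # Step 3: random position
        run = rng.randint(1, len(a) - pos)         # Step 4: segment length (assumed)
        # Step 5: swap the segment starting at pos between the two codes.
        child_a = a[:pos] + b[pos:pos + run] + a[pos + run:]
        child_b = b[:pos] + a[pos:pos + run] + b[pos + run:]
        offspring.extend([child_a, child_b])
    return offspring[:n_new]
```

Each pair of children preserves, position by position, the characters of its two parents, so recombination reshuffles existing instructions without creating new ones.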
  • Step 1 Input parental code derived from Phase 3.
  • Step 2 Set number of iterations.
  • Step 3 Select a position in the parental genotype at
  • Step 4 Replace the code character at that position with a different character chosen at random.
  • Step 5 Test selectivity of parental code (F_P) and
  • Step 7 Repeat steps 3-6 for required number of
  • Step 3 Create character strings.
  • Step 4a If prob_ring > random (0 ⁇ random ⁇ 1) Then
  • Step 4b Assignment of single (non-ring) characters for
  • Step 4c Concatenate new characters to code strings
  • Prime_code = Prime_code & new_character_1
  • Second_code = Second_code & new_character_2
  • Third_code = Third_code & new_character_3
  • MODULE 11: MULTIPLE POINT MUTATION
  • Step 1 Input primary code.
  • Step 4 Select a position in the genotype at random.
  • Step 5 Replace the code characters at that position in each of the code vectors with different characters chosen at random.
  • Step 6 Repeat steps 4 and 5 q times.
  • Step 7 Repeat steps 4-6 to generate a total of p new
  • Step 8 Test the fitness of each member of the mutant
  • Step 1 Set P_DEL, P_INV, P_INS, and P_DUP as threshold levels for the occurrence of mutations (each P_x lying between 0 and 1).
  • Step 4 Copy sequence from code starting at x and
  • Step 5 If 0 ⁇ P INV ⁇ Random Number ⁇ 1 Then
  • Step 7 If 0 ⁇ P DEL ⁇ Random Number ⁇ 1 Then
  • Step 8 If 0 ⁇ P INS ⁇ Random Number ⁇ 1 Then
  • Step 2 Select two codes at random from population
  • Step 3 Select a position in the genotype at random.
  • Step 4 Generate a random number for the number of
  • Step 5 Swap characters between each of the three code vectors beginning at selected position.
  • Step 6 Repeat steps 2-5 until P new genotypes have been generated.
  • Step 7 Test the fitness of each ligand in the resulting mutant population. Select subpopulation with highest fitness for next recombination series or for maturation.
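For ligands, the recombination steps above act on three parallel code vectors at once and are followed by selection of the fittest subpopulation. The sketch below assumes each ligand genotype is a tuple of three equal-length strings and that the same segment is swapped in all three vectors; the function names, the segment-length rule and the character set used in any test data are hypothetical.

```python
import random


def recombine_ligands(population, n_new, rng=None):
    """Swap the same segment in all three code vectors of two random ligands."""
    rng = rng or random.Random()
    offspring = []
    while len(offspring) < n_new:
        a, b = rng.sample(population, 2)          # two ligand genotypes at random
        length = len(a[0])
        pos = rng.randrange(length)               # shared starting position
        run = rng.randint(1, length - pos)        # shared segment length (assumed)
        child_a = tuple(va[:pos] + vb[pos:pos + run] + va[pos + run:]
                        for va, vb in zip(a, b))
        child_b = tuple(vb[:pos] + va[pos:pos + run] + vb[pos + run:]
                        for va, vb in zip(a, b))
        offspring.extend([child_a, child_b])
    return offspring[:n_new]


def select_fittest(ligands, fitness, k):
    """Keep the k ligands with the highest fitness for the next series."""
    return sorted(ligands, key=fitness, reverse=True)[:k]
```

Swapping at one shared position keeps the three vectors aligned, so the characters describing a given ligand position always travel together.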
  • Step 1 Input parental code derived from recombination.
  • Step 2 Set number of iterations
  • Step 3 Select a position in the parental genotype at
  • Step 4 Replace the code characters at those positions in each of the code vectors with a different
  • Step 5 Test fitness of parental code (F_P) and mutation product (F_M) using Modules 4 and 5.
  • Step 7 Repeat steps 3-6 for required number of

Abstract

The present invention relates to a computational method for designing chemical structures that share common functional characteristics, based on combinations of their steric configuration and binding affinity. More specifically, the invention relates to a method of producing computationally simulated receptors that functionally mimic biological receptors. The simulated receptors are designed to exhibit optimized selective affinity for known target molecules. The method then generates structures and evolves them so that they exhibit selective affinity for the simulated receptors.
PCT/CA1996/000166 1996-03-22 1996-03-22 Procede informatique de conception de structures chimiques ayant en commun des caracteristiques fonctionnelles WO1997036252A1 (fr)

Priority Applications (6)

Application Number Priority Date Filing Date Title
AU49350/96A AU712188C (en) 1996-03-22 Computational method for designing chemical structures having common functional characteristics
PCT/CA1996/000166 WO1997036252A1 (fr) 1996-03-22 1996-03-22 Procede informatique de conception de structures chimiques ayant en commun des caracteristiques fonctionnelles
NZ332332A NZ332332A (en) 1996-03-22 1996-03-22 Computational method for designing chemical structures sharing common functional characteristics based on specific combinations of steric configuration and binding affinity particularly making receptors for known target molecules
EP96905638A EP0888591A1 (fr) 1996-03-22 1996-03-22 Procede informatique de conception de structures chimiques ayant en commun des caracteristiques fonctionnelles
EA199800843A EA001095B1 (ru) 1996-03-22 1996-03-22 Компьютерный способ создания химических структур, имеющих общие функциональные характеристики
JP09533880A JP2000507940A (ja) 1996-03-22 1996-03-22 共通の機能特性を有する化学構造をコンピューターによって設計する方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CA1996/000166 WO1997036252A1 (fr) 1996-03-22 1996-03-22 Procede informatique de conception de structures chimiques ayant en commun des caracteristiques fonctionnelles

Publications (1)

Publication Number Publication Date
WO1997036252A1 true WO1997036252A1 (fr) 1997-10-02

Family

ID=4173144

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA1996/000166 WO1997036252A1 (fr) 1996-03-22 1996-03-22 Procede informatique de conception de structures chimiques ayant en commun des caracteristiques fonctionnelles

Country Status (4)

Country Link
EP (1) EP0888591A1 (fr)
JP (1) JP2000507940A (fr)
EA (1) EA001095B1 (fr)
WO (1) WO1997036252A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005529158A (ja) * 2002-05-28 2005-09-29 ザ・トラスティーズ・オブ・ザ・ユニバーシティ・オブ・ペンシルベニア 両親媒性ポリマーのコンピュータ分析および設計のための方法、システムおよびコンピュータプログラム製品

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5025388A (en) * 1988-08-26 1991-06-18 Cramer Richard D Iii Comparative molecular field analysis (CoMFA)
WO1995001606A1 (fr) * 1993-06-30 1995-01-12 Daylight Chemical Information Systems, Inc. Procede et appareil de conception de molecules ayant les proprietes voulues par developpement de populations successives

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MOON ET AL: "Computer design of bioactive molecules: a method for receptor-based de novo ligand design", PROTEINS: STRUCTURE, FUNCTION, AND GENETICS, vol. 11, 1991, pages 314-328, XP000560842 *
VENKATASUBRAMANIAN ET AL: "Evolutionary design of molecules with desired properties using the genetic algorithm", JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, vol. 35, 1995, pages 188-195, XP000576025 *
WALTERS ET AL: "Genetically evolved receptor models: a computational approach to construction of receptor models", JOURNAL OF MEDICINAL CHEMISTRY, vol. 37, 1994, pages 2527-2536, XP000608151 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998038208A2 (fr) * 1997-02-28 1998-09-03 Bearsden Bio, Inc. Procede d'evaluation des interactions proteine-ligand par modelisation informatique
WO1998038208A3 (fr) * 1997-02-28 1998-11-05 Bearsden Bio Inc Procede d'evaluation des interactions proteine-ligand par modelisation informatique
WO2002041184A1 (fr) * 2000-11-14 2002-05-23 Kyorin Pharmaceutical Co., Ltd. Procede de recherche d'un nouveau compose tete de serie
WO2002044112A1 (fr) * 2000-11-30 2002-06-06 Toyo Suisan Kaisha, Ltd Procede de conception de la structure moleculaire d'un inhibiteur d'enzyme
US8108151B2 (en) 2004-01-27 2012-01-31 National Institute Of Information And Communications Technology Method and apparatus for chemical genetic programming
US9440097B2 (en) 2005-09-12 2016-09-13 Givaudan Sa Organic compounds
US10512599B2 (en) 2005-09-12 2019-12-24 Givaudan Sa Organic compounds
WO2018093538A1 (fr) * 2016-11-16 2018-05-24 Merrithew Paul Système et procédé de calcul de la structure et des propriétés de produits chimiques

Also Published As

Publication number Publication date
EA001095B1 (ru) 2000-10-30
EA199800843A1 (ru) 1999-02-25
EP0888591A1 (fr) 1999-01-07
AU4935096A (en) 1997-10-17
JP2000507940A (ja) 2000-06-27
AU712188B2 (en) 1999-10-28

Similar Documents

Publication Publication Date Title
US6219622B1 (en) Computational method for designing chemical structures having common functional characteristics
US5699268A (en) Computational method for designing chemical structures having common functional characteristics
Manallack et al. Neural networks in drug discovery: have they lived up to their promise?
JP6975140B2 (ja) 畳み込みネットワークを空間データに適用するためのシステム及び方法
Das et al. Real-parameter evolutionary multimodal optimization—A survey of the state-of-the-art
Judson Genetic algorithms and their use in chemistry
Gasteiger et al. Neural networks as data mining tools in drug design
WO1997036252A1 (fr) Procede informatique de conception de structures chimiques ayant en commun des caracteristiques fonctionnelles
US20020052694A1 (en) Pharmacophore fingerprinting in primary library design
Yan Application of self-organizing maps in compounds pattern recognition and combinatorial library design
Skolnick et al. Computational studies of protein folding
AU712188C (en) Computational method for designing chemical structures having common functional characteristics
Gordon et al. Bias and scalability in evolutionary development
Bernard et al. Comparison of chemical databases: Analysis of molecular diversity with Self Organising Maps (SOM)
Zaman et al. Using subpopulation EAs to map molecular structure landscapes
WIERZCHOŃ Function optimization by the immune metaphor
Mekenyan et al. COREPA‐M: A Multi‐Dimensional Formulation of COREPA
Komosinski et al. Quantitative measure of structural and geometric similarity of 3D morphologies
CA2248426A1 (fr) Methode de calcul pour concevoir des structures chimiques partageant des caracteristiques fonctionnelles
NZ332332A (en) Computational method for designing chemical structures sharing common functional characteristics based on specific combinations of steric configuration and binding affinity particularly making receptors for known target molecules
CA2247391A1 (fr) Procede informatique de conception de structures chimiques ayant en commun des caracteristiques fonctionnelles
Hasegawa et al. 3D-QSAR study of antifungal N-myristoyltransferase inhibitors by comparative molecular surface analysis
KR20000004909A (ko) 공통의 기능특성을 갖는 화학구조를 컴퓨터로 설계하는 방법
Szántai-Kis et al. Validation subset selections for extrapolation oriented QSPAR models
Van Kampen The applicability of genetic algorithms to complex optimisation problems in chemistry

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 96180229.4

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1996905638

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2247391

Country of ref document: CA

Ref document number: 2247391

Country of ref document: CA

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: PA/a/1998/007703

Country of ref document: MX

Ref document number: 1019980707454

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 332332

Country of ref document: NZ

WWE Wipo information: entry into national phase

Ref document number: 199800843

Country of ref document: EA

WWP Wipo information: published in national office

Ref document number: 1996905638

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1019980707454

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 1019980707454

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 1996905638

Country of ref document: EP