CA2247391A1

CA2247391A1 - Computational method for designing chemical structures having common functional characteristics

Info

Publication number: CA2247391A1
Application number: CA002247391A
Authority: CA
Inventors: Jonathan M. Schmidt
Original assignee: Individual
Current assignee: University of Guelph
Priority date: 1996-03-22
Filing date: 1996-03-22
Publication date: 1997-10-02

Abstract

The present invention relates to computational methods for designing chemical structures sharing common useful, functional properties based on specific combinations of steric configuration and binding affinity. More particularly the present invention provides a method for producing computer-simulated receptors which functionally mimic biological receptors. The simulated receptors are designed to exhibit optimized selective affinity for known target molecules. Chemical structures are then generated and evolved to exhibit selective affinity for the simulated receptors.

Description

WO 97/36252 PCT/CA96/00166
METHOD FOR DESIGNING CHEMICAL STRUCTURES HAVING COMMON FUNCTIONAL CHARACTERISTICS

FIELD OF THE INVENTION
The present invention relates to computer-based methods for designing chemical structures that share common useful, functional properties based on specific combinations of steric configuration and binding affinity. More particularly, the present invention provides a method for producing computer-simulated receptors which functionally mimic biological receptors. The simulated receptors are designed to exhibit optimized selective affinity for known target molecules.
Chemical structures are then generated and evolved to exhibit selective affinity for the simulated receptors.

BACKGROUND OF THE INVENTION
Biological receptors are linear polymers of either amino acids or nucleotides that are folded to create three-dimensional envelopes for substrate binding. The specific three-dimensional arrangements of these linear arrays, and the placement of charged sites on the envelope surface, are the products of evolutionary selection on the basis of functional efficacy.
The selectivity of biological receptors depends upon differences in the strength of attractive and repulsive forces generated between the receptor and the substrate. The magnitude of these forces varies in part with the magnitude and proximity of charged sites on the receptor and substrate surfaces.
Because substrates differ in the number and magnitude of the charged sites present or induced on their surfaces, as well as in the spatial arrangement of these sites, binding affinity can vary with substrate structure. Substrates with similar binding affinities for the same receptor have a high likelihood of sharing a common spatial arrangement of at least some of their induced and fixed charged sites. If the function of the receptor is correlated with binding affinity, then substrates with similar binding affinities will also be functionally similar in their effects. It is in this sense that the receptor can be said to recognize or quantify similarities between the substrates.
Traditional methods used in molecular recognition to identify or discover novel chemical compounds or substrates with selective binding affinity for receptors are based on finding common molecular subgraphs of active substrates and using these to predict new, similar compounds. A drawback of this technique is that it presupposes that substrates exhibiting similar binding efficacy are structurally similar. In many cases, however, structurally dissimilar substrates can exhibit similar binding affinities for the same receptor. More recent techniques based on quantitative structure-activity relationships (QSAR) are suited only to developing novel compounds within the same structural class and are largely inadequate for developing new molecular structures exhibiting the desired selective affinity; see, for example, Dean, Philip M., "Molecular Recognition: The Measurement and Search For Molecular Similarity in Ligand-Receptor Interaction", in Concepts and Applications of Molecular Similarity, Ed. Mark A. Johnson and Gerald M. Maggiora, pp. 211-238 (1990).
Recent efforts have been directed at the construction of atomic models of either pseudoreceptors, in which atoms and functional groups are connected, or minireceptors, composed of unconnected sets of atoms or functional groups (Snyder, J.P. (1993) In 3D QSAR in Drug Design: Theory, Methods and Applications; Kubinyi, H. Ed.; Escom, Leiden. p. 336). Related methods involve surrounding known target ligands with a number of model atoms and calculating the intermolecular forces generated between the ligand and the receptor model. Such models show a high correlation between calculated binding energy and biological activity (Walters, D.E. and Hinds, R.M. (1994) J. Med. Chem. 37: 2527) but have not been developed to the point where novel chemical structures exhibiting selective affinity for the receptor models can be produced.
Therefore, it would be very advantageous to provide a method for identifying non-trivial similarities between different chemical structures which are both sufficient and necessary to account for their shared properties, and which can then be used as the basis for the design of new chemical structures with useful functional properties based on specific combinations of steric configuration and binding affinity.

SUMMARY OF THE INVENTION
The present invention provides a method for identifying non-trivial similarities between different chemical structures which are both necessary and sufficient to account for their shared functional properties. The process also provides a method of generating novel chemical structures that display similar functional properties.
The basic concept underlying the present invention is the use of a two-step computational process to design or discover chemical structures with useful functional properties based on specific combinations of steric configuration and binding affinity. In the first step of this process, an algorithmic emulation of antibody formation is used to create a population of computer-generated simulated receptors that mimic biological receptors with optimized binding affinity for selected target substrates. In the second step, the simulated or virtual receptors are used to evaluate the binding affinity of existing compounds or to design novel substrates with optimal binding.
The method described herein provides simulated receptors which mimic selected features of biological receptors, including the evolutionary processes that optimize their binding selectivity. The mimics or simulated receptors generated by the method can be used to recognize specific similarities between molecules. Like antibodies and other biological receptors, the simulated receptors generated by this invention are feature extraction mechanisms: they can be used to identify or recognize common or similar structural features of target substrates.
Binding affinity between the receptors and the target substrates is used as a metric for feature recognition. Target substrates can be quantitatively categorized on the basis of binding affinity with a specific simulated receptor. Compounds sharing specific structural features will also share similar binding affinities for the same virtual receptor.
Binding affinity between biological receptors and substrates is determined by the steric goodness of fit between the adjacent receptor and substrate surfaces, the exclusion of water between non-polar regions of the two surfaces and the strength of electrostatic forces generated between neighbouring charged sites. In some cases the formation of covalent bonds between the substrate and the receptor may also contribute to binding affinity. The simulated receptors generated by this process mimic the binding mechanisms of their biological counterparts. Average proximity of the receptor and target surfaces and the strength of electrostatic attractions developed between charged sites on both surfaces are used to calculate a measurement of binding affinity. The resulting values for binding affinity are used to evaluate substrate molecular similarities.
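The two contributions named above (average surface proximity and electrostatic attraction between charged sites) can be combined into a single score. The following is a minimal sketch under stated assumptions, not the affinity calculation of the patent: the function name `affinity`, the linear proximity term with a hypothetical `cutoff`, and the weighting factor `k_elec` on a Coulomb-style 1/d term are all illustrative choices.

```python
import math

# Each site is (x, y, z, charge); coordinates in pm, charges in units of e.
Site = tuple[float, float, float, float]


def affinity(receptor: list[Site], target: list[Site],
             cutoff: float = 500.0, k_elec: float = 100.0) -> float:
    """Toy binding affinity: average surface proximity plus electrostatics.

    Both the functional form and the parameters are illustrative assumptions.
    """
    proximity = 0.0
    electrostatic = 0.0
    for rx, ry, rz, rq in receptor:
        r_pos = (rx, ry, rz)
        # Proximity: the closer the nearest target atom, the larger the term.
        nearest = min(math.dist(r_pos, (tx, ty, tz)) for tx, ty, tz, _ in target)
        if nearest < cutoff:
            proximity += 1.0 - nearest / cutoff
        # Electrostatics: opposite charges add to affinity, like charges subtract.
        for tx, ty, tz, tq in target:
            d = math.dist(r_pos, (tx, ty, tz))
            if rq and tq and d > 0:
                electrostatic -= k_elec * rq * tq / d
    return proximity / len(receptor) + electrostatic
```

Closer surfaces and oppositely charged neighbouring sites both raise the score, which is the qualitative behaviour the text requires of the affinity measure.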
Binding affinity can be globally determined, that is, dependent upon interactions between the entire substrate surface and a closed receptor or receptor envelope that completely surrounds the substrate. In this case, analysis of global similarities between substrates is appropriate as a basis for developing useful quantitative structure-activity relationships.
However, in most, if not all, biological systems, affinity is locally rather than globally determined. Interactions between substrate molecules and biological receptors are generally limited to contacts between isolated fragments of the receptor and the substrate surface. In this situation, analysis of global similarities between substrates is inappropriate as a method of developing structure-activity relationships, since only fragments of the substrate are directly involved in the generation of binding affinity.
Locally similar structures share similar structural fragments in similar relative positions and orientations. Locally similar structures are not necessarily globally similar.
Sampling of molecular properties may be achieved by a total sampling strategy involving evaluation of global similarity; a fragment sampling strategy involving evaluation of local similarity; or multiple-fragment sampling strategies involving evaluation of both local and global similarity.
The analysis of local similarities relies on sampling discrete regions of substrates for similar structures and charge distributions. In biological receptors, localized sampling arises due to the irregularity or bumpiness of the adjacent substrate and receptor surfaces. Interactions between closely opposed surfaces will predominate over interactions between more separated regions in the determination of binding affinity. The proximity of the adjacent surfaces will also determine the strength of hydrophobic binding. To be effective, the simulated receptors generated by the present method must exploit discrete local sampling of target substrates (molecules) in order to evaluate functionally relevant similarities between compounds.
Analysis of local similarities is complicated by two factors: 1) the number, location and identity of the relevant fragments sufficient and necessary for specific binding affinity cannot usually be established by simple deduction from the chemical structure of the substrate; and 2) the positions and orientations of the sampled fragments are dependent upon the underlying structure of the whole molecule.
The part of the present method directed to the generation of simulated receptors capable of categorizing similarities between chemical substrates is essentially a search for receptors that sample the relevant fragments of the substrates at the relevant locations in space. The optimization process relies on four features of simulated receptors: 1) generality: the receptors are able to bind with more than one substrate; 2) specificity: the binding affinity of the receptors varies with substrate structure; 3) parsimony: the receptors differentiate among substrates on the basis of a minimal set of local structural features; and 4) mutability: alteration of the structure of a receptor can change its binding affinity for a specific substrate. Encoding of the receptor phenotype in the form of a linear genotype represented by a character string facilitates the processes of mutation, recombination and inheritance of the structural characteristics of the simulated receptors.
Simulated receptors that satisfy these fundamental criteria can be optimized to obtain specific binding affinities for locally similar substrates using evolutionary selective breeding strategies. This is accomplished by encoding the spatial configuration and charge site distribution of the receptor in an inheritable format that can undergo alterations or mutations.
Like biological receptors, the simulated receptors generated by this method define a three-dimensional exclusion space. Such a three-dimensional space can be outlined to an arbitrary degree of resolution by a one-dimensional path of sufficient length and tortuosity. Proteins formed from linear polymers of amino acids are examples of such structures. Similarly, the three-dimensional structure of simulated receptors can be encoded as a linear array of turning instructions. This one-dimensional encoded form of the receptor constitutes its genotype. The decoded form used to assess binding affinity constitutes its phenotype. During the optimization process, alterations (mutations) are made to the receptor genotype. The effects of these changes on the binding affinity of the phenotype are subsequently evaluated. Genotypes that generate phenotypes with desirable binding affinities are retained for further alteration until, by iteration of the mutation and selection process, a selected degree of optimization of the phenotype is achieved. A variety of evolutionary strategies, including classical genetic algorithms, may be used to generate populations of simulated receptors with optimal binding characteristics.
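The mutation-and-selection cycle described above can be illustrated with the simplest evolutionary strategy, a (1+1) scheme: one parent, one point mutant per generation, and the mutant kept whenever its fitness is not worse. This is a sketch only; the text allows many strategies (including population-based genetic algorithms), and the names `point_mutation`, `evolve` and `ALPHABET` are hypothetical. In the method itself, the `fitness` callable would decode the genotype into a phenotype and score its binding.

```python
import random
from typing import Callable

# Hypothetical alphabet: five turn characters plus two charge characters,
# following the example character set given later in the description.
ALPHABET = "0123456"


def point_mutation(genotype: str, rng: random.Random) -> str:
    """Replace one randomly chosen character with a random character."""
    i = rng.randrange(len(genotype))
    return genotype[:i] + rng.choice(ALPHABET) + genotype[i + 1:]


def evolve(genotype: str, fitness: Callable[[str], float],
           generations: int, rng: random.Random) -> tuple[str, float]:
    """(1+1) strategy: keep a mutant whenever its fitness is not worse."""
    best, best_fit = genotype, fitness(genotype)
    for _ in range(generations):
        child = point_mutation(best, rng)
        child_fit = fitness(child)  # placeholder for decoding + binding evaluation
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit
```

For example, `evolve("0" * 30, lambda g: g.count("5"), 500, random.Random(0))` evolves a code string toward more positively charged sites; a population-based variant would keep several such lineages and recombine them.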
Receptors generated by this method are then used to generate or identify novel chemical structures (compounds) which share the specific, useful properties of the molecular target species used as selection criteria in producing the simulated receptors. Using interaction with the receptors as the selection criterion, novel chemical structures are evolved to optimally fit the receptors. Because these structures must meet the necessary and sufficient requirements for receptor selectivity, they are likely also to possess biological activity similar to that of the original molecular targets. The population of simulated receptors with enhanced selectivity may also be used to screen existing chemical structures for compounds with high affinity that may share these useful properties. The same process may also be used to screen for compounds with selected toxicological or immunological properties.
In one aspect of the invention there is provided a computer-based method of designing chemical structures having a preselected functional characteristic, comprising the steps of:
(a) producing a physical model of a simulated receptor phenotype encoded in a linear character sequence, and providing a set of target molecules sharing at least one quantifiable functional characteristic;
(b) for each target molecule:
(i) calculating an affinity between the receptor and the target molecule in each of a plurality of orientations using an effective affinity calculation;
(ii) calculating a sum affinity by summing the calculated affinities;
(iii) identifying a maximal affinity;
(c) using the calculated sum and maximal affinities to:
(i) calculate a maximal affinity correlation coefficient between the maximal affinities and the quantifiable functional characteristic;
(ii) calculate a sum affinity correlation coefficient between the sum affinities and the quantifiable functional characteristic;
(d) using the maximal correlation coefficient and sum correlation coefficient to calculate a fitness coefficient;
(e) altering the structure of the receptor and repeating steps (b) through (d) until a population of receptors having a preselected fitness coefficient is obtained;
(f) providing a physical model of a chemical structure encoded in a molecular linear character sequence, calculating an affinity between the chemical structure and each receptor in a plurality of orientations using said effective affinity calculation, using the calculated affinities to calculate an affinity fitness score;
(g) altering the chemical structure to produce a variant of the chemical structure and repeating step (f); and
(h) retaining and further altering those variants of the chemical structure whose affinity score approaches a preselected affinity score.
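Steps (b) through (d) above reduce each target's per-orientation affinities to a sum and a maximum, correlate each with the measured functional characteristic, and combine the two correlation coefficients into a fitness coefficient. The sketch below uses the Pearson correlation coefficient; the names `pearson` and `fitness_coefficient` are hypothetical, and because the text does not specify how the two coefficients are combined in step (d), taking their mean is an assumption.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fitness_coefficient(affinities: list[list[float]], activity: list[float]) -> float:
    """affinities[i] holds target i's affinities over all tested orientations."""
    sums = [sum(a) for a in affinities]      # step (b)(ii)
    maxima = [max(a) for a in affinities]    # step (b)(iii)
    r_sum = pearson(sums, activity)          # step (c)(ii)
    r_max = pearson(maxima, activity)        # step (c)(i)
    # Step (d): combination rule not specified in the text; the mean is assumed.
    return (r_sum + r_max) / 2
```

A receptor whose sum and maximal affinities both rise in step with the targets' activities approaches a fitness of 1, which is what step (e) selects for.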
In another aspect of the invention there is provided a method of screening chemical structures for preselected functional characteristics, comprising:
a) producing a simulated receptor genotype by generating a receptor linear character sequence which codes for spatial occupancy and charge;
b) decoding the genotype to produce a receptor phenotype, providing at least one target molecule exhibiting a selected functional characteristic, calculating an affinity between the receptor and each target molecule in a plurality of orientations using an effective affinity calculation, calculating a sum and maximal affinity between each target molecule and the receptor, calculating a sum affinity correlation coefficient for sum affinity versus said functional characteristic of the target molecule and a maximal affinity correlation coefficient for maximal affinity versus said functional characteristic, and calculating a fitness coefficient dependent on said sum and maximal affinity correlation coefficients;
c) mutating the receptor genotype and repeating step b), and retaining and mutating those receptors exhibiting increased fitness coefficients until a population of receptors with preselected fitness coefficients is obtained; thereafter
d) calculating an affinity between a chemical structure being screened and each receptor in a plurality of orientations using said effective affinity calculation, calculating an affinity fitness score which includes calculating a sum and maximal affinity between the compound and each receptor, and comparing at least one of said sum and maximal affinity to the sum and maximal affinities between said at least one target and said population of receptors, whereby said comparison is indicative of the level of functional activity of said chemical structure relative to said at least one target molecule.
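The comparison in step d) can be pictured by treating each molecule's affinities over the evolved receptor population as a profile vector and finding the target whose profile is closest to the screened compound's. This is an illustrative sketch only: the text says that the comparison is indicative of functional activity but not how it is made, so the Euclidean distance and the helper name `closest_target` are assumptions.

```python
import math


def closest_target(compound: list[float],
                   targets: dict[str, list[float]]) -> tuple[str, float]:
    """Return (name, distance) of the target whose affinity profile is nearest.

    Each profile lists one affinity (e.g. the maximal affinity) per receptor
    in the evolved population, in the same receptor order.
    """
    return min(((name, math.dist(compound, profile))
                for name, profile in targets.items()),
               key=lambda item: item[1])
```

A small distance suggests the screened structure is recognized by the receptor population much as that target is, and so may share its functional activity.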
In another aspect of the invention there is provided a method of designing simulated receptors mimicking biological receptors exhibiting selective affinity for compounds with similar functional characteristics, comprising the steps of:
a) producing a simulated receptor genotype by generating a receptor linear character sequence which codes for spatial occupancy and charge;
b) decoding the genotype to produce a receptor phenotype, providing a set of target molecules sharing similar functional characteristics, calculating an affinity between the receptor and each target molecule in a plurality of orientations using an effective affinity calculation, calculating a sum and maximal affinity between each target molecule and the receptor, calculating a sum affinity correlation coefficient for sum affinity versus a functional characteristic for each target molecule and a maximal affinity correlation coefficient for maximal affinity versus said functional characteristic for each target molecule, and calculating a fitness coefficient dependent on said sum and maximal affinity correlation coefficients for each target molecule; and
c) mutating the genotype and repeating step b), and retaining and mutating those receptors exhibiting increased fitness coefficients until a population of receptors with preselected fitness coefficients is obtained.
In another aspect of the invention there is provided a computer-based method of designing chemical structures having a preselected functional characteristic, comprising the steps of:
(a) providing a physical model of a receptor and a set of target molecules, the target molecules sharing at least one quantifiable functional characteristic;
(b) for each target molecule:
(i) calculating an affinity between the receptor and the target molecule in each of a plurality of orientations using an effective affinity calculation;
(ii) calculating a sum affinity by summing the calculated affinities;
(iii) identifying a maximal affinity;
(c) using the calculated sum and maximal affinities to:
(i) calculate a maximal affinity correlation coefficient between the maximal affinities and the quantifiable functional characteristic;
(ii) calculate a sum affinity correlation coefficient between the sum affinities and the quantifiable functional characteristic;
(d) using the maximal correlation coefficient and sum correlation coefficient to calculate a fitness coefficient;
(e) altering the structure of the receptor and repeating steps (b) through (d) until a population of receptors having a preselected fitness coefficient is obtained;
(f) providing a physical model of a chemical structure, calculating an affinity between the chemical structure and each receptor in a plurality of orientations using said effective affinity calculation, and using the calculated affinities to calculate an affinity fitness score;
(g) altering the chemical structure to produce a variant of the chemical structure and repeating step (f); and
(h) retaining and further altering those variants of the chemical structure whose affinity score approaches a preselected affinity score.
In yet another aspect of the invention there is provided a method of encoding chemical structures comprising atomic elements, the method comprising providing a linear character sequence which codes for spatial occupancy and charge for each atom of said chemical structure.

BRIEF DESCRIPTION OF THE DRAWINGS
The method of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a flow chart showing the relationship between genotype code creation and its translation to produce a corresponding phenotype, forming part of the present invention;
Figure 2 is a flow chart showing an overview of the steps in the optimization of a receptor for selectively binding to a set of substrates using point mutations forming part of the present invention;
Figure 3 is a flow chart showing an overview of the steps in the process of producing a population of related receptors with optimized selective binding affinity for a set of chemical substrates and using these optimized receptors for producing a set of novel chemical substrates with common shared functional characteristics;
Figure 4a shows several chemical compounds used in the examples of ligand generation;
Figure 4b shows ligands 1.1 to 1.4 generated by the method of the present invention in the example of ligand generation, wherein each ligand has at least one orientation in which it is structurally similar to benzaldehyde; and
Figure 4c shows ligands 2.1 to 2.4 generated by the method of the present invention in the example of ligand generation relating to the design of chemical structures exhibiting efficacy in repelling mosquitoes.

DESCRIPTION OF THE PREFERRED EMBODIMENTS
The method can be broken into two parts: (A) evolution of a population of simulated receptors with selective affinity for compounds with shared functional characteristics and (B) generation of novel chemical structures having the shared functional characteristics. Part (A) comprises several steps including 1) receptor genotype and phenotype generation; 2) presentation of the known chemical structure(s) to the receptor; 4) evaluation of the affinity of the receptor for the chemical structure(s); 5) assessment of the selectivity of the receptor for the chemical structure(s); 6) stochastic evolution of a family of related receptors with optimized selective affinity for the chemical structure(s); screening of chemical substrates for toxicological and pharmacological activity; and use of the optimized receptors to design novel chemical structure(s) with selective binding affinity for the receptors.
The following description of the best mode of the invention refers to various tables of molecular and atomic radii, polarizabilities, effective dipole values, and transition states and addition factors, whose values are found in Tables I to V located at the end of the description. Flowcharts giving non-limiting examples of process calculations are attached at the end of the description in Modules 1 to 14.

PART A: EVOLUTION OF POPULATION OF SIMULATED RECEPTORS
EXHIBITING SELECTIVE AFFINITY FOR TARGET MOLECULES
SHARING COMMON FUNCTIONAL CHARACTERISTICS
(1) Genotype Code and Receptor Phenotype Generation
Both the simulated receptor genotypes and phenotypes are computational objects. The phenotypes of the simulated receptors consist of folded, unbranched polymers of spherical subunits whose diameter is equal in length to the van der Waals radius of atomic hydrogen (about 110 pm). Subunits can be connected to each other at any two of the six points corresponding to the intercepts of the spheres with each of their principal axes. In the present implementation, connections between subunits cannot be stretched or rotated, and the centers of two connected subunits are always separated by a distance equal to the subunit diameter (i.e. 1 hydrogen radius). Turns occur when two subunits are not attached to opposite faces of their common neighbour. Four kinds of orthogonal turns are possible: left, right, up and down. Turns must be made parallel to one of the principal axes. For computational simplicity, if turns result in intersection with other subunits in the polymer, subunits are permitted to occupy the same space as other subunits.
A complete simulated receptor consists of one or more discrete polymers. In the case of receptors consisting of multiple polymers, the individual polymers can originate at different points in space. For computational simplicity, all polymers comprising a single receptor are chosen to be of the same length (i.e. the same number of subunits) in this implementation. This restriction is not a requirement for functionality, and sets of polymers differing in length may be useful for modelling specific systems.
The structure of each polymer is encoded as a sequential set of turning instructions. The instructions identify individual turns with respect to an internal reference frame based on the initial orientation of the first subunit in each polymer.

Hydration of the receptor and substrate is not treated explicitly in the current implementation; instead, it is assumed that any water molecules present at the binding site are attached permanently to the receptor surface and comprise an integral part of its structure. This is an arbitrary approximation, and those skilled in the art will appreciate that it could be replaced by a more exact treatment (see, for example, van Oss, 1995, Molecular Immunology 32: 199-211).
With reference to Figure 1, the code creation module generates random strings of characters. Each character either represents a turning instruction or determines the charge characteristics or reactivity of a point in the three-dimensional shape comprising the virtual receptor. A minimum of five different characters is required to create a string describing the three-dimensional shape of a receptor based on a Cartesian (rectangular) coordinate framework. Other frameworks, e.g. tetrahedral structures, can also be constructed using different sets of turning instructions. The characters represent turning instructions which are defined with respect to the current path of the virtual receptor structure in three-dimensional space (i.e. the instructions refer to the intrinsic reference frame of the virtual receptor and not an arbitrary external reference frame).
Turning instructions are given with respect to the current direction and orientation of the polymer. Only left, right, up and down turns are permitted. If a turn does not occur, the polymer can either terminate or continue in its current direction.
For a rectangular system the minimum character set is:
C1 = no turn; C2 = right turn; C3 = left turn; C4 = up turn; and C5 = down turn. It will be understood that instructions could be combined to create diagonal turns (e.g. a combined instruction A1,2). The number of different characters that determine different charge or reactivity states is unrestricted and may be adjusted according to empirical evidence. Codes may differ both in length (number of characters) and in the frequency with which specific characters appear in the series.
Example of Genotype Creation
The following example of genotype code creation and phenotype expression will be understood by those skilled in the art to be illustrative only. In this example the following conventions are employed.
(1) The character set used to generate the codes consists of five characters referring to turning instructions and two characters identifying a charged site: "0" = no turn; "1" = right turn; "2" = up turn; "3" = left turn; "4" = down turn; "5" = positively charged site (no turn); and "6" = negatively charged site (no turn).
(2) Subunits are of two types: charged or uncharged. All charged subunits are assumed to carry a unitary positive or negative charge. The uniform magnitude of charges is an arbitrary convention.

(3) The receptors comprise 15 discrete polymers. The length of the complete code is always a multiple of fifteen. The length of each polymer is equal to the total code length divided by fifteen. It will be understood that receptors can be constructed from any number of discrete polymers of varying or constant length.

(4) The following parameters are set by the user: (a) total code length (and polymer length); (b) the frequency with which each character occurs in the code string; and (c) the occurrence of character combinations. Module 1 gives a flowchart of a sample of genotype code creation.
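Conventions (1) to (4) above can be sketched as a weighted random string generator followed by a split into 15 equal polymer codes. This is an illustration, not Module 1 itself: the helper names `create_genotype` and `split_polymers` are hypothetical, the relative character frequencies stand in for user parameter (b), and user parameter (c), constraints on character combinations, is not modelled.

```python
import random

N_POLYMERS = 15  # receptors in this example comprise 15 discrete polymers


def create_genotype(polymer_length: int, frequencies: dict[str, float],
                    rng: random.Random) -> str:
    """Random code string whose length is a multiple of N_POLYMERS.

    `frequencies` maps each character to its relative frequency (parameter b).
    Constraints on character combinations (parameter c) are not modelled.
    """
    chars = list(frequencies)
    weights = [frequencies[c] for c in chars]
    return "".join(rng.choices(chars, weights=weights, k=polymer_length * N_POLYMERS))


def split_polymers(code: str) -> list[str]:
    """Cut the code into N_POLYMERS equal-length polymer codes."""
    if len(code) % N_POLYMERS:
        raise ValueError("code length must be a multiple of 15")
    n = len(code) // N_POLYMERS
    return [code[i * n:(i + 1) * n] for i in range(N_POLYMERS)]
```

For instance, `create_genotype(8, {"0": 4, "1": 1, "2": 1, "3": 1, "4": 1, "5": 0.5, "6": 0.5}, random.Random(1))` yields a 120-character code in which straight sections dominate and charged sites are comparatively rare.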
Example of Receptor Phenotype Creation
Each genotype code is translated to create the three-dimensional description of its corresponding phenotype or virtual receptor. From a predefined starting point, a translation algorithm is used to convert the turning instructions into a series of coordinate triplets which describe the position in space of the successive subunits comprising the receptor polymers. The starting coordinates for each polymer must be given prior to translation. The translation assumes that the centers of successive subunits are separated by a distance equal to the covalent diameter of a hydrogen atom.
The translation algorithm reads the code string sequentially to generate successive turns and straight path sections. The interpretation of successive turns with respect to an external coordinate system depends upon the preceding sequence of turns. For each polymer comprising the receptor, the initial orientation is assumed to be the same. In the current implementation, the translation algorithm is described by TABLE I, which gives the input and output states. If no turn occurs, the most recent values for Δx, Δy, Δz and the new state are used to calculate the new coordinate triplet. Charge sites are treated as straight (no turn) sections. The initial value of the old state is 20.
The following parameters can be set by the user:
a. Starting coordinates for each polymer comprising the receptor.
Output is stored as:
a. Three vectors, one for each axis: {x1, x2, x3, ..., xn}, {y1, ..., yn}, {z1, ..., zn};
b. A three-dimensional binary matrix;
c. Separate vectors for charge site coordinates.
A sample process of code translation is given in Module 2.
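The translation and the three output forms listed above can be sketched by tracking an intrinsic frame (a heading vector and an up vector) along the polymer. TABLE I is not reproduced here, so this is an assumed equivalent rather than the patent's state table: the function names are hypothetical, the start frame (heading +x, up +z) replaces the "state 20" convention, each character is assumed to add one subunit one spacing unit ahead after its turn is applied, and coordinates are in units of the subunit spacing (multiply by about 110 pm for physical distances).

```python
Vec = tuple[int, int, int]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def neg(a: Vec) -> Vec:
    return (-a[0], -a[1], -a[2])


def translate(code: str, start: Vec = (0, 0, 0),
              heading: Vec = (1, 0, 0), up: Vec = (0, 0, 1)):
    """Decode one polymer code into subunit centres and charge sites."""
    pos = start
    coords = [start]
    charges = []  # output c: (position, +1 or -1)
    for ch in code:
        if ch == "1":    # right turn: rotate heading about the up axis
            heading = cross(heading, up)
        elif ch == "3":  # left turn
            heading = cross(up, heading)
        elif ch == "2":  # up turn: pitch heading onto the up axis
            heading, up = up, neg(heading)
        elif ch == "4":  # down turn
            heading, up = neg(up), heading
        # "0", "5" and "6" continue straight ahead.
        pos = (pos[0] + heading[0], pos[1] + heading[1], pos[2] + heading[2])
        coords.append(pos)
        if ch in "56":
            charges.append((pos, 1 if ch == "5" else -1))
    return coords, charges


def axis_vectors(coords):
    """Output a: one coordinate vector per axis."""
    return [c[0] for c in coords], [c[1] for c in coords], [c[2] for c in coords]


def binary_matrix(coords):
    """Output b: 3D occupancy grid (1 = occupied), shifted to start at index 0."""
    xs, ys, zs = axis_vectors(coords)
    x0, y0, z0 = min(xs), min(ys), min(zs)
    grid = [[[0] * (max(zs) - z0 + 1) for _ in range(max(ys) - y0 + 1)]
            for _ in range(max(xs) - x0 + 1)]
    for x, y, z in coords:
        grid[x - x0][y - y0][z - z0] = 1  # overlapping subunits share a cell
    return grid
```

Because turns are applied to the current frame rather than to fixed axes, a right turn after an up turn is interpreted relative to the new heading, which is the intrinsic-reference-frame behaviour the description calls for.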
(2) Target Generation
Targets are represented as molecules consisting of spherical atoms. The atoms are considered to be hard spheres with fixed radii characteristic of each atomic species. The hard sphere radius at which the repulsive force between the target atoms and the virtual receptor is considered to be infinite is approximated by the exposed van der Waals radius given in TABLE 2. Other estimated values of the van der Waals radius can be used in place of those in TABLE 2.

The distance between the atomic centers of two atoms connected by a covalent bond is expressed as the sum of their covalent bond radii. Covalent bond radii vary with bond order and atomic species. Examples of suitable values of bond radii are given in TABLE 3. As a first approximation, bond length is assumed to be fixed (i.e. bond vibrations are ignored). Bond rotation is permitted, and multiple configurations of the same structure are required to sample representative rotational states. Configurational stability is not considered, because binding with the virtual receptor may stabilize otherwise energetically unstable configurations. Various energy minimization algorithms can be applied to the generation of target ligands.
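The fixed-bond-length rule and the sampling of rotational states can be sketched together: bond length is the sum of two covalent radii, and rotational states are generated by rotating the atoms on one side of a bond about the bond axis, which leaves every bond length unchanged. The radii below are approximate placeholder values for illustration only, not those of TABLE 3; the helper names are hypothetical, and Rodrigues' rotation formula is a standard choice that the text does not specify.

```python
import math

Vec = tuple[float, float, float]

# Placeholder covalent radii in pm keyed by (element, bond order); illustrative
# approximations only, not the TABLE 3 values.
COVALENT_RADIUS = {("H", 1): 31, ("C", 1): 76, ("C", 2): 67, ("O", 1): 66, ("O", 2): 57}


def bond_length(a: str, b: str, order: int = 1) -> int:
    """Fixed bond length: sum of the two covalent radii (vibration ignored)."""
    return COVALENT_RADIUS[(a, order)] + COVALENT_RADIUS[(b, order)]


def rotate_about_bond(point: Vec, a: Vec, b: Vec, angle: float) -> Vec:
    """Rotate `point` by `angle` radians about the axis through a and b (Rodrigues)."""
    k = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]                       # unit bond axis
    v = [point[i] - a[i] for i in range(3)]         # point relative to atom a
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [k[1] * v[2] - k[2] * v[1], k[2] * v[0] - k[0] * v[2], k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    return tuple(a[i] + v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
                 for i in range(3))


def rotamers(moving: list[Vec], a: Vec, b: Vec, n: int = 3) -> list[list[Vec]]:
    """n evenly spaced rotational states of the atoms in `moving` about bond a-b."""
    return [[rotate_about_bond(p, a, b, 2 * math.pi * j / n) for p in moving] for j in range(n)]
```

Each list returned by `rotamers` is one configuration of the same structure; evaluating several of them is one way to sample the representative rotational states mentioned above.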
Electrical charges arising due to bond dipole moments are considered to be localized at the atomic nuclei. The negative charge is carried by the atom with the larger electronegativity.
The dipole values used in the current implementation are given in TABLE 4. Other estimates of the dipole values can be used in place of those in TABLE 4.
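In this model a target atom carries only an element, a nuclear position, a hard-sphere radius and a dipole localized at the nucleus. The sketch below assumes illustrative radii in angstroms (Bondi van der Waals radii and typical single-bond covalent radii) in place of TABLES 2 and 3, which are not reproduced here; `Atom`, `bond_length` and `clashes` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Illustrative radii in angstroms, NOT the values of TABLES 2 and 3:
# Bondi van der Waals radii and typical single-bond covalent radii.
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


@dataclass
class Atom:
    element: str
    xyz: Tuple[float, float, float]  # position of the nucleus
    dipole: float = 0.0  # bond-dipole contribution localized at the nucleus (Debye)

    @property
    def vdw_radius(self) -> float:
        return VDW_RADIUS[self.element]


def bond_length(a: str, b: str) -> float:
    """Fixed single-bond length: sum of covalent radii (vibration ignored)."""
    return COVALENT_RADIUS[a] + COVALENT_RADIUS[b]


def clashes(atom: Atom, center, radius: float) -> bool:
    """Hard-sphere test: repulsion is treated as infinite once the centres
    are closer than the summed radii."""
    return math.dist(atom.xyz, center) < atom.vdw_radius + radius
```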
(3) Target Presentation
The affinity of each target for the simulated receptor(s) is tested for several orientations of the target relative to the upper surface of the receptor. The upper surface is defined by the translation algorithm. Prior to the evaluation of binding affinity, the target and receptor must be brought into contact. Contact occurs when the distance between the centers of at least one subunit of the receptor and at least one atom of the target is equal to their combined van der Waals radii. In order to determine the relative positions of the target and receptor at the point of contact, the target is shifted incrementally towards the receptor surface along a path perpendicular to the surface and passing through the geometric centers of both the receptor and the target. When contact occurs, the target has reached its collision position relative to the receptor. The translated positions of the target atoms at the collision position are used to calculate distances between the atoms of the target and the subunits of the receptor. These distances are used to calculate the strength of electrostatic interactions and proximity.
In the current implementation, the target is assumed to travel in a straight line towards the receptor and to retain its starting orientation at the time of contact. An alternative approach would allow the target to change its orientation incrementally as it approaches the receptor, so that the maximal-affinity position is achieved at the point of contact. Although this method is functionally similar to the one implemented, it is much more computationally complex; the current implementation instead tests multiple orientations at lower computational effort. The current implementation also allows adjustable displacement of the path along the x and/or y axis of the receptor to accommodate larger molecules. This feature is required to enhance selectivity when molecules differing in size are tested on the same receptor.
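The collision search can be sketched as a straight-line approach with fixed orientation. This is a minimal illustration under stated assumptions: the receptor's upper surface faces +z, every receptor subunit has the same radius, and contact is declared at the first step where any atom-subunit distance falls to the summed radii, so the result is accurate only to within one step. `collision_position` and its parameters are hypothetical; `offset_xy` stands in for the adjustable x/y displacement of the path.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def collision_position(target: Sequence[Point], target_radii: Sequence[float],
                       receptor: Sequence[Point], subunit_radius: float,
                       offset_xy=(0.0, 0.0), step: float = 0.05,
                       clearance: float = 10.0) -> Optional[List[Point]]:
    """Lower the target along -z, keeping its orientation, until first contact.

    The approach line passes through the receptor's geometric centre
    (shifted by offset_xy) and the target's geometric centre. `clearance`
    should exceed the largest summed radius. Returns the translated target
    coordinates, or None if the path misses the receptor.
    """
    rc, tc = centroid(receptor), centroid(target)
    top = max(p[2] for p in receptor)
    bottom = min(p[2] for p in receptor)
    # Place the target's lowest atom `clearance` above the receptor's top.
    dx = rc[0] + offset_xy[0] - tc[0]
    dy = rc[1] + offset_xy[1] - tc[1]
    dz = top + clearance - min(p[2] for p in target)
    start = [(x + dx, y + dy, z + dz) for x, y, z in target]

    k = 0
    while True:
        atoms = [(x, y, z - k * step) for x, y, z in start]
        if any(math.dist(a, s) <= r + subunit_radius
               for a, r in zip(atoms, target_radii) for s in receptor):
            return atoms  # collision position reached
        if max(a[2] for a in atoms) < bottom - clearance:
            return None  # passed the receptor without touching it
        k += 1
```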
Prior to the calculation of the collision position, the orientation of the target is randomized by random rotation in 6° increments around each of the x, y, and z axes. Larger or smaller increments of rotation may be used. Each of these random orientations of the target is unique in a given test series. The reliability of the optimization process depends upon the number of target orientations tested as well as the number of target compounds evaluated. A sample process for target presentation is given in Module 3.
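Random orientations on a 6° grid can be drawn as distinct angle triples and applied as successive rotations about the target's geometric centre. This sketch assumes the rotations are applied about x, then y, then z, and uses a seeded generator so that the same orientation set can be reused for every receptor; `random_orientations` and `rotate` are hypothetical names. Distinct angle triples are not always distinct rotations (some Euler-angle triples coincide), so a stricter uniqueness test would compare rotation matrices.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def random_orientations(n: int, rng: random.Random,
                        step_deg: int = 6) -> List[Tuple[int, ...]]:
    """Draw n distinct (ax, ay, az) angle triples on a step_deg grid."""
    per_axis = 360 // step_deg
    if n > per_axis ** 3:
        raise ValueError("more orientations requested than the grid holds")
    seen = set()
    triples = []
    while len(triples) < n:
        t = tuple(step_deg * rng.randrange(per_axis) for _ in range(3))
        if t not in seen:  # keep each orientation unique within the series
            seen.add(t)
            triples.append(t)
    return triples


def rotate(points: Sequence[Point], angles_deg) -> List[Point]:
    """Rotate points about their centroid: about x, then y, then z."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    ax, ay, az = (math.radians(a) for a in angles_deg)
    out = []
    for x, y, z in points:
        x, y, z = x - cx, y - cy, z - cz
        y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
        x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
        x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
        out.append((x + cx, y + cy, z + cz))
    return out
```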
(4) Calculation of Affinity: Approximation Strategy
The current implementation is based on a simplified approximation that evaluates the principal components of affinity with relatively little computational effort. The approximation is developed in the following sections. However, it will be appreciated by those skilled in the art that more exact affinity calculation procedures may be used. Known computational packages for calculating more accurate affinity values may be used directly in the present process.
Studies of crown ethers indicate that the electron density distribution of small molecules can be used to describe the electron densities of larger compounds (Bruning, H. and Feil, D. (1991) J. Comput. Chem. 12: 1). Hirshfeld's stockholder method can be used to define strictly local charge distributions that are subsequently characterized by charge and dipole moment (Hirshfeld, F.L. (1977) Theor. Chim. Acta 44: 129). The result is the division of the total electron density distribution of the molecule into overlapping atomic parts, the sizes of which are related to the free atomic radii.
It is possible to demonstrate in crown ethers that the major components of electrostatic interactions are determined by local rather than global transfers of charge between atoms.
Charge distribution is mainly determined by short-range effects due to different chemical bonds. In particular, non-neighbouring atoms contribute little to atomic dipole moments. In addition, although charge transfer between atoms is also influenced by the electrostatic field of the whole molecule, calculations for crown ethers show only a very small influence on the charge distribution.
Calculated stockholder atomic charges and dipole moments can be used to describe electrostatic interactions (Bruning, H. and Feil, D. (1991) J. Comput. Chem. 12: 1). Beyond the van der Waals radius there is only a minor contribution from the atomic quadrupole moments. Calculations of the electrostatic potential that take only atomic charges into account give very poor results, whereas use of the dipole moments generates improved values.
Based on these considerations, the method of the present invention approximates the affinity between the target ligand and the simulated receptor(s), and between the simulated receptor(s) and the chemical structure(s) being designed, using two measures:
1. The magnitude of the electrostatic forces generated between the charged subunits of the simulated receptor(s) and the atomic dipoles of the target ligand (chemical structure).
Because the charged subunits are assumed to carry non-transferable unit charges, the magnitude of these forces is directly proportional to the magnitude of the atomic dipole and inversely proportional to the distance between the simulated receptor and the atomic dipole of the ligand.
2. The proportion of the non-polar or uncharged subunits of the simulated receptor sufficiently close to the non-polar regions of the ligand for the generation of significant London dispersion forces.
Assumptions Used for Affinity Calculation in the Current Implementation
1. The chemical substrate targets evaluated by the current implementation are assumed to be neutral (i.e. non-ionized) molecules. This is an arbitrary limitation, and an implementation applicable to charged and uncharged targets can be developed using the same methodology.
2. The dipole moments are assumed to be localised at the atomic nuclei. A similar analysis of affinity could be made assuming the dipole moment to be centered on the covalent bond. According to Al~ingham et al. (1989), these assumptions are functionally equivalent.
3. The environment surrounding the virtual receptor is assumed to be a solvent system in which the target occurs as a solute.
The target is effectively partitioned between the solvent and the virtual receptor.
4. At the instant for which the affinity is calculated, the target and receptor are assumed to be stationary with respect to each other, and in a specific, fixed orientation.

5. The targets are assumed to interact with only two types of site on the receptor surface: fixed charge sites (either negatively or positively charged) and non-polar sites.
On the basis of these assumptions, it is only necessary to consider the following contributions to the strength of the interaction:
1. Charge-dipole: $-Q^2\mu^2 / [6(4\pi\varepsilon)^2 kT r^4]$
2. Charge-non-polar: $-Q^2\alpha / [2(4\pi\varepsilon)^2 r^4]$
3. Dipole-non-polar (Debye energy): $-\mu^2\alpha / [(4\pi\varepsilon)^2 r^6]$
4. Non-polar-non-polar (London energy): $-0.75\,h\nu\,\alpha^2 / [(4\pi\varepsilon)^2 r^6]$
In the current implementation, only relative strengths are considered by the approximation; therefore all constants are ignored. In addition, the fixed charge site is assumed to carry a unit charge, either positive or negative. On this basis, the four components can be rewritten in simplified form:
1. Charge-dipole: $-\mu^2/r^4$ or $-\mu/r^2$
2. Charge-non-polar: $-\alpha/r^4$
3. Dipole-non-polar (Debye energy): $-\mu^2\alpha/r^6$ or $-\mu\alpha^{0.5}/r^3$
4. Non-polar-non-polar (London energy): $-\alpha^2/r^6$ or $-\alpha/r^3$
In general, terms 2 and 3 make only small contributions to long-range interactions, whereas terms 1 and 4 contribute significantly to the interaction energy. In the current implementation, most interactions between non-polar fragments are assumed to occur between adjacent alkyl and aromatic hydrogens and the non-polar subunits of the receptor. Under these conditions the value of $\alpha$ is assumed to be approximately constant.
Hydrophobic Strength and Water Hydration Contribution
Solvation effects are important considerations in the generation of binding affinity. For example, hydrophobic bond formation relies upon the close spatial association of non-polar, hydrophobic groups so that contact between the hydrophobic regions and water molecules is minimized.
Hydrophobic bond formation may contribute as much as half of the total strength of antibody-antigen bonds. Hydration of the receptor and substrate surfaces is also a significant factor.
Water bound to polar sites of either the receptor or substrate surface can interfere with binding or increase affinity by forming cross-bridges between the surfaces.
The hydrophobic interaction describes the strong attraction between hydrophobic molecules in water. In the case of receptor-target interactions it is taken to refer to the attraction between the non-polar fragments of the target and adjacent domains of non-polar receptor subunits. The effect arises primarily from entropic effects resulting in rearrangements of the surfaces so that water is excluded between adjacent non-polar domains. Exact theoretical treatments of the hydrophobic interaction are unavailable; however, it is estimated that hydrophobic forces contribute as much as 50% of the total attraction between antibodies and antigens. In order to estimate the hydrophobic interaction between targets and virtual receptors, the present implementation evaluates the proportion of the receptor that is effectively shielded from solvation by binding with the target. All non-polar (uncharged) subunits that are within a fixed distance of non-polar atoms on the target are considered to be shielded from solvation by solvent molecules of diameter equal to or greater than the limiting distance.
Affinity Calculation
The combined affinity calculation used in the current implementation combines two measures of interaction: the summed strengths of the charge-dipole interactions and a proximity measure. These affinities are assumed in the current implementation to be isotropic. It will be appreciated by those skilled in the art that greater discriminatory power may be obtained if anisotropic calculations of affinity are used, although these are computationally more complex.
The charge-dipole interaction is calculated as $D = \sum_i \sum_j \mu_i / r_{ij}^{v}$, summed over the atoms i of the target and the charge sites j of the receptor, where $\mu_i$ is the dipole moment of the ith atom of the target, $r_{ij}$ is the distance between the ith atom and the jth charge site on the receptor, and the coefficient $v$ can be set to 2, 3, or 4. The contribution of D to the total affinity is more sensitive to charge separation for larger values of $v$.
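Read as a double sum over target atoms i and receptor charge sites j (the reading of "summed strengths" assumed here), the charge-dipole term takes only a few lines. The sketch takes plain coordinate tuples and dipole magnitudes in Debye; `charge_dipole_sum` is a hypothetical name and, like the formula, it ignores the sign of the charge sites.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def charge_dipole_sum(dipoles: Sequence[float], atoms: Sequence[Point],
                      charge_sites: Sequence[Point], v: int = 2) -> float:
    """D = sum_i sum_j mu_i / r_ij**v over target atoms and charge sites."""
    if v not in (2, 3, 4):
        raise ValueError("v is set to 2, 3 or 4")
    return sum(mu / math.dist(a, c) ** v
               for mu, a in zip(dipoles, atoms)
               for c in charge_sites)
```

Raising v makes D fall off faster with separation: for a single 1 D dipole 2 units from one charge site, D is 0.25 with v = 2 and 0.0625 with v = 4.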
The proximity measure is calculated as $P = \sum_i n_i / N$, where $n_i$ is the number of uncharged subunits of the receptor that are separated by a maximum distance of $a$ from the ith atom of the target with a dipole moment < 0.75 Debye. In the current implementation, $a$ can range from 1 to 4 subunit diameters (this approximates the van der Waals radius of water). N is the total number of subunits comprising the receptor.
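The proximity measure counts, for each qualifying target atom, the uncharged receptor subunits within the cutoff a. This sketch assumes that only atoms with a dipole moment below 0.75 D count as non-polar and that a is measured in the same units as the coordinates; `proximity` is a hypothetical name. A subunit near several atoms is counted once per atom, so P is not capped at 1.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def proximity(atoms: Sequence[Point], dipoles: Sequence[float],
              uncharged_subunits: Sequence[Point], n_subunits: int,
              a: float, dipole_cutoff: float = 0.75) -> float:
    """P = sum_i n_i / N, counting only target atoms with mu_i < dipole_cutoff."""
    total = 0
    for xyz, mu in zip(atoms, dipoles):
        if mu >= dipole_cutoff:
            continue  # polar atom: excluded from the proximity count (assumed)
        total += sum(1 for s in uncharged_subunits if math.dist(xyz, s) <= a)
    return total / n_subunits
```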
An affinity value A is calculated from D and P using the relationship $A = [P(D + NP/k)]^{0.5}$, where k is a fitting constant (in the current implementation, k = 10~00). The value of P in the equation serves two roles. First, it is a weighting factor: as a measure of 'goodness of fit', it is used to bias the affinity value in favour of configurations in which the non-polar regions of the target and receptor are in close contact. Under these conditions, hydrophobic interactions and non-polar interaction energies will be large and will contribute significantly to the stability and strength of the bond; the target also has fewer possible trajectories to escape from the receptor, so its retention time will be prolonged. Second, P is used to estimate the contribution of the dispersion energy to the strength of the interaction. It is assumed that the dispersion energy is significant only for uncharged, non-polar regions, and only when the target and receptor are close to each other (i.e. within $a$ of each other). The values of k and $a$ can be adjusted to alter the relative contribution of P and D. In general, P dominates for non-polar targets, whereas D is more significant for targets with large local dipoles. Hydrogen bonding is approximated by paired negatively and positively charged receptor units interacting simultaneously with target hydroxyl, carboxylic or amine functional groups.
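Combining the two measures is then a single expression. In this sketch the fitting constant k is left as a required parameter, and `affinity` is a hypothetical name; P and D would come from the proximity and charge-dipole measures defined above.

```python
import math


def affinity(P: float, D: float, N: int, k: float) -> float:
    """A = [P (D + N P / k)] ** 0.5 for proximity P, charge-dipole sum D,
    N receptor subunits and fitting constant k (P, D >= 0)."""
    return math.sqrt(P * (D + N * P / k))
```

Because P multiplies the whole bracket, a configuration with no close non-polar contact (P = 0) scores zero however large D is, which is the weighting role described above; increasing k shrinks the dispersion term N P / k relative to D.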

Alternative Approaches to Affinity Calculation
Polarizability
It may be advantageous in certain cases to introduce a parameter corresponding to the relative polarizability of the target atoms into the affinity calculation. In this case the proximity term $P_2$ in $A = [P(D + NP_2/k)]^{0.5}$ is not calculated as $P_2 = \sum_i n_i / N$. Instead, $P_2 = \sum_i \alpha_i n_i / N$, where $n_i$ is the number of either charged or uncharged subunits of the receptor that are separated by a maximum distance of $a$ from the ith atom of the target, and $\alpha_i$ is the relative polarizability of the ith atom of the target. For simplicity, $\alpha_H$ could be set to 1.0 for aliphatic hydrogen. The value of k must be adjusted if polarizabilities are used. Sample polarizabilities based on the sums of adjacent bond polarizabilities are given in TABLE V.
Since polarizability is associated with displacement of the electron cloud, the polarizability of a molecule can be calculated as the sum of the characteristic polarizabilities of its covalent bonds. This additivity holds for non-aromatic molecules that do not have delocalized electrons.
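The polarizability variant only changes how each atom's neighbour count is weighted. This sketch assumes that each atom's relative polarizability is the sum of the polarizabilities of the bonds it participates in (as TABLE V is described) and that the molecular value is the plain sum over bonds. The bond values in the test lines are toy numbers, not TABLE V entries, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Bond = Tuple[int, int, float]  # (atom i, atom j, bond polarizability)


def atom_polarizabilities(n_atoms: int, bonds: Sequence[Bond]) -> List[float]:
    """alpha_i = sum of the polarizabilities of the bonds touching atom i."""
    alpha = [0.0] * n_atoms
    for i, j, a_bond in bonds:
        alpha[i] += a_bond
        alpha[j] += a_bond
    return alpha


def molecular_polarizability(bonds: Sequence[Bond]) -> float:
    """Bond additivity; valid only without delocalized (aromatic) electrons."""
    return sum(a_bond for _, _, a_bond in bonds)


def proximity_p2(atoms: Sequence[Point], alpha: Sequence[float],
                 subunits: Sequence[Point], n_subunits: int, a: float) -> float:
    """P2 = sum_i alpha_i n_i / N, where n_i counts charged or uncharged
    subunits within distance a of atom i."""
    return sum(al * sum(1 for s in subunits if math.dist(x, s) <= a)
               for x, al in zip(atoms, alpha)) / n_subunits
```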
Alternative Techniques: Functional Group Specificity
The affinity approximation used in the current implementation could be replaced by functionally similar computations that preserve the relationship between local charges, dispersion energy and target-receptor separation. In addition, affinity measures for charged targets could be constructed. The present implementation evaluates only non-covalent interactions; however, the method could be expanded by including in the virtual receptor subunits capable of specific covalent bond-forming reactions with selected target functional groups. Module 4 provides a sample flowchart of the preferred effective affinity calculation used in the present invention.
(5) Assessment of Selective Affinity
Goodness of fit between a virtual receptor and a set of target substrates is evaluated by comparing the known activity or affinity values for the targets with those obtained for the virtual receptor-target complex. The maximal affinities of an optimally selective virtual receptor should be strongly correlated with known affinity measures. Successive iterations of point mutations can be used to enhance this correlation between a set of substrates and a virtual receptor (Figure 2); alternatively, when optimizing the selectivity of a population of virtual receptors, successive iterations of the evolutionary process may be used to enhance this correlation (Figure 3).
Known values can be any index known or suspected to be dependent upon binding affinity, including (but not limited to) ED50, IC50, binding affinity, and cohesion measures. The values tested must be positive. Logarithmic transformation of the data may be required. Unweighted rank data cannot be used.
The optimal orientation of the targets for maximal binding affinity is unknown prior to testing. In order to obtain a representative measure of the range of receptor-target affinity, each target must be tested repeatedly using different random orientations relative to the receptor surface. Each test uses Module 4 to evaluate affinity. In general, the reliability of the maximal affinity values obtained depends upon the sample size, since it becomes increasingly likely that the sample will contain the true maximal value. The same set of target orientations is used for testing each receptor.
Two techniques are employed in the current implementation to circumvent the need for large sample sets for the generation of optimized receptors: 1) the use of a measure combining average (or sum) affinity and maximal affinity to select for receptors with higher selectivity; and 2) incremental increases in the number of orientations tested with successive iterations of the optimization process (optimization begins with a small set of target orientations; as receptors of greater fitness are generated, more orientations are tested).
In the current implementation, the sum is calculated for the affinity values obtained for all the tested orientations of each target. This sum affinity score is a measure of the average affinity between the receptor and the target. At the same time, the maximal affinity value is also determined.
Correlations between the known values and both the sum affinities ($r_{SA}^2$) and the maximal affinities ($r_{MA}^2$) are calculated. The origin (0,0) is included in the correlation, based on the assumption that target compounds showing no activity should have little or no affinity for the virtual receptor. This assumption may not always be valid, and other intercept values may be required in some tests.
The correlation using sum affinity is a measure of the average goodness of fit. If this correlation is large but the correlation between maximal affinity and known affinity is weak, the virtual receptor is probably not selective, i.e. multiple orientations of the target can interact effectively with the receptor. Conversely, if the maximal affinity is highly correlated with known affinity values and the correlation with sum affinity is weak, the virtual receptor may be highly selective. If both sum affinity and maximal affinity are highly correlated with known affinity, it is probable that the orientations sampled have identified the response characteristics of the receptor with limited error (both type I and type II errors, i.e. false positive and false negative results, are reduced). In some cases it may be more appropriate to minimize the correlation between the known affinities and the sum affinity while selecting for an increased correlation between maximal affinity and known affinity. Such a selection would require subtraction of the maximal affinity values from the sum total in order to remove these values as a source of confounding bias.
In the current implementation, a joint correlation value is used as the basis for receptor selection. This value is calculated as the square root of the product of the squared correlations for sum affinity and maximal affinity: $F = (r_{MA}^2 \times r_{SA}^2)^{0.5}$. This value is optimized by the evolutionary process applied to the virtual receptors. Note: if $r_{SA}^2$ and $r_{MA}^2$ are strongly correlated with each other, then the values contributing to $r_{SA}^2$ must either individually correlate closely with the maximal affinity value or contribute negligibly to the sum. Alternatively, the correlation $r_{S-M}$ of (sum affinity minus maximal affinity) with known affinity can be calculated, and the measure $F = (r_{MA}^2 \times (1 - r_{S-M}^2))^{0.5}$ is maximized. Use of this measure will select for receptors that have high affinity for a very limited set of target orientations. Module 5 provides a flowchart of a sample goodness-of-fit calculation.
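The selection measure can be checked against Example 1. The sketch below assumes that "including the origin" means appending the point (0, 0) to the data before computing an ordinary Pearson correlation; under that reading, the TABLE A values (target affinities against sum and maximal affinity scores) give F ≈ 0.9358, the score reported for the optimized receptor. Function names are hypothetical, and the per-orientation affinities passed to `summarize` would come from the Module 4 calculation.

```python
import math
from typing import List, Sequence, Tuple


def summarize(per_orientation: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Sum and maximal affinity for each target over its tested orientations."""
    return [sum(s) for s in per_orientation], [max(s) for s in per_orientation]


def r2_with_origin(known: Sequence[float], scores: Sequence[float]) -> float:
    """Squared Pearson correlation after appending the point (0, 0)."""
    xs, ys = list(known) + [0.0], list(scores) + [0.0]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy) if sxx > 0 and syy > 0 else 0.0


def joint_fitness(known, sums, maxima) -> float:
    """F = (r_MA^2 * r_SA^2) ** 0.5."""
    return math.sqrt(r2_with_origin(known, maxima) * r2_with_origin(known, sums))


def orientation_limited_fitness(known, sums, maxima) -> float:
    """Alternative F = (r_MA^2 * (1 - r_{S-M}^2)) ** 0.5 using sum minus maximum."""
    residual = [s - m for s, m in zip(sums, maxima)]
    return math.sqrt(r2_with_origin(known, maxima) * (1.0 - r2_with_origin(known, residual)))
```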
(6) The Optimization Process
The objective of the optimization process is to evolve a virtual receptor that has selective affinity for a set of target compounds. A highly efficient mechanism for finding solutions is required, since the total number of possible genotypes containing 300 instructions is $7^{300}$, or about $10^{253}$. The following four phases summarize the steps in the optimization process; each phase is then discussed in more detail and example calculations are given.
PHASE 1: Generate a set of random genotypes and screen for a minimal level of activity. Use the selected genotype as the basis for further optimization using a genetic algorithm (recombination) and unidirectional mutation techniques.
PHASE 2: Mutate the selected genotype to generate a breeding population of distinct but related genotypes for recombination. Choose the most selective mutants from the population for recombination.
PHASE 3: Generate new genotypes by recombination of selective mutants. Select from the resulting genotypes those with the highest affinity fitness. Use this subpopulation for the next recombinant or mutation generation.

PHASE 4: Take best recombination products and apply repeated point mutations to enhance selectivity.
Phase 1: Evolution-Generation of Primary Code
The Genetic Algorithm developed by Holland (Holland, J.H. (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor) can be used to search for optimal solutions to a variety of problems. Normally this technique is applied using large, initially random sets of solutions. In the present implementation the technique is significantly modified in order to reduce the number of tests and iterations required to find virtual receptors with high selectivity. This has been accomplished by using a set of closely related genotypes as the initial population and by applying high rates of mutation at each iteration. For any set of target compounds it is possible to develop distinct receptors with optimal affinity characteristics. For example, receptors may bind optimally to the same targets but in different orientations. The use of an initial population of closely related genotypes increases the likelihood that the optimization process converges on a single solution. Recombination of unrelated genotypes, although it may generate novel genotypes of increased fitness, is more likely to result in divergence.
The objective of the first stage in the optimization process is to generate a genotype with a minimal level of affinity for the target set. This genotype is subsequently used to generate a population of related genotypes. A flowchart of a sample process for generation of a genotype with a minimum level of affinity is given in Module 6.
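Phase 1 amounts to random sampling until a genotype clears a minimal fitness threshold. The sketch assumes a seven-symbol instruction alphabet, the digits 0 to 6, and a caller-supplied `fitness` function standing in for translation, presentation, affinity and goodness-of-fit scoring; `random_genotype` and `phase1_seed` are hypothetical names.

```python
import random
from typing import Callable, Optional

ALPHABET = "0123456"  # assumed seven-symbol instruction set


def random_genotype(length: int, rng: random.Random) -> str:
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def phase1_seed(fitness: Callable[[str], float], length: int, threshold: float,
                rng: random.Random, max_tries: int = 10_000) -> Optional[str]:
    """Screen random genotypes; return the first with fitness >= threshold."""
    for _ in range(max_tries):
        genotype = random_genotype(length, rng)
        if fitness(genotype) >= threshold:
            return genotype
    return None  # no genotype reached the minimal level of activity
```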
Phase 2: Evolution-Mutation of Primary Code
Mutation of the genotype comprises changing one or more characters in the code. Mutations in the current implementation do not alter the number of subunits comprising the receptor polymers and do not affect the length of the genotype. It will be appreciated that these conventions are arbitrary, and it will be understood that variants may have utility in some systems.

Mutations can alter the folding pattern of the phenotype, with resulting changes in the receptor shape space and the location or exposure of binding sites. Mutations that affect the configuration of peripheral regions of the phenotype can result in shifts of the receptor center relative to the target center.
Neutral Mutations
All mutations alter the structure of the phenotype; however, not all mutations result in changes in the functionality of the receptor. Such neutral mutations may alter components of the receptor that do not affect affinity. In some cases these neutral mutations can combine with subsequent mutations to exert a synergistic effect.
The Breeding Population
The objective of the second phase of the evolutionary process is the generation of a population of distinct but related genotypes derived from the primary genotype. Members of this population are subsequently used to generate recombinants.
This breeding population is created by multiple mutation of the primary genotype. The resulting genotypes are translated and screened for selectivity. The most selective products are retained for recombination. Module 7 gives a flowchart for a sample process for multiple mutation of a genotype.
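Multiple mutation keeps the genotype length fixed and only substitutes symbols, as the text specifies. This sketch assumes the same seven-symbol alphabet and a caller-supplied `fitness` function; `mutate` and `breeding_population` are hypothetical names, and the number of mutated positions per mutant is a free parameter.

```python
import random
from typing import Callable, List

ALPHABET = "0123456"  # assumed seven-symbol instruction set


def mutate(genotype: str, n_points: int, rng: random.Random) -> str:
    """Substitute n_points distinct positions with a different symbol."""
    chars = list(genotype)
    for pos in rng.sample(range(len(chars)), n_points):
        chars[pos] = rng.choice([c for c in ALPHABET if c != chars[pos]])
    return "".join(chars)  # same length: subunit count is unchanged


def breeding_population(primary: str, fitness: Callable[[str], float],
                        size: int, keep: int, n_points: int,
                        rng: random.Random) -> List[str]:
    """Generate `size` mutants of the primary genotype; keep the most selective."""
    mutants = [mutate(primary, n_points, rng) for _ in range(size)]
    mutants.sort(key=fitness, reverse=True)
    return mutants[:keep]
```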
Phase 3: Evolution-Recombination
The objective of recombination is the generation of novel genotypes with increased fitness. Recombination facilitates the conservation of genotype fragments that are essential for phenotypic fitness, while at the same time introducing novel combinations of instructions. In general, recombination coupled with selection results in rapid optimization of selectivity.
Module 8 provides a flowchart for a sample process for recombination of a genotype. The current implementation retains the population used for recombination for testing in step 7 of Module 8. This ensures that genotypes with high selectivity are not replaced by genotypes with lower selectivity. In addition, in the current implementation, mutations (Module 7) are applied to 50% of the recombinant genotypes prior to testing (step 7 of Module 8). This step increases the variability within the recombinant population. The test populations used in the current implementation range in size from 10 to 40 genotypes. This is a relatively small population size; under some conditions, larger populations may be required.
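One generation of recombination with the conventions described above (parents retained in the test pool, mutation applied to half of the recombinants) might look as follows. Single-point crossover and a single point mutation are assumptions standing in for Modules 8 and 7, and all names are hypothetical.

```python
import random
from typing import Callable, List, Optional

ALPHABET = "0123456"  # assumed seven-symbol instruction set


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover of two equal-length genotypes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def point_mutate(genotype: str, rng: random.Random) -> str:
    i = rng.randrange(len(genotype))
    new = rng.choice([c for c in ALPHABET if c != genotype[i]])
    return genotype[:i] + new + genotype[i + 1:]


def recombination_generation(population: List[str], fitness: Callable[[str], float],
                             rng: random.Random, n_children: Optional[int] = None,
                             mutate_fraction: float = 0.5) -> List[str]:
    """Breed recombinants, mutate about half, then keep the best of parents + children."""
    if n_children is None:
        n_children = len(population)
    children = []
    for _ in range(n_children):
        a, b = rng.sample(population, 2)
        child = crossover(a, b, rng)
        if rng.random() < mutate_fraction:
            child = point_mutate(child, rng)
        children.append(child)
    # Parents stay in the pool, so selective genotypes cannot be displaced
    # by less selective recombinants.
    pool = population + children
    pool.sort(key=fitness, reverse=True)
    return pool[:len(population)]
```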
Phase 4: Evolution-Maturation by Accumulation Technique
The final stage in the optimization process mimics the maturation of antibodies in the mammalian immune system. A series of single point mutations is applied to the genotype, and the effect on phenotypic fitness is evaluated. Unlike recombination, this process generally results in only small incremental changes to the selectivity of the phenotype. The maturation process uses a Rechenberg (1+1) evolutionary strategy (Rechenberg, I. (1973) Evolutionsstrategie. Frommann-Holzboog, Stuttgart). At each generation the fitness of the parental genotype is compared to that of its mutation product, and the genotype with the greater selectivity is retained for the next generation. As a result, this process is strictly unidirectional, since less selective mutants do not replace their parents. Module 9 shows a flowchart of a non-limiting sample process for maturation of a genotype.
During each iteration of the maturation process, only a single instruction in the code is changed. If a parent and its mutation product have the same selectivity, the parent is replaced by its product in the next generation. This method results in the accumulation of neutral mutations that may have synergistic effects with subsequent mutations. This convention is arbitrary.
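The maturation loop is a (1+1) scheme over discrete symbols: one parent, one single-point mutant, and the better of the two survives, with ties going to the mutant so that neutral mutations accumulate. The single-symbol mutation follows the text rather than any particular published evolution-strategy variant; `mature` and the seven-symbol alphabet are assumptions.

```python
import random
from typing import Callable, List, Tuple

ALPHABET = "0123456"  # assumed seven-symbol instruction set


def point_mutate(genotype: str, rng: random.Random) -> str:
    """Change exactly one instruction to a different symbol."""
    i = rng.randrange(len(genotype))
    new = rng.choice([c for c in ALPHABET if c != genotype[i]])
    return genotype[:i] + new + genotype[i + 1:]


def mature(genotype: str, fitness: Callable[[str], float], iterations: int,
           rng: random.Random) -> Tuple[str, List[float]]:
    """(1+1) selection: the mutant replaces its parent unless it is less selective."""
    parent, f_parent = genotype, fitness(genotype)
    history = [f_parent]
    for _ in range(iterations):
        child = point_mutate(parent, rng)
        f_child = fitness(child)
        if f_child >= f_parent:  # ties go to the mutant (neutral drift)
            parent, f_parent = child, f_child
        history.append(f_parent)
    return parent, history
```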
If recombination or maturation do not generate improved selectivity after repeated iterations, it may be necessary to repeat Phase 2 in order to increase the variability of the breeding population genome.

Selected Applications
The process of the present invention can be used in several areas, including: 1) screening for compounds with selected pharmacological or toxicological activity; and 2) development of novel chemical structures with selected functional characteristics. Both applications, with examples, are described hereinafter.
1A) Screening Method
A population of receptors that have been evolved for selective affinity for a specific group of compounds sharing similar pharmacological properties can be used as probes for the identification of other compounds with similar activity, provided this activity is dependent upon binding affinity. For example, a population of receptors could be evolved to display specific affinity for salicylates. If the affinity of these receptors for salicylates closely correlates with the affinity of cyclooxygenase for salicylates, the receptors must at least partially mimic functionally relevant features of the binding site of the cyclooxygenase molecule. These receptors can therefore be used to screen other compounds for possible binding affinity with cyclooxygenase.
This technique can also be applied to screening compounds for potential toxicological or carcinogenic activity. For example, receptors could be evolved that mimic the specific binding affinity of steroid hormone receptors. These receptors could then be used to evaluate the affinity of pesticides, solvents, food additives and other synthetic materials for possible binding affinity prior to in vitro or in vivo testing.
Simulated receptors may also be constructed to detect affinity for alternate target sites, transport proteins or non-target binding.
1B) Screening for Sub-Maximal Activity
In some instances compounds with high affinity may have deleterious side effects or may be unsuitable for chronic administration. In this case, compounds with lower binding affinity may be required. Techniques such as combinatorial synthesis do not readily generate or identify such compounds. In contrast, simulated receptors could be used to screen effectively for structures that display binding affinity of any specified level.
1C) Measuring Molecular Similarity
The selectivity of the simulated receptors can be used as a quantitative measure of molecular similarity.
Use of Simulated Receptors
In the example, fictitious test values of target affinities were chosen to demonstrate the ability of the receptor generation program to construct simulated receptors mimicking any arbitrarily chosen pattern of activity.
In this example, all receptors consist of 15 polymers.
Width, Length, and Depth values specify origin coordinates of the 15 polymers relative to the center of the receptor.
Example 1
A simulated receptor was generated with the following specifications:
Number of subunits: 24~; Width: 6; Length: 6; Depth: 25; Code:
"4100033103212204103333424052312013341024124022232334010032242 ~101440513324340032462041210~131310043112101132412022421302413 0331022051414141021402134014310010231110331235210016240"
Each target was tested 20 times against the receptor.
The affinity score for the optimized receptor was 0.9358, which is relatively low.
The target substrates used to optimize the receptor were benzene, phenol, benzoic acid and o-salicylic acid. The aspirin precursor o-salicylic acid is an inhibitor of prostaglandin synthesis by cyclooxygenase. Benzoic acid and phenol have much lower affinity for the same site. The target affinity values and the scores for the receptor are shown in Table A below, which shows that the simulated receptor has maximal affinity for o-salicylic acid.
TABLE A
Target Compound     Target Affinity   Sum Affinity Score   Maximal Affinity Score
Benzene             0.6               20.88                3.38
Phenol              1.2               8.03                 4.99
Benzoic Acid        1.6               42.23                12.98
o-Salicylic Acid    4.4               80.33                34.71

Three test substrates were evaluated using the simulated receptor. Two of the compounds, m-salicylic acid and p-salicylic acid, are known to be less active than o-salicylic acid. The third compound, diflunisal, is a fluorinated salicylic acid derivative of efficacy equal to or greater than that of salicylic acid. The results of the evaluation are given in Table B.
TABLE B
Target Compound     Sum Affinity Score   Maximal Affinity Score
m-Salicylic acid    45.9                 12.3
p-Salicylic acid    63.5                 27.5
Diflunisal          117                  71.2
o-Salicylic Acid    80.33                34.71

The results obtained using the simulated receptor closely match the pharmacological data for these compounds: m-salicylic acid and p-salicylic acid have lower affinity scores than o-salicylic acid, and diflunisal is more active than o-salicylic acid. Further refinement of the simulated receptor and the use of additional, independently optimised receptors would be required to increase the certainty of these predictions of activity.

PART B: DEVELOPMENT OF NOVEL COMPOUNDS WITH SELECTED FUNCTIONAL CHARACTERISTICS
Evolution of Novel Ligands
A population of simulated receptors evolved for selective affinity to a set of target compounds with similar functional characteristics can be used to devise novel compounds with similar characteristics, provided these characteristics are closely correlated with the structure or binding affinity of the model compounds. Using interaction with the receptors as selection criteria, novel chemical structures can be evolved to optimally fit the receptors. Because these compounds must meet the necessary and sufficient requirements for receptor selectivity, these novel compounds are likely to also possess activity similar to that of the original molecular targets.
Overview of Process
1. Generate a population of simulated receptors with optimized selectivity for a set of characterized target compounds. In some cases it may be desirable to generate several populations with different affinity characteristics. For example, three populations of simulated receptors could be generated: the first mimicking the properties of the selected target site, the second mimicking a site required for transport of the ligand to its primary target, and the third mimicking a target site mediating undesirable side-effects. The development of a new ligand structure in this instance would require simultaneous optimization of affinity for the first two receptor populations and minimization of affinity for the third population.
2. Determine the affinity of a novel primary structure for the simulated receptor population(s).
3. Modify the primary structure and evaluate its affinity using the simulated receptor population(s). If the modification improves the affinity characteristics, the modified structure is retained for further modification. Otherwise a different modification is tested. Previously rejected modifications may be reintroduced in combination with other modifications.
4. Step 3 is repeated until a compound with suitable affinity characteristics is obtained.
Note: Using suitably discriminating simulated receptors, it is possible to evolve chemical structures with sub-maximal affinity for a selected target site.
1) Ligand Genotype Code Generation
Encoding the ligand phenotype (molecular structure) as a linear genotype, represented by a character string, facilitates the processes of mutation, recombination and inheritance of the structural characteristics of the ligand during the evolutionary process.
The ligands evolved by the current implementation consist of substituted carbon skeletons. Each code consists of three character vectors. The primary code vector contains the turning instructions for the generation of the carbon skeleton and determines the position of each carbon atom in the skeleton. The secondary code vector identifies the functional groups attached to each carbon atom. The tertiary code vector specifies the position of the functional group relative to the host carbon.
Molecular skeletons combining atoms other than carbon (e.g. ethers, amides and heterocycles) can be constructed in a homologous fashion, using additional characters in the code to specify the atomic species replacing carbon atoms in the skeleton.
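The three parallel code vectors can be pictured as one record. The following is a minimal Python sketch for illustration only: the class name `Genotype`, and the use of `-` to mean "no substituent" in the secondary vector, are hypothetical choices rather than the encoding used in the patent. It only checks the two structural rules stated in the text: the vectors have equal length, and the primary and tertiary codes use the turn characters 1 to 4.

```python
from dataclasses import dataclass

TURNS = set("1234")  # turn directions shared by the primary and tertiary codes


@dataclass
class Genotype:
    """Hypothetical container for the three parallel code vectors."""
    primary: str    # turning instructions that place each skeleton carbon
    secondary: str  # one substituent character per position ("-" = none, assumed)
    tertiary: str   # valence (turn direction) used to attach each substituent

    def __post_init__(self) -> None:
        # The vectors are read in parallel, so they must have the same length.
        if not (len(self.primary) == len(self.secondary) == len(self.tertiary)):
            raise ValueError("code vectors must have the same length")
        # Primary and tertiary codes may only contain the turn characters 1-4.
        if not set(self.primary) <= TURNS or not set(self.tertiary) <= TURNS:
            raise ValueError("primary and tertiary codes use only 1, 2, 3, 4")


g = Genotype(primary="431413", secondary="-h----", tertiary="211111")
```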
The carbon skeleton is constructed from a series of points which form the nodes of a three-dimensional tetrahedral coordinate system. During initial skeleton construction, the distance between nearest points is equal to the mean bond length between alkyl carbon atoms.
Primary Code Vector: Ligand Skeleton Determinants
The primary code vector consists of characters identifying the turning direction relative to the current atom position. Each turning direction specifies the coordinates of the next atom in the tetrahedral matrix. Four directions (1, 2, 3, 4) can be taken from each atom, corresponding to the unfilled valences of sp3 carbon. Each of the carbon atoms belongs to one of four possible states (A, B, C, D). These states correspond to the number of distinct nodes in the tetrahedral coordinate system.
The relationship between turn direction and the new coordinates for the next atom in the skeleton is given by the following tables. The two tables B1 and B2 below embody the two turning conventions required to construct the ligands. The boat convention results in the generation of a tetrahedral matrix in which closed 6-member rings (cyclohexanes) assume the boat configuration. The chair convention results in the generation of a matrix in which cyclohexyl rings assume the chair configuration. It is possible to combine both conventions during code generation. Only the boat convention is used in the examples discussed here.

Table B1: Boat Convention
Current position = (x, y, z); new position following turn:

Current State   Turn = 1                Turn = 2                Turn = 3            Turn = 4
A               (x-.75, y+.433, z-.5)   (x+.75, y+.433, z-.5)   (x, y-.864, z-.5)   (x, y, z+1)
B               (x+.75, y-.433, z+.5)   (x-.75, y-.433, z+.5)   (x, y+.864, z+.5)   (x, y, z-1)
C               (x-.75, y+.433, z+.5)   (x+.75, y+.433, z+.5)   (x, y-.864, z+.5)   (x, y, z-1)
D               (x+.75, y-.433, z-.5)   (x-.75, y-.433, z-.5)   (x, y+.864, z-.5)   (x, y, z+1)

Each turn also results in the specification of the state of the new atom:

New state following turn:
Current State   Turn = 1   Turn = 2   Turn = 3   Turn = 4
A               B          B          B          C
B               A          A          A          D
C               D          D          D          A
D               C          C          C          B

Table B2: Chair Convention
Current position = (x, y, z); new position following turn:

Current State   Turn = 1                Turn = 2                Turn = 3            Turn = 4
A               (x-.75, y+.433, z-.5)   (x+.75, y+.433, z-.5)   (x, y-.864, z-.5)   (x, y, z+1)
B               (x+.75, y-.433, z+.5)   (x-.75, y-.433, z+.5)   (x, y+.864, z+.5)   (x, y, z-1)
C               (x-.75, y-.433, z+.5)   (x+.75, y-.433, z+.5)   (x, y+.864, z+.5)   (x, y, z-1)
D               (x+.75, y+.433, z-.5)   (x-.75, y+.433, z-.5)   (x, y-.864, z-.5)   (x, y, z+1)

Each turn also results in the specification of the state of the new atom:

New state following turn:
Current State   Turn = 1   Turn = 2   Turn = 3   Turn = 4
A               B          B          B          C
B               A          A          A          D
C               D          D          D          A
D               C          C          C          B

Using these relationships, primary code vectors consisting of strings of the characters 1, 2, 3 and 4 can be decoded to create three-dimensional arrangements of carbon atoms. The resulting string of carbon atoms is allowed to fold back on itself or create closed loops, producing short side chains and ring structures. Specific ring structures (for example, cyclohexanes) can be incorporated directly as specific character sequences, as shown below.
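As a minimal sketch of primary decoding under the boat convention of Table B1, assume, as stated later for primary decoding, that the first carbon sits at the origin in state A, and count that origin as the first skeleton atom. The function names `walk` and `skeleton` are hypothetical, and coordinates are rounded to three decimals (an assumption) so that a revisited lattice node compares equal and holds only one carbon. Decoding the ring sequence `431413` returns the walk to the origin, closing a six-membered ring.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]

# Table B1 (boat convention): offset added to the current position for
# turns 1, 2, 3, 4, keyed by the state of the current atom.
BOAT_OFFSETS: Dict[str, List[Vec]] = {
    "A": [(-0.75, 0.433, -0.5), (0.75, 0.433, -0.5), (0.0, -0.864, -0.5), (0.0, 0.0, 1.0)],
    "B": [(0.75, -0.433, 0.5), (-0.75, -0.433, 0.5), (0.0, 0.864, 0.5), (0.0, 0.0, -1.0)],
    "C": [(-0.75, 0.433, 0.5), (0.75, 0.433, 0.5), (0.0, -0.864, 0.5), (0.0, 0.0, -1.0)],
    "D": [(0.75, -0.433, -0.5), (-0.75, -0.433, -0.5), (0.0, 0.864, -0.5), (0.0, 0.0, 1.0)],
}
# State of the new atom after turns 1, 2, 3, 4.
BOAT_NEXT_STATE: Dict[str, str] = {"A": "BBBC", "B": "AAAD", "C": "DDDA", "D": "CCCB"}


def walk(code: str) -> List[Vec]:
    """Every lattice node visited by the primary code, origin first."""
    pos: Vec = (0.0, 0.0, 0.0)
    state = "A"
    path = [pos]
    for ch in code:
        turn = int(ch) - 1
        dx, dy, dz = BOAT_OFFSETS[state][turn]
        # Round so that floating-point drift cannot split one node into two.
        pos = (round(pos[0] + dx, 3), round(pos[1] + dy, 3), round(pos[2] + dz, 3))
        state = BOAT_NEXT_STATE[state][turn]
        path.append(pos)
    return path


def skeleton(code: str) -> List[Vec]:
    """Distinct carbon positions; a revisited node keeps a single carbon."""
    return list(dict.fromkeys(walk(code)))


ring = skeleton("431413")  # six distinct carbons; the walk ends at the origin
```

Appending one more `4` (`4314134`) leaves the skeleton at six carbons but moves the exit to the ring member adjacent to the entry point, which matches the entry and exit rule given for ring sequences in the code-creation step.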

Secondary Code Vector: Substituents
A secondary code vector, of the same length as the primary code vector, is used to allocate the type of substituent attached to the carbon atom specified by the primary code vector. Each substituent is identified by a single character.
Substituents are added singly to the carbon skeleton. A single carbon atom can have more than one substituent, but only if it is specified more than once by the primary code.
In the current implementation, all valences not filled by substituents specified by the secondary code vector are automatically filled with hydrogen atoms during the ligand construction process. Other rules could be applied for filling empty valences with atoms other than hydrogen.
Tertiary Code Vector: Substituent Bond Vector
A tertiary code vector, of the same length as the primary code vector, is used to allocate the valence used for the attachment of the substituent specified by the secondary code vector. The tertiary code consists of the characters 1, 2, 3 and 4, each of which refers to the turn directions specified for the primary code. Substituents are only allocated if the valence is not already occupied, either by a carbon atom specified by the primary code vector or by another previously allocated substituent. Alternatively, successive substituents could replace previously allocated substituents.
2) Code Creation
To create carbon skeletons, the primary code is constructed by creating a random sequence of characters belonging to the set {"1", "2", "3", "4"}. The creation of heterocyclic structures, ethers, amides, imides and carboxylic compounds is accomplished by substituting a carbon atom in the skeleton by a different atom specified by the secondary code.
The secondary code is generated from a random sequence of characters identifying substituent types. The frequency of the characters can be random or fixed prior to code generation.

The tertiary code consists of characters belonging to the set {"1", "2", "3", "4"}. Ring structures can be deliberately constructed (as opposed to randomly generated) by adding specific character sequences to the primary code. For example, "431413" codes for a cyclohexyl ring. A total of 24 strings code for all possible orientations of cyclohexyl rings in the tetrahedral matrix. Secondary and tertiary code vectors for the ring primary codes are generated as described previously.
Module 10 provides a flowchart of an example of creating code that generates carbon skeletons with rings.
The relative positions of the entry and exit points of a ring forming part of the carbon atom skeleton are determined by the length of the character sequence used to generate the ring. Specifically, if the sequence contains six characters, for example 431413, then the entry and exit points will be the same member of the ring. If the sequence is partially repeated and appended to the initial six characters, the entry and exit points will not be the same member of the ring. For example, the sequences 4314134 and 43141343141 will generate rings with exit points at the ring members adjacent to the entry points.
In the current implementation, rings are added to the skeleton by adding sequences of 6 or more characters to the code. For the ring defined by 431413 the possible sequences used are:

The conventions presented for creating a novel ligand genotype can be used to encode other chemical structures in a linear format, either for storage or for introduction into the ligand evolutionary process. For example, a known pharmacophore can be encoded in linear format and used as the starting point for evolving novel ligands with similar or enhanced functional properties. Similarly, sets of pharmacophores interacting with a common target site can be encoded in linear format and used for recombination.
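The code-creation rules above (random turn characters, substituent characters drawn at random or with fixed frequencies, and deliberate ring sequences spliced into the primary code) can be sketched as follows. This is a minimal illustration, not the patent's generator: the function name `random_genotype`, the substituent alphabet `-hlo`, and the `sub_weights` parameter are hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple

TURNS = "1234"
SUBSTITUENTS = "-hlo"  # hypothetical alphabet: none, hydroxyl, chlorine, carbonyl O
RING = "431413"        # cyclohexyl ring in the boat convention (from the text)


def random_genotype(length: int, rng: random.Random, ring_at: Optional[int] = None,
                    sub_weights: Optional[Sequence[float]] = None) -> Tuple[str, str, str]:
    """Random primary/secondary/tertiary codes, optionally with one built-in ring."""
    primary = "".join(rng.choice(TURNS) for _ in range(length))
    if ring_at is not None:
        # Splice a deliberate ring sequence into the otherwise random primary code.
        primary = primary[:ring_at] + RING + primary[ring_at:]
    n = len(primary)
    # Substituent frequencies are uniform unless fixed in advance via sub_weights.
    secondary = "".join(rng.choices(SUBSTITUENTS, weights=sub_weights, k=n))
    # The tertiary code picks one of the four valences for each substituent.
    tertiary = "".join(rng.choice(TURNS) for _ in range(n))
    return primary, secondary, tertiary


p, s, t = random_genotype(10, random.Random(0), ring_at=4)
```

The secondary and tertiary vectors are sized after the ring is spliced in, so all three vectors stay aligned position by position.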
3) Code Translation and Structure Construction
The code vectors are converted into three-dimensional representations of ligands in a translation process consisting of three discrete steps. In the first step, the carbon atom skeleton is constructed using the primary code. In the second step, substituents are added to the carbon skeleton using the instructions from the secondary and tertiary code vectors.
Instructions from the secondary and tertiary code vectors may also specify replacement of carbon atoms in the skeleton with different atoms, and may change the number and orientation of available valences present on a carbon or other atom forming part of the primary skeleton. For example, addition of a carbonyl oxygen occupies two empty valences. In the third step, all valences not filled by substituents during the second step are filled with hydrogen atoms (unless otherwise specified).
Primary Decoding: Ligand Skeleton Construction
Primary decoding uses the turning instructions from the primary code vector to specify the positions of each carbon atom. The first atom is assumed to be located at the origin of the coordinate system and to occupy state A in the matrix.
Decoding proceeds sequentially. The result of the primary decoding process is a 3 x n matrix containing the x, y and z coordinates of each of the n carbon atoms in the skeleton. Because loops and reversals are permitted, the same position in space may be occupied by more than one carbon. In these cases, only one carbon atom is assumed to occupy the position. As a result, the number of carbon atoms forming the completed skeleton may be less than the number of characters in the primary code vector.
As the primary code is read, a list is constructed from the secondary code that identifies the substituents attached to each carbon position. At the same time a parallel list is constructed using the tertiary code to specify the valence occupied by each substituent.
Secondary Decoding: Substituent Addition
Substituents are added sequentially to each carbon atom based on the list generated from the secondary code during primary decoding. The corresponding value from the tertiary code is used to specify the valence position of the substituent relative to the host carbon. If the position is already occupied, either by an adjacent carbon atom or by a previously specified substituent, the substitution is not carried out. Alternatively, a decoding process could be constructed in which the substitution is carried out at the next unoccupied position, or in which the substitution replaces a previously specified substituent. The distance between the substituent and the carbon atom is calculated from look-up tables of bond lengths. The position data and bond lengths are used to calculate the coordinates of the substituent. In the case of multi-component substituents, such as hydroxyl, nitro and amino groups, the coordinates of each atom in the substituent are calculated relative to the host carbon.
After all the substituents specified by the secondary code vector are added to the skeleton, all unfilled positions remaining on the skeleton are filled with hydrogen atoms. The hydrogen-sp3-carbon bond length is used to calculate the coordinates of each hydrogen atom.
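The substituent-addition rules for a single carbon can be sketched as below, under several stated assumptions. The function name `decorate_carbon` and the bond-length table are hypothetical (the values are common approximate single-bond lengths in angstroms, not figures from the text); the four valence directions are taken to be the turn vectors of Table B1 for the carbon's state; lattice coordinates are in units of one C-C bond (1.54 Angstrom assumed); and a multi-atom group such as hydroxyl is reduced to its first atom.

```python
from typing import List, Sequence, Set, Tuple

Vec = Tuple[float, float, float]

CC_BOND = 1.54  # assumed mean alkyl C-C bond length (angstrom) = one lattice step
# Approximate single-bond lengths to carbon, in angstrom (illustrative values).
BOND_LENGTH = {"H": 1.09, "O": 1.43, "Cl": 1.77}


def decorate_carbon(host: Vec, directions: Sequence[Vec], occupied: Set[int],
                    requests: Sequence[Tuple[str, int]]) -> List[Tuple[str, Vec]]:
    """Place requested substituents on one carbon, then fill empty valences with H.

    host       -- carbon position in lattice units
    directions -- the four valence directions for this carbon (turns 1-4)
    occupied   -- valence indices (0-3) already used by skeleton carbons
    requests   -- (element, valence index 0-3) from the secondary/tertiary codes
    """
    used = set(occupied)
    placed: List[Tuple[str, Vec]] = []

    def put(element: str, k: int) -> None:
        scale = BOND_LENGTH[element] / CC_BOND  # bond length in lattice units
        d = directions[k]
        placed.append((element, tuple(h + scale * c for h, c in zip(host, d))))
        used.add(k)

    for element, k in requests:
        if k in used:
            continue  # valence already taken: the substitution is not carried out
        put(element, k)
    for k in range(4):
        if k not in used:
            put("H", k)  # remaining valences receive hydrogen
    return placed
```

Tertiary characters 1 to 4 map to the indices 0 to 3 here; skipping a blocked valence follows the default rule in the text rather than either of the alternatives it mentions.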
A single carbon atom can have more than one non-hydrogen substituent. This can occur if the same position is specified more than once by the primary code vector. The current implementation does not incorporate multiple substitutions using the secondary code directly, although this can be readily implemented.
Substitutions are only allowed at loci not occupied by carbon atoms forming the ligand skeleton. A cumulative list is maintained of all occupied sites in the tetrahedral matrix.
During the secondary decoding process a list is compiled of the type, radius, and position of all the atoms comprising the ligand. This list is the basis for subsequent target generation.
At this stage in the process, the feasibility of the structure generated from the code sequence is not evaluated.
In some cases the atomic coordinates may be entered into energy minimization programs to create more realistic structures. However, in the present implementation, no assumptions are made concerning the configuration of the ligand during binding. In addition, the current implementation preserves the structural uniqueness of specific configurations of the same molecule. For example, the current implementation distinguishes between three rotational isomers of butane, and treats each isomer as a unique molecule.
The code vectors constitute the genotype of the corresponding ligand, and can be subjected to mutation and recombination with resulting changes in ligand structure. The ligand structure itself is the phenotype used to evaluate binding affinity with a selected population of virtual receptors.
4) Target Presentation
Chemical structures or target ligands are initially constructed from randomly generated codes. Following decoding, the coordinates, radii, dipole moments and polarizabilities of each atom in the target ligand are obtained from look-up tables of values and used to evaluate the binding affinity between the ligand and a selected population of virtual receptors.

The affinity of the target for each of the virtual receptors is tested for many orientations of the target relative to the receptor surfaces. No assumptions are made concerning the relative orientations of the ligand and simulated receptor. Prior to the evaluation of binding affinity, the target and receptor must be brought into contact. The method of target presentation and calculation of affinity between the chemical structures and simulated receptors is essentially the same as that discussed above in Module 4 for known target molecules and the simulated receptors.
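Testing a ligand in many orientations requires some way of drawing random orientations. The text does not say how this is done, so the sketch below swaps in one standard technique: a uniformly distributed random rotation built from a random unit quaternion (Shoemake's method), applied about the ligand's centroid. The function names are hypothetical, and the contact step of Module 4 is not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def random_rotation(rng: random.Random) -> List[List[float]]:
    """Uniformly distributed 3x3 rotation matrix from a random unit quaternion."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    x, y = a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2)
    z, w = b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3)
    # Standard conversion of the unit quaternion (x, y, z, w) to a rotation matrix.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def orient(coords: Sequence[Vec], rng: random.Random) -> List[Vec]:
    """Rotate the ligand's atoms about their centroid by one random rotation."""
    n = len(coords)
    centre = tuple(sum(c[i] for c in coords) / n for i in range(3))
    r = random_rotation(rng)
    out: List[Vec] = []
    for p in coords:
        v = [p[i] - centre[i] for i in range(3)]
        out.append(tuple(sum(r[i][j] * v[j] for j in range(3)) + centre[i]
                         for i in range(3)))
    return out
```

Because the rotation is rigid, interatomic distances within the ligand are unchanged; only its orientation relative to the receptor surface varies.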
5) Evaluation of Binding Affinity and Fitness
The binding affinity of the target ligand for each of the simulated receptors used for fitness evaluation is calculated using the same effective affinity calculation method described for simulated receptor generation using the target molecules. As previously noted, affinity calculations using other criteria can be incorporated into the fitness testing process, but the efficacy and computational efficiency of the present invention rely in part on using the same effective affinity calculation both for virtual receptor generation and for generation of the chemical structures using the simulated receptor populations.

6) Ligand Evolution: Testing Goodness of Fit
Goodness of fit between a selected population of simulated receptors and a novel ligand or chemical structure is evaluated by comparing the target activity or affinity values for the ligand with those obtained for the simulated receptor-ligand complexes. The maximal affinities of an optimally selective virtual receptor should be strongly correlated with the target affinity measures. Successive iterations of the evolutionary process are used to enhance this correlation.
The target values can be set to any level of binding affinity. It is not required that the ligand have the same binding affinity for all the virtual receptors used in the selection process. In the current implementation, the maximal binding affinities of the optimized virtual receptors for known substrates are used to calculate target binding affinities. For example, the target affinities may be set to 90% of the binding affinity of each member of the virtual receptor population for a specific substrate. Alternatively, the target binding affinity may be set to zero if the interaction between the ligand and the virtual receptor is to be minimised.
By combining simulated receptors optimized for different sets of substrates and associating selected target affinity values with each receptor, novel ligands can be selected for specific binding affinity profiles. Ligand fitness measures the match between calculated ligand binding affinities and the target affinity values. The optimization process maximizes ligand fitness.
The optimal orientation of the ligands for maximal binding affinity is unknown prior to testing. In order to obtain a representative measure of the range of receptor-ligand affinities, each novel ligand must be tested repeatedly using different random orientations relative to the receptor surface. Each test uses Module 4, discussed in Part A, to evaluate affinity. In general, the reliability of the maximal affinity values obtained depends upon the sample size, since larger samples are increasingly likely to contain the true maximal value.
Two techniques are employed in the current implementation to circumvent the need for large sample sets for the generation of optimized novel ligands or chemical structures:
1. The use of a measure combining average (or sum) affinity and maximal affinity to select for ligands with optimized affinity profiles.

2. Incremental increases in the number of orientations tested with successive iterations of the optimization process. (Optimization begins with a small set of target orientations; as ligands of greater fitness are generated, more orientations are tested.)

In the current implementation, the sum is calculated for the affinity values obtained for all the tested orientations of each ligand. This sum affinity score is a measure of the average affinity between the receptor and the ligand. At the same time, the maximal affinity value is also determined.
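The two sampling techniques just described can be sketched as follows. This is an assumption-laden illustration: `score` is a hypothetical stand-in for one Module 4 affinity evaluation at a random orientation, and the linear schedule in `orientation_schedule` (its `start`, `step` and `cap` values) is invented for the example, since the text does not give the actual increments.

```python
import random
from typing import Callable, Tuple


def affinity_profile(score: Callable[[random.Random], float], n_orientations: int,
                     rng: random.Random) -> Tuple[float, float]:
    """Sum and maximal affinity over n random ligand orientations.

    `score` is a hypothetical stand-in for the Module 4 calculation: it draws one
    random orientation and returns the receptor-ligand affinity for it.
    """
    values = [score(rng) for _ in range(n_orientations)]
    return sum(values), max(values)


def orientation_schedule(generation: int, start: int = 5, step: int = 5,
                         cap: int = 50) -> int:
    """Hypothetical schedule: test more orientations as optimization proceeds."""
    return min(cap, start + step * generation)


rng = random.Random(0)
total, best = affinity_profile(lambda r: r.random(), orientation_schedule(2), rng)
```

Keeping both numbers lets the sum act as a low-variance average while the maximum tracks the best fit found so far; the maximum only becomes reliable as the orientation count grows.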
Both sum and maximal affinities are used to test the goodness of fit between the virtual receptor and the novel ligand. The fitness of each novel ligand is rated according to the differences between the calculated values of sum affinity and maximal affinity and the target values for these parameters. In the current implementation, the value

$$F = \frac{\lvert \text{calculated max affinity} - \text{target max affinity} \rvert}{2 \times \text{target max affinity}} + \frac{\lvert \text{calculated sum affinity} - \text{target sum affinity} \rvert}{2 \times \text{target sum affinity}}$$

is calculated as the fitness score for each novel ligand-simulated receptor pair. FITNESS IS MAXIMAL WHEN THE FITNESS SCORE IS ZERO. Target maximal affinity and target sum affinity are obtained from the regression functions developed during the evolution of optimised virtual receptors, as described in the previous sections. The target values are obtained as follows:
$$\text{target max affinity} = f \times \text{maximal affinity of the most potent substrate used for virtual receptor generation}$$

$$\text{target sum affinity} = f \times \text{sum affinity of the most potent substrate used for virtual receptor generation}$$

where f is a scaling factor.
When more than one simulated receptor is used for the evaluation of ligand fitness, the fitness scores of each ligand-simulated receptor pair are summed.
$$F_{\mathrm{tot}} = \sum_{i=1}^{n} F_i, \qquad F_i = \frac{\lvert \text{calculated max affinity}_i - \text{target max affinity}_i \rvert}{2 \times \text{target max affinity}_i} + \frac{\lvert \text{calculated sum affinity}_i - \text{target sum affinity}_i \rvert}{2 \times \text{target sum affinity}_i}$$

In this case, fitness is maximized when the sum of the fitness scores is zero. In some cases it may be desirable to use only the maximal affinity scores when testing a novel ligand against a panel of different simulated receptors. In this case the fitness would be given by:

$$F_{\mathrm{tot}} = \sum_{i=1}^{n} \frac{\lvert \text{calculated max affinity}_i - \text{target max affinity}_i \rvert}{\text{target max affinity}_i}$$

In this case, fitness is also maximized when the sum of the fitness scores is zero. Other methods, for example the use of a geometric mean, could also be used to measure the total fitness of a ligand tested against a series of simulated receptors.
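The fitness formulas above translate directly into code. The sketch below is illustrative; the function names are hypothetical. In the check it uses the optimized benzaldehyde receptor's values reported later in this document (maximal affinity 13.02, sum affinity 75.87) with f = 0.9, mirroring the 90% example in the text. As written, the formulas divide by the target values, so a target of zero (the minimisation case mentioned earlier) would need separate handling that this sketch does not attempt.

```python
from typing import Sequence, Tuple


def targets(f: float, best_max: float, best_sum: float) -> Tuple[float, float]:
    """Target max and sum affinities scaled from the most potent substrate."""
    return f * best_max, f * best_sum


def pair_fitness(calc_max: float, calc_sum: float,
                 target_max: float, target_sum: float) -> float:
    """Fitness score F for one ligand-receptor pair; 0 is a perfect match."""
    return (abs(calc_max - target_max) / (2 * target_max)
            + abs(calc_sum - target_sum) / (2 * target_sum))


def total_fitness(pair_scores: Sequence[float]) -> float:
    """F_tot for a panel of receptors: the sum of the per-pair scores F_i."""
    return sum(pair_scores)


def max_only_fitness(calc_max: Sequence[float], target_max: Sequence[float]) -> float:
    """Panel fitness using only the maximal affinities."""
    return sum(abs(c - t) / t for c, t in zip(calc_max, target_max))


tm, ts = targets(0.9, 13.02, 75.87)
f_example = pair_fitness(12.0, 80.0, 10.0, 100.0)  # 2/20 + 20/200
```

Because each term is a relative error halved, a ligand that misses both targets by 10% scores 0.1, and the optimizer drives the score toward zero.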
Use of both the maximal affinity values and the sum affinity values obtained for each simulated receptor ensures that the selectivity of the virtual receptors is reflected in the evaluation of ligand fitness. In this way, the fitness of the ligand reflects not only the affinity of the ligand but also the satisfaction of the steric requirements of the virtual receptor that are the basis of selectivity.

6a) The Optimization Process
Objective: To evolve a novel ligand that has selected target affinities for a set of simulated receptors. A highly efficient mechanism for finding solutions is required, since the total number of possible genotypes containing 25 instructions is 256^25.
Process
(1) PHASE 1. Generate a set of random genotypes coding for ligands and screen them against a set of simulated receptors to select ligands exceeding a threshold level of fitness.
(2) PHASE 2. The selected genotype is used as the basis for further optimization using genetic algorithm (recombination) and unidirectional mutation techniques. Mutate the selected genotype to generate a breeding population of distinct but related genotypes for recombination.
(3) Choose the most selective mutants from the population for recombination.
(4) PHASE 3. Generate new genotypes by recombination of selective mutants. Select from the resulting genotypes those with the highest affinity fitness. Use this subpopulation for the next recombinant (repeat PHASE 3) or mutation (repeat PHASE 4) generation.
(5) PHASE 4. Take the best recombination products and apply repeated point mutations to enhance selectivity.
(6) The optimization process is completed when ligands of desired fitness are generated.
PHASE 1: Generation of Primary Code
The objective of the first stage in the optimization process is to generate a genotype and corresponding ligand phenotype with a minimal level of fitness. This genotype is subsequently used to generate a population of related genotypes.

The Genetic Algorithm developed by Holland can be used to search for optimal solutions to a variety of problems.
Normally this technique is applied using large, initially random sets of solutions. In the present implementation the technique is significantly modified in order to reduce the number of tests and iterations required to find ligands with high selective affinity. This has been accomplished by using a set of closely related genotypes as the initial population and applying high rates of mutation at each iteration. For any set of target compounds it is possible to develop distinct ligands with optimal affinity characteristics. For example, receptors may bind optimally to the same targets but in different orientations. The use of an initial population of closely related genotypes increases the likelihood that the optimization process converges on a single solution. Recombination of unrelated genotypes, although it may generate novel genotypes of increased fitness, is more likely to result in divergence.
PHASE 2: Ligand Mutation
The objective of the second phase of the evolutionary process is the generation of a population of distinct but related genotypes derived from the primary genotype. Members of this population are subsequently used to generate recombinants. This breeding population is created by multiple mutation of the primary genotype. The resulting genotypes are translated and screened for selectivity. The most selective products are retained for recombination.
Ligands are subjected to mutation by changing characters in the genotypes (code vectors) encoding their structures. These mutations change the shape of the ligand, as well as the placement and types of functional groups present on the ligand. Mutations in the current implementation can alter the number of carbons comprising the ligand skeleton. Module 11 is a flowchart of a sample process for multiple point mutation.
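Multiple point mutation of a primary genotype into a breeding population can be sketched as below. The function names, the substituent alphabet `-hlo`, and the choice to mutate the same number of positions in each of the three vectors are assumptions for illustration; Module 11's actual procedure is not reproduced here. Point mutations keep the code length fixed, but a changed turn can make the walk revisit a node, which is how the carbon count can still change.

```python
import random
from typing import List, Tuple

Genotype = Tuple[str, str, str]  # (primary, secondary, tertiary)

TURNS = "1234"
SUBSTITUENTS = "-hlo"  # hypothetical secondary alphabet; "-" = no substituent


def point_mutate(code: str, alphabet: str, n_points: int, rng: random.Random) -> str:
    """Change the characters at n_points distinct random positions."""
    chars = list(code)
    for i in rng.sample(range(len(chars)), n_points):
        # Pick a replacement that differs from the current character.
        chars[i] = rng.choice([c for c in alphabet if c != chars[i]])
    return "".join(chars)


def breeding_population(parent: Genotype, size: int, n_points: int,
                        rng: random.Random) -> List[Genotype]:
    """Derive `size` distinct-but-related genotypes from one primary genotype."""
    primary, secondary, tertiary = parent
    return [
        (point_mutate(primary, TURNS, n_points, rng),
         point_mutate(secondary, SUBSTITUENTS, n_points, rng),
         point_mutate(tertiary, TURNS, n_points, rng))
        for _ in range(size)
    ]


population = breeding_population(("431413", "-h----", "211111"), 10, 2, random.Random(3))
```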

Mutations can alter the folding pattern of the ligand phenotype, with resulting changes in shape and in the location or exposure of functional groups. Mutations that affect the configuration of peripheral regions of the ligand phenotype can result in shifts in position relative to the receptor center.
Neutral Mutations
All mutations alter the structure of the phenotype; however, not all mutations result in changes in the functionality of the ligand. Such neutral mutations may alter components of the ligand that do not affect affinity. In some cases these neutral mutations can combine with subsequent mutations to exert a synergistic effect.
Sequence Mutations
Sequence mutations do not change code characters directly. Instead, the sequence of characters in the code is rearranged. Sequence mutations can alter the size of the ligand, the structural configuration, and the presence and location of functional groups. Four types of sequence mutation are used in the current implementation:
a) DELETION: A sequence of characters is removed from the code.
ABCDEA → ABEA
b) INVERSION: The order of characters comprising a sequence within the code is reversed.
ABCDEA → ABDCEA
c) DUPLICATION: A sequence of characters comprising part of the code is repeated.
ABCDEA → ABCDCDEA
d) INSERTION: A sequence of characters is inserted into the code.
A~DEA ~ ABCD~CEA
Mutations are applied in combination in the current implementation. Module 12 provides a flowchart of a sample sequence mutation.
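The four sequence mutations can be sketched as string operations on one code vector. This is a minimal illustration with hypothetical function names; the random-span and random-operation choices in `random_sequence_mutation` are assumptions, and for a full genotype the same span would presumably be rearranged in all three parallel vectors to keep them aligned, which this sketch leaves out. The deletion, inversion and duplication calls reproduce the examples given above.

```python
import random


def delete(code: str, i: int, j: int) -> str:
    """Remove code[i:j]."""
    return code[:i] + code[j:]


def invert(code: str, i: int, j: int) -> str:
    """Reverse the order of code[i:j]."""
    return code[:i] + code[i:j][::-1] + code[j:]


def duplicate(code: str, i: int, j: int) -> str:
    """Repeat code[i:j] immediately after itself."""
    return code[:j] + code[i:j] + code[j:]


def insert(code: str, i: int, segment: str) -> str:
    """Insert a new segment before position i."""
    return code[:i] + segment + code[i:]


def random_sequence_mutation(code: str, alphabet: str, rng: random.Random) -> str:
    """Apply one randomly chosen sequence mutation to a random span."""
    if len(code) < 2:
        return code  # too short to pick a span
    i, j = sorted(rng.sample(range(len(code) + 1), 2))  # span code[i:j], i < j
    op = rng.choice(["delete", "invert", "duplicate", "insert"])
    if op == "delete":
        return delete(code, i, j)
    if op == "invert":
        return invert(code, i, j)
    if op == "duplicate":
        return duplicate(code, i, j)
    segment = "".join(rng.choice(alphabet) for _ in range(j - i))
    return insert(code, i, segment)


mutant = random_sequence_mutation("431413", "1234", random.Random(0))
```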

PHASE 3: Generation of Recombinant Codes
During recombination, randomly chosen, complementary sections are exchanged between selected genotypes. The objective of recombination is the generation of novel genotypes with increased fitness. Recombination facilitates the conservation of genotype fragments that are essential for phenotypic fitness, while at the same time introducing novel combinations of instructions. In general, recombination coupled with selection results in rapid optimization of selectivity. Module 13 provides a flowchart for a sample procedure for recombination.
The current implementation retains the population used for recombination for testing. This ensures that genotypes with high selectivity are not replaced by genotypes with lower fitness. In the current implementation, multiple mutations are applied to 50% of the recombinant genotypes prior to testing. This process increases the variability within the recombinant population. The test populations used in the current implementation range in size from 10 to 40 genotypes. This is a relatively small population size; under some conditions, larger populations may be required.
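One recombination generation can be sketched as follows. Reading "complementary sections" as the same index span swapped between two parents (a two-point crossover) is an assumption, as are the function names and the `mutate` callback; the sketch works on single code strings. It does follow the two rules stated above: the parent pool is retained for testing, and half of the recombinants are mutated before testing.

```python
import random
from typing import Callable, List, Tuple


def crossover(a: str, b: str, rng: random.Random) -> Tuple[str, str]:
    """Exchange the same randomly chosen span [i, j) between two codes."""
    n = min(len(a), len(b))
    i, j = sorted(rng.sample(range(n + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def recombine_generation(parents: List[str], mutate: Callable[[str, random.Random], str],
                         rng: random.Random) -> List[str]:
    """Parents stay in the test pool; half of the recombinants are mutated."""
    children: List[str] = []
    for _ in range(len(parents) // 2):
        a, b = rng.sample(parents, 2)
        children.extend(crossover(a, b, rng))
    # Mutate a randomly chosen 50% of the recombinants before testing.
    for k in rng.sample(range(len(children)), len(children) // 2):
        children[k] = mutate(children[k], rng)
    return parents + children
```

Selection would then keep the fittest members of the returned pool, so a high-fitness parent cannot be displaced by a weaker offspring.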
PHASE 4: Ligand Maturation by Successive Point Mutation
The final stage in the optimization process mimics the maturation of antibodies in the mammalian immune system. A series of single point mutations is applied to the genotype, and the effect on phenotypic fitness is evaluated. Unlike recombination, this process generally results in only small incremental changes to the selectivity of the phenotype. The maturation process uses a Rechenberg (1+1) evolutionary strategy. At each generation the fitness of the parental genotype is compared to that of its mutation product, and the genotype with the greater selectivity is retained for the next generation. As a result, this process is strictly unidirectional, since less selective mutants do not replace their parents. During each iteration of the maturation process, only a single instruction in the code is changed in the present implementation.
If a parent and its mutation product have the same selectivity, the parent is replaced by its product in the next generation. This method results in the accumulation of neutral mutations that may have synergistic effects with subsequent mutations. This convention is arbitrary. Module 14 provides a flowchart for a sample maturation process.
If recombination or maturation does not generate improved selectivity after repeated iterations, it may be necessary to repeat multiple mutations (PHASE 2) in order to increase the variability of the breeding population genome.
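The (1+1) maturation step described above can be sketched as a loop over single-point mutants. The function name `mature` and its parameters are hypothetical; `fitness` is assumed to return the fitness score F, so lower is better and 0 is optimal. Ties accept the mutant, matching the stated convention that lets neutral mutations accumulate, while worse mutants never replace their parent.

```python
import random
from typing import Callable


def mature(genotype: str, fitness: Callable[[str], float], alphabet: str,
           generations: int, rng: random.Random) -> str:
    """(1+1) strategy: a single-point mutant replaces its parent unless worse."""
    parent, parent_score = genotype, fitness(genotype)
    for _ in range(generations):
        # Change exactly one instruction to a different character.
        i = rng.randrange(len(parent))
        new_char = rng.choice([c for c in alphabet if c != parent[i]])
        child = parent[:i] + new_char + parent[i + 1:]
        child_score = fitness(child)
        if child_score <= parent_score:  # equal scores accept the mutant (neutral)
            parent, parent_score = child, child_score
    return parent


target = "4444"
score = lambda g: sum(a != b for a, b in zip(g, target))  # toy fitness score
best = mature("1111", score, "1234", 2000, random.Random(5))
```

With the toy score, the walk can only stay level or improve, so it reaches `4444` and then stays there.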

EXAMPLE OF LIGAND GENERATION
OVERVIEW
The mosquito Aedes aegypti is repelled by benzaldehyde and, to a much smaller degree, by benzene and toluene (Table 1). This species is not repelled significantly by cyclohexane or hexane (Table 1). In the following test of novel ligand generation, the method is used to generate, ab initio, compounds that will be similar in repellent activity to benzaldehyde. In the first step of ligand generation, simulated receptors were constructed with high affinity for benzaldehyde and low affinity for benzene. In the second step, ligands were evolved with binding affinities for the simulated receptors similar to that of benzaldehyde.
Methods
Mosquitoes were lab-reared, 7-14 days post-emergence and unfed. Experiments were conducted over six-day periods at 20°C under fluorescent lighting. Tests were run between 12:00 and 17:00 EDT. The test populations in the four sets of trials consisted of 200, 175, 105 and 95 females. Mosquitoes were provided with drinking water.

The tests were conducted in a 35 x 35 x 35 cm clear Plexiglas box with two screened sides forming opposite walls.
The screening consisted of two layers: an inner layer of coarse plastic mesh and an outer layer of fine nylon mesh. The box was placed in a fumehood such that air entered one of the screened sides and exited through the opposite side. Air flow was ~0.5 cm/s.
The mosquitoes landed on the walls of the box, oriented head upwards. Triangular pieces (4 x 4 x 1 mm) of Whatman #1 filter paper were used to present the stimulant compounds. The tips were dipped into the test solution to a depth of 0.5 cm and used immediately. Responses to the test solutions were determined as follows:
1. A stationary female resting on the interior screen of the upwind wall was selected for testing.
2. The treated filter paper tip was placed against the outside of the screen and positioned opposite the mesothoracic tarsus of the mosquito. In all cases the initial approach was made from below the position of the mosquito.
3. The tip was held in position for a maximum of 3 s and the response of the mosquito was noted.
The procedure was then repeated for a new individual.
Mosquitoes were tested only once each day with each compound.
Tips were used for five tests each (total duration of use < 30 s), then replaced. Compounds were tested in random order, and each compound was tested twice on separate days. Two sets of controls were conducted using untreated (dry) filter paper tips and tips moistened with distilled water, positioned in the same manner as treated tips. Tests of these controls were interspersed regularly among tests of the repellent compounds.
Responses to the controls did not vary during the course of the experiment (p > 0.25).
Four behavioral responses were recorded:
1. No response: the mosquito remained motionless.
2. Take-off: the mosquito flew away from its resting site.

3. Ipsilateral leg lifting: the mosquito raised the mesothoracic leg on the same side as the stimulus source.
4. Contralateral leg lifting: the mosquito raised the mesothoracic leg on the opposite side from the stimulus source.
Ipsilateral leg lifting was frequently followed by take-off, in which case both behaviours were recorded.
Polyethylene gloves were worn during testing and during all phases of compound preparation.
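Tables E1 and E2 rank each compound by a relative repellency index, [(% flight response + % leg-lifting response) x boiling point (°C)] / 100. A minimal sketch of that arithmetic (the function name is illustrative only) reproduces the tabulated benzaldehyde value of 178 and the rounded benzene value of 68:

```python
def relative_repellency(flight_pct: float, leg_pct: float, boiling_point_c: float) -> float:
    """[(% flight response + % leg-lifting response) x boiling point] / 100."""
    return (flight_pct + leg_pct) * boiling_point_c / 100.0


# Benzaldehyde: (90 + 10) x 178 / 100
print(relative_repellency(90, 10, 178))
# Benzene: (72 + 12.5) x 80 / 100, reported rounded in Table E1
print(round(relative_repellency(72, 12.5, 80)))
```

Weighting by boiling point favours less volatile compounds, which persist longer at the stimulus source.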
Table E1. Mosquito responses to selected volatile compounds
Compound | Boiling point (°C) | N | % Flight response | % Leg-lifting response | Relative repellency*
Benzaldehyde | 178 | 130 | 90 | 10 | 178
Benzene | 80 | 72 | 72 | 12.5 | 68
Toluene | 110 | 166 | 67 | 27 | 94
Cyclohexane | 81 | 80 | 6 | 0 | 4.9
Hexane | 69 | 100 | 4 | 0 | 2.8
Control (blank) | - | 450 | 5 | 0 | -
*Relative repellency = [(% flight response + % leg-lifting response) x boiling point] / 100

Simulated Receptor and Ligand Generation
Two simulated receptors were generated using the same selection criteria. Each receptor was used independently to generate a set of ligands.
Molecular Assembly 1
PHASE 1: RECEPTOR GENERATION
A receptor was evolved with selective affinity for benzaldehyde.
The training targets were benzene and benzaldehyde. Fifteen orientations of each target were used to calculate affinity values.
Results of the evolutionary process were:
Target | Activity level | Sum affinity | Maximum affinity
Benzene | 1.0 | 6.87 | 2.21
Benzaldehyde | 5.9 | 75.87 | 13.02
The affinity score for the receptor was 0.992.
Code for the optimized 25 x 6 x 7 benzaldehyde receptor:

PHASE 2: LIGAND GENERATION
The optimized simulated receptor was used as a template for the evolution of novel ligands. Four different ligands were assembled by random mutation and selection.
Ligands were selected for similarity with benzaldehyde.
The affinity values for the ligands were:
Compound | Formula | Sum affinity | Maximum affinity
Benzaldehyde | - | 75.87 | 13.02
Ligand 1.1 | C9H17Cl2OH | 74.03 | 12.82
Ligand 1.2 | C8H15Cl | 67.88 | 15.14
Ligand 1.3 | C8H[illegible]Cl(=O) | 72.25 | 12.58
Ligand 1.4 | C13H16OH(=O) | 72.94 | 11.2
Evolved ligands 1.1 to 1.4 are shown in Figure 4b. At least one orientation of each ligand was structurally similar to benzaldehyde.
Molecular Assembly 2
PHASE 1: RECEPTOR GENERATION
A 25 x 6 x 7 receptor was evolved with selective affinity for benzaldehyde. The training targets were benzene and benzaldehyde. Fifteen orientations of each target were used to calculate affinity values.
Results of the evolutionary process were:
Target | Activity level | Sum affinity | Maximum affinity
Benzene | 1.0 | 25.88 | 8.53
Benzaldehyde | 5.8 | 162.23 | 42.74
The affinity score for the receptor was 0.996.
The code for the receptor was:

323030313214 002321144010 000243013133

PHASE 2: LIGAND GENERATION
The optimized simulated receptor was used as a template for the evolution of novel ligands. Four different ligands were assembled by random mutation and selection.
Ligands were selected for similarity with benzaldehyde.
The affinity values for the ligands were:
Compound | Formula | Sum affinity | Maximum affinity | Fitness score
Benzaldehyde | - | 162.23 | 42.74 | -
Ligand 2.1 | C8H13Cl(=O) | 182.4 | 48.97 | 0.135
Ligand 2.2 | C9H15Cl(=O) | 166.5 | 43.0 | 0.02
Ligand 2.3 | C6H10CN(=O) | 159.7 | 39.0 | 0.05
Ligand 2.4 | C9H13(=O) | 156.8 | 46.5 | 0.06
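The fitness scores in the table above agree with the score defined in claim 5, evaluated against a single receptor with benzaldehyde as the target: half the relative deviation in maximum affinity plus half the relative deviation in sum affinity. A minimal sketch (function and argument names are illustrative, not taken from the text):

```python
from typing import Sequence, Tuple


def fitness_score(pairs: Sequence[Tuple[float, float, float, float]]) -> float:
    """Claim 5 score: sum over receptors of
    |MA - MA_target| / (2 MA_target) + |SA - SA_target| / (2 SA_target).

    Each tuple is (ligand max affinity, ligand sum affinity,
    target max affinity, target sum affinity) for one receptor.
    """
    return sum(abs(ma - ma_t) / (2 * ma_t) + abs(sa - sa_t) / (2 * sa_t)
               for ma, sa, ma_t, sa_t in pairs)


# Ligand 2.1 against the Molecular Assembly 2 receptor (benzaldehyde as target)
print(round(fitness_score([(48.97, 182.4, 42.74, 162.23)]), 3))
```

A score of zero means the ligand reproduces the target's maximum and sum affinities exactly, which is why the claims set the preselected score to substantially zero.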

Evolved ligands 2.1 to 2.4 are shown in Figure 4c. At least one orientation of each ligand was structurally similar to benzaldehyde.
Compounds 2.1 and 2.4 are substituted cyclohexanone derivatives. Ligand 2.2 is 5-chloro-2,7-nonanedione and ligand 2.3 is 2-cyano-5-hexanone. Ligand 1.4 contains a fragment corresponding in structure to methyl cyclohexyl ketone.
Experiments testing the repellency of cyclohexanone, menthone, methyl cyclohexyl ketone and 2-octanone (see Figure 4a) suggest that these ligands will also be repellent to mosquitoes (Table E2).

Table E2. Mosquito responses to selected volatile compounds
Compound | Boiling point (°C) | N | % Flight response | % Leg-lifting response | Relative repellency*
Benzaldehyde | 178 | 130 | 90 | 10 | 178
2-Octanone | 173 | 80 | 82 | 12.5 | 162
2-Acetylcyclohexanone | 225 | 100 | 54 | 24 | 175
Cyclohexanone | 156 | 134 | 99 | 1 | ≥154
Menthone | 207 | 110 | 72 | 11 | 172
Control (blank) | - | 450 | 5 | 0 | -
*Relative repellency = [(% flight response + % leg-lifting response) x boiling point] / 100

The method disclosed herein of designing new chemical structures exhibiting preselected functional characteristics or properties has been described by example only. For example, the method may readily be practised using other known or acceptable values for polarizabilities, dipole moments, covalent radii and the like. In addition, the flowcharts giving process calculation steps in the modules are meant to be illustrative only. For example, the calculation of affinity may be carried out using available computational packages using fewer approximations than used herein. The method of generating new chemical structures has relied upon first generating one or more simulated receptors exhibiting a preselected affinity for known target compounds with similar functional characteristics, and then using these receptors to generate novel structures exhibiting these characteristics to whatever degree is desired. The receptors themselves may be used for other applications besides generating novel chemical structures, for example as a means of screening for pharmaceutical or toxicological properties of known compounds.
Thus, it will be appreciated by those skilled in the art that numerous variations of the method disclosed herein may be made without departing from the scope of the invention.

TABLE 1: TRANSITION STATES AND ADDITION FACTORS
Old state | Δx | Δy | Δz | New state (Right) | New state (Up) | New state (Left) | New state (Down)
5 | 0 | 0 | 1 | 9 | 1 | 16 | 23
7 | 0 | 0 | -1 | 13 | 21 | 8 | 3
8 | 0 | -1 | 0 | 7 | 9 | 24 | 14
9 | -1 | 0 | 0 | 17 | 10 | 5 | 8
10 | 0 | 1 | 0 | 6 | 14 | 22 | 9
11 | 0 | -1 | 0 | 22 | 16 | 6 | 12
12 | -1 | 0 | 0 | 18 | 11 | 4 | 13
13 | 0 | 1 | 0 | 24 | 12 | 7 | 16
14 | 1 | 0 | 0 | 4 | 8 | 18 | 10
15 | 0 | -1 | 0 | 3 | 17 | 2 | 18
16 | 1 | 0 | 0 | 5 | 13 | 17 | 11
17 | 0 | 0 | -1 | 16 | 19 | 9 | 15
18 | 0 | 0 | 1 | 14 | 15 | 12 | 19
19 | 0 | 1 | 0 | 20 | 18 | 21 | 17
20 | 1 | 0 | 0 | 23 | 24 | 19 | 6
21 | -1 | 0 | 0 | 19 | 22 | 23 | 7
22 | 0 | 0 | 1 | 10 | 3 | 11 | 21
23 | 0 | -1 | 0 | 21 | 5 | 20 | 4
24 | 0 | 0 | 1 | 8 | 2 | 13 | 20

Formula for algorithm: Input(old state, turn) → Output(Δx, Δy, Δz, new state).
Example: initial position (12, 34, -18); input: old state = 10, turn = right. Output: new state = 6, Δx = 0, Δy = 1, Δz = 0; subsequent position (12, 35, -18).

TABLE 2: VAN DER WAALS RADII
Element | H | F | O | N | C | Cl | S | Br | P | I
Van der Waals radius (pm) | 110 | 140 | 150 | 150 | 170 | 180 | 180 | 190 | 190 | 200
Relative radius (H = 0.5) | 0.5 | 0.64 | 0.68 | 0.68 | 0.77 | 0.82 | 0.82 | 0.86 | 0.86 | 0.9
Based on: N.S. Isaacs, 1987. Physical Organic Chemistry. Longman Scientific and Technical, New York. 828 pp.
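The turn-translation algorithm of Table 1 is a lookup from (old state, turn) to (addition factors, new state). The sketch below encodes only the rows legible in this copy (states 1-4 and 6 are absent, so those states raise `KeyError`), and it adds the Δx, Δy, Δz listed in the old state's row, which is an inference from the single worked example rather than a stated rule. `TRANSITIONS` and `apply_turn` are hypothetical names.

```python
from typing import Dict, Tuple

Vec = Tuple[int, int, int]

# Legible rows of Table 1: state -> ((dx, dy, dz), {turn: new state}).
TRANSITIONS: Dict[int, Tuple[Vec, Dict[str, int]]] = {
    5: ((0, 0, 1), {"right": 9, "up": 1, "left": 16, "down": 23}),
    7: ((0, 0, -1), {"right": 13, "up": 21, "left": 8, "down": 3}),
    8: ((0, -1, 0), {"right": 7, "up": 9, "left": 24, "down": 14}),
    9: ((-1, 0, 0), {"right": 17, "up": 10, "left": 5, "down": 8}),
    10: ((0, 1, 0), {"right": 6, "up": 14, "left": 22, "down": 9}),
    11: ((0, -1, 0), {"right": 22, "up": 16, "left": 6, "down": 12}),
    12: ((-1, 0, 0), {"right": 18, "up": 11, "left": 4, "down": 13}),
    13: ((0, 1, 0), {"right": 24, "up": 12, "left": 7, "down": 16}),
    14: ((1, 0, 0), {"right": 4, "up": 8, "left": 18, "down": 10}),
    15: ((0, -1, 0), {"right": 3, "up": 17, "left": 2, "down": 18}),
    16: ((1, 0, 0), {"right": 5, "up": 13, "left": 17, "down": 11}),
    17: ((0, 0, -1), {"right": 16, "up": 19, "left": 9, "down": 15}),
    18: ((0, 0, 1), {"right": 14, "up": 15, "left": 12, "down": 19}),
    19: ((0, 1, 0), {"right": 20, "up": 18, "left": 21, "down": 17}),
    20: ((1, 0, 0), {"right": 23, "up": 24, "left": 19, "down": 6}),
    21: ((-1, 0, 0), {"right": 19, "up": 22, "left": 23, "down": 7}),
    22: ((0, 0, 1), {"right": 10, "up": 3, "left": 11, "down": 21}),
    23: ((0, -1, 0), {"right": 21, "up": 5, "left": 20, "down": 4}),
    24: ((0, 0, 1), {"right": 8, "up": 2, "left": 13, "down": 20}),
}


def apply_turn(position: Vec, state: int, turn: str) -> Tuple[Vec, int]:
    """Return (subsequent position, new state) for one turning instruction."""
    delta, next_states = TRANSITIONS[state]
    x, y, z = position
    dx, dy, dz = delta  # addition factors from the old state's row (assumption)
    return (x + dx, y + dy, z + dz), next_states[turn]


print(apply_turn((12, 34, -18), 10, "right"))  # the worked example above
```

The legible rows are mutually consistent: wherever both states are present, a right turn is undone by a left turn and an up turn by a down turn, which is a quick check on the transcription.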

TABLE 3: COVALENT BOND RADII (pm)
Bond order | B | C | N | O | F
First | 88 | 77 | 70 | 66 | 64
Second | - | 66.5 | 60 | 55 | -
Third | - | 60.2 | 55 | - | -
Aromatic | - | 70 | - | - | -
Bond order | Si | P | S | Cl | Br
First | 117 | 110 | 104 | 99 | 114
Based on values in N.S. Isaacs (1987).

TABLE 4: EFFECTIVE DIPOLE VALUES AND CHARGE SITE ASSIGNMENTS
Bond | Atom | Dipole value (Debye)
C-H | H | +0.35 or +0.084*
C-H | C | no charge assigned
ArC-H | H | +0.6
ArC-H | C | -0.366 or no charge assigned
=C-H | H | +0.336
=C-H | C | -0.6 or no charge assigned*
C=O | O | -2.7*
C=O | C | no charge assigned* or +1.35
C-O-C | O | -0.8
C-OH | H | +1.5 or +1.7
C-OH | O | -1.1
C-NH2 | H | +1.3
C-NH2 | N | -1.3
C-NO2 | O | -2.0
C-NO2 | N | +4.0
C=N | N | -3.7
C=N | C | no charge assigned
C-S-C | S (in thiophene or dimethyl sulphide) | +1.5* (may be negatively charged in some contexts)
C-N=C | N (in pyridine or CH3-N=CH2) | +1.[illegible] or +1.3
Ar-F or C=C-F | F | -1.3
C-F | F | -1.8
Ar-Cl or C=C-Cl | Cl | -1.7
C-Cl | Cl | -2.1
Ar-Br or C=C-Br | Br | -1.7
C-Br | Br | -2.0
C-I | I | -2.0
*Preferred under most conditions.

Each target atom is described fully by a set of eight values {xi, yi, zi, ri, bri, cri, αi, di}, where xi, yi and zi are the positional coordinates relative to the geometric center of the molecule, ri = the van der Waals radius, bri = the bond (covalent) radius, cri = the collision radius (= ri + 0.5), αi = the polarizability, and di = the effective dipole moment value.
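The eight-value atom description maps naturally onto a small record type in which the collision radius is derived rather than stored. The sketch below is illustrative: `TargetAtom` and `make_atom` are hypothetical names, the radii come from the relative column of Table 2 (legible elements only), and the bond radius is left in whatever units the caller supplies, since the text does not say how Table 3's picometre values are rescaled.

```python
from dataclasses import dataclass

# Relative van der Waals radii from Table 2 (H = 0.5, one receptor-subunit radius).
RELATIVE_VDW_RADIUS = {"H": 0.5, "F": 0.64, "O": 0.68, "N": 0.68, "C": 0.77,
                       "Cl": 0.82, "S": 0.82, "Br": 0.86, "P": 0.86}


@dataclass(frozen=True)
class TargetAtom:
    x: float      # coordinates relative to the molecule's geometric center
    y: float
    z: float
    r: float      # van der Waals radius
    br: float     # bond (covalent) radius, units as supplied (assumption)
    alpha: float  # effective polarizability (Table 5)
    d: float      # effective dipole value (Table 4)

    @property
    def cr(self) -> float:
        """Collision radius, cr = r + 0.5."""
        return self.r + 0.5


def make_atom(element: str, xyz: tuple, br: float, alpha: float, d: float) -> TargetAtom:
    x, y, z = xyz
    return TargetAtom(x, y, z, RELATIVE_VDW_RADIUS[element], br, alpha, d)


# Carbonyl oxygen: Table 5 polarizability 2.1, preferred Table 4 dipole value -2.7.
oxygen = make_atom("O", (0.0, 0.0, 1.2), br=55, alpha=2.1, d=-2.7)
print(oxygen.cr)
```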

TABLE 5: SELECTED RELATIVE EFFECTIVE POLARIZABILITIES FOR SELECTED TARGET ATOMS
Atom | Context | Relative polarizability (αi)
H | C-H | 1.0
H | N-H | 1.1
H | O-H | 1.1
H | S-H | 3.0*
F | C-F | 1.5*
Cl | C-Cl | 4.0
Br | C-Br | 5.8
I | C-I | 8.9*
C | C-CH3 | 3.7
C | C-CH2-C | 3.5
C | C-CC2-H | 3.2
C | C=CH2 | 4.5
C | C=CH-C | 4.3
C | C=CC2 | 4.0
C | C≡C-H | 4.9*
C | C≡C-C | 4.6*
C | Arene ring | 4.3* or 2.6 (based on benzene; delocalized electron cloud)
C | C-C-N | 4.0
C | C3-C-O- | 3.6
C | C2H-C-O- | 3.8
C | CH2-C-O- | 4.1
C | H3-C-O- | 4.4
C | C2-C=O | 3.6
C | CH-C=O | 3.8
C | C2-C-N | ?
C | CH-C=N | ?
C | C3-C-N | 3.1
C | C2H-C-N | 3.3
C | CH2-C-N | 3.6
C | H3-C-N | 3.8
O | C-O-H | 2.1
O | C=O | 2.1
O | C-O-C | 1.8
O | NO2 | 1.9*
N | C-NH2 | 3.1
N | C-NH-C | 2.8*
N | C-NC2 | 2.5*
N | C-NO2 | 4.6* (may be larger in small molecules)
N | C≡N | 3.2
S | C=S | 7.7
S | C-S-C | ?
S | C-S-H | 5.0
* By calculation from molecular polarizabilities.
? Values can be determined from appropriate molecular data.

MODULE 1: CODE GENERATION FOR SIMULATED RECEPTORS
Step 1 Input code generation parameters: i) code length;
and ii) instruction frequency.
Step 2 Initialize empty character string to store code.
Step 3 Generate random number.
Step 4 Based on the random number and the instruction frequency, select a character {'0', '1', ..., '6'} to concatenate to the code string.
Repeat Step 4 until string length equals preset code length.
Step 5 Output code.
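Module 1 amounts to drawing a fixed number of characters with preset relative frequencies. In the sketch below, `random.Random.choices` plays the role of Steps 3 and 4 (random number plus frequency-weighted selection). The function name and the frequency values are hypothetical; the seven-character alphabet follows the five turning and two charge instructions of claim 15.

```python
import random
from typing import Mapping


def generate_receptor_code(length: int, frequencies: Mapping[str, float],
                           rng: random.Random) -> str:
    """Draw `length` instruction characters with the given relative frequencies."""
    chars = list(frequencies.keys())
    weights = list(frequencies.values())
    return "".join(rng.choices(chars, weights=weights, k=length))


# Hypothetical frequencies: '0'-'4' turning instructions, '5'/'6' charged sites.
FREQS = {"0": 0.4, "1": 0.1, "2": 0.1, "3": 0.1, "4": 0.1, "5": 0.1, "6": 0.1}
print(generate_receptor_code(25, FREQS, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing receptors evolved under different settings.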

MODULE 2: CODE TRANSLATION FOR SIMULATED RECEPTORS
Step 1 Input origin coordinates for polymers comprising receptor.
Step 2 Input code for polymer.
Step 3 Read the next character from the code (the first character on the first pass).
Step 4 If the character is a turning instruction, use the translation algorithm to determine subunit coordinates; otherwise go to Step 7.
Step 5 Store subunit coordinates. Assign a charge value of 0 to the subunit.
Step 6 If the character is not the last character in the code, repeat Step 3; otherwise go to Step 10.
Step 7 If the character is a charge instruction, use the translation algorithm to determine subunit coordinates assuming no turn.
Step 8 Store subunit coordinates. Assign a charge value of +1 or -1 to the subunit based on the character.
Step 9 If the character is not the last character in the code, repeat Step 3; otherwise go to Step 10.
Step 10 Repeat Steps 2 to 9 for each of the polymers comprising the receptor.
Step 11 Output coordinates and charge values of subunits.
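A decoder for Module 2 can be sketched as a lattice "turtle" that carries a heading and an up vector. This substitutes vector rotation for the Table 1 state lookup, because several Table 1 rows are not legible, and the character assignments ('0' no turn, '1' right, '2' up, '3' left, '4' down, '5' positive, '6' negative) are hypothetical. Each character places one subunit a unit step from the previous one along the (possibly new) heading.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[int, int, int]

# Hypothetical character assignments (not stated in this copy of the text).
TURNS = {"0": "none", "1": "right", "2": "up", "3": "left", "4": "down"}
CHARGES = {"5": +1, "6": -1}


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def negate(a: Vec) -> Vec:
    return (-a[0], -a[1], -a[2])


def decode_polymer(code: str, origin: Vec = (0, 0, 0), heading: Vec = (0, 1, 0),
                   up: Vec = (0, 0, 1)) -> List[Tuple[Vec, int]]:
    """Translate one polymer code into a list of (coordinates, charge) subunits."""
    pos, subunits = origin, []
    for ch in code:
        if ch in CHARGES:            # charge instruction: no turn, charge +1 or -1
            charge = CHARGES[ch]
        else:                        # turning instruction: uncharged subunit
            charge, turn = 0, TURNS[ch]
            if turn == "right":
                heading = cross(heading, up)
            elif turn == "left":
                heading = negate(cross(heading, up))
            elif turn == "up":
                heading, up = up, negate(heading)
            elif turn == "down":
                heading, up = negate(up), heading
        pos = (pos[0] + heading[0], pos[1] + heading[1], pos[2] + heading[2])
        subunits.append((pos, charge))
    return subunits


def decode_receptor(codes: Sequence[str], origins: Sequence[Vec]) -> List[Tuple[Vec, int]]:
    """Step 10: decode every polymer from its own origin and pool the subunits."""
    return [s for code, o in zip(codes, origins) for s in decode_polymer(code, o)]


print(decode_polymer("2105"))
```

Keeping the up vector alongside the heading is what lets "right" and "left" stay well defined after an up or down turn.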

MODULE 3: TARGET PRESENTATION
Step 1 Input coordinates and radii of target atoms (xti, yti, zti, radiusi) (i = number of atoms in target). Input coordinates of receptor (xrj, yrj, zrj, chargej) (j = number of subunits in receptor).
Step 2 Generate random angular (Δθ, Δφ) and translation (kx, ky) values.
Step 3 Rotate and translate atomic coordinates by random amounts.
Step 3a Convert target coordinates to polar form: (xti, yti, zti, radiusi) → (θi, φi, ρi, radiusi).
Step 3b Add random changes to angles: (θi, φi, ρi, radiusi) → (θi + Δθ, φi + Δφ, ρi, radiusi).
Step 3c Convert to rectangular coordinates: (θi + Δθ, φi + Δφ, ρi, radiusi) → (xi, yi, zi, radiusi).
Step 3d Add random translation: (xni, yni, zni, radiusi) = (xi + kx, yi + ky, zi, radiusi).
Step 4 Center target coordinates on origin (0, 0, 0).
Step 4a Find maximum and minimum values of xni, yni and zni.
Step 4b Find geometric center of target: xncenter = (xnmaximum + xnminimum)/2, yncenter = (ynmaximum + ynminimum)/2, zncenter = (znmaximum + znminimum)/2.
Step 4c Calculate centered coordinates: (xnci, ynci, znci) = (xni - xncenter, yni - yncenter, zni - zncenter).
Step 5 Use atomic radii and transformed coordinates (xnci, ynci, znci, radiusi) to construct the collision surface of the target, g(xg, yg) = zg.
Step 5a Create a grid with spacing equal to the diameter of the receptor subunits (= 1). Coordinates of grid:
xg ∈ {Int(xnminimum - xncenter), Int(xnminimum - xncenter) + 1, ..., 0, ..., Int(xnmaximum - xncenter) - 1, Int(xnmaximum - xncenter)}
yg ∈ {Int(ynminimum - yncenter), Int(ynminimum - yncenter) + 1, ..., 0, ..., Int(ynmaximum - yncenter) - 1, Int(ynmaximum - yncenter)}
Set the initial value of g(xg, yg) to 0 at all points on the grid.
Step 5b For each atom (i), set the height value g(xg, yg) of each grid point (xg, yg) according to the following rule:
For i = 1 to number of atoms in target
If (xnci - xg)² + (ynci - yg)² < radiusi² then g(xg, yg) = minimum(g(xg, yg), znci - radiusi)
Else if (xnci - xg)² + (ynci - yg)² < (radiusi + 0.5)² then g(xg, yg) = minimum(g(xg, yg), znci - radiusi/2)
Else g(xg, yg) = minimum(g(xg, yg), 0)
Next i
Step 6 Center receptor coordinates on origin (0, 0).
Step 6a Find maximum and minimum values of xrj, yrj and zrj.
Step 6b Find geometric center of receptor: xrcenter = (xrmaximum + xrminimum)/2, yrcenter = (yrmaximum + yrminimum)/2, zrcenter = (zrmaximum + zrminimum)/2.
Step 6c Calculate centered receptor coordinates: (xcj, ycj, zcj) = (xrj - xrcenter, yrj - yrcenter, zrj - zrminimum).
Step 7 Construct collision surface of receptor s(xs, ys) = zs using the centered receptor coordinates according to the following rule:
Set all initial values of s(xcj, ycj) to 0.
For j = 1 to the number of subunits in receptor: if zcj > s(xcj, ycj) then s(xcj, ycj) = zcj. Next j.
Step 8 Find minimal separation between the collision surface of the receptor and the collision surface of the target.
Calculate difference matrix d(xg, yg) for all xg and yg in the grid defined in Step 5a:
d(xg, yg) = (g(xg, yg) - znminimum + [illegible]) + (s(xg, yg) + zrminimum - zrmaximum)
For all xg, yg find the minimal value of d(xg, yg), dmin. dmin is the minimal separation distance.
Step 9 Transform target and receptor coordinates for collision configuration.
For the receptor: (xreceptorj, yreceptorj, zreceptorj) = (xcj, ycj, zcj + zrminimum - zrmaximum)
For the target: (xtargeti, ytargeti, ztargeti) = (xnci, ynci, znci - znminimum + [illegible])
Step 10 Use (xtargeti, ytargeti, ztargeti) and (xreceptorj, yreceptorj, zreceptorj) for affinity calculations.
Repeat Steps 2-9 for each target configuration tested.
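Steps 2 to 5 of Module 3 (random pose, centering, target collision surface) can be sketched as follows. One deliberate substitution: adding the same Δφ to every atom's polar angle, read literally, does not rotate a molecule rigidly, so this sketch applies a rigid rotation about z by a random angle followed by one about y. `Int` is taken as `math.floor`. All function names and the `shift` range are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Atom = Tuple[float, float, float, float]  # (x, y, z, radius)


def random_pose(atoms: List[Atom], rng: random.Random, shift: float = 1.0) -> List[Atom]:
    """Steps 2-3 (sketch): random rigid rotation, then an x/y translation."""
    a = rng.uniform(0.0, 2.0 * math.pi)   # rotation about the z axis
    b = rng.uniform(0.0, math.pi)         # rotation about the y axis
    kx, ky = rng.uniform(-shift, shift), rng.uniform(-shift, shift)
    posed = []
    for x, y, z, r in atoms:
        x1, y1 = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
        x2, z2 = x1 * math.cos(b) + z * math.sin(b), -x1 * math.sin(b) + z * math.cos(b)
        posed.append((x2 + kx, y1 + ky, z2, r))
    return posed


def center_on_origin(atoms: List[Atom]) -> List[Atom]:
    """Step 4: subtract the bounding-box midpoint (max + min) / 2 on each axis."""
    xs, ys, zs = ([a[k] for a in atoms] for k in range(3))
    cx, cy, cz = ((max(v) + min(v)) / 2 for v in (xs, ys, zs))
    return [(x - cx, y - cy, z - cz, r) for x, y, z, r in atoms]


def target_surface(atoms: List[Atom]) -> Dict[Tuple[int, int], float]:
    """Step 5: lower collision surface g(xg, yg) on a unit grid (centered atoms)."""
    xs, ys = [a[0] for a in atoms], [a[1] for a in atoms]
    g = {(gx, gy): 0.0
         for gx in range(math.floor(min(xs)), math.floor(max(xs)) + 1)
         for gy in range(math.floor(min(ys)), math.floor(max(ys)) + 1)}
    for x, y, z, r in atoms:
        for gx, gy in list(g):
            d2 = (x - gx) ** 2 + (y - gy) ** 2
            if d2 < r * r:
                g[(gx, gy)] = min(g[(gx, gy)], z - r)
            elif d2 < (r + 0.5) ** 2:
                g[(gx, gy)] = min(g[(gx, gy)], z - r / 2)
            # otherwise min(g, 0) leaves g unchanged, since g starts at 0
    return g


print(target_surface(center_on_origin([(2.0, 3.0, 4.0, 1.0)])))
```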

MODULE 4: AFFINITY CALCULATION
Step 1 Input collision coordinates of target and receptor, (xtargeti, ytargeti, ztargeti) and (xreceptorj, yreceptorj, zreceptorj), where i = number of atoms in target and j = number of subunits in receptor.
Step 2 Input dipole moment values for target, dip(i). Input charge values for receptor, charge(j).
Step 3 Input threshold value for proximity calculation: THRESHOLD.
Step 4 Calculate dipole affinity value.
Step 4a For each charged subunit (charge(j) ≠ 0) calculate
$e(i,j) = \dfrac{dip(i)}{\left[(x_{target,i} - x_{receptor,j})^2 + (y_{target,i} - y_{receptor,j})^2 + (z_{target,i} - z_{receptor,j})^2\right]^{1.5}}$
Step 4b Calculate the sum of e(i,j) for all combinations of i and j with charge(j) ≠ 0:
$\mathrm{DIPOLE} = \sum e(i,j)$
Step 5 Calculate proximity value (this step could be replaced by a calculation based on polarizability).
Step 5a For each target atom with |dip(i)| ≤ 0.75 calculate
$l(i,j) = \left[(x_{target,i} - x_{receptor,j})^2 + (y_{target,i} - y_{receptor,j})^2 + (z_{target,i} - z_{receptor,j})^2\right]^{0.5}$
If l(i,j) < THRESHOLD then prox(i,j) = 1.
Step 5b Calculate the sum of prox(i,j) for all combinations of i and j with |dip(i)| ≤ 0.75:
$\mathrm{PROXIMITY} = \sum prox(i,j)$
Step 6 Calculate the affinity value for the target-substrate combination, AFFINITY (j = number of subunits in receptor):
$\mathrm{AFFINITY} = \dfrac{\mathrm{PROXIMITY}}{j}\left(\dfrac{\mathrm{PROXIMITY}}{10000} + \mathrm{DIPOLE}\right)$

MODULE 5: GOODNESS OF FIT CALCULATION
Step 1 Input known target efficacy or affinity values (Yk), k = number of targets tested.
Step 2 Input collision coordinates of targets and receptor, (xtargetik, ytargetik, ztargetik) and (xreceptorj, yreceptorj, zreceptorj), where ik = number of atoms in target k and j = number of subunits in receptor.
Step 3 Input number of target orientations to be tested (= m).
Step 4 Use Module 4 to obtain affinity values for each target and target orientation (= AFFINITYk,m).
Step 5 Determine maximum affinity (MAk) and sum affinity (SAk) values for each target.
Step 6 Calculate correlation coefficients $r_{MA}^2$ for maximum affinity (MAk) vs known target efficacy or affinity values (Yk) and $r_{SA}^2$ for sum affinity (SAk) vs known target efficacy or affinity values (Yk).
Step 7 Calculate fitness coefficient F:
$F = (r_{MA}^2 \times r_{SA}^2)^{0.5}$
Alternate Step 6' Calculate correlation coefficients $r_{MA}^2$ for maximum affinity (MAk) vs known target efficacy or affinity values (Yk) and $r_{SA-MA}^2$ for sum affinity minus maximum affinity (SAk - MAk) vs known target efficacy or affinity values (Yk).
Alternate Step 7' Calculate fitness coefficient F:
$F = (r_{MA}^2 \times (1 - r_{SA-MA}^2))^{0.5}$
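Modules 4 and 5 together turn a set of collision configurations into a single fitness number. The sketch below follows the formulas as written: the dipole term sums dip(i) divided by the cubed distance over charged subunits, and the proximity count runs over all subunits within THRESHOLD of weakly polar atoms (|dip| ≤ 0.75). Function names are hypothetical, `r_squared` is an ordinary squared Pearson coefficient (undefined for constant series), and coincident points are assumed not to occur.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def affinity(target_xyz: Sequence[Point], dip: Sequence[float],
             receptor_xyz: Sequence[Point], charge: Sequence[int],
             threshold: float) -> float:
    """Module 4: AFFINITY = (PROXIMITY / j) * (PROXIMITY / 10000 + DIPOLE)."""
    dipole, proximity = 0.0, 0
    for p_t, d in zip(target_xyz, dip):
        for p_r, q in zip(receptor_xyz, charge):
            dist = math.dist(p_t, p_r)
            if q != 0:
                dipole += d / dist ** 3      # dip(i) / (squared distance) ** 1.5
            if abs(d) <= 0.75 and dist < threshold:
                proximity += 1               # prox(i, j) = 1
    return (proximity / len(receptor_xyz)) * (proximity / 10000 + dipole)


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def fitness_coefficient(affinities: Sequence[Sequence[float]],
                        efficacy: Sequence[float]) -> float:
    """Module 5, Steps 5-7: affinities[k][m] is AFFINITY of target k, orientation m."""
    ma = [max(a) for a in affinities]
    sa = [sum(a) for a in affinities]
    return math.sqrt(r_squared(ma, efficacy) * r_squared(sa, efficacy))


def fitness_coefficient_alt(affinities: Sequence[Sequence[float]],
                            efficacy: Sequence[float]) -> float:
    """Alternate Steps 6'-7': penalize activity carried by SA - MA."""
    ma = [max(a) for a in affinities]
    rest = [sum(a) - max(a) for a in affinities]
    return math.sqrt(max(0.0, r_squared(ma, efficacy) * (1 - r_squared(rest, efficacy))))


print(affinity([(0, 0, 1), (0, 0, 2)], [0.35, -2.7], [(0, 0, 0), (1, 0, 0)], [0, 1], 1.5))
```

The alternate coefficient rewards receptors whose best single orientation tracks activity while the remaining orientations do not, which favours orientation-specific binding.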

MODULE 6: GENERATING A GENOTYPE WITH MINIMAL LEVEL OF AFFINITY
Step 1 Set minimal fitness threshold.
Step 2 Generate random genotype (Module 1).
Step 3 Translate genotype to construct phenotype (Module 2).
Step 4 Test affinity of phenotype for targets (Modules 3, 4, 5, 6).
Step 5 If the fitness of the phenotype exceeds the fitness threshold, then discontinue code generation and pass the code to Phase 2. Otherwise repeat Steps 1-5.
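Module 6 is a generate-and-test loop. In this sketch the generator and fitness function are passed in, so the toy versions used below (random four-character codes, fitness = fraction of '3' characters) are purely hypothetical stand-ins for Modules 1 to 5. The `max_tries` cap is an added safeguard, not part of the module.

```python
import random
from typing import Callable, Optional


def search_until_fit(generate: Callable[[random.Random], str],
                     fitness: Callable[[str], float],
                     threshold: float, rng: random.Random,
                     max_tries: int = 100_000) -> Optional[str]:
    """Draw random genotypes until one exceeds the fitness threshold."""
    for _ in range(max_tries):
        code = generate(rng)                 # Steps 2-3: new genotype/phenotype
        if fitness(code) > threshold:        # Steps 4-5: test against threshold
            return code                      # pass this code on to Phase 2
    return None                              # safeguard: give up after max_tries


found = search_until_fit(lambda r: "".join(r.choices("0123456", k=4)),
                         lambda c: c.count("3") / 4, 0.7, random.Random(0))
print(found)
```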

MODULE 7: MULTIPLE POINT MUTATION
Step 1 Input primary code (from Phase 1).
Step 2 Set number (= q) of mutations per code (current implementation mutates 2.5-5% of characters in genotype).
Step 3 Input population size (= p).
Step 4 Select a position in the genotype at random.
Step 5 Replace the code character at that position with a different character chosen at random.
Step 6 Repeat Steps 4 and 5 q times.
Step 7 Repeat Steps 4-6 to generate a total of p new codes.
Step 8 Apply Modules 1-6 to test fitness of mutant population. Select subpopulation with highest selectivity for use in Phase 3.
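Steps 4 to 7 of Module 7 can be sketched as below; fitness testing (Step 8) is left to the caller. The function name is hypothetical. Because a position may be hit twice, a mutant with q > 1 can occasionally revert a character, which the module text does not rule out.

```python
import random
from typing import List


def point_mutants(parent: str, alphabet: str, q: int, p: int,
                  rng: random.Random) -> List[str]:
    """Return p mutants of `parent`, each carrying q random point substitutions."""
    mutants = []
    for _ in range(p):                                   # Step 7: p new codes
        chars = list(parent)
        for _ in range(q):                               # Step 6: q mutations
            pos = rng.randrange(len(chars))              # Step 4: random position
            chars[pos] = rng.choice([c for c in alphabet if c != chars[pos]])  # Step 5
        mutants.append("".join(chars))
    return mutants


parent = "3230303132140023211440100002430131"
q = max(1, round(0.025 * len(parent)))   # about 2.5% of characters
print(point_mutants(parent, "0123456", q, 3, random.Random(0)))
```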

MODULE 8: RECOMBINATION
Step 1 Set population size (= P).
Step 2 Select two codes at random from the population generated by Phase 2.
Step 3 Select a position in the genotype at random.
Step 4 Generate a random number for the number of characters to exchange.
Step 5 Swap characters between codes beginning at the selected position.
Step 6 Repeat Steps 2-5 until P new genotypes have been generated.
Step 7 Apply Modules 2-6 to test fitness of mutant population. Select subpopulation with highest selectivity for next recombination series or for Phase 4 maturation.
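The block exchange in Module 8 is a two-point crossover: a run of characters starting at a random position is swapped between two parents. A minimal sketch with hypothetical names, leaving fitness testing (Step 7) to the caller:

```python
import random
from typing import List, Tuple


def recombine(a: str, b: str, rng: random.Random) -> Tuple[str, str]:
    """Swap a random-length block of characters between two codes (Steps 3-5)."""
    n = min(len(a), len(b))
    start = rng.randrange(n)                    # Step 3: random position
    end = start + rng.randint(1, n - start)     # Step 4: characters to exchange
    return (a[:start] + b[start:end] + a[end:],
            b[:start] + a[start:end] + b[end:])


def recombine_population(population: List[str], size: int,
                         rng: random.Random) -> List[str]:
    """Steps 2-6: pair random parents until `size` new genotypes exist."""
    offspring: List[str] = []
    while len(offspring) < size:
        a, b = rng.sample(population, 2)
        offspring.extend(recombine(a, b, rng))
    return offspring[:size]


print(recombine("0000000", "1111111", random.Random(4)))
```

Each position in the two children still holds one character from each parent, so recombination reshuffles existing variation without introducing new characters.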

MODULE 9: MATURATION
Step 1 Input parental code derived from Phase 3.
Step 2 Set number of iterations.
Step 3 Select a position in the parental genotype at random.
Step 4 Replace the code character at that position with a different character chosen at random.
Step 5 Test selectivity of parental code (FP) and mutation product (FM) using Modules 2-6.
Step 6 If FM ≥ FP, replace parental genotype with mutation product.
Step 7 Repeat Steps 3-6 for required number of iterations.
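Module 9 is a stochastic hill climb that also accepts ties (FM ≥ FP), which lets the code drift across fitness plateaus. In the sketch, `fitness` is a caller-supplied stand-in for Modules 2 to 6; the toy fitness below (number of '6' characters) is hypothetical.

```python
import random
from typing import Callable


def mature(parent: str, alphabet: str, fitness: Callable[[str], float],
           iterations: int, rng: random.Random) -> str:
    """Keep single-point mutants whose fitness is at least the parent's."""
    best, best_f = parent, fitness(parent)
    for _ in range(iterations):
        pos = rng.randrange(len(best))                                # Step 3
        new_char = rng.choice([c for c in alphabet if c != best[pos]])  # Step 4
        mutant = best[:pos] + new_char + best[pos + 1:]
        f_m = fitness(mutant)                                         # Step 5
        if f_m >= best_f:                                             # Step 6
            best, best_f = mutant, f_m
    return best


print(mature("000000", "0123456", lambda c: c.count("6"), 2000, random.Random(3)))
```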

MODULE 10: CREATION OF CODE GENERATING MOLECULAR SKELETONS WITH RINGS (6-member rings, entry point = exit point)
Step 1. Set length of code.
Set v1, v2, v3, ..., vn (frequencies of substituent groups).
Set prob_ring (frequency of ring code sequence) (0 < prob_ring < 1).
Step 2. Initialize prime_code = "".
Initialize second_code = "".
Initialize third_code = "".

Step 3. Create character strings.
Repeat Step 4 until code length is obtained.

Step 4a. If prob_ring > random (0 < random < 1) then:
Assignment of characters for ring (boat convention).
Set new_character_1 to a randomly selected member of {'431413', '314134', '141343', '132132', '321321', '213213', '123123', '231231', '312312', '421412', '214124', '141242', '324234', '242343', '423432'}.

Assignment of characters for substituents.
Set new_character_2 to six randomly selected members of {c1, c2, c3, ..., cn} using frequencies v1, v2, v3, ..., vn (c1 ... cn are characters specifying different functional groups).
Assignment of characters for substituent valences.
Set new_character_3 to six randomly selected members of {'1', '2', '3', '4'}.
Else Step 4b. Assignment of single (non-ring) characters for primary code.
Set new_character_1 to a randomly selected member of {'1', '2', '3', '4'}.
Assignment of characters for substituents.
Set new_character_2 to a randomly selected member of {c1, c2, ..., cn} using frequencies v1, v2, ..., vn.
Assignment of characters for substituent valences.
Set new_character_3 to a randomly selected member of {'1', '2', '3', '4'}.
Step 4c. Concatenate new characters to code strings:
prime_code = prime_code & new_character_1
second_code = second_code & new_character_2
third_code = third_code & new_character_3

MODULE 11: MULTIPLE POINT MUTATION
Step 1 Input primary code.
Step 2 Set number (= q) of mutations per code (current implementation mutates 2.5-5% of characters in genotype).
Step 3 Input population size (= p).
Step 4 Select a position in the genotype at random.
Step 5 Replace the code characters at that position in each of the code vectors with different characters chosen at random.
Step 6 Repeat Steps 4 and 5 q times.
Step 7 Repeat Steps 4-6 to generate a total of p new codes.
Step 8 Test the fitness of each member of the mutant population. Select the subpopulation with highest fitness for use in recombination or additional multiple mutation.
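Modules 10 and 11 work on three parallel code vectors (skeleton, substituent, valence). The sketch below generates such a triple and applies one multiple-point mutation round. The ring sequences are those listed in Module 10; the substituent alphabet "HOCN" and its weights are hypothetical stand-ins for c1 ... cn and v1 ... vn. Since a ring contributes six characters at once, the generated code may overshoot the requested length by up to five characters; that handling of "until code length is obtained" is an assumption.

```python
import random
from typing import Sequence, Tuple

RING_CODES = ["431413", "314134", "141343", "132132", "321321", "213213",
              "123123", "231231", "312312", "421412", "214124", "141242",
              "324234", "242343", "423432"]
SKELETON = "1234"
VALENCE = "1234"


def generate_ligand_code(length: int, substituents: str, weights: Sequence[float],
                         prob_ring: float, rng: random.Random) -> Tuple[str, str, str]:
    """Module 10 sketch: build prime, second and third code vectors in step."""
    prime, second, third = [], [], []
    while len(prime) < length:
        if rng.random() < prob_ring:                  # Step 4a: six-member ring
            prime.extend(rng.choice(RING_CODES))
            second.extend(rng.choices(substituents, weights=weights, k=6))
            third.extend(rng.choices(VALENCE, k=6))
        else:                                         # Step 4b: single position
            prime.append(rng.choice(SKELETON))
            second.extend(rng.choices(substituents, weights=weights, k=1))
            third.append(rng.choice(VALENCE))
    return "".join(prime), "".join(second), "".join(third)


def mutate_triplet_code(code: Tuple[str, str, str], alphabets: Sequence[str],
                        q: int, rng: random.Random) -> Tuple[str, ...]:
    """Module 11 sketch: mutate the same position in every vector, q times."""
    vectors = [list(v) for v in code]
    for _ in range(q):
        pos = rng.randrange(len(vectors[0]))
        for vec, alphabet in zip(vectors, alphabets):
            vec[pos] = rng.choice([c for c in alphabet if c != vec[pos]])
    return tuple("".join(v) for v in vectors)


code = generate_ligand_code(12, "HOCN", [4, 1, 1, 1], 0.3, random.Random(5))
print(code)
print(mutate_triplet_code(code, (SKELETON, "HOCN", VALENCE), 1, random.Random(6)))
```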

MODULE 12: SEQUENCE MUTATIONS
Step 1 Set P_DEL, P_DUP, P_INV and P_INS as threshold levels for the occurrence of mutations (0 < Px < 1).
Step 2 Generate a random position (= x) in the code (0 ≤ x < length of code).
Step 3 Generate a random sequence length (= L) (0 < L ≤ length of code - x).
Step 4 Copy the sequence from the code starting at x and extending for a total of L characters.
Step 5 If 0 < P_INV < random number < 1 then reverse the order of the characters in the sequence.
Step 6 If 0 < P_DUP < random number < 1 then copy the sequence and concatenate the copy to the sequence.
Step 7 If 0 < P_DEL < random number < 1 then eliminate L characters from the code starting at position x; else replace the sequence in the code with the sequence generated in Steps 5 and 6.
Step 8 If 0 < P_INS < random number < 1 then generate a position (= y) at random in the code (0 ≤ y < length of code) and insert the sequence generated by Steps 5 and 6 at position y.
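The sequence mutations of Module 12 can be sketched on a single code string. As in the module text, each event fires when a fresh random number exceeds its threshold, so an event's probability is 1 - Px. For a three-vector ligand code the same segment operation would be applied to every vector; that extension is not shown, and all names are hypothetical.

```python
import random


def sequence_mutation(code: str, p_inv: float, p_dup: float, p_del: float,
                      p_ins: float, rng: random.Random) -> str:
    """Apply inversion, duplication, deletion and insertion events to one string."""
    x = rng.randrange(len(code))                 # Step 2: 0 <= x < len(code)
    length = rng.randint(1, len(code) - x)       # Step 3: 0 < L <= len(code) - x
    seq = code[x:x + length]                     # Step 4: copy the segment
    if rng.random() > p_inv:                     # Step 5: inversion
        seq = seq[::-1]
    if rng.random() > p_dup:                     # Step 6: duplication
        seq = seq + seq
    if rng.random() > p_del:                     # Step 7: deletion ...
        code = code[:x] + code[x + length:]
    else:                                        # ... or write the segment back
        code = code[:x] + seq + code[x + length:]
    if rng.random() > p_ins:                     # Step 8: insertion at random y
        y = rng.randrange(len(code)) if code else 0
        code = code[:y] + seq + code[y:]
    return code


print(sequence_mutation("0123456789", 0.5, 0.5, 0.5, 0.5, random.Random(7)))
```

A deletion followed by an insertion moves the segment elsewhere in the code, so the four events together also cover translocation.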

MODULE 13: RECOMBINATION
Step 1 Set population size (= P).
Step 2 Select two codes at random from the population generated by multiple mutation.
Step 3 Select a position in the genotype at random.
Step 4 Generate a random number for the number of characters to exchange.
Step 5 In each of the three code vectors, swap characters between the two codes beginning at the selected position.
Step 6 Repeat Steps 2-5 until P new genotypes have been generated.
Step 7 Test the fitness of each ligand in the resulting mutant population. Select the subpopulation with highest fitness for the next recombination series or for maturation.

MODULE 14: MATURATION
Step 1 Input parental code derived from recombination.
Step 2 Set number of iterations.
Step 3 Select a position in the parental genotype at random.
Step 4 Replace the code characters at that position in each of the code vectors with different characters chosen at random.
Step 5 Test fitness of parental code (FP) and mutation product (FM) using Modules 4 and 5.
Step 6 If FM ≥ FP, replace parental genotype with mutation product.
Step 7 Repeat Steps 3-6 for required number of iterations.

Claims

WHAT IS CLAIMED IS:

1. A method running on a computer for designing chemical structures having a preselected functional characteristic, comprising the steps of:
(a) producing a physical model of a simulated receptor phenotype encoded in a linear character sequence, and providing a set of target molecules sharing at least one quantifiable functional characteristic;
(b) for each target molecule:
(i) calculating an affinity between the receptor and the target molecule in each of a plurality of orientations using an effective affinity calculation;
(ii) calculating a sum affinity by summing the calculated affinities;
(iii) identifying a maximal affinity;
(c) using the calculated sum and maximal affinities to:
(i) calculate a maximal affinity correlation coefficient between the maximal affinities and the quantifiable functional characteristic;
(ii) calculate a sum affinity correlation coefficient between the sum affinities and the quantifiable functional characteristic;
(d) using the maximal correlation coefficient and sum correlation coefficient to calculate a fitness coefficient;
(e) altering the structure of the receptor and repeating steps (b) through (d) until a population of receptors having a preselected fitness coefficient is obtained;
(f) providing a physical model of a chemical structure encoded in a molecular linear character sequence, calculating an affinity between the chemical structure and each receptor in a plurality of orientations using said effective affinity calculation, using the calculated affinities to calculate an affinity fitness score;
(g) altering the chemical structure to produce a variant of the chemical structure and repeating step (f);
and (h) retaining and further altering those variants of the chemical structure whose affinity score approaches a preselected affinity score.

2. The method according to claim 1 wherein the linear character sequence encoding for said receptor phenotype is produced by generating a receptor linear character sequence which codes for spatial occupancy and charge, and wherein the step of producing a physical model of a chemical structure comprises generating said molecular linear character sequence which codes for spatial occupancy and charge.

3. The method according to claim 2 wherein said effective affinity calculation comprises two measures, the first being a proximity measure wherein the proportion of uncharged portions on said simulated receptors being sufficiently close to non-polar regions on said molecular structure to generate effective London dispersion forces is estimated, and the second being the summed strengths of charge-dipole electrostatic force interactions generated between charged portions of said simulated receptor and dipoles present in said molecular structure.

4. The method according to claim 2 wherein said step of calculating the affinity fitness score includes calculating a sum and maximal affinity between the molecular structure and each receptor, the fitness score being calculated as:

$\sum \left\{ \dfrac{|\text{calculated maximal affinity} - \text{target maximal affinity}|}{\text{target maximal affinity}} \right\}$, and wherein said preselected fitness score is substantially zero.

5. The method according to claim 2 wherein said step of calculating the affinity fitness score includes calculating a sum and maximal affinity between the molecular structure and each receptor, the fitness score being calculated as:
$\sum \left\{ \dfrac{|\text{calculated maximal affinity} - \text{target maximal affinity}|}{2 \times \text{target maximal affinity}} + \dfrac{|\text{calculated sum affinity} - \text{target sum affinity}|}{2 \times \text{target sum affinity}} \right\}$, and wherein said preselected fitness score is substantially zero.

6. The method according to claim 2 wherein said sum affinity correlation coefficient is $r_{SA}^2$, said maximal affinity correlation coefficient is $r_{MA}^2$, and wherein said fitness coefficient is $F = (r_{MA}^2 \times r_{SA}^2)^{0.5}$, and wherein said preselected fitness coefficient is substantially unity.

7. The method according to claim 2 wherein said sum affinity correlation coefficient is $r_{SA-MA}^2$, said maximal affinity correlation coefficient is $r_{MA}^2$, and wherein said fitness coefficient is $F = (r_{MA}^2 \times (1 - r_{SA-MA}^2))^{0.5}$, and wherein said preselected fitness coefficient is substantially unity.

8. The method according to claim 2 wherein said molecular linear character sequences comprise a plurality of sequential character triplets, a first character of said triplet being randomly selected from a first character set specifying position and identity of an occupying atom in a molecular skeleton of said molecular structure, a second character of said triplet being randomly selected from a second character set specifying the identity of a substituent group attached to said occupying atom, and a third character of said triplet being randomly selected from a third character set specifying the location of said substituent on the atom specified by said first character of the triplet.

9. The method according to claim 8 wherein the molecular linear character sequence is decoded using an effective molecular assembly algorithm which sequentially translates each triplet from said molecular linear sequence and thereafter fills unfilled positions on said molecular skeleton with hydrogen atoms.

10. The method according to claim 9 wherein the step of altering said molecular structure includes at least one of the following steps: i) mutating said molecular genotype by randomly interchanging at least one of said first, second and third characters of at least one triplet from the associated character sets, ii) deletion wherein a triplet from molecular genotype is deleted, iii) duplication wherein a triplet in the molecular genotype is duplicated, iv) inversion wherein the sequential order of one or more triplets in the molecular genotype is reversed, and v) insertion wherein a triplet from the molecular genotype is inserted at a different position in the molecular genotype.

11. The method according to claim 10 wherein the step of mutating said molecular genotypes includes recombining randomly selected pairs of said retained mutated molecular genotypes whereby corresponding characters in said molecular linear sequences are interchanged.

12. The method according to claim 2 wherein each character in the receptor linear character sequence specifies either a spatial turning instruction or a charged site with no turn.

13. The method according to claim 12 wherein said receptor phenotype comprises at least one linear polymer provided with a plurality of subunits, one of said subunits being a first subunit in said at least one linear polymer.

14. The method according to claim 13 wherein said receptor linear character sequence is decoded using an effective receptor assembly algorithm in which turning instructions applied to each subunit subsequent to said first subunit are made relative to an initial position of said first subunit.

15. The method according to claim 14 wherein said characters specifying spatial turning instructions code for no turn, right turn, left turn, up turn and down turn, and wherein characters specifying charge sites code for a positively charged site with no turn and a negatively charged site with no turn.

16. The method according to claim 14 wherein said subunits are substantially spherical, having a van der Waals radius substantially equal to the van der Waals radius of hydrogen.

17. The method according to claim 15 wherein the step of altering said receptor genotype includes at least one of the following steps: i) deletion wherein a character from the receptor genotype is deleted, ii) duplication wherein a character in the receptor genotype is duplicated, iii) inversion wherein the sequential order of one or more characters in the receptor genotype is reversed, and iv) insertion wherein a character from the receptor genotype is inserted at a different position in the genotype.

18. The method according to claim 17 wherein the step of mutating said receptor genotypes includes recombining randomly selected pairs of said retained mutated receptor genotypes whereby corresponding characters in said receptor linear sequences are interchanged.

19. A method running on a computer for screening chemical structures for preselected functional characteristics, comprising:
a) producing a simulated receptor genotype by generating a receptor linear character sequence which codes for spatial occupancy and charge;
b) decoding the genotype to produce a receptor phenotype, providing at least one target molecule exhibiting a selected functional characteristic, calculating an affinity between the receptor and each target molecule in a plurality of orientations using an effective affinity calculation, calculating a sum and maximal affinity between each target molecule and receptor, calculating a sum affinity correlation coefficient for sum affinity versus said functional characteristic of the target molecule and a maximal affinity correlation coefficient for maximal affinity versus said functional characteristic, and calculating a fitness coefficient dependent on said sum and maximal affinity correlation coefficients;
c) mutating the receptor genotype and repeating step b) and retaining and mutating those receptors exhibiting increased fitness coefficients until a population of receptors with preselected fitness coefficients is obtained; thereafter d) calculating an affinity between a chemical structure being screened and each receptor in a plurality of orientations using said effective affinity calculation, calculating an affinity fitness score which includes calculating a sum and maximal affinity between the compound and each receptor and comparing at least one of said sum and maximal affinity to the sum and maximal affinities between said at least one target and said population of receptors whereby said comparison is indicative of the level of functional activity of said chemical structure relative to said at least one target molecule.

20. The method according to claim 19 wherein said effective affinity calculation comprises two measures, the first being a proximity measure wherein a proportion of uncharged portions on said simulated receptors being sufficiently close to non-polar regions on said molecular structure to generate effective London dispersion forces is estimated, and the second being the summed strengths of charge-dipole electrostatic force interactions generated between charged portions of said simulated receptor and dipoles present in said molecular structure.
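Claim 20 describes the effective affinity as two separate measures: a dispersion proximity fraction and a summed charge-dipole term. The sketch below computes each measure on its own. It assumes that receptor subunits and molecular sites are points in 3-D space and that the dispersion cutoff distance is a hypothetical parameter. It also assumes each charge-dipole term follows the standard point-charge/point-dipole energy, proportional to q (p · r̂) / r², in arbitrary units with the constants dropped. The claim does not say how the two measures are weighted or combined, so the sketch returns them separately.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a):
    return math.sqrt(_dot(a, a))


def proximity_measure(uncharged_subunits, nonpolar_atoms, cutoff=4.0):
    """Fraction of uncharged receptor subunits within `cutoff` (hypothetical) of any non-polar atom."""
    if not uncharged_subunits:
        return 0.0
    close = sum(
        1 for s in uncharged_subunits
        if any(_norm(_sub(s, a)) <= cutoff for a in nonpolar_atoms)
    )
    return close / len(uncharged_subunits)


def electrostatic_measure(charged_sites, dipoles):
    """Summed charge-dipole interaction strength; positive values mean net attraction.

    charged_sites: [(position, charge)]; dipoles: [(position, dipole_vector)].
    """
    total = 0.0
    for q_pos, q in charged_sites:
        for d_pos, p in dipoles:
            r_vec = _sub(q_pos, d_pos)  # vector from dipole to charge
            r = _norm(r_vec)
            if r == 0.0:
                raise ValueError("charge and dipole coincide")
            energy = q * _dot(p, r_vec) / r ** 3  # U proportional to q (p . r_hat) / r^2
            total += -energy  # sign flipped so attraction contributes positively
    return total
```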

21. The method according to claim 20 wherein the fitness score is calculated as $\sum \left\{ \frac{\left|\text{calculated maximal affinity} - \text{target maximal affinity}\right|}{\text{target maximal affinity}} \right\}$.

22. The method according to claim 20 wherein the fitness score is calculated as:
$$\sum \left\{ \frac{\left|\text{calculated maximal affinity} - \text{target maximal affinity}\right|}{2 \times \text{target maximal affinity}} + \frac{\left|\text{calculated sum affinity} - \text{target sum affinity}\right|}{2 \times \text{target sum affinity}} \right\}.$$
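The fitness scores in claims 21 and 22 are sums of relative deviations between a chemical structure's affinities and the corresponding target affinities. A score of zero therefore means an exact match. The sketch below is minimal and makes two assumptions: the summation runs over the receptors in the population, and calculated and target affinities are paired by receptor index. The function names are hypothetical.

```python
def score_maximal(calc_max, target_max):
    """Claim 21: sum of |calculated - target| / target over paired maximal affinities."""
    if len(calc_max) != len(target_max):
        raise ValueError("affinity lists must be paired receptor by receptor")
    return sum(abs(c - t) / t for c, t in zip(calc_max, target_max))


def score_combined(calc_max, target_max, calc_sum, target_sum):
    """Claim 22: per receptor, the mean of the maximal- and sum-affinity relative deviations."""
    if not (len(calc_max) == len(target_max) == len(calc_sum) == len(target_sum)):
        raise ValueError("affinity lists must be paired receptor by receptor")
    return sum(
        abs(cm - tm) / (2 * tm) + abs(cs - ts) / (2 * ts)
        for cm, tm, cs, ts in zip(calc_max, target_max, calc_sum, target_sum)
    )
```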

23. The method according to claim 20 wherein said sum affinity correlation coefficient is $r_{SA}^2$, said maximal affinity correlation coefficient is $r_{MA}^2$, and wherein said fitness coefficient is $F = (r_{MA}^2 \times r_{SA}^2)^{0.5}$, and wherein said preselected fitness coefficient is substantially unity.

24. The method according to claim 20 wherein said sum affinity correlation coefficient is $r_{SA}^2$, said maximal affinity correlation coefficient is $r_{MA}^2$, and wherein said fitness coefficient is $F = (r_{MA}^2 \times (1 - r_{SA\text{-}MA}^2))^{0.5}$, and wherein said preselected fitness coefficient is substantially unity.
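Claims 23 and 24 give two alternative fitness coefficients, and claims 41 and 42 repeat the same two forms. With the product form, F reaches unity only when both squared correlations equal one. The claims label the quantity r SA-MA 2 as the sum affinity correlation coefficient but do not define it further, so the sketch simply accepts it as an input value. The tolerance used to decide "substantially unity" is a hypothetical parameter.

```python
import math


def fitness_product(r2_ma, r2_sa):
    """Claim 23: F = (r_MA^2 * r_SA^2)^0.5."""
    return math.sqrt(r2_ma * r2_sa)


def fitness_decorrelated(r2_ma, r2_sa_ma):
    """Claim 24: F = (r_MA^2 * (1 - r_SA-MA^2))^0.5."""
    return math.sqrt(r2_ma * (1.0 - r2_sa_ma))


def is_substantially_unity(f, tol=1e-3):
    """Hypothetical stopping test for the 'substantially unity' fitness coefficient."""
    return f >= 1.0 - tol
```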

25. The method according to claim 20 wherein each character in the receptor linear character sequence specifies either a spatial turning instruction or a charged site with no turn.

26. The method according to claim 25 wherein said receptor phenotype comprises at least one linear polymer provided with a plurality of subunits, one of said subunits being a first subunit in said at least one linear polymer.

27. The method according to claim 26 wherein said receptor linear character sequence is decoded using an effective receptor assembly algorithm in which turning instructions applied to each subunit subsequent to said first subunit are made relative to an initial position of said first subunit.

28. The method according to claim 27 wherein said characters specifying spatial turning instructions code for no turn, right turn, left turn, up turn, and down turn, and wherein characters specifying charged sites code for a positively charged site with no turn and a negatively charged site with no turn.
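Claims 25 to 29 describe a genotype alphabet of turning and charge characters, decoded into a chain of spherical subunits with every turn taken relative to the first subunit. The sketch below is a minimal decoder built on several assumptions. The symbols N, R, L, U, D, + and - are hypothetical, because the source names the instructions but not the characters. Right, left, up and down set the heading along fixed axes of the first subunit's frame, and the no-turn characters keep the current heading. Subunits sit on a cubic lattice whose spacing is two hydrogen van der Waals radii (Bondi value, 1.20 Å), so that neighbouring spheres just touch.

```python
# Hypothetical alphabet: the source names the instructions but not the symbols.
TURNS = {"R": (0, 1, 0), "L": (0, -1, 0), "U": (0, 0, 1), "D": (0, 0, -1)}
NO_TURN = {"N": 0, "+": 1, "-": -1}  # no-turn characters and the charge each carries

H_VDW_RADIUS = 1.20         # angstroms (Bondi); assumed subunit radius per claim 29
SPACING = 2 * H_VDW_RADIUS  # assumed: adjacent subunit spheres just touch


def decode_receptor(genotype):
    """Decode a receptor genotype into (lattice_position, charge) subunits."""
    pos = (0, 0, 0)      # the first subunit defines the origin ...
    heading = (1, 0, 0)  # ... and the fixed reference frame for every turn
    subunits = []
    for i, ch in enumerate(genotype):
        if ch in TURNS:
            charge, heading = 0, TURNS[ch]  # turn is absolute in the first subunit's frame
        elif ch in NO_TURN:
            charge = NO_TURN[ch]            # keep the current heading
        else:
            raise ValueError(f"unknown genotype character {ch!r}")
        if i > 0:
            # Each later subunit sits one lattice step along the heading from the previous one.
            pos = tuple(p + h for p, h in zip(pos, heading))
        subunits.append((pos, charge))
    # Steric clashes (two subunits on one lattice site) are not checked in this sketch.
    return subunits


def to_angstroms(subunits):
    """Scale lattice coordinates to angstroms."""
    return [(tuple(c * SPACING for c in p), q) for p, q in subunits]
```

For example, `decode_receptor("N+R-")` places an uncharged subunit at the origin, a positive site one step along +x, then turns toward +y for the last two subunits, the final one carrying a negative charge.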

29. The method according to claim 28 wherein said subunits are substantially spherical, having a van der Waals radius substantially equal to the van der Waals radius of hydrogen.

30. The method according to claim 27 wherein the step of mutating said receptor genotype includes at least one of the following steps: i) deletion wherein a character from the receptor genotype is deleted, ii) duplication wherein a character in the receptor genotype is duplicated, iii) inversion wherein the sequential order of one or more characters in the receptor genotype is reversed, and iv) insertion wherein a character from the receptor genotype is inserted at a different position in the genotype.
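Claim 30 lists four mutation operators: deletion, duplication, inversion and insertion. The sketch below implements each on a string genotype using a seeded random generator. It assumes that duplication places the copy immediately after the original character, and that insertion moves an existing character to a new position. The function names are hypothetical.

```python
import random


def delete(g, rng):
    """i) Deletion: remove one randomly chosen character."""
    i = rng.randrange(len(g))
    return g[:i] + g[i + 1:]


def duplicate(g, rng):
    """ii) Duplication: repeat one character immediately after itself."""
    i = rng.randrange(len(g))
    return g[:i + 1] + g[i] + g[i + 1:]


def invert(g, rng):
    """iii) Inversion: reverse the order of a random run of characters."""
    i, j = sorted(rng.sample(range(len(g) + 1), 2))
    return g[:i] + g[i:j][::-1] + g[j:]


def insert(g, rng):
    """iv) Insertion: move one character to a different position."""
    i = rng.randrange(len(g))
    rest = g[:i] + g[i + 1:]
    choices = [k for k in range(len(rest) + 1) if k != i]
    if not choices:  # a one-character genotype has nowhere else to go
        return g
    j = rng.choice(choices)
    return rest[:j] + g[i] + rest[j:]


def mutate(g, rng):
    """Apply one randomly chosen operator; never delete the last character."""
    ops = [duplicate, invert, insert] + ([delete] if len(g) > 1 else [])
    return rng.choice(ops)(g, rng)
```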

31. The method according to claim 30 wherein the step of mutating said receptor genotypes includes recombining randomly selected pairs of said retained mutated receptor genotypes whereby corresponding characters in said receptor linear sequences are interchanged.

32. A method running on a computer for designing simulated receptors mimicking biological receptors exhibiting selective affinity for compounds with similar functional characteristics, comprising the steps of:
a) producing a simulated receptor genotype by generating a receptor linear character sequence which codes for spatial occupancy and charge;
b) decoding the genotype to produce a receptor phenotype, providing a set of target molecules sharing similar functional characteristics, calculating an affinity between the receptor and each target molecule in a plurality of orientations using an effective affinity calculation, calculating a sum and maximal affinity between each target molecule and receptor, calculating a sum affinity correlation coefficient for sum affinity versus a functional characteristic for each target molecule and a maximal affinity correlation coefficient for maximal affinity versus said functional characteristic for each target molecule, and calculating a fitness coefficient dependent on said sum and maximal affinity correlation coefficients for each target molecule; and c) mutating the genotype and repeating step b) and retaining and mutating those receptors exhibiting increased fitness coefficients until a population of receptors with preselected fitness coefficients is obtained.
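The mutate-evaluate-retain loop in step c) of claim 32 can be written as a simple hill climb, in which each genotype is kept or replaced according to whether its mutant is fitter. The claim specifies neither the population size nor the number of mutations per generation, and it gives no selection rule beyond retaining receptors with increased fitness. The sketch therefore mutates every genotype once per generation and keeps a mutant only if it beats its parent. `evolve_receptors`, `target` and `max_generations` are hypothetical names. `fitness` and `mutate` are caller-supplied functions.

```python
import random


def evolve_receptors(population, fitness, mutate, rng, target=0.999, max_generations=1000):
    """Mutate every genotype each generation and keep a mutant only if it is fitter
    than its parent, until every receptor reaches the target fitness coefficient."""
    scored = [(fitness(g), g) for g in population]
    for _ in range(max_generations):
        if all(f >= target for f, _ in scored):
            break
        next_gen = []
        for f, g in scored:
            child = mutate(g, rng)
            f_child = fitness(child)
            # Retain the mutant only when its fitness coefficient increases.
            next_gen.append((f_child, child) if f_child > f else (f, g))
        scored = next_gen
    return scored
```

In practice `fitness` would wrap the decoding, affinity and correlation steps of step b), for example returning the F of claim 41, and `mutate` would apply operators such as those of claim 38.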

33. The method according to claim 32 wherein each character in the receptor linear character sequence specifies either a spatial turning instruction or a charged site with no turn.

34. The method according to claim 33 wherein said receptor phenotype comprises a plurality of linear polymers provided with a plurality of subunits, each linear polymer being coded for by a corresponding linear character sequence, one of said subunits being a first subunit in said at least one linear polymer.

35. The method according to claim 34 wherein said receptor linear character sequence is decoded using an effective receptor assembly algorithm in which turning instructions applied to each subunit subsequent to said first subunit are made relative to an initial position of said first subunit.

36. The method according to claim 35 wherein said characters specifying spatial turning instructions code for no turn, right turn, left turn, up turn, and down turn, and wherein characters specifying charged sites code for a positively charged site with no turn and a negatively charged site with no turn.

37. The method according to claim 36 wherein said subunits are substantially spherical, having a van der Waals radius substantially equal to the van der Waals radius of hydrogen.

38. The method according to claim 35 wherein the step of mutating said receptor genotype includes at least one of the following steps: i) deletion wherein a character from the receptor genotype is deleted, ii) duplication wherein a character in the receptor genotype is duplicated, iii) inversion wherein the sequential order of one or more characters in the receptor genotype is reversed, and iv) insertion wherein a character from the receptor genotype is inserted at a different position in the genotype.

39. The method according to claim 38 wherein the step of mutating said receptor genotypes includes recombining randomly selected pairs of said retained mutated receptor genotypes whereby corresponding characters in said receptor linear sequences are interchanged.

40. The method according to claim 33 wherein said effective affinity calculation comprises two measures, the first being a proximity measure wherein a proportion of uncharged portions on said simulated receptors being sufficiently close to non-polar regions on said molecular structure to generate effective London dispersion forces is estimated, and the second being the summed strengths of charge-dipole electrostatic force interactions generated between charged portions of said simulated receptor and dipoles present in said molecular structure.

41. The method according to claim 40 wherein said sum affinity correlation coefficient is $r_{SA}^2$, said maximal affinity correlation coefficient is $r_{MA}^2$, and wherein said fitness coefficient is $F = (r_{SA}^2 \times r_{MA}^2)^{0.5}$, and wherein said preselected fitness coefficient is substantially unity.

42. The method according to claim 40 wherein said sum affinity correlation coefficient is $r_{SA\text{-}MA}^2$, said maximal affinity correlation coefficient is $r_{MA}^2$, and wherein said fitness coefficient is $F = (r_{MA}^2 \times (1 - r_{SA\text{-}MA}^2))^{0.5}$, and wherein said preselected fitness coefficient is substantially unity.

43. A method running on a computer for designing chemical structures having a preselected functional characteristic, comprising the steps of:
(a) providing a physical model of a receptor and a set of target molecules, the target molecules sharing at least one quantifiable functional characteristic;
(b) for each target molecule:
(i) calculating an affinity between the receptor and the target molecule in each of a plurality of orientations using an effective affinity calculation;
(ii) calculating a sum affinity by summing the calculated affinities;
(iii) identifying a maximal affinity;
(c) using the calculated sum and maximal affinities to:
(i) calculate a maximal affinity correlation coefficient between the maximal affinities and the quantifiable functional characteristic;
(ii) calculate a sum affinity correlation coefficient between the sum affinities and the quantifiable functional characteristic;
(d) using the maximal correlation coefficient and sum correlation coefficient to calculate a fitness coefficient;
(e) altering the structure of the receptor and repeating steps (b) through (d) until a population of receptors having a preselected fitness coefficient is obtained;

(f) providing a physical model of a chemical structure, calculating an affinity between the chemical structure and each receptor in a plurality of orientations using said effective affinity calculation, using calculated affinities to calculate an affinity fitness score;
(g) altering the chemical structure to produce a variant of the chemical structure and repeating step (f);
and (h) retaining and further altering those variants of the chemical structure whose affinity score approaches a preselected affinity score.
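Steps (f) to (h) of claim 43 search over variants of a chemical structure until its affinity fitness score approaches a preselected value. The sketch below simplifies this to a single lineage, keeping only the best variant found so far, whereas the claim retains multiple variants. It also assumes the score is a scalar such as those of claims 21 and 22, for which zero is the natural target. `evolve_structure`, `target_score`, `tol` and `max_steps` are hypothetical names, and `affinity_score` and `mutate` are caller-supplied functions.

```python
import random


def evolve_structure(seed, affinity_score, mutate, rng, target_score=0.0, tol=1e-3, max_steps=10000):
    """Keep altering the best structure so far; accept a variant only if its
    affinity fitness score moves closer to the preselected target score."""
    best = seed
    best_gap = abs(affinity_score(seed) - target_score)
    for _ in range(max_steps):
        if best_gap <= tol:
            break
        variant = mutate(best, rng)
        gap = abs(affinity_score(variant) - target_score)
        if gap < best_gap:
            best, best_gap = variant, gap
    return best, best_gap
```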

44. The method according to claim 43 wherein the step of providing a physical model of a receptor comprises generating a receptor linear character sequence which codes for spatial occupancy and charge, and wherein the step of producing a physical model of a chemical structure comprises generating a linear character sequence which codes for spatial occupancy and charge.

45. The method according to claim 44 wherein said linear character sequence for said chemical structure comprises a plurality of sequential character triplets, a first character of said triplet being randomly selected from a first character set specifying position and identity of an occupying atom in a molecular skeleton of said chemical structure, a second character of said triplet being randomly selected from a second character set specifying the identity of a substituent group attached to said occupying atom, and a third character of said triplet being randomly selected from a third character set specifying the location of said substituent on the atom specified by said first character of the triplet.

46. The method according to claim 45 wherein the chemical structure linear character sequence is decoded using an effective molecular assembly algorithm which sequentially translates each triplet from said molecular linear sequence and thereafter fills unfilled positions on said molecular skeleton with hydrogen atoms.
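Claims 45, 46 and 49 describe a triplet code: a skeleton atom with its position, a substituent, and the substituent's location. Decoding reads the triplets in sequence and then fills every unfilled position with a preselected atom, which is hydrogen in claim 46. The sketch below uses an entirely hypothetical skeleton, a six-position ring that defaults to carbon with two substituent slots per atom, and hypothetical character sets. Valence is not checked, and a later triplet that targets the same position and slot overwrites an earlier one.

```python
# Hypothetical character sets; the source does not publish the actual alphabets.
FIRST = {**{ch: (i, "C") for i, ch in enumerate("abcdef")},   # carbon at ring position i
         **{ch: (i, "N") for i, ch in enumerate("ABCDEF")}}   # nitrogen at ring position i
SECOND = {"1": "OH", "2": "CH3", "3": "Cl"}                   # substituent identity
THIRD = {"x": 0, "y": 1}                                      # substituent slot on the atom
RING_SIZE, SLOTS_PER_ATOM = 6, 2


def decode_molecule(seq, fill="H"):
    """Translate triplets in order, then fill every empty slot with `fill`."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    atoms = ["C"] * RING_SIZE
    subs = [[None] * SLOTS_PER_ATOM for _ in range(RING_SIZE)]
    for k in range(0, len(seq), 3):
        a, s, loc = seq[k:k + 3]
        try:
            pos, element = FIRST[a]
            group, slot = SECOND[s], THIRD[loc]
        except KeyError as exc:
            raise ValueError(f"invalid triplet {seq[k:k + 3]!r}") from exc
        atoms[pos] = element
        subs[pos][slot] = group  # a later triplet overwrites an earlier one
    # Unfilled positions receive the preselected atom (hydrogen by default).
    return [(atoms[i], [x if x is not None else fill for x in subs[i]]) for i in range(RING_SIZE)]
```

For example, `decode_molecule("a1xB2y")` puts a hydroxyl on ring position 0 and converts position 1 into a nitrogen carrying a methyl group, and every remaining slot is filled with hydrogen.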

47. A method of encoding a chemical structure comprising atomic elements, the method comprising providing a linear character sequence which codes for spatial occupancy, relative atomic position, bond type and charge for each atom to define a unique three dimensional conformation of said chemical structure.

48. The method according to claim 47 wherein said linear character sequence for said chemical structure comprises a plurality of sequential character triplets, a first character of said triplet being selected from a first character set specifying position and identity of an occupying atom in a molecular skeleton of said chemical structure, a second character of said triplet being selected from a second character set specifying the identity of a substituent group attached to said occupying atom, and a third character of said triplet being selected from a third character set specifying the location of said substituent on the atom specified by said first character of the triplet.

49. The method according to claim 45 wherein the linear character sequence is decoded using an effective molecular assembly algorithm which sequentially translates each triplet from said linear character sequence and thereafter fills unfilled positions on said molecular skeleton with preselected atoms.

50. The method according to claim 49 including the step of storing said linear character sequence in a storage means accessible by a computer.

51. The method according to claim 19 wherein said functional characteristic is biological toxicity.

52. The method according to claim 19 wherein said functional characteristic is catalytic activity.