WO1999012118A1 - Compound screening system - Google Patents

Compound screening system

Info

Publication number
WO1999012118A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual
compounds
molecular
receptor
representation
Prior art date
Application number
PCT/AU1998/000715
Other languages
English (en)
Inventor
David Alan Winkler
Frank Robert Burden
Original Assignee
Commonwealth Scientific And Industrial Research Organisation
Monash University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AUPO8921A (AUPO892197A0)
Priority claimed from AUPP1192A (AUPP119297A0)
Application filed by Commonwealth Scientific And Industrial Research Organisation, Monash University
Priority to AU89644/98A (AU8964498A)
Priority to EP98941143A (EP1010094A4)
Publication of WO1999012118A1

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B01 PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J19/00 Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J19/0046 Sequential or parallel reactions, e.g. for the synthesis of polypeptides or polynucleotides; Apparatus and devices for combinatorial chemistry or for making molecular arrays
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G16C20/64 Screening of libraries
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B01 PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00 Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274 Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/00583 Features relative to the processes being carried out
    • B01J2219/00601 High-pressure processes
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B01 PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00 Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274 Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/0068 Means for controlling the apparatus of the process
    • B01J2219/007 Simulation or virtual synthesis
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B01 PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00 Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274 Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/0068 Means for controlling the apparatus of the process
    • B01J2219/00702 Processes involving means for analysing and characterising the products
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B01 PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00 Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274 Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/00718 Type of compounds synthesised
    • B01J2219/0072 Organic compounds

Definitions

  • a first aspect of invention relates to the virtual screening of molecular representations, and in particular the invention is directed to the ability to evaluate the theoretical activity of molecules in various fields, such as, but not limited to, chemistry, agriculture (e.g. crop protection chemicals, growth modifiers), pharmacology (e.g. human and veterinary pharmaceuticals, toxicological profiles, diagnostic reagents) and the physical, physicochemical, and in particular biological activity of chemical compounds in general.
  • a further aspect of invention relates to refining the screening process in order to accentuate evaluation of likely active structures.
  • Still a further aspect relates to a method of mutating structures for evaluation by the screening system.
  • Still a further aspect relates to a fitness function which is used to assist in the evaluation of likely active structures.
  • Other aspects are also disclosed.
  • BACKGROUND OF THE INVENTION The determination of the biological activity of chemical compounds is a continuing endeavour of research institutions and chemical companies, particularly due to its implications in the development of new drugs and other therapeutic remedies to treat or cure specific diseases.
  • Biological activity of a compound is generally accepted as being the consequence of the fit of chemical compound into a receptor site involved in the particular biological process in a manner that the process is altered in some desirable way, e.g. either accentuated or inhibited.
  • A lead compound is a substance which exhibits a useful biological activity.
  • Lead compounds are often obtained from natural sources or by synthesis of new chemical structures.
  • SAR Structure Activity Relationship
  • Quantitative Structure-Activity Relationships One way of determining the theoretically most highly active compounds is to use one of the various regression techniques to map molecular structure to activity, where the physicochemical properties are used to represent structure.
  • This QSAR mapping allows the values of the optimum physicochemical properties of the data set, and thus the structure of the most active compounds, to be determined.
  • an analytical technique is used, such as multiple linear regression (MLR).
  • MLR multiple linear regression
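As an illustration of the regression-based QSAR mapping described above, the following is a minimal sketch of multiple linear regression fitted by the normal equations. The function names `fit_mlr` and `predict`, the two descriptors and the activity values are hypothetical and are not taken from the patent; the data are synthetic and constructed so that the exact coefficients are known.

```python
def fit_mlr(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bp*xp via the normal equations."""
    rows = [[1.0] + list(r) for r in X]            # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(coef, x):
    """Predicted activity for one descriptor vector x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

# Synthetic example: two hypothetical physicochemical descriptors per compound
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [1.0 + 2.0 * a - 0.5 * b for a, b in X]        # activity built from known coefficients
coef = fit_mlr(X, y)                               # intercept, then one slope per descriptor
```

Because the model is linear in the descriptors, it cannot capture interactions between them, which is the limitation the neural network approach described later is intended to address.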
  • PCT/CA96/00166, PCT/IB94/00257 and US 5,699,268 disclose inventions related to drug-receptor interactions.
  • the embodiment of these simulated receptors is in a three dimensional, molecular level form. Therefore certain properties of the molecule as a whole are difficult, if at all possible, to ascertain.
  • PCT/IB94/00257 discloses a method of calculating the free energy of binding of molecules to receptors whose three dimensional structures have been determined by other means.
  • US 5,699,268 discloses methods of generating computer simulated receptors using genetic evolution.
  • US 5,434,796 also discloses a computer simulated system for genetically evolving a population of molecules towards higher biological activity.
  • the disclosure mainly relates to the way in which the generation of molecules for screening evolves.
  • the disclosure revolves around the use of SMILES (Simplified Molecular Input Line Entry System) strings, which is described in "SMILES, a chemical language and information system. I. Introduction to methodology and encoding rules", D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31 (1988).
  • SMILES strings are lexical forms of molecular objects which are randomly mutated. However, the mutation rules are somewhat limited in that many types of chemically-important molecular modification are not readily accessible.
  • the genetically evolved lead generation system draws on work done by several groups which was aimed at generation of the large novel chemical databases referred to above (virtual combinatorial libraries).
  • Nilakantan, R., Bauman, N., Venkataraghavan, R.A., J. Chem. Inf. Comput. Sci. (1991) 31, 527-30 developed a method of random structure generation based on the random fusion of 2D chemical fragments. More recently, Clark, D.E., Firth, M.A., Murray, C.W., J. Chem. Inf. Comput. Sci. (1996) 36, 137-145 used graph theoretical techniques for vertex degree set generation and constructive enumeration of molecular graphs to generate 3D databases for drug design.
  • the present application relates to a number of aspects, including: 1. finding relationships between molecular structure and useful properties of molecules, more particularly using a virtual or mathematical analogue or model of a biological receptor or active site (a "virtual receptor") or other biological activity, such as toxicity;
  • BRANN Bayesian regularised artificial neural network
  • the database may be real or virtual, may apply to existing or hypothetical molecules or compounds;
  • This aspect provides a method of creating a virtual receptor capable of being used to scan a range of compounds and providing a measure indicative of whether the compounds are likely to exhibit a particular characteristic, including the steps of: compiling a data set of compounds which exhibit the known characteristic; forming a conceptual structure/activity model with a given architecture; converting the data set into a representation readable by the conceptual model; training the conceptual model on at least a portion of the converted data set in order to improve the architecture of the conceptual model.
  • the data input to the virtual receptor is a molecular representation of the compounds which include the entire molecule and embody relevant properties such as steric, electronic and lipophilic properties.
  • a preferred output of a virtual receptor that may be determined is the binding affinity of the compounds or other biological activity.
  • a further aspect is based on the use of a mathematical concept called an artificial neural net to derive a virtual receptor.
  • Artificial neural networks are mathematical models, and thus it has been found that they can be used in respect of scanning compounds and training virtual receptors.
  • an evolutionary neural network may be used.
  • the virtual receptor may be rendered in a number of forms.
  • the rendering is in a mathematical form.
  • One form may be by the atomistic approach, which classifies each atom according to its element and the number of connections.
  • the compounds may be represented in terms of simple molecular structural parameters, such as constituent atoms or functional groups.
  • An advantage that stems from the inventive method using an atomistic representation is that it allows compounds to be screened with no more knowledge than is provided by counting molecular fragments. Many other molecular representations however are possible, such as depicting the molecules based on their optimal physicochemical properties (see example 2 below).
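The atomistic representation described above can be sketched as a simple count of atom types, where each type is the pair (element, number of connections). This is a minimal sketch under stated assumptions: the function name `atomistic_descriptor` is hypothetical, the molecule is supplied as a list of element symbols plus a list of bonded index pairs, and hydrogens are omitted so that "connections" means bonds to heavy atoms only. The patent does not specify these details.

```python
from collections import Counter

def atomistic_descriptor(elements, bonds):
    """Count atoms by (element, number of connections)."""
    degree = [0] * len(elements)
    for i, j in bonds:                 # each bond adds one connection to both atoms
        degree[i] += 1
        degree[j] += 1
    return Counter(zip(elements, degree))

# Ethanol heavy-atom skeleton C-C-O (hydrogens omitted, an assumption of this sketch)
elements = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
desc = atomistic_descriptor(elements, bonds)
```

A fixed ordering of the possible atom types would turn these counts into the numeric input vector of a virtual receptor.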
  • topological indices, Burden's chemically intuitive molecular index (CIMI), and/or molecular hologram representation of Tripos Assoc. may be used as compound descriptors. Additional novel representations which form additional aspects of the invention are exemplified in the sections following.
  • one inventive concept involves the creation of a virtual receptor by training the receptor using compounds with known properties. Once a virtual receptor has been created based on a particular molecular or mathematical representation of the compounds, all future compounds that are used as input to that receptor must also be represented in the particular molecular or mathematical representation used in the training of the receptor.
  • This aspect provides a method of generating a virtual receptor by use of models which exhibit stability or compensate for noise.
  • One such model is a Bayesian regularised artificial neural network (BRANN).
  • Another model is Maximum Entropy Method (MEM).
  • Bayesian regularisation MacKay, 1992
  • MacKay 1992
  • the present aspect may also be used to screen databases or chemical libraries of real, synthesised compounds derived using the concept of combinatorial chemistry.
  • The Screening Process is predicated on the discovery that, by creating a "virtual receptor" first and then using this virtual receptor to screen compound libraries, it is possible to test, in a "virtual" environment, the compatibility of each compound being screened to the virtual receptor.
  • this aspect provides a method of screening a range of compounds, including:
  • a preferred measure that may be determined is the binding affinity of the compounds or other biological activity.
  • If a given compound contains certain structural features (i.e. conforms to a pharmacophore) there is a high likelihood of the compound having a particular biological activity. Due to the screening being done in a "virtual" environment, the need to synthesise a large number of compounds is avoided. The number of compounds synthesised is reduced to those predicted as being suitable in the "virtual" environment, and which also have a higher likelihood of being verified in the real world.
  • the virtual receptor is continually modified, in order to improve its prediction abilities, based on compounds located in database scans that have proved to in fact exhibit the characteristics sought.
  • this "virtual environment" is a neural network in a computer environment.
  • Hardware implementations of neural nets are also possible (and may be preferable once a virtual receptor of a given type is defined and large databases are to be screened).
  • a mutation operator determines that, with some low probability, a portion of the new individuals will have some of their bits flipped.
  • a crossover operation two individuals are chosen from the population using a selection operator.
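The bit-flip mutation and crossover operators described above can be sketched on plain bit lists. This is an illustrative sketch only: the names `mutate` and `crossover`, the single-point form of crossover and the seeded random generator are assumptions, not details given in the source.

```python
import random

def mutate(bits, rate, rng):
    """Flip each bit independently with a low probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]

def crossover(a, b, rng):
    """Single-point crossover: swap the tails of two parents at a random cut."""
    point = rng.randrange(1, len(a))   # cut strictly inside the string
    return a[:point] + b[point:], b[:point] + a[point:]

rng = random.Random(1)                 # seeded only to make the sketch repeatable
p1, p2 = [0] * 8, [1] * 8
c1, c2 = crossover(p1, p2, rng)
m = mutate(p1, 0.1, rng)
```

Using all-zero and all-one parents makes the cut point visible in the children, since every position in `c1` and `c2` comes from exactly one parent.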
  • This aspect provides using mutation and cross-over strategies as applied to SMILES strings, in order to modify the behaviour of the SMILES string as applied to a compound screening system.
  • a virtual receptor is dependent on the quality of the molecular representation used to develop it.
  • the quality of the virtual receptor is also dependent on the quality of the training data and possibly on the architecture of the neural net.
  • the numerical representation of the compound being analysed adequately represents the steric, electronic and lipophilic properties of the whole molecule.
  • MMM molecular multipole moment
  • the further aspect of the invention is an additional type of molecular representation. It involves the generation of useful molecular descriptors from eigenvalues of adjacency, or modified adjacency matrices in which the diagonal elements are values relating to steric, electrostatic or lipophilic properties of the constituents atoms of the compounds. In a preferred embodiment it is envisaged that eigenvalues of three matrices (one each of steric, electrostatic, and lipophilic-related properties) would be generated.
  • the steric diagonal elements of the adjacency, or modified adjacency matrices could be the van der Waals radii of the atoms; the electrostatic diagonal matrix elements could be the atom charges derived from empirical or molecular orbital calculations; and the lipophilic diagonal matrix elements could be the atomistic lipophilicities referred to in the section above on molecular multipole moments.
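The eigenvalue-based representation can be sketched as follows: build the adjacency matrix of the molecular graph, place an atomic property on the diagonal, and take the eigenvalues. This is a minimal sketch assuming NumPy is available; the function name `eigen_descriptor`, the use of unit off-diagonal entries for bonded atoms, and the choice to return all eigenvalues are assumptions. The radii used (C 1.70, O 1.52 angstroms) are the commonly tabulated Bondi van der Waals values.

```python
import numpy as np

def eigen_descriptor(bonds, diagonal):
    """Eigenvalues of an adjacency matrix whose diagonal holds an atomic property."""
    n = len(diagonal)
    m = np.zeros((n, n))
    for i, j in bonds:
        m[i, j] = m[j, i] = 1.0        # symmetric entry for each bonded pair
    m[np.diag_indices(n)] = diagonal   # property values (steric, charge, lipophilicity)
    return np.linalg.eigvalsh(m)       # real eigenvalues, ascending order

# Ethanol heavy-atom skeleton C-C-O with van der Waals radii on the diagonal
bonds = [(0, 1), (1, 2)]
steric = eigen_descriptor(bonds, [1.70, 1.70, 1.52])
```

Repeating the call with atom charges and with atomistic lipophilicities on the diagonal would give the three sets of eigenvalues envisaged in the preferred embodiment.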
  • Figure 1 shows a set of data used in an example
  • Figure 2 illustrates an example size of training, validation sets and number of networks generated
  • Figure 3A illustrates an example measure of the predictive ability of a network
  • Figure 3B illustrates the B5 representation
  • Figure 3C illustrates a summary of the A1 representation
  • Figure 4A illustrates a sample output from a 23:2:1 neural network using the B2 representation as input
  • Figure 4B illustrates a sample output from an 11:4:1 neural network using the B3 representation as input
  • Figure 4C illustrates a sample output from an 11:4:1 neural network using the A1 representation as input
  • Figure 5 shows an optimal architecture
  • Figure 6 shows results for example 3
  • Figure 7 shows a sample output from a 21:8:5:3:1 network
  • Figure 8 shows a comparison of neural network and MLR
  • Figure 10 shows an example flowchart of a genetically-evolved lead generation system as disclosed in accordance with the further disclosed 'fitness function' invention
  • Figure 11 illustrates a summary of the genetic algorithm
  • Figure 12 illustrates an example mutation operator
  • Figure 13 illustrates an example cross-over operator
  • Figure 14 illustrates an overall concept flowchart for virtual receptor generation.
  • Figure 15 illustrates a virtual screening flowchart showing use of a virtual receptor to predict properties of library members; the library can be real or virtual.
  • Figure 16 illustrates a genetically evolved chemical library overview flowchart.
  • Figure 17 illustrates a genetically-evolved chemical library detailed flowchart showing role of fitness functions and specific examples of SMILES mutation.
  • Figure 18 illustrates a flowchart of improved multipole moment molecular representation generation.
  • Figure 19 illustrates a flowchart for generation of improved eigenvalue indices as molecular representations.
  • Figure 20 illustrates Muscarinic virtual receptor training, observed versus calculated scaled log (activity) for training set (examples).
  • Virtual analogue of a receptor "virtual receptor"
  • a method of screening a range of compounds which includes (a) creating a virtual or mathematical analogue of a biological receptor or active site (a "virtual receptor") and
  • a preferred measure that may be determined is the binding affinity of the compounds or other biological activity.
  • This method of creating a virtual receptor may also be used to scan a range of compounds and provide a measure indicative of whether the compounds are likely to exhibit a particular characteristic, and includes the steps of: compiling a data set of compounds which exhibit the known characteristic; forming a conceptual structure/activity model with a given architecture; converting the data set into a representation readable by the conceptual model; training the conceptual model on at least a portion of the converted data set in order to improve the architecture of the conceptual model.
  • the data input to the virtual receptor is a molecular representation of the compounds which consider the entire molecule and embody relevant properties such as steric, electronic and lipophilic properties.
  • relevant properties such as steric, electronic and lipophilic properties.
  • a preferred output of a virtual receptor that may be determined is the binding affinity of the compounds or other biological activity.
  • Artificial Neural Networks: Virtual receptors can be generated by a number of different methods, many of which rely essentially on regression in one form or another. A particularly useful way of deriving a virtual receptor is to use a mathematical concept called an artificial neural net. Artificial neural networks (ANNs) provide an improved platform from which to predict the behaviour of molecules. Several advantages in using neural networks are that they are fast, they do not rely on subjective judgements as to the form of the functional relationships between structure and activity to be provided, and they process numerous parameters simultaneously. In addition, they are robust and capable of producing reasonable results even when the data is noisy. The prime advantage of using neural networks over other known methods, however, lies in their ability to internally process complex non-linear relationships.
  • ANNs are mathematical models, based loosely on the way biological neural networks process information.
  • ANNs consist of layers of artificial neurones (or neurodes). Each neurode has numerous inputs (x1, x2, ...), each of which is modified by a weight (w1, w2, ...). These inputs are summed on entry to the neurode. This net input is then modified by an internal transfer function.
  • the output of the internal transfer function forms the output of the neurode, which is either passed on as the input for other neurodes or as an output carrying a result.
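A single neurode as described above can be sketched in a few lines: the weighted inputs are summed to give the net input, which is then passed through the transfer function. The function name `neurode` and the choice of `tanh` as the default transfer function are assumptions made for illustration; the source does not name a specific transfer function here.

```python
import math

def neurode(inputs, weights, transfer=math.tanh):
    """Weighted sum of the inputs followed by an internal transfer function."""
    net = sum(x * w for x, w in zip(inputs, weights))   # net input to the neurode
    return transfer(net)

out = neurode([1.0, 0.5, -1.0], [0.2, 0.4, 0.1])        # net input is 0.2 + 0.2 - 0.1
```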
  • ANNs can take many forms, such as single layer, multi-layer, feed forward and lateral connectivity.
  • the layers of neurodes may be fully or partially connected.
  • a full connection is where the output of a neurode is passed onto each neurode in the next layer, whereas in a partial connection the output is transferred only to selected neurodes.
  • An example of a three layered 4:3:1 ANN architecture is shown below:
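A fully connected, feed-forward 4:3:1 network of the kind referred to above can be sketched as two weight matrices applied in sequence. This is a hedged sketch: the names `make_layer` and `forward`, the random initial weights, the absence of bias terms and the use of `tanh` in every layer (including the output) are assumptions, not features taken from the patent.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """One fully connected layer: a row of n_in weights for each of n_out neurodes."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def forward(layers, x):
    """Pass an input vector through each layer in turn."""
    for layer in layers:
        x = [math.tanh(sum(w * v for w, v in zip(ws, x))) for ws in layer]
    return x

rng = random.Random(0)
net = [make_layer(4, 3, rng), make_layer(3, 1, rng)]    # 4 inputs : 3 hidden : 1 output
y = forward(net, [0.1, 0.2, 0.3, 0.4])
```

A partially connected network would correspond to fixing selected weights at zero.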
  • the output of an ANN depends upon numerous factors, namely the nature of the neurodes' transfer functions, the architecture of the network and the weights connecting the neurodes. Of these factors, the weights connecting the neurodes are most easily altered.
  • the ANN as a whole is trained so that it is capable of recognising the important characteristics in molecules that may mean that they exhibit a
  • the representations of molecules with known properties are repeatedly input to the ANN.
  • the ANN is then modified by adjusting the weights connecting the neurodes until the error between its outputs and the correct outputs is minimised.
  • the method used to adjust the weights in the process of training the ANN is called "the learning rule" and may be supervised or unsupervised. Back propagation is an example of a supervised learning rule.
  • Back propagation is a gradient descent algorithm.
  • the network error may be considered a function of the network weights.
  • Back propagation minimises the average squared error between the network output and the "correct answer" by moving down the gradient of this error function.
  • the network weights are altered according to the Delta Rule (also known as the Least Mean Squared Rule).
  • the output is compared with the desired result, and a proportion of this error determined is then propagated back through the network, with the network weights modified accordingly.
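The Delta Rule can be illustrated in its simplest setting, a single linear neurode with no hidden layer: after each sample, every weight is moved by a proportion of the output error times its input, which is a gradient descent step on the squared error. Full back propagation applies the same idea layer by layer and is not shown. The function name `train_delta_rule`, the learning rate, the epoch count and the toy data are assumptions.

```python
def train_delta_rule(samples, n_inputs, rate=0.1, epochs=200):
    """Least-mean-squares training of one linear neurode."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in samples:
            out = sum(wi * xi for wi, xi in zip(w, x))
            err = target - out                              # desired minus actual output
            w = [wi + rate * err * xi for wi, xi in zip(w, x)]
    return w

# Toy data generated by the weights (2, -1), so the rule should recover them
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = train_delta_rule(samples, 2)
```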
  • the number of neurodes in the input layer and the output layer will be determined by the number of input parameters and the number of outputs respectively. However, ascertaining the optimal number of hidden layers (the layers between the input and the output layers) and the
  • DOE freely rotatable bonds in the molecule
  • the compounds may be represented in terms of simple molecular structural parameters, such as constituent atoms or functional groups.
  • An advantage that stems from the inventive method using an atomistic representation is that it allows compounds to be screened with no more knowledge than is provided by counting molecular fragments.
  • Tripos Assoc. may be used as compound descriptors. Additional novel representations which form further aspects of the invention are exemplified in the sections following.
  • the output generated by the virtual receptor upon screening a range of compounds would indicate which compounds have the highest likelihood of forming the basis of new lead compounds.
  • the most novel of these could also be used to synthesise biased combinatorial libraries of organic compounds for screening in pharmacological receptor assays.
  • the use of a neural network to map structure to activity results in superior models to the use of linear methods such as MLR or PLS. This reflects the presence of non-linear relationships between structural parameters and activity, and interactions between the descriptors.
  • the ability of neural networks to account for these relationships is an advantage in virtual receptor generation.
  • the inventive concept involves the creation of a virtual receptor by training the receptor using compounds with known properties. Once a virtual receptor has been created based on a particular molecular or mathematical representation of the compounds, all future compounds that are used as input to that receptor must also be represented in the particular molecular or mathematical representation used in the training of the receptor.
  • Regression is an "ill-posed" problem in statistics, which sometimes results in structure-activity models exhibiting instability when trained with noisy data.
  • Regression methods, including back propagation neural nets, also face additional problems. Principal amongst these are overtraining, overfitting, and selection of the best QSAR model from a number obtained in the validation process. Overtraining results from running the neural network training for too long and results in a loss of ability of the trained net to generalise. Overtraining can be avoided by use of a validation set.
  • Cross-validation, which provides a good test for the predictive capabilities of a network, also provides assistance in determining the optimal neural net architecture.
  • Cross-validation involves running a data set through a network numerous times until all data points have been in both the training and the validation sets.
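The cross-validation procedure described above can be sketched as a k-fold split, in which every data point appears in a validation set exactly once and in the training set for all other folds. The function name `k_fold_splits` and the interleaved assignment of points to folds are assumptions; the source does not state how the sets are drawn.

```python
def k_fold_splits(n_samples, k):
    """Yield (training, validation) index lists for each of k folds."""
    indices = list(range(n_samples))
    for fold in range(k):
        validation = indices[fold::k]                       # every k-th point, offset by fold
        training = [i for i in indices if i % k != fold]    # all remaining points
        yield training, validation

splits = list(k_fold_splits(10, 5))
```

A network would be trained once per fold on the training indices and scored on the validation indices, with the scores combined to compare candidate architectures.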
  • MML Minimum Message Length
  • MEM Maximum Entropy Method
  • A Bayesian regularised artificial neural network (BRANN) may be better suited to virtual receptor calculations than other regression methods. Neural network training can be regularised, a mathematical process which converts the regression into a well-behaved "well-posed" problem and overcomes model instability. Bayes' theorem provides the correct language for describing the inference of a message communicated over a noisy channel. In structure-activity models the 'noise' corresponds to experimental error, poor choice of molecular representations etc. The SAR 'message' corresponds to a useful, valid structure-activity model (or virtual receptor). Where orthodox statistics provide several models with several different criteria for deciding which model is best, Bayesian statistics only offers one answer to a well-posed problem.
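The source does not give the regularised cost function. In the commonly cited form of MacKay's framework, the quantity minimised during training is a weighted sum of the data error and a penalty on the network weights, which is sketched below. The function name `regularised_cost` is hypothetical, and `alpha` and `beta` are supplied here as fixed numbers, whereas in the full Bayesian treatment they are inferred from the data.

```python
def regularised_cost(errors, weights, alpha, beta):
    """beta * E_D + alpha * E_W, with E_D and E_W as half sums of squares."""
    e_d = 0.5 * sum(e * e for e in errors)      # misfit between outputs and targets
    e_w = 0.5 * sum(w * w for w in weights)     # penalty that discourages large weights
    return beta * e_d + alpha * e_w

cost = regularised_cost(errors=[1.0, -1.0], weights=[2.0], alpha=0.5, beta=2.0)
```

The weight penalty is what makes the problem well-posed: among networks that fit the noisy data equally well, the one with smaller weights, and hence a smoother mapping, is preferred.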
  • Another aspect of the invention, which may be referred to as a Virtual Screening Process, and one embodiment of which is illustrated in Figure 15, is predicated on the discovery that by creating a "virtual receptor" first, and then using this virtual receptor to screen compound libraries, it is possible to test, in a "virtual" environment, the compatibility of each compound being screened to the virtual receptor. If a given compound contains certain structural features (i.e. conforms to a pharmacophore) there is a high likelihood of the compound having a particular biological activity. Due to the screening being done in a "virtual" environment, the need to synthesise a large number of compounds is avoided. The number of compounds synthesised is reduced to those predicted as being suitable in the "virtual" environment, and which also have a higher likelihood of being verified in the real world.
  • the virtual receptor is continually modified, in order to improve its prediction abilities, based on compounds located in database scans that have proved to in fact exhibit the characteristics sought.
  • a preferred form of this "virtual environment" is a neural network in a computer environment.
  • Hardware implementations of neural nets are also possible (and may be preferable once a virtual receptor of a given type is defined and large databases are to be screened).
  • 4. Genetic evolution of structures using virtual receptors as fitness functions
  • Additional aspects of the invention include the use of virtual receptors as fitness functions, and the discovery of efficient methods of mutating chemical structures to span as much of combinatorial space as possible.
  • the aspect of the invention involving mutation strategies is discussed in the next section.
  • each structure is mutated by means of single point mutations, insertions, deletions and crossovers, to generate another population of structures for testing against the fitness function represented by a virtual receptor and possibly others such as ease of synthesis, toxicity etc.
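The single point mutation, insertion and deletion operations named above can be sketched as plain string edits. This sketch is deliberately limited: it treats a SMILES string as a sequence of single-character atom tokens, so it only applies to unbranched chains without ring closures, and it performs no valence check (that role falls to a fitness function). The token alphabet `ATOMS` and the function names are hypothetical.

```python
import random

ATOMS = ["C", "N", "O", "S"]     # hypothetical token alphabet for this sketch

def point_mutation(smiles, rng):
    """Replace one atom token with a randomly chosen one."""
    i = rng.randrange(len(smiles))
    return smiles[:i] + rng.choice(ATOMS) + smiles[i + 1:]

def insertion(smiles, rng):
    """Insert a randomly chosen atom token at a random position."""
    i = rng.randrange(len(smiles) + 1)
    return smiles[:i] + rng.choice(ATOMS) + smiles[i:]

def deletion(smiles, rng):
    """Remove one atom token, never emptying the string."""
    if len(smiles) < 2:
        return smiles
    i = rng.randrange(len(smiles))
    return smiles[:i] + smiles[i + 1:]

rng = random.Random(0)
parent = "CCCO"
mutants = [point_mutation(parent, rng), insertion(parent, rng), deletion(parent, rng)]
```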
  • Examples of library evolution are shown in Figures 10, 16 and 17.
  • the aspect considered unique to the approach is that the mutated structures, together with a suitably defined fitness function and evolutionary process, such as a genetic algorithm or other types of genetic programs, can be used to explore very large areas of combinatorial space and generate lead structures likely to be active at the specified receptor.
  • the algorithm starts with an initial population of these individuals.
  • the fitness of each is evaluated to determine how well it solves the problem.
  • the characteristics of each individual in the initial population are generated randomly.
  • two individuals are selected from the population. This is done so that the individuals that are more fit are more likely to be selected.
  • the two selected individuals can be considered to be "parents".
  • two new individuals ("children") are created that are recombinations of the genes from the parents.
  • the process of creating the children is called "crossover".
  • Some combination of the parents and children are then passed to the "next generation".
  • the selection and crossover steps are repeated until the number of individuals in the next generation is the same as that in the current generation. That is where mutation comes in.
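The generational loop described above (random initial population, fitness-weighted selection of two parents, crossover, mutation, repeat until the next generation is full) can be sketched on bit strings. This is a minimal sketch under stated assumptions: the function name `evolve`, fitness-proportional selection via `random.choices`, single-point crossover, and the toy fitness (the number of 1 bits) are all illustrative choices. In this sketch only the children pass to the next generation, whereas the source allows some combination of parents and children.

```python
import random

def evolve(fitness, n_bits, pop_size, generations, rate, rng):
    """Simple generational genetic algorithm over fixed-length bit lists."""
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # small offset keeps the selection weights valid if every fitness is zero
        scores = [fitness(ind) + 1e-9 for ind in pop]
        nxt = []
        while len(nxt) < pop_size:
            mum, dad = rng.choices(pop, weights=scores, k=2)   # fitter = more likely
            cut = rng.randrange(1, n_bits)
            for child in (mum[:cut] + dad[cut:], dad[:cut] + mum[cut:]):
                nxt.append([b ^ 1 if rng.random() < rate else b for b in child])
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

best = evolve(sum, n_bits=12, pop_size=20, generations=40,
              rate=0.02, rng=random.Random(0))
```

In the lead generation setting the bit lists would be replaced by molecular structures and `fitness` by the output of a virtual receptor.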
  • a selection operator is usually used to select which member of an evolving population will be involved in crossover or other mutations. In human terms this may be analogous to selection processes which favour the most powerful male mating with the most desirable female. In this application to lead discovery selection operators choose which two or more molecules will be involved in crossover or other mutations. These operators may be: selecting the best and second best molecules for crossover; or
  • a selection operator is used to give preference to better individuals, allowing them to pass on their genes to the next generation.
  • the goodness of each individual depends on its fitness, which may be determined by an objective function or by a subjective judgement.
  • a 'global' fitness function may involve either a weighted average of some or all of component functions, or some of the fitness criteria may be applied sequentially.
  • An example of sequential application: all members of the evolving population(s) first have their fitness evaluated against the chemical valence fitness function (to eliminate nonsense compounds) and are then evaluated for biological activity fitness via the virtual receptor.
  • the most active molecules as determined by the virtual receptor fitness function may then be 'filtered' for toxicity or some other property.
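The two ways of combining fitness criteria described above (a weighted average, or sequential application) might be sketched as below. The component functions passed in are placeholders supplied by the caller, and the names `weighted_fitness`, `sequential_screen` and `keep` are hypothetical.

```python
def weighted_fitness(mol, components, weights):
    """'Global' fitness as a weighted average of component fitness functions."""
    total = sum(weights)
    return sum(w * f(mol) for f, w in zip(components, weights)) / total

def sequential_screen(population, valence_ok, activity, is_toxic, keep=10):
    """Apply the criteria one after another instead of averaging them."""
    # 1. Eliminate nonsense compounds with the valence check.
    valid = [m for m in population if valence_ok(m)]
    # 2. Rank the survivors by predicted activity (e.g. a virtual receptor).
    ranked = sorted(valid, key=activity, reverse=True)[:keep]
    # 3. Filter the most active molecules for toxicity.
    return [m for m in ranked if not is_toxic(m)]
```

The sequential form avoids evaluating the more expensive functions on structures that already fail a cheap check.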
  • fitness functions may be exemplified by some of the following types (not an exhaustive list):
  • A valence function which determines whether the structure represented by the chromosome obeys the laws of chemical bonding and valence.
  • a stability function which eliminates chemically unstable or extremely difficult to synthesise structures such as peroxides, or large numbers of chiral centres. This could be derived from a lookup table of undesirable functional groups.
  • a safety function which rates the structures represented by the chromosomes in terms of likely toxicity. For example, nitrogen mustards, alkylating agents etc would be eliminated. This could be derived from structure-activity models in a similar way to the Topkat commercial software.
  • a biological activity function: This would be implemented via the virtual receptor concept as disclosed above. It is most likely implemented as a neural network model.
  • a molecular diversity function: The evolutionary algorithms used in this invention have a stochastic element which ensures a degree of molecular diversity. However, another fitness function would be used which ensures that, for example, no individual in the population has a greater than 85% similarity to the others. This function may also screen out molecular redundancies.
  • the fitness function may determine whether combinatorial methods may be adapted for use in synthesizing compounds for screening.
  • "pharmacokinetic efficiency" fitness: This is a measure of how well the molecule is transported from its site of entry to the site of action. A simple example of this may be whether a CNS active drug can penetrate the blood-brain barrier.
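As an illustration of the molecular diversity function listed above, the sketch below discards any individual that is more than 85% similar to one already kept. The text does not name a similarity measure, so the Tanimoto coefficient on feature sets is an assumption here, as are the names `tanimoto` and `diverse_subset`.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def diverse_subset(population, features, threshold=0.85):
    """Keep a molecule only if it is not too similar to any kept so far."""
    kept = []
    for mol in population:
        if all(tanimoto(features(mol), features(k)) <= threshold for k in kept):
            kept.append(mol)
    return kept
```

Here `features` is any caller-supplied function that maps a molecule to a collection of descriptors, for example substructure keys.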
  • a further aspect of the invention is based on the concept of using evolutionary modification of compound structures whereby the calculated activity from the Virtual Screening Process is used as a measure of the 'fitness' of a chemical structure for performing a particular function.
  • the better, or a predetermined group of, compounds can be selected based on the 'fitness' or a range of 'fitness' as base structures for subsequent genetic modification.
  • 'Fitness' may be considered as an assessment of a compound exhibiting survival of the fittest in a genetic algorithm.
  • Optimisation provides a 'fitness function'.
  • the fitness function is used to evaluate the "fitness", or superiority of one member of a population over another by some definable criteria.
  • the fitness function is the mathematical embodiment of the criteria used to define the "fitness" of a chemical compound over another.
  • the criteria can be set according to the particular result required or outcome hoped for. Variations and additions of the inventions disclosed are possible within the general inventive concept, as will be apparent to those skilled in the art.
  • 5. Mutating structures by modifying a SMILES string: Mutation Strategies
  • the mutation operator determines that, with some low probability, a portion of the new individuals will have some of their bits flipped.
  • An example is shown in Figure 12.
  • bit string: There is a relationship between the bit string and a molecular structure, which is usually 1:1 (except in some cases where optical or geometric isomers are not accounted for). It may be noted that molecular structures may not literally be represented by bit strings, but the same operations and logic which apply to bit strings in the general discussion of genetic algorithms will also apply to other representations of molecules. It should be possible, for example, to use the SMILES string to represent a molecule, then alter this by symbol substitution, addition, fragment insertion or deletion etc. to produce evolved structures via the genetic algorithm and the fitness function. Mutation alone induces a random walk through the search space. Mutation and selection (without crossover) create a parallel, noise-tolerant, hill-climbing algorithm.
  • the crossover operation happens in an environment where the selection of who gets to mate is a function of the fitness of the individual, i.e. how good the individual is at competing in its environment.
  • Some genetic algorithms use a simple function of the fitness measure to select individuals (probabilistically) to undergo genetic operations such as crossover or asexual reproduction (the propagation of genetic material unaltered). This is fitness-proportionate selection.
  • Other implementations may use a model in which certain randomly selected individuals in a subgroup compete and the fittest is selected. This is called tournament selection and is the form of selection we see in nature when stags rut to vie for the privilege of mating with a herd of hinds.
  • the two processes that are considered to most contribute to evolution are crossover and fitness based selection/reproduction. As it turns out, there are mathematical proofs that indicate that the process of fitness proportionate reproduction is, in fact, near optimal in some senses.
  • the choice of which mutation operator is carried out on a given member of the chemical population can be decided randomly, e.g. by use of a number wheel algorithm.
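Tournament selection, as described above, can be sketched in a few lines. The subgroup size `k` and the function name are illustrative choices, not values from the text.

```python
import random

def tournament_select(population, fitness, k=3, rng=None):
    """Pick k individuals at random and return the fittest of them."""
    rng = rng or random.Random()
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)
```

A larger `k` raises selection pressure. With `k` equal to the population size, the best individual always wins.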
  • Insertion mutations involve randomly selecting a character position in the string and inserting one or more chemically parsable text strings at that position.
  • the choice of which string to insert could, for example, be chosen randomly from a large lookup table of SMILES strings. Some of the strings in the lookup table, or other selection process which derives the string to be substituted, could be contained in brackets. In this case the insertion results in a branching of the new string from the old. Strings inserted without these enclosing brackets would be incorporated into the original molecule without branching.
  • original string: CCCCCC; mutated string: CCCSCCC (chain insertion)
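The insertion mutation described above can be sketched as plain string manipulation. This is an assumption-laden illustration: the fragment table is hypothetical, the result is not checked for chemical validity, and a random character position could split a two-letter atom such as Cl or a bracketed atom.

```python
import random

def insert_at(smiles, pos, fragment):
    """Insert a SMILES fragment at a character position."""
    return smiles[:pos] + fragment + smiles[pos:]

def insertion_mutation(smiles, fragments, rng=None):
    """Insert a randomly chosen fragment at a random position.

    Fragments enclosed in brackets, e.g. "(N)", produce a branch;
    unbracketed fragments are incorporated into the chain.
    """
    rng = rng or random.Random()
    # Start at 1 so a bracketed fragment always has a parent atom before it.
    pos = rng.randrange(1, len(smiles) + 1)
    return insert_at(smiles, pos, rng.choice(fragments))
```

For example, `insert_at("CCCCCC", 3, "S")` reproduces the chain insertion shown above, and `insert_at("CCCCCC", 3, "(N)")` gives the branched string `CCC(N)CCC`.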
  • each structure is mutated by means of single point mutations, insertions, deletions and crossovers, to generate another population of structures for testing against the fitness function represented by a virtual receptor and possibly others such as ease of synthesis, toxicity etc as outlined above.
  • the novelty of the approach is that the mutated structures together with a suitably defined fitness function and a genetic algorithm, can be used to explore very large areas of combinatorial space and generate lead structures likely to be active at the specified receptor.
  • the quality of a virtual receptor is dependent on the quality of the molecular representation used to develop it.
  • the quality of the virtual receptor is also dependent on the quality of the training data and possibly on the architecture of the neural net.
  • the numerical representation of the compound being analysed adequately represents the steric, electronic and lipophilic properties of the whole molecule.
  • MMM molecular multipole moment
  • MMM descriptors relating solely to molecular shape are the three principal moments of inertia, Ix, Iy, Iz.
  • the two descriptors that relate solely to charge are the magnitude of the dipole moment, p, and the magnitude of the principal quadrupole moment, Q. Descriptors that relate to shape and charge can be developed in a number of different ways.
  • One example is by calculating the magnitudes of the dipolar components, the magnitudes of the components of displacement between the centre-of-mass and centre-of-dipole with respect to the principal inertia axes to provide the descriptors px, py, pz and dx, dy, dz.
  • Quadrupolar components are calculated with respect to a translated inertial reference frame whose origin coincides with the centre-of-dipole, providing two additional descriptors Qxx and Qyy. This set of thirteen numbers is independent of the orientation and position of the molecules in three-dimensional space (see B. D. Silverman and Daniel E. Platt, "Comparative Molecular Moment Analysis (CoMMA): 3D-QSAR without Molecular Superposition", J. Med. Chem., 39 (11), 2129-2140, 1996).
  • a lipophilic analogue of the steric and electrostatic multipole expansions may be derived by ascribing atomistic lipophilic values to each type of atom found in molecules. We did this by carrying out multiple regression analysis on a series of structures with known lipophilicities.
  • the further aspect of the invention is an additional type of molecular representation. It is possible to describe the topographical relationships between atoms contained in a given molecular structure by means of connectivity or adjacency matrices. In general the diagonal elements of these matrices are zero and the off-diagonal elements are unity only if the two atoms represented by the location of the matrix element are connected. Useful molecular representations may be derived from the eigenvalues of a modification of these matrices, as first described by Burden (J. Chem. Inf. Comput. Sci., 29, 225 (1989)).
  • eigenvalues of three matrices are generated.
  • the steric diagonal elements of the adjacency, or modified adjacency matrices could be the Van der Waals radii of the atoms;
  • the electrostatic diagonal matrix elements could be the atom charges derived from empirical or molecular orbital calculations and;
  • the lipophilic diagonal matrix elements could be the atomistic lipophilicities referred to in the section above on molecular multipole moments.
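A rough sketch of the eigenvalue representation described above: build an adjacency matrix with unit off-diagonal elements for bonded atoms, place an atomic property on the diagonal, and take the eigenvalues. The Jacobi solver is a generic textbook method chosen here to keep the example dependency-free. The function names and the diagonal values used in any example are hypothetical, and Burden's original weighting scheme is not reproduced.

```python
import math

def property_matrix(n_atoms, bonds, diagonal):
    """Adjacency matrix (1.0 for bonded pairs) with a property on the diagonal."""
    m = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        m[i][j] = m[j][i] = 1.0
    for i, value in enumerate(diagonal):
        m[i][i] = value
    return m

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-24):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))
```

Running this three times, with steric, electrostatic and lipophilic diagonals in turn, would give the three sets of eigenvalues mentioned above.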
  • Benzodiazepine receptor BZR
  • GABAA γ-aminobutyric acid receptor
  • the ANN used for this experiment had full connectivity, with the input layer of neurodes having linear transfer functions and all other layers of neurodes having sigmoidal transfer functions.
  • the following neural network parameters were used:
  • Training patterns input noise: Gaussian (mean: 0; standard deviation: 0.02)
  • the network calculations were performed using a commercial software package Propagator, however any neural network package could be used.
  • the input data was scaled between 0 and 1, as it is between these values that the sigmoidal transfer functions range.
  • Output data was also scaled appropriately.
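Scaling each input variable to the range 0 to 1, as described above, is commonly done with min-max scaling. The helper below is a minimal sketch of that idea. The source does not state which scaling formula was actually used.

```python
def minmax_scale(column):
    """Linearly rescale a list of numbers so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]
```

The same minimum and maximum would need to be reused when scaling new compounds presented to a trained network.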
  • the data set used was a set of 57 1,4-benzodiazepin-2-ones. This data set was chosen because the activity of its members at the receptor is known. The molecular representations of this data set that were employed are shown in Figure 1, while the size of the training and validation sets and the number of networks generated during cross-validation are shown in Figure 2.
  • initial representations B1 and B2 which were based heavily on an atomistic approach, provided position information - for example, separate input parameters were provided for C4 atoms at positions 7, 1 and 3.
  • the representation comprised 25 input variables.
  • in representation B2 the number of input parameters was slightly reduced by treating the halogens as being of the same element - "Hal".
  • no positional information was provided - the neural network would not be told whether a C4 atom was attached to position 7, 1 or 3.
  • the representation B4 differs from B3 in that it does not distinguish between the halogens.
  • SEP Standard Error of Prediction
  • the SEPval (which provides a measure of the predictive ability of the network) obtained from the two architectures used is shown in Figure 3A.
  • a sample output from a 23:2:1 neural network using the B2 representation as input is shown in Figure 4A, and the sample output from an 11:4:1 neural network using the B3 representation is shown in Figure 4B.
  • MLR Multiple Linear Regression
  • MLR identified four linearly significant variables - C4, N3,
  • Figure 1 As A1. Due to this representation not being positionally dependent, the number of input parameters is much lower than the positionally dependent representations B1 and B2. Consequently, greater freedom is afforded in the architectures that can be devised.
  • the results for the A1 representation are summarised in Figure 3C, whilst Figure 4C shows a typical output using the representation.
  • MLR Multiple linear regression
  • a data set of 321 compounds was compiled from the literature. These were broken up into two sets: 21 compounds formed a test set, and the remaining 300 formed the basis of the training and validation sets. Training sets consisted of 270 compounds and validation sets of 30 compounds, so cross-validation involved the generation of 10 training and validation set pairs. The neural network produced in each case was tested using the test set.
  • the representation used was based on the atomistic approach described previously. However, input parameters relating to the number and type of rings were added, thus affording the neural network some insight into the molecule's topology. Twenty-one input variables were used to represent each molecule: C(aromatic), C4, C3, C2, N(aromatic), N3, N2, N1, O2, O1, S, P, Cl, F, Br, I, 7-membered rings, 6-membered rings, 5-membered rings, 4-membered rings, 3-membered rings. An example of the representations is shown below:
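The cross-validation scheme above (training sets of 270 and validation sets of 30, giving 10 pairs) corresponds to a ten-fold split of 300 compounds. A minimal sketch follows. The function name is hypothetical and the compounds are assumed to have been shuffled beforehand.

```python
def kfold_pairs(items, k=10):
    """Return k (training, validation) pairs from contiguous folds."""
    fold = len(items) // k
    pairs = []
    for i in range(k):
        validation = items[i * fold:(i + 1) * fold]
        training = items[:i * fold] + items[(i + 1) * fold:]
        pairs.append((training, validation))
    return pairs
```

With 300 items and `k=10`, every pair contains 270 training and 30 validation compounds, and each compound is used for validation exactly once.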
  • IC50 being the binding affinity, which often corresponds to biological activity
  • pIC50 value: this work modelled log 1/IC50
  • MLR was performed on the data set twice - the first (MLR1) used only first-order terms, whilst the second (MLR2) used first and second order terms (but no cross-terms). MLR was employed on a "training set" of 270 compounds, then the resulting equation was tested on a validation set of 30 and a test set of 21. The results are compared with the neural network results on exactly the same data sets in Figure 8.
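The modelled quantity, log 1/IC50, is the negative base-10 logarithm of IC50. The conversion is sketched below, assuming IC50 is expressed in mol/L. The source does not state the units, so that is an assumption.

```python
import math

def pic50(ic50_molar):
    """Convert an IC50 in mol/L to pIC50 = log10(1 / IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)
```

On this scale a tenfold more potent compound gains one unit, which keeps the network targets in a compact numeric range.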
  • a portion of a chemical structure database was screened and the biological activities of the members predicted.
  • the database chosen was the first 7800 compounds in the Maybridge chemical database. While this database contains known, commercially available molecules, not hypothetical structures generated by techniques such as DBMaker, it serves to illustrate the screening procedure equally well.
  • the 7800 structures were converted into an atomistic representation similar to that outlined above.
  • the representations were presented as input to a trained neural network representing a benzodiazepine receptor. Training was disabled so that the weights were fixed and the virtual receptor model generated 7800 outputs representing predicted log biological responses for
  • Example 5 The Maybridge column refers to the compound ID in the Maybridge database. The results of screening the benzodiazepine data set in the virtual receptor are also included.
  • Example 1-3 We carried out an analogous study to that in Example 1-3 to derive a Muscarinic Virtual Receptor from the analysis of a data set of 161 compounds which act upon the muscarinic receptor.
  • Compounds capable of binding to this receptor are currently the subject of intense research, due to the belief that memory-related problems in Alzheimer's disease could be treated using agonists at this receptor.
  • the IC50 values used in the analysis are the concentrations required to displace [3H]Oxotremorine-M (OXO-M), an agonist at the M1 muscarinic receptor.
  • the training sets contained 151 compounds, whilst the test set contained 10 compounds.
  • An example of observed versus calculated scaled log (activity) for training set (examples) is shown in Figure 20.
  • NCI National Cancer Institute
  • the ANNs used were three-layer, fully connected, feed-forward networks which were trained using a Levenberg-Marquardt [Marquardt, D.W., J. Soc. Ind. Appl. Math., 11, 431-441 (1963)] optimised back-propagation algorithm which incorporated Bayesian regularisation [MacKay, D.J.C., A Practical Bayesian Framework for Backprop Networks, Neural Computation, 4, 415-447 (1992)].
  • Bayesian regularisation removes the need to supply a validation set since it minimises a linear combination of squared errors and weights. It also modifies the linear combination so that at the end of training the resulting network has good generalisation qualities.
  • the network architecture made use of 3 hidden nodes which proved to be more than sufficient in all cases with the Bayesian regularisation method estimating the number of effective parameters.
  • the concerns about overfitting and overtraining are also removed by this method so that the production of a definitive and reproducible model is attained.
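The "linear combination of squared errors and weights" referred to above is usually written, following MacKay's framework, as the objective below. The symbols are the conventional ones from that framework rather than notation taken from this document: $E_D$ is the data error over the $N$ training targets $t_n$ with network outputs $y_n$, $E_W$ is the weight penalty over the $k$ network weights $w_i$, and $\alpha$, $\beta$ are the coefficients that are re-estimated during training.

```latex
F(\mathbf{w}) = \beta E_D + \alpha E_W ,
\qquad
E_D = \tfrac{1}{2}\sum_{n=1}^{N} \bigl(y_n - t_n\bigr)^2 ,
\qquad
E_W = \tfrac{1}{2}\sum_{i=1}^{k} w_i^2

\gamma = k - \alpha \,\operatorname{Tr}\bigl(\mathbf{A}^{-1}\bigr),
\qquad
\mathbf{A} = \nabla\nabla F ,
\qquad
\alpha \leftarrow \frac{\gamma}{2E_W},
\qquad
\beta \leftarrow \frac{N - \gamma}{2E_D}
```

Here $\gamma$ is the number of effective parameters. It lies between 0 and $k$, which matches the observation that it can be considerably smaller than the number of weights implied by the architecture.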
  • the standard error of predictions (SEPs) and correlation coefficients, using the various representations, are shown in Table 14.
  • SEPs standard error of predictions
  • Table 14 A number of fully-connected ANN architectures, containing different numbers of hidden layers and nodes, were tested and a single hidden layer with 3 nodes was found to be optimal in each case.
  • the number of effective parameters was always considerably less than the number of weights implied by the network architecture.
  • the data set compounds were scrambled to remove any inadvertent ordering effects such as by the magnitude of the biological activity.
  • a K-means hierarchical clustering was carried out on the input variables and one compound from each cluster, at the 11 cluster level, was extracted for a test set. This test set, of 11 compounds, was not the same for
  • NI Number of independent variables.
  • NPC Number of Principal Components used.
  • NPar Number of effective parameters.
  • peff Number of input variables/NPar (c) Randic [5] indices.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Medicinal Chemistry (AREA)
  • Library & Information Science (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Organic Chemistry (AREA)
  • Biochemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to various aspects of a system for screening a molecular structure and/or compound. The method involves establishing relationships between a molecular structure and useful properties of molecules, including the use of a mathematical or virtual analogue or model of a biological receptor or active site (a virtual receptor) or of another biological activity such as toxicity; creating a virtual receptor using minimum message length principles or maximum entropy methods (MEM), such as the application of a Bayesian regularised artificial neural network (BRANN); and using a virtual receptor to screen a database, which may be real or virtual and may apply to hypothetical or existing molecules or compounds. The invention also relates to the use of virtual receptors as fitness functions, a method of mutating structures by modifying a SMILES string representation, an improved molecular multipole moment representation, and an improved molecular eigenvalue index. The invention further relates to other aspects.
PCT/AU1998/000715 1997-09-03 1998-09-03 Compound screening system WO1999012118A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU89644/98A AU8964498A (en) 1997-09-03 1998-09-03 Compound screening system
EP98941143A EP1010094A4 (fr) 1997-09-03 1998-09-03 Compound screening system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
AUPO8921A AUPO892197A0 (en) 1997-09-03 1997-09-03 Compound screening system
AUPO8921 1997-09-03
AUPP1192 1997-12-31
AUPP1192A AUPP119297A0 (en) 1997-12-31 1997-12-31 Compound screening system

Publications (1)

Publication Number Publication Date
WO1999012118A1 (fr) 1999-03-11

Family

ID=25645594

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU1998/000715 WO1999012118A1 (fr) 1997-09-03 1998-09-03 Compound screening system

Country Status (2)

Country Link
EP (1) EP1010094A4 (fr)
WO (1) WO1999012118A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000079263A2 (fr) * 1999-06-18 2000-12-28 Synt:Em S.A. Identification of active molecules by means of physico-chemical parameters
WO2002082329A2 (fr) * 2001-04-06 2002-10-17 Axxima Pharmaceuticals Ag Method for creating a quantitative structure-property-activity relationship
US7415358B2 (en) 2001-05-22 2008-08-19 Ocimum Biosolutions, Inc. Molecular toxicology modeling
US7447594B2 (en) 2001-07-10 2008-11-04 Ocimum Biosolutions, Inc. Molecular cardiotoxicology modeling
US7469185B2 (en) 2002-02-04 2008-12-23 Ocimum Biosolutions, Inc. Primary rat hepatocyte toxicity modeling
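CN109359833A (zh) * 2018-09-27 2019-02-19 China University of Petroleum (East China) Offshore platform fire and explosion risk analysis method based on an ABC-BRANN model
CN111916143A (zh) * 2020-07-27 2020-11-10 Xidian University Molecular activity prediction method based on fusion of diverse substructure features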

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023123021A1 (fr) * 2021-12-29 2023-07-06 Shenzhen Jingtai Technology Co., Ltd. Method and apparatus for acquiring a molecular feature description, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5459077A (en) * 1989-12-29 1995-10-17 Pepmetics, Inc. Methods for modelling tertiary structures of biologically active ligands and for modelling agonists and antagonists thereto
US5524086A (en) * 1993-05-14 1996-06-04 Nec Corporation Dipole parameter estimation method and apparatus
US5526281A (en) * 1993-05-21 1996-06-11 Arris Pharmaceutical Corporation Machine-learning approach to modeling biological activity for molecular design and to modeling other characteristics
US5699268A (en) * 1995-03-24 1997-12-16 University Of Guelph Computational method for designing chemical structures having common functional characteristics

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5434796A (en) * 1993-06-30 1995-07-18 Daylight Chemical Information Systems, Inc. Method and apparatus for designing molecules with desired properties by evolving successive populations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Molecular Biology and Molecular Simulations", XP003010327, Retrieved from the Internet <URL:http://web.archive.org/web/20010522024900/http://www.biosym.com/about/jobs/index.html> [retrieved on 20070101] *
See also references of EP1010094A4 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000079263A2 (fr) * 1999-06-18 2000-12-28 Synt:Em S.A. Identification of active molecules by means of physico-chemical parameters
WO2000079263A3 (fr) * 1999-06-18 2001-05-17 Synt Em Sa Identification of active molecules by means of physico-chemical parameters
WO2002082329A2 (fr) * 2001-04-06 2002-10-17 Axxima Pharmaceuticals Ag Method for creating a quantitative structure-property-activity relationship
WO2002082329A3 (fr) * 2001-04-06 2004-01-15 Axxima Pharmaceuticals Ag Method for creating a quantitative structure-property-activity relationship
US7415358B2 (en) 2001-05-22 2008-08-19 Ocimum Biosolutions, Inc. Molecular toxicology modeling
US7426441B2 (en) 2001-05-22 2008-09-16 Ocimum Biosolutions, Inc. Methods for determining renal toxins
US7447594B2 (en) 2001-07-10 2008-11-04 Ocimum Biosolutions, Inc. Molecular cardiotoxicology modeling
US7469185B2 (en) 2002-02-04 2008-12-23 Ocimum Biosolutions, Inc. Primary rat hepatocyte toxicity modeling
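CN109359833A (zh) * 2018-09-27 2019-02-19 China University of Petroleum (East China) Offshore platform fire and explosion risk analysis method based on an ABC-BRANN model
CN109359833B (zh) * 2018-09-27 2022-05-27 China University of Petroleum (East China) Offshore platform fire and explosion risk analysis method based on an ABC-BRANN model
CN111916143A (zh) * 2020-07-27 2020-11-10 Xidian University Molecular activity prediction method based on fusion of diverse substructure features
CN111916143B (zh) * 2020-07-27 2023-07-28 Xidian University Molecular activity prediction method based on fusion of diverse substructure features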

Also Published As

Publication number Publication date
EP1010094A4 (fr) 2001-03-07
EP1010094A1 (fr) 2000-06-21

Similar Documents

Publication Publication Date Title
Mai et al. Molecular photochemistry: recent developments in theory
Singh et al. Comparison of multi-modal optimization algorithms based on evolutionary algorithms
Judson Genetic algorithms and their use in chemistry
Pedersen et al. Genetic algorithms for protein structure prediction
Davidor Genetic Algorithms and Robotics: A heuristic strategy for optimization
JPH08512159A (ja) Method and apparatus for designing molecules with desired properties by evolving successive populations
Suchan et al. Pragmatic approach to photodynamics: Mixed Landau–Zener surface hopping with intersystem crossing
Hasegawa et al. GA strategy for variable selection in QSAR studies: enhancement of comparative molecular binding energy analysis by GA‐based PLS method
Lameijer et al. Evolutionary algorithms in drug design
US6219622B1 (en) Computational method for designing chemical structures having common functional characteristics
US5699268A (en) Computational method for designing chemical structures having common functional characteristics
Fatemi et al. Prediction of bioconcentration factor using genetic algorithm and artificial neural network
CA2478556A1 (fr) Procedes et systemes destines a la decouverte de composes chimiques et a leur synthese
EP1010094A1 (fr) Compound screening system
Danel et al. Docking-based generative approaches in the search for new drug candidates
Hageman et al. Design and assembly of virtual homogeneous catalyst libraries–towards in silico catalyst optimisation
WO2005083616A1 (fr) Ligand search device, ligand search method, program, and recording medium
McLeod et al. Development of a genetic algorithm for molecular scale catalyst design
Lin et al. An efficient hybrid Taguchi-genetic algorithm for protein folding simulation
Yan Application of self-organizing maps in compounds pattern recognition and combinatorial library design
US20020133297A1 (en) Ligand docking method using evolutionary algorithm
Goh et al. Evolving molecules for drug design using genetic algorithms via molecular trees
Olariu et al. Biology-derived algorithms in engineering optimization
Zaman et al. Using subpopulation EAs to map molecular structure landscapes
Ajjarapu et al. Ligand-based drug designing

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: KR

WWE Wipo information: entry into national phase

Ref document number: 1998941143

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 09486930

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 1998941143

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

WWW Wipo information: withdrawn in national office

Ref document number: 1998941143

Country of ref document: EP