WO2008116495A1 - Method and apparatus for the design of chemical compounds with predetermined properties - Google Patents

Publication number
WO2008116495A1
Authority
WO
WIPO (PCT)
Prior art keywords
molecular
descriptors
routine
fragment
qsar
Application number
PCT/EP2007/052856
Other languages
French (fr)
Inventor
Mati Karelson
Pilv Mehis
Original Assignee
Molcode Ltd
Application filed by Molcode Ltd
Priority to PCT/EP2007/052856
Publication of WO2008116495A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G16C10/00 Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Definitions

  • the data are the natural experimental measurement data or numerical data related to them through a known mathematical relationship such as the proportionality, the linear relationship or some other normalization scheme.
  • the data can represent the physical or spectroscopic properties, chemical reactivity or biological activity of the molecular structures.
  • the molecular structures may be the individual molecules, the complexes or arrays of molecules.
  • the data includes the following features:
  • ADME Absorption, Distribution, Metabolization, and Excretion
  • the molecular descriptors include the following generic types:
  • constitutional descriptors calculated from the total number of atoms in the fragment or the whole molecule, the number and percentage of atoms of a given atomic species, the number and percentage of alicyclic and aromatic carbocycles, heterocycles, the number of functional groups in the fragment or the whole molecule, and the molecular weight of the fragment or the whole molecule;
  • electrostatic descriptors calculated using the three-dimensional molecular structure of the fragments or the whole molecule represented by their 3D atomic coordinates, atomic charges and the size of the atoms in the fragment or the whole molecule.
  • the atomic charges can be calculated using Sanderson's electronegativity equalization principle and its variations or quantum chemically.
  • the size of atoms can be calculated proceeding from their van der Waals radii or the cutoff electronic charge distribution;
  • quantum chemical descriptors, calculated from the quantum-mechanical wave function of a molecule.
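The constitutional descriptors listed above can be illustrated with a minimal Python sketch. It assumes that a molecule or fragment is given simply as a list of element symbols; the function name `constitutional_descriptors` and the small atomic-mass table are hypothetical choices made for illustration and are not taken from the source.

```python
from collections import Counter

# Approximate standard atomic masses (g/mol) for a few common elements.
# This table is an illustrative assumption, not part of the source.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def constitutional_descriptors(atoms):
    """Return simple constitutional descriptors for a list of element symbols."""
    if not atoms:
        raise ValueError("at least one atom is required")
    counts = Counter(atoms)
    total = len(atoms)
    return {
        "n_atoms": total,                                   # total number of atoms
        "counts": dict(counts),                             # atoms per species
        "percent": {el: 100.0 * n / total for el, n in counts.items()},
        "mol_weight": sum(ATOMIC_MASS[el] for el in atoms),  # molecular weight
    }


# Furan, C4H4O, written out as an explicit atom list.
furan = ["C"] * 4 + ["H"] * 4 + ["O"]
desc = constitutional_descriptors(furan)
```

The same function could be applied to a single fragment or to the whole molecule, since both are just atom lists in this simplified representation.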
  • the molecule is divided into several semi-independent parts (fragments). Each fragment of the molecule has a countable number of free valencies bridging it to neighboring fragments. The single-valent fragments are called substituents. The multiple-valent fragments are called bridging structures.
  • One of the simplest cases is shown in Fig. 3, with a bridging structure component G1 and two substituent group components (R1 and R2).
  • One or several fragments may be missing in a given series of compounds.
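As a sketch of how this fragment representation might be held in software, the following Python example classifies fragments by their number of free valencies. The `Fragment` class, its field names and the `bonds` list are hypothetical and chosen only for illustration; the source does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A molecular fragment with a given number of free valencies."""
    name: str
    free_valencies: int  # bonds linking the fragment to neighboring fragments

    @property
    def kind(self):
        if self.free_valencies < 1:
            raise ValueError("a fragment needs at least one free valency")
        # Single-valent fragments are substituents, all others bridge.
        return "substituent" if self.free_valencies == 1 else "bridging structure"


# A molecule in the style of Fig. 3: R1 and R2 attached to G1.
r1 = Fragment("R1", 1)
r2 = Fragment("R2", 1)
g1 = Fragment("G1", 2)
bonds = [("R1", "G1"), ("G1", "R2")]  # which fragments are bonded to each other

kinds = {f.name: f.kind for f in (r1, r2, g1)}
```

A series in which one fragment is missing would simply omit that entry and its bonds.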
  • Fig. 4 illustrates an example of the structural difference of the model fragment compounds for the differently bonded furyl fragments (B) in two different phenylfurans (AB).
  • the descriptors can be calculated for the molecules/fragments in the respective dielectric environment, using the spherical-cavity, ellipsoidal-cavity, multi-cavity or polarizable continuum model (PCM) reaction field quantum chemical molecular wave functions.
  • PCM polarizable continuum model
  • the whole molecule QSAR/QSPR models are developed either as the respective artificial neural networks (ANN) or as the whole molecule descriptor-based multi-linear regression equations (equation 1). These have the following general form:
  • the fragment QSAR/QSPR models are developed either as the respective fragment based artificial neural networks (FANN) or as the fragment descriptor-based multi-linear cross-term regression equations. The latter have the general form of the following expansion (equation 2):
  • The summations in equation 2 are carried out over all applicable descriptors for a given model.
  • the regression coefficients in equation 2 are determined by the least squares technique.
  • the summation in equation 3 is carried out over all fragments in the molecular structure.
  • the summation in equation 4 is carried out for all adjacent, i.e. bonded fragment pairs (a,b), and the summation in equation 5 is applicable for all bonded fragment triples (a,b,c).
  • similarly defined terms for fragment quadruplets, quintuplets etc. can be used.
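The fragment, pair and triple summations described above can be sketched in Python. Equations 3 to 5 are not reproduced in this text, so the summands used here are assumptions: a plain sum of one descriptor over all fragments, and products of that descriptor over bonded pairs and bonded triples. All function names are hypothetical.

```python
from itertools import combinations


def single_term(d):
    """Sum of a descriptor over all fragments."""
    return sum(d.values())


def pair_term(d, bonds):
    """Sum of descriptor products over all bonded fragment pairs (a, b)."""
    return sum(d[a] * d[b] for a, b in bonds)


def triple_term(d, bonds):
    """Sum of descriptor products over bonded triples (a, b, c).

    A triple is taken to be a chain a-b-c sharing the middle fragment b.
    """
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    total = 0.0
    for b, nbrs in neighbours.items():
        for a, c in combinations(sorted(nbrs), 2):
            total += d[a] * d[b] * d[c]
    return total


# One descriptor value per fragment of an R1-G1-R2 molecule (made-up numbers).
d = {"R1": 2.0, "G1": 3.0, "R2": 5.0}
bonds = [("R1", "G1"), ("G1", "R2")]

x1 = single_term(d)         # 2 + 3 + 5
x2 = pair_term(d, bonds)    # 2*3 + 3*5
x3 = triple_term(d, bonds)  # 2*3*5
```

Terms for quadruplets and longer chains could be built the same way by enumerating longer bonded paths.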
  • the fragment-based artificial neural network is composed of a number of single processing elements (PE) or units (nodes). Each PE has weighted inputs, a transfer function and one output. PEs are connected with coefficients (weights) and are organized in a layered topology as follows: (i) the input layer, (ii) the output layer and (iii) the hidden layers between them. The number of layers and the number of units in each layer determine the functional complexity of the FANN (Fig. 5). Each input layer node of FANN corresponds to a single independent variable defined by equations 3-5, etc. with the exception of the bias node. For the nodes involving descriptors related to the extreme (minimum or maximum) value of some structural property of the whole molecule, equations 9 and 10 are applicable. Each output layer node of FANN corresponds to a dependent variable (property under investigation).
  • Associated with each node is an internal state designated by I_i, H_h and O_m for the input, hidden and output layers, respectively.
  • Each of the input and hidden layer has an additional unit, termed a bias unit, whose internal state is assigned a value of 1.
  • the input layer's I_i values are related to the corresponding independent variables by the scaling equation (equation 11):
  • W OT/ is the bond that connects output unit m to hidden layer bias unit.
  • the network calculated O m values are within the range [0,1].
  • the training of the FANN is achieved by minimizing an error function E with respect to the bond weights of the input-to-hidden and hidden-to-output connections (the latter denoted W_mh):
  • E_p is the error of the pth training pattern, defined as the set of descriptors and activity corresponding to the pth data point, or chemical compound; a_pm corresponds to the experimentally measured value of the mth dependent variable.
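The layered structure described above can be sketched as a forward pass in Python. This is a minimal illustration under stated assumptions: the transfer function is taken to be the logistic sigmoid (which keeps the outputs within [0,1]), and the pattern error is taken to be a sum of squared differences, since the corresponding equations are not reproduced in this text. The function names and the weight values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic transfer function; its values lie strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, w_output):
    """Forward pass of a three-layer network with bias units fixed at 1.

    w_hidden[h] holds one weight per input plus a final bias weight;
    w_output[m] holds one weight per hidden unit plus a final bias weight.
    """
    i_state = list(inputs) + [1.0]  # input states plus bias unit
    h_state = [sigmoid(sum(w * x for w, x in zip(row, i_state)))
               for row in w_hidden] + [1.0]  # hidden states plus bias unit
    return [sigmoid(sum(w * h for w, h in zip(row, h_state)))
            for row in w_output]


def pattern_error(outputs, targets):
    """Squared error of one training pattern (assumed form of E_p)."""
    return sum((a - o) ** 2 for a, o in zip(targets, outputs))


# Two scaled inputs, two hidden units, one output; the weights are arbitrary.
w_hidden = [[0.5, -0.5, 0.0], [1.0, 1.0, -1.0]]
w_output = [[0.8, -0.3, 0.1]]
outputs = forward([0.2, 0.8], w_hidden, w_output)
error = pattern_error(outputs, [1.0])
```

Training would then adjust `w_hidden` and `w_output` to reduce the summed error over all patterns, for example by back-propagation as named in Fig. 5.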
  • the fragment library is constructed from the molecular structures obtained by the fragmentation of the compounds from the QSAR/QSPR model data set.
  • the additional fragments are added based on various chemical or structural similarities with the fragments from the model data set or using chemical intuition.
  • a full set of molecular descriptors used in the development of the QSAR/QSPR models with fragmental descriptors is calculated for each fragment in the library.
  • Loop B Laboratory chemical synthesis of the best predicted novel molecular structures and the measurement of their targeted properties. After that, the modeling set of structures and the initial database of property values are extended by inclusion of these data and the whole procedure is repeated (control returned to block 12).
  • Partial Positive Surface Area is the sum of the solvent-accessible surface areas, S_A, of all positively charged atoms.
  • Total charge weighted partial positive surface area is the sum of all the atomic partial positive charges, q_A, multiplied by the sum of the solvent-accessible surface areas, S_A, of all positively charged atoms.
  • Atomic charge weighted partial positive surface area is the sum of the products of the atomic solvent-accessible surface areas S_A and partial charges q_A over all positively charged atoms:
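The three partial positive surface area descriptors defined above translate directly into code. The following Python sketch assumes each atom is given as a (partial charge, solvent-accessible surface area) pair; the function name, the dictionary keys and the numerical values are hypothetical.

```python
def positive_surface_descriptors(atoms):
    """Compute charged partial surface area terms for positively charged atoms.

    atoms: list of (partial_charge, solvent_accessible_area) tuples.
    """
    positive = [(q, s) for q, s in atoms if q > 0]
    area_sum = sum(s for _, s in positive)
    charge_sum = sum(q for q, _ in positive)
    return {
        # sum of S_A over positively charged atoms
        "partial_positive_area": area_sum,
        # (sum of q_A) multiplied by (sum of S_A)
        "total_charge_weighted": charge_sum * area_sum,
        # sum of q_A * S_A
        "atomic_charge_weighted": sum(q * s for q, s in positive),
    }


# Three atoms with made-up charges and surface areas; one is negative.
atoms = [(0.5, 10.0), (0.25, 4.0), (-0.75, 20.0)]
cpsa = positive_surface_descriptors(atoms)
```

The negatively charged atom is ignored by all three terms, which is why only the first two entries contribute.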
  • TMSA total molecular surface area
  • equation 21 is identical to equation 6, for the special case of the single constant bridging fragment.
  • Table 1 presents the results of the correlation between the whole-molecule CPSAs and those calculated using the fragmental QSPR approach. The predictions are accurate: in most cases the pair correlation coefficient R² > 0.95.
  • HASA1 = Σ s_A, summed over the H-acceptor atoms A (eq. 22)
  • HASA2 ⁇ JJ A ⁇ (eq. 23)
  • HACA2 ⁇ q A ⁇ (eq. 25)
  • q_D is the partial charge on the hydrogen bonding donor (H) atom(s) and S_D denotes the surface area for this (these) atom(s).
  • H hydrogen bonding donor
  • the hydrogen atoms in the α-position to carbonyl and cyano groups are considered as possible hydrogen bonding donors.
  • For the H-acceptor dependent H-donor surface areas, only a number of H-donors equal to the number of H-acceptor atoms is taken into account, for the same reasons of the possible H-bonds in the molecule.
  • Fig. 7 illustrates an example of a suitable computing system environment 100 on which a system for the steps of the claimed method and apparatus may be implemented.
  • the computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the method or apparatus of the claims. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
  • the steps of the claimed method and apparatus are operational with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the methods or apparatus of the claims include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the methods and apparatus may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including memory storage devices.
  • an exemplary system for implementing the steps of the claimed method and apparatus includes a general purpose computing device in the form of a computer 110.
  • Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120.
  • the system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
  • ISA Industry Standard Architecture
  • MCA Micro Channel Architecture
  • EISA Enhanced ISA
  • VESA Video Electronics Standards Association
  • PCI Peripheral Component Interconnect
  • Computer 110 typically includes a variety of computer readable media.
  • Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer readable media may comprise computer storage media and communication media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110.
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
  • the system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132.
  • ROM read only memory
  • RAM random access memory
  • BIOS basic input/output system
  • RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120.
  • Fig. 7 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
  • the computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
  • Fig. 7 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media.
  • removable/nonremovable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
  • the hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.
  • the drives and their associated computer storage media discussed above and illustrated in Fig. 7, provide storage of computer readable instructions, data structures, program modules and other data for the computer 110.
  • hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
  • a user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad.
  • Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
  • These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB).
  • a monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190.
  • computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190.
  • the computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180.
  • the remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in Fig. 7.
  • the logical connections depicted in Fig. 7 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks.
  • LAN local area network
  • WAN wide area network
  • Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
  • When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet.
  • the modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism.
  • program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device.
  • Fig. 7 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
  • Although the fragmental molecular design system and method, and other elements, have been described as preferably being implemented in software, they may be implemented in hardware, firmware, etc., and may be implemented by any other processor.
  • the elements described herein may be implemented in a standard multi-purpose CPU or on specifically designed hardware or firmware such as an application-specific integrated circuit (ASIC) or other hard-wired device as desired, including, but not limited to, the computer 110 of Fig. 7.
  • ASIC application-specific integrated circuit
  • the software routine may be stored in any computer readable memory such as on a magnetic disk, a laser disk, or other storage medium, in a RAM or ROM of a computer or processor, in any database, etc.
  • this software may be delivered to a user or a process plant via any known or desired delivery method including, for example, on a computer readable disk or other transportable computer storage mechanism or over a communication channel such as a telephone line, the internet, wireless communication, etc. (which are viewed as being the same as or interchangeable with providing such software via a transportable storage medium).

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

A method and apparatus for the generation of chemical compounds or complex molecular structures having the target numerical values of a physical property, chemical reactivity or biological activity, or their combination. The procedure is based on the separation of compounds and/or structures into distinct chemical entities (fragments). The compounds are generated through mapping the property/reactivity/activity of the whole compound and/or molecular structure to the molecular descriptors of its individual fragments.

Description

METHOD AND APPARATUS FOR THE DESIGN OF CHEMICAL COMPOUNDS WITH PREDETERMINED PROPERTIES
FIELD OF THE TECHNOLOGY
[0001] The present disclosure relates generally to the generation of chemical compounds or other molecular structures with target numerical values of a physical property, chemical reactivity or biological activity, or their combination.
BACKGROUND
[0002] A large variety of theoretical and semi-empirical methods is available for the prediction of the physical properties, chemical reactivity or biological activity of chemical compounds, either using theoretical statistical mechanical, quantum chemical or molecular mechanics methods, or by mapping those properties onto molecular descriptors calculated from the molecular structure alone.
[0003] Such theoretical molecular descriptors have become an important intermediate in connecting the structures of compounds with their physical properties or chemical activities through the quantitative structure-property/activity relationships (QSPR/QSAR). In most cases theoretical descriptors have an explicit mathematical definition based on fundamental physical equations and knowledge of the molecular matter, which makes them particularly helpful in understanding the physical mechanisms and models of the studied phenomena.
[0004] Theoretical molecular descriptors can be divided into several separate groups based on their physical origin or methods of calculation. Constitutional, geometrical, topological, electrostatic or charge distribution related, quantum chemical or molecular orbital (MO) related, solvational, thermodynamic, or combined descriptors can be distinguished (Karelson, M., 2000, 2004). Numerous physical, chemical and biological properties of chemical compounds and other molecular structures have been successfully projected on large spaces of molecular descriptors through different mathematical relationships. A sample, but not exhaustive, list of such properties includes the physico-chemical and analytical properties of chemical compounds, such as boiling points, melting points, critical temperatures, refractive indices, liquid viscosity, partition coefficients, GC retention times and response factors, and UV spectral absorbance; technological properties of materials, such as polymer glass transition temperatures, critical micelle concentrations of surfactants and the efficiency of rubber vulcanization accelerators; the biological activity of potential drug candidates as antibacterials, α-adrenergic antagonists and HIV-1 protease inhibitors, and their transfer to blood, brain and human breast milk; and the toxicity and soil sorption of various pollutants (Karelson, M. et al., 1996, 1997, 1999, 2000a, 2000b, 2004; Katritzky, A.R. et al., 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004a, 2004b, 2005; Huibers, P., et al., 1996, 1997; Menziani, M.C., 1998; Ignatz-Hoover, F. et al., 1999; Hiob, R. et al., 2000, 2002; Sild, S., et al., 2002; Sak, K. et al., 2002; Fitch, W.L. et al., 2002; Fara, D. et al., 2005).
[0005] Molecular structural descriptors, possessing a neighborhood property that is appropriate to whole molecules, specifically the CoMFA fields, have been used to generate similar chemical compounds constructed from component parts (Cramer, R.D. et al., 1994, 2001, 2004). However, the validation is carried out on the metric comparison (structural distance) of these compounds, not on the actual property of interest. In another approach, the specifically targeted molecules that exhibit an activity associated with a parent molecule are constructed from chemical groups of similar activity (e.g., amino acid residues). The closeness of the generated compound to the target molecule is validated through molecular dynamics simulation (Casset, F. et al., 2001). In this case, the molecular structure of the chemical or biological counterpart (e.g., receptor) of the parent molecule (e.g., ligand) has to be known in order to use the validation criterion from molecular dynamics. In searching the combinatorial library of the targeted molecules against the biological counterpart, a Monte Carlo simulation method can be applied (Archetti, F. et al., 2006). Another method for the computational design of macromolecules (peptides, nucleic acids or polymers) with specific properties is based on the analysis of conformational energies (Lacroix, E., et al., 2001, 2002).
[0006] A number of patents describe the methodologies and procedures that relate the molecular properties to their molecular constitution and structure (Platt, D. E., 1998; Hurst, J.R. et al., 2001; Agrafiotis, D.K., 2002; Silverman, B.D., 2003; Gottfries, et al., 2004; Labute, P.R., 2004; Lobanov, V.S., et al., 2004; Schwartz, S.D. et al., 2005; Miller, D.W. et al., 2005; Uchiyama, M., et al., 2002; Cheng, A.; Merz, K.M. Jr., 2003; Kovesdi, I. et al., 2004).
SUMMARY
[0007] The present disclosure describes an automatic computer-aided method and system for the creation of novel molecular structures and/or individual chemical compounds with predetermined (targeted) property values. The procedure is based on the separation of compounds and/or structures into distinct chemical entities (fragments). The compounds are generated through mapping the property/reactivity/activity of the whole compound and/or molecular structure to the molecular descriptors of its individual fragments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] Fig. 1 is a schematic diagram demonstrating the irreversibility of quantum mechanic and molecular dynamic equations contrasted with the design of molecular structures based upon predetermined properties;
[0009] Fig. 2 is an example of a fragmental molecular design routine for a computational development of novel molecular structures with predetermined chemical, physical or biomedical properties;
[0010] Fig. 3 is an example of a representation of a molecular structure of a single molecule through three interconnected fragments, R1, R2 and G1;
[0011] Fig. 4 is an example of a representation of a structural difference of model fragment compounds for differently bonded furyl-fragments in two different phenylfurans;
[0012] Fig. 5 is an example of an architecture of a three-layer back-propagation fragment-based neural network;
[0013] Fig. 6 is an example of a set of disubstituted (X, Y) benzenes; and
[0014] Fig. 7 is a block diagram of a computing system that may operate in accordance with the invention.
DESCRIPTION
[0015] Molecular descriptors provide the sole tool for constructing molecular structures with predetermined properties. The approaches based on the use of quantum mechanics (Schrödinger equation) or molecular dynamics (based on Newtonian mechanics and statistical physics) are not applicable to this so-called reverse task because the respective fundamental equations are irreversible (Fig. 1).
[0016] The fragmental molecular design system and method as described herein is aimed at the computational development of novel molecular structures with predetermined chemical, physical and/or biomedical properties. Fig. 2 is an example of a fragmental molecular design routine 10 which may be used to develop new molecular structures having predetermined chemical, physical and/or biomedical properties. Referring to Fig. 2, and as described further below, the system and method involve the following:
1) the creation of the modeling set as a database of experimental data on the properties of interest;
2) the determination of the fragmental and the whole molecule descriptors for the chemical compounds, or other molecular structures, in the modeling set;
3) the development of QSAR/QSPR models based on fragmental molecule descriptors and whole molecule descriptors;
4) the creation of a fragment library for the construction of the novel chemical compounds, or other molecular structures;
5) the construction of novel chemical compounds, or other molecular structures, as possible combinations, and preferably all possible combinations, of molecular fragments from the fragment library, and the prediction of novel chemical compounds, or other molecular structures, having the best fitting target numerical property values using the QSAR/QSPR models based on the fragmental molecule descriptors;
6) the validation of predictions using the QSAR/QSPR models with the whole molecule descriptors and/or experimental measurements on the predicted compounds.
[0017] The whole process (loop 1-6) can be continued iteratively. Each aspect of the process (1-6) described above and below is noted in Fig. 2 with reference numerals 1-6, along with corresponding reference numerals for each block depicted therein.
1) The creation of the modeling set as a database of experimental data on the properties of interest.
[0018] Referring to blocks 14 and 16 of Fig. 2, the data are the natural experimental measurement data or numerical data related to them through a known mathematical relationship such as the proportionality, the linear relationship or some other normalization scheme. The data can represent the physical or spectroscopic properties, chemical reactivity or biological activity of the molecular structures. The molecular structures may be the individual molecules, the complexes or arrays of molecules.
[0019] The data includes the following features:
• one-to-one correspondence between the property value and the precise 2D representation of molecular structure;
• variability in the chemical composition or nanoscale arrangement of the molecular structures; and
• the possibility to divide the molecular structures into distinct fragments.
[0020] The construction of novel molecules with the best target property values can be carried out for a single molecular property or a set of molecular properties. A typical case is medical drug design, where, apart from the specific drug-receptor interaction, the so-called ADME (Absorption, Distribution, Metabolism, and Excretion) properties that describe the disposition of a pharmaceutical compound within an organism should also be optimized.
2) The determination of the fragmental and the whole molecule descriptors for the compounds in the modeling set.
[0021] Referring to block 18 of Fig. 2, the whole molecule descriptors are calculated for the whole molecular structure. The molecular descriptors include the following generic types:
a) constitutional descriptors, calculated from the total number of atoms in the fragment or the whole molecule, the number and percentage of atoms of a given atomic species, the number and percentage of alicyclic and aromatic carbocycles, heterocycles, the number of functional groups in the fragment or the whole molecule, and the molecular weight of the fragment or the whole molecule;
b) topological descriptors, calculated from the connectivity matrices between the atoms in the fragment or the whole molecule;
c) geometrical descriptors, calculated from the geometrical parameters of molecular fragments or the whole molecule (bond lengths, bond angles, dihedral angles), the mass distribution in the fragment and gravitational indices;
d) electrostatic descriptors, calculated using the three-dimensional molecular structure of the fragments or the whole molecule represented by their 3D atomic coordinates, atomic charges and the size of the atoms in the fragment or the whole molecule. The atomic charges can be calculated using Sanderson's electronegativity equalization principle and its variations or quantum chemically. The size of atoms can be calculated proceeding from their van der Waals radii or the cutoff electronic charge distribution; and
e) quantum chemical descriptors, calculated from the quantum-mechanical wave function of a molecule.
[0022] Referring to block 20 of Fig. 2, in order to calculate the fragmental descriptors, the molecule is divided into several semi-independent parts (fragments). Each fragment of the molecule has a countable number of free valencies bridging it to neighboring fragments. The single-valent fragments are called substituents. The multiple-valent fragments are called bridging structures. One of the simplest cases is shown in Fig. 3, with a bridging structure component G1 and two substituent group components (R1 and R2). One or several fragments may be missing in a given series of compounds.
[0023] The same generic types of molecular descriptors as for the whole molecular structure are calculated for the molecular fragments, either directly or using the standard terminating molecular fragments (entities), such as hydrogen, alkyl (e.g., methyl) or substituted or unsubstituted aryl (e.g., phenyl). In order to distinguish the fragment connecting bond, it can be substituted by a pseudo-hydrogen atom (Hp), having different orbital exponent(s) in the calculation of molecular descriptors for such a fragment. Depending on the quantum-chemical basis set, the exponents for pseudo-hydrogen atoms are optimally 85 to 115 % of the original orbital exponents. This ensures a sufficient difference in the molecular descriptor values of the same but differently bonded fragment. Fig. 4 illustrates an example of the structural difference of the model fragment compounds for the differently bonded furyl fragments (B) in two different phenylfurans (AB).
[0024] In the case of properties applicable for the molecules or molecular structures in condensed disordered media (liquids, solutions, polymers), the descriptors can be calculated for the molecules/fragments in the respective dielectric environment, using the spherical-cavity, ellipsoidal-cavity, multi-cavity or polarizable continuum model (PCM) reaction field quantum chemical molecular wave functions.
3) The development of the QSAR/QSPR models based on the fragmental and the whole molecule descriptors.
[0025] Referring to block 22 of Fig. 2, the whole molecule QSAR/QSPR models are developed either as the respective artificial neural networks (ANN) or as the whole molecule descriptor-based multi-linear regression equations (equation 1). These have the following general form:
$$P = P_0 + \sum_{i} c_i D_i \qquad \text{(eq. 1)}$$
where $P$ is the property investigated, $P_0$ is its standard value, $D_i$ are the relevant molecular descriptors and $c_i$ are the respective regression coefficients.
[0026] Referring to block 24 of Fig. 2, the fragment QSAR/QSPR models are developed either as the respective fragment-based artificial neural networks (FANN) or as the fragment descriptor-based multi-linear cross-term regression equations. The latter have the general form of the following expansion (equation 2):
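$$P = P_0 + \sum_{i} a_i D_i^{(1)} + \sum_{j} b_j D_j^{(2)} + \sum_{k} c_k D_k^{(3)} + \ldots \qquad \text{(eq. 2)}$$
where $P$ is the property modeled and the derived descriptors ($D_i^{(1)}$, $D_j^{(2)}$, $D_k^{(3)}$, etc.) are defined as follows (equations 3, 4 and 5, respectively):
$$D_i^{(1)} = \sum_{a} d_{ia} \qquad \text{(eq. 3)}$$
$$D_j^{(2)} = \sum_{(a,b)} d_{ja}\, d_{jb} \qquad \text{(eq. 4)}$$
$$D_k^{(3)} = \sum_{(a,b,c)} d_{ka}\, d_{kb}\, d_{kc} \qquad \text{(eq. 5)}$$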
[0027] The summations in equation 2 are carried out over all applicable descriptors for a given model. The coefficients $a_i$, $b_j$ and $c_k$ are determined by the least squares technique. The summation in equation 3 is carried out over all fragments in the molecular structure. The summation in equation 4 is carried out for all adjacent, i.e. bonded, fragment pairs (a,b), and the summation in equation 5 is applicable for all bonded fragment triples (a,b,c). According to equation 2, similarly defined terms for fragment quadruplets, quintuplets etc. can be used.
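The construction of the derived descriptors and the least-squares fit described in paragraph [0027] can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function names `derived_descriptors` and `fit_expansion`, the array layout, and the use of NumPy are assumptions made for the example, and for simplicity every descriptor is carried through all three orders.

```python
import numpy as np


def derived_descriptors(d, bonded_pairs, bonded_triples=()):
    """Build the derived descriptors of equations 3-5 for one molecule.

    d              -- array (n_fragments, n_descriptors); d[a, i] is the
                      i-th descriptor of fragment a
    bonded_pairs   -- index pairs (a, b) of directly bonded fragments
    bonded_triples -- index triples (a, b, c) of bonded fragment chains
    """
    d = np.asarray(d, dtype=float)
    n_desc = d.shape[1]
    D1 = d.sum(axis=0)                       # eq. 3: sum over all fragments
    D2 = np.zeros(n_desc)
    for a, b in bonded_pairs:                # eq. 4: sum over bonded pairs
        D2 += d[a] * d[b]
    D3 = np.zeros(n_desc)
    for a, b, c in bonded_triples:           # eq. 5: sum over bonded triples
        D3 += d[a] * d[b] * d[c]
    return D1, D2, D3


def fit_expansion(X, p):
    """Least-squares fit of P = P0 + X @ coeffs (equation 2, truncated).

    X -- array (n_compounds, n_terms) with one row of derived
         descriptors per compound
    p -- measured property values, one per compound
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # leading column carries P0
    coef, *_ = np.linalg.lstsq(A, np.asarray(p, dtype=float), rcond=None)
    return coef[0], coef[1:]
```

In use, each compound of the modeling set would contribute one row, formed by concatenating the selected entries of D1, D2 and D3, to the matrix passed to `fit_expansion`.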
[0028] In the special case described in Fig. 3, the respective equations are reduced to the following set:
$$P = P_0 + \sum_{i} a_i D_i^{(1)} + \sum_{j} b_j D_j^{(2)} \qquad \text{(eq. 6)}$$
where $P$ is the property investigated and the two derived descriptors ($D_i^{(1)}$ and $D_j^{(2)}$) are defined as follows (equations 7 and 8, respectively):
$$D_i^{(1)} = d_{iR1} + d_{iR2} + d_{iG1} \qquad \text{(eq. 7)}$$
and
$$D_j^{(2)} = d_{jR1}\, d_{jG1} + d_{jR2}\, d_{jG1} \qquad \text{(eq. 8)}$$
[0029] Thus, these terms correspond to the linear and cross-terms of fragment descriptors, respectively. In the equations above, $d_{iR1}$, $d_{iR2}$ and $d_{iG1}$ denote the i-th molecular descriptor values for fragments R1, R2 and G1, respectively, and $a_i$ and $b_j$ are the respective least-squares regression coefficients. The descriptors related to the extreme (minimum or maximum) value of some structural property of the whole molecule, such as the most negative atomic partial charge or the maximum valency for a given atomic type in the molecule, are treated separately. As only the extreme value within all fragments in the molecule has physical meaning in the case of such descriptors, those values can be used as follows:
$$D_i^{(1)} = \max\left(d_{iR1},\, d_{iR2},\, d_{iG1}\right) \qquad \text{(eq. 9)}$$
or
$$D_j^{(1)} = \min\left(d_{jR1},\, d_{jR2},\, d_{jG1}\right) \qquad \text{(eq. 10)}$$
where $d_i$ belong to the maximum and $d_j$ to the minimum value descriptors.
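For the three-fragment case of Fig. 3 (R1 and R2 each bonded to G1), the linear, cross and extreme-value terms can be written out directly. The sketch below is illustrative only; the function names and the sample partial charges are hypothetical and do not come from the disclosure.

```python
def special_case_terms(d_r1, d_r2, d_g1):
    """Linear and cross terms for one descriptor of an R1-G1-R2 molecule."""
    linear = d_r1 + d_r2 + d_g1              # eq. 7
    cross = d_r1 * d_g1 + d_r2 * d_g1        # eq. 8: only bonded pairs
    return linear, cross


def extreme_term(d_r1, d_r2, d_g1, kind):
    """Extreme-value descriptor: the max (eq. 9) or min (eq. 10) over fragments."""
    values = (d_r1, d_r2, d_g1)
    if kind == "max":
        return max(values)
    if kind == "min":
        return min(values)
    raise ValueError("kind must be 'max' or 'min'")


# Hypothetical example: the most negative atomic partial charge found
# in each fragment; only the overall minimum is physically meaningful.
most_negative = extreme_term(-0.31, -0.12, -0.45, "min")
```

Note that R1 and R2 are not bonded to each other in this scheme, so no R1·R2 product appears in the cross term.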
[0030] The fragment-based artificial neural network (FANN) is composed of a number of single processing elements (PE) or units (nodes). Each PE has weighted inputs, a transfer function and one output. PEs are connected with coefficients (weights) and are organized in a layered topology as follows: (i) the input layer, (ii) the output layer and (iii) the hidden layers between them. The number of layers and the number of units in each layer determine the functional complexity of the FANN (Fig. 5). Each input layer node of FANN corresponds to a single independent variable defined by equations 3-5, etc. with the exception of the bias node. For the nodes involving descriptors related to the extreme (minimum or maximum) value of some structural property of the whole molecule, equations 9 and 10 are applicable. Each output layer node of FANN corresponds to a dependent variable (property under investigation).
[0031] Associated with each node is an internal state designated by $I_i$, $H_h$ and $O_m$ for the input, hidden and output layers, respectively. Each of the input and hidden layers has an additional unit, termed a bias unit, whose internal state is assigned a value of 1. The input layer's $I_i$ values are related to the corresponding independent variables by the scaling equation (equation 11):
$$I_i = \frac{D_i - D_{i(\min)} + 0.1}{D_{i(\max)} - D_{i(\min)} + 0.1} \qquad \text{(eq. 11)}$$
where $D_i$ is the value of the ith input node, and $D_{i(\max)}$ and $D_{i(\min)}$ are its maximum and minimum values, respectively. The state $H_h$ of each hidden unit is calculated by the squashing (sigmoid, logistic) function (equations 12 and 13):
$$H_h = \frac{1}{1 + e^{-\varphi_h}} \qquad \text{(eq. 12)}$$
$$\varphi_h = \sum_{i} w_{hi} I_i + \theta_h \qquad \text{(eq. 13)}$$
where $w_{hi}$ is the weight of the bond that connects hidden unit h with input unit i and $\theta_h$ is the weight connecting hidden unit h to the input layer bias unit. The state $O_m$ of output unit m is calculated by:
$$O_m = \frac{1}{1 + e^{-\varphi_m}} \qquad \text{(eq. 14)}$$
$$\varphi_m = \sum_{h} W_{mh} H_h + \theta_m \qquad \text{(eq. 15)}$$
where $W_{mh}$ is the weight of the bond that connects output unit m to hidden unit h and $\theta_m$ is the weight connecting output unit m to the hidden layer bias unit. The network calculated $O_m$ values are within the range [0,1].
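The forward pass of a three-layer network of this kind can be sketched in a few lines. The code is a hedged illustration: the function names and the NumPy array layout are assumptions, the bias weights are passed as separate vectors rather than as an extra unit fixed at 1 (which is equivalent), and the input scaling follows one reading of equation 11, in which 0.1 is added to both numerator and denominator.

```python
import numpy as np


def scale_inputs(D, D_min, D_max):
    """Scale raw descriptor values onto (0, 1] (one reading of eq. 11)."""
    D = np.asarray(D, dtype=float)
    return (D - D_min + 0.1) / (D_max - D_min + 0.1)


def sigmoid(phi):
    """Logistic squashing function used for hidden and output units."""
    return 1.0 / (1.0 + np.exp(-phi))


def fann_forward(I, w, theta_h, W, theta_m):
    """Propagate scaled inputs I through one hidden layer.

    w       -- (n_hidden, n_input) input-to-hidden weights
    theta_h -- (n_hidden,) weights to the input layer bias unit
    W       -- (n_output, n_hidden) hidden-to-output weights
    theta_m -- (n_output,) weights to the hidden layer bias unit
    """
    H = sigmoid(w @ I + theta_h)    # hidden states, eqs. 12-13
    O = sigmoid(W @ H + theta_m)    # output states, eqs. 14-15
    return H, O
```

Because the output units are sigmoidal, the predicted values always lie strictly between 0 and 1, which is why the measured property values must be scaled to the same range before training.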
[0032] The training of the FANN is achieved by minimizing an error function E with respect to the bond weights {$w_{hi}$, $W_{mh}$}:
$$E = \sum_{p} E_p = \frac{1}{2} \sum_{p} \sum_{m} \left(a_{pm} - O_{pm}\right)^2 \qquad \text{(eq. 16)}$$
where $E_p$ is the error of the pth training pattern, defined as the set of descriptors and activity corresponding to the pth data point, i.e. chemical compound; $a_{pm}$ corresponds to the experimentally measured value of the mth dependent variable. These values are also scaled in the same manner as in equation 11. The procedure of finding a gradient vector in a network structure is generally referred to as back-propagation because the gradient vector is calculated in the direction opposite to the flow of the output of each node.
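A single back-propagation update for one training pattern might look like the sketch below. It assumes a per-pattern error of half the sum of squared differences between measured and calculated outputs and plain gradient descent with a fixed learning rate; both choices, along with the function names and the `rate` parameter, are assumptions for illustration rather than details stated in the disclosure.

```python
import numpy as np


def sigmoid(phi):
    return 1.0 / (1.0 + np.exp(-phi))


def train_step(I, a, w, theta_h, W, theta_m, rate=0.5):
    """One gradient-descent update for a single training pattern.

    Returns the pattern error before the update and the updated weights.
    """
    # Forward pass through hidden and output layers
    H = sigmoid(w @ I + theta_h)
    O = sigmoid(W @ H + theta_m)
    error = 0.5 * np.sum((a - O) ** 2)          # assumed form of E_p

    # Backward pass: the gradient is propagated from the output layer
    # back to the hidden layer, opposite to the flow of the outputs.
    delta_m = (O - a) * O * (1.0 - O)           # dE_p / d(phi_m)
    delta_h = (W.T @ delta_m) * H * (1.0 - H)   # dE_p / d(phi_h)

    new_W = W - rate * np.outer(delta_m, H)
    new_theta_m = theta_m - rate * delta_m
    new_w = w - rate * np.outer(delta_h, I)
    new_theta_h = theta_h - rate * delta_h
    return error, (new_w, new_theta_h, new_W, new_theta_m)
```

Repeating `train_step` over all patterns of the modeling set until the summed error stops decreasing gives a basic training loop; practical implementations usually add momentum or early stopping, which are omitted here.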
4) The creation of the fragment library for the construction of the novel chemical compounds.
[0033] Referring to blocks 26 and 28 of Fig. 2, the fragment library is constructed from the molecular structures obtained by the fragmentation of the compounds from the QSAR/QSPR model data set. The additional fragments are added based on various chemical or structural similarities with the fragments from the model data set or using chemical intuition.
[0034] An arbitrary number of restrictive conditions on the structure of the molecules built from fragments can be imposed, to prevent the inclusion of undesirable compounds. Such restrictions include:
• require the presence of a certain chemical functionality (e.g., the compounds should have tertiary amino group, acidic hydrogen etc.)
• exclude certain combinations of fragments or molecular structural entities (e.g., the compounds with hydroxyl group adjacent to the double bond are disqualified)
• impose limiting values for any applicable molecular descriptor.
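The three kinds of restriction listed above can be expressed as a single filter applied to each candidate molecule. The sketch below is a simplified illustration: fragments are represented as plain dictionaries, the forbidden-combination test ignores adjacency, descriptor limits are applied to the sum over fragments, and all names and numbers are hypothetical.

```python
def passes_restrictions(fragments, required_groups=(),
                        forbidden_combinations=(), limits=None):
    """Return True if a molecule built from `fragments` is acceptable.

    fragments -- list of dicts with keys 'name', 'groups' (set of
                 functional-group labels) and 'descriptors' (dict)
    required_groups        -- labels that must all be present
    forbidden_combinations -- tuples of fragment names that must not
                              occur together in one molecule
    limits -- {descriptor: (low, high)} bounds on the summed descriptor
    """
    present_groups = set()
    for frag in fragments:
        present_groups |= frag["groups"]
    if not set(required_groups) <= present_groups:
        return False

    names = {frag["name"] for frag in fragments}
    for combo in forbidden_combinations:
        if set(combo) <= names:
            return False

    for key, (low, high) in (limits or {}).items():
        total = sum(frag["descriptors"][key] for frag in fragments)
        if not low <= total <= high:
            return False
    return True


# Hypothetical fragments with approximate masses, for illustration only
amine = {"name": "NMe2", "groups": {"tertiary_amine"}, "descriptors": {"mass": 44.0}}
phenylene = {"name": "C6H4", "groups": set(), "descriptors": {"mass": 76.0}}
hydroxyl = {"name": "OH", "groups": {"hydroxyl"}, "descriptors": {"mass": 17.0}}
molecule = [amine, phenylene, hydroxyl]
```

A rule such as "hydroxyl group adjacent to a double bond" would additionally need the bonding pattern between fragments, which this simplified filter does not track.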
[0035] A full set of the molecular descriptors used in the development of the QSAR/QSPR models with fragmental descriptors is calculated for each fragment in the library.
5) The construction of novel chemical compounds as all possible combinations of molecular fragments from the fragment library and the prediction of novel chemical compounds having the best fitting target numerical property values using the QSAR/QSPR models based on the fragmental molecule descriptors.
[0036] Referring to blocks 30 and 32 of Fig. 2, using the fragments from the library described above, possible combinations of the fragments, and preferably all possible combinations of the fragments, are used to generate new compounds, or other molecular structures. For example, proceeding from the simple fragmentation scheme (Fig. 3) with two substituents and one bridge structure, the number of possible new compounds is 1,000,000 using a library of 100 substituents together with the 100 bridge structures. Provided that the fragmental descriptors for all fragments are available, the property values can be predicted from the respective fragmental QSAR/QSPR models and the compounds with the optimum (target) property values can be determined.
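The enumeration and ranking step can be sketched with the standard library alone. In this illustration the fragments are stand-in values, `predict` is any callable wrapping a fitted fragmental QSAR/QSPR model, and the function names are hypothetical. Ordered pairs of substituents are enumerated, which is what gives 100 × 100 × 100 = 1,000,000 candidates for the library sizes quoted above.

```python
import heapq
from itertools import product


def enumerate_candidates(substituents, bridges):
    """Yield every (R1, G1, R2) combination for the scheme of Fig. 3."""
    for r1, g1, r2 in product(substituents, bridges, substituents):
        yield r1, g1, r2


def best_candidates(substituents, bridges, predict, target, k=10):
    """Return the k combinations whose predicted property is closest to target.

    predict -- callable (r1, g1, r2) -> predicted property value
    """
    return heapq.nsmallest(
        k,
        enumerate_candidates(substituents, bridges),
        key=lambda combo: abs(predict(*combo) - target),
    )
```

Using a generator together with `heapq.nsmallest` keeps only the k best candidates in memory, so the full combinatorial set never has to be stored.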
6) The validation of the predictions by using the QSAR/QSPR models with the whole molecule descriptors and/or experimental measurements on the predicted compounds.
[0037] Referring to block 34 of Fig. 2, because of the smaller number of regression coefficients and the higher reliability of the calculated descriptors, the QSAR/QSPR equations based on whole molecule descriptors are more robust. Therefore, for the constructed compounds with the closest property values to the target values, those values will also be predicted using the QSAR/QSPR models with whole molecule descriptors.
[0038] After the evaluation of the quality of results (closeness of the property values from the QSAR/QSPR equations based on whole molecule descriptors for predicted novel molecular structures to the target property values), two iterative loop routes are applicable for the improvement of the results (Fig. 2):
Loop A. The heuristic extension of the fragment library and repetition of the procedure (control returned to block 26).
Loop B. Laboratory chemical synthesis of the best predicted novel molecular structures and the measurement of their targeted properties. After that, the modeling set of structures and the initial database of property values are extended by inclusion of these data and the whole procedure is repeated (control returned to block 12).
[0039] The procedure is completed after the satisfaction of the requirements for the target values of the properties or after a predefined number of iterative loops of the prediction of novel molecular structures.
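The control flow of Loop A, with the two stopping criteria of paragraph [0039], can be sketched as below. This is a schematic under stated assumptions: all four callables are placeholders supplied by the caller, candidates are treated as opaque objects, and Loop B (synthesis and measurement) is not modeled because it involves laboratory work.

```python
def design_loop(library, predict_fragmental, validate_whole,
                extend_library, target, tolerance, max_loops):
    """Iterate prediction, validation and library extension (Loop A).

    predict_fragmental -- candidate -> property from the fragmental model
    validate_whole     -- candidate -> property from the whole molecule model
    extend_library     -- library -> extended library
    Stops when the validated value is within `tolerance` of `target`
    or after `max_loops` iterations (max_loops must be at least 1).
    """
    if max_loops < 1:
        raise ValueError("max_loops must be at least 1")
    for loop in range(1, max_loops + 1):
        # Step 5: pick the candidate predicted closest to the target
        best = min(library, key=lambda c: abs(predict_fragmental(c) - target))
        # Step 6: validate it with the more robust whole molecule model
        if abs(validate_whole(best) - target) <= tolerance:
            return best, loop
        # Loop A: extend the fragment library and repeat
        library = extend_library(library)
    return best, max_loops
```

The function returns both the selected candidate and the number of loops used, so a caller can tell whether the target was met or the iteration budget ran out.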
Examples
Example 1
Fragment-based prediction of charged partial surface areas (CPSA) of disubstituted benzenes
[0040] Local electron densities or charges determine the mechanism and rate of most chemical reactions and physico-chemical properties of compounds and thus the charged partial surface areas (CPSAs) have been computed and widely applied in QSAR/QSPR treatments of intermolecular interactions (D.T. Stanton et al., 1990, 1992; M. Karelson, 2000).
[0041] Partial positive surface area (PPSA1) is the sum of the solvent-accessible surface areas, $S_A$, of all positively charged atoms:
$$PPSA1 = \sum_{A} S_A \qquad \text{(eq. 17)}$$
[0042] Total charge weighted partial positive surface area (PPSA2) is the sum of all the atomic partial positive charges, $q_A$, multiplied by the sum of the solvent-accessible surface areas, $S_A$, of all positively charged atoms:
$$PPSA2 = \sum_{A} q_A \sum_{A} S_A \qquad \text{(eq. 18)}$$
[0043] Atomic charge weighted partial positive surface area (PPSA3) is the sum of the products of the atomic solvent-accessible surface areas $S_A$ and partial charges $q_A$ over all positively charged atoms:
$$PPSA3 = \sum_{A} q_A S_A \qquad \text{(eq. 19)}$$
[0044] Similar equations can be developed for the description of the partial negative charge distribution in the molecule, by using the summation over the atoms with partial negative charges. The total molecular surface area (TMSA) is defined as the sum over the solvent-accessible surface areas of its constituent atoms.
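Given per-atom partial charges and solvent-accessible surface areas, the three positive-charge descriptors and the total molecular surface area follow directly from their definitions. The sketch below assumes the charges and areas have already been computed elsewhere; the function names and the sample numbers are illustrative only.

```python
def ppsa_descriptors(charges, areas):
    """Return (PPSA1, PPSA2, PPSA3) from per-atom charges and areas.

    charges -- atomic partial charges q_A
    areas   -- solvent-accessible surface areas S_A, same atom order
    Only atoms with a positive partial charge contribute.
    """
    positive = [(q, s) for q, s in zip(charges, areas) if q > 0]
    sum_area = sum(s for _, s in positive)
    sum_charge = sum(q for q, _ in positive)
    ppsa1 = sum_area                           # eq. 17
    ppsa2 = sum_charge * sum_area              # eq. 18
    ppsa3 = sum(q * s for q, s in positive)    # eq. 19
    return ppsa1, ppsa2, ppsa3


def total_molecular_surface_area(areas):
    """TMSA: sum of the solvent-accessible areas of all atoms."""
    return sum(areas)


# Illustrative three-atom example (values are not from the disclosure)
example = ppsa_descriptors([0.5, -0.5, 0.25], [10.0, 20.0, 4.0])
```

The negative-charge analogues are obtained by changing the filter to `q < 0`.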
[0045] A series of ortho-, meta-, and para- disubstituted benzenes were selected as test compounds to determine the mutual effect of two substituents (X and Y) on the quantum chemically calculated values of different CPSA-s (Fig. 6). All combinations of the substituents together resulted in 163 compounds, including benzene itself.
[0046] $D_x$ and $D_y$ are the fragment charged partial surface areas (CPSAs) of the two substituents X and Y, $D_H$ is the CPSA for X = H, $D_{Ph}$ is the fragmental CPSA of the phenyl part of the molecule, and α, α' and α'' are the expansion coefficients.
[0047] The value of the whole molecule descriptor D can then be expressed as follows:
$$D = D_{Ph} + (D_x - D_H) + (D_y - D_H) + \alpha D_{Ph}\left[(D_x - D_H) + (D_y - D_H)\right] + \alpha'(D_x - D_H)(D_y - D_H) \qquad \text{(eq. 20)}$$
where the terms $(D_x - D_H)$ and $(D_y - D_H)$ denote the differences of the substituents from hydrogen. The first two differences in the equation describe the additive properties of the two substituents X and Y, and the next term describes the effect of the interactions between the substituents and the phenyl part. The last term describes the interactions of the substituents with each other.
[0048] After collecting the constant terms and relaxing the coefficient in front of the additive terms, one obtains:
$$D = a_0 + a_1 (D_x + D_y) + a_2 D_x D_y \qquad \text{(eq. 21)}$$
where $a_0$, $a_1$ and $a_2$ are the regression coefficients to be found. Mathematically, equation 21 is identical to equation 6 for the special case of a single constant bridging fragment. In Table 1 below, the results of the correlation between the whole molecule CPSAs and those calculated using the fragmental QSPR approach are presented. The predictions are highly accurate (in most cases the squared pair correlation coefficient $R^2$ > 0.95).
[Table 1 is reproduced as an image in the original publication.]
Table 1. Correlation coefficients between the whole molecule CPSAs of disubstituted benzenes and those predicted using the fragmental QSPR (equation 21).
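A fit of the three-coefficient form of equation 21, together with the squared pair correlation between whole molecule and fragment-predicted values, can be sketched as follows. The function name and the use of NumPy are assumptions, and no data from Table 1 is reproduced; the input arrays would hold one entry per disubstituted benzene.

```python
import numpy as np


def fit_fragmental_qspr(Dx, Dy, D):
    """Fit D = a0 + a1*(Dx + Dy) + a2*Dx*Dy by least squares.

    Dx, Dy -- fragment descriptor values of substituents X and Y
    D      -- whole molecule descriptor values for the same compounds
    Returns the coefficients (a0, a1, a2) and the squared correlation
    between D and the fitted values.
    """
    Dx = np.asarray(Dx, dtype=float)
    Dy = np.asarray(Dy, dtype=float)
    D = np.asarray(D, dtype=float)
    A = np.column_stack([np.ones_like(Dx), Dx + Dy, Dx * Dy])
    coef, *_ = np.linalg.lstsq(A, D, rcond=None)
    predicted = A @ coef
    r = np.corrcoef(D, predicted)[0, 1]     # pair correlation coefficient
    return coef, r ** 2
```

Because the model treats X and Y symmetrically, swapping the two substituents of a compound leaves its predicted value unchanged; positional (ortho, meta, para) differences are therefore not distinguished by this form unless separate fits are made.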
Example 2
Fragment-based prediction of hydrogen-bonding dependent charged partial surface areas (HBPSA) of disubstituted benzenes
[0049] The hydrogen bonding acceptor abilities of the molecule, HASA1 and HASA2, are calculated as the sums of the solvent-accessible areas of the atoms active as the possible hydrogen bonding acceptors, and their square roots, respectively (Karelson, 2000):
$$HASA1 = \sum_{A} s_A, \quad A \in \{\text{H-acceptors}\} \qquad \text{(eq. 22)}$$
$$HASA2 = \sum_{A} \sqrt{s_A} \qquad \text{(eq. 23)}$$
while the area-weighted surface charges of the hydrogen bonding acceptor atoms in the molecule, HACA1 and HACA2, were calculated as follows:
$$HACA1 = \sum_{A} q_A s_A \qquad \text{(eq. 24)}$$
$$HACA2 = \sum_{A} q_A \sqrt{s_A} \qquad \text{(eq. 25)}$$
where $q_A$ is the partial charge on hydrogen bonding acceptor atom(s) and $s_A$ is the surface area for this atom. The electronegative atoms with lone electron pairs are usually considered as the potential hydrogen bonding acceptors in the molecule (O, N, etc.). Analogously, the solvent-accessible areas of hydrogen bonding donor atoms in the molecule, HDSA1 and HDSA2, are defined as follows:
$$HDSA1 = \sum_{D} s_D, \quad D \in \{\text{H-donors}\} \qquad \text{(eq. 26)}$$
$$HDSA2 = \sum_{D} \sqrt{s_D} \qquad \text{(eq. 27)}$$
and the area-weighted surface charges of hydrogen bonding donor atoms in the molecule, HDCA1 and HDCA2, are calculated as follows:
$$HDCA1 = \sum_{D} q_D s_D \qquad \text{(eq. 28)}$$
$$HDCA2 = \sum_{D} q_D \sqrt{s_D} \qquad \text{(eq. 29)}$$
where $q_D$ is the partial charge on hydrogen bonding donor (H) atom(s) and $s_D$ denotes the surface area for this (these) atom(s). The hydrogen atoms directly connected to an electronegative atom in the molecule, e.g. O or N, are counted as the possible hydrogen bonding donors. Also, the hydrogen atoms in the α-position to carbonyl and cyano groups are considered as possible hydrogen bonding donors. In the case of H-acceptor dependent H-donor surface areas, only a number of H-donors equal to the number of H-acceptor atoms is taken into account, reflecting the number of H-bonds possible in the molecule.
[0050] Again, a series of ortho-, meta-, and para- disubstituted benzenes were selected as test compounds to determine the mutual effect of two substituents (X and Y) on the hydrogen bonding abilities of compounds (Fig. 6). The property values calculated for the whole molecule were compared with those calculated using the fragmental QSPR scheme (equation 21). The pair correlation coefficients between those two property sets are given in Table 2 below. There is an excellent correspondence between the hydrogen bonding abilities for the whole molecule and those calculated using the fragmental QSPR scheme. In most cases, the pair correlation between the original and calculated data has $R^2$ > 0.99.
[Table 2 is reproduced as an image in the original publication.]
Table 2. Correlation coefficients between the whole molecule hydrogen bonding abilities of disubstituted benzenes and those predicted using the fragmental QSPR (equation 21).
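The acceptor-side descriptors whose definitions are stated explicitly in the text (the area sum, the square-root area sum and the charge-weighted area sum) can be computed as in the sketch below. The atom tuples, the acceptor flag and the function name are assumptions for illustration; deciding which atoms count as acceptors or donors is left to the caller, and the donor-side quantities follow by passing donor atoms instead.

```python
import math


def hbond_acceptor_descriptors(atoms):
    """Return (HASA1, HASA2, HACA1) for a molecule or fragment.

    atoms -- iterable of (partial_charge, surface_area, is_acceptor)
    Only atoms flagged as hydrogen bonding acceptors contribute.
    """
    acceptors = [(q, s) for q, s, flag in atoms if flag]
    hasa1 = sum(s for _, s in acceptors)               # sum of areas
    hasa2 = sum(math.sqrt(s) for _, s in acceptors)    # sum of square roots
    haca1 = sum(q * s for q, s in acceptors)           # charge-weighted areas
    return hasa1, hasa2, haca1


# Illustrative atoms (values are not from the disclosure)
sample_atoms = [(-0.5, 4.0, True), (-0.25, 9.0, True), (0.1, 16.0, False)]
```

These per-fragment values would then play the role of the fragment descriptors for substituents X and Y in the fragmental QSPR scheme of equation 21.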
[0051] Fig. 7 illustrates an example of a suitable computing system environment 100 on which a system for the steps of the claimed method and apparatus may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the method or apparatus of the claims. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
[0052] The steps of the claimed method and apparatus are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the methods or apparatus of the claims include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
[0053] The steps of the claimed method and apparatus may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The methods and apparatus may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
[0054] With reference to Fig. 7, an exemplary system for implementing the steps of the claimed method and apparatus includes a general purpose computing device in the form of a computer 110. Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120. The system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
[0055] Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
[0056] The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, Fig. 7 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
[0057] The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, Fig. 7 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.
[0058] The drives and their associated computer storage media discussed above and illustrated in Fig. 7 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In Fig. 7, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190.
[0059] The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in Fig. 7. The logical connections depicted in Fig. 7 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. [0060] When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, Fig. 7 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
[0061] Although the foregoing text sets forth a detailed description of numerous different embodiments of the invention, it should be understood that the scope of the invention is defined by the words of the claims set forth at the end of this patent. The detailed description is to be construed as exemplary only and does not describe every possible embodiment of the invention because describing every possible embodiment would be impractical, if not impossible. Numerous alternative embodiments could be implemented, using either current technology or technology developed after the filing date of this patent, which would still fall within the scope of the claims defining the invention.
[0062] While the fragmental molecular design system and method, and other elements, have been described as preferably being implemented in software, they may be implemented in hardware, firmware, etc., and may be implemented by any other processor. Thus, the elements described herein may be implemented in a standard multi-purpose CPU or on specifically designed hardware or firmware such as an application-specific integrated circuit (ASIC) or other hard-wired device as desired, including, but not limited to, the computer 110 of Fig. 7. When implemented in software, the software routine may be stored in any computer readable memory such as on a magnetic disk, a laser disk, or other storage medium, in a RAM or ROM of a computer or processor, in any database, etc. Likewise, this software may be delivered to a user or a process plant via any known or desired delivery method including, for example, on a computer readable disk or other transportable computer storage mechanism or over a communication channel such as a telephone line, the internet, wireless communication, etc. (which are viewed as being the same as or interchangeable with providing such software via a transportable storage medium).
[0063] Thus, many modifications and variations may be made in the techniques and structures described and illustrated herein without departing from the spirit and scope of the present invention. Accordingly, it should be understood that the methods and apparatus described herein are illustrative only and are not limiting upon the scope of the invention.
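By way of illustration only, the fragment-combination and ranking steps of the design method might be sketched in Python as follows. The fragment names, the single descriptor value per fragment, the additive model with its intercept and slope, and the function names `predict` and `rank_candidates` are all hypothetical assumptions introduced for this sketch; they are not taken from the described embodiments, and a real fragmental QSAR/QSPR model would use the descriptors and coefficients obtained from the modeling set.

```python
from itertools import product

# Hypothetical fragment library: fragment name -> value of one fragment descriptor.
FRAGMENT_LIBRARY = {"A": 0.4, "B": 1.1, "C": -0.3, "D": 0.9}


def predict(fragments, intercept=1.0, slope=2.0):
    """Toy additive fragment model: intercept + slope * sum of fragment descriptors.

    The functional form and the coefficients are assumptions for illustration.
    """
    return intercept + slope * sum(FRAGMENT_LIBRARY[name] for name in fragments)


def rank_candidates(slots, target, top_n=3):
    """Enumerate one fragment per attachment slot and rank by |predicted - target|."""
    scored = []
    for combo in product(*slots):
        value = predict(combo)
        scored.append((abs(value - target), combo, value))
    # Smallest deviation from the target property value comes first.
    scored.sort(key=lambda item: item[0])
    return [(combo, value) for _, combo, value in scored[:top_n]]


if __name__ == "__main__":
    # Two attachment slots, each with its own list of allowed fragments.
    slots = [["A", "B"], ["C", "D"]]
    for combo, value in rank_candidates(slots, target=3.5):
        print(combo, round(value, 3))
```

Exhaustive enumeration is used here only because the example is tiny; for larger fragment libraries the number of combinations grows multiplicatively with the number of slots, so some form of pruning or sampling would presumably be needed.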
REFERENCES
U.S. Patent Documents
1. Cramer, R.D. et al, "Comparative molecular field analysis (COMFA)", US Patent No. 5,307,287; April 26, 1994.
2. Platt, D.E. et al, "System and method for comparative molecular moment analysis (CoMMA)", US Patent No. 5,784,294, July 21, 1998.
3. Cramer, R.D. et al, "Method for selecting an optimally diverse library of small molecules based on validated molecular structural descriptors", US Patent No. 6,185,506; February 6, 2001.
4. Cramer, R.D. et al, "Method for selecting an optimally diverse library of small molecules based on validated molecular structural descriptors", US Patent Application No. 20040215397; October 28, 2004.
5. Hurst, J.R. et al, "Molecular hologram QSAR", US Patent No. 6,208,942; March 27, 2001.
6. Agrafiotis, D. K., "System, method and computer program product for identifying chemical compounds having desired properties", US Patent No. 6,421,612, July 16, 2002.
7. Silverman, B.D., "Determination and use of three-dimensional moments of molecular property fields", US Patent No. 6,671,626 December 30, 2003.
8. Gottfries, J.; Oprea, T., "Global method for mapping property spaces", US Patent No. 6,675,136; January 6, 2004.
9. Labute, P.R., "Method for determining discrete quantitative structure activity relationships", US Patent No. 6,691,045; February 10, 2004.
10. Lobanov, V.S., et al, "Method, system, and computer program product for determining properties of combinatorial library products from features of library building blocks", US Patent No. 6,834,239; December 21, 2004.
11. Schwartz, S.D. et al, "Neural network methods to predict enzyme inhibitor or receptor ligand potency", US Patent No. 6,895,396; May 27, 2005.
12. Miller, D.W. et al, "Methods for predicting the biological, chemical, and physical properties of molecules from their spectral properties", US Patent No. 6,898,533; May 24, 2005.
13. Uchiyama, M., et al, "Method and system for predicting pharmacokinetic properties", US Patent Application No. 20030069698; April 10, 2002.
14. Cheng, A.; Merz, K.M. Jr., "System and method for aqueous solubility prediction", US Patent Application No. 20030028330; February 6, 2003.
15. Kovesdi, I. et al, "Method for generating a quantitative structure property activity relationship", US Patent Application No. 20040199334; October 7, 2004.
16. Casset, F. et al, "Computational Design Methods for Making Molecular Mimetics", WO 0109191; February 8, 2001.
17. Lacroix, E. et al, "Computer-Based Method for Macromolecular Engineering and Design", WO 0116810, March 8, 2001.
18. Lacroix, E. et al, "Computer-Based Method for Macromolecular Engineering and Design", US Patent Application 2002072894; June 13, 2002.
19. Archetti, F. et al, "Method of construction and selection of virtual libraries in combinatorial chemistry", EP1628234, February 22, 2006.
Other References
1. D. T. Stanton and P. C. Jurs, Development and Use of Charged Partial Surface Area Structural Descriptors in Computer-Assisted Quantitative Structure-Property Relationship studies, Anal. Chem., 62, 2323-2329 (1990).
2. D.T. Stanton and P.C. Jurs, Computer-Assisted Study of the Relationship Between Molecular Structure and Surface Tension of Organic Compounds, J. Chem. Inf. Comput. Sci., 32, 109-115 (1992).
3. M. Karelson, V. S. Lobanov, and A.R. Katritzky, "Quantum-Chemical Descriptors in QSAR/QSPR Studies", Chem. Rev., 96, 1027-1043 (1996).
4. A.R. Katritzky, P. Rachwal, K. W. Law, M. Karelson, and V.S. Lobanov, "Prediction of Polymer Glass Transition Temperatures Using a General Quantitative Structure-Property Relationship Treatment", J. Chem. Inf. Comput. Sci., 36, 879-884 (1996).
5. P. Huibers, V.S. Lobanov, D.O. Shah, and M. Karelson, "Prediction of Critical Micelle Concentration Using a General Quantitative Structure-Property Relationship Approach. I. Nonionic Surfactants", Langmuir, 12, 1462-1470 (1996).
6. P. Huibers, A.R. Katritzky, V.S. Lobanov, D.O. Shah, and M. Karelson, "Prediction of Critical Micelle Concentration Using a General Quantitative Structure-Property Relationship Approach. I. Anionic Surfactants", J. Colloid Interf. Sci., 187, 113-120 (1997).
7. A.R. Katritzky, M. Karelson and V.S. Lobanov, "QSPR as a means of predicting and understanding chemical and physical properties in terms of structure", Pure & Appl. Chem., 69, 246-248 (1997).
8. M. Karelson, Molecular Properties and Spectra in Solution, in: "Problem Solving in Computational Molecular Science: Molecules in Different Environments", S. Wilson and G.H.F. Diercksen (Eds.), Kluwer Academic Publ., Dordrecht, 1997, 353-387.
9. M.C. Menziani, P.G. De Benedetti, and M. Karelson, "Theoretical Descriptors in Quantitative Structure-Affinity and Selectivity Relationship Study of Potent N4-Substituted Arylpiperazine 5-HT1A Receptor Antagonists", Bioorg. & Med. Chem., 6, 535-550 (1998).
10. A.R. Katritzky, S. Sild, and M. Karelson, "Correlation and Prediction of the Refractive Indices of Polymers by QSPR", J. Chem. Inf. Comput. Sci., 38, 1171-1176 (1998).
11. M. Karelson and A. Perkson, "QSPR Prediction of Densities of Organic Liquids", Computers & Chemistry, 23, 49-59 (1999).
12. A.R. Katritzky, K. Chen, Y. Wang, M. Karelson, B. Lucic, N. Trinajstic, T. Suzuki and G. Schüürmann, "Prediction of liquid viscosity for organic compounds by a quantitative structure-property relationship", J. Phys. Org. Chem., 12, 1-7 (1999).
13. F. Ignatz-Hoover, A.R. Katritzky, V.S. Lobanov, and M. Karelson, "Insights into Sulfur Vulcanization from QSPR Studies", Rubber Chem. and Technol. 72, 318-333 (1999).
14. A.R. Katritzky, U. Maran, V.S. Lobanov, and M. Karelson, "Structurally Diverse QSPR Correlations of Technologically Relevant Physical Properties", J. Chem. Inf. Comput. Sci., 40, 1-18 (2000).
15. M. Karelson, Molecular Descriptors in QSAR/QSPR, J. Wiley & Sons, New York, 2000, 430 pp.
16. R. Hiob and M. Karelson, Quantitative Relationship between Rate Constants of the Gas Phase Homolysis of C-X Bonds and Molecular Descriptors, J. Chem. Inf. Comput. Sci., 40, 1062-1071 (2000).
17. M. Karelson, S. Sild, and U. Maran, "Non-linear QSAR Treatment of Genotoxicity", Mol. Simulat., 24, 229-242 (2000).
18. F. Ignatz-Hoover, R. Petrukhin, M. Karelson, "QSPR correlation of free-radical polymerization chain-transfer constants for styrene", J. Chem. Inf. Comput. Sci., 41, 295-299 (2001).
19. A.R. Katritzky, R. Jain, A. Lomaka, R. Petrukhin, U. Maran and M. Karelson, "Perspective on the Relationship between Melting Points and Chemical Structure", Crystal Growth & Design, 1, 261-265 (2001).
20. R. Hiob and M. Karelson, "QSPR Models Derived for the Kinetic Data of the Gas-Phase Homolysis of the Carbon-Methyl Bond", Computers & Chemistry, 26, 237-243 (2002).
21. S. Sild and M. Karelson, "A General QSPR Treatment for Dielectric Constants of Organic Compounds", J. Chem. Inf. Comput. Sci., 42, 360-367 (2002).
22. K. Sak, J. Jarv and M. Karelson, "'Strain Effect' Descriptors for ATP and ADP Derivatives with Modified Phosphate Groups", Computers & Chemistry, 26, 341-346 (2002).
23. W.L. Fitch, M. McGregor, A.R. Katritzky, A. Lomaka, R. Petrukhin, and M. Karelson, "Prediction of Ultraviolet Spectral Absorbance using Quantitative Structure-Property Relationships", J. Chem. Inf. Comput. Sci., 42, 830-840 (2002).
24. A.R. Katritzky, D. Tatham, D. Fara, U. Maran, A. Lomaka, and M. Karelson, The Present Utility and Future Potential to Medicinal Chemistry of QSAR/QSPR with Whole Molecule Descriptors, Current Topics in Medicinal Chemistry, 2, 1333-1356 (2002).
25. A.R. Katritzky, P. Oliferenko, A. Oliferenko, A. Lomaka, and M. Karelson, "Nitrobenzene toxicity: QSAR correlations and mechanistic interpretations", J. Phys. Org. Chem., 16, 811-817 (2003).
26. M. Karelson, "Quantum-Chemical Descriptors in QSAR", Chapter 24 in: Computational Medicinal Chemistry and Drug Discovery, Eds. J.P. Tollenaere, P. Bultinck, H. De Winter, W. Langenaeker, Dekker Inc., New York, 2004, pp. 641-668.
27. A.R. Katritzky, D.C. Fara, H. Yang, M. Karelson, T. Suzuki, and A. Varnek, "QSPR Modelling of β-Cyclodextrin Complexation Free Energies", J. Chem. Inf. Comput. Sci., 44, 529-541 (2004).
28. A.R. Katritzky, M. Kuanar, D.C. Fara, M. Karelson, and W.E. Acree, "QSPR treatment of rat blood:air, saline:air and olive oil:air partition coefficients using theoretical molecular descriptors", Bioorg. & Med. Chem., 12, 4735-4748 (2004).
29. A.R. Katritzky, D.A. Dobchev, E. Hur, D.C. Fara, and M. Karelson, "QSAR treatment of drugs transfer into human breast milk", Bioorg. & Med. Chem., 13, 1623-1632 (2005).
30. D. Fara, I. Kahn, U. Maran, M. Karelson, P. Andersson, and J. Hermens, "General QSAR Treatment of Soil Sorption Coefficients of Organic Pollutants", J. Chem. Inf. Comput. Sci., 45, 94-105 (2005).

Claims

What is claimed is:
1. A method of generating molecular structures having target numerical values of one or more properties, wherein the properties comprise a physical property, a chemical reactivity or a biological activity, and wherein the molecular structures are separated into distinct molecular entities each of which is characterized by a set of molecular descriptors, the method comprising: creating a modeling set as a database of experimental data on one or more of the properties; calculating fragmental molecule descriptors and whole molecule descriptors for one or more molecular structures in the modeling set; developing QSAR/QSPR models based on the fragmental molecule descriptors and the whole molecule descriptors; creating a fragment library for the construction of the novel molecular structures; constructing new molecular structures as a plurality of possible combinations of molecular fragments from the fragment library; predicting molecular structures from the new molecular structures having the best fitting target numerical property values using the QSAR/QSPR models based on the fragmental molecule descriptors; validating the predictions using at least one of the group consisting of: the QSAR/QSPR models with the whole molecule descriptors or experimental measurements on the predicted compounds; and iteratively developing further novel molecular structures.
2. The method of claim 1, wherein each fragment of the novel molecular structure comprises a molecular structure having an enumerable amount of free valencies bridging the fragment to neighboring fragments.
3. The method of claim 1, wherein the molecular fragment and the whole molecule are characterized by molecular descriptors calculated according to one of the group consisting of: theoretically or determined by using the experimental data on the physical, chemical and/or spectroscopic properties of molecules.
4. The method of claim 1, wherein calculating the fragmental molecule descriptors and whole molecule descriptors comprises calculating the molecular descriptors from the total number of atoms in the fragment or the whole molecule, the number and percentage of atoms of a given atomic species, the number and percentage of alicyclic and aromatic carbocycles, heterocycles, the number of functional groups in the fragment or the whole molecule, and the molecular weight of the fragment or the whole molecule.
5. The method of claim 1, wherein calculating the fragmental molecule descriptors and whole molecule descriptors comprises calculating the molecular descriptors from the connectivity matrices between the atoms in the fragment or the whole molecule.
6. The method of claim 5, wherein the atoms comprise carbon atoms.
7. The method of claim 5, wherein the atoms comprise carbon atoms and non-carbon heteroatoms.
8. The method of claim 1, wherein calculating the fragmental molecule descriptors and whole molecule descriptors comprises calculating the molecular descriptors from the atomic constitution and connectivity of the fragments or the whole molecule using information theory.
9. The method of claim 1, wherein calculating the fragmental molecule descriptors and whole molecule descriptors comprises calculating the molecular descriptors from the geometrical parameters of molecular fragments or the whole molecule (bond lengths, bond angles, dihedral angles), the mass distribution in the fragment or the whole molecule and gravitational indices.
10. The method of claim 1, wherein calculating the fragmental molecule descriptors and whole molecule descriptors comprises calculating the molecular descriptors using the three-dimensional molecular structure of the fragments or the whole molecule represented by their three-dimensional atomic coordinates, atomic charges and the size of the atoms in the fragment or the whole molecule.
11. The method of claim 10, wherein calculating the fragmental molecule descriptors and whole molecule descriptors comprises calculating the atomic charges proceeding from Sanderson's electronegativity equalization principle and variations of Sanderson's electronegativity equalization principle.
12. The method of claim 10, wherein calculating the fragmental molecule descriptors and whole molecule descriptors comprises calculating the atomic charges proceeding from the quantum chemically calculated partial charges.
13. The method of claim 10, wherein calculating the fragmental molecule descriptors and whole molecule descriptors comprises calculating the sizes of atoms proceeding from van der Waals radii of the atoms.
14. The method of claim 10, wherein calculating the fragmental molecule descriptors and whole molecule descriptors comprises calculating the sizes of atoms proceeding from a cutoff electronic charge distribution of the atoms.
15. The method of claim 1, wherein calculating the fragmental molecule descriptors comprises calculating the molecular descriptors for a molecular fragment from a molecular wave function of a molecular structure in the modeling set comprising the fragment and terminating entities at the sites of free valencies of the fragment.
16. The method of claim 15, wherein calculating the fragmental molecule descriptors comprises calculating the molecular wave function using ab initio or semi-empirical quantum chemical methods.
17. The method of claim 15, wherein the molecular descriptors comprise descriptors related to charge distribution in the molecule and polarizabilities, the orbital energies, the excitation energies and the thermodynamic functions of the fragments or the whole molecule.
18. The method of claim 15, wherein the terminating entities comprise hydrogen atoms.
19. The method of claim 15, wherein the terminating entities comprise alkyl groups.
20. The method of claim 15, wherein the terminating entities comprise aryl groups.
21. The method of claim 15, wherein the terminating entities comprise hydrogen atoms described by a different orbital exponential for a given quantum-chemical basis set.
22. The method of claim 1, wherein calculating the whole molecule descriptors comprises calculating descriptors for the whole molecule from the molecular wave function of the whole molecule.
23. The method of claim 22, wherein calculating the whole molecule descriptors comprises calculating the molecular wave function using ab initio or semi-empirical quantum chemical methods.
24. The method of claim 1, wherein developing QSAR/QSPR models comprises developing a QSAR/QSPR model that gives the mathematical relationship between the molecular property/reactivity/activity and the fragment or the whole molecule descriptors through multilinear regression treatment.
25. The method of claim 24, wherein the multilinear regression treatment is carried out with a successively increasing number of descriptors, and the resulting equations with the highest multiple correlation coefficient, and/or cross-validated correlation coefficient, and/or statistical Fisher criterion value are selected.
26. The method of claim 1, wherein developing QSAR/QSPR models comprises developing a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment or the whole molecule descriptors through the training of a back-propagation artificial neural network.
27. The method of claim 1, wherein developing QSAR/QSPR models comprises developing a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment or the whole molecule descriptors through a genetic algorithm.
28. The method of claim 1, wherein developing QSAR/QSPR models comprises developing a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors and their mathematical products.
29. The method of claim 28, wherein products of molecular descriptors are constructed for adjacent fragments.
30. The method of claim 1, wherein developing QSAR/QSPR models comprises developing a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor characterized by the minimum value is accounted for only for the fragment that has the lowest value of this descriptor.
31. The method of claim 1, wherein developing QSAR/QSPR models comprises developing a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor characterized by the maximum value is accounted for only for the fragment that has the highest value of this descriptor.
32. The method of claim 1, wherein developing QSAR/QSPR models comprises developing a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor characterized by the average value is calculated as the average value of this descriptor for all fragments constituting the molecule.
33. The method of claim 1, wherein developing QSAR/QSPR models comprises developing a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor defined through some weighting scheme is calculated using the same weighting scheme over all fragments constituting the molecule.
34. The method of claim 1, wherein the modeling set as a database of experimental data comprises experimental values for a physical property, chemical reactivity or biological activity for which targeted values of the molecular structure is designed, wherein the data give one-to-one correspondence between the property value and a two-dimensional representation of the molecular structure and have variability in the chemical composition or the nanoscale arrangement of the molecular structures.
35. The method of claim 1, wherein the fragment library comprises any number of molecular fragments having one or more free valencies.
36. The method of claim 35, wherein the fragment library comprises all descriptors appearing in fragmental QSAR/QSPR models for each fragment.
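By way of illustration only, the multilinear regression treatment with a successively increasing number of descriptors recited in claims 24 and 25 might be sketched in Python as follows. The greedy forward-selection strategy, the use of the squared multiple correlation coefficient R² as the only selection criterion, and the function names `solve_linear_system`, `fit_r2` and `forward_select` are assumptions made for this sketch; the claims do not prescribe a particular search strategy, and cross-validated or Fisher-criterion selection is not shown.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_r2(columns, y):
    """Least-squares fit of y on descriptor columns plus an intercept.

    Returns (coefficients, R^2); coefficients[0] is the intercept.
    """
    n = len(y)
    design = [[1.0] + [col[k] for col in columns] for k in range(n)]
    p = len(design[0])
    xtx = [[sum(design[k][i] * design[k][j] for k in range(n))
            for j in range(p)] for i in range(p)]
    xty = [sum(design[k][i] * y[k] for k in range(n)) for i in range(p)]
    coeffs = solve_linear_system(xtx, xty)
    pred = [sum(c * v for c, v in zip(coeffs, row)) for row in design]
    y_mean = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return coeffs, 1.0 - ss_res / ss_tot


def forward_select(descriptors, y, max_terms):
    """Greedily add the descriptor that gives the highest R^2 at each step.

    descriptors maps a descriptor name to its list of values over the
    modeling set. Returns a list of (chosen_names, R^2), one per model size.
    """
    chosen, history = [], []
    while len(chosen) < max_terms:
        best = None
        for name in descriptors:
            if name in chosen:
                continue
            try:
                _, r2 = fit_r2([descriptors[n] for n in chosen + [name]], y)
            except ValueError:
                continue  # skip collinear descriptor sets
            if best is None or r2 > best[1]:
                best = (name, r2)
        if best is None:
            break
        chosen.append(best[0])
        history.append((list(chosen), best[1]))
    return history
```

Calling `forward_select` with a dictionary of descriptor columns and the measured property values returns one model per descriptor count, from which the equation with the preferred statistics can then be picked.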
37. A computer readable medium having computer executable instructions for generating molecular structures having target numerical values of one or more properties, wherein the properties comprise a physical property, a chemical reactivity or a biological activity, and wherein the molecular structures are separated into distinct molecular entities each of which is characterized by a set of molecular descriptors, the computer readable medium comprising: a routine stored on the computer readable medium and adapted to be executed by a processor to create a modeling set as a database of experimental data on one or more of the properties; a routine stored on the computer readable medium and adapted to be executed by a processor to calculate fragmental molecule descriptors and whole molecule descriptors for one or more molecular structures in the modeling set; a routine stored on the computer readable medium and adapted to be executed by a processor to develop QSAR/QSPR models based on the fragmental molecule descriptors and the whole molecule descriptors; a routine stored on the computer readable medium and adapted to be executed by a processor to create a fragment library for the construction of the novel molecular structures; a routine stored on the computer readable medium and adapted to be executed by a processor to construct new molecular structures as a plurality of possible combinations of molecular fragments from the fragment library; a routine stored on the computer readable medium and adapted to be executed by a processor to predict molecular structures from the new molecular structures having the best fitting target numerical property values using the QSAR/QSPR models based on the fragmental molecule descriptors; a routine stored on the computer readable medium and adapted to be executed by a processor to validate the predictions using at least one of the group consisting of: the QSAR/QSPR models with the whole molecule descriptors or experimental measurements on the predicted compounds; and a routine stored on the computer readable medium and adapted to be executed by a processor to iteratively develop novel molecular structures.
38. A computer readable medium according to claim 37, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors from the total number of atoms in the fragment or the whole molecule, the number and percentage of atoms of a given atomic species, the number and percentage of alicyclic and aromatic carbocycles, heterocycles, the number of functional groups in the fragment or the whole molecule, and the molecular weight of the fragment or the whole molecule.
39. A computer readable medium according to claim 37, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors from the connectivity matrices between the atoms in the fragment or the whole molecule.
40. A computer readable medium according to claim 37, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors from the atomic constitution and connectivity of the fragments or the whole molecule using information theory.
41. A computer readable medium according to claim 37, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors from the geometrical parameters of molecular fragments or the whole molecule (bond lengths, bond angles, dihedral angles), the mass distribution in the fragment or the whole molecule and gravitational indices.
42. A computer readable medium according to claim 37, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors using the three-dimensional molecular structure of the fragments or the whole molecule represented by their three-dimensional atomic coordinates, atomic charges and the size of the atoms in the fragment or the whole molecule.
43. A computer readable medium according to claim 42, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the atomic charges proceeding from Sanderson's electronegativity equalization principle and variations of the Sanderson's electronegativity equalization principle.
44. A computer readable medium according to claim 42, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the atomic charges proceeding from the quantum chemically calculated partial charges.
45. A computer readable medium according to claim 42, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the sizes of atoms proceeding from van der Waals radii of the atoms.
46. A computer readable medium according to claim 42, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the sizes of atoms proceeding from a cutoff electronic charge distribution of the atoms.
47. A computer readable medium according to claim 37, wherein the routine to calculate the fragmental molecule descriptors comprises a routine to calculate the molecular descriptors for a molecular fragment from a molecular wave function of a molecular structure in the modeling set comprising the fragment and terminating entities at the sites of free valencies of the fragment.
48. A computer readable medium according to claim 47, wherein the routine to calculate the fragmental molecule descriptors comprises a routine to calculate the molecular wave function using ab initio or semi-empirical quantum chemical methods.
49. A computer readable medium according to claim 37, wherein the routine to calculate the whole molecule descriptors comprises a routine to calculate descriptors for the whole molecule from the molecular wave function of the whole molecule.
50. A computer readable medium according to claim 49, wherein the routine to calculate the whole molecule descriptors comprises a routine to calculate the molecular wave function using ab initio or semi-empirical quantum chemical methods.
51. A computer readable medium according to claim 37, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that gives the mathematical relationship between the molecular property/reactivity/activity and the fragment or the whole molecule descriptors through multilinear regression treatment.
52. A computer readable medium according to claim 37, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment or the whole molecule descriptors through the training of a back-propagation artificial neural network.
53. A computer readable medium according to claim 37, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment or the whole molecule descriptors through a genetic algorithm.
54. A computer readable medium according to claim 37, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors and their mathematical products.
55. A computer readable medium according to claim 37, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor characterized by the minimum value is accounted for only for the fragment that has the lowest value of this descriptor.
56. A computer readable medium according to claim 37, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor characterized by the maximum value is accounted for only for the fragment that has the highest value of this descriptor.
57. A computer readable medium according to claim 37, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor characterized by the average value is calculated as the average value of this descriptor for all fragments constituting the molecule.
58. A computer readable medium according to claim 37, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor defined through some weighting scheme is calculated using the same weighting scheme over all fragments constituting the molecule.
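By way of illustration only, the ways of combining per-fragment descriptor values recited in claims 54 to 58 (products, minimum, maximum, average and weighted values) might be sketched in Python as follows. The function names, the `mode` strings, the representation of adjacency as index pairs, and the choice of a normalized weighted average for the weighting scheme are assumptions made for this sketch; the claims leave the weighting scheme open.

```python
from statistics import mean


def aggregate_fragment_descriptor(values, mode, weights=None):
    """Collapse one descriptor's per-fragment values into a single model input.

    values  : descriptor value for each fragment of the molecule
    mode    : "min", "max", "mean" or "weighted" (hypothetical labels)
    weights : one weight per fragment, used only for mode="weighted"
    """
    if not values:
        raise ValueError("at least one fragment value is required")
    if mode == "min":
        return min(values)   # only the lowest-valued fragment contributes
    if mode == "max":
        return max(values)   # only the highest-valued fragment contributes
    if mode == "mean":
        return mean(values)  # average over all fragments
    if mode == "weighted":
        if weights is None or len(weights) != len(values):
            raise ValueError("weighted mode needs one weight per fragment")
        # Assumed scheme: weighted average normalized by the sum of weights.
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"unknown mode: {mode}")


def adjacent_products(values, bonds):
    """Products of a descriptor over pairs of adjacent (bonded) fragments.

    bonds is a list of (i, j) index pairs naming fragments that are joined
    through a free valency.
    """
    return [values[i] * values[j] for i, j in bonds]


if __name__ == "__main__":
    charges = [0.2, -0.5, 1.0]   # hypothetical per-fragment descriptor values
    bonds = [(0, 1), (1, 2)]     # fragment 0-1 and fragment 1-2 are adjacent
    for mode in ("min", "max", "mean"):
        print(mode, aggregate_fragment_descriptor(charges, mode))
    print("weighted", aggregate_fragment_descriptor(charges, "weighted", [1, 1, 2]))
    print("products", adjacent_products(charges, bonds))
```

Keeping each aggregation rule behind one function means the same rule can be applied unchanged to every candidate structure, which is what allows a model fitted on the modeling set to be reused on newly constructed fragment combinations.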
59. A system for generating molecular structures having target numerical values of one or more properties, wherein the properties comprise a physical property, a chemical reactivity or a biological activity, and wherein the molecular structures are separated into distinct molecular entities each of which is characterized by a set of molecular descriptors, the system comprising: a processor; a database adapted to store a modeling set of experimental data on one or more of the properties and fragment data of molecular fragments for the construction of the novel molecular structures; a routine adapted to be executed by the processor to calculate fragmental molecule descriptors and whole molecule descriptors for one or more molecular structures in the modeling set; a routine adapted to be executed by the processor to develop QSAR/QSPR models based on the fragmental molecule descriptors and the whole molecule descriptors; a routine adapted to be executed by the processor to construct new molecular structures as a plurality of possible combinations of molecular fragments from the fragment data; a routine adapted to be executed by the processor to predict molecular structures from the new molecular structures having the best fitting target numerical property values using the QSAR/QSPR models based on the fragmental molecule descriptors; a routine adapted to be executed by the processor to validate the predictions using at least one of the group consisting of: the QSAR/QSPR models with the whole molecule descriptors or experimental measurements on the predicted compounds; and a routine adapted to be executed by the processor to iteratively develop novel molecular structures.
60. A system according to claim 59, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors from the total number of atoms in the fragment or the whole molecule, the number and percentage of atoms of a given atomic species, the number and percentage of alicyclic and aromatic carbocycles, heterocycles, the number of functional groups in the fragment or the whole molecule, and the molecular weight of the fragment or the whole molecule.
61. A system according to claim 59, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors from the connectivity matrices between the atoms in the fragment or the whole molecule.
62. A system according to claim 59, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors from the atomic constitution and connectivity of the fragments or the whole molecule using information theory.
63. A system according to claim 59, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors from the geometrical parameters of molecular fragments or the whole molecule (bond lengths, bond angles, dihedral angles), the mass distribution in the fragment or the whole molecule and gravitational indices.
64. A system according to claim 59, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the molecular descriptors using the three-dimensional molecular structure of the fragments or the whole molecule represented by their three-dimensional atomic coordinates, atomic charges and the size of the atoms in the fragment or the whole molecule.
65. A system according to claim 64, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the atomic charges proceeding from Sanderson's electronegativity equalization principle and variations of Sanderson's electronegativity equalization principle.
66. A system according to claim 64, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the atomic charges proceeding from the quantum chemically calculated partial charges.
67. A system according to claim 64, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the sizes of atoms proceeding from van der Waals radii of the atoms.
68. A system according to claim 64, wherein the routine to calculate the fragmental molecule descriptors and whole molecule descriptors comprises a routine to calculate the sizes of atoms proceeding from a cutoff electronic charge distribution of the atoms.
69. A system according to claim 59, wherein the routine to calculate the fragmental molecule descriptors comprises a routine to calculate the molecular descriptors for a molecular fragment from a molecular wave function of a molecular structure in the modeling set comprising the fragment and terminating entities at the sites of free valencies of the fragment.
70. A system according to claim 69, wherein the routine to calculate the fragmental molecule descriptors comprises a routine to calculate the molecular wave function using ab initio or semi-empirical quantum chemical methods.
71. A system according to claim 59, wherein the routine to calculate the whole molecule descriptors comprises a routine to calculate descriptors for the whole molecule from the molecular wave function of the whole molecule.
72. A system according to claim 71, wherein the routine to calculate the whole molecule descriptors comprises a routine to calculate the molecular wave function using ab initio or semi-empirical quantum chemical methods.
73. A system according to claim 59, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that gives the mathematical relationship between the molecular property/reactivity/activity and the fragment or the whole molecule descriptors through multilinear regression treatment.
74. A system according to claim 59, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment or the whole molecule descriptors through the training of a back-propagation artificial neural network.
75. A system according to claim 59, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment or the whole molecule descriptors through a genetic algorithm.
76. A system according to claim 59, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors and their mathematical products.
77. A system according to claim 59, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor characterized by the minimum value is accounted for only for the fragment that has the lowest value of this descriptor.
78. A system according to claim 59, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor characterized by the maximum value is accounted for only for the fragment that has the highest value of this descriptor.
79. A system according to claim 59, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor characterized by the average value is calculated as the average value of this descriptor for all fragments constituting the molecule.
80. A system according to claim 59, wherein the routine to develop QSAR/QSPR models comprises a routine to develop a QSAR/QSPR model that relates the molecular property/reactivity/activity with the fragment molecular descriptors, where the descriptor defined through some weighting scheme is calculated using the same weighting scheme over all fragments constituting the molecule.
81. A system according to claim 59, wherein the modeling set of experimental data comprises experimental values for a physical property, chemical reactivity or biological activity for which the molecular structure having targeted values is designed, wherein the data give one-to-one correspondence between the property value and a two-dimensional representation of the molecular structure and have variability in the chemical composition or the nanoscale arrangement of the molecular structures.
82. A system according to claim 59, wherein the fragment data comprises any number of molecular fragments having one or more free valencies.
83. A system according to claim 59, wherein the fragment data comprises all descriptors appearing in fragmental QSAR/QSPR models for each fragment.
84. A method of generating molecular structures having target numerical values of one or more properties, wherein the properties comprise a physical property, a chemical reactivity or a biological activity, and wherein the molecular structures are separated into distinct molecular entities each of which is characterized by a set of molecular descriptors, the method comprising: calculating fragmental molecule descriptors and whole molecule descriptors for one or more molecular structures in a modeling set of experimental data on one or more of the properties; developing QSAR/QSPR models based on the fragmental molecule descriptors and the whole molecule descriptors; constructing new molecular structures as a plurality of possible combinations of molecular fragments from a fragment library for the construction of the novel molecular structures; predicting molecular structures from the new molecular structures having the best fitting target numerical property values using the QSAR/QSPR models based on the fragmental molecule descriptors; and validating the predictions using at least one of the group consisting of: the QSAR/QSPR models with the whole molecule descriptors or experimental measurements on the predicted compounds.
85. The method of claim 84, further comprising iteratively repeating the method to construct further molecular structures.
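The steps recited in claims 73, 77 to 80 and 84 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The fragment names, descriptor names, numerical values, the atom-count weighting scheme, and the helper functions (aggregate, features, fit_multilinear, predict, design) are all hypothetical and chosen only for illustration. The sketch aggregates fragment descriptors over the fragments of a molecule (minimum, maximum, average, or a weighting scheme), fits a multilinear regression model on a toy modeling set, enumerates fragment combinations, and ranks them by closeness of the predicted property to a target value. Validation against whole molecule descriptors or experiment is not shown.

```python
from itertools import combinations_with_replacement
from statistics import mean

# Hypothetical fragment library: fragment name -> descriptor values.
# All names and numbers are illustrative, not taken from the patent.
FRAGMENTS = {
    "F1": {"n_atoms": 6.0, "max_charge": 0.10},
    "F2": {"n_atoms": 3.0, "max_charge": 0.35},
    "F3": {"n_atoms": 9.0, "max_charge": 0.22},
}

def aggregate(names, descriptor, mode):
    """Combine one fragment descriptor over all fragments of a molecule."""
    values = [FRAGMENTS[n][descriptor] for n in names]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "mean":
        return mean(values)
    if mode == "atom_weighted":
        # One possible weighting scheme (an assumption): weight by atom count.
        weights = [FRAGMENTS[n]["n_atoms"] for n in names]
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"unknown mode: {mode}")

def features(names):
    """Fragment-based feature vector with a leading 1.0 for the intercept."""
    return [1.0,
            aggregate(names, "n_atoms", "mean"),
            aggregate(names, "max_charge", "max")]

def fit_multilinear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                f = A[row][col] / A[col][col]
                A[row] = [a - f * b for a, b in zip(A[row], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coeffs, names):
    """Predicted property value for a molecule given as a tuple of fragments."""
    return sum(c * x for c, x in zip(coeffs, features(names)))

def design(coeffs, target, n_sites=2, top=3):
    """Enumerate fragment combinations and rank them by closeness to target."""
    # Assumption: the order in which fragments are attached does not matter.
    candidates = combinations_with_replacement(FRAGMENTS, n_sites)
    ranked = sorted(candidates, key=lambda c: abs(predict(coeffs, c) - target))
    return ranked[:top]

# Toy modeling set: molecules as fragment tuples with synthetic property values.
modeling_set = [("F1", "F1"), ("F1", "F2"), ("F2", "F3"), ("F3", "F3"), ("F2", "F2")]
true_coeffs = [1.0, 0.5, -2.0]  # used only to synthesize the toy property values
y = [predict(true_coeffs, m) for m in modeling_set]

coeffs = fit_multilinear([features(m) for m in modeling_set], y)
best = design(coeffs, target=4.5)
print(best)
```

In this sketch the regression is solved directly from the normal equations to keep the example free of external dependencies. A practical system would more likely use a numerical library and would select among many candidate descriptors rather than a fixed pair.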
PCT/EP2007/052856 2007-03-26 2007-03-26 Method and apparatus for the design of chemical compounds with predetermined properties WO2008116495A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2007/052856 WO2008116495A1 (en) 2007-03-26 2007-03-26 Method and apparatus for the design of chemical compounds with predetermined properties

Publications (1)

Publication Number Publication Date
WO2008116495A1 (en) 2008-10-02

Family

ID=39033727

Country Status (1)

Country Link
WO (1) WO2008116495A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6421612B1 (en) * 1996-11-04 2002-07-16 3-Dimensional Pharmaceuticals Inc. System, method and computer program product for identifying chemical compounds having desired properties
EP1589463A1 (en) * 2004-04-21 2005-10-26 Avantium International B.V. Molecular entity design method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SILD SULEV ET AL: "Open computing grid for molecular science and engineering.", JOURNAL OF CHEMICAL INFORMATION AND MODELING 2006 MAY-JUN, vol. 46, no. 3, May 2006 (2006-05-01), pages 953 - 959, XP002473708, ISSN: 1549-9596 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011041247A1 (en) 2009-10-02 2011-04-07 Exxonmobil Research And Engineering Company A system for the determination of selective absorbent molecules through predictive correlations
EP2517075A4 (en) * 2009-10-02 2016-11-02 Exxonmobil Res & Eng Co A system for the determination of selective absorbent molecules through predictive correlations
WO2019172280A1 (en) * 2018-03-09 2019-09-12 昭和電工株式会社 Polymer physical property prediction device, storage medium, and polymer physical property prediction method
US11915799B2 (en) 2018-03-09 2024-02-27 Resonac Corporation Polymer physical property prediction device, recording medium, and polymer physical property prediction method
JP6633820B1 (en) * 2018-03-09 2020-01-22 昭和電工株式会社 Apparatus, program, and method for predicting physical properties of polymer
JP2020074095A (en) * 2018-03-09 2020-05-14 昭和電工株式会社 Physical property prediction device for polymer, program, and physical property prediction method for polymer
CN111819441A (en) * 2018-03-09 2020-10-23 昭和电工株式会社 Polymer physical property prediction device, storage medium, and polymer physical property prediction method
JP7217696B2 (en) 2018-03-09 2023-02-03 昭和電工株式会社 Apparatus for predicting physical properties of polymer, program, and method for predicting physical properties of polymer
US11126695B2 (en) 2018-11-02 2021-09-21 Showa Denko K.K. Polymer design device, polymer design method, and non-transitory recording medium
US10861588B1 (en) 2019-06-25 2020-12-08 Colgate-Palmolive Company Systems and methods for preparing compositions
US10839941B1 (en) 2019-06-25 2020-11-17 Colgate-Palmolive Company Systems and methods for evaluating compositions
US11315663B2 (en) 2019-06-25 2022-04-26 Colgate-Palmolive Company Systems and methods for producing personal care products
US11342049B2 (en) 2019-06-25 2022-05-24 Colgate-Palmolive Company Systems and methods for preparing a product
US10839942B1 (en) 2019-06-25 2020-11-17 Colgate-Palmolive Company Systems and methods for preparing a product
US11728012B2 (en) 2019-06-25 2023-08-15 Colgate-Palmolive Company Systems and methods for preparing a product
US10515715B1 (en) 2019-06-25 2019-12-24 Colgate-Palmolive Company Systems and methods for evaluating compositions
CN111986735A (en) * 2020-08-19 2020-11-24 兰州大学 Computing method for predicting multipole distance of atoms in RNA by using ARDGPR model
CN112634992A (en) * 2020-12-29 2021-04-09 上海商汤智能科技有限公司 Molecular property prediction method, training method of model thereof, and related device and equipment
CN113223632A (en) * 2021-05-12 2021-08-06 北京望石智慧科技有限公司 Molecular fragment library determination method, molecular segmentation method and device
CN113223632B (en) * 2021-05-12 2024-02-13 北京望石智慧科技有限公司 Determination method of molecular fragment library, molecular segmentation method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07727329

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07727329

Country of ref document: EP

Kind code of ref document: A1