WO2012028962A2

WO2012028962A2 - Pharmacophore toxicity screening

Info

Publication number: WO2012028962A2
Application number: PCT/IB2011/002721
Authority: WO
Inventors: Nicolas Delacotte; Alexandre Jacob; Philippe Manivet
Original assignee: Bioquanta Sa
Priority date: 2010-09-01
Filing date: 2011-09-01
Publication date: 2012-03-08
Also published as: WO2012028962A3

Abstract

The disclosure provides a software method of securely encoding a text file comprising spherical mapping coordinates describing a pharmacophore molecule which can be generated from a compound by a user. The encoded pharmacophore allows for secure electronic transfer from the user to an expert for in silico analysis without revealing the structure of the compound. In one aspect, the in silico analysis is used to provide toxicity predictions by superimposing the encoded pharmacophore molecule with those of an expert database of toxic compounds that are well-characterized by in vitro or in vivo analysis in standardized tests.

Description

PHARMACOPHORE TOXICITY SCREENING

Cross Reference to Related Applications

[0001] This application claims the benefit of U.S. Provisional Application No. 61/379,350, filed Sept. 1, 2010, and U.S. Provisional Application No. 61/379,326, filed Sept. 1, 2010, the entire disclosures of which are hereby incorporated by reference herein.

Technical Field

[0002] The document discloses apparatuses and methods related to processing and analyzing pharmacophores.

Background

[0003] Toxicity is a recurring problem in the chemical, pharmaceutical and cosmetic industries. Numerous studies show the high financial impact of failure during clinical development. The European Union REACH principles, and their US equivalent ToxCast®, are now requesting that companies manufacturing or selling chemical substances publish toxicological data about these substances. See, e.g., Huynh, L. et al., Drug Discov. Today 14, 401-405 (2009). These industries are very concerned not only by the potential legal issues of a toxic product but also by the potentially devastating effect on their name and brand reputation if toxicity occurs in a marketed product. There is growing pressure to reduce animal testing for all kinds of chemicals, including pharmaceutical compounds. Moreover, there is growing pressure to reduce healthcare expenditures and consequently drug prices.

[0004] Meanwhile, productivity appears to be hamstrung at several key steps of the current drug development paradigm, notably in the formation of bottlenecks and the late-stage discovery of adverse ADME-Tox properties. The high attrition rate in late stages of drug development displayed by the current paradigm may in large part be caused by the retention of drug candidates with poor ADME-Tox properties. Indeed, deficiencies in ADME-Tox parameters are still present in 50% of marketed drugs, causing undesirable side effects and adverse drug reactions. Early stage toxicity screening and characterization is clearly a key strategy to changing numbers like these. Current technologies applied to these problems, such as statistical-correlation QSAR, lack the predictive power to truly improve the efficiency of the process. R&D costs have nearly tripled since 1995 without producing a corresponding increase in the number of new drugs on the market. Another challenge to drug development is increasing regulatory demands for proof of safety and efficacy before market release. REACH, GHS, and other industry-specific legislation have created a demand for toxicity data early in the development process for molecules. Together, these forces are increasing demand for early stage determination of ADME-Tox parameters with a reliability that current systems cannot provide.

[0005] In silico methods for modeling and predicting ADME-Tox properties for any xenobiotic, including drugs, are becoming increasingly popular. This is due to the recent emergence of the REACH principles in the EU and their US equivalent ToxCast, both promoting the application of computational toxicology to assess the risk that chemicals pose to human health and the environment. Existing technologies for toxicity prediction generally fall into two categories: knowledge-based expert systems and statistical-correlation QSAR (Quantitative Structure-Activity Relationships). Each has some fundamental limitations. Expert systems and QSAR share limitations in the fidelity of their molecular structure description. They also have limitations in how they establish the mapping from molecular structure to physiological activity. Their limitations prevent them from achieving enough predictive accuracy to truly change the drug discovery and development paradigm, solve environmental problems, or reduce animal model use.

[0006] One main limitation of currently employed in silico methods is in the representation of molecular structure. Both expert systems and QSARs often rely on a fragment-based representation of molecular features thought to be important for physiological activity. However, many molecular attributes important for interaction with biological targets (such as electronic features) should be evaluated over the entire molecule. Another aspect of molecular structure missing from a fragment-based approach is consideration of the different conformers and isomers of the same molecule that are likely to form in a biological environment. A conformer typically describes one of a set of stereoisomers, each of which is characterized by a conformation, which is the spatial arrangement of atoms in the pharmacophore, corresponding to a distinct energy minimum. The conformation affords distinction between stereoisomers that can be interconverted by rotations about formally single bonds and can encompass inversion at trigonal pyramidal centers and other polytopal rearrangements.

[0007] Another main limitation in the currently employed in silico methods is in the comparison of molecular structures. Expert systems rely on lists of rules (called rulebases) to classify a compound as potentially toxic based on the fragment-based analysis. Therefore, the accuracy of toxicity predictions from expert systems depends on the completeness of their rulebase. The manually curated rules included in such a system are unlikely, in either quantity or quality, to capture all of the key molecular features important for activity among the conformational and isomeric space available to a given molecule.

[0008] In the case of most QSAR methods, molecular comparisons are also a step that depends heavily on manual intervention, and less on a high-fidelity and objective molecular description. QSARs depend on the correct grouping of molecules both in their initial formulation, and in their selection for the evaluation of a given molecule. The grouping of molecules in QSARs is not completely objective. There are many different and overlapping schemes for grouping, and there is not a unique mapping from molecular structure features (as determined by the fragment-based approach) to these categories.

[0009] Grouping molecules sharply limits the applicability to molecules likely to be encountered under the very wide scope of regulatory mandates (REACH, GHS), or even the large compound libraries pharmaceutical companies screen in primary high-throughput screening (HTS) campaigns. Application of these existing methods to such large sets of molecules is initially attractive because of their speed. However, this comes at the expense of realism, which leads to poor predictive power. Trading accuracy for speed is a limit that can only be broken by a fundamentally new approach to representing molecular reality. Clearly, a more efficient, accurate method of generating in silico compound toxicity predictions is desirable.

[0010] In order to reduce costs while benefiting from external expertise, the pharmaceutical industry already outsources numerous in vitro and in vivo steps in the early drug selection and validation process. See, e.g., Tralau-Stewart, C.J. et al. Drug Discov. Today 14, 95-101 (2009). Used in combination with in vitro and in vivo assays, in silico methods enable significant cost reduction in the screening of a large library of compounds. Computational methods offer a reliable, cheaper and faster alternative way of surveying chemical space for toxic dead-ends in development. However, handling unpatented chemical compounds raises the problem of confidentiality and related intellectual property protection.

[0011] Despite limitations, conventional in silico methods are currently used for drug design or for ADME-T (Absorption, Disposition, Metabolism, Excretion and Toxicity) pharmacological parameter modeling and prediction. They can be QSAR (Quantitative Structure/Activity Relationships) model elaboration, molecular dynamics, ligand-protein docking, de novo ligand design, etc. See, e.g., Jacob, A. et al. Drug Discov. Today 14, 406-412 (2009). These computational tools often require explicit information regarding the structure of confidential compounds and/or biomolecules to visualize and handle said compounds and/or biomolecules. Chemical and biological data is often transferred by a scientist or company to an outside contractor to analyze the pharmacophore to identify matching molecules, toxicity, and the like. Disclosure and handling of chemical compounds raises problems of confidentiality between the scientist/company and a contractor, which is especially critical in early research and development (R&D) stages for many industries. The exchanged information represents a highly valuable asset and requires heavy investments. Additionally, even if the confidentiality between the scientist/company and contractor is preserved, the risk of illegal misuse by a third party is present, especially when data are electronically processed with computers connected to the Internet. Hackers could penetrate the outside contractors' electronic security and access confidential intellectual property belonging to others.

[0012] Insufficient levels of protection can be an obstacle to the flow of confidential data that is essential for further drug development. A lack of security can impede the chemical (e.g., pharmaceutical) and biotechnology industries from outsourcing early development phases to in silico expert teams, which can substantially increase the cost of developing new drugs.

[0013] Therefore there is an unmet need to maximize intellectual property security when transferring or handling raw data, including chemical structures, in a workable form to external drug designers and molecular modelers for in silico analysis.

Summary

[0014] One aspect of this document is directed to an apparatus for encoding data representative of the chemical structure of a molecule. The apparatus comprises memory storing chemical structures representative of a plurality of compounds and an input device. At least one compound comprises a plurality of features. A programmable circuit is in electrical communication with the memory and the input device, the programmable circuit programmed to select a chemical structure upon receiving input from the input device; selectively identify a subset of the data, the subset of data comprising data representing a subset of features from the compound; selectively screen data not included in the subset of data; and store the subset of data in a data file in the memory.

[0015] Another aspect of this patent document is an apparatus for encoding data representative of a pharmacophore from a molecular compound. The apparatus comprises memory storing chemical structures representative of a plurality of compounds and an input device. At least one compound comprises pharmacophore features and non-pharmacophore features. A programmable circuit is in electrical communication with the memory and the input device, the programmable circuit programmed to select a chemical structure, the selected chemical structure having pharmacophore features and non-pharmacophore features, identify a set of pharmacophore features, screen the features not in the identified set of pharmacophore features, and encode the data representative of the pharmacophore features.

[0016] Yet another aspect of this patent document is a method of encoding data representative of the chemical structure of a molecule. The method comprises providing electronic data representing a chemical structure of a compound, the compound having a plurality of features; identifying a subset of the data, the subset of data comprising data representing a subset of features from the compound; screening data not included in the subset of data; and storing the subset of data in a data file.

[0017] Another aspect is a method of encoding data representative of a pharmacophore from a molecular compound. The method comprises providing data representative of a chemical structure of a molecule, the chemical structure including pharmacophore features and non-pharmacophore features; identifying a set of pharmacophore features; screening the features not in the identified set of pharmacophore features; and encoding the data representative of the pharmacophore features.

[0018] Another aspect is a computer-based method for automatically generating a decoyed pharmacophore molecule from a compound. The method comprises inputting the structure of a compound of interest; modeling the compound to obtain a three-dimensional chemical structure of the compound in a specific conformation; computing a pharmacophore molecule from the specific conformation, the pharmacophore molecule containing one or more real pharmacophore features defined in spherical coordinates; erasing the structure of the compound; generating one or more decoy pharmacophore features defined in spherical coordinates; and adding the decoy pharmacophore features to the real pharmacophore features to create a decoyed pharmacophore molecule.
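The decoy step of this aspect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Feature` record, the uniform sampling ranges, the relabeling of identification numbers, and the function names `add_decoys` and `strip_decoys` are hypothetical and are not taken from the disclosure, and the encryption of the key is not shown.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One pharmacophore feature in spherical coordinates (hypothetical layout)."""
    uid: int      # unique identification number
    kind: str     # e.g. "hbond_donor", "aromatic"
    r: float      # distance from the origin, in angstroms
    theta: float  # polar angle, in radians
    phi: float    # azimuth angle, in radians


def add_decoys(real, n_decoys, kinds, r_max, rng):
    """Mix randomly placed decoys into the real features.

    Returns the shuffled feature list and the key, i.e. the sorted
    identification numbers of the decoy features.
    """
    # Assumption: every feature gets a fresh random id so that the id
    # itself does not reveal whether a feature is real or a decoy.
    uids = rng.sample(range(1, 10**6), len(real) + n_decoys)
    relabeled = [Feature(u, f.kind, f.r, f.theta, f.phi)
                 for u, f in zip(uids, real)]
    decoys = [Feature(u, rng.choice(kinds),
                      rng.uniform(0.0, r_max),
                      rng.uniform(0.0, math.pi),
                      rng.uniform(0.0, 2.0 * math.pi))
              for u in uids[len(real):]]
    mixed = relabeled + decoys
    rng.shuffle(mixed)
    return mixed, sorted(d.uid for d in decoys)


def strip_decoys(mixed, key):
    """Remove every feature whose identification number is listed in the key."""
    decoy_ids = set(key)
    return [f for f in mixed if f.uid not in decoy_ids]
```

In this sketch, a recipient holding the key can call `strip_decoys` to recover the real features, while a recipient without the key cannot tell real features from decoys by their identification numbers alone.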

[0019] Yet another aspect is a method for allowing secure transfer of chemical structural data for a proprietary compound of interest from a user to a third party for in silico analysis without disclosing the proprietary compound structure. The method comprises transmitting the decoyed pharmacophore molecule from the user to the third party on any support selected from electronic, DVD, or CD; supplying the third party with a key for decrypting the list of unique identification numbers for the decoy features; and removing the decoy features of the decoyed pharmacophore prior to pharmacophore analysis by the third party.

[0020] Another aspect of the patent document is a method for predicting toxicity for a source molecule. The method comprises comparing (a) one or more source pharmacophore molecule(s) derived from the source molecule, to (b) a library of multiple target pharmacophore molecules derived from known compounds with known toxicological data in one or more biological assays. Geometric similarity between the one or more source pharmacophore molecule(s) and a target pharmacophore molecule for a known compound is predictive of similarity in toxicological profile of the source molecule compared to the toxicological profile of the known compound.

[0021] Yet another aspect is a method for predicting physiological activity for a source molecule. The method comprises comparing: (a) one or more source pharmacophore molecule(s) derived from the source molecule, to (b) a library of multiple target pharmacophore molecules derived from known compounds with known data in one or more biological assays. Geometric similarity between the one or more source pharmacophore molecule(s) and a target pharmacophore molecule for a known compound is predictive of similarity in physiological activity of the source molecule compared to the physiological activity of the known compound.
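The comparison of a source pharmacophore against a library can be illustrated with a simple scoring sketch. The summary does not specify how geometric similarity is measured, so everything below is an assumption made for illustration: features are plain `(kind, r, theta, phi)` tuples, the pharmacophores are taken to be already superimposed in a common frame, features are matched greedily by type within a distance tolerance `tol`, and the score is a Tanimoto-style ratio. The names `overlap_score` and `rank_library` are hypothetical.

```python
import math


def to_xyz(r, theta, phi):
    """Spherical (r, polar angle, azimuth angle) to Cartesian coordinates."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def overlap_score(source, target, tol=1.0):
    """Tanimoto-style overlap of two pre-aligned pharmacophores.

    A source feature counts as matched when an unused target feature of the
    same kind lies within `tol` angstroms of it (greedy matching).
    """
    unused = list(target)
    matched = 0
    for kind, r, theta, phi in source:
        point = to_xyz(r, theta, phi)
        for cand in unused:
            if cand[0] == kind and math.dist(point, to_xyz(*cand[1:])) <= tol:
                unused.remove(cand)
                matched += 1
                break
    union = len(source) + len(target) - matched
    return matched / union if union else 0.0


def rank_library(source, library, tol=1.0):
    """Return (name, score) pairs for the library, best match first."""
    scores = {name: overlap_score(source, feats, tol)
              for name, feats in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Under these assumptions, the known compounds at the top of the ranking would be the ones whose assay data is taken as most informative for the source molecule. A production method would also need to search over superimpositions and conformers, which this sketch omits.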

Brief Description of the Drawings

[0022] Figures 1A-1C illustrate the conversion of a chemical structure to a pharmacophore structure.

[0023] Figure 2 illustrates the pharmacophore shown in Figure 1C in spherical coordinates, but it is rotated 180 degrees.

[0024] Figure 3 illustrates the hydrophobic volume ellipsoid representing an aromatic ring of the pharmacophore shown in Figures 1 and 2.

[0025] Figure 4 illustrates a structure for data describing a pharmacophore.

[0026] Figure 5 illustrates a structure for data describing vectors, the vectors describing a pharmacophore.

[0027] Figures 6A and 6B illustrate a system in which the processes and apparatuses described herein can be used.

[0028] Figures 7A-7G are a flow diagram of a method for encoding a pharmacophore.

[0029] Figure 8 illustrates the format of a file used to store structural data about a pharmacophore, and stores data in the data structures shown in Figures 5 and 6.

[0030] Figure 9 illustrates the format of a file used to store the file shown in Figure 8 and other information useful for analyzing the pharmacophore.

[0031] Figure 10 illustrates a process of multidimensional pharmacophore screening of a source pharmacophore molecule versus a toxic pharmacophore database.

[0032] Figure 11 illustrates physicochemical properties of the various pharmacophore points superimposed over the chemical structure from which they originate.

[0033] Figure 12 illustrates a data warehouse containing databases of information for analyzing pharmacophores.

[0034] Figure 13 illustrates an online analytical processing (OLAP) cube for searching multidimensional information stored in the data warehouse shown in Figure 12 and analyzing pharmacophores described in the data structures shown in Figures 5 and 6 and files shown in Figures 8 and 9.

[0035] Figures 14A-14C illustrate evaluation of database diversity in terms of compound size, chemical function diversity per molecule, and chemical function complexity, respectively.

[0036] Figures 15A and 15B illustrate evaluation of the database pharmacophore diversity in terms of the number of different pharmacophoric center types per pharmacophore, and the number of pharmacophoric centers per pharmacophore, respectively.

[0037] Figure 16 provides the equations for conversion between spherical and Cartesian coordinates to describe the pharmacophore.

[0038] Figures 17A and 17B illustrate the definition of the directionalities of classical Hydrogen bonds and Hydrogen bonds involving halogens where Y = O(sp³), S(sp³), N(sp³), or N(sp²) and X = O(sp³) or S(sp³), and where D = O, N, or S and F can be replaced by Cl, Br, or I, respectively.

[0039] Figure 18 illustrates the definition of the directionality of canonical halogen bonds.

[0040] Figures 19A-19D provide the definition of different anisotropic non-bonded interactions involving π orbital electrons for cation-, anion-, Hydrogen-, and halogen-π, respectively.

[0041] Figures 20A-20C illustrate scatter plots of various pharmacophore ellipsoid volumes with x, y, and z axis parameters in angstroms, viewed along each x, y and z axis, respectively.

[0042] Figures 21A and 21B illustrate a scatter plot of parent pharmacophore ellipsoids in Figure 21A compared to a scatter plot of isomer pharmacophore ellipsoids in Figure 21B.

[0043] Figure 22 provides a Receiver Operating Characteristic (ROC) analysis confusion matrix.

[0044] Figures 23A-23E show ROC analyses to provide predictivity of an in silico MultiDIP Tox screening method compared to the actual data from known toxicology databases in five assays.

[0045] Figure 24 shows one source pharmacophore for Mefenorex, as discussed in Example 2.

[0046] Figure 25 shows one target pharmacophore representation of one isomer of Mephentermine.

[0047] Figure 26 shows the source pharmacophore representation for Mefenorex superimposed with the target pharmacophore representation for Mephentermine.

[0048] Figure 27 shows one target pharmacophore representation of Amphetamine.

[0049] Figure 28 shows the source pharmacophore for Mefenorex superimposed with the target pharmacophore for Amphetamine.

Detailed Description

[0050] Various embodiments will be described in detail with reference to the drawings, wherein like reference numerals represent like parts and assemblies throughout the several views. Reference to various embodiments does not limit the scope of the claims attached hereto. Additionally, any examples set forth in this specification are not intended to be limiting and merely set forth some of the many possible embodiments for the appended claims.

[0051] In general terms, this patent document relates to several aspects of apparatuses and methods for analyzing pharmacophores. For example, the apparatuses and methods provide a simple, efficient, and secure way to encode data describing a pharmacophore from a source molecule without disclosing additional information about the source molecule itself. An advantage of this encoding is that it provides a digital representation of the pharmacophore that does not include the confidential chemical structure of the underlying source molecule from which the pharmacophore is derived, but still allows evaluation and safe outsourcing of QSAR studies, chemical database screening, ADME-Tox predictions, ligand docking, etc. Another advantage of this encoding is that a scientist investigating a compound can send data about the pharmacophore to a third party for analysis without the risk of disclosing the identity or structure of the underlying source molecule being studied.

[0052] In another example, the apparatuses and methods analyze the pharmacophore to generate compound toxicity predictions by superimposing each pharmacophore of interest with those of an expert database of toxic compounds well characterized by in vitro and in vivo standardized tests. These predictions can then be returned to the scientist investigating the molecule underlying the pharmacophore without the scientist ever having to disclose the molecule they are analyzing. Additionally, the apparatuses and methods for generating the toxicity predictions provide faster access to information located in a variety of different databases. They also enable faster superimposition of the pharmacophore with potential toxic compounds and faster generation of toxicity predictions.

[0053] In exemplary embodiments described herein, a pharmacophore is a molecular framework that carries the essential features responsible for the biological activity of a source molecule. The source molecule is in a three-dimensional conformation responsible for conferring biochemical or pharmacological effects, and is commonly a drug molecule that is being researched, studied, or otherwise analyzed. The source molecule may represent a minimum energy conformation, and can be a small molecule or a biomolecule (e.g., large molecule). Examples of biomolecules include, but are not limited to, polynucleotides, amino acids, peptides, polypeptides, proteins, sugars, carbohydrates, fatty acids, lipids, steroids, and combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins).

[0054] A pharmacophore is typically a substructure of the source molecule, and commonly possesses a collection of functional groups in a three-dimensional configuration that is substantially identical to the three-dimensional arrangement found in an active protein, small molecule, toxic molecule, or other compound of interest.

[0055] In some embodiments, the pharmacophore is converted from a two-dimensional to a three-dimensional structure to obtain one or more low energy conformations. In other embodiments, pharmacophores can be derived from a three-dimensional structure of a molecule provided by X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or the like. Suitable pharmacophores also can be derived from homology models based on the structures of related compounds or from three-dimensional structure-activity relationships. For example, pharmacophores of the Platelet Factor 4 (PF4) polypeptide can be derived from the analysis of point mutations and evaluation of effects those mutations had on PF4 activity. PF4, also known as CXCL4, is a member of the CXC sub-family of chemokines derived from platelets and has been found to play a role in atherosclerosis and angiogenesis. See, for example, Manivet et al., US2008/0305041, the entire disclosure of which is incorporated herein by reference. Suitable PF4 pharmacophores were deduced or derived by correlating the effects of point mutations to three-dimensional, homology models of a mature PF4 polypeptide. In other embodiments, the pharmacophore is already in a three-dimensional structure and does not need to be converted. In other embodiments, the analysis is performed on a two-dimensional structure or a structure having some other dimension.

[0056] In exemplary embodiments, the apparatuses and methods disclosed herein are used to study and analyze ligand-based pharmacophores rather than strictly target-based pharmacophores, although embodiments can be used to study and analyze all types of pharmacophores and other substructures of a molecule.

[0057] Figures 1A-1C broadly illustrate the initial isolation of a spherical pharmacophore from a source molecule or other chemical structure as part of the encoding process. Although the embodiments disclosed herein relate to pharmacophores, the apparatus and methods disclosed herein can be used to encode and analyze other, non-pharmacophore substructures from a molecule. A source molecule is a candidate compound molecule to be converted to a geodesic pharmacophore and encoded for the purposes of in silico analysis.

[0058] Figure 1A illustrates a classical ball and stick three-dimensional model of (2R)-4-chloro-2-hydroxy-2-[4-(2-hydroxyethyl)phenyl]butan-1-aminium, although a pharmacophore from any compound or molecule of interest can be isolated for encoding and analysis using the apparatuses and methods disclosed herein. The three-dimensional structure of this molecule was modeled by software from two- or three-dimensional molecule files. The molecule was then standardized by adding Hydrogen atoms and refining the three-dimensional structure.

[0059] Referring now to Figure 1B, a pharmacophore is superimposed on the source molecule shown in Figure 1A. The pharmacophore has pharmacophore points, which are individual features of the pharmacophore that have geometric or possibly electronic components representing one or more elements responsible for at least one physiological activity. Such pharmacophore features can include different physicochemical properties, such as Hydrogen bond donor, acceptor and both, aromatic ring as hydrophobic entity or charge transfer acceptor or donor, hydrophobic volume, polar group (positively or negatively charged), halogen bond donor or acceptor, Hydrogen and halogen bonds ambivalent acceptor and radical center. A pharmacophore feature can be a point feature or a vector feature.

[0060] In the example illustrated in Figure 1B, the Hydrogen bond donor/acceptor pharmacophore point is mapped on the hydroxyl group of the molecule. This pharmacophore point has one Hydrogen bond donor vector that corresponds to the Hydrogen atom bonded to the oxygen atom, and two Hydrogen bond acceptor vectors that correspond to two free electronic orbitals of the oxygen atom. The ammonium group was mapped with a positive polar pharmacophore point centered on the position of the nitrogen atom. A hydrophobic pharmacophore point represented by an ellipsoid volume and an aromatic pharmacophore point represented by a set of twelve aromatic vectors were both mapped on the aromatic ring of the molecule. The chlorine atom of the molecule was mapped as a halogen bond donor pharmacophore point and a Hydrogen bond acceptor pharmacophore point.

[0061] After the structure of the pharmacophore of interest is identified, the non-pharmacophore features are removed, leaving a model of the pharmacophore. In some possible embodiments, a pharmacophore model describes a classical pharmacophore, as defined by the International Union of Pure and Applied Chemistry (IUPAC), as the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response. A pharmacophore model does not represent a real molecule or a real association of functional groups, but a purely abstract concept that accounts for the common molecular interaction capacities of a group of compounds towards their target structure. The pharmacophore model can be considered the largest common denominator shared by a set of active molecules. A pharmacophore model is an idealized, three-dimensional representation of structural requirements for physiological activity, which can be useful for computer modeling applications. More simply stated, a pharmacophore model is the spatial arrangement of functional groups essential for physiological activity; it is the three-dimensional pattern that emerges from a set of biologically active molecules. See Van Drie, Internet Elec. J. Mol. Design 2007, 6, 271-279, p. 277.

[0062] Chemical compounds or biomolecules may contain one or more chiral centers and/or double bonds and, therefore, exist as stereoisomers, such as double-bond isomers (e.g., geometric isomers), enantiomers, or diastereomers. Embodiments of the spherical pharmacophore modeling possess an advantage in representation of chiral molecules. As spherical pharmacophore points, two enantiomers possess approximately the same spherical coordinates except for the sign of the polar angle (i.e., Θ is positive for one stereoisomer and negative for the other stereoisomer) or azimuth angle (φ). If one pharmacophore point is rotated 90°, then the two pharmacophore points would virtually overlap. However, mapping pharmacophores using spherical coordinates allows identification of two enantiomers of a racemic mixture. In mapping stereoisomers using Cartesian coordinates, all of the points would be different between two enantiomers.

[0063] Figure 1C illustrates the pharmacophore after the pharmacophore is identified and the non-pharmacophore features (e.g., the original atoms and bonds that constitute the non-pharmacophore features of the molecule) are removed. Figure 1C illustrates one possible pharmacophore of (2R)-4-chloro-2-hydroxy-2-[4-(2-hydroxyethyl)phenyl]butan-1-aminium, although other pharmacophores of this source molecule may be possible and pharmacophores of other source compounds also are possible.

[0064] The pharmacophore from Figure 1C is shown in spherical coordinates in Figure 2, but it is rotated 180 degrees. "r" is the distance from the geometric center, or origin ("O"), to the Hydrogen bond acceptor. The azimuth angle (φ) and the polar angle (Θ) are also depicted. The two angles, φ and Θ, are independent of the object size and generalized based on the atoms of the pharmacophore. These same parameters can be applied to locate the pharmacophore center for each of the pharmacophore features: the ellipsoid hydrophobic volume representing the aromatic ring, the Hydrogen bond donor (the pharmacophore feature closest to the y axis), and the halogen bond donor (the pharmacophore feature closest to the z axis). The pharmacophore center is typically the position, or geometric center, of the pharmacophoric point, or pharmacophore feature, using the reference coordinates. The pharmacophore center may represent any of several functional physicochemical properties such as hydrogen bonds, halogen bonds, aromatic group, etc. In addition, a field of normal vectors can be added for describing directionally dependent components of the interactions. An advantage of using spherical coordinates to model pharmacophores is that every point is described with spherical reference coordinates independently of the others. The set of complex data (distances, angles and planes) used in the classical distance matrix-based pharmacophore definition is thus no longer necessary, although it can still be used in alternative embodiments.

[0065] The spherical coordinates also can be converted to Cartesian coordinates, where x = r sin Θ cos φ; y = r sin Θ sin φ; and z = r cos Θ, although there may be other ways to calculate or determine the Cartesian coordinates. In alternative embodiments, additionally, the pharmacophore can be modeled, encoded, and analyzed using Cartesian coordinates. Yet other embodiments might model, encode, and analyze pharmacophores using coordinate or mapping systems other than Cartesian or spherical coordinate systems. Interconversion of spherical coordinates and Cartesian coordinates is further illustrated in Figure 16.
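The interconversion described in paragraph [0065] can be illustrated with a minimal Python sketch. It assumes Θ is the polar angle measured from the z-axis and φ is the azimuth angle, both in radians; the function names are hypothetical and are not part of the disclosed software.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    """Convert (r, theta, phi) to (x, y, z).

    theta is the polar angle from the z-axis and phi the azimuth angle,
    both in radians, following the formulas of paragraph [0065].
    """
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z


def cartesian_to_spherical(x, y, z):
    """Inverse conversion; returns (r, theta, phi) with theta in [0, pi]."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # The angles are undefined at the origin; zero is an arbitrary choice.
        return 0.0, 0.0, 0.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, z / r)))
    phi = math.atan2(y, x)
    return r, theta, phi
```

A round trip through both functions returns the original coordinates up to floating-point rounding, provided Θ lies in [0, π] and φ in (-π, π].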

[0066] The hydrophobic volume ellipsoid pharmacophore feature representing the aromatic ring is shown in Figure 3. This hydrophobic volume can be further represented by parameters a, b, and c. The ellipsoid has its own geometric center. The shape of the ellipsoid is given by three internal axes constituting the internal Cartesian coordinate system, where a and b are the equatorial radii along the x- and y-axes, and where c is the polar radius along the z-axis. Specifically, "a" is the radial distance from the ellipsoid's geometric center to an outermost point on the ellipsoid along the x-axis, "b" is the radial distance from the ellipsoid's geometric center to an outermost point on the ellipsoid along the y-axis, and "c" is the distance from the ellipsoid's geometric center to an outermost point on the ellipsoid along the z-axis.

[0067] In one aspect, pharmacophore features or pharmacophore centers which are charged atoms or charged groups of atoms can be represented by a spherical object. A geometric center of a sphere of charged atoms can be referenced in a spherical coordinate system. For charged atom groups, a geometric center can be a point placed apart from the center of mass between atoms of a charged group. Positioning can then be adjusted according to charge delocalization in a parent molecule and promoted by differences in electronegativity of atoms belonging to the group, respecting its total charge. The total charge of charged atoms and charged groups of atoms can be either negative or positive according to the valence of the ion. In addition, this total charge can be adjusted depending on the number of interactions established with neighboring pharmacophore points that can attract some amount of charge by through-space charge transfers or by charge delocalization in the case of a mesomeric effect in conjugated molecules. Volume and geometric centers of a sphere of charged atoms and charged groups of atoms can be adjusted according to the total charge.

[0068] In another aspect, a polar pharmacophore center can be represented by a sphere. The radius, r, of the sphere is proportional to the absolute value of the charge attached to the polar pharmacophore center point.

[0069] Locating additional pharmacophore points can enable an increase in the diversity of compounds screened by the software. In other previously published pharmacophore definitions, pharmacophore centers describing non-bonded interactions like Hydrogen, halogen bonds, cation- and anion-π interactions, etc. are missing. If Hydrogen bond donor and acceptor centers were defined, the spatial dependence of this interaction was missing. Indeed, a Hydrogen bond can be an anisotropic electrostatic interaction with an angular dependence. See, e.g., Figure 17.

[0070] In one aspect, Hydrogen bond (H-bond) center pharmacophore features are represented as either acceptors, donors, or both donors and acceptors simultaneously. When a Hydrogen bond center is both a donor and acceptor simultaneously, it can be referred to as an ambivalent center. Geometry of an ambivalent center can be identified according to atoms involved in the Hydrogen bond. In one aspect, geometrical parameters are utilized that describe directionally dependent properties of possible Hydrogen bond patterns, as shown in Figure 17. Figure 17 illustrates the definition of the directionalities of classical Hydrogen bonds and Hydrogen bonds involving halogens (a) where Y = O(sp³), S(sp³), N(sp³), or N(sp²) and X = O(sp³), S(sp³), and (b) where D = O, N, S and F can be replaced by Cl, Br or I.

[0071] In one aspect, directionality of a Hydrogen bond center can be represented by a vector with an origin centered on an atom depending on the nature of the Hydrogen bond center. Spatial positioning of a vector depends on orientation of the group of atoms potentially involved in a Hydrogen bond to a parent molecule. Spatial orientation of a Hydrogen bond vector can also be adjusted during pharmacophore alignment by respecting angular dependence of a vector for different natures of an atom to obtain an optimal fit (Figure 17).

[0072] In another aspect, a vector's magnitude depends on the specific atom and possibly neighboring atoms, which can render a radial component and force of the Hydrogen bond. The length of the vector typically represents the distance from the atom of the pharmacophore point to the theoretical contact point with the outer shell of the Hydrogen atom's electron orbital. A vector's magnitude can be adjusted if a Hydrogen bond center is implicated in intramolecular interactions, intermolecular interactions, or cooperative effects from cyclic Hydrogen bond networks.

[0073] Water molecules around a molecule should be considered when molecules are binding inside an enzyme site or inside a membrane receptor, channel or transporter. Water is well known for directly participating in an enzymatic reaction by stabilizing the substrate transition state or helping bond rearrangements to occur. Water can strongly bind to a molecule.

[0074] In one aspect, the software can allow for adjustment of the pharmacophore to account for water clusters and the associated Hydrogen bond vector field. This allows for an accelerated database screening process of molecules that are similar to an enzymatic substrate or to a transition state conformation of the molecule obtained after quantum chemistry calculations.

[0075] Halogens (F, Cl, Br or I) are frequently found in biomolecules and biomolecule ligands. Incorporation of halogen atoms into compounds can result in analogues that are usually more lipophilic and less water soluble. Thus, halogen atoms can be used to improve penetration through lipid membranes and tissues.

[0076] Halogens can also promote Hydrogen bonds. However, radial and angular components of this anisotropic interaction can be different from those of X = O, N or S. In one aspect, for halogens, a similar pharmacophore feature can be adjusted due to topological specificities. Besides H-bonds, halogens can establish halogen bonds (hX-bond). hX-bond centers, where X = F, Cl, Br or I, can also be represented in a similar way as H-bonds. However, X is a donor by convention and interacts with a Lewis base acceptor (B = O, N, or S). Figure 18 illustrates the definition of the directionality of canonical halogen bonds.

[0077] Halogens can also be found in nonconventional interactions, for instance with a number of systems found in. In one aspect, a pharmacophore center and a vector of hX-bond directionality can be represented in a similar way to that of an H-bond, respecting the specificity of the hX-bond. As with the H-bond, the magnitude of the hX-bond vector can be adjusted upon electronic rearrangements occurring in the parent molecule that can be promoted by conformational change, mesomerization or hydration by water molecules of the bulk solvent.

[0078] In another aspect, an H-bond and an hX-bond can be found at the same pharmacophore center when, for instance, X = O, N or S, which can simultaneously be an acceptor of an H-bond or an hX-bond. This would be an ambivalent pharmacophore center. O, N and S can be represented by two types of pharmacophore centers. When at least one of these atoms is present in the parent molecule, chemical diversity and the number of possible similar molecules consequently increase dramatically. This ambivalence should be considered when an oxygen atom, for instance, is implicated in the parent molecule at the same time in an H-bond and an hX-bond within intramolecular or intermolecular many-body non-bonded interactions. This can lead to a rearrangement of bond geometry due to a phenomenon called cooperativity between Hydrogen and halogen bonds. In one aspect, the magnitude of the H-bond and hX-bond vectors on an ambivalent center can be modulated depending on these many-body interactions.

[0079] In another aspect, a pharmacophore feature can be used to represent an aromatic ring on the source molecule. An aromatic ring, or aryl, is a hydrocarbon monocyclic or polycyclic radical in which at least one ring is aromatic. Examples of suitable aryl groups include, but are not limited to, phenyl, tolyl, anthracenyl, fluorenyl, indenyl, azulenyl, and naphthyl, as well as benzo-fused carbocyclic moieties such as 5,6,7,8-tetrahydronaphthyl. Aryl groups may be optionally substituted with one or more substituents. In one embodiment, the aryl group is a monocyclic ring, wherein the ring comprises 6 carbon atoms, referred to herein as (C₆)aryl.

[0080] A heteroaromatic ring, heteroaryl, or like term refers to a monocyclic or polycyclic heteroaromatic ring comprising carbon atom ring members and one or more heteroatom ring members. Each heteroatom can independently be selected from nitrogen, which can be oxidized (e.g., N(O)) or quaternized; oxygen; and sulfur, including sulfoxide and sulfone. Representative heteroaryl groups include pyridyl, 1-oxo-pyridyl, furanyl, thienyl, pyrrolyl, oxazolyl, imidazopyridyl, tetrazolyl, benzimidazolyl, indolyl, purinyl, and benzothienyl. The point of attachment of a heteroaromatic or heteroaryl ring may be at either a carbon atom or a heteroatom of the heteroaromatic or heteroaryl ring. Heteroaryl groups may be optionally substituted with one or more substituents.

[0081] Substituents refers to a group substituted on an aryl or heteroaryl group at any atom of that group. Suitable substituents include, without limitation, alkyl, alkenyl, alkynyl, alkoxy, halo, hydroxy, cyano, nitro, amino, SO₃H, perfluoroalkyl, perfluoroalkoxy, methylenedioxy, ethylenedioxy, carboxyl, oxo, thioxo, imino (alkyl, aryl, aralkyl), amine (mono-, di-, alkyl, cycloalkyl, aralkyl, heteroaralkyl, and combinations thereof), ester (alkyl, aralkyl, heteroaralkyl), amide (mono-, di-, alkyl, aralkyl, heteroaralkyl, and combinations thereof), sulfonamide (mono-, di-, alkyl, aralkyl, heteroaralkyl, and combinations thereof), unsubstituted aryl, unsubstituted heteroaryl, unsubstituted heterocyclyl, and unsubstituted cycloalkyl. In one aspect, the substituents on a group are independently any one single, or any subset of the aforementioned substituents.

[0082] Aromatic rings can play an important role in biomolecules and particularly in ligand-receptor interactions. The main characteristics of this molecular system include a planar and very stable structure containing one or more rings of atoms and a delocalized conjugated electronic π system where a specific number of delocalized electrons (4n+2 where n is a non-negative integer, following Hückel's rule) form a cloud of conjugated π-orbitals. These structures can be responsible for important interactions in biology such as π-stacking (plane-to-plane interactions) and T-shape stacking geometry (edge-to-plane interactions), cation-π and anion-π interactions, and non-canonical interactions in protein structures made through Hydrogen and halogen interactions. In classical pharmacophore-based methods, aromatic rings usually include hydrophobic centers represented by a spherical volume or a normal vector perpendicular to the two sides of the ring plane, both with origin referenced on the ring centroid.

[0083] In one embodiment, aromatic rings are considered as ambivalent pharmacophore points since they can participate in many different kinds of non-bonded interactions. In various aspects, aromatic rings can be a hydrophobic volume represented by spheroids with an origin placed at the center of the ring; and a volume of conjugated and substituted aromatic ring systems can be adjusted due to electronic delocalization and distribution. This volume delimits a surface where attraction-dispersion interactions are possible with other hydrophobic aromatic or aliphatic volumes. In another aspect, anisotropic interactions (cation-π, anion-π, Hydrogen-π, and halogen-π) can be represented by two sets of aromatic normal vectors symbolizing a π-bond in the system, each set localized on one side of the aromatic plane. Figure 19 provides the definition of different anisotropic non-bonded interactions involving π orbital electrons for (a) cation-π, (b) anion-π, (c) Hydrogen-π and (d) halogen-π. In (b), the anion and cation can compete to interact with the π orbital electrons. The magnitude of the vectors can be modified upon electronic rearrangements that occur either with ring conjugations and substitutions, or when rings are involved in intra- and extramolecular interactions.

[0084] Hydrophobic volumes can represent parts of a molecule that are involved in so-called hydrophobic interactions but are in fact due to attraction-dispersion forces. In one embodiment, in the generation of the pharmacophore, hydrophobic volumes are spheroids adopting different shapes, such as spherical, ellipsoid, oblate or prolate, depending on the nature and size of the chemical group of interest. Spheroids can be represented by a triaxial ellipsoid given in spherical coordinates by:

$$\frac{r^2 \sin^2\Theta \cos^2\varphi}{a^2} + \frac{r^2 \sin^2\Theta \sin^2\varphi}{b^2} + \frac{r^2 \cos^2\Theta}{c^2} = 1,$$

where Θ is the polar angle and φ is the azimuth angle as defined above, and the semi-axes are of lengths a, b, and c. Hydrophobic volumes include hydrophobic centers, aliphatic chains, saturated rings, and aromatic groups.
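As an illustration only, substituting the conversion x = r sin Θ cos φ, y = r sin Θ sin φ, z = r cos Θ from paragraph [0065] into the Cartesian ellipsoid equation x²/a² + y²/b² + z²/c² = 1 gives the distance r from the ellipsoid's center to its surface along any direction. The Python sketch below computes that distance and classifies the spheroid shape; the function names and the tolerance are assumptions made for this example.

```python
import math


def ellipsoid_radius(a, b, c, theta, phi):
    """Distance from the ellipsoid center to its surface along (theta, phi).

    theta is the polar angle and phi the azimuth angle (radians), expressed
    in the ellipsoid's internal axes; a, b, c are the semi-axis lengths.
    """
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    inv_r2 = (
        (sin_t * math.cos(phi)) ** 2 / a ** 2
        + (sin_t * math.sin(phi)) ** 2 / b ** 2
        + cos_t ** 2 / c ** 2
    )
    return 1.0 / math.sqrt(inv_r2)


def spheroid_kind(a, b, c, tol=1e-9):
    """Classify the shape from its semi-axes (tol is an illustrative choice)."""
    if abs(a - b) > tol:
        return "triaxial ellipsoid"
    if abs(a - c) <= tol:
        return "sphere"
    # Equal equatorial radii: flattened if c < a, elongated if c > a.
    return "oblate" if c < a else "prolate"
```

Along the internal x-, y- and z-axes the computed distance reduces to a, b and c respectively, which matches the definitions of the semi-axes given for Figure 3.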

[0085] In another aspect, at least one hydrophobic unit can correspond to a hydrophobic fragment. It contains at least three heavy non-Hydrogen atoms and at most fourteen atoms (including Hydrogens). A center of inertia of the hydrophobic chemical group can be used as an origin for the spheroid, and the volume can be calculated from the Van der Waals surface of the chemical group's atoms. A spheroid's volume for aromatic rings can be adjusted by taking into account the mesomeric effects promoted by ring substitutions and/or conjugations. If two neighboring hydrophobic spheroid volumes partly overlap, they can be merged if the total number of atoms in both structures is at most fifteen atoms (including Hydrogens). The influence of intra- and intermolecular interactions can also be accounted for when adjusting spheroid volumes, since electronic rearrangements and delocalization are promoted when through-space interactions are established between rings and the solvent or neighboring pharmacophore centers.
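The size and merge rules of paragraph [0085] can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the overlap test approximates each spheroid by a bounding sphere, and the class name, field names and helper functions are hypothetical.

```python
import math
from dataclasses import dataclass

MIN_HEAVY_ATOMS = 3    # at least three heavy (non-Hydrogen) atoms
MAX_ATOMS = 14         # at most fourteen atoms, Hydrogens included
MAX_MERGED_ATOMS = 15  # merged volumes hold at most fifteen atoms


@dataclass
class HydrophobicFragment:
    heavy_atoms: int
    hydrogens: int
    center: tuple   # Cartesian center of inertia, in angstroms
    radius: float   # bounding-sphere radius (assumption: stands in for the spheroid)

    @property
    def total_atoms(self):
        return self.heavy_atoms + self.hydrogens


def is_valid_unit(fragment):
    """Check the atom-count limits for a single hydrophobic unit."""
    return (fragment.heavy_atoms >= MIN_HEAVY_ATOMS
            and fragment.total_atoms <= MAX_ATOMS)


def overlaps(f, g):
    """Bounding spheres overlap when the centers are closer than the radii sum."""
    return math.dist(f.center, g.center) < f.radius + g.radius


def can_merge(f, g):
    """Two overlapping volumes may merge if they total at most fifteen atoms."""
    return overlaps(f, g) and f.total_atoms + g.total_atoms <= MAX_MERGED_ATOMS
```

An exact spheroid-spheroid overlap test would replace `overlaps` in a fuller implementation; the bounding-sphere version is used here only to keep the example short.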

[0086] In one embodiment, the disclosure provides a method wherein a geodesic pharmacophore can be encoded using a spherical coordinate system in which each pharmacophore feature can be described by a set of at least three coordinates representing and/or indicating its position in three-dimensional space. In this way, arrangement of key points in a pharmacophore can be readily modeled and/or visualized (e.g., using various programs and algorithms for modeling molecular structure, such as INSIGHT II (Accelrys Inc., San Diego, CA)). In another embodiment, coordinates of a pharmacophore are readily used to compare pharmacophore structure with points in a target pharmacophore, a pharmacophore model, or another potential ligand, wherein similarity can be used to predict similar activity, or toxicity (e.g., ADME-T), etc. In possible embodiments, a target pharmacophore is a pharmacophore identified from a target library of pharmacophores that originate from a multidimensional pharmacophore toxic database, which is described in more detail herein, and share a degree of similarity with a source pharmacophore.

[0087] In various aspects, additional parameters can be used to describe other properties of the individual pharmacophore features. These can include, in the case of pharmacophore points that are Hydrogen bond donors or acceptors, parameters indicating the direction, orientation, size and/or distance of the Hydrogen bond. Other parameters that can be used include, for hydrophobic pharmacophore points, a parameter indicating the size (e.g., the distance or volume) of the hydrophobic interaction.

DESCRIPTION OF THE PHARMACOPHORE DATA FILES

[0088] In one aspect, a pharmacophore feature can be described by incorporation of one or more pharmacophore points or pharmacophore features. Figure 4 illustrates an encoded data string that forms one line in a data file. It includes encoded data describing one pharmacophore center of a pharmacophore feature. In possible embodiments, the data string has a fixed column size. The first columns hold obligatory data shared by every kind of pharmacophore center: the number of the pharmacophore center (8 chars), the type of pharmacophore center (4 chars), and its r, theta and phi coordinates in the spherical system (8 chars each). The next nine optional columns are provided for hydrophobic pharmacophore centers and can be blank for all others. They represent the coordinates of the ellipsoid (8 chars each). The third vector, representing the direction of the z-axis, is not provided, as it can be calculated from the x- and y-axes. The last optional column holds the charge information for polar pharmacophore centers (8 chars).

[0089] In another aspect, a pharmacophore feature can be optionally further described by incorporation of one or more pharmacophore vectors. Figure 5 illustrates a data string that forms one line and includes data describing a pharmacophore vector. All columns are obligatory and have a fixed column size. The first column indicates the number of the vector (4 chars). The second column indicates the number of the pharmacophore center to which the current vector is attached (4 chars). The next column represents the type of the vector (8 chars). The last six columns represent the coordinates of the vector, where the first three represent the spherical coordinates of the origin of the vector, and the next three represent the direction of the vector. Each coordinate is printed in an 8-char column.

[0090] A pharmacophore structure file is a file used to store structural data about one pharmacophore. In one embodiment, the software utilizes a pharmacophore structure file to encode a pharmacophore. This file is split into three parts: the pharmacophore block (header block), the center block (describing pharmacophore features), and the vector block (describing any vectors ascribed to the pharmacophore features). Figure 8 shows a brief description of each of the three blocks of the pharmacophore structure file. The Pharmacophore block is the first block in the pharmacophore structure file.

@<BIOQUANTA>PHARMACOPHORE

The first line of the pharmacophore block contains the name of the pharmacophore or source molecule. The second line lists the number of pharmacophore features, including the number of decoy features, and the number of vectors, including decoy vectors.

[0091] The Centre block is the second block in the pharmacophore structure file. The first line of the Centre block marks the beginning of the pharmacophoric center block.

@<BIOQUANTA>CENTRE

[0092] This line describes the beginning of the list of the pharmacophoric points. The Centre block, also known as the Center block, includes X lines, where X is the number of pharmacophoric features in the pharmacophore molecule. Each line is a point data string for one separate pharmacophore feature. Each point data string is on a separate line in the Centre block. The Centre block data string format is illustrated in Figure 4 and described in Table 1. The format of a line is the following (fields marked with a star are optional and can be replaced by space characters if they do not apply to the pharmacophoric point):

Table 1. Pharmacophore Structure File Center Block Field Definitions.

COLUMNS    DATA TYPE  FIELD DEFINITION
1-8        Integer    Number of the pharmacophore center
9-12       Char       Type of pharmacophore center {B, P, H, Y, F, R}
13-20      Float      r coordinate (in angstroms) of the pharmacophore center in the spherical system
21-28      Float      theta coordinate (in radians) in the spherical system
29-36      Float      phi coordinate (in radians) in the spherical system
*37-44     Float      a internal coordinate of the ellipsoid, for H centers
*45-53     Float      b internal coordinate of the ellipsoid, for H centers
*54-61     Float      c internal coordinate of the ellipsoid, for H centers
*62-69     Float      x coordinate of internal axis 1 for ellipsoid
*70-77     Float      y coordinate of internal axis 1 for ellipsoid
*78-85     Float      z coordinate of internal axis 1 for ellipsoid
*86-93     Float      x coordinate of internal axis 2 for ellipsoid
*94-101    Float      y coordinate of internal axis 2 for ellipsoid
*102-109   Float      z coordinate of internal axis 2 for ellipsoid
*110-117   Float      charge q for charged pharmacophoric points

Where B represents a Hydrogen bond donor/acceptor, Y represents an aromatic feature, H represents a hydrophobic feature, P represents a charged polar feature, F represents a halogen feature, and R represents radicals. Additionally, q represents an elementary charge and defines the polarity of the charge (e.g., + or -) and the magnitude of the charge (e.g., 1e, 2e, 3e, etc.).
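By way of illustration, a Centre block data string can be read and written with fixed-width slicing, as in the Python sketch below. The column ranges follow Table 1 as printed, including the nine-character range 45-53 for b. The function names, the field names, the right-justified layout and the three-decimal float format are assumptions of this example; the source does not specify them.

```python
# (field name, first column, last column, converter); columns are 1-indexed.
CENTER_FIELDS = [
    ("number", 1, 8, int),
    ("type", 9, 12, str),
    ("r", 13, 20, float),
    ("theta", 21, 28, float),
    ("phi", 29, 36, float),
    ("a", 37, 44, float),
    ("b", 45, 53, float),  # nine columns wide, as printed in Table 1
    ("c", 54, 61, float),
    ("axis1_x", 62, 69, float),
    ("axis1_y", 70, 77, float),
    ("axis1_z", 78, 85, float),
    ("axis2_x", 86, 93, float),
    ("axis2_y", 94, 101, float),
    ("axis2_z", 102, 109, float),
    ("charge", 110, 117, float),
]
REQUIRED = {"number", "type", "r", "theta", "phi"}
CENTER_TYPES = {"B", "P", "H", "Y", "F", "R"}


def parse_center_line(line):
    """Parse one Centre block line; blank optional fields become None."""
    record = {}
    for name, start, end, cast in CENTER_FIELDS:
        raw = line[start - 1:end].strip()
        if raw:
            record[name] = cast(raw)
        elif name in REQUIRED:
            raise ValueError(f"missing obligatory field: {name}")
        else:
            record[name] = None
    if record["type"] not in CENTER_TYPES:
        raise ValueError(f"unknown center type: {record['type']}")
    return record


def format_center_line(record):
    """Write a record back as a fixed-width line (right-justified fields)."""
    parts = []
    for name, start, end, _ in CENTER_FIELDS:
        width = end - start + 1
        value = record.get(name)
        if value is None:
            text = ""
        elif isinstance(value, float):
            text = f"{value:.3f}"
        else:
            text = str(value)
        if len(text) > width:
            raise ValueError(f"value too wide for field {name}")
        parts.append(text.rjust(width))
    return "".join(parts)
```

Because every field is addressed by position rather than by a delimiter, blank optional columns simply parse to `None`, which mirrors the statement that non-applicable fields can be replaced by space characters.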

[0093] The Vector block is the third block in the pharmacophore structure file.

@<BIOQUANTA>VECTOR

Lines in the third block represent pharmacophore vector features, one line per vector. The Vector data string format is illustrated in Figure 5 and described in Table 2. The format of a vector line is as follows.

Table 2. Pharmacophore Structure File Vector Block Field Definitions.

COLUMNS    DATA TYPE  FIELD DEFINITION
1-4        Integer    Number of the vector
5-8        Integer    Number of the parent pharmacophore center
9-16       Char       Type of vector {A, D, Y}
17-24      Float      r coordinate of the origin in the spherical system
25-32      Float      theta coordinate (in radians) of the origin in the spherical system
33-40      Float      phi coordinate (in radians) of the origin in the spherical system
41-48      Float      dr coordinate of the vector
49-56      Float      dtheta coordinate (in radians) of the vector
57-64      Float      dphi coordinate (in radians) of the vector

Where A represents an acceptor, D represents a donor, and Y represents an aromatic.
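The three-block layout of the pharmacophore structure file and the Vector data string can be handled in the same fixed-width manner. The Python sketch below splits a structure file into its blocks using the `@<BIOQUANTA>` marker lines and parses one vector line. It assumes that the r coordinate of the origin occupies columns 17-24, so that all six coordinates sit in contiguous 8-character columns after the type field; the function and field names are hypothetical.

```python
BLOCK_PREFIX = "@<BIOQUANTA>"

# (field name, first column, last column, converter); columns are 1-indexed.
VECTOR_FIELDS = [
    ("number", 1, 4, int),
    ("parent_center", 5, 8, int),
    ("type", 9, 16, str),
    ("r", 17, 24, float),
    ("theta", 25, 32, float),
    ("phi", 33, 40, float),
    ("dr", 41, 48, float),
    ("dtheta", 49, 56, float),
    ("dphi", 57, 64, float),
]
VECTOR_TYPES = {"A", "D", "Y"}  # acceptor, donor, aromatic


def split_blocks(text):
    """Group the non-empty lines of a structure file under their block name."""
    blocks = {}
    current = None
    for line in text.splitlines():
        if line.startswith(BLOCK_PREFIX):
            current = line[len(BLOCK_PREFIX):].strip()
            blocks[current] = []
        elif current is not None and line.strip():
            blocks[current].append(line)
    return blocks


def parse_vector_line(line):
    """Parse one Vector block line; every column is obligatory."""
    record = {}
    for name, start, end, cast in VECTOR_FIELDS:
        raw = line[start - 1:end].strip()
        if not raw:
            raise ValueError(f"missing obligatory field: {name}")
        record[name] = cast(raw)
    if record["type"] not in VECTOR_TYPES:
        raise ValueError(f"unknown vector type: {record['type']}")
    return record
```

After splitting, the lines under `CENTRE` and `VECTOR` can be passed one at a time to the corresponding line parsers, while the `PHARMACOPHORE` lines carry the name and the feature and vector counts.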

[0094] A global output file is a file that contains all information used to request in silico analysis, e.g., a MultiDIP Screen test. In one aspect, this file can be generated by the software and can be sent by the user to the expert, for example, to perform a MultiDIP screen. In one aspect, data is formatted in XML format (more details about XML are available online: http://www.w3.org/XML/). Exemplary output files are structured as shown in Figure 9. Each example line or block descriptor is shown below and is followed by a brief description.

<?xml version="1.0" encoding="UTF-8"?>

This line describes the XML header.

<MultiDIP>

This line describes the beginning of the global block.

<DataSource>

This line describes the beginning of a DataSource block. Each DataSource block contains all information needed to perform a MultiDIP screen of one pharmacophore. There is one DataSource block per pharmacophore in the global output file.

<Generator>MultiDIP SphereLigh</Generator>

This is the beginning of the Generator block, including data relative to the software used to generate the present pharmacophore.

<Customer_Name>MyCustomer</Customer_Name>

This is the Customer name block, which contains the name of the customer.

This is the Customer number block, which contains the number of the user, or customer, relative to the expert, e.g., BioQuanta records.

<Date>Fri Aug 20 17:12:41 CEST 2010</Date>

This is the Date block, which contains the date when the file was generated.

<Comment>Auto generated MultiDIP Export File</Comment>

This is the Comment block, which contains additional miscellaneous data about the record.

<EndPoint>SA01 Cardiotoxicity (hERG blockers)</EndPoint>

This is an example EndPoint block record, which contains information for the MultiDIP screen process. Each DataSource block selectively contains from zero to multiple endpoint blocks. Each endpoint block contains information used to screen the pharmacophore. For example, the DataSource block can selectively contain separate endpoint blocks, each endpoint block listing, for example, cardiotoxicity endpoints, gastrointestinal toxicity endpoints, reproductive organs toxicity endpoints, teratogenicity, ecotoxicity, and the like. In this example, each endpoint block corresponds to the toxicity endpoint required by the customer for the MultiDIP Tox screen of the present pharmacophore. Further exemplary EndPoint block records are shown below.

<EndPoint>SA04 Gastrointestinal toxicity</EndPoint>

<EndPoint>SA07 Reproductive organs toxicity</EndPoint>

<EndPoint>T009 Teratogenicity</EndPoint>

<EndPoint>T012 Ecotox: Daphne</EndPoint>

The Input block describes the pharmacophore identifier.

The Input block contains the pharmacophore identifier and an input key that is automatically generated by the software, and given to the user. It is a unique key, wherein only the user can link this unique key to the source molecule. The user uses the unique key to associate the screening results with the source molecule from which the pharmacophore was identified.

The Molecule block can be left blank by the user, or can list structural information about the source molecule. The molecule block will be empty unless the customer specifically asked the software to provide structural information about the source molecule.

<Secret>01OpHCYjNj UlMzM7Lnh4JiM2NTUzMztlATomlzYlNTMzOyYj

NjUlMz 7XlgmIzYlNTMzOzsmIzYl</Secret>

The Secret block contains encrypted information of what pharmacophoric points are decoy points. Specifically, the Secret block lists the encrypted list of unique identification numbers for the decoy data strings.

[ HERE IS PHARMACOPHORE DATA, REFER TO EXAMPLE FILES]

</Pharmacophore>

The Pharmacophore block contains structural information about the pharmacophore, exactly as described for the pharmacophore structure file herein. The data can be either non-encoded or base64 encoded.

</DataSource>

This line describes the end of the DataSource block. If the global output file contains more than one pharmacophore to screen, here will be the opening of the next DataSource Block.

</MultiDIP>

This line describes the end of the global block in the global output file.
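For illustration, a global output file with the block names shown above can be assembled with Python's standard `xml.etree.ElementTree` and `base64` modules. The sketch below is hedged in several respects: the dictionary keys and the function name are hypothetical, the Secret value is assumed to be supplied already encrypted (the encryption scheme is not specified here), the Pharmacophore data is always base64 encoded, and the Customer number, Input and Molecule blocks are omitted because their tag names are not shown in this section.

```python
import base64
import xml.etree.ElementTree as ET


def build_global_output(sources):
    """Build the XML bytes of a global output file.

    `sources` is a list of dicts, one per pharmacophore (hypothetical keys):
    generator, customer_name, date, comment, endpoints, secret, structure_file.
    """
    root = ET.Element("MultiDIP")
    for src in sources:
        ds = ET.SubElement(root, "DataSource")
        ET.SubElement(ds, "Generator").text = src["generator"]
        ET.SubElement(ds, "Customer_Name").text = src["customer_name"]
        ET.SubElement(ds, "Date").text = src["date"]
        ET.SubElement(ds, "Comment").text = src.get("comment", "")
        # Zero or more toxicity endpoints requested for this pharmacophore.
        for endpoint in src.get("endpoints", []):
            ET.SubElement(ds, "EndPoint").text = endpoint
        # Already-encrypted list of decoy identifiers, passed through as-is.
        ET.SubElement(ds, "Secret").text = src["secret"]
        encoded = base64.b64encode(src["structure_file"].encode("utf-8"))
        ET.SubElement(ds, "Pharmacophore").text = encoded.decode("ascii")
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

The receiving side can parse the result with `ET.fromstring`, iterate over the `DataSource` elements, and base64-decode each `Pharmacophore` element to recover the pharmacophore structure file text.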

[0095] In one embodiment, a user can generate and encode a geodesic pharmacophore molecule, having spherical coordinates, from a candidate compound using the disclosed software methods. The user can transmit the pharmacophore data embedded in the global output file to a third party for in silico analysis without disclosing the structure of the candidate compound. Use of the securely encoded pharmacophores provides secure transfer of pharmacophore data to a third party, ensuring the protection of confidentiality, integrity and availability of generic information objects such as chemical structures. The three spherical coordinates and the physicochemical nature of the pharmacophore points have the advantage of bringing new dimensions to the security process. A securely encoded geodesic pharmacophore of any molecule is of no use for an unauthorized third party. It is only a list of symbols and spherical coordinates, not revealing chemical structure. It can only be decoded and understood by the expert performing the in silico study using adequate software.

[0096] Such secure in silico technology can be used to pave the way for a radical evolution of the discovery and validation cycle for chemical compounds. In one aspect, secure use of breakthrough in silico technologies can streamline various ADME-T issues in the process of designing or developing chemical compounds for pharmaceutical, cosmetics or broad industrial or consumer use, without jeopardizing the customers' intellectual property. In one aspect, by use of the herein disclosed in silico methods, toxicity prediction for a candidate compound can occur early in the development process. The overall cost reduction together with the increased safety and speed will provide a competitive advantage to the pharmaceutical, chemical or cosmetic industry, to allow more efficient and cost-effective R&D.

[0097] Figures 6A and 6B illustrate a possible environment in which the processes and apparatuses described herein can be used. Referring now to Figure 6A, a research facility or researcher has a computer and a server on which they encode pharmacophore data as described in more detail herein. The server has access to database(s) storing the source molecules being researched and needing analysis. The pharmacophore data, encoded data, and files are stored in memory on the computer or networked memory accessed through the server, and may be stored in the database.

[0098] After generation of the data file holding the encoded pharmacophore data as described herein, the data file is transferred to another facility for analysis. The facility includes computers, servers, and access to a data warehouse as described in more detail herein. The facility decodes the pharmacophore data from the data file and then analyzes it in silico to generate toxicity predictions for the pharmacophore. Although the embodiments used herein disclose in silico analysis to make toxicity predictions, the apparatus and method could be used to perform other types of analyses on the pharmacophore received in the data file or even analysis on molecule structures other than pharmacophores. Additionally, the data warehouse includes a variety of different databases and the data is organized into an OLAP data structure or cubes.

[0099] The in silico analysis to generate toxicity predictions uses an OLAP engine for accessing the data. Data in the databases is multidimensional considering the multidimensional nature of molecules (e.g., . . . x stereoisomers x conformers x tautomers x mesomers), although the multiplicity may not be produced by crossing together biochemical, toxicological and pharmaceutical properties (e.g., molecule x toxicological study x end point). Additionally, the OLAP engine and OLAP data structure enable direct analysis of data, whereas traditional relational database engines analyze the data offline. OLAP is more efficient for selecting pharmacophores from the data warehouse according to many criteria, or for quickly analyzing toxicological data for a very specific list of compounds selected from the data warehouse.

[00100] In alternative embodiments, the data is stored in relational databases and analyzed using traditional relational database engines. Yet other embodiments can use a combination of OLAP databases and relational databases.

[00101] Figure 6B illustrates an exemplary architecture of a computing device that can be used to implement aspects of the present disclosure, including any of the plurality of client computing devices or a server computing device. The computing device illustrated in Figure 6B can be used to execute the operating system, application programs, and software modules (including the software engines) described herein. By way of example, the computing device will be described below as the point of care computing device. To avoid undue repetition, this description of the computing device will not be separately repeated herein for each of the other computing devices, including the point of care computing device, the administrative computing device, and the server computing device, but such devices can also be configured as illustrated and described with reference to Figure 6B.

[00102] The computing device includes, in some embodiments, a programmable circuit or processing device, such as a central processing unit (CPU). A variety of processing devices are available from a variety of manufacturers, for example, Intel or Advanced Micro Devices. In possible embodiments, the CPU operates at 1 GHz or faster. In this example, the computing device also includes a system memory, and a system bus that couples various system components including the system memory to the processing device. The system bus is one of any number of types of bus structures including a memory bus, or memory controller; a peripheral bus; and a local bus using any of a variety of bus architectures.

[00103] Examples of computing devices suitable for the computing device include a desktop computer, a laptop computer, a tablet computer, or other devices configured to process digital instructions.

[00104] The system memory includes read only memory and random access memory. In possible embodiments, the random access memory has about 512 MB or more of storage. A basic input/output system containing the basic routines that act to transfer information within the computing device, such as during start up, is typically stored in the read only memory.

[00105] The computing device also includes a secondary storage device in some embodiments, such as a hard disk drive, for storing digital data. In possible embodiments, the hard disk drive or other secondary memory being used has about 200 MB or more of available storage. The secondary storage device is connected to the system bus by a secondary storage interface. The secondary storage devices and their associated computer readable media provide nonvolatile storage of computer readable instructions (including application programs and program modules), data structures, and other data for the computing device.

[00106] Although the exemplary environment described herein employs a hard disk drive as a secondary storage device, other types of computer readable storage media are used in other embodiments. Examples of these other types of computer readable storage media include magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, compact disc read only memories, digital versatile disk read only memories, random access memories, or read only memories. Some embodiments include nontransitory media.

[00107] A number of program modules can be stored in the secondary storage device or memory, including an operating system, one or more application programs, other program modules (such as the software engines described herein), and program data. The computing device can utilize any suitable operating system, such as Microsoft Windows™ 98 or newer, Linux, MacOS, or any other operating system suitable for a computing device. Other examples can include Microsoft, Google, or Apple operating systems, or any other suitable operating system used in tablet computing devices.

[00108] In some embodiments, input to the computing device is performed through one or more input devices. Examples of input devices include a keyboard, mouse, microphone, and touch sensor (such as a touchpad or touch sensitive display). Other embodiments include other input devices. The input devices are often connected to the processing device through an input/output interface that is coupled to the system bus. These input devices can be connected by any number of input/output interfaces, such as a parallel port, serial port, game port, or a universal serial bus. Wireless communication between input devices and the interface is possible as well, and includes infrared, BLUETOOTH® wireless technology, 802.11a/b/g/n, cellular, or other radio frequency communication systems in some possible embodiments.

[00109] In this example embodiment, a display device, such as a monitor, liquid crystal display device, projector, or touch screen display device, is also connected to the system bus via an interface, such as a video adapter. In addition to the display device, the computing device can include various other peripheral devices (not shown), such as speakers or a printer.

[00110] When used in a local area networking environment or a wide area networking environment (such as the Internet), the computing device is typically connected to the network through a network interface, such as an Ethernet interface. Other possible embodiments use other communication devices. For example, some embodiments of the computing device include a modem for communicating across the network.

[00111] The computing device typically includes at least some form of computer readable media. Computer readable media includes any available media that can be accessed by the computing device. By way of example, computer-readable media include computer readable storage media and computer readable communication media.

[00112] Computer readable storage media includes volatile and nonvolatile, removable and non-removable media implemented in any device configured to store information such as computer readable instructions, data structures, program modules or other data. Computer readable storage media includes, but is not limited to, random access memory, read only memory, electrically erasable programmable read only memory, flash memory or other memory technology, compact disc read only memory, digital versatile disks or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computing device.

[00113] Computer readable communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, computer readable communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer readable media.

[00114] In one embodiment, the disclosure provides software methods to transform chemical structures to computational representations, or pharmacophores, in order to hide the chemical structure of compounds. The computational representations will allow safe transfer and processing of data about a compound. The computational transformation of the chemical structure utilizes spherical coordinates of a compound as a geodesic pharmacophore.

[00115] In one aspect, query pharmacophore molecules of the disclosure can be generated wherein the pharmacophore consists essentially only of those functional groups, or pharmacophoric points, that are necessary for specific activity, while removing pharmacophore features that do not affect such activity (Figure 7C). In an optional aspect, key pharmacophoric points from the query pharmacophore are identified and non-key data can be removed, or masked, to create a masked query pharmacophore. Such pharmacophores thereby simplify the search for predictive toxicity data since the number of functional groups, or pharmacophoric points, that must be compared between pharmacophore(s) of the candidate compounds and the library of pharmacophore molecules derived from library pharmacophores from compounds with known toxicological data is greatly reduced.

[00116] Accordingly, in various embodiments, each pharmacophore molecule consists essentially of from one to 100; 2 to 50; 2 to 30; or 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 functional groups or pharmacophore points bearing the aforementioned spatial relationship. In yet other embodiments, the pharmacophore has from 3 to 10 functional groups.

[00117] In another aspect, decoy pharmacophore point data is generated by the software and inserted into the pharmacophore structure file. Optionally, decoy pharmacophoric vector data is also generated and inserted into the pharmacophore structure file.

[00118] In one embodiment, the disclosure provides a method of secure transmission of a query pharmacophore from a user to a third party for analysis without disclosing the structure of the candidate compound. In one aspect, the analysis comprises prediction of toxicity of the candidate compound. The user identifies the candidate compound and selects the number of pharmacophores to be generated from the candidate compound. A query pharmacophore is generated from a specific conformation of the candidate compound. Key pharmacophoric points from the query pharmacophore are identified and non-key data is removed, or masked, to create a masked query pharmacophore. Decoy pharmacophoric point data is generated by the software and inserted into the masked query pharmacophore to create a decoyed masked query pharmacophore. The decoyed masked query pharmacophore is loaded into a data file. The data file is optionally encrypted. The encrypted data file is sent to a third party. The third party decrypts the data, and removes the decoy data from the decoyed masked query pharmacophore. The masked query pharmacophore is compared to a multidimensional library pharmacophore database of toxic library compounds. The third party identifies the masked query pharmacophore having matching toxic library compounds. The third party transmits the identity of the query pharmacophore with the matching library compound and toxicity data including one or more of toxicity, assay, patent and nonpatent literature data to the user. The steps from selection of candidate compound, or molecule of interest, to encryption of the data file, are shown in Figures 7A to 7G.

[00119] In one embodiment, a user generates and sends the global output file to a third-party expert for in silico analysis, without disclosing the source molecule. Figures 7A-7G provide a flow diagram of one possible embodiment of the disclosure describing a method that can be used to encode a geodesic pharmacophore, including data strings described herein, to provide a pharmacophore structure file which is incorporated into a global output file.

[00120] The user selects and inputs a 2D or 3D molecule data file describing the structure of the source molecule, which is the molecule of interest, as shown in Figure 7A. The source molecule is imported into the software by the user in any of a number of possible formats such as two- or three-dimensional molecule files, SMILES, PDB, etc. The molecule is standardized, e.g., a fast 3D scan of the molecule is run, e.g., to obtain a minimum energy conformation of the source molecule, and explicit hydrogens are added to the structure. The user determines if pharmacophore generation for isomers is desired. The software algorithm automatically calculates the total number of dihedral angles, identifies chemical groups and the molecule's scaffold, and analyzes chirality, electronic distribution and charge delocalization to determine one or more three-dimensional conformations (e.g., Figure 1A). The source molecule superposed with the generated pharmacophore can be visualized with the graphical interface (e.g., Figure 1B).

[00121] The user determines the number of optional desired stereoisomers, conformers, tautomers, and/or mesomers for pharmacophore generation to encode in pharmacophore points as illustrated in Figure 7B. Next, the three-dimensional molecule is modeled by the software to generate one or more conformers. The molecule is then standardized, wherein, for example, explicit hydrogen atoms are added, and the 3D structure is refined prior to pharmacophore generation. An example of a modeled conformation of the selected source molecule, (2R)-4-chloro-2-hydroxy-2-[4-(2-hydroxyethyl)phenyl]butan-1-aminium, is shown in Figure 1A. The pharmacophore is then calculated and superimposed on the molecule. See Figure 1B. The non-pharmacophoric features, namely the original atoms and bonds that constitute the molecule, are removed. See Figure 1C.

[00122] In one aspect, after importing the source molecule, the user can select any number of conformers representing the three-dimensional structure of the source molecule prior to the modeling. In one aspect, the user can select from any number from 1 to 1024, 1 to 512, 1 to 256, 1 to 128, 1 to 64, 1 to 32, 1 to 16, or 1 to 4 different conformers for a single molecule. In one aspect, one, two, three, four, five, six, seven, eight, or nine conformers are selected for calculation.

[00123] In one or more additional optional aspects, any number each of additional stereoisomers, tautomers or mesomers is further selected. In one aspect, the user can select from any number of stereoisomers, tautomers or mesomers for each molecule. In one aspect, the user selects from 1 to 1024, 1 to 512, 1 to 128, 1 to 64, 1 to 32, 1 to 16, or 1 to 4 different stereoisomers, tautomers, and/or mesomers for each molecule. One pharmacophore is generated for each desired conformer, stereoisomer, tautomer and mesomer of the source molecule.

[00124] One aspect of the software involves determination of the desired conformation of the candidate compound, or molecule. Lower energy conformations can be identified by any of a variety of methods known in the art.

[00125] In one specific aspect, each of the source molecule's three-dimensional structures can be calculated from 2D or 3D information using the ChemAxon Marvin clean method, and explicit hydrogen atoms are generated with the Marvin hydrogenise function. Next, the ChemAxon calculation plug-in MajorMicrospeciesPlugin can be used to determine a protonation state of each compound at physiological pH 7.4. The primary three-dimensional structure of each molecule can then be used as a reference structure to explore the multidimensional isomeric space. In any case, structure data files of candidate compounds may include several molecules along with the main structure, such as ions or solvent molecules (e.g., water molecules). In this case, all additional structures can be removed.

[00126] In another possible aspect, conformational energy calculations can be determined using the CHARMM program (Brooks et al., J. Comput. Chem. 1983, 4:187-217). The energy terms include bonded and non-bonded terms, including bond length energy. It will be apparent that the conformational energy of a compound can also be calculated using any of a variety of other commercially available quantum mechanics or molecular mechanics programs. Generally, a low energy structure has a conformational energy that is within 50 kcal/mol of the global energy minimum.
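
By way of non-limiting illustration, the 50 kcal/mol criterion above might be applied as in the following minimal sketch. It assumes conformational energies have already been computed by an external program; the function name and the sample energies are hypothetical, and the lowest energy in the supplied set merely stands in for the global energy minimum.

```python
def low_energy_conformers(energies, window=50.0):
    """Return labels of conformers within `window` kcal/mol of the minimum.

    `energies` maps a conformer label to its conformational energy in
    kcal/mol. Assumption of this sketch: the lowest energy in the set is
    treated as the global energy minimum.
    """
    if not energies:
        return []
    e_min = min(energies.values())
    return sorted(label for label, e in energies.items() if e - e_min <= window)


# Hypothetical energies (kcal/mol) for three conformers of one compound.
example = {"conf_a": -120.0, "conf_b": -95.5, "conf_c": -60.0}
selected = low_energy_conformers(example)  # conf_c lies 60 kcal/mol above the minimum
```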

[00127] As another possible aspect, low energy conformations of candidate compound molecules can be identified using combinations of two procedures. The first procedure involves a simulated annealing molecular dynamics approach. In this procedure, the system (which includes the designed candidate compound and water molecules) is heated up to above room temperature, in one possible embodiment to around 600 degrees Kelvin (i.e., 600 K), and is simulated for a period of about 50 to 100 ps (e.g., for 70 ps) or longer. Gradually, the temperature of the system is reduced, e.g., to about 500 K and simulated for a period of about 100 ps or longer, then gradually reduced to 400 K and simulated for a period of 100 ps or longer. The system temperature is then reduced, again, to about 300 K and simulated for a period of about 500 ps or longer. During this analysis, the atom trajectories are recorded. Such simulated annealing procedures are well known in the art and are particularly advantageous, e.g., for their ability to efficiently search the conformational space of, e.g., a peptide, protein or other compound. That is to say, using such procedures, it is possible to sample a large variety of possible conformations for a compound and rapidly identify those conformations having the lowest energy.
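
For illustration only, the cooling schedule described above can be written down as data, as in the sketch below. The stage values are the example figures from the preceding paragraph; the variable and function names are hypothetical, the gradual temperature ramps between stages are not modeled, and no molecular dynamics is performed.

```python
# Each stage: (temperature in K, minimum simulated time in ps), using the
# example values given in the text (70 ps chosen for the 600 K stage).
ANNEALING_SCHEDULE = [(600, 70), (500, 100), (400, 100), (300, 500)]


def total_simulated_time(schedule):
    """Sum the minimum simulated time over all stages, in picoseconds."""
    return sum(duration for _, duration in schedule)


def is_cooling(schedule):
    """Check that each stage is strictly cooler than the one before it."""
    temperatures = [temperature for temperature, _ in schedule]
    return all(a > b for a, b in zip(temperatures, temperatures[1:]))
```

Under these example values the minimum total simulated time is 770 ps, since each stage may run "or longer".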

[00128] In another possible aspect, low energy conformations of candidate compound molecules can be identified using self-guided molecular dynamics (SGMD), as described by Wu & Wang, J. Physical Chem. 1998, 102:7238-7250. The SGMD method has been demonstrated to have an extremely enhanced conformational searching capability. Using the SGMD method, therefore, simulation may be performed at 300 K for 1000 ps or longer, and the atom trajectories recorded for analysis.

[00129] In another possible aspect, low energy conformations of candidate compound molecules can be identified using the INSIGHT II molecular modeling package. First, cluster analysis may be performed using the trajectories generated from molecular dynamics simulations (as described above). From each cluster, the lowest energy conformation may be selected as the representative conformation for this cluster and can be compared to other conformational clusters. Upon cluster analysis, major conformational clusters may be identified and compared to the solution conformations of the cyclic peptide(s). Specifically, a peptidomimetic or other agonist/antagonist compound is optimally superimposed on the pharmacophore model using computational methods well known to those of skill in the art as implemented in, e.g., CATALYST™ (Accelrys, Inc., San Diego, Calif.). A superposition of structures and the pharmacophore model is defined as a minimization of the root mean square distances between the centroids of the corresponding features of the molecule and the pharmacophore. A van der Waals surface is then calculated around the superimposed structures using a computer program such as CERIUS²™ (Accelrys, Inc., San Diego, Calif.). The conformational comparison may also be carried out by using the Molecular Similarity module within the program INSIGHT II.
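
As a hedged illustration of the quantity minimized during superposition, the sketch below computes the root mean square distance between paired feature centroids. It assumes the centroids are already paired and already placed in a common frame; the search over rigid-body transformations that minimizes this value is not shown, and the function name is hypothetical rather than taken from any of the named packages.

```python
import math


def rms_distance(molecule_centroids, pharmacophore_centroids):
    """Root mean square distance between corresponding centroids.

    Both arguments are equal-length sequences of (x, y, z) tuples, where
    item i of one sequence corresponds to item i of the other.
    """
    if len(molecule_centroids) != len(pharmacophore_centroids):
        raise ValueError("centroid lists must have the same length")
    if not molecule_centroids:
        raise ValueError("at least one centroid pair is required")
    squared = sum(
        math.dist(p, q) ** 2
        for p, q in zip(molecule_centroids, pharmacophore_centroids)
    )
    return math.sqrt(squared / len(molecule_centroids))
```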

[00130] In another aspect, for each source molecule, an exhaustive exploration of all possible isomers, including stereoisomers, enantiomers, E-Z isomers, diastereoisomers, conformers, tautomers and mesomers, can be done by using the ChemAxon Marvin plug-in suite. Then, a multidimensional pharmacophore space can be created after transforming all the isomers into geodesic pharmacophores called phoromers. In one aspect, phoromers are obtained as follows: first, stereoisomers of the parent chemical compound are generated and then, their conformational space is explored; for each stereoisomer, unrealistic isomers or those containing steric clashes are eliminated by using the molecular Dreiding generic force field; finally, mesomers and tautomers are evaluated for each calculated stereoisomer.

[00131] It should be noted that the active conformation of a ligand inside its macromolecule target is unknown. Inside a binding site a ligand can adopt an infinite number of conformations, and the bioactive one is not necessarily the lowest energy conformation. For these reasons, in one aspect, each molecule's pharmacophores corresponding to a number of conformers, enantiomers, diastereoisomers, and/or mesomers are automatically generated, leading to a description of the full conformational space of the molecule that contains all possible representations of the same compound. It is possible to generate pharmacophores for enantiomers and diastereomers of a molecule since a conformational analysis is performed on the parent molecule. The conformation of the molecules is also under the influence of the solvent, salts, macromolecules and reactive chemical species in the biological medium that can also promote charge flux and atomic rearrangements. Molecular rearrangements can be considered by generating mesomers and tautomers of a parent molecule, enabling consideration of such molecular changes, especially for enzymatic substrates or molecules sensitive to oxidative stress, which are frequently met in toxicology as radical species generators responsible for cell damage.

[00132] In one embodiment of the disclosure, after the one or more desired stereoisomers, conformers, tautomers or mesomers of the candidate source molecule have been selected, the software generates the desired number of stereoisomers, conformers, tautomers, and mesomers and stores the initial molecule of interest and its isomer molecules. The software computes a single geodesic pharmacophore for each stored molecule.

[00133] In a further aspect, the software displays the pharmacophore molecule, by the method shown in Figure 7C. The user decides whether to optionally remove one or more pharmacophore features from the pharmacophore. A unique ID is generated for each retained pharmacophore feature. A point data string is generated for the pharmacophore center of each pharmacophore feature by the steps of populating a point data string with the unique ID representing the pharmacophore feature, and populating the point data string with a character identifying the type of data string, selected from B, P, H, Y, F and R. The point data string is populated with spherical coordinates for the pharmacophore center of the pharmacophore feature. If the feature is a hydrophobic volume, the point data string is populated with ellipsoid coordinates for the hydrophobic volume; if the feature is not a hydrophobic volume, the point data string is populated with a value of zero in place of ellipsoid coordinates. The point data string is added to the list of real point data strings for the pharmacophore.
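
The point data string steps above might be sketched as follows. This is an illustrative assumption rather than the disclosed file format: the semicolon separator, the field order, the four-decimal precision, the use of three ellipsoid values, and the function names are all hypothetical, and the input center is assumed to be given in Cartesian coordinates.

```python
import math
import uuid

# Type characters named in the text; their meanings are not assumed here.
FEATURE_TYPES = {"B", "P", "H", "Y", "F", "R"}


def to_spherical(x, y, z):
    """Convert Cartesian coordinates to (r, theta, phi), angles in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    # Clamp the ratio so rounding error can never push acos out of range.
    theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def point_data_string(feature_type, center, ellipsoid=None, feature_id=None):
    """Build one point data string for a pharmacophore feature.

    `ellipsoid` holds the ellipsoid coordinates of a hydrophobic volume;
    when it is None a single zero is written in its place.
    """
    if feature_type not in FEATURE_TYPES:
        raise ValueError(f"unknown feature type: {feature_type!r}")
    feature_id = feature_id or uuid.uuid4().hex  # hypothetical unique ID scheme
    fields = [feature_id, feature_type]
    fields += [f"{value:.4f}" for value in to_spherical(*center)]
    if ellipsoid is not None:
        fields += [f"{value:.4f}" for value in ellipsoid]
    else:
        fields.append("0")
    return ";".join(fields)
```

For example, under these assumptions `point_data_string("H", (0.0, 0.0, 2.0), feature_id="p1")` yields `p1;H;2.0000;0.0000;0.0000;0`.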

[00134] In another aspect, a vector data string is generated, by the method shown in Figure 7D. For each pharmacophore feature, if a charge is available for a potential molecular bond, vector string data is generated representing a potential bond with the atom by the steps of: generating a unique ID for the vector string data, populating a vector data string with the unique vector ID, populating the vector data string with the unique ID for the point data string, and populating the vector data string with spherical coordinates for the base of the vector and the end of the vector. The vector data string is added to the list of real vector data strings.
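
A minimal sketch of the vector data string steps is given below for illustration. The separator, field order and numeric precision are assumptions, the function name is hypothetical, and the base and end of the vector are assumed to be supplied already in spherical coordinates (r, theta, phi).

```python
def vector_data_string(vector_id, point_id, base, end):
    """Build one vector data string.

    `vector_id` is the unique ID of the vector, `point_id` is the unique ID
    of the point data string the vector belongs to, and `base` and `end`
    are (r, theta, phi) tuples for the two ends of the vector.
    """
    fields = [vector_id, point_id]
    for coordinates in (base, end):
        if len(coordinates) != 3:
            raise ValueError("spherical coordinates need three components")
        fields += [f"{value:.4f}" for value in coordinates]
    return ";".join(fields)
```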

[00135] In a specific aspect, decoy pharmacophore features are developed by generation of decoy point data, by the method shown in Figure 7E. Once all point and vector data strings have been generated representing each real pharmacophore feature, the number of decoy features (N) to be generated is determined. A randomly generated type, location and structure of each decoy feature are determined. If there is any overlap with any real feature, the data for the decoy point is discarded. The decoy point data string is populated and the decoy point data for each decoy feature is added to the list of decoy point data strings. The whole set of points (e.g., real and decoy pharmacophore points) is randomly spread through a three-dimensional space in an export file. In another embodiment, the export files can optionally be even further protected with data encryption software or protocols known in the art. In yet other embodiments, the pharmacophore data is encoded in a text file simply containing a generated unique identification code and sets of pharmacophore points without any atoms.
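
The decoy generation loop above could look like the following sketch, offered under stated assumptions only. Here "overlap" is approximated by a minimum center-to-center distance, decoys are drawn uniformly inside a cube, and the parameter names (`min_distance`, `box`, `max_tries`) are hypothetical. The standard `random` module is used for reproducibility of the example; it is not a cryptographically secure generator.

```python
import math
import random


def generate_decoys(real_points, n_decoys, min_distance=1.0, box=10.0,
                    seed=None, max_tries=10000):
    """Return up to `n_decoys` (type, (x, y, z)) decoy features.

    A candidate closer than `min_distance` to any real feature center is
    discarded, mirroring the overlap check described in the text. Fewer
    than `n_decoys` decoys are returned if `max_tries` is exhausted.
    """
    rng = random.Random(seed)
    decoys = []
    tries = 0
    while len(decoys) < n_decoys and tries < max_tries:
        tries += 1
        feature_type = rng.choice("BPHYFR")  # random type character
        candidate = tuple(rng.uniform(-box, box) for _ in range(3))
        if any(math.dist(candidate, p) < min_distance for p in real_points):
            continue  # overlaps a real feature: discard this decoy
        decoys.append((feature_type, candidate))
    return decoys
```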

[00136] In another specific aspect, decoy vector data are optionally generated, by the method shown in Figure 7F. If the decoy pharmacophore feature includes vectors, a decoy vector data string is generated and populated with the associated decoy point data string. The N decoy features are stored along with a list of unique IDs for the decoy features, including decoy point data strings and decoy vector data strings. The list of unique IDs for decoy features is encrypted.

[00137] In one aspect, a Global Output File is generated by the method shown in Figure 7G. The real point data strings are shuffled with decoy point data strings into a combined list of point data strings. The real vector data strings are shuffled with decoy vector data strings and combined into a list of vector data strings. The pharmacophore structure file is generated. The point and vector data strings are encoded with Base64 encoding. If there are any isomers for the source molecule for which a pharmacophore has not yet been generated and stored, additional pharmacophores can be computed, as disclosed above. Once desired pharmacophores have been generated and encoded, a Global Output File is generated. The Global Output File can be optionally encrypted.
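
By way of example only, the shuffling and Base64 encoding steps above may be sketched as follows. The one-record-per-line layout and the function name are assumptions, and the optional encryption of the Global Output File is not shown. The seeded `random.Random` keeps the example reproducible; a real implementation would likely prefer a secure source such as `random.SystemRandom`.

```python
import base64
import random


def build_structure_file(real_points, decoy_points, real_vectors,
                         decoy_vectors, seed=None):
    """Shuffle real and decoy data strings and Base64-encode each one.

    Returns the text of a hypothetical pharmacophore structure file with
    one encoded data string per line, points first and vectors after.
    """
    rng = random.Random(seed)
    points = list(real_points) + list(decoy_points)
    vectors = list(real_vectors) + list(decoy_vectors)
    rng.shuffle(points)   # real and decoy points become indistinguishable by order
    rng.shuffle(vectors)  # likewise for vectors
    return "\n".join(
        base64.b64encode(record.encode("utf-8")).decode("ascii")
        for record in points + vectors
    )
```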

[00138] In one embodiment, the exported data can be easily processed by the in silico expert. In one aspect, the expert is provided with a key for decoding and/or decrypting the securely encoded pharmacophore by removal of the decoy points. For instance, in one embodiment, toxicity predictions for a compound are achieved by superposing its pharmacophore with those of an expert library pharmacophore database of toxic compounds well characterized by in vitro standardized tests, disclosed infra (Figure 10). The library pharmacophore database is derived from multiple source toxicology databases (Figure 12). Diagnosing potential toxicity of a compound is carried out by calculating a similarity score of query pharmacophores matching with one or several library pharmacophores. The similarity score is calculated by the equations of the examples, which are provided herein, and the in silico predictive results are sent back to the compound's owner with the pharmacophore identification code. The owner can match information, e.g., toxicity data, from the library compounds to the actual candidate compounds, without having disclosed any valuable structural data.
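
One way the expert's decoy-removal step might work is sketched below. It assumes, purely for illustration, that the key is the decrypted list of decoy unique IDs, that each record is a Base64-encoded, semicolon-separated data string, and that the unique ID is the first field; none of these layout details is specified by the disclosure, and the function name is hypothetical.

```python
import base64


def strip_decoys(encoded_lines, decoy_ids):
    """Decode Base64 data strings and drop those whose ID is a decoy ID."""
    decoy_ids = set(decoy_ids)
    kept = []
    for line in encoded_lines:
        record = base64.b64decode(line).decode("utf-8")
        record_id = record.split(";", 1)[0]  # assumed: ID is the first field
        if record_id not in decoy_ids:
            kept.append(record)
    return kept
```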

GENERATION OF MULTIDIMENSIONAL PHARMACOPHORE TOX DATABASES FROM KNOWN TOXICOLOGICAL DATABASES

[00139] In one aspect of the disclosure, multidimensional library pharmacophores are generated from selected compounds from a known toxicological database to generate multidimensional pharmacophore (MultiDIP) toxicological databases. Multidimensional pharmacophores may be generated from each toxic compound member of a database of interest.

[00140] In one embodiment, known chemical toxicology libraries are converted to pharmacophores and used to screen, in silico, various candidate compounds after they have been converted to pharmacophore molecules.

[00141] The toxicology library pharmacophores can be used for in silico screening of candidate compound pharmacophore molecules in order to predict toxicity early in the drug development process. As a result, development efforts can focus on candidate compounds with low predicted toxicity. Such screens use the three-dimensional conformation of a pharmacophore to search such databases in three-dimensional space. A single pharmacophore molecule can be generated per candidate compound, or any number of pharmacophore molecules can be optionally generated per candidate compound, each representing a different conformer, stereoisomer, tautomer, or mesomer. In one aspect, the number of conformers to compute per candidate compound is selected by the drug developer, or client. In another aspect, the number of different stereoisomers to compute per candidate compound is selected by the client. In another aspect, the number of different tautomers to compute per candidate compound is selected by the client. In another aspect, the number of different mesomers to compute per candidate compound is selected by the client. The total number of pharmacophore molecules per candidate compound can be selected from one to 1,000,000; one to 100,000; one to 10,000; one to 1,000; one to 100; or one to 10. Alternatively, any number of pharmacophore molecules per candidate compound may be selected in accordance with the best allowable fit to one or more pharmacophore models by considering the crucial chemical structural features present within multiple three-dimensional structures.

[00142] Any of a variety of databases of three-dimensional structures can be used for such searches. A database of three-dimensional structures can also be prepared by generating three-dimensional structures of compounds, and storing the three-dimensional structures in the form of data storage material encoded with machine-readable data. The three-dimensional structures can be displayed on a machine capable of displaying a graphical three-dimensional representation and programmed with instructions for using the data. Within possible embodiments, three-dimensional structures are supplied as a set of coordinates that define the three-dimensional structure.

[00143] In possible embodiments, the three-dimensional (3D) structure database contains at least 100,000 compounds. It is also important that the 3D coordinates of compounds in the database be accurately and correctly represented. The National Cancer Institute (NCI) 3D-database (Milne et al., J. Chem. Inf. Comput. Sci. 1994, 34:1219-1224) and the Available Chemicals Directory (ACD; available from MDL Information Systems, San Leandro, Calif.) are two exemplary databases that can be used to generate a database of three-dimensional structures, using molecular modeling methods such as those described, supra. For flexible molecules, which can have several low-energy conformations, it is desirable to store and search multiple conformations. The Chem-X program (Oxford Molecular Group PLC, Oxford, United Kingdom) is capable of searching thousands or even millions of conformations for a flexible compound. This capability of Chem-X provides a real advantage in dealing with compounds that can adopt multiple conformations. Using this approach, hundreds of millions of conformations can be searched in a 3D-pharmacophore searching process.

[00144] Two main properties should be considered when selecting a database to be used for in silico toxicity prediction: the accuracy and reproducibility of the in vitro and in vivo toxicity assays, and the chemical diversity of the tested compounds (Figure 14). It can be difficult to combine both advantages inside the same database. Usually, expert databases in toxicology are dedicated to compounds that belong to the same chemical family or exhibit the same toxicity mechanism and contain accurate toxicological data obtained with the help of standardized in vitro and in vivo assays; however, they suffer from a lack of diversity. Diversity is obtained by combining several different expert databases. A possible embodiment can include a data warehouse combining the two properties needed for a good toxicity prediction, diversity and accuracy, as a collection of various databases that have been cleaned by eliminating wrong annotations and redundancies (Figure 12). Data are classified by using an in-house annotation method facilitating data recovery, extraction and treatment (Figure 13).

[00145] In one aspect, the software toxicity data warehouse can bring together, for example, 14,184 compounds from more than 70 different sources of toxicity analysis projects. After generating all possible diastereoisomers, stereoisomers, tautomers and mesomers of all compounds and extracting their pharmacophores, the database can represent almost 2 million pharmacophores. For one compound, the number of different spatial configurations can depend on the number of possible stereoisomers, tautomers and mesomers. A possible embodiment utilizes a small dihedral angle increment for generating all of their different diastereoisomers. Configurations with steric clashes are eliminated.

[00146] In another aspect, a subclassification by data endpoint classification is used to facilitate the screening process, for example, but not limited to, genotoxicity with carcinogenicity (rodent, human, mammals) and mutagenicity (Ames Tests), cytotoxicity (in vivo, in vitro based), teratogenicity, hepatotoxicity, neurotoxicity, irritancy, ocular toxicity, skin sensitization (LLNA, GPMT toxicity tests based), respiratory sensitization, and ecotoxicity (also persistence and bioaccumulation phenomena). In another aspect, different data warehouses, or libraries, are developed which are specific to each type of industry (pharmaceutical, agro food, chemical or cosmetic) (Figures 23A-23C).

[00147] In another aspect, in order to guarantee the quality of the stored data, an algorithm can perform peer scientific literature scanning. It extracts from the publications attached to each toxic molecule in the data warehouse information concerning experimental protocols used in toxicity assays such as cell type, animal model, nature of consumable used, protocols, toxicological data such as LD50, etc. Several different sources of information for one molecule can be compared since sometimes there may be a contradiction. Extra information sources can be obtained either through patent agencies (USPTO, INPI, JPO), or databases of compounds classified by their therapeutic properties, or clinical assays and public toxicological reports.

[00148] In one embodiment, the data warehouse is composed of molecules that range from 2 to more than 12 different functions, or pharmacophoric points, per library pharmacophore molecule, regardless of molecular weight, or number of atoms possessed by the library compound.

[00149] In one aspect, the Distributed Structure-Searchable Toxicity (DSSTox) database proposed by the National Center for Computational Toxicology of the United States Environmental Protection Agency (U.S. EPA) can be one main source of standardized information. In various aspects, other sources can be incorporated, including, but not limited to, PubChem (National Institutes of Health's Molecular Libraries Roadmap Initiative), TOXNET (Toxicology Data Network, managed by the Toxicology and Environmental Health Information Program in the Division of Specialized Information Services of the National Library of Medicine), or DrugBank (Departments of Computing Science & Biological Sciences, University of Alberta) (Figure 12). In one aspect, toxicology data was extracted from more than 70 databases of international projects and scientific publications to build an in-house database of 14,184 unique compounds, where classifications into common toxicity endpoints include carcinogenicity, mutagenicity, genotoxicity, teratogenicity, cytotoxicity, skin sensitization, ocular toxicity, irritancy, respiratory sensitization, hepatotoxicity, neurotoxicity and ecotoxicity. In a possible embodiment, the available toxicology information is not directly exploitable; the database of interest is therefore screened, prior to selection or prior to pharmacophore generation, for redundancies, partial annotations and other sources of error, including errors in structure annotations.

[00150] To exemplify, an in-house data warehouse was created to enable extraction, transformation and loading of data into a repository, and managing and retrieving data from dictionary (a.k.a. metadata) data warehouse systems. The toxicity endpoints previously enumerated were divided into different data marts, and then included in the analysis. The algorithm was based on the OLAP (online analytical processing) cube, a data structure allowing fast analysis of data. The data structure is organized as a hypercube of data, with compounds classified by their isomer definitions (conformers, tautomers, enantiomers, diastereoisomers) and their multidimensional pharmacophore definitions (nD-Pharmacophores). Information and properties are also defined in one dimension of the data warehouse, with the CAS number, the molecular structures and properties, the multidimensional pharmacophore properties, the toxicity and clinical studies, complete bibliographic properties and much more diverse information. Each toxicity endpoint is shown as another dimension of the hypercube, which is screened with specific paths for each of the source molecules. The arrangement of data into cubes overcomes a limitation of relational databases. However, the number of dimensions that must be manipulated for toxicity screening is much higher than in business or finance. Thus, the basic OLAP cube was modified by increasing the number of dimensions of data storage, using the concept of multidimensional cubes called hypercubes or zonotopes, in order to obtain a Zonocube™ (Figure 13), although alternative embodiments might use cube structures.

[00151] Structure data format files (.sdf files) containing structural information and toxicology data of a set of known toxic chemical compounds were gathered from the public World Wide Web repositories described above. These data files, extended from the MDL file format, contain both structural information about each compound (list and position of all atoms in two or three dimensions, information about interatomic bonds and connectivity between atoms) and chemical related information. The main chemical and meta information collected in the data warehouse from the original sdf files is chemical related data such as molecular weight, molecular formula, molecule SMILES (simplified molecular input line entry specification) representation, IUPAC International Chemical Identifier (InChI™) and Chemical Abstracts Service Registry Number (CASRN). In addition to the common structural, formulaic, and identification data, sdf files contain meta information about each compound, related to toxicology and pharmacology. In one aspect, all toxicology information, provided by several studies from each original database source, is collected in a data warehouse, and all data corresponding to one single compound referenced in several databases are merged into one entry in the data warehouse.

[00152] In another aspect, in addition to toxicity data fields, information about toxicology studies can be stored in a data warehouse as a set of information relative to one compound and one study. Toxicology test data are provided by several sources, such as the Genetic Toxicology Data Bank (GENE-TOX) from NIH TOXNET® (Toxicology Data Network).

[00153] In one embodiment, structural and toxicology related information collected from original databases of toxic compounds can be merged in an in-house structural and information database. To merge information relative to one single compound, the InChI identifier can be used. For each molecule incorporated in the database from the structure data file, structure information can be extracted when possible and standardized in three dimensions using the Marvin Application Programming Interface provided by ChemAxon Ltd. The InChI and InChIKey hash are then calculated using the InChI 1.02 library incorporated in Marvin and used to determine whether the compound is already in the database. If the compound exists in the data warehouse, all toxicology-related meta information can be merged into the existing entry; otherwise, a new molecule entry is created.
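The merge step can be sketched as follows. This is only a minimal illustration, assuming the InChIKey of each record has already been computed by a cheminformatics toolkit; the record layout, the function name `merge_records` and the placeholder keys and values are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def merge_records(records):
    """Merge toxicology metadata of records that share an InChIKey.

    `records` is an iterable of (inchikey, source, metadata) tuples, where
    `metadata` is a dict of toxicology fields (hypothetical layout).
    """
    warehouse = {}
    for inchikey, source, metadata in records:
        # Create a new molecule entry the first time a key is seen.
        entry = warehouse.setdefault(
            inchikey, {"sources": [], "tox": defaultdict(list)}
        )
        entry["sources"].append(source)
        # Keep every reported value so contradictory sources stay visible.
        for field, value in metadata.items():
            entry["tox"][field].append((source, value))
    return warehouse


# Placeholder keys and values, invented for illustration only.
records = [
    ("KEY-A", "DSSTox", {"mutagenicity": "positive"}),
    ("KEY-B", "TOXNET", {"LD50_mg_per_kg": 320}),
    ("KEY-A", "PubChem", {"mutagenicity": "negative"}),
]
warehouse = merge_records(records)
```

Storing each value together with its source, rather than overwriting, keeps contradictions between sources available for later review.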

[00154] Structural information can be extracted from sdf files as described above, using the ChemAxon Ltd. Marvin API. Each molecule's three-dimensional structure can be calculated from 2D information using the ChemAxon Marvin clean method, and explicit Hydrogen atoms are generated with the Marvin Hydrogenise function. Next, the ChemAxon calculation plug-in MajorMicrospeciesPlugin can be used to determine the protonation state of each compound at physiological pH 7.4. This primary three-dimensional structure of each molecule can then be used as the reference structure to explore the multidimensional isomeric space. Structure data files of toxic compounds may include several molecules along with the main structure, such as ions or solvent molecules (e.g. water molecules). In this case, all additional structures can be removed.

[00155] In one embodiment, for each toxic compound stored in the reference data warehouse, an exhaustive exploration of all possible isomers, including stereoisomers, enantiomers, E-Z isomers, diastereoisomers, conformers, tautomers and mesomers, can be done by using the ChemAxon Marvin plug-in suite. Then, a multidimensional pharmacophore space can be created after transforming all the isomers into geodesic pharmacophores or phoromers™. Phoromers are obtained as follows: first, stereoisomers of the parent chemical compound are generated and their conformational space is explored; for each stereoisomer, unrealistic isomers or those containing steric clashes are eliminated by using the generic Dreiding molecular force field; finally, mesomers and tautomers are evaluated for each calculated stereoisomer.

[00156] Multi-dimensional pharmacophores are several pharmacophore molecules representing different conformations of a single molecule and the corresponding conformational space. Various embodiments of a multidimensional pharmacophore include a representation of the physico-chemical properties of a molecule over its entire conformational space, a data-warehouse system to organize this data for thousands of toxic compounds characterized in the literature, a screening methodology to perform multi-dimensional comparisons of a given molecule's pharmacophore against all those stored in the data warehouse, and an expert toxico-biological analysis of the results obtained. In one aspect, these embodiments can be combined into a platform for prediction of toxicity. Embodiments that include multidimensional pharmacophores are not only based on the molecules' three-dimensional (3D) representation, but can also be exploited to consider an entire conformational space, taking account of physico-chemical properties, such as constitutional isomers and stereoisomers, geometrical definition of hydrogen-bond and halogen-bond donor and acceptor vectors, hydrophobic ellipsoidal volumes and aromatic π-orbital vectors, or polar charge properties.

[00157] To estimate molecular diversity, or how atom types and chemical groups are represented in the molecules of the data warehouse, their occurrence in the parent molecules is calculated. For each pharmacophore set, the occurrence of the center types was estimated. Referring to Figures 14A-14C, molecular diversity can also be defined in terms of compound size (Figure 14A). To get a representation of the size scale, the total number of atoms and functions per molecule was enumerated, and molecules of each size were counted and classified by their molecular weight (Figures 14B-14C). Pharmacophore size diversity was evaluated by counting the total number of centers per pharmacophore and counting the pharmacophores of each size (Figure 15).

[00158] Complexity in chemical diversity can be evaluated by establishing a relationship between the chemical composition of a compound and the spatial arrangement of its different sub-scaffolds. This complexity can be evaluated by the number of different functions per molecule, the relation between the molecular weight and the total number of atoms, and the number of different chemical groups among the total number of chemical groups in a molecule. For the pharmacophores, the complexity in diversity can be evaluated by the number of centers per pharmacophore (Figure 15A), and the number of different center types as a function of the total number of centers per pharmacophore (Figure 15B). Pharmacophore size diversity was evaluated by counting the total number of centers per pharmacophore and counting the number of pharmacophores of each size.

[00159] In order to determine the portion of the 3D space occupied by the compounds, the spheroid that roughly represents the global shape of each compound was calculated. Each molecule was placed into an orthonormal basis and its greatest dimensions along each axis were measured (Figures 20A-20C). In this way, the volume and the space occupied by one compound can be estimated, and compounds can be qualified from spherical to ellipsoidal shapes, representing globular, flat and linear molecules (Figures 10A-10C). Compared with the scatter plot of parent pharmacophore ellipsoids, the plot of isomer pharmacophore ellipsoids is much more spread out (Figures 21A-21B). This illustrates the contribution of the conformers to the screening space covered by the data warehouse. Phoromers enlarge pharmacophore space just as conformers enlarge molecular space. For the isomer pharmacophores, the volume was calculated with the help of spheroids, as described above for the parent molecules. A scatter plot of the parent pharmacophores' ellipsoids is shown in Figure 21A and a scatter plot of the isomer pharmacophores' ellipsoids in Figure 21B; comparing the two illustrates the contribution of conformers to the screening space covered by the data warehouse.
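The shape estimate described in this paragraph can be illustrated with a short sketch. It assumes the coordinates are already expressed in the chosen orthonormal basis and takes half of the greatest extent along each axis as a semi-axis of the ellipsoid; the function name `bounding_ellipsoid` is hypothetical, and the disclosure does not state whether atomic radii are added to the extents.

```python
import math


def bounding_ellipsoid(coords):
    """Return the semi-axes (a, b, c) and volume of an axis-aligned ellipsoid.

    `coords` is a list of (x, y, z) atom or center positions. Each semi-axis
    is half of the greatest extent measured along the corresponding axis.
    """
    semi_axes = []
    for axis in range(3):
        values = [point[axis] for point in coords]
        semi_axes.append((max(values) - min(values)) / 2.0)
    a, b, c = semi_axes
    volume = 4.0 / 3.0 * math.pi * a * b * c
    return (a, b, c), volume


# Made-up coordinates: extents of 4, 2 and 1 along x, y and z.
coords = [(-2, 0, 0), (2, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -0.5), (0, 0, 0.5)]
axes, volume = bounding_ellipsoid(coords)
```

Three similar semi-axes indicate a globular molecule, one small semi-axis a flat one, and two small semi-axes a linear one. A perfectly planar input gives a zero volume in this simplified form.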

SCREENING A QUERY PHARMACOPHORE AGAINST A MULTIDIMENSIONAL PHARMACOPHORE TOX DATABASE

[00160] In one embodiment, the query pharmacophore, generated from the candidate molecule, is screened versus a multidimensional pharmacophore toxicology database to predict toxicity of the candidate molecule based on pharmacophore similarity.

[00161] For any source molecule, toxicity prediction can be processed by measuring its multidimensional physicochemical similarity to reference toxic compounds stored in an expert database. This similarity screening can be made after transformation of the compounds' chemical structures (query and database compound) into geodesic pharmacophores defined in spherical coordinates. The pharmacophore is made of a set of pharmacophore points, each corresponding to a specific geometric or electronic feature representing at least one element responsible for a physiological activity. Those features can include physicochemical properties, such as Hydrogen bond donors, acceptors, and both; aromatic rings as hydrophobic entities or charge transfer acceptors or donors; hydrophobic volumes; polar groups (positively or negatively charged); halogen bond donors or acceptors; ambivalent acceptors of Hydrogen and halogen bonds; and radical centers (Figure 17).

[00162] Each of the pharmacophore features of the source molecule is represented in spherical coordinates by using three parameters: the radial distance r of the point from the origin of the system (the geometric center), its elevation angle Θ, and its azimuth angle φ. Conversion between spherical and Cartesian coordinates is then possible by using the set of equations described in Figure 16. In addition to the pharmacophore centers, which locate the position of each pharmacophore point in the reference coordinates, a field of normal vectors can be added for some functional physicochemical properties, such as Hydrogen and halogen bonds and aromatic groups, to describe the anisotropic nature of the interactions they promote. Encoding of vectors is shown in Figure 5; illustration of vectors is shown in Figure 11.
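A conversion of this kind can be sketched as follows. The equations of Figure 16 are not reproduced in the text, so the convention used here is an assumption: the elevation angle is measured from the xy-plane and the azimuth angle is measured in the xy-plane from the x axis, both in radians. A convention that measures the angle from the z axis would swap the sine and cosine of the elevation. The function names are hypothetical.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    """Convert (r, elevation theta, azimuth phi) to (x, y, z).

    Assumed convention: theta is measured from the xy-plane, phi from the
    x axis within the xy-plane, both in radians.
    """
    x = r * math.cos(theta) * math.cos(phi)
    y = r * math.cos(theta) * math.sin(phi)
    z = r * math.sin(theta)
    return x, y, z


def cartesian_to_spherical(x, y, z):
    """Inverse conversion under the same assumed convention."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # Angles are undefined at the origin; return zeros by convention.
        return 0.0, 0.0, 0.0
    # Clamp the ratio to guard against rounding slightly outside [-1, 1].
    theta = math.asin(max(-1.0, min(1.0, z / r)))
    phi = math.atan2(y, x)
    return r, theta, phi
```

With the geometric center of the pharmacophore as origin, each center is stored as one (r, Θ, φ) triple, and the round trip through Cartesian coordinates recovers the original position up to floating-point error.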

[00163] In one aspect, the database pharmacophore-based screening can be made to extract candidate molecules, and potentially substrates of an enzyme, where water participates in the catalytic reaction. Water is able to influence the conformational properties and electronic distribution of solute molecules and to influence their binding to macromolecular targets. In one aspect, the software takes into account the influence of the first and second solvation shells by automatically generating a water box around the solute molecule, saturating all possible H-bond acceptor and donor centers of all conformations of a molecule. A force field relaxes the complex in order to avoid steric clashes and adjust Hydrogen bonds. Water can sculpt molecules and can promote charge transfers with the solvated molecule. Charge transfers can be strong enough to form a cyclic cluster with the molecule, creating additional rings with a high charge density in the vicinity of the Hydrogen bonds at bond critical points.

PHARMACOPHORE SIMILARITY

[00164] In another embodiment, the disclosure provides a method for evaluation of any source molecule for, e.g., prediction of toxicity, or prediction of physiological activity, based on the concept of pharmacophore similarity. A pharmacophore similarity between a source molecule and thousands of reference molecules having well-identified toxicity data recorded in reference databases of toxic compounds is searched. A good virtual screening process can include highly reliable and diverse databases to cover a wide and insightful range of the entire chemical space. The quality of the toxicity screening can be dependent on the quality of experimental assays performed on molecules belonging to the reference database. Standardized biological reference data can be utilized for virtual screening and data mining.

[00165] Molecular similarity (or congruence) searching is a key tool in drug design. In drug design, an amount of similarity in a structure or a portion of a structure should be correlated to an equivalent amount of physiological activity. Searching for common features between two molecules with the help of a molecular alignment algorithm, a similarity score and mathematical or physicochemical parameters can characterize the limits of the chemical space, e.g., provide a unit of measurement. Screening algorithms that rely on two-dimensional (2D) fragment matching are demonstrably less effective at grouping together molecules with similar physiological activity or toxicity. Such systems reduce the dimensionality of the molecular description to one or two dimensions. Two molecules can contain one or several common fragments in their chemical structure, yet not show the same activity and/or the same toxicity mechanism. A good example is promethazine (targeting histamine H1 receptors) and imipramine (targeting presynaptic serotonin and norepinephrine transporters). Both molecules have two similar fragments representing 60% of their structure, yet their respective targets, activities and side effects are completely different. Changing only one atom of a molecule can cause dramatic changes in its electronic and physicochemical properties that affect its biological or therapeutic activities.

[00166] As disclosed herein, improvements with respect to molecular similarity analysis are seen with the emergence of pharmacophore-based screening methods. For example, see Figures 23A-23F and Example 4, provided herein. In each of five genotoxicity or carcinogenicity assays, the predictivity of the MultiDIP Tox screening method, in terms of sensitivity, specificity and concordance, was improved compared to that of either of the two commercially available methods, DEREK and MCASE/MC4PC.

[00167] The software uses an innovative and sophisticated pharmacophore definition in spherical coordinates, enabling it to handle congruence and similarity between two molecules in a multidimensional space, which dramatically increases the accuracy of the measure (Figure 11).

[00168] Similarity in structure can also be evaluated by visual comparison of the three-dimensional structures in graphical format, or by any of a variety of computational comparisons. For example, an atom equivalency may be represented in peptidomimetic and pharmacophore three-dimensional structures, and a fitting operation can be used to establish a level of similarity. As used herein, an atom equivalency is a set of conserved atoms in the two structures.

[00169] A fitting operation may be any process by which a candidate compound structure is translated and rotated to obtain an optimum fit with the cyclic peptide structure. A fitting operation may be a rigid fitting operation (e.g., the pharmacophore structure can be kept rigid and the three-dimensional structure of the peptidomimetic can be translated and rotated to obtain an optimum fit with the pharmacophore structure). Alternatively, a fitting operation may use a least squares fitting algorithm that computes the optimum translation and rotation to be applied to the moving compound structure, such that the root mean square difference of the fit over the specified pairs of equivalent atoms is a minimum. Atom equivalencies may be established by the user, and the fitting operation is performed using any of a variety of available software applications (e.g., INSIGHT II (available from Accelrys Inc. in San Diego, Calif.) or QUANTA (available from Molecular Simulations)). As disclosed herein, three-dimensional structures of candidate compounds for use in establishing substantial similarity can be determined experimentally (e.g., using NMR or X-ray crystallography techniques) or may be computer generated.
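A least squares fit over specified pairs of equivalent atoms can be illustrated with the Kabsch algorithm, a standard published method based on a singular value decomposition. The sketch below is not the routine of any of the named software packages; it assumes NumPy is available, and the function name `kabsch_fit` is hypothetical.

```python
import numpy as np


def kabsch_fit(moving, reference):
    """Least-squares rigid fit of `moving` onto `reference`.

    Both inputs have shape (n, 3); row i of one structure is the equivalent
    of row i of the other. Returns the rotation R, the translation t and the
    RMSD after applying `moving @ R.T + t`.
    """
    moving = np.asarray(moving, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mov_centroid = moving.mean(axis=0)
    ref_centroid = reference.mean(axis=0)
    # Center both point sets on their centroids.
    P = moving - mov_centroid
    Q = reference - ref_centroid
    # Cross-covariance matrix and its singular value decomposition.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip one axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ref_centroid - R @ mov_centroid
    fitted = moving @ R.T + t
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - reference) ** 2, axis=1))))
    return R, t, rmsd
```

The returned RMSD is the quantity that the similarity thresholds in the text are applied to, computed over the paired atoms only.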

[00170] Generally, the three-dimensional structure of a candidate compound is considered substantially similar to that of a pharmacophore if the two structures have an RMSD less than or equal to about one Angstrom, as calculated, e.g., using the Molecular Similarity module with the QUANTA program (Biopolymer module of INSIGHT II program available from Accelrys, Inc., San Diego, Calif.) or using other molecular modeling programs and algorithms that are available to those skilled in the art. In possible embodiments, compounds have an RMSD less than or equal to about 1.0 Angstrom. In other possible embodiments, compounds have an RMSD that is less than or equal to about 0.5 Angstrom, and in still other embodiments about 0.1 Angstrom. In one aspect, a candidate compound will have at least one low-energy three-dimensional structure that is, or is predicted to be (e.g., by ab initio modeling), substantially similar to a library pharmacophore.

[00171] In various embodiments, any of a variety of databases of three-dimensional structures can be used for such searches. In one embodiment, a database of three-dimensional structures is prepared by generating three-dimensional structures of compounds as spherical or otherwise geodesic pharmacophores, and storing the pharmacophores in the form of data storage material encoded with machine-readable data. Three-dimensional structures can be displayed on a machine capable of displaying a graphical three-dimensional representation and programmed with instructions for using the data. Additionally, the structure can be modified to correct any errors included in the representation to increase the accuracy of the search. A three-dimensional (3D) structure database can contain up to 100,000 or more compounds, such as small, non-peptidyl molecules having relatively simple chemical structures. The National Cancer Institute (NCI) 3D-database and the Available Chemicals Directory (ACD) are examples of databases that can be used to generate a database of three-dimensional structures, e.g., spherical pharmacophores, using molecular modeling methods such as those described, supra. Pharmacophore queries can be developed from an evaluation of distances in the three-dimensional structure of the pharmacophore. Critical pharmacophore distances, which can be used in a pharmacophore search, are indicated by lines drawn between different functional groups. These distances can be readily determined and evaluated by a user.

[00172] Using a pharmacophore query, screening can be performed on a database to identify compounds that fulfill possible geometrical constraints. Candidate compounds can be scanned to determine physical features (e.g., Hydrogen bond donors, Hydrogen bond acceptors, hydrophobic volumes, etc.) and geometric parameters (e.g., relative distances between important physical points). Chemical groups (e.g., hydrophobic, NH₃⁺, carbonyl, carboxylate) can be used to map the surface of each candidate compound, while interaction fields are utilized to extract the number and nature of key points within candidate compound molecules.

[00173] In other aspects, fitting a compound to a pharmacophore volume can be done using other computational methods. Visual inspection and manual docking of compounds into the active site volume can be done using such programs as NAMO or VMD. This modeling step may be followed by energy minimization using standard force fields, such as CHARMM or AMBER. Other more specialized modeling programs include GRID, MCSS, AUTODOCK, and DOCK. In addition, compounds may be constructed de novo in an empty active site or in an active site including some portions of a known inhibitor using computer programs such as LUDI or LeapFrog.

[00174] In various embodiments, ligands with a shared activity may be overlaid directly (e.g., MOE (Chemical Computing Group, Inc.), FlexS (cartan.gmd.de/flexs) and Medchem Explorer (Accelrys Inc., San Diego, Calif.)). A number of algorithms have been developed which consider rigid-body, semi-flexible, and flexible superimpositioning of small molecules.

[00175] In various embodiments, source pharmacophores and database pharmacophores can be superimposed and/or aligned. The degree of similarity between the library pharmacophore features and the corresponding features of a query pharmacophore can be calculated and utilized to determine a degree of similarity between two molecules.

[00176] In one aspect, the degree of similarity between two pharmacophores (e.g., the target and source pharmacophores) can be determined by calculation of a superimposition score, or RMSD for certain pharmacophore points and/or vectors, between the compared pharmacophores. If any three-dimensional pharmacophore of the target database is found to fit with a three-dimensional pharmacophore of the source, a superimposition score can be calculated, ranging from 0 (no superimposition) to 1 (perfect match for two identical pharmacophores). The method of calculating the superimposition score for two pharmacophores is disclosed in Example 4, infra. In one aspect, relevant hits may be identified as those exhibiting a pharmacophore superimposition score greater than, e.g., 0.25, 0.30, 0.35, 0.40, 0.45, or 0.50.

[00177] In another aspect, to determine the degree of similarity between two pharmacophores, the following can be calculated and reported: the number of fitted pharmacophore centers; the root mean square deviation (RMSD) of the pharmacophore pair according to the positions of the pharmacophore centers; and the RMSD of the pharmacophore pair according to both the positions of the pharmacophore centers and the orientations of their vectors (Hydrogen bonds and Pi).

[00178] In one aspect of the disclosure, after a superposition procedure, molecules with a high matching score or high degree of similarity are selected for further verification of their similarity. Statistical analyses, such as ANOVA (performed, for example, with Minitab Statistical Software (Minitab, State College, Pa.)), can extract differences between the pharmacophore and the candidate molecule that are statistically significant at a given value of p. In at least one possible embodiment, the value of p is less than about 0.05. Candidate molecules with a p value above the desired p value are rejected.

[00179] In another aspect of the disclosure, the software aligns two pharmacophores not only by optimizing the fit between centers but also by optimizing the alignment of the Hydrogen bond vector fields. Vector alignment increases the chance of finding a match between two pharmacophores. Ligands can bind their targets and promote their activation even if the Hydrogen bond network established between the two entities, the ligand and the receptor, is not optimal, with the Hydrogen bond vectors not pointing in the ideal direction during the fitting procedure. The software allows the vectors to move within an authorized portion of space where a Hydrogen bond is still possible, maximizing the chance of finding similar ligands. This strategy is also applied to other anisotropic non-bonded interactions, such as halogen bonds and cation-π, anion-π, Hydrogen-π and halogen-π interactions (see Figures 18, 19A, 19B, 19C and 19D, respectively).

[00180] High-throughput screening of millions of pharmacophores requires a quick but efficient fitting algorithm. In one aspect, to decrease the time needed for structural alignment of pharmacophores, distance matrix comparison can be used as a pre-fit. Given the spherical 3D coordinates of each pharmacophore center, the matrix of the distances between each pharmacophore center and all the others can be calculated for the query pharmacophore and for each toxic pharmacophore of the data warehouse. Comparing two distance matrices can be a first step to determine whether the pharmacophores can fit together. Indeed, if a minimum of four centers of a given pharmacophore present the same distance profile as four centers of the other one, a partial 3D superimposition of both pharmacophores according to these fitted centers is possible. If it is impossible to match at least four pharmacophore centers using the distance matrices, the two pharmacophores do not share a common framework of four pharmacophore centers, and the tested toxic pharmacophore is excluded. If, on the other hand, a minimum superimposed framework of four aligned pharmacophore centers can be determined, the two pharmacophores are able to fit each other. The next step can be a fine superimposition of the two, according to the set of aligned pharmacophore centers determined by the distance matrix comparison. The nature of the matched pharmacophore centers can then be considered, and incompatible fitted centers (such as a polar negative center fitted with a hydrophobic one) are removed from the matched structure. After hyperfine fitting of both pharmacophores, a superimposition score S is calculated, such that 0 ≤ S ≤ 1.
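The distance matrix pre-fit can be sketched as a brute-force search for four centers in each pharmacophore whose pairwise distances agree. This is only an illustration under stated assumptions: centers are given in Cartesian coordinates (converted beforehand from the spherical representation), the tolerance `tol` and the function names are hypothetical, and center types are ignored at this stage. A production screen over millions of pharmacophores would need a faster search than this exhaustive one.

```python
import itertools
import math


def distance_matrix(centers):
    """Pairwise Euclidean distances between (x, y, z) centers."""
    return [[math.dist(p, q) for q in centers] for p in centers]


def find_common_framework(query, target, size=4, tol=0.5):
    """Return one pairing of `size` query centers with target centers whose
    inter-center distances agree within `tol`, or None if none exists.

    `tol` is in the same length unit as the coordinates (assumed parameter).
    """
    dq = distance_matrix(query)
    dt = distance_matrix(target)
    pairs = list(itertools.combinations(range(size), 2))
    for q_idx in itertools.combinations(range(len(query)), size):
        for t_idx in itertools.permutations(range(len(target)), size):
            # All pairwise distances of the subset must agree.
            if all(
                abs(dq[q_idx[i]][q_idx[j]] - dt[t_idx[i]][t_idx[j]]) <= tol
                for i, j in pairs
            ):
                return list(zip(q_idx, t_idx))
    return None


# Made-up coordinates: the first target is a translated copy of four
# query centers, the second is ten times larger.
query = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (5, 5, 5)]
target_hit = [(10, 10, 10), (11, 10, 10), (10, 12, 10), (10, 10, 13)]
target_miss = [(0, 0, 0), (10, 0, 0), (0, 20, 0), (0, 0, 30)]
hit = find_common_framework(query, target_hit)
miss = find_common_framework(query, target_miss)
```

Because distances are unchanged by translation and rotation, the test needs no prior alignment. Distances are also unchanged by reflection, so a mirror-image framework passes this filter and has to be rejected during the fine superimposition.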

[00181] This score, based on the spatial root mean square deviation of the centers, the orientation of the vectors, the mismatched centers and the likelihood of the matched centers, can be used to rank all positive matches of the screening.

[00182] In various embodiments, toxicity screening of a candidate compound versus the multidimensional pharmacophore database can be executed from full structural molecule information (provided by a molecule data file of two- or three-dimensional atom coordinates, such as an MDL file, a Tripos SYBYL mol2 file or an xyz file), or from a pharmacophore-only input. In the first aspect, the candidate molecule can first be standardized as described above, using the ChemAxon Marvin API, in order to compute the right three-dimensional coordinates of the molecule (including Hydrogen atoms) and its correct protonation state at a given pH. To improve confidentiality about the candidate molecule, it is possible to perform the toxicity screen from a pre-calculated pharmacophore structure. In the second aspect, a software compatible pharmacophore of the source may be generated with an offline tool or by hand. Given the input pharmacophore structure, a toxicity screen can be performed versus the whole or part of the toxic pharmacophore database using a pharmacophore superimposition algorithm. One example of a superimposition algorithm is provided in Example 4. In one aspect, the pharmacophore screening process is restricted by specifically selecting a class of toxic molecules to test versus the target molecule. For example, one or more toxic endpoints are selected from among several corresponding to toxic compounds in the database, such as carcinogenicity, mutagenicity, genotoxicity, teratogenicity, cytotoxicity, skin sensitization, ocular toxicity, irritancy, respiratory sensitization, hepatotoxicity, neurotoxicity and ecotoxicity. In another aspect, the screening process can be restricted by specifying the original database from which the toxic pharmacophores have been extracted. It is also possible to select toxic pharmacophores that have been the subject of a specific toxicology study. Once the test compound pharmacophore is defined (calculated from the source molecule to test or provided as it stands) and the target set of toxic multidimensional pharmacophores is chosen, the toxicity screen can be run. Each pharmacophore of the toxic database is compared to the test molecule pharmacophore using a specific three-dimensional alignment algorithm. Toxic pharmacophores that fit with the source molecule are selected as positive hits and sorted according to the alignment score function.

[00183] In different aspects, a number of different mathematical indices can be utilized to measure the similarity between pharmacophore and candidate molecules. Mathematical indices can be incorporated in software packages. The choice of mathematical indices will depend on a number of factors, such as the pharmacophore of interest, the library of candidate molecules, and the functional groups identified as essential for activity. For a review on this topic see Frederique et al., Current Topics in Medicinal Chem. 2004, 4: 589-600.

[00184] In one embodiment, the disclosure provides a method to calculate a Pharmacophore Superimposition Score between two pharmacophores by comparison of paired and unpaired pharmacophoric points. The method is described in detail in Example 4.

STATISTICAL ANALYSIS

[00185] In various embodiments, source molecule toxicity assessment depends on virtual screening results and on the degree of pharmacophoric superimposition with pharmacophores generated from known toxic compounds in the expert database. Some particular endpoints can also be characterized, depending on the level of information integrated in each endpoint data mart. In another aspect, for compounds not already included in the data warehouse, the alternative is to find reliable pharmacophoric similarities between the screened compound, or source molecule, and other compounds in the data warehouse.

[00186] Reliability and performance of the predictions based on the herein disclosed methods were assessed with statistical measures of binary classification tests: sensitivity, specificity and accuracy (also known as concordance). These statistical measures are computed from the numbers of True Positive predictions (TP), True Negative predictions (TN), False Positive predictions (FP) and False Negative predictions (FN).

[00187] Sensitivity is Se= TP/(TP+FN) and represents the true positive rate of the prediction model, measuring the proportion of actual positives correctly predicted (e.g. the percentage of toxic compounds which are identified as toxic).

[00188] Specificity (Sp= TN/(TN+FP)) represents the true negative rate and measures the proportion of negatives which are correctly identified (e.g. the percentage of non-toxic compounds which are identified as non-toxic). Accuracy (Acc) (also known as concordance) of the prediction model is the degree of closeness of measurements of the toxicity to its true value (e.g. the proportion of true results, both true positives and true negatives), and is represented as:

(TP+TN)/(TP+FP+TN+FN).
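The three measures defined above can be computed directly from the four counts. The following is a minimal sketch; the function name and the example counts are illustrative assumptions and are not taken from the disclosure.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    sensitivity = tp / (tp + fn)                 # Se = TP/(TP+FN)
    specificity = tn / (tn + fp)                 # Sp = TN/(TN+FP)
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # Acc = (TP+TN)/(TP+FP+TN+FN)
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only
se, sp, acc = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"Se={se:.2f} Sp={sp:.2f} Acc={acc:.2f}")
```

With these hypothetical counts, 40 of 50 toxic compounds and 45 of 50 non-toxic compounds are classified correctly.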

[00189] A Receiver Operating Characteristic (ROC) analysis provides a statistical representation for selecting possibly optimal models and discarding suboptimal ones independently of the class distribution. Statistical measures of the toxicity benchmark comparison can therefore be represented in ROC space. This space is characterized by 1 - specificity and sensitivity as the x and y axes respectively, where each prediction result (one instance of a confusion matrix) represents one point in the ROC space. A perfect classification would yield a point at the coordinate (0,1) in the ROC space, with 100% sensitivity and 100% specificity (no false positives and no false negatives). A completely random guess would give a point along the diagonal line; the model is then non-discriminative. In fact, the diagonal divides the ROC space: points above the diagonal represent good classification results, and points below the line represent poor results.
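As an illustration, one prediction result can be placed in ROC space using the usual convention of 1 - specificity on the x axis and sensitivity on the y axis, so that the perfect classifier sits at (0, 1). The helper names below are hypothetical and the sketch is not part of the disclosed method.

```python
def roc_point(sensitivity, specificity):
    """Map one prediction result to (x, y) = (1 - specificity, sensitivity)."""
    return (1.0 - specificity, sensitivity)


def is_above_diagonal(point):
    """True when the point lies above the random-guess diagonal y = x."""
    x, y = point
    return y > x


perfect = roc_point(sensitivity=1.0, specificity=1.0)   # (0.0, 1.0)
example = roc_point(sensitivity=0.8, specificity=0.9)   # hypothetical model
print(perfect, example, is_above_diagonal(example))
```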

[00190] As illustrated in Figure 22, a confusion matrix is a visualization tool typically used in supervised learning. The confusion matrix can be used to illustrate statistical measures of the performance of the in silico toxicity prediction. Each row of the matrix represents the instances in a predicted class (predicted in silico toxicity), while each column represents the instances in an actual class (actual in vitro or in vivo toxicity).
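A confusion matrix with the orientation described above (rows for predicted classes, columns for actual classes) can be tallied as follows. This is a minimal sketch with made-up labels; the function name and the data are assumptions for illustration.

```python
def confusion_matrix(predicted, actual):
    """Tally a 2x2 matrix: rows = predicted (toxic, non-toxic),
    columns = actual (toxic, non-toxic)."""
    matrix = [[0, 0], [0, 0]]
    for p, a in zip(predicted, actual):
        row = 0 if p else 1   # predicted toxic -> row 0
        col = 0 if a else 1   # actually toxic  -> column 0
        matrix[row][col] += 1
    return matrix


# Hypothetical labels: True means toxic
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
m = confusion_matrix(predicted, actual)
tp, fp = m[0]
fn, tn = m[1]
print(m, tp, fp, fn, tn)
```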

EXAMPLES

Example 1. Pharmacophore screening versus a toxic pharmacophore database.

[00191] This example illustrates a standard MultiDIP screening procedure where a single compound represented by a pharmacophore is screened against multidimensional pharmacophores generated from a reference compound database.

[00192] The compound Mefenorex was used as the source molecule for a MultiDIP toxicity (Tox) screen. Mefenorex (CASRN: 17243-57-1) is an Amphetamine derivative which was developed in the 1970s and used for the treatment of obesity. Vincendeau MJ. A new regulator of appetite: mefenorex. (French). Bordeaux Medical. 1970 Jul-Aug;3(7):1951-3. Beyer G, Huth K, Muller GM, Niemoller H, Raisp I, Vorberg G. The treatment of obesity with the appetite curbing agent Mefenorex. (German). Die Medizinische Welt. 1980 Feb;31(8):306-9. Mefenorex produces Amphetamine as a metabolite, and has been withdrawn in many countries since 1999 because of risk of arterial pulmonary hypertension. http://www.afssaps.fr/Infos-de-securite/Communiques-de-presse/Retrait-definitif-du-marche-des-medicaments-anorexigenes/(language)/fre-FR. http://www.afssaps.fr/Infos-de-securite/Communiques-de-presse/Anorexigenes-et-risque-d-hypertension-arterielle-pulmonaire/(language)/fre-FR. The source molecule was encoded to a pharmacophore using the process described in Example 2. In the present example, isomers of the query molecule were not generated. The Mefenorex pharmacophore in this example included five pharmacophore points: an aromatic and a hydrophobic point superimposed together, a polar pharmacophore point, and a halogen bond donor and a Hydrogen bond acceptor pharmacophore point superimposed together. Figure 10 illustrates an example process of multidimensional pharmacophore screening of the source molecule Mefenorex versus a toxic pharmacophore database.

[00193] As illustrated in Figures 10A-10D, the pharmacophore of the query molecule was screened versus a toxic pharmacophore database. In Figure 10A one pharmacophore for Mefenorex is shown superimposed over the chemical structure. In the present example, isomers of the source molecule were not generated. The Mefenorex pharmacophore includes five pharmacophore points: an aromatic and a hydrophobic point superimposed together, a polar pharmacophore point, and a halogen bond donor and a Hydrogen bond acceptor pharmacophore point. After screening against a multi-dimensional pharmacophore (MultiDIP) Tox database containing approximately 2 million pharmacophores, several hits were identified. The source pharmacophore is shown superimposed with two hits: a Mephentermine pharmacophore (Figure 10C), an alpha-adrenergic receptor agonist, and a Bephenium pharmacophore (Figure 10D). The multidimensional toxic pharmacophore database included about 2.2 million or more pharmacophores, which correspond to about 12,000 or more toxic compounds (Figure 10B). After the virtual screening procedure, several different toxic pharmacophores were found to fit with the query pharmacophore. The superimpositions of two toxic pharmacophores with the query pharmacophore are shown. The query pharmacophore is represented in dark gray. The hit pharmacophores are represented in light gray.

[00194] The first hit pharmacophore shown is the pharmacophore of Mephentermine (Figure 10C), an alpha-adrenergic receptor agonist. This compound has been described in the FDA Maximum Daily Dose. Mephentermine may produce arrhythmias, including extrasystoles, and hypertension. Arrhythmias are most likely to occur in patients with heart disease or those receiving other drugs which may increase cardiac irritability, such as cyclopropane or halogenated hydrocarbon general anesthetics.

[00195] The second pharmacophore is the pharmacophore of a Bephenium molecule (Figure 10D), described in the FDA Maximum Daily Dose. Salivation, vomiting and diarrhea have been noted in animal toxicological studies. Information is lacking on the safety of bephenium during pregnancy; therefore, the possible risk to the fetus should be weighed against expected therapeutic benefits if this agent is considered for use in pregnant women. Because of its bitter taste, bephenium may provoke nausea and vomiting, and occasionally may cause mild and temporary looseness of stool.

Example 2: Multidimensional Pharmacophore screening versus DSSTox database

[00196] This example illustrates a standard MultiDIP screening procedure where a single compound represented by multiple pharmacophores is screened against pharmacophores generated from a reference compound database.

Reference compounds database

[00197] In this example, the Distributed Structure-Searchable Toxicity (DSSTox) database was used as the reference pharmacophore database. See, for example, the original DSSTox publication: Richard AM and Williams CR (2002) Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network: A Proposal, Mutation Research: New Frontiers, 499:27-52, which is incorporated herein by reference. DSSTox is a chemoinformatics resource provided by the United States Environmental Protection Agency's (EPA) National Center for Computational Toxicology, including information about more than 7,000 chemicals. The database provides SDF (structure data file) molecular files, which provide a wide range of information about each compound, including both structural data and assay data. The structural data includes, for example, at least the Chemical Formula, CAS number, InChI identifier, and 2D or 3D atomic structure. The assay data on the compounds includes, for example, the type of toxicity study, type of toxicity measure, species used in the toxicity study, and assay data.

Pharmacophores database construction

[00198] All the meta information about the compounds was downloaded from the DSSTox internet server, available at http://www.epa.gov/ncct/dsstox/DataFiles.html. The SDF molecular files were processed and the data were collected into the MultiDIP data warehouse. The Chemaxon Marvin application was then used to calculate three-dimensional coordinates of molecules from the two-dimensional structures read from the SDF data files. Marvin was used for drawing, displaying and characterizing chemical structures, substructures and reactions, Marvin 5.2, 2009, ChemAxon (http://www.chemaxon.com). All missing Hydrogen atoms were added using the Marvin Hydrogenize function, and structures were cleaned using the fine option. The Marvin MajorMicrospecies calculator plugin was used to calculate the major protonation form of each molecule at pH 7.4. All clean and correctly protonated structures were stored in the MultiDIP Structural database as reference structures. In order to explore the four dimensional space during the screening procedure, the main isomers of each reference molecule were estimated using the Chemaxon Marvin Plugin. For one parent molecule, a maximum of 8 stereoisomers were calculated using the Stereo Isomer Plugin, including both tetrahedral and double bond stereoisomers. For each stereoisomer, a maximum of 8 mesomeric structures were calculated using the Chemaxon Resonance Plugin, selecting only the most relevant structures with the major contributors option. For each mesomer, a maximum of 8 tautomeric structures were calculated using the Chemaxon Tautomerization Plugin. Then, for each of these isomers, the 128 most stable conformers were calculated using the Chemaxon ConformerPlugin, where Dreiding force field energy was used to sort molecules. To prevent selection of nearly identical structures, the diversity limit option was set to 0.1. Pharmacophores of all reference molecules and isomers were then calculated using a batch procedure, and stored in the MultiDIP Structural Database. Identical pharmacophores were merged into one entry in the database.
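The enumeration limits stated above imply an upper bound on the number of structures generated per parent molecule. The short calculation below is only a worked illustration of that bound, assuming the four limits multiply independently; the function name is hypothetical, and real molecules will usually yield far fewer structures.

```python
def max_structures_per_parent(stereoisomers=8, mesomers=8, tautomers=8, conformers=128):
    """Upper bound on isomers and conformers per parent molecule,
    using the per-level maxima quoted in the text."""
    isomers = stereoisomers * mesomers * tautomers
    return isomers, isomers * conformers


isomers, structures = max_structures_per_parent()
print(isomers, structures)   # 512 isomers, 65536 conformers at most
```

Because identical pharmacophores are merged, the number of distinct pharmacophores stored per compound can be much smaller than this bound.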

Source Molecule and Virtual Screening

[00199] The source molecule selected to be screened against the DSSTox compounds was Mefenorex (CASRN: 17243-57-1). Mefenorex is not present in the DSSTox database.

[00200] The three-dimensional structure data file of Mefenorex was downloaded from the PubChem database (PubChem CID=21777). http://pubchem.ncbi.nlm.nih.gov. Geer LY, Marchler-Bauer A, Geer RC, Han L, He J, He S, Liu C, Shi W, Bryant SH. The NCBI BioSystems database. Nucleic Acids Res. 2010 Jan; 38(Database issue):D492-6. (Epub 2009 Oct 23). Three-dimensional coordinates were optimized using the Chemaxon Marvin clean 3D tool, and the major protonation state at pH 7.4 was calculated using the MajorMicrospecies Plugin. In order to calculate the multidimensional pharmacophore of this compound, a set of 6 isomers was calculated with the Chemaxon Calculator Plugin cited above. For each isomer of the source molecule, the corresponding pharmacophore was calculated. All 6 three-dimensional pharmacophores were used for virtual screening and were used to define the multidimensional pharmacophore of Mefenorex.

[00201] Pharmacophore virtual screening (in silico) was done using the Mefenorex multidimensional pharmacophore as the source pharmacophore. For each three-dimensional pharmacophore of the source, the whole target database was screened for fit. For each three-dimensional pharmacophore of the target database that fit with a three-dimensional pharmacophore of the source, a superimposition score (described in Example 4) was calculated, ranging between 0 (no superimposition) and 1 (perfect match for two identical pharmacophores).
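The screening loop described in this paragraph can be sketched as a nested iteration over source and target pharmacophores. Everything below is a hypothetical illustration: the data layout, the `screen` function and especially the placeholder `toy_score` (a simple overlap of feature types) are assumptions, and the placeholder is not the superimposition score of Example 4.

```python
def toy_score(source, target):
    """Placeholder score in [0, 1]: overlap of feature-type sets (illustrative only)."""
    union = source | target
    return len(source & target) / len(union) if union else 0.0


def screen(source_set, target_db, score_fn, threshold):
    """Score every source pharmacophore against every target pharmacophore
    and keep the pairs whose score reaches the threshold, best first."""
    hits = []
    for s_idx, source in enumerate(source_set):
        for name, pharmacophores in target_db.items():
            for t_idx, target in enumerate(pharmacophores):
                score = score_fn(source, target)
                if score >= threshold:
                    hits.append((name, s_idx, t_idx, score))
    hits.sort(key=lambda hit: hit[3], reverse=True)
    return hits


# Hypothetical data: each pharmacophore is reduced to a set of feature types
source_set = [frozenset({"aromatic", "hydrophobic", "polar"})]
target_db = {
    "compound_A": [frozenset({"aromatic", "hydrophobic"})],
    "compound_B": [frozenset({"halogen_bond"})],
}
hits = screen(source_set, target_db, toy_score, threshold=0.4)
print(hits)
```

Keeping the scoring function as a parameter makes it straightforward to substitute a geometric alignment score for the placeholder.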

Results & Analysis

After the virtual screening process, a total of 34 relevant hits showing a pharmacophore superimposition score greater than 0.4 were found (Table 3). Because of the multidimensional approach (a set of input compound isomer pharmacophores is screened versus the DSSTox database), multiple hits are possible for one target compound in the database, each one corresponding to the fit of one source molecule with the best target compound. In this example, the 34 hits represent a total of 27 identified targets in the DSSTox database, with a mean of 1.3 hits per target. For each result, the following are reported: the pharmacophore superimposition score, the number of fitted pharmacophore centers, the root mean square deviation (RMSD) of the pharmacophore pair according to the pharmacophore centers' positions (Rmsd C), and the RMSD of the pharmacophore pair according to the pharmacophore centers' positions and the pharmacophore centers' vectors, that is, Hydrogen-bond and Pi orientation (Rmsd V). The raw result set with these data is presented in Table 3.
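The figure of 1.3 hits per target follows from 34 hits over 27 targets (34/27 ≈ 1.26). A minimal sketch of how a raw hit list could be grouped by target compound is given below; the function name and the sample hits are hypothetical and do not reproduce the actual result set.

```python
from collections import defaultdict


def summarize_hits(hits):
    """Group (target, score) hits by target; return best score per target
    and the mean number of hits per target."""
    counts = defaultdict(int)
    best = {}
    for target, score in hits:
        counts[target] += 1
        if target not in best or score > best[target]:
            best[target] = score
    mean_hits = len(hits) / len(counts) if counts else 0.0
    return best, mean_hits


# Hypothetical hit list, for illustration only
best, mean_hits = summarize_hits([("A", 0.71), ("A", 0.45), ("B", 0.58)])
print(best, mean_hits)

# Arithmetic behind the reported mean
print(round(34 / 27, 1))
```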

[00202] For each target molecule identified in the raw result set, additional superimposition data are shown in a full fitting result sheet. The full result sheet discloses more information about a precise hit, such as the exact description of the target pharmacophore centers, including pharmacophore center types and vectors, additional data about the pharmacophore fit superimposition with a detailed superimposition score for each pair of pharmacophore centers fitted together, and a three-dimensional visualization of the superimposition of the two pharmacophores. Three-dimensional representations are shown for the source pharmacophore (Figure 24) and for the target pharmacophores relative to the Mephentermine hit (Score = 0.7147) (Figure 26) and relative to the Amphetamine hit (Score = 0.5829) (Figure 28). The figures were generated using a Jmol applet. Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org. In one source pharmacophore representation shown in Figure 24, the dark grey spheres labeled B1 and B2 represent Hydrogen-bond pharmacophore centers, either Hydrogen donors or Hydrogen acceptors. The vectors centered on the B2 sphere represent Hydrogen-bond acceptor vectors and the vectors centered on the B1 sphere represent Hydrogen-bond donors. The light grey ellipsoid represents hydrophobic volumes and the optional associated vectors, which represent aromatic pi-orbitals of aromatic pharmacophore centers. Figure 25 shows a target pharmacophore representation, which corresponds to the pharmacophore of one isomer of Mephentermine having three pharmacophore centers, and Figure 27 shows a target pharmacophore representation, which corresponds to the pharmacophore of one isomer of Amphetamine. Information about the isomer generation and stability (evaluated by Dreiding energy using the Chemaxon calculation Plugin) that leads to this pharmacophore is present in the hit full report. DREIDING: A Generic Force Field for Molecular Simulations, S.L. Mayo, J. Phys. Chem. 1990, 94, 8897-8909.

[00203] Figures 26 and 28 show the source pharmacophore for Mefenorex superimposed with the target pharmacophores of Mephentermine and Amphetamine, respectively. The spatial superimposition of the two fitted pharmacophores reveals multiple matches between source and target pharmacophore centers. In the case of Mephentermine, a perfect position-related match was found between Hydrogen-bond acceptors (source center B2 and hit center B1'), and a perfect match was found between two aromatic and hydrophobic centers (Y3H4-Y2'H3').

Specificity of Mephentermine (CASRN: 100-92-5):

[00204] Mephentermine is an alpha adrenergic receptor agonist, but also acts indirectly by releasing endogenous norepinephrine. Cardiac output and systolic and diastolic pressures are usually increased. A change in heart rate is variable, depending on the degree of vagal tone. http://www.drugbank.ca/drugs/DB01365.

[00205] Mephentermine may produce arrhythmias, including extrasystoles, and AV block and hypertension. McEvoy G.K. (ed.). American Hospital Formulary Service-Drug Information 96. Bethesda, MD: American Society of Health-System Pharmacists, Inc. 1996 (Plus Supplements)., p. 888. Adverse reactions to mephentermine may be especially likely to occur in patients with cardiovascular disease, hypertension, hyperthyroidism, or chronic illness; the drug must be administered with caution to such patients. McEvoy G.K. (ed.). American Hospital Formulary Service-Drug Information 96. Bethesda, MD: American Society of Health-System Pharmacists, Inc. 1996 (Plus Supplements)., p. 889.

[00206] In the case of Amphetamine, an almost perfect position-related match was found between Hydrogen-bond acceptors (source center B2 and hit center B1'), and a perfect match was found between two aromatic and hydrophobic centers (Y3H4-Y2'H3').

Specificity of Amphetamine (CASRN: 300-62-9):

[00207] Amphetamine is used for the following indications: Psychostimulant: Accepted indications: Narcolepsy; Hyperkinetic states in children (as an adjunct to psychological, educational and social measures). Previous indications include (not currently recommended): Appetite suppressant, relief of fatigue. Main risks and target organs: Acute central nervous system stimulation, cardiotoxicity causing tachycardia, arrhythmias, hypertension and cardiovascular collapse. Summary of clinical effects: Cardiovascular: Palpitation, chest pain, tachycardia, arrhythmias and hypertension are common. International Programme on Chemical Safety; Poisons Information Monograph: Amphetamine (PIM 934) (1998). Available from, as of May 16, 2008: http://www.inchem.org/pages/pims.html.

Conclusions

[00208] Mefenorex was selected as a source molecule because it was previously withdrawn from the market due to toxicity discovered in the marketed drug, specifically an increased risk of arterial pulmonary hypertension. By Multidimensional Pharmacophore screening against the DSSTox database, the same toxicity was predicted in the first results of the screening method. Superimposition of the pharmacophores of Mefenorex with the database pharmacophores resulted in matches with other drugs known to pose a risk of hypertension, for example Mephentermine, as well as Amphetamine, a known metabolite of Mefenorex.

Table 3: MultiDIP Raw Results Set for Mefenorex

| Target Hit | Fit Pharmacophore Centers | Rmsd C | Rmsd V | Score |
|---|---|---|---|---|
| (1-methyl-2-phenylethyl)hydrazine hydrochloride | 3 | 0.0348 | 0.3251 | 0.4959 |
| Scopolamine | 3 | 0.0000 | 0.6946 | 0.4939 |
| Atropine N-oxide | 3 | 0.0000 | 0.5114 | 0.4908 |
| N-(3-Methoxypropyl)-3,4,5-trimethoxybenzylamine | 3 | 0.1727 | 0.9485 | 0.4858 |
| 3,4,4'-Trichlorocarbanilide | 3 | 0.0705 | 0.7689 | 0.4823 |
| Scopolamine hydrobromide trihydrate | 3 | 0.0501 | 0.8585 | 0.4771 |
| Dobutamine | 4 | 0.1596 | 0.3293 | 0.4747 |
| Nylidrin | 4 | 0.8062 | 0.9315 | 0.4382 |
| Isopropamide | 4 | 0.0000 | 0.4959 | 0.4376 |
| Bephenium hydroxynaphthoate | 3 | 0.0000 | 0.6286 | 0.4288 |
| Bromperidol | 4 | 0.1662 | 0.8748 | 0.4286 |
| Bendroflumethiazide | 4 | 0.2378 | 0.7074 | 0.4121 |
| Difenoxin | 4 | 0.1472 | 0.4941 | 0.4055 |
| HTI-286 | 4 | 0.2051 | 0.7741 | 0.403 |
| SR 59230A hydrochloride | 4 | 0.0000 | 0.3487 | 0.4027 |

Example 3: Multidimensional Pharmacophore screening versus DrugBank database

[00209] This example illustrates a standard MultiDIP screening procedure where a single compound pharmacophore is screened against pharmacophores of a reference compound database.

Reference compounds database

[00210] In this example, the DrugBank database was used as the reference pharmacophore database. DrugBank is a chemoinformatics resource provided by the University of Alberta, Canada, including information about more than 4800 drugs (FDA-approved small molecule drugs, FDA-approved biotech (protein/peptide) drugs, nutraceuticals or experimental drugs). DrugBank: a knowledgebase for drugs, drug actions and drug targets. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M. Nucleic Acids Res. 2008 Jan;36(Database issue). DrugBank: a comprehensive resource for in silico drug discovery and exploration. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. Nucleic Acids Res. 2006 Jan 1;34(Database issue). Each entry is represented as a DrugCard field, which provides a wide range of information about the compound: structural data (Chemical Formula, CAS number, InChI identifier, Melting Point, LogP, Solubility ...), links to the compound's entries in other databases (KEGG, PubChem, ChEBI, PDB, Swiss-Prot, GenBank), pharmacology data, ADMET data, and information about drug metabolizing enzymes and drug target receptors.

Pharmacophores database construction

[00211] All the meta information about the compounds was downloaded from the drugbank.ca internet server, http://www.drugbank.ca/downloads, Flat Files - Drug Flat Files - Data Set - All Drugs. The file containing all DrugCards entries was processed and the data were collected into the MultiDIP data warehouse. The complete structure data file of all molecules included in the DrugBank Database was downloaded from the drugbank.ca internet server, http://www.drugbank.ca/downloads, Structures - Drug Structures - All Structures. The Chemaxon Marvin application was then used to calculate three-dimensional coordinates of molecules from the two-dimensional structures read from the DrugBank structure data file. Marvin was used for drawing, displaying and characterizing chemical structures, substructures and reactions, Marvin 5.2, 2009, ChemAxon (http://www.chemaxon.com). All missing Hydrogen atoms were added using the Marvin Hydrogenize function, and structures were cleaned using the fine option. The Marvin MajorMicrospecies calculator plugin was used to calculate the major protonation form of each molecule at pH 7.4. All clean and correctly protonated structures were stored in the MultiDIP Structural database as reference structures. In order to explore the four dimensional space during the screening procedure (three dimensions of the geometric space and the isomers/tautomers considered to be one dimension), the main isomers of each reference molecule were estimated using the Chemaxon Marvin Plugin. For one parent molecule, a maximum of 8 stereoisomers were calculated using the Stereo Isomer Plugin, including both tetrahedral and double bond stereoisomers. For each stereoisomer, a maximum of 8 mesomeric structures were calculated using the Chemaxon Resonance Plugin, selecting only the most relevant structures with the major contributors option. For each mesomer, a maximum of 8 tautomeric structures were calculated using the Chemaxon Tautomerization Plugin. Then, for each of these isomers, the 128 most stable conformers were calculated using the Chemaxon ConformerPlugin, where Dreiding force field energy was used to sort molecules. To prevent selection of nearly identical structures, the diversity limit option was set to 0.1.

[00212] Pharmacophores of all reference molecules and isomers were then calculated using a batch procedure, and stored in the MultiDIP Structural Database. Identical pharmacophores were merged to one entry in database.

Source Molecule and Virtual Screening

[00213] The source molecule selected to be screened against the DrugBank compounds was beta-funaltrexamine (beta-FNA), an irreversible antagonist of the mu opioid receptor, which is not present in the DrugBank database. Opioid receptor binding characteristics of the non-equilibrium mu antagonist, beta-funaltrexamine (beta-FNA). Ward SJ, Fries DS, Larson DL, Portoghese PS, Takemori AE. Eur J Pharmacol. 1985 Jan 8;107(3):323-30. The three-dimensional structure data file of beta-FNA was downloaded from the PubChem database (PubChem CID=531101). http://pubchem.ncbi.nlm.nih.gov, Geer LY, Marchler-Bauer A, Geer RC, Han L, He J, He S, Liu C, Shi W, Bryant SH. The NCBI BioSystems database. Nucleic Acids Res. 2010 Jan; 38(Database issue):D492-6. (Epub 2009 Oct 23). Three-dimensional coordinates were optimized using the Chemaxon Marvin clean 3D tool, and the major protonation state at pH 7.4 was calculated using the MajorMicrospecies Plugin. In order to calculate the multidimensional pharmacophore of this compound, a set of 32 isomers was calculated with the Chemaxon Calculator Plugin cited above. For each isomer of the source molecule, the corresponding pharmacophore was calculated. After deleting identical pharmacophores, a set of 23 different pharmacophores was used for virtual screening. All 23 three-dimensional pharmacophores were used to define the multidimensional pharmacophore of the beta-funaltrexamine molecule.
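The step that reduces the 32 isomer pharmacophores to 23 distinct ones amounts to removing duplicates. A minimal sketch of one way to do this is shown below. The representation (a list of feature type and coordinate pairs), the rounding tolerance and the function names are assumptions made for illustration, and the sketch presumes that all pharmacophores are expressed in a common reference frame; the disclosure does not specify how identity is tested.

```python
def canonical_key(pharmacophore, decimals=3):
    """Order-independent key: sorted (type, x, y, z) tuples with rounded coordinates."""
    return tuple(sorted(
        (ftype, round(x, decimals), round(y, decimals), round(z, decimals))
        for ftype, (x, y, z) in pharmacophore
    ))


def merge_identical(pharmacophores, decimals=3):
    """Keep the first pharmacophore seen for each canonical key."""
    unique = {}
    for p in pharmacophores:
        unique.setdefault(canonical_key(p, decimals), p)
    return list(unique.values())


# Hypothetical pharmacophores: the first two differ only in point order
p1 = [("aromatic", (0.0, 0.0, 0.0)), ("polar", (1.5, 0.0, 0.0))]
p2 = [("polar", (1.5, 0.0, 0.0)), ("aromatic", (0.0, 0.0, 0.0))]
p3 = [("aromatic", (0.0, 0.0, 0.0)), ("polar", (2.5, 0.0, 0.0))]
print(len(merge_identical([p1, p2, p3])))
```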

[00214] Pharmacophore virtual screening was done using the beta-FNA multidimensional pharmacophore as the source pharmacophore. For each three-dimensional pharmacophore of the source, the whole target database was processed. If any three-dimensional pharmacophore of the target database was found to fit with a three-dimensional pharmacophore of the source, a superimposition score (described in Example 4) was calculated, ranging between 0 (no superimposition) and 1 (perfect match for two identical pharmacophores).

Results & Analysis

[00215] After the virtual in silico screening process, a total of 73 relevant hits showing a pharmacophore superimposition score greater than 0.25 were found (Table 4). Because of the multidimensional approach (a set of input compound isomer pharmacophores is screened versus the DrugBank database), multiple hits are possible for one target compound in the database, each one corresponding to the fit of one source molecule with the best target compound. In this example, the 73 hits represent a total of 39 identified targets in the DrugBank database, with a mean of 1.9 hits per target. For each result, the following are reported: the pharmacophore superimposition score, the number of fitted pharmacophore centers, the root mean square deviation (RMSD) of the pharmacophore pair according to the pharmacophore centers' positions (Rmsd C), and the RMSD of the pharmacophore pair according to the pharmacophore centers' positions and the pharmacophore centers' vectors, that is, Hydrogen-bond and Pi orientation (Rmsd V). The raw result set with these data is presented in Table 4.

[00216] For each target molecule identified in the raw result set, additional superimposition data were generated in a full fitting result sheet (not shown). A full result sheet allows transmittal of much more information about a precise hit, such as the exact description of the target pharmacophore centers, including the pharmacophore centers' position, type and vectors, additional data about the pharmacophore fit superimposition with a detailed superimposition score for each pair of pharmacophore centers fitted together, and a three-dimensional visualization of the two-pharmacophore superimposition.
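The positional RMSD reported for each hit (presumably the Rmsd C column) is a root mean square deviation over paired pharmacophore centers. A minimal sketch of that quantity follows, assuming the centers have already been paired and superimposed in a common frame; the function name and the sample coordinates are illustrative and not taken from the disclosure.

```python
import math


def rmsd(points_a, points_b):
    """Root mean square deviation between two equally long lists of paired 3D points."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(points_a, points_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(points_a))


# Hypothetical paired centers (coordinates in Angstroms)
source_centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target_centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2)]
print(round(rmsd(source_centers, target_centers), 4))
```

An RMSD of 0.0000, as listed for several hits, corresponds to paired centers that coincide exactly after superimposition.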

[00217] In conclusion, beta-funaltrexamine, which is an irreversible antagonist of the mu opioid receptor, was used as the source molecule, and its pharmacophores were screened against a database of pharmacophores of DrugBank compounds. The hits with the highest fitting scores are presented in Table 4. The best fit corresponds to the Oxymorphone molecule, a powerful semi-synthetic opioid analgesic, and is clearly a good fit, with 6 pharmacophore centers and a score of 0.4187. Based on these pharmacophoric similarity results, the method predicted the common physiological activity of both molecules.

Table 4: MultiDIP Raw Results Set for beta-Funaltrexamine

| Target Hit | Fit Pharmacophore Centers | Rmsd C | Rmsd V | Score |
|---|---|---|---|---|
| Oxymorphone | 6 | 0.0545 | 0.2829 | 0.4187 |
| Dihydrocodeine | 8 | 0.0103 | 0.4688 | 0.4109 |
| Dihydromorphine | 8 | 0.0553 | 0.1788 | 0.4009 |
| Apomorphine | 8 | 0.1488 | 0.3466 | 0.3955 |
| Codeine | 8 | 0.0000 | 0.2064 | 0.3921 |
| Ethylmorphine | 8 | 0.0680 | 0.3939 | 0.3873 |
| Morphine | 7 | 0.0057 | 0.0499 | 0.3834 |
| Hydromorphinol | 8 | 0.0324 | 0.1805 | 0.3677 |
| Pentazocine | 9 | 0.0057 | 0.4008 | 0.3467 |
| Desomorphine | 7 | 0.0121 | 0.0912 | 0.3436 |
| Benzylmorphine | 8 | 0.0000 | 0.4041 | 0.3432 |
| Dextromethorphan | 7 | 0.0194 | 0.5868 | 0.3360 |
| Dihydroetorphine | 8 | 0.0058 | 0.0874 | 0.3321 |
| Codeine-N-oxide | 9 | 0.0200 | 0.4934 | 0.3208 |
| Hydromorphone | 5 | 0.0801 | 0.3772 | 0.3181 |
| Heroin | 7 | 0.2166 | 0.8770 | 0.3168 |
| Bupropion | 4 | 0.0985 | 0.9390 | 0.3163 |
| 2-Chlorophenol | 4 | 0.0624 | 0.4596 | 0.3112 |
| Etorphine | 4 | 0.0106 | 0.3978 | 0.3081 |
| Proparacaine | 4 | 0.0702 | 0.2128 | 0.3000 |
| Buprenorphine | 10 | 0.0186 | 0.1494 | 0.2988 |
| Galantamine | 5 | 0.2434 | 0.4873 | 0.2968 |
| Diprenorphine | 4 | 0.0452 | 0.6804 | 0.2934 |
| Levorphanol | 7 | 0.1073 | 0.1662 | 0.2927 |
| Drotebanol | 7 | 0.2462 | 0.6293 | 0.2926 |
| Nalbuphine | 8 | 0.0386 | 0.1799 | 0.2906 |
| Capsaicin | 6 | 0.0458 | 0.9922 | 0.2829 |
| 2-Allylphenol | 4 | 0.0162 | 0.4039 | 0.2820 |
| Cyclobenzaprine | 8 | 0.0000 | 0.2490 | 0.2671 |
| Cyprenorphine | 5 | 0.0593 | 0.3044 | 0.2643 |
| Reboxetine | 4 | 0.0000 | 0.3204 | 0.2642 |
| Venlafaxine | 5 | 0.1061 | 0.4234 | 0.2631 |
| Gallamine Triethiodide | 4 | 0.1788 | 0.5780 | 0.2622 |
| Oxycodone | 6 | 0.0377 | 0.2446 | 0.2614 |
| Albuterol | 4 | 0.1568 | 0.5933 | 0.2566 |
| Clomipramine | 5 | 0.0443 | 0.7122 | 0.2559 |
| N-Allyl-Aniline | 3 | 0.0000 | 0.5447 | 0.2517 |
| Phenindamine | 6 | 0.0000 | 0.9277 | 0.2513 |
| Meperidine | 5 | 0.1029 | 0.7482 | 0.2504 |

Example 4. Method to Calculate the Pharmacophore Superimposition Score.

[00218] If any three-dimensional pharmacophore of the target database is found to fit with a three-dimensional pharmacophore of the source, a superimposition score is calculated, ranging between 0 (no superimposition) and 1 (perfect match for two identical pharmacophores).

[00219] The superimposition score of two pharmacophores is composed of two components: the superimposition score of all paired pharmacophoric points, described by the function $\varphi$, and the unpaired pharmacophoric point components, described by the function $\psi$. Equation 1 describes the whole superimposition scoring function in terms of the $\varphi$ and $\psi$ sub-functions, where N is the total number of paired pharmacophoric points in the superimposition of the two pharmacophores, S is the total number of unpaired points, $\varphi_1 \ldots \varphi_n \ldots \varphi_N$ are the paired pharmacophoric point components and $\psi_1 \ldots \psi_s \ldots \psi_S$ are the unpaired penalties.

(Equation 1)

[00220] For each couple of fitted points, the superimposition sub-function $\varphi_{i,j}$ relative to the pharmacophoric point i in the first pharmacophore and the pharmacophoric point j in the second pharmacophore is defined according to Equation 2, where $r_{i,j}$ is the distance in Angstroms separating the origins of the two pharmacophoric points and $M_{i,j}$ is the specific superimposition score depending on the nature of both pharmacophoric features, also called the M-score.

$$\varphi_{i,j} = \frac{C_{i,j} \, M_{i,j}}{1 + r_{i,j}}$$ (Equation 2)

The M-score is first weighted by the constant $C_{i,j}$, which depends on the nature of both paired pharmacophoric points, as described in the Supplementary Table. This weight constant thus penalizes points which are not compatible, such as a polar pharmacophore center fitted with a hydrophobic pharmacophore center. In such cases, the superimposition score is set to zero. The M-score is then calculated specifically according to the properties of the fitted pharmacophoric points. For example, if both points are Hydrogen bond donors or acceptors, or aromatic features, the M-score considers vector alignment and vector compatibility as described in Equation 3, where V is the total number of vectors (paired and unpaired) and $\theta_{V_{k,i} V_{l,j}}$ is the angle in radians between aligned vectors $V_{k,i}$ and $V_{l,j}$ (where $V_{k,i}$ is a vector from the first pharmacophoric point and $V_{l,j}$ a vector from the second pharmacophoric point). In the case where two vectors are fitted together but are not compatible (e.g. a hydrogen bond donor vector fitted with a hydrogen bond acceptor), the value of $\theta_{V_{k,i} V_{l,j}}$ is automatically set to π, as a penalty.

$M_{i,j} = \ldots$ (Equation 3)
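The paired-point term can be illustrated with a short sketch. It assumes that Equation 2 has the form φ = C·M/(1 + r), with C the weight constant from the Supplementary Table, M the M-score and r the distance in Angstroms; the M-score itself is taken as an input here. The function name, the dictionary layout and the example values are hypothetical; only the Aromatic row of the Supplementary Table is used.

```python
# Weight constants for an Aromatic center paired with each center type
# (Aromatic row of the Supplementary Table).
AROMATIC_WEIGHTS = {
    "aromatic": 1.0,
    "hydrogen_bond": 0.0,
    "hydrophobic": 0.0,
    "polar": 0.0,
    "radical": 0.5,
    "halogen_bond": 0.0,
}


def paired_point_score(weight, m_score, distance):
    """Assumed form of Equation 2: weight * m_score / (1 + distance)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return weight * m_score / (1.0 + distance)


# Perfectly superimposed aromatic pair with a perfect M-score (hypothetical values)
print(paired_point_score(AROMATIC_WEIGHTS["aromatic"], 1.0, 0.0))
# Aromatic center paired with a radical center, 1 Angstrom apart
print(paired_point_score(AROMATIC_WEIGHTS["radical"], 0.8, 1.0))
# Incompatible pair: the weight of zero cancels the contribution
print(paired_point_score(AROMATIC_WEIGHTS["polar"], 1.0, 0.0))
```

Under this assumed form, the contribution decreases as the paired centers move apart and vanishes whenever the weight constant is zero.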

[00221] When both aligned pharmacophoric points are hydrophobic features described by two ellipsoids, the M-score is calculated considering the overlapping volume between the two ellipsoids as described in Equation 4, where $V_i$ is the volume of the first ellipsoid, $V_j$ is the volume of the second ellipsoid and $V_{i \cap j}$ is the volume of the intersection of both ellipsoids.

$$M_{i,j} = \frac{V_{i \cap j}^{2}}{V_i \, V_j}$$ (Equation 4)

In the case where the fitted pharmacophore points are charged pharmacophore centers, the M-score depends on the charge values associated with both pharmacophoric points and on the distance r separating the origins of both points, as described in Equation 5.

(Equation 5)

[00222] In the global superimposition score described in Equation 1, the sub-function $\psi_x$ relates to the unpaired pharmacophoric points, which can be either additional pharmacophore features or unmatched features. For each additional point from a pharmacophore, $\psi_x$ describes the penalty associated with this feature according to the distance between the considered point and all pharmacophoric points of the other pharmacophore. The function is described in Equation 6, where r is the distance in Angstroms between the origin of the considered pharmacophore center and each pharmacophore center of the other pharmacophore, and N is the total number of pharmacophoric points of the other pharmacophore.

$\psi_x = \ldots$ (Equation 6)

Columns (left to right): Aromatic, Hydrogen Bond, Hydrophobic, Polar, Radical, Halogen Bond

Aromatic: 1, 0, 0, 0, 0.5, 0

Hydrogen Bond: 0.5, 0.5, 0.5

Hydrophobic:

Polar: 0, 0.5, 0.5, 0.5

Radical: 0.5, 0.5, 0.5, 1, 0.5

Halogen Bond: 0.5, 0.5, 0.5

Supplementary Table. Weight constant applied to the M-score according to the nature of both pharmacophore centers.

In some possible embodiments, typical cut-offs are in the range from about 0.5 to about 1. In other possible embodiments, cut-offs are in the range of about 0.77 to about 1. In yet other embodiments, cut-offs are in the range of about 0.9 to about 1.

Example 5. Evaluation of MultiDIP Toxicity Screening in silico predictivity compared to two known toxicity programs DEREK and MCASE/MC4PC.

[00223] A benchmark compound data set was used previously by Snyder to evaluate the toxicity of marketed drugs and to assess the computational predictivity of genotoxicity with two major toxicity prediction programs. See Snyder, An update on the genotoxicity and carcinogenicity of marketed pharmaceuticals with reference to in silico predictivity, Environ. Mol. Mutagen. (2009) vol. 50 (6) pp. 435-50, which is incorporated herein by reference. The data set contained genotoxicity and carcinogenicity data on 545 marketed drugs, with Ames mutagenicity, in vivo and in vitro cytogenetics, mouse lymphoma assay, rat/mouse carcinogenicity and in silico results from the DEREK and MCASE software products. Snyder used data from the Physicians' Desk Reference, Gold's carcinogenicity potency database and other literature sources. Snyder reported that the percentage of positive mutagenic or genotoxic drugs per assay type was 7.1% for bacterial mutagenesis (Ames test), 26.1% for in vitro chromosome aberrations, 19.1% for mouse lymphoma assays and 11.1% for in vivo cytogenetics. Rodent carcinogenicity (rat and mouse) was also assessed in that publication, with 47.1% of drugs positive (positive carcinogenicity required only a single positive species; this percentage includes drugs with an equivocal result in one species and a positive result in the other). In silico results were collected by Snyder using the two most frequently used prediction programs in toxicology: DEREK (versions 5, 9 or most recently Derek for Windows version 10.0; product of Lhasa Ltd, Leeds, England) and MCASE (from MultiCASE Inc., Beachwood, USA; version 3.46 or MC4PC Version 2.0).

[00224] A dataset of source molecules was compiled using the same compounds evaluated by Snyder. The compounds were downloaded from PubChem or other public databases in 2D or 3D .sdf file format, using their CAS numbers listed in the same publication. Snyder's published results were used to compare the performance of the presently disclosed prediction model against two currently marketed systems, DEREK and MCASE, using the same test set and the same statistical assessment.

[00225] All the source compounds of the toxicity test set were run through a batch screening process. First, the chemical structures of each of the compounds were transformed from 2D structures to 3D structures, including the generation of various conformers (rotation angle of the rotamers is fixed to x°). All the conformer structures were transformed into multidimensional pharmacophores and screened against a multipharmacophore toxicity database. The mean number of pharmacophores per candidate was 10, with a range of 1 to 128.

[00226] Results from the herein disclosed in silico MultiDIP Tox screening method were compared to actual data from known toxicology databases in five assays for various series of compounds. Figures 23A-23F show ROC analyses of the predictivity of the in silico MultiDIP Tox screening method relative to those data. In silico analyses were compared for a library of about 545 compounds from Snyder 2009. The predictivity of two commercially available methods, as reported in Snyder 2009, is also shown: DEREK (versions 5, 9 or most recently Derek for Windows version 10.0; product of Lhasa Ltd, Leeds, England) and/or MCASE (version 3.46 or MC4PC Version 2.0). As illustrated in Figures 23A-23E, the symbol "■" represents results from the commercial software DEREK, "♦" represents results from the commercial software MCASE, and "▲" represents results from the herein disclosed MultiDIP Tox screening method.

[00227] Ames test predictivity for 501 compounds with a sensitivity of 0.949, specificity of 0.885, and a concordance of 0.890 for the MultiDIP Tox screen method is shown in Figure 23A.

[00228] In vivo cytogenetics predictivity for 435 compounds with a sensitivity of 0.976, specificity of 0.883 and concordance of 0.892 for the MultiDIP Tox screen method is shown in Figure 23B.

[00229] In vitro cytogenetics predictivity for 367 compounds with a sensitivity of 0.865, specificity of 0.888 and concordance of 0.883 for the MultiDIP Tox screen method is shown in Figure 23C.

[00230] Mouse lymphoma assay predictivity for 157 compounds with a sensitivity of 0.813, specificity of 0.872 and concordance of 0.860 for the MultiDIP Tox screen method is shown in Figure 23D.

[00231] Rodent carcinogenicity predictivity for 248 compounds with a sensitivity of 0.904, specificity of 0.909 and concordance of 0.907 for the MultiDIP Tox screen method is shown in Figure 23F.

[00232] In each of the five assays, the predictivity of the MultiDIP Tox screening method, in terms of sensitivity, specificity and concordance, was improved compared to that of either of the two commercially available methods DEREK or MCASE/MC4PC.
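The three statistics reported for each assay follow from a confusion matrix of predicted versus experimentally observed outcomes. The sketch below shows the arithmetic. The counts passed to the function are hypothetical: they were chosen only so that the rounded values coincide with the Ames figures quoted above for 501 compounds, and they are not taken from the disclosure or from Snyder. The function name `screening_metrics` is likewise illustrative.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, concordance) from confusion counts.

    tp: toxic compounds predicted toxic      fn: toxic compounds missed
    tn: non-toxic compounds predicted clean  fp: non-toxic compounds flagged
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative compound")
    sensitivity = tp / positives
    specificity = tn / negatives
    concordance = (tp + tn) / (positives + negatives)
    return sensitivity, specificity, concordance


# Hypothetical counts for a 501-compound set (39 positives, 462 negatives).
# These are illustrative only, NOT data from the source.
sens, spec, conc = screening_metrics(tp=37, fn=2, tn=409, fp=53)
print(round(sens, 3), round(spec, 3), round(conc, 3))
```

Because concordance pools both classes, it is dominated by specificity when positives are rare, which is why the reported concordance values sit close to the specificity values.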

References Cited

1. Huynh, L. et al. Drug Discov. Today 14, 401-405 (2009).

2. Pauwels, M. et al. Toxicol. Appl. Pharmacol. 243, 260-274 (2010).

3. Cook, J.P. et al. Pharm. Econom. 26, 551-556 (2008).

4. Tralau-Stewart, C.J. et al. Drug Discov. Today 14, 95-101 (2009).

5. Jacob, A. et al. Drug Discov. Today 14, 406-412 (2009).

6. Burk, D. J. Comput.-Mediated Com. 12, 600-617 (2007).

7. Rontani, D. et al. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 80, 1-5 (2009).

8. Guerin, G.-A. et al. Drug Discov. Today 11, 991-998 (2006).

9. Snyder. An update on the genotoxicity and carcinogenicity of marketed pharmaceuticals with reference to in silico predictivity. Environ. Mol. Mutagen. (2009) vol. 50 (6) pp. 435-50.

Claims

The claims are:

1. An apparatus for encoding data representative of the chemical structure of a molecule, the apparatus comprising:

memory storing chemical structures representative of a plurality of compounds, at least one compound comprising a plurality of features;

an input device; and

a programmable circuit in electrical communication with the memory and the input device, the programmable circuit programmed to select a chemical structure upon receiving input from the input device, selectively identify a subset of the data, the subset of data comprising data representing a subset of features from the compound, selectively screen data not included in the subset of data, and store the subset of data in a data file in the memory.

2. An apparatus for encoding data representative of a pharmacophore from a molecular compound, the apparatus comprising:

memory storing chemical structures representative of a plurality of compounds, at least one compound comprising pharmacophore features and non-pharmacophore features;

an input device; and

a programmable circuit in electrical communication with the memory and the input device, the programmable circuit programmed to select a chemical structure, the selected chemical structure having pharmacophore features and non-pharmacophore features, identify a set of pharmacophore features, screen the features not in the identified set of pharmacophore features, and encode the data representative of the pharmacophore features.

3. A method of encoding data representative of the chemical structure of a molecule, the method comprising:

providing electronic data representing a chemical structure of a compound, the compound having a plurality of features;

identifying a subset of the data, the subset of data comprising data representing a subset of features from the compound;

screening data not included in the subset of data; and

storing the subset of data in a data file.

4. A method of encoding data representative of a pharmacophore from a molecular compound, the method comprising:

providing data representative of a chemical structure of a molecule, the chemical structure including pharmacophore features and non-pharmacophore features;

identifying a set of pharmacophore features;

screening the features not in the identified set of pharmacophore features; and

encoding the data representative of the pharmacophore features.

5. The method of claim 4 wherein the set of pharmacophore features includes all of the pharmacophore features in the molecule.

6. The method of claim 4 wherein screening the features not in the identified set of pharmacophore features comprises deleting the features not in the identified set of pharmacophore features.

7. The method of claim 4 wherein screening the features not in the identified set of pharmacophore features comprises masking the features not in the identified set of pharmacophore features.

8. The method of claim 4 wherein encoding the data representative of the pharmacophore features comprises adding decoy data to the subset of data.

9. The method of claim 8 wherein encoding the data representative of the pharmacophore features comprises encrypting the data representative of the pharmacophore features.

10. The method of claim 9 wherein encoding the data representative of the pharmacophore features comprises encrypting a data file containing the data representative of the pharmacophore features.

11. The method of claim 4 wherein providing data representative of a chemical structure of a molecule comprises modeling the chemical structure of the molecule using spherical coordinates.

12. The method of claim 4 further comprising storing the encoded data representative of the pharmacophore features in a data file, and transferring the data file to another party.

13. A computer-based method for automatically generating a decoyed pharmacophore molecule from a compound, the method comprising:

inputting the structure of a compound of interest;

modeling the compound to obtain a three-dimensional chemical structure of the compound in a specific conformation;

computing a pharmacophore molecule from the specific conformation, the pharmacophore molecule containing one or more real pharmacophore features defined in spherical coordinates;

erasing the structure of the compound;

generating one or more decoy pharmacophore features defined in spherical coordinates; and

adding the decoy pharmacophore features to the real pharmacophore features to create a decoyed pharmacophore molecule.

14. The method of claim 13 wherein computing and generating each further comprise:

converting the spherical coordinates to Cartesian coordinates.

15. The method of claim 13, wherein modeling comprises:

modeling the compound to obtain a low-energy three-dimensional chemical structure of the compound in a specific conformation.

16. The method of claim 13, wherein inputting comprises input of any format describing the structure of the compound selected from a two-dimensional format, three-dimensional format, SMILES, or PDB.

17. The method of claim 13, further comprising:

selecting a number, from zero to 256, of each of additional conformers, stereoisomers, tautomers, and mesomers of the compound of interest; and

computing a pharmacophore molecule for each additional conformer, stereoisomer, tautomer, or mesomer of the compound of interest, wherein each pharmacophore molecule contains one or more real pharmacophore features defined in spherical coordinates.

18. The method of claim 13, wherein generating further comprises:

randomly generating the type of decoy feature; and

modeling the real pharmacophore features to automatically select a location for the decoy pharmacophore feature in spherical coordinates.

19. The method of claim 13, wherein generating further comprises:

determining if the location of each decoy pharmacophore feature overlaps with any real pharmacophore feature; and

discarding any decoy features which overlap with any real pharmacophore feature.

20. The method of claim 13 further comprising:

generating a list of unique identification numbers for each of the decoy pharmacophore features; and

encrypting the list of unique identification numbers for the decoy features.

21. The method of claim 13, wherein modeling further comprises standardizing the three-dimensional chemical structure of the compound by:

adding explicit hydrogen atoms to the three-dimensional structure; and

refining the three-dimensional structure of the compound of interest to provide a refined specific conformation.

22. The method of claim 13, wherein the compound of interest is selected from a source molecule or a target molecule library.

23. The method of claim 13, further comprising:

removing one or more selected real pharmacophore features from the pharmacophore molecule.

24. The method of claim 13, further comprising:

describing each pharmacophore feature in a point data string, wherein each point data string comprises: (1) a unique point identification number, (2) a character identifying the type of data string, and (3) a set of spherical coordinates for the pharmacophore center of the pharmacophore feature.

25. The method of claim 24, wherein the character identifying the type of data string is selected from the group consisting of B, P, H, Y, F, and R.

26. The method of claim 24, wherein the method further comprises:

further describing each pharmacophore feature available for a potential non-covalent molecular bond by use of one or more vector data strings, wherein each vector data string comprises: (1) a unique vector identification number, (2) the unique point identification number, (3) a set of spherical coordinates for the base of the vector, and (4) a set of spherical coordinates for the end of the vector.

27. A method for allowing secure transfer of chemical structural data for a proprietary compound of interest from a user to a third-party for in silico analysis without disclosing the proprietary compound structure, comprising:

transmitting the decoyed pharmacophore molecule of claim 8 from the user to the third-party on any support selected from electronic, DVD, or CD;

supplying the third-party with a key for de-encrypting the list of unique identification numbers for the decoy features; and

removing the decoy features of the decoyed pharmacophore prior to pharmacophore analysis by the third-party.

28. A method for predicting toxicity for a source molecule, wherein the method comprises:

comparing: (a) one or more source pharmacophore molecule(s) derived from the source molecule, to (b) a library of multiple target pharmacophore molecules derived from known compounds with known toxicological data in one or more biological assays, and

wherein geometric similarity between the one or more source pharmacophore molecule(s) and a target pharmacophore molecule for a known compound is predictive of similarity in toxicological profile of the source molecule compared to the toxicological profile of the known compound.

29. The method according to claim 28, wherein the geometric similarity is established when the root-mean-square deviation (RMSD) between one or more pharmacophoric features from the source pharmacophore molecule superimposed on the target pharmacophore molecule is not greater than about 1.0 angstrom.

30. The method of claim 29, wherein the geometric similarity is established when the root-mean-square deviation (RMSD) between one or more pharmacophoric features from the pharmacophore molecule superimposed on the target pharmacophore molecule is not greater than about 0.5 angstroms.

31. The method of claim 30, wherein the geometric similarity between the source pharmacophore molecule and the target pharmacophore molecule is determined by calculating a superimposition score, the superimposition score calculated by a method comprising:

superimposing all paired pharmacophoric points; and

comparing unpaired pharmacophoric points;

wherein the similarity is determined by obtaining a superimposition score of greater than, or equal to, about 0.4.

32. The method of claim 31, wherein the geometric similarity between the source pharmacophore molecule and the target pharmacophore molecule is determined by obtaining a superimposition score of greater than, or equal to, about 0.5.

33. A method for predicting physiological activity for a source molecule, wherein the method comprises:

comparing: (a) one or more source pharmacophore molecule(s) derived from the source molecule, to (b) a library of multiple target pharmacophore molecules derived from known compounds with known data in one or more biological assays, and

wherein geometric similarity between the one or more source pharmacophore molecule(s) and a target pharmacophore molecule for a known compound is predictive of similarity in physiological activity of the source molecule compared to the physiological activity of the known compound.
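The decoy-generation steps recited in claims 13, 14 and 19 can be illustrated with a minimal sketch, offered as one possible reading and not as the claimed implementation. The function names, the dictionary layout, the `min_separation` overlap criterion, the uniform random placement and the angle convention are all assumptions. The type characters are those listed in claim 25, and their meanings are not assumed here. The `decoy` flag is kept in the record only for illustration; in the claimed method the list of decoy identification numbers is encrypted separately (claim 20).

```python
import math
import random

# Type characters listed in claim 25; their meanings are not assumed here.
FEATURE_TYPES = ["B", "P", "H", "Y", "F", "R"]


def spherical_to_cartesian(r, theta, phi):
    """Convert (r, polar angle from +z, azimuth) to (x, y, z).

    The physics angle convention is an assumption of this sketch.
    """
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )


def add_decoys(real_features, n_decoys, max_radius, min_separation, seed=0):
    """Return the real features plus randomly placed, non-overlapping decoys."""
    rng = random.Random(seed)
    real_xyz = [spherical_to_cartesian(*f["coords"]) for f in real_features]
    decoys = []
    attempts = 0
    while len(decoys) < n_decoys and attempts < 1000 * n_decoys:
        attempts += 1
        coords = (
            rng.uniform(0.0, max_radius),
            rng.uniform(0.0, math.pi),
            rng.uniform(0.0, 2.0 * math.pi),
        )
        xyz = spherical_to_cartesian(*coords)
        # Discard any candidate that overlaps a real feature (claim 19).
        if all(math.dist(xyz, other) >= min_separation for other in real_xyz):
            decoys.append({"type": rng.choice(FEATURE_TYPES),
                           "coords": coords, "decoy": True})
    return real_features + decoys


real = [
    {"type": "H", "coords": (1.0, 0.5, 0.3), "decoy": False},
    {"type": "P", "coords": (2.0, 1.2, 2.0), "decoy": False},
]
decoyed = add_decoys(real, n_decoys=5, max_radius=5.0, min_separation=1.0)
```

Rejecting overlapping candidates, rather than moving them, keeps every retained decoy at least `min_separation` away from each real feature without disturbing the real coordinates.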