US20030114990A1 - Prediction of molecular polar surface area and bioabsorption - Google Patents

Prediction of molecular polar surface area and bioabsorption

Info

Publication number
US20030114990A1
Authority
US
United States
Prior art keywords
bonded
molecules
psa
molecule
logp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/319,294
Inventor
William Egan
Giorgio Lauri
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dassault Systemes Biovia Corp
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US10/319,294
Assigned to PHARMACOPEIA, INC. Assignment of assignors interest (see document for details). Assignors: EGAN, WILLIAM J., LAURI, GIORGIO
Publication of US20030114990A1
Assigned to ACCELRYS SOFTWARE INC. Change of name (see document for details). Assignors: ACCELRYS SOFTWARE SOLUTIONS INC.
Assigned to ACCELRYS INC. Assignment of assignors interest (see document for details). Assignors: PHARMACOPEIA, INC.
Assigned to ACCELRYS SOFTWARE SOLUTIONS INC. Change of name (see document for details). Assignors: ACCELRYS INC.
Legal status: Abandoned

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803: General methods of protein analysis not limited to specific proteins or families of proteins
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30: Prediction of properties of chemical compounds, compositions or mixtures
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70: Machine learning, data mining or chemometrics
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S: TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00: Data processing: database and file management or data structures
    • Y10S707/99941: Database schema or data structure
    • Y10S707/99943: Generating database or data structure, e.g. via user interface

Definitions

  • BBB: blood-brain barrier
  • %HIA: percent human intestinal absorption
  • logBB: logarithm of the steady state ratio of the compound's concentration in the central nervous system and the blood
  • the invention is useful in the computer implemented drug candidate evaluation process.
  • drug candidates from one or more databases of compounds are preliminarily evaluated using software code running on general purpose computers for desirable characteristics or properties.
  • the general purpose computers used for this purpose can take a wide variety of forms, including network servers, workstations, personal computers, mainframe computers and the like.
  • the code which configures the computer to perform these compound evaluations is typically provided to the user on a computer readable medium such as a CD-ROM.
  • the code may also be downloaded by a user from a network server which is part of a local or wide area network such as the Internet.
  • the general purpose computer running the software will typically include one or more input devices such as a mouse and/or keyboard, a display, and computer readable memory media such as random access memory integrated circuits and a hard disk drive. It will be appreciated that one or more portions, or all of the code may be remote from the user and, for example, resident on a network resource such as a LAN server, Internet server, network storage device, etc.
  • the software receives as an input a variety of information concerning candidate drug compounds or compositions, and from this information, derives, estimates, or predicts the expected biological or chemical characteristics of the candidate drug compounds.
  • the databases may include 10,000, 100,000, or possibly more candidate compounds.
  • the information input to the software program comprises structural information about each of the candidate compounds. This information is usually limited to the atomic constituents and the bonds between them, essentially the information found in a 2-dimensional molecular bonding diagram. From this information, predictions of biological and/or chemical activity are advantageously made.
  • the molecular polar surface area is often used as part of this prediction.
  • the atomic constituents and interatomic bond configurations are used to generate energy minimized 3-dimensional molecular structures using well known molecular modeling tools. These calculations may be performed in vacuum or in solution.
  • the PSA is defined as the portion of van der Waals surface area of all oxygen atoms, nitrogen atoms, and their attached hydrogen atoms which does not lie within the van der Waals surface area of any other atom, where the van der Waals surface is defined by the space filling hard sphere having the van der Waals radius for the atom type.
  • the total amount of “exposed” oxygen, nitrogen, and associated hydrogen of the molecule defines the PSA.
  • PSA may be calculated by an atom and atom cluster classification method that speeds the calculation by a factor of 300 to 1000, making it possible to compute PSA for each molecule of a 100,000 molecule database in less than an hour. This process is illustrated in FIG. 1.
  • each atomic class may comprise a single atom or a cluster of atoms.
  • Each class is also typically defined by bond configurations within the cluster and/or to other atoms.
  • an atomic class may comprise “a double bonded oxygen atom,” “a single bonded NH2 group,” and the like.
  • the classes involve nitrogen atoms, oxygen atoms, and attached hydrogen atoms, as these molecular constituents are the ones that contribute to the molecule's PSA.
  • the number of atoms or atom clusters of a molecule being analyzed that fall within each defined class are separately counted, thus producing a list of integers corresponding to the number of atomic constituents within each defined class.
  • the separate counts are used to estimate the polar surface area of the molecule being evaluated.
  • the counts are used in a simple arithmetic calculation which produces an estimated PSA value. Therefore, rather than calculating PSA by computing the van der Waals surface areas of relevant atoms of an energy minimized three dimensional molecular structure, a simple computation is used which takes far less computation time.
  • $N_i$ is the number of atoms or atom clusters in the molecule falling within class $i$
  • $C_i$ is a coefficient associated with class $i$.
  • the coefficients represent the contribution of each atom type to the total PSA of a molecule.
  • the coefficients are all non-negative because the classes include all polar atoms and atom clusters that tend to add to molecular PSA.
  • Coefficients for the above formula may be computed by performing a non-negative least squares linear regression using a plurality of training molecules. This technique is well known in a few other applications, with the prediction of boiling point for organic molecules being one example.
  • the procedure for generating the coefficients is to select a set of training molecules and compute all of their PSA values the conventional way, by calculating energy minimized three dimensional structures and computing the van der Waals surface areas of nitrogens, oxygens, and their attached hydrogens. Then, counts of the atoms or atom clusters of each class are made for each molecule. The coefficients are then computed such that the sum of the squares of the PSA errors computed with the above formula is minimized across all of the molecules of the training set, with the constraint that each of the regression coefficients be non-negative (greater than or equal to zero).
  • FIG. 2 is a scatter plot of PSA estimated using formula (1) above and the coefficients of Table 1 vs. PSA calculated with three dimensional structures for the validation dataset. Agreement between arithmetically estimated and structurally calculated PSA is very good, with $R^2 > 0.991$ and a root-mean-square PSA error of only 5.9 square angstroms when using the fast calculation method of formula (1).
  • the speed of calculation is anywhere from twenty to over a hundred molecules per second when using arithmetic formula (1) and the atomic classifications and coefficients of Table 1. This is a dramatic improvement over the 10-15 seconds per molecule for conventional PSA calculation methods using energy minimized three dimensional structures.
  • the above described method of PSA calculation can be advantageously applied to the evaluation of candidate drugs in large compound databases for their propensity for intestinal absorption.
  • three dimensional structures of the compounds need not be generated to compute PSA, thereby speeding the process of evaluating candidate drugs dramatically.
  • one proposed method referred to above for selecting compounds likely to be well absorbed biologically is to compare the PSA to the threshold value of 140 square angstroms. According to this model, compounds above this threshold are likely to be poorly absorbed.
  • Using a linear equation for PSA calculation such as formula (1) above, a 100,000 compound library could be screened and all compounds likely to be poorly absorbed could be separated from the remainder in less than an hour of computation time with workstations currently in widespread use for these applications.
  • PSA and logP are relevant descriptors of intestinal absorption and logBB. This is because passive diffusion into the intestine or through the blood-brain barrier requires diffusion of the molecule through cellular membranes comprising a lipid bilayer with both hydrophilic and lipophilic regions. Accordingly, highly hydrophilic molecules and molecules which readily form hydrogen bonds do not easily enter the membrane. Furthermore, highly lipophilic molecules do not readily leave the membrane once reaching the lipophilic interior. PSA, being a measure of hydrophilicity, and logP, which includes both hydrophilic and lipophilic contributions, have thus been found to be parameters from which membrane permeability information may be derived.
  • the model assumes that a molecule's propensity for intestinal absorption via passive diffusion is a function of the PSA and logP of the molecule, especially the interactions between PSA and logP. Every molecule may be thus assigned a location in a two dimensional PSA-logP plane. Rather than quantitatively predicting intestinal absorption as a function of position in the PSA-logP plane, a bounded region of the PSA-logP plane is defined. If the PSA and logP of a molecule are within the bounded region, the molecule is considered likely to be readily intestinally absorbed.
  • If the PSA and logP of a molecule are outside the bounded region, the molecule is considered unlikely to be readily intestinally absorbed.
  • a statistical analysis of compounds which are known to be readily absorbed was performed.
  • the PSA and logP for 182 compounds known to be readily absorbed and not actively transported across cellular membranes were calculated.
  • the distribution of (PSA, logP) coordinates for these molecules was analyzed statistically assuming a multivariate normal distribution.
  • FIG. 3 illustrates the distribution in the PSA-logP plane of known readily absorbed compounds which are illustrated with “+” symbols in this Figure.
  • the mean PSA of these molecules is 64.5867, and the mean logP is 2.3226.
  • the model predicts that new molecules being analyzed which have a PSA and logP “close” to these mean values will also be readily absorbed, and that new molecules being analyzed which have a PSA and logP “far” from these mean values will not be readily absorbed.
  • the model utilizes Hotelling's $T^2$ distance as a reference to measure distance from the mean values.
  • the boundary 30 illustrated in FIG. 3 is the 95% confidence region for the Hotelling's $T^2$ distance calculated using standard statistical analysis of the data points provided by the known readily absorbed compounds.
  • the coefficients required to compute the $T^2$ distance for compounds are determined as follows: 1) compute the average PSA and logP for a selected set of well absorbed compounds; 2) mean center the individual values of PSA and logP for each of these compounds (subtract the average values); 3) decompose this mean-centered dataset with any algorithm which provides numerically stable eigenvalues and eigenvectors (the singular value decomposition (SVD) may be used, giving standard U, S, and V output matrices).
  • SVD: singular value decomposition
  • $T^2$ distance: also known as the squared Mahalanobis distance
  • SVD results or eigenvalues/eigenvectors from some similar method
  • the confidence region/probability of a compound's similarity to well absorbed compounds may be computed using the F-distribution relationship to $T^2$, per Rencher.
  • $T^2$ distance from the mean is computed as follows:
  • $T^2 = 181 \cdot \operatorname{diag}\left((AB)(AB)^{T}\right)$ (2)
  • A and B are the matrices:
  • If the $T^2$ distance calculated in this manner is greater than 6.126, the compound falls outside the 95% confidence region, the boundary 30 of FIG. 3 in the PSA-logP plane, and is predicted by the model to be poorly absorbed.
  • the model may be tested by plotting (PSA, logP) data points for additional molecules known to be poorly absorbed. These are shown as open circles on FIG. 3, and it is immediately apparent that almost all of them fall outside the boundary 30. Further verification of model validity is available by plotting molecules which show high membrane permeability or low membrane permeability in caco-2 cell assays, a commonly used in vitro assay for intestinal permeability. In FIG. 3, molecules which show high caco-2 permeability are plotted with “x” symbols, and molecules which show low caco-2 permeability are plotted with open triangles. It can be seen that the high permeability molecules fall predominantly within the boundary 30, and low caco-2 permeability molecules fall predominantly outside the boundary 30. Comparison of the $T^2$ distances and their associated probabilities with in vitro data reveals that the permeability through caco-2 cell membranes drops sharply as the compounds move outside the 95% probability region.
  • One important aspect of this method is that the multivariate normal distribution permits estimation of the probability that a compound is similar to specified reference compounds which are well absorbed. As a molecule moves farther away from the centroid of the PSA-logP space, its probability of being similar to well absorbed molecules decreases.
  • the model as shown in FIG. 3 uses “either/or” logic, characterizing a molecule as being well or poorly absorbed by comparing its $T^2$ distance to a threshold. However, it may also be noted that the base calculation of $T^2$ distance, which is computed first, is one measure of the probability of a molecule's similarity to well absorbed compounds.
  • the model may not only be used to categorize molecules as simply well or poorly absorbed, but may be used to quantify the probability that a compound will have absorption characteristics similar to the set of known readily absorbed compounds.
  • compounds in a database can be ordered according to their likelihood of having similar absorption properties to well absorbed compounds. This ordering can in turn be used to prioritize compounds for synthesis and screening.
  • An improved numerical formula for calculating a predicted logBB may also be developed based on the statistical model set forth above. It has been shown that logBB is modeled with reasonable accuracy with a linear formula having PSA and logP as variables. Clark, supra. However, an unbounded linear model like the one proposed by Clark can be improved by restricting its application to compounds that fall within the boundary 30 of FIG. 3. Penetration of the blood-brain barrier is known to be more difficult than intestinal absorption, and the rapid drop in caco-2 permeability seen near the boundary 30 of FIG. 3 suggests that this is a likely point at which linearity is significantly lost.
  • the above described model has a variety of advantages over conventional absorption prediction techniques, especially when applied to the evaluation of large libraries of compounds.
  • the model may be used advantageously no matter how PSA and logP are derived for candidate compounds.
  • both PSA and logP can be calculated using linear sums of counts of atoms and atom clusters within each molecule that fall within defined atom classifications, resulting in far lower computation times.
  • qualitative predictive accuracy is also improved.
  • high quality and extremely fast chemical database screening may be performed with the principles of the invention.
  • FIG. 5 illustrates a computer implemented molecule screening system in accordance with one embodiment of the invention.
  • information about the atomic constituents and associated bonds is retrieved from a compound database 50 .
  • This information is routed to a PSA estimation module 52 and a logP estimation module 54 .
  • PSA is calculated using a linear equation as set forth in formula (1), and uses the coefficients from Table 1.
  • logP may be estimated using a variety of known methods, including the ALOGP or CLOGP methods.
  • the estimated PSA and logP values are then forwarded to a molecule selection/categorization module 56. This module 56 categorizes compounds of interest in accordance with their probability of similarity to well absorbed molecules.
  • the module 56 may separate candidate molecules as being within or outside the boundary 30 of FIG. 3 using formula (2) above, and may also order compounds according to their probability of showing similarities to well absorbed compounds based on their distance from the centroid of well absorbed compounds in the PSA-logP plane.
  • PSA and logP values for some compounds may be routed to a logBB estimation module 60, which numerically calculates logBB for desired molecules using linear formula (3) above.
  • the invention serves the broad purpose of prioritizing candidate molecules.
  • Candidate molecules existing only in a virtual sense may be prioritized for synthesis, while candidate molecules which have been synthesized may be prioritized for screening. Furthermore, this process may be iterated numerous times. After synthesis and screening, in vitro and/or in vivo absorption data is produced, creating a new data point which may be added to the model. New information gained during the drug discovery process may thus be used to improve the model and thus the synthesized compounds.
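
The count-based PSA estimate described above (a coefficient per atomic class multiplied by the number of atoms or atom clusters in that class, then summed) and the 140 square angstrom cutoff can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the coefficient values, and the function names are hypothetical placeholders, and the coefficients are not those of Table 1, which is not reproduced in this text. The sketch also assumes the polar atoms of a molecule have already been assigned to classes.

```python
from collections import Counter

# Hypothetical class names and coefficients (square angstroms per occurrence).
# Placeholders for illustration only; these are NOT the values of Table 1.
EXAMPLE_COEFFICIENTS = {
    "double_bonded_O": 17.0,
    "single_bonded_NH2": 26.0,
    "hydroxyl_OH": 20.0,
}

PSA_CUTOFF = 140.0  # square angstroms, the cutoff value cited in the text


def estimate_psa(class_labels, coefficients):
    """Return the sum over classes of C_i * N_i, where N_i counts class i."""
    counts = Counter(class_labels)  # N_i for each class i present
    unknown = set(counts) - set(coefficients)
    if unknown:
        raise KeyError(f"no coefficient for classes: {sorted(unknown)}")
    return sum(coefficients[name] * n for name, n in counts.items())


def likely_poorly_absorbed(psa, cutoff=PSA_CUTOFF):
    """Threshold model: a PSA above the cutoff flags likely poor absorption."""
    return psa > cutoff


# One label per polar atom or atom cluster found in a hypothetical molecule.
labels = ["double_bonded_O", "double_bonded_O", "single_bonded_NH2", "hydroxyl_OH"]
psa = estimate_psa(labels, EXAMPLE_COEFFICIENTS)
print(psa, likely_poorly_absorbed(psa))
```

Because the estimate needs only integer counts and one multiply-add per class, it avoids generating a three dimensional structure for each molecule.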
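
The coefficient fitting described above is a least squares regression with non-negative coefficients. The source does not say which solver was used, so the sketch below substitutes a simple cyclic coordinate descent with clipping at zero, written in plain Python; the function name, the sweep count, and the tiny training set are assumptions for illustration. In practice a library routine for non-negative least squares would normally be preferred.

```python
def nnls_coordinate_descent(counts, targets, sweeps=500):
    """Minimize sum((targets - counts @ coef)**2) subject to coef >= 0.

    counts  : list of rows, one per training molecule, giving N_i per class
    targets : reference PSA values computed from three dimensional structures
    """
    n_rows, n_cols = len(counts), len(counts[0])
    coef = [0.0] * n_cols
    residual = list(targets)  # equals targets - counts @ coef while coef is zero
    col_sq = [sum(row[j] ** 2 for row in counts) for j in range(n_cols)]
    for _ in range(sweeps):
        for j in range(n_cols):
            if col_sq[j] == 0:
                continue  # class never occurs in the training set
            grad = sum(counts[i][j] * residual[i] for i in range(n_rows))
            new = max(0.0, coef[j] + grad / col_sq[j])  # clip at zero
            delta = new - coef[j]
            if delta:
                for i in range(n_rows):
                    residual[i] -= counts[i][j] * delta
                coef[j] = new
    return coef


# Synthetic training set built from made-up coefficients 17 and 26.
train_counts = [[1, 0], [0, 1], [2, 1], [1, 2]]
train_psa = [17.0, 26.0, 60.0, 69.0]
print(nnls_coordinate_descent(train_counts, train_psa))
```

Each update solves the one-variable least squares problem for a single coefficient exactly and then clips it at zero, which is what enforces the non-negativity constraint.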
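
The categorization and ordering steps described above can be sketched with the textbook form of the squared Mahalanobis distance, computed from the mean and sample covariance of the reference (PSA, logP) pairs. The matrices A and B of formula (2) are not reproduced in this text, so this sketch uses the covariance form instead of the SVD form; for a mean-centered dataset the two should agree, but that equivalence is stated here as an assumption. The function names and the four-point reference set are hypothetical, and the 6.126 threshold is the source's value for its own set of 182 reference compounds, so it would not carry over to a reference set of a different size.

```python
def fit_reference(points):
    """Return the mean and inverse sample covariance of (PSA, logP) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mean_x, mean_y), inverse


def t2_distance(point, mean, inverse):
    """Squared Mahalanobis distance of one (PSA, logP) point from the mean."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (dx * (inverse[0][0] * dx + inverse[0][1] * dy)
            + dy * (inverse[1][0] * dx + inverse[1][1] * dy))


def inside_boundary(point, mean, inverse, threshold=6.126):
    """True when the point lies within the confidence region."""
    return t2_distance(point, mean, inverse) <= threshold


def rank_by_similarity(candidates, mean, inverse):
    """Order candidates from most to least similar to the reference set."""
    return sorted(candidates, key=lambda p: t2_distance(p, mean, inverse))


# Tiny made-up reference set, used only to exercise the functions.
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, inverse = fit_reference(reference)
print(t2_distance((2.0, 2.0), mean, inverse))
print(rank_by_similarity([(5.0, 5.0), (1.0, 1.0), (2.0, 1.0)], mean, inverse))
```

Sorting by the distance itself gives the same ordering as sorting by the associated probability, since the probability decreases monotonically as the distance grows.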


Abstract

The polar surface area of molecules is computed without reference to three dimensional molecular structures using a linear equation incorporating counts of nitrogens, oxygens, and related atom clusters. Methods and systems for predicting intestinal absorption of candidate compounds use the polar surface area and the octanol/water partition coefficient as descriptors.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a divisional of U.S. patent application Ser. No. 09/552,549 filed on Apr. 19, 2000, entitled “Prediction of Molecular Polar Surface Area and Bioabsorption” and claims priority thereto under 35 U.S.C. §120. The content of the Ser. No. 09/552,549 application is hereby incorporated by reference in its entirety. [0001]
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0002]
  • The invention relates to computational methods of pharmaceutical discovery. Specifically, the invention relates to the prediction of membrane permeability and physiological absorption of molecules. [0003]
  • 2. Description of the Related Art [0004]
  • In the development of pharmaceutical compounds, it is well known that identifying a compound with a desired biological activity is not itself sufficient to determine that compound's suitability as a drug. Not only must the compound exhibit the necessary biological activity, but it must also be deliverable to the target tissue(s), preferably in a cost effective and convenient manner such as oral administration. This has been a problem with several treatment protocols. For example, a wide variety of peptide molecules have been shown to have useful pharmacological activity, but their generally limited capacity to diffuse through biomembranes such as the human gastrointestinal epithelium has limited their clinical development. Effective oral administration requires that a drug be absorbed through the intestinal membranes to enter systemic circulation and if such absorption is limited, a compound's promise for clinical development is poor. [0005]
  • Not only is intestinal absorption an important concern during drug development, but the ability of a candidate compound to penetrate the blood-brain barrier is also of significant interest. The blood-brain barrier (BBB) is a cellular system that separates the fluids of the central nervous system from the circulatory system. Drugs intended for targets in the central nervous system should be able to penetrate the BBB. On the other hand, drugs intended for other target tissues may cause unwanted side effects if they freely pass into the fluids of the central nervous system. [0006]
  • In vivo animal testing for bioabsorption and blood-brain barrier penetration has long been practiced. In addition, a cell based in vitro assay using human intestinal caco-2 cells is in widespread use to measure the biomembrane permeability of drug candidate compounds. Because both of these protocols are slow, expensive, and labor intensive, computational methods to predict the potential for gastrointestinal absorption and blood-brain barrier penetration based on more easily obtained molecular characteristics have been developed. Also, such computational methods are of great interest for the in silico prediction of absorption and blood-brain barrier penetration for virtual libraries of compounds which have not been synthesized, for the purposes of determining which compounds should be synthesized. In these computational models, a formula for estimating either the % human intestinal absorption (%HIA) or the logarithm of the steady state ratio of the compound's concentration in the central nervous system and the blood (often called logBB) is constructed. The formula typically uses molecular properties and parameters that may be derived from the molecular structure of the compound. Using these formulas, %HIA and logBB may be estimated without the need to perform in vivo experiments. [0007]
  • Many models focus on molecular characteristics related to hydrogen bonding, lipophilicity, and molecular weight to predict propensity for intestinal absorption or blood-brain barrier penetration. A sigmoidal relationship between the polar surface area (PSA) of a molecule and its %HIA has been observed, with high polar surface area correlated to low %HIA. This is shown in Palm, et al., Polar Molecular Surface Properties Predict the Intestinal Absorption of Drugs in Humans, Pharmaceutical Research, Vol. 14, No. 5, p. 568 (1997). It has also been observed that molecules having either especially high or especially low octanol/water partition coefficients (logP), which is a measure of lipophilicity, are associated with low %HIA. Palm, supra, and Wils, et al., High Lipophilicity Decreases Drug Transport Across Intestinal Epithelial Cells, The Journal of Pharmacology and Experimental Therapeutics, Volume 269, No. 2, p. 654 (1994). The disclosures of both the Palm and Wils articles are hereby incorporated by reference in their entireties. [0008]
  • Accordingly, molecular PSA has been suggested as a parameter which can be used to distinguish between well absorbed and poorly absorbed compounds, with 140 square angstroms being proposed as a cutoff value. Clark, Rapid Calculation of Polar Molecular Surface Area and its Application to the Prediction of Transport Phenomena 1. Prediction of Intestinal Absorption, Journal of Pharmaceutical Sciences, Vol. 88, No. 8, p. 807 (1999). PSA has also been used as a variable in linear formulas for predicting membrane permeability and logBB. In some cases, logP is used along with PSA in such linear formulas. Clark, Rapid Calculation of Polar Molecular Surface Area and its Application to the Prediction of Transport Phenomena 2. Prediction of Blood-Brain Barrier Penetration, Journal of Pharmaceutical Sciences, Vol. 88, No. 8, p. 815 (1999) and Winiwarter, et al., Correlation of Human Jejunal Permeability (in Vivo) of Drugs with Experimentally and Theoretically Derived Parameters. A Multivariate Data Analysis Approach, Journal of Medicinal Chemistry 41, p. 4939 (1998), both of which are hereby incorporated by reference in their entireties. [0009]
  • Although these models have improved the speed of the drug candidate evaluation process by reducing reliance on in vivo and in vitro chemical testing, they remain computationally expensive, and in many cases, the strict linear modeling limits their predictive value. PSA calculations have required the calculation of energy minimized three dimensional molecular structures, which requires 10-15 seconds of CPU time on a Sun or SGI-R1000 workstation. The effective application of these techniques to large libraries of candidate compounds requires techniques which reduce the computation time required for each molecule. [0010]
  • SUMMARY OF THE INVENTION
  • In one embodiment, the invention comprises a method of estimating the polar surface area of a molecule comprising making separate counts of the number of atoms or atom clusters in the molecule which fall within the definition of each of a plurality of atomic classes, and estimating a polar surface area of the molecule using at least some of the separate counts. [0011]
  • Computed polar surface areas are useful in computer implemented compound analysis methods, and the invention also comprises a method of predicting the propensity of a molecule for membrane permeability comprising computing a polar surface area for the molecule without reference to an energy minimized three dimensional structure of the molecule and using the computed polar surface area to predict the propensity. [0012]
  • Methods of drug discovery are also provided. In one embodiment, such a method comprises selecting a subset of molecules from a database, wherein the selecting comprises numerically estimating the PSA and logP of each molecule in the database, determining the position of each compound in the database in a PSA-logP plane, and categorizing compounds based on their position on the PSA-logP plane. [0013]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flowchart of a method of calculating polar surface area in accordance with one embodiment of the invention. [0014]
  • FIG. 2 is a plot of estimated versus calculated polar surface areas for 440 compounds from the Physician's Desk Reference. [0015]
  • FIG. 3 is a scatter plot of compounds in a PSA-logP plane including a statistically defined boundary between readily absorbed compounds and poorly absorbed compounds. [0016]
  • FIG. 4 is another scatter plot of compounds in a PSA-logP plane showing the boundary of FIG. 3 and regions of chemical space having predicted high logBB and low logBB. [0017]
  • FIG. 5 is a block diagram of a drug discovery system in one embodiment of the invention. [0018]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
  • Embodiments of the invention will now be described with reference to the accompanying Figures, wherein like numerals refer to like elements throughout. The terminology used in the description presented herein is not intended to be interpreted in any limited or restrictive manner, simply because it is being utilized in conjunction with a detailed description of certain specific embodiments of the invention. Furthermore, embodiments of the invention may include several novel features, no single one of which is solely responsible for its desirable attributes or which is essential to practicing the inventions herein described. [0019]
  • In many embodiments, the invention is useful in the computer implemented drug candidate evaluation process. In these processes, drug candidates from one or more databases of compounds are preliminarily evaluated using software code running on general purpose computers for desirable characteristics or properties. The general purpose computers used for this purpose can take a wide variety of forms, including network servers, workstations, personal computers, mainframe computers and the like. The code which configures the computer to perform these compound evaluations is typically provided to the user on a computer readable medium such as a CD-ROM. The code may also be downloaded by a user from a network server which is part of a local or wide area network such as the Internet. [0020]
  • The general purpose computer running the software will typically include one or more input devices such as a mouse and/or keyboard, a display, and computer readable memory media such as random access memory integrated circuits and a hard disk drive. It will be appreciated that one or more portions, or all of the code may be remote from the user and, for example, resident on a network resource such as a LAN server, Internet server, network storage device, etc. In typical embodiments, the software receives as an input a variety of information concerning candidate drug compounds or compositions, and from this information, derives, estimates, or predicts the expected biological or chemical characteristics of the candidate drug compounds. The databases may include 10,000, 100,000, or possibly more candidate compounds. In some advantageous embodiments, the information input to the software program comprises structural information about each of the candidate compounds. This information is usually limited to the atomic constituents and the bonds between them, essentially the information found in a 2-dimensional molecular bonding diagram. From this information, predictions of biological and/or chemical activity are advantageously made. [0021]
  • As discussed above, it is desirable to predict which compounds in a database are likely to be readily intestinally absorbed, or are likely to penetrate the blood-brain barrier. As also described above, the molecular polar surface area is often used as part of this prediction. In conventional methods of polar surface area calculation, the atomic constituents and interatomic bond configurations are used to generate energy minimized 3-dimensional molecular structures using well known molecular modeling tools. These calculations may be performed in vacuum or in solution. After 3-dimensional structures are calculated, the PSA is defined as the portion of van der Waals surface area of all oxygen atoms, nitrogen atoms, and their attached hydrogen atoms which does not lie within the van der Waals surface area of any other atom, where the van der Waals surface is defined by the space filling hard sphere having the van der Waals radius for the atom type. Thus, the total amount of “exposed” oxygen, nitrogen, and associated hydrogen of the molecule defines the PSA. [0022]
  • The conventional PSA calculation is useful in the process of predicting intestinal absorption. However, the generation of energy minimized three dimensional molecular structures is computationally difficult and time consuming. Calculating PSA for every compound in a 100,000 compound combinatorial library would take anywhere from 10-20 days of computer time on widely used workstations. [0023]
  • It is one aspect of the present invention that PSA may be calculated by an atom and atom cluster classification method that speeds the calculation by a factor of 300 to 1000, making it possible to compute PSA for each molecule of a 100,000 molecule database in less than an hour. This process is illustrated in FIG. 1. [0024]
  • Referring now to this Figure, the method begins at block 20 by defining a plurality of atomic classes. Each atomic class may comprise a single atom or a cluster of atoms. Each class is also typically defined by bond configurations within the cluster and/or to other atoms. For example, an atomic class may comprise “a double bonded oxygen atom,” “a single bonded NH2 group,” and the like. In the present embodiment, the classes involve nitrogen atoms, oxygen atoms, and attached hydrogen atoms, as these molecular constituents are the ones that contribute to the molecule's PSA. At block 22, the number of atoms or atom clusters of a molecule being analyzed that fall within each defined class are separately counted, thus producing a list of integers corresponding to the number of atomic constituents within each defined class. As illustrated by block 24 of FIG. 1, the separate counts are used to estimate the polar surface area of the molecule being evaluated. In one advantageous embodiment described in additional detail below, the counts are used in a simple arithmetic calculation which produces an estimated PSA value. Therefore, rather than calculating PSA by computing the van der Waals surface areas of relevant atoms of an energy minimized three dimensional molecular structure, a simple computation is used which takes far less computation time. [0025]
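  • By way of non-limiting illustration, the class definition and counting steps of blocks 20 and 22 might be sketched in Python as follows. The `Atom` record, its one-letter bond codes, and the rule used to compose a class label are assumptions made for this sketch and are not taken from the specification; a practical implementation would derive the same information from a 2-dimensional connection table.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical minimal atom record used only for this sketch."""
    element: str        # e.g. "N", "O", "C"
    bonds: str          # one letter per bond to a non-hydrogen atom:
                        # "s" single, "d" double, "t" triple, "a" aromatic
    hydrogens: int = 0  # number of attached hydrogen atoms


# Assumed ordering of bond letters inside a label, chosen so that
# labels come out as "dsN" and "aasN" rather than "sdN" or "saaN".
BOND_ORDER = "tdas"


def atomic_class(atom):
    """Return a class label such as 'sOH' or 'dsN', or None for other atoms."""
    if atom.element not in ("N", "O"):
        return None
    prefix = "".join(sorted(atom.bonds, key=BOND_ORDER.index))
    if atom.hydrogens == 0:
        suffix = ""
    elif atom.hydrogens == 1:
        suffix = "H"
    else:
        suffix = "H" + str(atom.hydrogens)
    return prefix + atom.element + suffix


def count_classes(atoms):
    """Make separate counts of the atoms falling within each class."""
    labels = (atomic_class(a) for a in atoms)
    return Counter(label for label in labels if label is not None)


# Acetic acid, CH3-C(=O)-OH, entered by hand for illustration.
acetic_acid = [
    Atom("C", "s", 3),   # methyl carbon
    Atom("C", "sds"),    # carboxyl carbon
    Atom("O", "d"),      # carbonyl oxygen
    Atom("O", "s", 1),   # hydroxyl oxygen
]
counts = count_classes(acetic_acid)
```

  • In this sketch the carbon atoms are ignored and the two oxygen atoms are counted once each, in a double bonded oxygen class and a single bonded OH class respectively.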
  • In an especially advantageous embodiment, the PSA is expressed as a linear equation with non-negative coefficients as follows: [0026]
    $$\mathrm{PSA} = \sum_i C_i N_i \qquad (1)$$
  • where N_i is the number of atoms or atom clusters in the molecule falling within class i, and C_i is a coefficient associated with class i. In this model, the coefficients represent the contribution of each atom type to the total PSA of a molecule. The coefficients are all non-negative because the classes include all polar atoms and atom clusters that tend to add to molecular PSA. Coefficients for the above formula may be computed by performing a non-negative least squares linear regression using a plurality of training molecules. This technique is well known in a few other applications, with the prediction of boiling point for organic molecules being one example. Hall and Kier, Electrotopological State Indices for Atom Types: A Novel Combination of Electronic, Topological, and Valence State Information, J. Chem. Inf. Comput. Sci. Vol. 35, 1039-1045 (1995), hereby incorporated by reference in its entirety. The procedure for generating the coefficients is to select a set of training molecules, and compute all of their PSA values the conventional way, by calculating energy minimized three dimensional structures and computing the van der Waals surface areas of nitrogens, oxygens, and their attached hydrogens. Then, counts of the atoms or atom clusters of each class are made for each molecule. The coefficients are then computed such that the sum of the squares of the PSA errors computed with the above formula is minimized across all of the molecules of the training set, with the constraint that each of the regression coefficients be greater than or equal to zero. [0027]
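  • The constrained regression described above might be sketched as follows. This is an illustration only: it uses cyclic coordinate descent with clipping at zero, which is one simple way of solving a non-negative least squares problem and is not asserted to be the routine actually used. The function name, the toy counts, and the toy PSA values are hypothetical.

```python
def fit_nonnegative(counts, targets, sweeps=2000):
    """Fit C >= 0 minimizing sum_m (targets[m] - sum_i C[i] * counts[m][i]) ** 2.

    The intercept is fixed at zero. Each coefficient is updated in turn to
    its one-dimensional least squares optimum and then clipped at zero.
    """
    n_classes = len(counts[0])
    coeffs = [0.0] * n_classes
    residuals = list(targets)  # target minus prediction; prediction starts at 0
    col_sq = [sum(row[j] ** 2 for row in counts) for j in range(n_classes)]
    for _ in range(sweeps):
        for j in range(n_classes):
            if col_sq[j] == 0:
                continue  # class never occurs in the training set
            grad = sum(row[j] * r for row, r in zip(counts, residuals))
            new = max(0.0, coeffs[j] + grad / col_sq[j])
            delta = new - coeffs[j]
            if delta:
                residuals = [r - delta * row[j] for row, r in zip(counts, residuals)]
                coeffs[j] = new
    return coeffs


# Toy training data (hypothetical): rows are molecules, columns are
# counts in three atomic classes.
toy_counts = [[1, 1, 0], [0, 1, 1], [1, 0, 2], [2, 1, 1]]
toy_psa = [37.0, 26.0, 38.0, 66.0]  # generated from coefficients 20, 17 and 9
fitted = fit_nonnegative(toy_counts, toy_psa)
```

  • Because the toy PSA values were generated exactly from non-negative coefficients, the fit should recover those coefficients; with real training data the residual error would not vanish and some coefficients may be driven to zero by the constraint.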
  • These methods have never been applied to the calculation of polar surface area, or any other calculated (as opposed to experimentally measured) quantities, prior to the present invention. Other closely related surface-area quantities, such as the total surface area or the non-polar surface area, can vary considerably with changes in conformation. In contrast, the surprising accuracy of the PSA model set forth above throughout a wide range of PSA values indicates that the polar surface area for solvated conformations is considerably less dependent on the conformation than are these related measures. Only under those circumstances is it possible to identify a set of bonded states of the heteroatoms of the molecule forming a suitable basis for the regression calculation. [0028]
  • Table I below sets forth a list of atomic classifications and associated coefficients in one embodiment of the invention. [0029]
    TABLE I
    Atomic Classification                                      Atom/Atom Cluster    Linear Coefficient
    sNH3    single bonded NH3                                  —NH3                 32.254
    sNH2    single bonded NH2                                  —NH2                 26.54
    ssNH2   NH2 with two single bonds                          (drawing omitted)    0
    dNH     double bonded NH                                   ═NH                  23.0878
    ssNH    NH with two single bonds                           (drawing omitted)    12.8102
    aaNH    aromatic NH                                        (drawing omitted)    15.0551
    tN      triple bonded N                                    ≡N                   22.9351
    sssNH   NH with three single bonds                         (drawing omitted)    3.204
    dsN     N with double and single bond                      ═N—                  11.3233
    aaN     aromatic N                                         (drawing omitted)    11.261
    sssN    N with three single bonds                          (drawing omitted)    3.3525
    ddsN    N with two double bonds and one single bond        (drawing omitted)    8.2215
    aasN    aromatic N with single bond                        (drawing omitted)    5.3483
    ssssN   N with four single bonds                           (drawing omitted)    0
    sOH     single bonded OH                                   —OH                  20.8155
    dO      double bonded O                                    ═O                   17.3008
    ssO     O with two single bonds                            —O—                  8.9301
    aaO     aromatic O                                         (drawing omitted)    12.5543
  • These coefficients were produced by calculating the PSA explicitly, using energy minimized three dimensional structures, for the 5,386 most drug-like molecules contained in the Comprehensive Medicinal Chemistry (CMC) Database. A single extended conformer, rather than multiple conformers, was used for each molecule. The molecules were separated into training, test, and validation datasets. The validation dataset consisted of the 440 molecules which are listed in the Physician's Desk Reference as currently available pharmaceuticals in tablet, capsule, or oral suspension form. Compounds listed in the validation dataset were removed from the CMC dataset prior to the creation of the training and test datasets, so as not to bias the regression. The training and test datasets were created by sorting the remaining molecules from the CMC in the order of their explicitly calculated PSA value, and assigning odd-numbered molecules in that ordering to the training dataset and even-numbered molecules to the test dataset. This created two datasets spanning essentially the same PSA range. Counts of each of the 18 atomic classifications set forth above in Table I were made for each molecule. The counts for the molecules in the training dataset were used as independent variables in the linear equation of the form set forth above in formula (1). Coefficients were derived using conventional and well known non-negative least squares regression techniques so as to minimize the total error between predicted PSA using the linear equation and calculated PSA from energy minimized three dimensional structures. The coefficients were limited to non-negative constants, and the intercept was fixed to zero. [0030]
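  • The sorted, alternating assignment of molecules to the training and test datasets might be sketched as follows. The function name and the example molecules are hypothetical, and the sketch assumes that the validation compounds have already been removed.

```python
def split_by_rank(molecules):
    """Split (name, psa) pairs into training and test sets by PSA rank.

    Molecules are sorted by explicitly calculated PSA; the 1st, 3rd, 5th, ...
    go to training and the 2nd, 4th, 6th, ... go to test, so that both sets
    cover nearly the same range of PSA values.
    """
    ranked = sorted(molecules, key=lambda m: m[1])
    training = ranked[0::2]
    test = ranked[1::2]
    return training, test


# Hypothetical molecules with made-up PSA values in square angstroms.
example = [("mol_a", 30.0), ("mol_b", 10.0), ("mol_c", 20.0), ("mol_d", 40.0)]
training_set, test_set = split_by_rank(example)
```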
  • FIG. 2 is a scatter plot of estimated PSA using formula (1) above and the coefficients of Table I vs. PSA calculated with three dimensional structures for the validation dataset. Agreement between arithmetically estimated and structurally calculated PSA is very good, with R² > 0.991 and a root-mean-square PSA error, when using the fast calculation method of formula (1), of only 5.9 square angstroms. The speed of calculation is anywhere from twenty to over a hundred molecules per second when using arithmetic formula (1) and the atomic classifications and coefficients of Table I. This is a dramatic improvement over the 10-15 seconds per molecule required by conventional PSA calculation methods using energy minimized three dimensional structures. [0031]
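  • As a worked illustration of formula (1), the following Python sketch applies the coefficients of Table I to a set of class counts. The function name is hypothetical, and the class counts used for acetaminophen were assigned by hand for this example (one amide NH with two single bonds, one double bonded O, one single bonded OH) rather than taken from the specification.

```python
# Linear coefficients transcribed from Table I, in square angstroms.
TABLE_I = {
    "sNH3": 32.254, "sNH2": 26.54, "ssNH2": 0.0, "dNH": 23.0878,
    "ssNH": 12.8102, "aaNH": 15.0551, "tN": 22.9351, "sssNH": 3.204,
    "dsN": 11.3233, "aaN": 11.261, "sssN": 3.3525, "ddsN": 8.2215,
    "aasN": 5.3483, "ssssN": 0.0, "sOH": 20.8155, "dO": 17.3008,
    "ssO": 8.9301, "aaO": 12.5543,
}


def estimate_psa(class_counts):
    """Formula (1): sum over classes of coefficient times count."""
    return sum(TABLE_I[name] * n for name, n in class_counts.items())


# Hand-assigned counts for acetaminophen (an assumption of this sketch).
acetaminophen = {"ssNH": 1, "dO": 1, "sOH": 1}
psa = estimate_psa(acetaminophen)  # 12.8102 + 17.3008 + 20.8155
```

  • The estimate is a plain weighted sum, so the cost per molecule is dominated by the counting step rather than by any geometric calculation.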
  • Due to the large improvement in computation speed, the above described method of PSA calculation can be advantageously applied to the evaluation of candidate drugs in large compound databases for their propensity for intestinal absorption. Using the principles of the invention described above, three dimensional structures of the compounds need not be generated to compute PSA, thereby speeding the process of evaluating candidate drugs dramatically. For example, one proposed method referred to above for selecting compounds likely to be well absorbed biologically is to compare the PSA to the threshold value of 140 square angstroms. According to this model, compounds above this threshold are likely to be poorly absorbed. Using a linear equation for PSA calculation such as formula (1) above, a 100,000 compound library could be screened and all compounds likely to be poorly absorbed could be separated from the remainder in less than an hour of computation time with workstations currently in widespread use for these applications. [0032]
  • Other models have been proposed for identifying compounds which are likely to be poorly absorbed. Some of these models include consideration of both molecular PSA and molecular octanol/water partition coefficient, logP. It has long been known that logP, although it can be determined experimentally, is also well modeled by linear equations of the same form as formula (1) above. Thus, in the evaluation of molecules of a database, logP can be estimated based on counts of atoms or atom clusters in various atomic classifications and associated multiplicative coefficients. Two linear models, known as ALOGP and CLOGP to those of skill in the art, are widely used. See, e.g., Ghose, et al., Prediction of Hydrophobic (Lipophilic) Properties of Small Organic Molecules Using Fragmental Methods: An Analysis of ALOGP and CLOGP Methods, Journal of Physical Chemistry A, Vol. 102, 3762-3772, incorporated herein by reference in its entirety. [0033]
  • The molecular parameters PSA and logP are relevant descriptors of intestinal absorption and logBB. This is because passive diffusion into the intestine or through the blood-brain barrier requires diffusion of the molecule through cellular membranes comprising a lipid bilayer with both hydrophilic and lipophilic regions. Accordingly, highly hydrophilic molecules and molecules which readily form hydrogen bonds do not easily enter the membrane. Furthermore, highly lipophilic molecules do not readily leave the membrane once reaching the lipophilic interior. PSA, being a measure of hydrophilicity, and logP, which includes both hydrophilic and lipophilic contributions, have thus been found to be parameters from which membrane permeability information may be derived. [0034]
  • The relationships of PSA and logP to membrane permeability, however, are highly non-linear. As described above, extremes of high and low logP are associated with poor intestinal absorption. Molecular PSA shows a sigmoidal relationship to intestinal absorption. If accurate bioabsorption data for a significant portion of chemical space were available, an accurate non-linear model would be derivable which could predict numerically measured intestinal absorption and/or logBB properties of candidate drugs with sufficient accuracy. Because such data is unavailable, a novel modeling technique, based on PSA and logP values, has been developed, and is described below with reference to FIGS. 3-5. In accordance with this aspect of the invention, it has been found that a statistically based pattern recognition model based on PSA and logP, rather than a standard quantitative model, is surprisingly successful at distinguishing well absorbed from poorly absorbed compounds. [0035]
  • In accordance with one aspect of the invention, the model assumes that a molecule's propensity for intestinal absorption via passive diffusion is a function of the PSA and logP of the molecule, especially the interactions between PSA and logP. Every molecule may be thus assigned a location in a two dimensional PSA-logP plane. Rather than quantitatively predicting intestinal absorption as a function of position in the PSA-logP plane, a bounded region of the PSA-logP plane is defined. If the PSA and logP of a molecule are within the bounded region, the molecule is considered likely to be readily intestinally absorbed. If the PSA and logP of a molecule are outside the bounded region, the molecule is considered unlikely to be readily intestinally absorbed. To define this bounded region of high intestinal absorption within the PSA-logP plane, a statistical analysis of compounds which are known to be readily absorbed was performed. The PSA and logP for 182 compounds known to be readily absorbed and not actively transported across cellular membranes were calculated. The distribution of (PSA, logP) coordinates for these molecules was analyzed statistically assuming a multivariate normal distribution. [0036]
  • FIG. 3 illustrates the distribution in the PSA-logP plane of known readily absorbed compounds which are illustrated with “+” symbols in this Figure. The mean PSA of these molecules is 64.5867, and the mean logP is 2.3226. Broadly stated, the model predicts that new molecules being analyzed which have a PSA and logP “close” to these mean values will also be readily absorbed, and that new molecules being analyzed which have a PSA and logP “far” from these mean values will not be readily absorbed. To define how close is close enough, the model utilizes Hotelling's T² distance as a reference to measure distance from the mean values. The boundary 30 illustrated in FIG. 3 is the 95% confidence region for the Hotelling's T² distance calculated using standard statistical analysis of the data points provided by the known readily absorbed compounds. [0037]
  • The coefficients required to compute the T² distance for compounds are determined as follows: 1) compute the average PSA and logP for a selected set of well absorbed compounds; 2) mean center the individual values of PSA and logP for each of these compounds (subtract the average values); and 3) decompose this mean-centered dataset with any algorithm which provides numerically stable eigenvalues and eigenvectors (the singular value decomposition (SVD) may be used, giving standard U, S, and V output matrices). The T² distance (also known as the squared Mahalanobis distance) may then be computed for any compound(s), new or those used in the model creation, using the average values and SVD results (or eigenvalues/eigenvectors from some similar method) to perform the transformations given by mean centering and multiplication by the inverse of the covariance matrix, per Rencher, Methods of Multivariate Analysis, John Wiley & Sons, Inc.: New York, 1995, hereby incorporated by reference in its entirety. The confidence region/probability of a compound's similarity to well absorbed compounds may be computed using the F-distribution relationship to T², per Rencher. [0038]
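  • The steps above might be sketched in Python as follows. As a simplifying assumption, this sketch forms the 2 by 2 sample covariance matrix directly and inverts it in closed form instead of performing an SVD; for a well conditioned reference set the resulting squared Mahalanobis distance is the same. The function names and the four reference points are hypothetical.

```python
def fit_reference(points):
    """Return the mean and inverse covariance of (psa, logp) reference points.

    The covariance uses the n - 1 denominator.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mean_x, mean_y), inverse


def t2_distance(point, mean, inverse):
    """Squared Mahalanobis distance of point from the reference mean."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return (dx * (inverse[0][0] * dx + inverse[0][1] * dy)
            + dy * (inverse[1][0] * dx + inverse[1][1] * dy))


# Hypothetical reference set of four (PSA, logP) points.
reference = [(40.0, 1.0), (80.0, 1.0), (40.0, 3.0), (80.0, 3.0)]
mean, inverse = fit_reference(reference)
t2 = t2_distance((100.0, 2.0), mean, inverse)
```

  • A compound located exactly at the mean has a distance of zero, and the distance grows as the compound moves away from the mean in units scaled by the spread of the reference set.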
  • For the distribution of readily absorbed compounds of FIG. 3, the T² distance from the mean is computed as follows: [0039]
  • $$T^2 = (181) \cdot \operatorname{diagonal}\left((A B)(A B)^T\right) \qquad (2)$$
  • where A and B are the matrices: [0040]
  • $$A = \begin{bmatrix} \mathrm{PSA} - 64.5867 & \mathrm{logP} - 2.3226 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0.00274246014581 & -0.00114013550277 \\ -0.00005529017061 & -0.05655211663560 \end{bmatrix}$$
  • If the T² distance calculated in this manner is greater than 6.126, the compound falls outside the 95% confidence region, the boundary 30 of FIG. 3 in the PSA-logP plane, and is predicted by the model to be poorly absorbed. [0041]
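  • Formula (2) and the 6.126 threshold might be applied to a single compound as sketched below. The sketch assumes that the four values of matrix B are laid out row by row, that is, with 0.00274246014581 and -0.00114013550277 in the first row; this reading gives B mutually orthogonal columns, as would be expected of a matrix derived from an SVD. The function names are hypothetical. For a single compound, A is a row vector and diagonal((A*B)*(A*B)T) reduces to the sum of squares of the two entries of A*B.

```python
MEAN_PSA, MEAN_LOGP = 64.5867, 2.3226

# Matrix B of formula (2); the row-by-row layout is an assumption.
B = (
    (0.00274246014581, -0.00114013550277),
    (-0.00005529017061, -0.05655211663560),
)

T2_LIMIT = 6.126  # 95% confidence region, boundary 30 of FIG. 3


def t2_formula2(psa, logp):
    """T^2 distance of one compound from the mean, per formula (2)."""
    a = (psa - MEAN_PSA, logp - MEAN_LOGP)
    ab = (
        a[0] * B[0][0] + a[1] * B[1][0],
        a[0] * B[0][1] + a[1] * B[1][1],
    )
    return 181 * (ab[0] ** 2 + ab[1] ** 2)


def predicted_well_absorbed(psa, logp):
    """True when the compound lies on or inside boundary 30."""
    return t2_formula2(psa, logp) <= T2_LIMIT


inside = predicted_well_absorbed(60.0, 2.0)
outside = predicted_well_absorbed(140.0, 2.3226)
```

  • Under the stated assumption about B, a compound with a PSA of 140 and a logP equal to the mean has a T² distance of roughly 9.1 and therefore falls outside the boundary.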
  • The model may be tested by plotting (PSA, logP) datapoints for additional molecules known to be poorly absorbed. These are shown as open circles on FIG. 3, and it is immediately apparent that almost all of them fall outside the boundary 30. Further verification of model validity is available by plotting molecules which show high membrane permeability or low membrane permeability in Caco-2 cell assays, a commonly used in vitro assay for intestinal permeability. In FIG. 3, molecules which show high Caco-2 permeability are plotted with “x” symbols, and molecules which show low Caco-2 permeability are plotted with open triangles. It can be seen that the high permeability molecules fall predominantly within the boundary 30, and low Caco-2 permeability molecules fall predominantly outside the boundary 30. Comparison of the T² distances and their associated probabilities with in vitro data reveals that the permeability through Caco-2 cell membranes drops sharply as the compounds move outside the 95% probability region. [0042]
  • One important aspect of this method is that the multivariate normal distribution permits estimation of the probability that a compound is similar to specified reference compounds which are well absorbed. As a molecule moves farther away from the centroid of the PSA-logP space, its probability of being similar to well absorbed molecules decreases. The model as shown in FIG. 3 uses “either/or” logic, characterizing a molecule as being well or poorly absorbed by comparing its T² distance to a threshold. However, it may also be noted that the base calculation of T² distance, which is computed first, is one measure of the probability of a molecule's similarity to well absorbed compounds. Thus, the model may not only be used to categorize molecules as simply well or poorly absorbed, but may be used to quantify the probability that a compound will have absorption characteristics similar to the set of known readily absorbed compounds. In the latter case, compounds in a database can be ordered according to their likelihood of having similar absorption properties to well absorbed compounds. This ordering can in turn be used to prioritize compounds for synthesis and screening. [0043]
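  • One way in which a T² distance might be converted to a probability and used for ordering is sketched below. The sketch assumes that T² multiplied by (n − p)/(p(n − 1)) follows an F distribution with p and n − p degrees of freedom, with n = 182 reference compounds and p = 2 variables; this is one common form of the relationship and is stated here as an assumption. For p = 2 the upper tail of the F distribution has a simple closed form, which avoids the need for a statistics library. The function name and compound names are hypothetical.

```python
N_REF = 182  # compounds in the reference set
P_VARS = 2   # PSA and logP


def tail_probability(t2):
    """Upper tail probability of a T^2 distance under the assumed F relationship.

    With numerator degrees of freedom equal to 2, the F survival function is
    (1 + 2 * f / m) ** (-m / 2), where m is the denominator degrees of freedom.
    """
    m = N_REF - P_VARS
    f = t2 * m / (P_VARS * (N_REF - 1))
    return (1.0 + 2.0 * f / m) ** (-m / 2.0)


# Order hypothetical compounds, most similar to the reference set first.
t2_values = {"cmpd_A": 0.8, "cmpd_B": 6.126, "cmpd_C": 15.0}
ranking = sorted(
    t2_values,
    key=lambda name: tail_probability(t2_values[name]),
    reverse=True,
)
```

  • Under these assumptions a T² distance of 6.126 corresponds to a tail probability of approximately 0.05, which is consistent with the 95% confidence region described above.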
  • An improved numerical formula for calculating a predicted logBB may also be developed based on the statistical model set forth above. It has been shown that logBB is modeled with reasonable accuracy with a linear formula having PSA and logP as variables. Clark, supra. However, an unbounded linear model like the one proposed by Clark can be improved by restricting its application to compounds that fall within the boundary 30 of FIG. 3. Penetration of the blood-brain barrier is known to be more difficult than intestinal absorption, and the rapid drop in Caco-2 permeability seen near the boundary 30 of FIG. 3 suggests that this is a likely point at which linearity is significantly lost. Least-median-of-squares linear regression was used to compute a robust linear regression whose coefficients are different from those set forth by Clark, and an improved linear model is produced having an R² of 0.861 rather than Clark's 0.787. Furthermore, the boundary 30 was used as a second criterion to deal with the non-linearity of the permeability, and any predictions for logBB made by the linear regression are considered invalid if the molecule is outside boundary 30. Thus, this logBB model does not numerically predict logBB for compounds that fall outside the region of FIG. 3, but predicts qualitatively that such compounds will poorly penetrate the blood-brain barrier. A linear regression performed according to these principles produces the following formula for logBB: [0044]
  • logBB=−0.01577*(PSA)+0.217697*(logP)+0.119233  (3)
  • This is illustrated graphically in FIG. 4. Compounds residing in the PSA-logP plane within the boundary 30 and to the left of line 36 are predicted to have a BB of greater than 1 (i.e. logBB of greater than 0), and compounds residing in the PSA-logP plane within the boundary 30 and to the right of line 40 are predicted to have a BB of less than 0.5. [0045]
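  • The bounded logBB model might be sketched as follows. The function name is hypothetical, and the T² distance of the compound is assumed to have been computed beforehand and is passed in as an argument. The sketch returns no numerical value for compounds outside boundary 30, reflecting the qualitative prediction of poor penetration for such compounds.

```python
T2_LIMIT = 6.126  # boundary 30 of FIG. 3


def predict_logbb(psa, logp, t2):
    """Formula (3) inside boundary 30; None when the compound lies outside."""
    if t2 > T2_LIMIT:
        return None  # no numerical prediction; poor penetration is predicted
    return -0.01577 * psa + 0.217697 * logp + 0.119233


# A hypothetical compound with PSA 20 and logP 3, taken to lie inside the boundary.
logbb = predict_logbb(20.0, 3.0, 2.7)
bb = 10 ** logbb  # BB greater than 1 corresponds to logBB greater than 0
```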
  • The above described model has a variety of advantages over conventional absorption prediction techniques, especially when applied to the evaluation of large libraries of compounds. The model may be used advantageously no matter how PSA and logP are derived for candidate compounds. However, using the linear arithmetic PSA calculation described above, both PSA and logP can be calculated using linear sums of counts of atoms and atom clusters within each molecule that fall within defined atom classifications, resulting in far lower computation times. Furthermore, qualitative predictive accuracy is also improved. Thus high quality and extremely fast chemical database screening may be performed with the principles of the invention. [0046]
  • FIG. 5 illustrates a computer implemented molecule screening system in accordance with one embodiment of the invention. In this system, information about the atomic constituents and associated bonds is retrieved from a compound database 50. This information is routed to a PSA estimation module 52 and a logP estimation module 54. In some advantageous embodiments, PSA is calculated using a linear equation as set forth in formula (1), using the coefficients from Table I. LogP may be estimated using a variety of known methods, including the ALOGP or CLOGP methods. The estimated PSA and logP values are then forwarded to a molecule selection/categorization module 56. This module 56 categorizes compounds of interest in accordance with their probability of similarity to well absorbed molecules. The module 56 may separate candidate molecules as being within or outside the boundary 30 of FIG. 3 using formula (2) above, and may also order compounds according to their probability of showing similarities to well absorbed compounds based on their distance from the centroid of well absorbed compounds in the PSA-logP plane. [0047]
  • Compounds predicted to be poorly absorbed 58 may be separated out. If desired, PSA and logP values for some compounds may be routed to a logBB estimation module 60, which numerically calculates logBB for desired molecules using linear formula (3) above. [0048]
  • The invention serves the broad purpose of prioritizing candidate molecules. Candidate molecules existing only in a virtual sense may be prioritized for synthesis, while candidate molecules which have been synthesized may be prioritized for screening. Furthermore, this process may be iterated numerous times. After synthesis and screening, in vitro and/or in vivo absorption data is produced, creating a new data point which may be added to the model. New information gained during the drug discovery process may thus be used to improve the model and thus the synthesized compounds. [0049]
  • The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention can be practiced in many ways. As is also stated above, it should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the invention with which that terminology is associated. The scope of the invention should therefore be construed in accordance with the appended claims and any equivalents thereof. [0050]

Claims (21)

What is claimed is:
1. A method of estimating the polar surface area of a molecule comprising:
making separate counts of the number of atoms or atom clusters in said molecule which fall within the definition of each of a plurality of atomic classes;
estimating a polar surface area of said molecule using at least some of said separate counts.
2. The method of claim 1, wherein said estimating comprises multiplying at least some of said counts by one or more coefficients, and summing the products thereof.
3. The method of claim 1, wherein said plurality of atomic classes includes one or more of single bonded NH2, double bonded NH, doubly single bonded NH, aromatically bonded NH, aromatically bonded N, triple bonded N, single bonded OH, double bonded O, doubly single bonded O, and aromatically bonded O.
4. The method of claim 3, wherein said plurality of atomic classes includes all of single bonded NH2, double bonded NH, doubly single bonded NH, aromatically bonded NH, aromatically bonded N, triple bonded N, single bonded OH, double bonded O, doubly single bonded O, and aromatically bonded O.
5. A method of estimating the molecular polar surface area of a molecule comprising performing counts of pre-defined atom types and/or atom cluster types present in said molecule and estimating said molecular polar surface area using said counts weighted by coefficients and summed together, without reference to an energy minimized three-dimensional molecular structure of said molecule.
6. The method of claim 5, wherein said pre-defined atom types and/or atom cluster types includes one or more of single bonded NH2, double bonded NH, doubly single bonded NH, aromatically bonded NH, aromatically bonded N, triple bonded N, single bonded OH, double bonded O, doubly single bonded O, and aromatically bonded O.
7. A method of predicting the propensity of a molecule for membrane permeability comprising:
computing a polar surface area for said molecule without reference to an energy minimized three dimensional structure of said molecule; and
using said computed polar surface area to predict said propensity.
8. A method of drug discovery comprising:
computing polar surface areas for a plurality of molecules without reference to an energy minimized three dimensional structure for at least some of said molecules; and
using said computed polar surface area in a membrane permeability prediction model so as to select one or more of said molecules for further analysis.
9. The method of claim 8, additionally comprising computing logP for said plurality of molecules, and using such computed logP in said membrane permeability prediction model.
10. The method of claim 8, comprising prioritizing compounds for synthesis and screening.
11. A computer readable medium having instructions stored thereon which cause a general purpose computer to perform a method of molecular PSA estimation, said method comprising:
making separate counts of the number of atoms or atom clusters in a molecule which fall within the definition of each of a plurality of atomic classes;
retrieving a plurality of coefficients;
multiplying said separate counts by selected ones of said plurality of coefficients and summing the products thereof.
12. A computer implemented drug discovery system comprising:
a PSA estimation module for estimating molecular PSA without reference to three dimensional molecular structures;
a logP estimation module for estimating molecular logP without reference to three dimensional molecular structures;
a molecule selection module for categorizing molecules based on the results of PSA and logP estimation.
13. The system of claim 12, wherein said categorizing comprises separating said molecules into a first class predicted to be readily intestinally absorbed, and a second class predicted to be poorly intestinally absorbed.
14. The system of claim 12, wherein said categorizing comprises ordering at least some of said molecules according to their probabilities of having absorption characteristics similar to known well absorbed compounds.
15. The system of claim 12, wherein said molecule selection module calculates a distance in a PSA-logP plane between molecules being selected and a pre-defined point of said PSA-logP plane.
16. The system of claim 12, additionally comprising a logBB estimation module configured to numerically predict logBB from PSA and logP estimations.
17. The system of claim 16, wherein said categorizing comprises a prioritization of compounds for synthesis and screening.
18. The system of claim 12, wherein said categorizing comprises a prioritization of compounds for synthesis and screening.
19. A system for drug discovery comprising:
a database storing information regarding the atomic constituents and interatomic bonds for a plurality of molecules; and
means for estimating the PSA for each of said plurality of molecules without reference to an energy minimized three dimensional structure for any of said plurality of molecules.
20. The system of claim 19, additionally comprising means for estimating logP for each of said plurality of molecules without reference to an energy minimized three dimensional structure for any of said plurality of molecules.
21. The system of claim 20, additionally comprising means for predicting propensity for intestinal absorption for each of said molecules.
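By way of illustration only, the count-and-weight PSA estimation recited in claims 5 and 11 and the PSA-logP plane distance recited in claim 15 might be sketched as follows. The class names follow the list in claims 4 and 6; everything else is an assumption made for this sketch: the function names, the placeholder coefficient values (which are not fitted values), the use of a scaled Euclidean distance, and the caller-supplied centre point. The atom-class counts are taken as given, since deriving them requires only the atomic constituents and bonds of the molecule and no three dimensional structure.

```python
from math import hypot

# The ten atomic classes named in claims 4 and 6 (identifiers are illustrative).
ATOM_CLASSES = (
    "single_bonded_NH2", "double_bonded_NH", "doubly_single_bonded_NH",
    "aromatic_NH", "aromatic_N", "triple_bonded_N",
    "single_bonded_OH", "double_bonded_O", "doubly_single_bonded_O",
    "aromatic_O",
)

# Hypothetical placeholder coefficients, one per class. These are NOT the
# fitted coefficients; real values would come from a regression against
# reference polar surface areas.
PLACEHOLDER_COEFFS = {name: 10.0 for name in ATOM_CLASSES}


def estimate_psa(counts, coeffs=PLACEHOLDER_COEFFS):
    """Multiply each class count by its coefficient and sum the products."""
    unknown = set(counts) - set(coeffs)
    if unknown:
        raise KeyError(f"no coefficient for classes: {sorted(unknown)}")
    return sum(coeffs[name] * n for name, n in counts.items())


def plane_distance(psa, logp, centre=(0.0, 0.0), scale=(1.0, 1.0)):
    """Scaled Euclidean distance from a pre-defined point in the PSA-logP plane.

    The scaling is an assumption of this sketch: PSA and logP have different
    units, so each axis is divided by a caller-chosen scale before measuring.
    """
    return hypot((psa - centre[0]) / scale[0], (logp - centre[1]) / scale[1])


def rank_by_distance(molecules, centre, scale=(1.0, 1.0)):
    """Order (name, psa, logp) tuples by increasing distance from the centre."""
    return sorted(
        molecules,
        key=lambda m: plane_distance(m[1], m[2], centre, scale),
    )
```

A caller would pass a mapping of class name to count, for example `estimate_psa({"single_bonded_OH": 2, "double_bonded_O": 1})`, and then feed the resulting PSA together with a separately estimated logP to `rank_by_distance` to order candidate molecules for further analysis.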
US10/319,294 2000-04-19 2002-12-13 Prediction of molecular polar surface area and bioabsorption Abandoned US20030114990A1 (en)

Priority Applications (1)

Application Number Publication Number Priority Date Filing Date Title
US10/319,294 US20030114990A1 (en) 2000-04-19 2002-12-13 Prediction of molecular polar surface area and bioabsorption

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/552,549 US6522975B1 (en) 2000-04-19 2000-04-19 Prediction of molecular polar surface area and bioabsorption
US10/319,294 US20030114990A1 (en) 2000-04-19 2002-12-13 Prediction of molecular polar surface area and bioabsorption

Related Parent Applications (1)

Application Number Relation Publication Number Priority Date Filing Date Title
US09/552,549 Division US6522975B1 (en) 2000-04-19 2000-04-19 Prediction of molecular polar surface area and bioabsorption

Publications (1)

Publication Number Publication Date
US20030114990A1 (en) 2003-06-19

Family

ID=24205808

Family Applications (3)

Application Number Status Publication Number Priority Date Filing Date Title
US09/552,549 Expired - Lifetime US6522975B1 (en) 2000-04-19 2000-04-19 Prediction of molecular polar surface area and bioabsorption
US10/270,797 Expired - Lifetime US7113870B2 (en) 2000-04-19 2002-10-11 Prediction of molecular polar surface area and bioabsorption
US10/319,294 Abandoned US20030114990A1 (en) 2000-04-19 2002-12-13 Prediction of molecular polar surface area and bioabsorption

Country Status (6)

Country Link
US (3) US6522975B1 (en)
EP (1) EP1279034A4 (en)
JP (1) JP2004501348A (en)
AU (1) AU2001250005A1 (en)
CA (1) CA2404929A1 (en)
WO (1) WO2001079841A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005529158A (en) * 2002-05-28 2005-09-29 ザ・トラスティーズ・オブ・ザ・ユニバーシティ・オブ・ペンシルベニア Method, system and computer program product for computer analysis and design of amphiphilic polymers
WO2009083020A1 (en) 2007-12-28 2009-07-09 F. Hoffmann-La Roche Ag Assessment of physiological conditions
US8666677B2 (en) * 2009-12-23 2014-03-04 The Governors Of The University Of Alberta Automated, objective and optimized feature selection in chemometric modeling (cluster resolution)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6522975B1 (en) * 2000-04-19 2003-02-18 Pharmacopeia, Inc. Prediction of molecular polar surface area and bioabsorption

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6675136B1 (en) * 1998-11-27 2004-01-06 Astrazeneca Ab Global method for mapping property spaces
US20030100475A1 (en) * 2001-04-05 2003-05-29 Charles Pidgeon Predicting taxonomic classification of drug targets
US20030130799A1 (en) * 2001-04-05 2003-07-10 Charles Pidgeon Structure/properties correlation with membrane affinity profile
US20030120430A1 (en) * 2001-12-03 2003-06-26 Icagen, Inc. Method for producing chemical libraries enhanced with biologically active molecules

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030114991A1 (en) * 2000-04-19 2003-06-19 Egan William J. Prediction of molecular polar surface area and bioabsorption
US7113870B2 (en) * 2000-04-19 2006-09-26 Acclerys Software, Inc. Prediction of molecular polar surface area and bioabsorption

Also Published As

Publication number Publication date
EP1279034A4 (en) 2006-04-19
AU2001250005A1 (en) 2001-10-30
CA2404929A1 (en) 2001-10-25
WO2001079841A1 (en) 2001-10-25
US6522975B1 (en) 2003-02-18
US20030114991A1 (en) 2003-06-19
EP1279034A1 (en) 2003-01-29
JP2004501348A (en) 2004-01-15
US7113870B2 (en) 2006-09-26

Legal Events

Date Code Title Description
AS Assignment

Owner name: PHARMACOPEIA, INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:EGAN, WILLIAM J.;LAURI, GIORGIO;REEL/FRAME:013588/0367

Effective date: 20000418

AS Assignment

Owner name: ACCELRYS SOFTWARE INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:ACCELRYS SOFTWARE SOLUTIONS INC.;REEL/FRAME:015953/0718

Effective date: 20040520

Owner name: ACCELRYS SOFTWARE SOLUTIONS INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:ACCELRYS INC.;REEL/FRAME:015953/0708

Effective date: 20040512

Owner name: ACCELRYS INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PHARMACOPEIA, INC.;REEL/FRAME:015953/0679

Effective date: 20040624

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION