WO2009114591A1 - Method and apparatus for screening drugs for predictors of quantitatively measured events - Google Patents

Method and apparatus for screening drugs for predictors of quantitatively measured events

Info

Publication number
WO2009114591A1
Authority
WO
WIPO (PCT)
Prior art keywords
statistical
drugs
event
associations
data
Prior art date
Application number
PCT/US2009/036752
Other languages
English (en)
Inventor
June Sherie Almenoff
Dana E. Vanderwall
Original Assignee
Smithkline Beecham Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Smithkline Beecham Corporation
Publication of WO2009114591A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70: Machine learning, data mining or chemometrics
    • G16C20/80: Data visualisation

Definitions

  • the present invention relates to the field of drug safety and, more particularly, to methods and apparatus for screening drugs for predictors of quantitatively measured events.
  • An adverse event can therefore be any unfavorable and unintended sign (including an abnormal laboratory finding, for example), symptom, or disease temporally associated with the use of a medicinal product, whether or not considered related to the medicinal product.
  • adverse events refer to any untoward medical occurrence in a patient administered a medicinal product associated with a treatment and which does not necessarily have to have a causal relationship with this treatment.
  • An adverse event can, therefore, be described as any unfavorable and unintended sign (for example, an abnormal laboratory finding), symptom, or disease temporally
  • Adverse event reports, which may be generated from the use of marketed drugs or biological products or from investigational products in clinical trials, generally contain the following four elements: 1) an identifiable patient, 2) an identifiable reporter, 3) a suspect drug or biological product and 4) an adverse event or fatal outcome. More information regarding adverse event reporting is provided at the website fda.gov/medwatch/report/guide2.htm.
  • Adverse events for a drug may be revealed only after the drug is approved or when it is used in conjunction with other therapies.
  • events that are described as adverse for a particular patient group have the potential to provide benefit or efficacy in other clinical situations.
  • toxicity refers to adverse events of medicinal products that are detrimental to a subject's health
  • efficacy refers to situations where untoward medical occurrence may lead to clinical benefit. Because of the limits in characterizing safety profiles of drugs prior to marketing, pharmaceutical manufacturers and regulatory agencies typically collect adverse events reports on the marketed drugs that are used to form databases of adverse events.
  • adverse event reports represent one of the largest sources of information relating specifically to the safety profile of marketed drugs.
  • Adverse event reports are submitted by health professionals and consumers for marketed drugs and by health professionals for investigational drugs in a clinical trial setting. Pharmaceutical companies are typically under legal obligation to provide received reports to various regulatory authorities.
  • monitoring techniques may include a case-by-case examination of newly generated reports, a tabulation of counts of events for specific drugs, and a detailed review of all the data fields, such as any free-text medical narratives of reports associated with a
  • a non-analysis approach tends to be dependent on the knowledgeability and attentiveness of the individual safety reviewers.
  • the present invention relates to methods and systems for screening drugs for predictors of quantitatively measured events. At least one predictor of an event is selected. Data for a plurality of drugs is retrieved, where the data for each of the drugs has an associated quantitative measure of the event. One or more relationships are statistically analyzed between the selected predictor and the event to determine one or more statistical associations among the relationships, using the quantitative measure of the event for each of the plurality of drugs. The statistical associations are determined without any a priori associations between the predictor and the event. The determined statistical associations are presented and include presentation of a measure of statistical significance.
  • Fig. 1 is a functional block diagram illustrating an example system for screening drugs for predictors of quantitatively measured events, according to an embodiment of the present invention
  • Fig. 2 is a flow chart illustrating an example method for screening drugs for predictors of quantitatively measured events, according to an embodiment of the present invention
  • Fig. 3 is a flow chart illustrating an example method for screening drugs for predictors of quantitative measures of adverse events, according to an embodiment of the present invention
  • Figs. 4A and 4B are partial screen layouts for selecting predictors of adverse events, according to an embodiment of the present invention.
  • Fig. 5 is a decision tree diagram illustrating presentation of drug screening results for statistical associations between predictors and an event, according to the method shown in Figs. 2 and 3;
  • Fig. 6A is a scatter plot of drugs associated with tardive dyskinesia as a function of a dopamine receptor, illustrating another presentation of drug screening results, according to the method shown in Fig. 3;
  • Fig. 6B is a portion of a heat map illustrating statistical associations between a plurality of predictors and a plurality of events, according to another example embodiment of the present invention
  • Fig. 7 is a decision tree diagram of the predictor Human Ether a Go-Go
  • Fig. 10 is a decision tree diagram of predefined structural feature predictors of methemoglobinaemia.
  • DETAILED DESCRIPTION OF THE INVENTION
  • aspects of the present invention relate to methods and systems for screening drugs for predictors of quantitatively measured events. At least one predictor of an event is selected.
  • the predictor may include pharmacological assays, biochemical assays and/or compound properties.
  • the event may include an adverse event or an efficacy associated with drug or drug dose. Data for a plurality of drugs is
  • One or more relationships are statistically analyzed between the selected predictor and the event to determine one or more statistical associations among the relationships, using the quantitative measure of the event for each of the plurality of drugs.
  • the statistical associations are determined without any a priori associations between the predictor and the event.
  • the determined statistical associations are presented and include presentation of a measure of statistical significance.
  • the event may include tardive dyskinesia and the predictor may include pharmacological activity at alpha-1 receptor subtypes.
  • One or more statistical associations may be determined and presented between data for drugs including an association with the alpha-1 receptor subtypes and tardive dyskinesia.
  • the present invention uses quantitative measures of events from drugs that exhibit a range of expression of a particular event.
  • the statistical modeling process determines statistical associations, without using a priori information of predictors and associated events, and generates a statistical model.
  • the broad information on drugs that do not exhibit the particular event may contribute to the robustness and specificity of the determined statistical model.
  • the statistical modeling process is typically multivariable in nature and may provide an improved result as compared to testing a single hypothesis about a single predictor and response variable.
  • System 100 may include modeling system 102 and one or more local user devices 104 connected to modeling system 102.
  • local user device 104 is connected to modeling system 102 by a global network, e.g. the Internet (not shown). It is understood that local user device 104 may be coupled to modeling system 102 by any suitable means, including any wired or wireless connection. Although one local user device 104 is shown, it is understood that modeling system 102 may be coupled to a number of local user devices 104.
  • Local user device 104 may include user interface 120 for selecting parameters for modeling statistical associations between predictors and events and display 118 for displaying the parameters and results of the statistical modeling process.
  • local user device 104 is a computer, it
  • local user device 104 may be any suitable device capable of providing selection of parameters and displaying the results of the modeling. It is understood that display 118 may include any display capable of presenting information including textual and graphical information. It is also understood that user interface 120 may be any suitable interface for selecting parameters for modeling statistical associations between predictors and events.
  • the parameters for selection may include event selection, a response variable (i.e. a quantitative measure) associated with an event, one or more predictors (described further below), and a statistical model.
  • Event selection may include selection among types of events such as an adverse event or efficacy.
  • the parameters may also include selection of one or more databases having information from which quantitative measures of events can be determined, for example from among external database(s) 122.
  • events database 110 may include measures of efficacy for a number of drugs. Accordingly, the response variable may be associated with various quantitative clinical pharmacology measures associated with the efficacy of the drugs.
  • events database 110 may include measures of adverse events for a number of drugs. Accordingly, the response variable may be associated with quantitative measures of the adverse events (described further below).
  • Examples of external databases 122 include, but are not limited to, post marketing safety databases, clinical trials databases, administrative medical claims databases, electronic health records databases or the like.
  • the quantitative measure may include, but is not limited to, the empirical Bayes geometric mean (EBGM), the log (base 10) transform of the EBGM (log EBGM), a relative reporting (RR) ratio, a reporting odds ratio (ROR), a proportional reporting rate ratio (PRR), a multi-item gamma Poisson shrinker (MGPS) methodology, a Bayesian confidence propagation neural network (BCPNN) methodology and/or odds ratios or relative risks based on placebo-control, case-control, or controlled cohort studies.
  • EBGM empirical Bayes geometric mean
  • log EBGM log (base 10) transform of the EBGM
  • RR relative reporting
  • ROR reporting odds ratio
  • PRR proportional reporting rate ratio
  • BCPNN Bayesian confidence propagation neural network
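As a rough illustration of two of the simpler disproportionality measures named above, the sketch below computes a PRR and an ROR from a 2x2 table of report counts using their commonly cited definitions. The function names, the count labels `a`, `b`, `c`, `d` and the example numbers are assumptions made for illustration and do not come from the source; the Bayesian measures (EBGM, MGPS, BCPNN) involve shrinkage that is not shown here.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: reports mentioning the drug and the event
    b: reports mentioning the drug with any other event
    c: reports mentioning other drugs and the event
    d: reports mentioning other drugs with any other event
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio from the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
counts = dict(a=20, b=80, c=100, d=9800)
print(round(prr(**counts), 2))
print(round(ror(**counts), 2))
```

Either value could serve as the response variable for a drug and event pair, assuming enough reports exist for the ratio to be stable.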
  • EBGM scores have been calculated, for example, from adverse event data collected in the Food and Drug Administration (FDA) Adverse Event Reporting System (AERS) or World Health Organization's (WHO) Vigibase adverse event reporting systems.
  • FDA Food and Drug Administration
  • AERS Adverse Event Reporting System
  • WHO World Health Organization
  • the EBGM scores are typically obtained for individual "preferred terms." The preferred terms generally refer
  • QT prolongation refers to a lengthening of the QT interval, i.e. the interval between the Q wave and the T wave in an electrocardiogram of the heart.
  • the QT interval is typically a measure of a total duration of electrical activity of ventricular depolarization and repolarization.
  • the predictors may include pharmacological or biochemical assay results represented as continuous or categorical values and/or compound properties.
  • the compound properties may include biochemical or pharmacological properties represented as categorical values, chemical structure- or substructure- based descriptors and other modeled physical or biochemical properties of the compounds.
  • Biochemical assay results refer to the measurement of interactions of molecules (such as drugs) with protein or other molecular targets.
  • the assay results data typically includes assays that measure binding, inhibition or activation responses (referred to as modes) as a result of the interaction, depending on the protein target and type of assay. Some protein targets may be processed in multiple assays and may measure different modes.
  • Assay results are typically quantitatively determined in a standardized form as a result of a dose-response protocol, and a concentration at which 50% of a maximum response is observed.
  • the negative log value of the concentration is typically defined as a pIC50 for antagonist or binding assays and as a pEC50 for activation or agonist assays.
  • activity values are used to provide a quantitative measurement enabling a statistically significant distinction in the response variables of a subset of all entities, or drugs, in an analysis,
  • the activity value of an assay may be used as a binary measure of statistical significance.
  • Biochemical or pharmacological properties of compounds, i.e. drugs, may also be represented by categorical values.
  • the categorical values may be generated by assigning ranges of activity values (such as ranges of pXC50 values) and labeling the compounds with the appropriate category. For example, pIC50 values between 4 and 5.8 may be labeled as "weak," pIC50 values greater than 5.8 and less than or equal to 6.8 may be labeled as "medium," and pIC50 values greater than 6.8 may be labeled as "strong."
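The conversion and binning described above can be sketched as follows. This is a minimal illustration that assumes concentrations are given in mol/L; the function names are hypothetical, and the handling of values below 4 (returning `None`) is an assumption because the example ranges do not cover them.

```python
import math


def pic50_from_ic50(ic50_molar):
    """Negative base-10 log of a half-maximal concentration given in mol/L."""
    return -math.log10(ic50_molar)


def activity_category(pic50):
    """Label a pIC50 value using the example ranges quoted in the text."""
    if pic50 > 6.8:
        return "strong"
    if pic50 > 5.8:
        return "medium"
    if pic50 >= 4:
        return "weak"
    return None  # below the example ranges; this treatment is an assumption


# An IC50 of 100 nM (1e-7 mol/L) corresponds to a pIC50 of about 7.
print(activity_category(pic50_from_ic50(1e-7)))
```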
  • Binary values may also be used, for example, to describe whether a compound is a substrate for a particular enzyme or whether a specific property is included in a compound.
  • the biochemical or pharmacological properties may be determined, for example, from known biological data.
  • Chemical structure- or substructure-based predictors may include abstractions of chemical structures of drug molecules.
  • the chemical structure/substructure- based predictors may include numerical, categorical or binary values that may be statistically associated with a response variable.
  • the structure/substructure predictors may include molecular fragments, atom pairs and topological torsions, Daylight fingerprints algorithms provided by Daylight Chemical Information Systems, Inc. (described at the website daylight.com/dayhtml/doc/theory/theory.finger.html), pharmacophore bit strings, keys that capture an electronic character of each atom and an associated environment in a molecule, as well as user generated fragments to define chemical substructures.
  • the other modeled physical or biochemical properties may include models of physico-chemical and physio-chemical or biochemical properties that are built from chemical structure predictors and subsequently used as predictors.
  • Examples of physical or biochemical properties include solubility, blood brain barrier penetration or hydrophobicity.
  • Modeling system 102 may include controller 106, pre-processing filter 108, events database 110, statistical modeling processor 112, visualization processor 114 and modeling results database 116. It is contemplated that modeling system 102 may include any computer having a processor (e.g. a microprocessor or a dual-core microprocessor) for determining statistical associations between predictors and events and for identifying drugs associated with the predictors. Modeling system 102 may
  • Modeling system 102 may be coupled to one or more external databases 122 for retrieving event data, such as adverse events or clinical pharmacology events for a number of drugs, and may store the event data from external database(s) 122 in events database 110.
  • external database(s) 122 may include any of a number of databases of spontaneous adverse event reports.
  • each major pharmaceutical company may have a proprietary database of reports focused on cases in which one of the company's products was considered to be "suspect" (which may number about 500,000 reports for the larger pharmaceutical companies).
  • the databases maintained by the regulatory agencies may contain as many as about 3,000,000 reports.
  • the databases contain adverse event reports that may include a demographic record (for example, age, gender, a date of the event, a seriousness of the event), one or more drug records (for example, a generic or trade name, a suspect or concomitant designation, a route of administration, a dosage), and one or more records documenting a sign, symptom or diagnosis.
  • Individual databases may also contain narratives (from which event terms were coded), outcomes (e.g., hospitalization, death), and report source (consumer or health professional, domestic or foreign).
  • Controller 106 may parse one or more external databases 122 to retrieve and store data for the events in events database 110.
  • events database 110 may include quantitative information about adverse event reporting, chemical structure information, clinical pharmacology information and physicochemical properties, chemical descriptors (both measured and derived) for a number of drugs.
  • External databases 122 typically include information on drugs that have been approved for human use and are considered to be relatively safe. Accordingly, events database 110 may use quantitative information on marketed drugs.
  • external databases 122 may include multiple versions of a single case. For example, adverse events databases may contain multiple reports because of regulatory requirements to submit a series of reports as additional information about a case becomes available. In addition, there may be other sources of report duplication, such as multiple reports of the same medical event by different manufacturers or reports that may arrive through different pathways (such as from consumers or from manufacturers).
  • controller 106 may control the formatting of events data from external database(s) 122 such that events data from a number of different external databases 122 may be entered into events database 110 in a predetermined format.
  • controller 106 may combine multiple versions of reports into one best representative version to present a uniform view of event data from various data sources. Controller 106 may receive parameters, such as predictors and response variables, from local user device 104 and control pre-processing filter 108 to filter the associated adverse event data in events database 110.
  • pre-processing filter 108 may perform a data aggregation, for example on protein targets that have been processed in multiple assays, that measure a same mode of interaction but with multiple assay technologies. Pre-processing filter 108 may also exclude event data that includes fewer than a predetermined number of reports.
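The two pre-processing steps described above might look roughly like the sketch below. The tuple layout, the function names and the choice of a simple mean to aggregate repeated assays are all assumptions; the source does not say how the aggregation is performed.

```python
from collections import defaultdict
from statistics import mean


def aggregate_assays(results):
    """Combine activity values measured for the same drug, target and mode.

    `results` is an iterable of (drug, target, mode, activity) tuples, where
    several tuples may share a key because different assay technologies were
    used. Averaging is an assumed aggregation rule.
    """
    grouped = defaultdict(list)
    for drug, target, mode, activity in results:
        grouped[(drug, target, mode)].append(activity)
    return {key: mean(values) for key, values in grouped.items()}


def drop_sparse_events(report_counts, min_reports):
    """Keep only (drug, event) pairs backed by at least `min_reports` reports."""
    return {key: n for key, n in report_counts.items() if n >= min_reports}


# Hypothetical inputs for illustration only.
assays = [
    ("drugA", "D2", "binding", 7.0),
    ("drugA", "D2", "binding", 7.4),
    ("drugB", "D2", "binding", 5.0),
]
print(aggregate_assays(assays))
print(drop_sparse_events({("drugA", "nausea"): 12, ("drugA", "rash"): 2}, 5))
```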
  • Controller 106 may provide the pre-processed event data from pre-processing filter 108 to statistical modeling processor 112. Controller 106 may also control visualization filter 114 to modify statistically modeled data determined by statistical modeling processor 112 for display and interpretation. In addition, controller 106 may control display 118 to display the statistically modeled data from modeling processor 112 or the modified modeled data from visualization filter 114.
  • Statistical modeling processor 112 may determine statistical associations among the selected predictors and the quantitative measures of events. In particular, statistical modeling processor 112 may statistically analyze one or more relationships between the selected predictor(s) and the event to determine the statistical associations among the analyzed relationships, by using the quantitative measure of the event for each of the drugs. In an example embodiment, statistical modeling processor 112 may determine one or more statistical associations between EBGM values for a preferred term (i.e., of an adverse event) and the selected predictors. Statistical modeling processor 112 may use any suitable statistical method for determining statistical associations. In general, statistical modeling processor 112 desirably identifies stronger, more probable associations between quantitative measures of events and a predictor. In an example embodiment, recursive partitioning or random forest modeling is used.
  • Other statistical approaches may include: measures of correlation; other multiple tree methods that utilize boosting, bagging, or random splits; multiple linear regression; partial least squares; Fisher's exact test; the chi-squared test; regression analysis; neural networks; and other artificial intelligence methods.
  • Other data mining approaches may include a correlation analysis approach, a principal component analysis (PCA), a multi-dimensional scaling (MDS), as well as graph based analyses.
  • PCA principal component analysis
  • MDS multi-dimensional scaling
  • One example of an alternative data mining approach is the correlation analysis. Using this technique, correlation is used to highlight relationships between the target activities of drug molecules, and then visualization is used to look for patterns in those relationships. According to an embodiment of the present invention, a Pearson's correlation is performed, with no a priori associations between the predictors and the events.
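A minimal sketch of the correlation step is given below: every predictor is correlated with every event, so that no pair is singled out in advance. The data layout (one value per drug, in the same drug order for every column), the column names and the numbers are hypothetical. A column with zero variance would raise a `ZeroDivisionError` in this simplified version.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(predictors, events):
    """Correlate every predictor column with every event column."""
    return {
        (p_name, e_name): pearson_r(p_values, e_values)
        for p_name, p_values in predictors.items()
        for e_name, e_values in events.items()
    }


# Hypothetical values for four drugs, for illustration only.
predictors = {"d2_pic50": [5.0, 5.5, 7.5, 8.0], "herg_pic50": [6.0, 4.0, 6.2, 4.1]}
events = {"tardive_dyskinesia_log_ebgm": [0.1, 0.2, 1.1, 1.3]}
for pair, r in correlation_table(predictors, events).items():
    print(pair, round(r, 3))
```

The resulting table of r-values is the kind of matrix that could then be colored as a heat map or clustered.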
  • a 'heat map' (for example, shown in Fig. 6B) is commonly used in genomic and biomarker analysis, and enables a broad view of the relationships of many entities in terms of many columns of variables.
  • a heat map relates to a table of data arranged in a grid of cells, where the cells are colored based on the value in that cell, transforming a table (a.k.a. grid or matrix), of numbers to a table of colors.
  • a column may be sorted to identify objects with similar values, but as the number of variables grows, simple sorting may no longer be able to group entities with similar profiles across all of the variables.
  • a typical data mining approach is to organize the table, so as to group entities with
  • a heat map allows for identifying patterns and groups visually.
  • Clustering methods are typically used to group the objects by their similarity to one another.
  • hierarchical (or agglomerative) clustering methods are used.
  • One clustering approach includes clustering the table, where each row is one drug, and includes a column for each event being analyzed. Clustering may organize the rows so that drugs with similar patterns of events are grouped together. Optionally, the clustering can also be used to organize the columns, so that events that occur in similar drugs may be grouped together.
  • clustering may be performed on a determined correlation matrix.
  • the rows and columns are rearranged to group similar rows and columns with one another.
  • the events and predictors may have similar patterns and may be grouped together for a number of different reasons. For example: (a) the grouped events might actually be describing what a medical expert would consider to be the same phenomena, and so reflect some redundancy in the medical dictionary used for adverse event coding; (b) the grouped events may be somewhat distinct medical implications, but may be driven from the same underlying biology or toxicity; (c) the grouped events may reflect the most notable adverse events of a particular class of drugs, but may or may not share a common mechanism; (d) the grouped predictors may reflect that the drugs that bind to one target assay generally also bind to another target assay; and (e) the predictors may be grouped together because they may be part of a more general biological network or system, and may reflect several different specific mechanisms of causing an adverse biological response or toxicity.
  • the additional analysis may include either or both model building using recursive partitioning and random forests on one or more preferred terms, as well as additional literature research to provide more background and context to the hypothetical relationships identified.
  • Although hierarchical (or agglomerative) clustering methods are described, it is understood that other clustering methods, for example partitional methods and self-organizing maps, may also be used.
  • Although heat maps are described, it is understood that other visualization approaches may be used.
  • profile plots i.e. parallel coordinate plots
  • a related analysis and visualization approach is graph-based analysis.
  • the similarities between all entities are calculated based on one or more columns of variables, and where the similarity is above a particular threshold a link between the entities is created.
  • a network is drawn where the objects are connected by their links to one another, and an algorithm is used to organize the graph to facilitate the identification of groups with many common links, or conversely that are distinct with few if any links.
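The link-building step can be sketched as below. The Jaccard similarity over feature sets and the use of connected components as a stand-in for the graph layout algorithm are assumptions chosen for brevity; the function names and example profiles are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity of two feature sets: shared features over all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(entities, similarity, threshold):
    """Link every pair of entities whose similarity is above the threshold."""
    graph = {entity: set() for entity in entities}
    for a, b in combinations(entities, 2):
        if similarity(a, b) > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_groups(graph):
    """Return the sets of entities that are joined by chains of links."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical feature sets (for example, fingerprint bits) for three drugs.
profiles = {"drugA": {1, 2, 3}, "drugB": {2, 3, 4}, "drugC": {9}}
graph = build_graph(list(profiles), lambda a, b: jaccard(profiles[a], profiles[b]), 0.4)
print(connected_groups(graph))
```

Here drugA and drugB end up linked while drugC stays isolated, which is the kind of grouping a layout algorithm would make visible.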
  • Different types of graph analyses include: (a) calculating a similarity between compounds based on chemical structure (using atom pairs & topological torsions, or Daylight fingerprints), and then coloring the objects (compounds) based on the value of specific events (for example using hepatotoxicity, prolonged QT, Stevens-Johnson Syndrome); (b) creating a graph based on links between any pair of predictors, any pair of events, and any event-predictor pair above a threshold of either an r-value (i.e. a correlation coefficient) or a p-value (i.e. the probability of an observed result happening by chance rather than due to an actual relationship) from a Pearson's correlation calculation; and (c) creating a network using only the links between predictors and events.
  • an r-value i.e. a correlation coefficient
  • p-value i.e. the probability of an observed result happening by chance rather than due to an actual relationship
  • statistical modeling processor 112 provides statistical processing that may determine chemical, biological, or physical properties of drugs which may be related to quantitative measures of events of the drugs.
  • the quantitative data may be selected such that it has some statistical significance in the explanation or prediction of the event.
  • Recursive partitioning and random forest methods typically provide a multivariate analysis of predictors and the response variable and provide a classification and prediction that may identify relationships among predictive features. See, for example, an article by R.A. Berk, entitled "An Introduction to Ensemble Methods for Data Analysis," Department of Statistics, UCLA, July 25, 2004 and the website stat.berkeley.edu/~breiman/RandomForests/cc_home.htm.
  • Recursive partitioning typically determines a decision tree structure in a feature space. Each node of the tree may define a subset of compounds that share similar features and have relatively homogenous response values. Under recursive partitioning, a search may be performed to determine a best partition (i.e. a split) among the compounds. For example, a single feature may be determined that is used to define two groups of compounds having a substantially different response distribution. Accordingly, a root node (i.e. a parent node) may be split into two initial "leaf" nodes. A search on the current leaf nodes may also be performed to determine a further best split by a single feature to provide a most different response distribution.
  • a best partition i.e. a split
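The split search and the recursive growth of the tree can be sketched as follows. This is an illustrative simplification: the split criterion used here is the within-group sum of squares of the response, whereas the text elsewhere refers to splits driven by a p-value, and all names and data are hypothetical.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, response, features):
    """Find the single feature and threshold giving the most homogeneous groups.

    rows: list of dicts mapping feature name -> numeric value, one per compound
    response: list of numeric response values, aligned with rows
    Returns (feature, threshold, score), or None if no feature varies.
    """
    best = None
    for feature in features:
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, response) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, response) if row[feature] > threshold]
            score = sum_sq(left) + sum_sq(right)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


def grow_tree(rows, response, features, max_depth=2):
    """Recursively split nodes until max_depth or until no split is possible."""
    node = {"n": len(response), "mean": sum(response) / len(response)}
    split = best_split(rows, response, features) if max_depth > 0 else None
    if split is None:
        return node
    feature, threshold, _ = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    node["feature"], node["threshold"] = feature, threshold
    node["left"] = grow_tree([rows[i] for i in left], [response[i] for i in left],
                             features, max_depth - 1)
    node["right"] = grow_tree([rows[i] for i in right], [response[i] for i in right],
                              features, max_depth - 1)
    return node


# Hypothetical assay values and response values for four compounds.
rows = [{"d2": 5.0, "herg": 6.0}, {"d2": 5.5, "herg": 4.0},
        {"d2": 7.5, "herg": 6.2}, {"d2": 8.0, "herg": 4.1}]
response = [0.1, 0.2, 1.1, 1.3]
print(best_split(rows, response, ["d2", "herg"]))
```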
  • Random forest models are typically based on recursive partitioning but build a model that may include a collection of many trees, for example, hundreds of trees. Random forests are typically better at prediction as compared to recursive partitioning but may be more difficult to interpret or visualize. For example, to generate a random forest of 500 trees, 500 subsets of available features may be randomly selected. A recursive partitioning tree may be constructed based on each feature subset. A random subset of compounds may also be reserved to evaluate the performance of the model.
  • a prediction may be generated by processing each compound through all 500 trees and then forming an averaged result for all of the trees.
  • models determined using a random forest method are typically more predictive, because these models are developed to work around, or to be more robust to, artifacts or weaknesses of a single tree (as in recursive partitioning).
  • single trees may not be as predictive as random forests, but
  • splits in single trees are typically driven by a most significant p-value, such that another predictor which is similar may not be sampled. Because one predictor may be a surrogate for another predictor and because data sets may not be as balanced and diverse as desired, the determined most significant descriptor (by recursive partitioning) may not necessarily be the best predictor.
  • random forests use hundreds of trees and are typically generated by omitting a certain fraction of data and predictors for different trees in order to emphasize other predictors and data in the statistical modeling, in order to reduce the effects of surrogate predictors.
  • the subsequent prediction of a substantially toxic or of a substantially efficacious compound by random forest is typically an average prediction of hundreds of trees.
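The averaging idea can be illustrated with the sketch below. To stay short, each tree here is a one-split stump fitted on a random subset of features and compounds, and the split criterion is the within-group sum of squares; both are simplifying assumptions, as are the function names, parameter defaults and data.

```python
import random


def fit_stump(rows, response, features):
    """Fit a one-split tree on the given features and return it as a function."""
    def sum_sq(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best = None
    for feature in features:
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, response) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, response) if row[feature] > threshold]
            score = sum_sq(left) + sum_sq(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold,
                        sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no feature varies: fall back to the overall mean
        overall = sum(response) / len(response)
        return lambda row: overall
    _, feature, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, response, features, n_trees=500, n_features=1,
               sample_frac=0.75, seed=0):
    """Build n_trees stumps, each on a random subset of features and compounds."""
    rng = random.Random(seed)
    n_sample = min(len(rows), max(2, int(sample_frac * len(rows))))
    trees = []
    for _ in range(n_trees):
        chosen = rng.sample(features, n_features)
        idx = rng.sample(range(len(rows)), n_sample)
        trees.append(fit_stump([rows[i] for i in idx],
                               [response[i] for i in idx], chosen))
    return trees


def predict(trees, row):
    """Average the predictions of all trees for one compound."""
    return sum(tree(row) for tree in trees) / len(trees)


# Hypothetical assay values and response values for four compounds.
rows = [{"d2": 5.0, "herg": 6.0}, {"d2": 5.5, "herg": 4.0},
        {"d2": 7.5, "herg": 6.2}, {"d2": 8.0, "herg": 4.1}]
response = [0.1, 0.2, 1.1, 1.3]
forest = fit_forest(rows, response, ["d2", "herg"], n_trees=500)
print(round(predict(forest, {"d2": 7.8, "herg": 5.0}), 3))
```

Because each tree sees different predictors and compounds, a predictor that merely acts as a surrogate in one tree has less influence on the averaged prediction.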
  • Fig. 6A shows a graph of EBGM as a function of dopamine affinity.
  • statistical modeling of event reports may determine the occurrence of statistical associations among predictors and the event. To determine the occurrence of statistical associations, no a priori assumptions are used in the statistical modeling with respect to relationships between the predictors and the events. In addition, drugs that both are and are not associated with events, and broad information on molecules that do and do not exhibit a particular event may be included in the statistical modeling, and may help to contribute to the robustness and specificity of the models.
  • modeling system 102 may include visualization filter 114 that receives the modeled data from statistical modeling processor 112 and formats the modeled data in a manner suitable for display, such as in a decision tree diagram (Fig. 5), as a scatter plot (Fig. 6A) or as a heat map (Fig. 6B).
  • the modeled data may be analyzed to determine whether the identified statistical associations are biologically or chemically relevant and interpretable with respect to the model generated by statistical modeling processor 112. Accordingly, the modeled
  • Modeling system 102 may include modeling results database 116 for storing the resulting model, the resulting statistical associations determined by statistical modeling processor 112 and/or optimized statistical associations from visualization processor. Controller 106 may verify the generated statistical model stored in modeling results database 116. For example, the model may be processed for the same predictors and results variables using another external database 122 or to compare previously determined modeling results with modeling results of other related medical events (for example, tardive dyskinesia and extrapyramidal symptoms).
  • a suitable display 118, user interface 120, external database(s) 122, events database 110, modeling results database 116, controller 106, pre-processing filter 108, statistical modeling processor 112, and visualization processor 114 will be understood from the description herein.
  • Fig. 2 is a flowchart illustrating an example method for screening drugs using a quantitative measure of events, according to an embodiment of the present invention.
  • drugs may be screened for efficacy using quantitative measures of events such as from a clinical pharmacology database.
  • predictors are selected, for example, by user interface 120 (Fig. 1).
  • an event is selected. For example, if events are associated with efficacy, various treatments of a particular disease may be selected.
  • although steps 200 and 202 are illustrated as being performed sequentially, it is contemplated that steps 200 and 202 may be performed in a reverse order or concurrently.
  • a statistical modeling method may be selected, such as recursive partitioning.
  • pre-processing of the data may be applied.
  • events stored in events database 110 may be processed by pre-processing filter 108 (Fig. 1), which conditions the data to disregard sparsely populated events.
  • at step 208, statistical modeling is applied to determine one or more statistical associations between the predictors selected at step 200 and the event
  • one or more relationships are statistically analyzed among the selected predictor(s) and the event to determine the statistical associations among the analyzed relationships, by using the quantitative measure of the event for each of the drugs.
  • drugs that are associated with the statistically significant predictors are identified. Accordingly, either the most toxic or the most efficacious compounds associated with an event may be identified based on the statistical associations.
  • although steps 208 and 210 are illustrated as being performed sequentially, it is understood that steps 208 and 210 may be performed concurrently.
  • statistical associations are displayed, including a measure of significance, for example, by display 118 (Fig. 1).
  • the measure of significance may be a quantitative measure, such as an activity, or a binary measure of significance, such as the inclusion or exclusion of specific chemical structures/substructures.
  • drugs identified as being associated with an event may be inspected for biological relevance with respect to the event. In this manner, identified drugs that do not exhibit biological relevance may be excluded from the screened drugs.
  • Fig. 3 is a flowchart illustrating an example method for screening drugs based on adverse events.
  • the following figures illustrate diagrams useful for illustrating the example method shown in Fig. 3 :
  • Figs. 4A and 4B illustrate respective partial screen layouts 402A, 402B of parameter input screen layout 400 for selecting predictors;
  • Fig. 5 is a decision tree diagram 500 illustrating presentation of statistical associations between predictors and adverse events;
  • Fig. 6A is a scatter plot of drugs associated with a particular adverse event, tardive dyskinesia, as a function of a quantitative activity of a dopamine receptor, illustrating another presentation of statistical associations between a predictor (the dopamine receptor) and an adverse event (tardive dyskinesia);
  • Fig. 6B is a portion of a heat map illustrating statistical associations between a plurality of predictors and a plurality of events, according to another example embodiment of the present invention.
  • predictors are selected, for example, by user interface 120
  • predictors may include biochemical assay results, biochemical or pharmacological properties, chemical structure or substructure-based descriptors and other modeled physical or biochemical properties of compounds.
  • Partial input screen layout 402A includes a feature quantity parameter 404 for selecting a feature in a percentage of compounds, molecular fragments section 406 for selecting molecular fragments such as molecular substructures, and chemical descriptors section 408.
  • Parameter selection 410 represents one or more calculated or modeled physical or biochemical properties.
  • Parameter selection 412 (Fig. 4B) represents a source of target assay data.
  • Parameter selection 414 represents a source of data for P450 enzyme inhibition and enzyme substrates.
  • Parameter selection 416 represents a chemical descriptor that includes a combination of structural and electronic character (i.e.
  • the electrotopological state is associated with contributions of individual electronegativities of atoms in a molecule or fragment, the valence (or bonding) of those atoms and the electronegativities of the other atoms in the molecule or fragment
  • the chemical descriptor is described as a continuous numeric value.
  • an adverse event is selected, for example, tardive dyskinesia, a neurological disorder.
  • although steps 300 and 302 are illustrated as being performed sequentially, it is contemplated that steps 300 and 302 may be performed in a reverse order or concurrently.
  • a statistical modeling method may be selected, such as recursive partitioning.
  • data aggregation may be performed.
  • a common biological or pharmacological target assay may have been processed in multiple assays that measure a same mode of interaction (i.e. inhibition) but with different assay technologies.
  • target refers to proteins, or more specifically, a gene product, or complex of gene products which are thought to have potential therapeutic benefit if their activity, function, or state can be modulated (e.g., inhibited, activated, increased or decreased) by a candidate drug molecule. Not all proteins produced in the human proteome (the protein equivalent of the genome) are thought to have potential for a therapeutically beneficial modulation.
  • the assays can use either isolated and highly purified forms of the proteins, or can use engineered cellular systems which enable the protein to be expressed and to be studied in a cellular context. This cellular context often allows for the presence of the other proteins or cellular components that the biological function of the target naturally uses, while still utilizing an engineered method of measuring a response resulting from the interaction of a small molecule specifically with the target.
  • sparsely populated adverse event data may be filtered, for example, by pre-processing filter 108 (Fig. 1).
  • the sparsely populated data may be filtered to optimize computational efficiency and to avoid identifying biologically implausible associations.
  • columns of predictor values may be excluded that include fewer than about 5 compounds, where a difference between a minimum pXC50 value in the column and a maximum pXC50 value in the column is less than about 0.5.
  • a standard deviation of the pXC50 values in a column is less than about 0.1 and a maximum pXC50 value in a column is less than about 5.
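  • A column filter along these lines might look like the sketch below. The thresholds (about 5 compounds, a range of about 0.5, a standard deviation of about 0.1, a maximum of about 5) come from the text; how the criteria combine is not fully stated in this passage, so the sketch assumes that failing any one of them excludes the column. The function name, the use of `None` for untested compounds, and the choice of population standard deviation are also assumptions.

```python
from statistics import pstdev


def keep_column(values, min_compounds=5, min_range=0.5, min_std=0.1, min_max=5.0):
    """Decide whether a predictor column of pXC50 values is worth modeling.

    `values` holds one entry per drug, with None where the drug was not tested.
    Assumption: any single failed criterion is enough to drop the column.
    """
    measured = [v for v in values if v is not None]
    if len(measured) < min_compounds:
        return False          # too sparsely populated
    if max(measured) - min(measured) < min_range:
        return False          # almost no spread between weakest and strongest
    if pstdev(measured) < min_std:
        return False          # values are nearly constant
    if max(measured) < min_max:
        return False          # no compound shows meaningful activity
    return True


# Hypothetical assay columns, for illustration only.
print(keep_column([6.1, 7.2, 5.4, 8.0, 6.6]))        # informative column
print(keep_column([6.1, None, None, 7.0]))           # too few measurements
print(keep_column([4.0, 4.1, 4.2, 4.6, 4.9]))        # nothing above about 5
```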
  • although steps 306 and 308 are illustrated as being performed in order, it is understood that steps 306 and 308 may be performed in a reverse order or concurrently.
  • at step 310, statistical modeling is applied to determine one or more statistical associations between the predictors selected at step 300 and the adverse event.
  • at step 312, drugs are identified that are associated with the statistically significant predictors.
  • although steps 310 and 312 are illustrated as being performed sequentially, it is understood that steps 310 and 312 may be performed concurrently.
  • visualization filtering may be applied, for example, by visualization filter 114 (Fig. 1).
  • the generated statistical model may be interpreted by suitable skilled persons in biological, pharmacological, medical and chemical sciences to understand the relationships in the larger context of medicine and drug discovery, and to verify that the relationships and descriptors identified are meaningful and reasonable. The level and type of interpretation possible may depend on the statistical method that is used. For example, some methods may be evaluated via performance metrics such as predictive ability, whereas others may be evaluated, for example, graphically in more detail.
  • An analysis of the modeling results may include considerations of model quality and robustness, as well as an inspection of the suggested relationships and the predictors identified by the pharmacological and biological implications of the models. Suggested statistical associations may be verified such that the relationships are not unduly influenced by artifacts in the data set. In cases where chemical descriptors are used, for example, they may be verified to identify parts of the molecules which can reasonably be expected to contribute to the chemical and pharmacological properties of the molecules. In the cases where measurements from biochemical assays are used as predictors, other considerations may be used.
  • assay results are not an intrinsic property of the molecules.
  • not all molecules may be tested in all assays, such that there may be a degree of sparseness to the data. Therefore, the biological activity predictors that are suggested by a statistical model may be verified to identify meaningful predictors. The verification may determine whether the predictors have been selected simply because a preponderance of the molecules in the active subset have been tested in one of the assays, whereas on average the data for that assay is sparse.
  • a decision tree diagram 500 is shown as an example of presenting drug screening results for an adverse event, such as an adverse event selected at step 302.
  • Parent node 502 includes all drugs 504 in a training or analysis set. Drugs 504 may be associated with different predictor variables that are illustrated in this example with different shapes and patterns.
  • the statistical modeling processing identifies drugs 504 that have a higher quantitative score (for example EBGM) for a particular adverse event and a statistical association with one or more predictors (such as a biological target activity, a chemical descriptor or substructure) that are most strongly associated with the quantitative score for this adverse event.
  • in decision tree diagram 500, four drugs 504a-504d are identified in node 506 at split 518 for condition 514 as being associated with a specific predictor (a "triangle" feature).
  • the remaining drugs shown in node 508 are also processed to identify other features in the remaining drugs that are associated with an adverse event.
  • in decision tree diagram 500, three drugs 504e-504g are identified in node 510 at split 520 for condition 516 as being associated with predictors of the adverse event.
  • the remaining drugs in node 512 may have a low association with predictors of an adverse event.
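  • The peel-off structure of Fig. 5 can be imitated with a short recursive routine. This is a simplified sketch under stated assumptions: features are binary (present or absent), the split is chosen by the difference in mean score rather than by a p-value, and the feature names, scores and the `min_gain` stopping rule are all hypothetical.

```python
from statistics import mean


def partition(drugs, features, min_gain=1.0):
    """Recursively split off the feature whose carriers score highest on average."""
    node = {"n": len(drugs), "mean": mean(d["score"] for d in drugs)}
    best = None
    for feature in features:
        with_f = [d for d in drugs if feature in d["features"]]
        without = [d for d in drugs if feature not in d["features"]]
        if not with_f or not without:
            continue  # this feature cannot split the node
        gain = mean(d["score"] for d in with_f) - mean(d["score"] for d in without)
        if best is None or gain > best[0]:
            best = (gain, feature, with_f, without)
    if best is not None and best[0] >= min_gain:
        _, feature, with_f, without = best
        node["split"] = feature
        node["yes"] = {"n": len(with_f), "mean": mean(d["score"] for d in with_f)}
        # Keep processing the remaining drugs with the remaining features.
        remaining = [f for f in features if f != feature]
        node["no"] = partition(without, remaining, min_gain)
    return node


# Hypothetical drugs: four carry a "triangle" feature, three a "stripe" feature.
drugs = (
    [{"features": {"triangle"}, "score": s} for s in (8.0, 9.0, 7.0, 8.0)]
    + [{"features": {"stripe"}, "score": s} for s in (4.0, 5.0, 4.0)]
    + [{"features": set(), "score": s} for s in (1.0, 1.0, 1.0)]
)
tree = partition(drugs, ["triangle", "stripe", "dot"])
print(tree["split"], tree["yes"]["n"])              # first split and its node size
print(tree["no"]["split"], tree["no"]["yes"]["n"])  # second split on the remainder
```

  • With this toy data the first split isolates the four "triangle" drugs and the second isolates three further drugs, loosely mirroring nodes 506 and 510; the drugs left over correspond to a low-association node such as node 512.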
  • scatter plot 600 may also be used to illustrate statistical associations between predictors and an adverse event.
  • Scatter plot 600 illustrates an association between tardive dyskinesia and the dopamine-2 receptor subtype (D2) antagonism.
  • Drugs 602 are plotted according to an EBGM score for tardive dyskinesia (y-axis) and dopamine D2 receptor activity (x-axis), based on the statistical modeling processing as described above.
  • scatter plot 600 may provide a simple graphical approach that displays an adverse event as a function of predictor variables for all of the drugs in a dataset. For example, the results may be interpreted to determine a multitude of factors that may influence the level of adverse events.
  • heat map 640 (a portion of which is shown in Fig. 6B) may also be used to illustrate statistical associations between a plurality of predictors and a plurality of events, according to another embodiment of the present invention.
  • Heat map 640 includes adverse events along the x-axis, predictors, such as biological assays, on the y-axis, a grid 642 indicating any statistical associations between predictors and adverse events, and a measure of significance 644.
  • Grid 642 presents the measure of significance between an adverse event and the predictors. In this manner, any patterns may be observed for a number of adverse events and a number of predictors.
  • heat map 640 indicates a pattern 646 between several target assays and several adverse events.
  • Heat map 640 may be generated using pairwise methods such as a Pearson's correlation, a chi-squared test, a Fisher's exact test, and the like, to determine statistical associations between a plurality of predictors and a plurality of events.
  • statistical modeling processor 112 may compare each column of predictor values (for example, assay data) to each column of quantitative measures of events (for example, EBGM values) for a list of drugs, in order to find statistical relationships between the predictors and the quantitative measures of events, with no a priori assumptions regarding relationships between the predictors and the events.
  • the identified statistical associations may be provided as a list according to a pairwise relationship with a measure of significance 644 (such as a score, a p-value, an r value (i.e. a Pearson's product-moment correlation coefficient), etc).
  • the measure of significance 644 is used to sort, filter and interrogate the resulting identified statistical associations.
  • the list may be converted into a matrix format, as shown in heat map 640, with events along the x-axis and predictors (such as assays) on the y-axis. Alternatively, the predictors may be provided along the x-axis and the events may be provided along the y-axis.
  • the statistical associations are determined and populated in grid 642 with the measure of significance 644 (presented, for example, according to color) in the intersecting cells. In this manner, an all-vs.-all view of the identified statistical associations is provided and may be used to identify patterns in the statistical associations.
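  • The all-vs.-all comparison can be sketched as follows, using Pearson's r as the measure of significance. This is only an illustration: the column names and values are hypothetical, missing data are represented by `None`, and the minimum of three paired values is an arbitrary choice made for the sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's product-moment correlation; None if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def association_grid(predictors, events, min_pairs=3):
    """Compare every predictor column with every event column.

    Both arguments map a name to a list with one value per drug (same drug order),
    using None where a value is missing. Returns {(predictor, event): r or None}.
    """
    grid = {}
    for p_name, p_col in predictors.items():
        for e_name, e_col in events.items():
            pairs = [(p, e) for p, e in zip(p_col, e_col)
                     if p is not None and e is not None]
            if len(pairs) >= min_pairs:
                grid[(p_name, e_name)] = pearson(*zip(*pairs))
            else:
                grid[(p_name, e_name)] = None   # too sparse to score
    return grid


# Hypothetical assay and EBGM columns for five drugs.
predictors = {"D2": [8.0, 7.5, 5.0, 4.5, None], "HERG": [5.0, 5.1, 4.9, 5.0, 6.8]}
events = {"tardive dyskinesia": [6.0, 5.0, 1.0, 0.8, 1.2]}
grid = association_grid(predictors, events)
for cell, r in grid.items():
    print(cell, r)
```

  • Each entry of `grid` corresponds to one intersecting cell of a grid such as grid 642; sorting the entries by the absolute value of r gives the ranked pairwise list described above.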
  • at least one of the prediction results and the generated statistical model may be stored, for example in modeling results database 116 (Fig. 1).
  • the generated statistical model may be applied to other similar adverse events (i.e. for events in the same database, such as events database 110).
  • prediction results may be compared with different adverse events databases, such as external database 122 (Fig. 1), using the generated statistical model stored in modeling results database 116.
  • a node purity and the predictability of a model may be adversely affected by drugs that have a predictive feature but do not exhibit a clinical toxicity for reasons related, for example, to absorption, indication, an amount of use and a route of administration.
  • a drug that has HERG binding which is used only as an occasional topical ointment may not be likely to have cardiac toxicity, and would therefore lower the statistical association of the toxicity response with the predictor.
  • adverse event scores may be confounded by underlying conditions in the patient.
  • seizure patients may report seizures when being treated with anti-convulsant drugs, and these seizures may represent a baseline disease rather than adverse events.
  • statistical modeling according to the present invention may be more robust when used to analyze toxicities that are typically drug related (such as torsades de pointes, tardive dyskinesia, etc.).
  • the predictor variable may show an association as opposed to a causation for a toxicity. If all drugs in the analysis have high affinity for D2, for example, the drugs may also have another property that is highly correlated with D2 in the drugs analyzed (a collinearity with a similar but distinct receptor, i.e. the dopamine-3 receptor subtype (D3)), so that it may be difficult to distinguish which of the two variables is more predictive. Additionally, D2 may be associated with another variable that has not been measured or included in the model and thus may represent a surrogate for the other predictor.
  • statistical modeling has been done on between about 1000 and 2000 marketed drugs.
  • the data derived from these molecules may not be generalizable to all molecules, because it is taken from pools of compounds that are relatively safe, as they have been approved for use in humans.
  • recursive partitioning was used to demonstrate the modeling of predictor and response variables in a decision-tree format (such as shown in Fig. 5).
  • the recursive partitioning processing identifies drugs that have a higher EBGM score for a particular adverse event and a statistical association with a predictor variable (such as a biological target activity, a chemical predictor or a substructure) and that are most strongly associated with the EBGM for this adverse event.
  • the EBGM score in each node represents an average EBGM for all compounds in the respective node.
  • the analysis may yield a result where several different predictor variables may partition the data in a similar way, such as to yield a similar result at a given decision point (i.e., a split such as split 518 or split 520).
  • Decision points where multiple predictors yield a similar solution may be referred to as having primary and surrogate splits. In some instances, the multiple solutions may provide similar results.
  • the primary split is typically the split that has a highest statistical significance, but the surrogate splits may have similar statistical significance.
  • the primary split is displayed. If the predictor (for example the "triangle" predictor) shown as the primary split is biologically or chemically relevant and interpretable with respect to the model, then the primary split is displayed. If the primary split is not scientifically relevant (for example, a receptor binding that has a very low affinity or a chemical structure that is not pharmacologically meaningful), then the secondary splits are inspected in rank order of significance. Accordingly, the predictor that is most statistically significant, most scientifically relevant to the model and which is present in a majority of molecules within the node may be displayed. By using this approach, both the statistical and biological relevance of the model may be optimized.
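  • The selection rule for which split to display can be written as a short loop. The sketch below assumes each candidate split carries a p-value and a `coverage` value (the fraction of molecules in the node that have the predictor), and that scientific relevance is supplied by a caller-provided check, since that judgement is made by skilled persons rather than computed; all names and numbers are hypothetical.

```python
def choose_displayed_split(candidates, is_relevant, min_coverage=0.5):
    """Pick the split to display from the primary split and its surrogates.

    Candidates are inspected in rank order of significance (smallest p-value
    first). The first one that is judged scientifically relevant and is present
    in a majority of the molecules in the node is returned.
    """
    for candidate in sorted(candidates, key=lambda c: c["p_value"]):
        if is_relevant(candidate["predictor"]) and candidate["coverage"] > min_coverage:
            return candidate["predictor"]
    return None  # nothing suitable: leave the node without a displayed split


# Hypothetical candidates for one decision point.
candidates = [
    {"predictor": "very_low_affinity_receptor", "p_value": 0.001, "coverage": 0.9},
    {"predictor": "D2", "p_value": 0.002, "coverage": 0.8},
    {"predictor": "D3", "p_value": 0.004, "coverage": 0.3},
]
relevant = {"D2", "D3"}
print(choose_displayed_split(candidates, lambda name: name in relevant))
```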
  • Fig. 7 is a decision tree diagram 700 that illustrates an example of a statistical association between the Human Ether a Go-Go (HERG) cellular ion channel
  • a false statement for the condition described at each split (for example, condition 708 at split 710) is represented by the drugs in the node to the left of the split (i.e. node 706).
  • a true statement for condition 708 is represented by the drugs in node 704 to the right of split 710.
  • in Figs. 7-10, the predictor(s) indicated by a condition (for example, HERG in condition 708) are determined to provide the most statistically significant partitioning of the drugs (for example, in parent node 702) in terms of the difference in the response variable (i.e. the quantitative measure EBGM of an adverse event) between the set of drugs that are part of node 704 and the remaining drugs that are part of node 706.
  • n represents a number of drugs in a node.
  • in Figs. 7-10, substantially significant relationships between predictors and an adverse event are illustrated by the average EBGM score of the drugs in a node (for example, nodes 704, 706).
  • the significant relationships may also be emphasized by the presentation of the nodes, such as by color, line style and/or fill of the node.
  • a thicker solid line for a node (such as node 804 in Fig. 8) represents a very significant relationship and a thicker dashed line for a node (such as node 816 in Fig. 8) represents a significant relationship.
  • the line style may be determined, for example, based on the average EBGM score for nodes 804 and 816 as compared with nodes that do not have significant relationships for the adverse event (i.e. node 806)
  • torsades de pointes was analyzed as the adverse event. Torsades de pointes is a medically life-threatening cardiac arrhythmia that is associated with the use of certain medications.
  • parent node 702 includes a set of 1337 drugs that were analyzed from the FDA-AERS database to determine whether any of the pharmacology as measured by in vitro target assays or metabolizing enzyme activities were associated with torsades de pointes. The results from about 450 macro-molecular target assays (using standard functional or receptor binding assays and enzyme activity assays) and more than 150 metabolizing enzymes (for example, see the website druginteractioninfo.org) were included in the model to identify one or more predictive targets.
  • Parent node 702 contains 1337 marketed drugs, whose average EBGM is 1.4. Of the hundreds of potential predictors, only one target was identified as being predictive of torsades de pointes.
  • the target identified by the model was the HERG cellular ion channel, shown in condition 708 (i.e. "HERG/ERG, pIC50 >6.41").
  • Decision tree diagram 700 includes a quantitative measure of significance for HERG, represented by the pIC50 value shown in condition 708.
  • the modeling algorithm identified 1324 drugs in node 706 that bind to HERG with a pIC50 of less than 6.41 as having lower risk of torsades de pointes.
  • the modeling algorithm also identified 13 drugs in node 704 with a pIC50 of greater than or equal to 6.41 as having higher risk of torsades de pointes.
  • the algorithm identified drugs with an affinity for HERG greater than or equal to 6.41 as having an approximately 12.5-fold increased risk reporting for torsades de pointes compared to 1337 drugs in node 702 (with an average EBGM of 16.2 for the 13 compounds in node 704).
  • This model provides a validation of the approach illustrated in the present invention, because drugs that have HERG inhibitory activity are well known to be strongly associated with torsades de pointes.
  • the ability to link in vitro binding of HERG to the human clinical toxicity represents an important advance in predictive toxicology because (1) it directly links laboratory data to clinical outcomes and (2) it enables the skilled person to relate (both associatively and quantitatively) the degree of in vitro binding activity to a clinical outcome risk (based on the pIC50 value 6.41 shown in condition 708).
  • the modeling process and presentation provided by the present invention may thus allow for a more informed selection of molecules in drug development, because it provides a way to estimate a cutpoint for human risk in a clinical setting. For example, if a candidate drug has a pIC50 of 5.5 for HERG, it may have previously been discarded because it interacted with HERG.
  • the ability to model the clinical effects of HERG using a variety of drug and toxicity responses may modify the decision making process, for example, because the model suggests that a drug with a pIC50 of 5.5 for HERG is likely to have a markedly lower human risk of torsades de pointes.
  • This model may provide a proof-of-concept for the present invention, as HERG is a well-established biomarker for drugs with an increased risk of torsades de pointes.
  • Decision tree diagram 700 indicates that compounds with a pIC50 for HERG of greater than or equal to 6.41 had approximately a 12.5-fold higher risk of torsades de pointes compared to compounds where the pIC50 was less than 6.41.
  • a percentage (%) variance (i.e. a measure of how much of the overall variance in the results is accounted for by the model) explained by this single predictor was about 4.2%.
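  • The two summary numbers quoted for a split, the fold difference between nodes and the % variance explained, can be computed as in the sketch below. The exact formulas used in the application are not given in this passage, so the sketch assumes the fold difference is the ratio of node mean scores and the % variance explained is the usual ratio of between-node to total sum of squares. The scores are hypothetical and do not reproduce the 12.5-fold or 4.2% figures.

```python
from statistics import mean


def split_summary(below, at_or_above):
    """Summarise one split given the response scores in its two daughter nodes.

    Assumes the scores are not all identical (otherwise total variance is zero).
    """
    all_scores = below + at_or_above
    grand_mean = mean(all_scores)
    mean_below, mean_above = mean(below), mean(at_or_above)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_within = sum((s - mean_below) ** 2 for s in below)
    ss_within += sum((s - mean_above) ** 2 for s in at_or_above)
    return {
        "fold": mean_above / mean_below,
        "pct_variance_explained": 100.0 * (1.0 - ss_within / ss_total),
    }


# Hypothetical EBGM-like scores on each side of a threshold.
summary = split_summary(below=[1.0, 2.0], at_or_above=[5.0, 6.0])
print(summary)
```

  • A large fold difference and a small % variance explained can coexist, as in the HERG model: when only a handful of drugs fall in the high-risk node, most of the overall spread in scores lies among the remaining drugs and is not accounted for by the split.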
  • Fig. 8 is a decision tree diagram 800 that illustrates an example of determined statistical associations between molecular markers or targets and a serious drug-induced adverse event known as tardive dyskinesia.
  • Tardive dyskinesia is a potentially disabling and irreversible neurological syndrome that is associated with chronic use of anti-schizophrenia medications. Although the cause of tardive dyskinesia is not fully understood, it is believed to be related to chronic suppression of the dopamine brain pathway, by anti-schizophrenia medications.
  • Parent node 802 includes 1274 drugs having an average EBGM of 1.1.
  • in the modeling processing, such as by statistical modeling processor 112 (Fig. 1), a substantially very significant relationship was determined for the dopamine (D2) receptor with a pIC50 of greater than or equal to 7.75 (condition 820) and indicated by node 804.
  • Another substantially significant relationship was determined for the dopamine (D3) receptor with a pIC50 of greater than or equal to 6.52 (condition 824) and indicated by node 812.
  • a further substantially significant relationship was determined for the alpha-1B adrenergic functional assay with a pKi of greater than or equal to 7.33 (condition 826) and indicated by node 816.
  • Ki represents an experimentally determined equilibrium dissociation constant for a molecule (typically a small drug-like molecule) binding to proteins like enzymes or receptors.
  • pKi represents the negative log base 10 of Ki, which is a quantity generally used for structure activity relationship analyses.
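  • The logarithmic scale can be made concrete with a small conversion helper. The sketch assumes concentrations expressed in molar units and applies the same negative base-10 logarithm to both pKi and pIC50; the function names are hypothetical, and the thresholds 6.41 and 5.5 are the HERG values discussed in the text.

```python
from math import log10


def to_p_scale(molar):
    """Convert a molar constant (Ki or IC50) to its p-scale value."""
    return -log10(molar)


def from_p_scale(p):
    """Convert a p-scale value (pKi or pIC50) back to a molar concentration."""
    return 10.0 ** (-p)


cutpoint_nm = from_p_scale(6.41) * 1e9    # HERG cutpoint from condition 708, in nM
candidate_nm = from_p_scale(5.5) * 1e9    # example candidate drug, in nM
print(round(cutpoint_nm), round(candidate_nm))
print(from_p_scale(5.0) / from_p_scale(7.0))  # two p-units apart: about a factor of 100
```

  • On this scale a higher value means tighter binding, so a candidate with a pIC50 of 5.5 needs a several-fold higher concentration than the 6.41 cutpoint to reach the same level of inhibition.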
  • the % variance explained for the recursive partitioning model was 32.2%.
  • the recursive partitioning determined that there is a strong association (condition 820) between high EBGMs (high toxicity) for tardive dyskinesia and drugs that have a relatively high affinity (i.e. that measure a pIC50 ≥ 7.55) for the dopamine receptor (dopamine subtype D2 receptor).
  • the degree of affinity may vary by a factor of 100.
  • a strong statistical relationship is shown between drugs that have a pIC50 greater than or equal to 7.55 and tardive dyskinesia.
  • Node 808 contains 223 drugs which are predicted to have good penetration of the blood brain barrier.
  • other descriptors were also identified as being associated with split 834 (i.e. surrogate splits) but were discounted and are not shown.
  • Node 816 contains 13 drugs with an EBGM of 4.6 that have good brain penetration.
  • the 13 drugs have an approximately three-fold increase in EBGM score for tardive dyskinesia, as compared with drugs in node 808.
  • the drugs in node 816 bind to the alpha-1B receptor with an affinity of greater than or equal to 7.33 (with respect to the functional pKi). Accordingly, drugs in node 816 may be considered to be associated with tardive dyskinesia. Although this association is not as strong as that seen with dopamine receptors (conditions 820 and 824), it appears to be an independent association.
  • the association of alpha-1 receptor subtypes with tardive dyskinesia shown in decision tree diagram 800 has not been described in the literature and may represent a novel finding in the area of developing safer drugs for schizophrenia.
  • Rat data suggest that alpha-1 receptor subtypes and dopamine receptors are co-localized in the basal ganglia, a region of the brain associated with the control of locomotion and with tardive dyskinesia. Additional animal data suggest that the alpha-1B neurons modulate dopaminergic transmission in this region of the brain.
  • Similar results for the major predictive variables for tardive dyskinesia were obtained by running a second model, using random forest processing, involving biological target and physical properties data, along with imputation data for sparse or missing data.
  • Fig. 9 is a decision tree diagram 900 that illustrates an example of determined statistical associations between molecular substructures from a set of molecules (associated with chemical substructures) and a severe-rash hypersensitivity syndrome known as Stevens-Johnson syndrome.
  • a fragmentation algorithm of software program Molecular Substructure Miner (MoSS) is used for molecular substructure data mining.
  • the MoSS software is described at the website borgelt.net/doc/moss/moss.html and in the publication to Borgelt et al. entitled, "MoSS : A Program for Molecular Substructure Mining," in Workshop Open Source Data Mining Software, 2005. It is understood, however, that any suitable software program may be used to perform molecular substructure mining.
  • All 1458 drug structures in the WHO database which met the minimum criteria in the preprocessing steps (steps 306 and 308 (Fig. 3)) were included in the statistical modeling process. Chemical fragments from about 5-20 atoms were also selected for all of the 14
  • the adverse event refers to a particular drug event pair, that a model describes, and which may be used as the specific response variable.
  • condition 920 describes the inclusion of fragments (i.e. moieties) I-VI, all of which are substructures of a more general para-amino benzenesulfonyl substructure.
  • Moieties I-VI include:
  • condition 922 describes the inclusion of fragments (i.e. moieties) VII and VIII, which are substructures of a more general benzylic amine structure.
  • Moieties VII and VIII include:
  • Fragments I-VI are substructures derived from the surrogate split 928. Fragments VII and VIII are substructures derived from the surrogate split 930. The % variance explained for the determined statistical model was 7.2%.
  • condition 920 representing moieties I-VI indicates that node 904 is strongly associated with Stevens-Johnson syndrome and includes drugs with an average EBGM of 2.3.
  • condition 922 representing moieties VII and VIII indicates that node 908 is strongly associated with Stevens-Johnson syndrome and includes drugs with an average EBGM of 2.0.
  • conditions 920, 922 represent the inclusion of any one of the fragments (i.e. a logical OR operation).
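  • The logical OR over fragments amounts to a single membership test per molecule. The sketch below assumes that substructure detection has already been done upstream (for example by a mining tool such as MoSS) and that each molecule is represented simply by the set of fragment labels found in it; the labels and the function name are hypothetical.

```python
def satisfies_condition(molecule_fragments, condition_fragments):
    """True if the molecule contains at least one fragment named in the condition."""
    return any(fragment in molecule_fragments for fragment in condition_fragments)


# Hypothetical labels standing in for moieties I-VI of condition 920.
condition_920 = ["I", "II", "III", "IV", "V", "VI"]

print(satisfies_condition({"III", "IX"}, condition_920))   # contains moiety III
print(satisfies_condition({"VII"}, condition_920))         # contains none of I-VI
```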
  • the surrogate splits are rendered sequentially in an order according to the most frequently occurring and statistically significant predictors.
  • the first listed predictor (such as the first predictor in condition 922) is the most significant predictor
  • the remaining predictors in the condition 920, 922 are surrogate splits, and may or may not be present in all of the molecules in a daughter node, but would otherwise give rise to the same split and node population, aside from any cases of molecules missing in the predictor.
  • the predictors shown in the conditions are often correlated, although the statistics are typically not correlated, because the predictor may not be present in all molecules.
  • Fig. 9 illustrates the use of system 100 as a tool for identifying chemical structures that may have an increased risk of toxicity. Such information may be helpful when trying to analyze structure-toxicity (i.e. function) relationships and the associations may be helpful in the design process for new drugs.
  • Fig. 10 is a decision tree diagram 1000 that illustrates an example of determined statistical associations between specific chemical structural features (i.e., substructures within molecules) and an increased risk of a particular adverse event.
  • a series of predictors including: anilines, amines, sulfonamides, sulfones, and sulfonyl urea substructures were selected as potential explanatory variables for a toxicity known as methemoglobinemia.
  • Table 2 illustrates selected parameters including the predictors for methemoglobinemia.
  • Aromatic primary amine generic; Benzylic acid-NSAIDs; Benzylic amine; Secondary amine; Secondary aniline; Secondary aniline alkyl; Secondary aniline amides; Secondary sulfonamide; Secondary sulfonamide aniline; Secondary, diaromatic anilines; Secondary, primary anilines; Sulfonamide; Sulfone; Sulfonyl Urea; Tert aniline; Tertiary aniline alkyl; Tertiary diaromatic amine; Tertiary aniline amides; Tertiary aniline sulfonamides
  • Methemoglobinemia is a severe blood disorder in which cellular hemoglobin is altered so that it cannot bind to oxygen. In this model, there was a very slight average increase in methemoglobinemia for compounds containing anilines (an EBGM of 0.79 shown in node 1004, as compared to an EBGM of 0.63 in node 1006). However, only a very low percentage of all aniline-containing drugs (about 7.5%, or 12 out of 159 compounds) had an increased score for methemoglobinemia. Although aniline itself may cause hemoglobin toxicity, it appears that its incorporation into drug molecules is not associated with a significant risk for this adverse event.
  • system 100 may be used to interrogate the clinical impact of substructures that may be of concern (based on anecdotal evidence).
  • System 100 may objectively determine statistical associations between predictors and events that consider structural and clinical evidence, by using statistical model processing and one or more databases that include vast clinical experience over a variety of drug structures. Accordingly, system 100 may be used to exonerate a particular substructure from being excluded in drug design (previously based on anecdotal evidence). As another example, system 100 may be used to identify compounds having a particular fragment and to determine which toxicities a particular substructure is most strongly associated with by using a correlation- or regression-based analysis.
  • one or more components may be implemented in software on microprocessors/general purpose computers (not shown).
  • one or more of the functions of the various components may be implemented in software that controls a general purpose computer.
  • This software may be embodied in a computer readable medium, for example, a magnetic or optical disk or a memory card.
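
The OR-type conditions described for Fig. 9 (a drug goes to a node if it contains any one of a group of fragments), the average EBGM reported per node, and the "% variance explained" figure can be illustrated with a small sketch. This is not the implementation of system 100. The drug names, fragment labels, and EBGM scores below are hypothetical toy values, and the variance-explained formula (one minus the within-node sum of squares over the total sum of squares) is an assumed, standard regression-tree measure rather than one stated in the text.

```python
from statistics import mean

# Hypothetical toy data: fragment labels and EBGM scores are made up
# for illustration and do not come from the WHO database or the patent.
drugs = [
    {"name": "drug_a", "fragments": {"frag_I", "frag_III"}, "ebgm": 2.5},
    {"name": "drug_b", "fragments": {"frag_II"}, "ebgm": 2.1},
    {"name": "drug_c", "fragments": {"frag_VII"}, "ebgm": 0.9},
    {"name": "drug_d", "fragments": set(), "ebgm": 0.5},
    {"name": "drug_e", "fragments": set(), "ebgm": 0.7},
]


def condition_met(drug, fragment_group):
    """Logical OR: true if the drug contains any fragment in the group."""
    return any(frag in drug["fragments"] for frag in fragment_group)


def split_on_condition(drug_list, fragment_group):
    """Partition drugs into (condition met, condition not met) nodes."""
    yes = [d for d in drug_list if condition_met(d, fragment_group)]
    no = [d for d in drug_list if not condition_met(d, fragment_group)]
    return yes, no


def node_mean_ebgm(node):
    """Average EBGM of the drugs in one node."""
    return mean(d["ebgm"] for d in node)


def sum_of_squares(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def variance_explained(parent, yes, no):
    """Assumed measure: 1 - (within-node SS) / (total SS) for one split."""
    total = sum_of_squares([d["ebgm"] for d in parent])
    within = (sum_of_squares([d["ebgm"] for d in yes])
              + sum_of_squares([d["ebgm"] for d in no]))
    return 1.0 - within / total if total else 0.0


if __name__ == "__main__":
    yes, no = split_on_condition(drugs, {"frag_I", "frag_II"})
    print("node with fragments:", [d["name"] for d in yes], node_mean_ebgm(yes))
    print("node without:", [d["name"] for d in no], node_mean_ebgm(no))
    print("fraction of variance explained:", variance_explained(drugs, yes, no))
```

A surrogate split could be examined in the same way by calling `split_on_condition` with a different fragment group and comparing the resulting node populations.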

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to methods and systems for screening drugs for predictors of quantitatively measured events. At least one predictor of an event is selected, and data for a plurality of drugs are retrieved. The data for each of the drugs have an associated quantitative measure of the event. One or more relationships between the selected predictor and the event are statistically analyzed to determine one or more statistical associations among the relationships, using the quantitative measure of the event for each of the plurality of drugs. The statistical associations are determined without any a priori association between the predictor and the event. The statistical associations are presented, and the presentation includes a measure of statistical significance.
PCT/US2009/036752 2008-03-11 2009-03-11 Method and apparatus for screening drugs for predictors of quantitatively measured events WO2009114591A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US3560908P 2008-03-11 2008-03-11
US61/035,609 2008-03-11

Publications (1)

Publication Number Publication Date
WO2009114591A1 true WO2009114591A1 (fr) 2009-09-17

Family

ID=41065558

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/036752 WO2009114591A1 (fr) 2008-03-11 2009-03-11 Method and apparatus for screening drugs for predictors of quantitatively measured events

Country Status (1)

Country Link
WO (1) WO2009114591A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
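WO2013163348A1 (fr) * 2012-04-24 2013-10-31 Laboratory Corporation Of America Holdings Methods and systems for identification of a protein binding site
CN109214282A (zh) * 2018-08-01 2019-01-15 中南民族大学 Neural-network-based three-dimensional gesture key point detection method and system
CN112903625A (zh) * 2021-01-25 2021-06-04 北京工业大学 Integrated parameter optimization modeling method for analyzing the content of active substances in drugs based on partial least squares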
US11139051B2 (en) 2018-10-02 2021-10-05 Origent Data Sciences, Inc. Systems and methods for designing clinical trials
US20210327543A1 (en) * 2018-07-27 2021-10-21 Karydo Therapeutix, Inc. Artificial Intelligence Model for Predicting Actions of Test Substance in Humans
CN113628697A (zh) * 2021-07-28 2021-11-09 上海基绪康生物科技有限公司 Random forest model training method optimized for class-imbalanced data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040171056A1 (en) * 1999-02-22 2004-09-02 Variagenics, Inc., A Delaware Corporation Gene sequence variations with utility in determining the treatment of disease, in genes relating to drug processing
US20050148040A1 (en) * 2003-09-23 2005-07-07 Thadhani Ravi I. Screening for gestational disorders
US20070059685A1 (en) * 2005-06-03 2007-03-15 Kohne David E Method for producing improved results for applications which directly or indirectly utilize gene expression assay results
US20080050370A1 (en) * 2006-03-17 2008-02-28 Scott Glaser Stabilized polypeptide compositions

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013163348A1 (fr) * 2012-04-24 2013-10-31 Laboratory Corporation Of America Holdings Methods and systems for identification of a protein binding site
US20210327543A1 (en) * 2018-07-27 2021-10-21 Karydo Therapeutix, Inc. Artificial Intelligence Model for Predicting Actions of Test Substance in Humans
US11676684B2 (en) * 2018-07-27 2023-06-13 Karydo Therapeutix, Inc. Artificial intelligence model for predicting actions of test substance in humans
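CN109214282A (zh) * 2018-08-01 2019-01-15 中南民族大学 Neural-network-based three-dimensional gesture key point detection method and system
CN109214282B (zh) * 2018-08-01 2019-04-26 中南民族大学 Neural-network-based three-dimensional gesture key point detection method and system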
US11139051B2 (en) 2018-10-02 2021-10-05 Origent Data Sciences, Inc. Systems and methods for designing clinical trials
US12002553B2 (en) 2018-10-02 2024-06-04 Origent Data Sciences, Inc. Systems and methods for designing clinical trials
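CN112903625A (zh) * 2021-01-25 2021-06-04 北京工业大学 Integrated parameter optimization modeling method for analyzing the content of active substances in drugs based on partial least squares
CN112903625B (zh) * 2021-01-25 2024-01-19 北京工业大学 Integrated parameter optimization modeling method for analyzing the content of active substances in drugs based on partial least squares
CN113628697A (zh) * 2021-07-28 2021-11-09 上海基绪康生物科技有限公司 Random forest model training method optimized for class-imbalanced data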

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09719256

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2010550831

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 09719256

Country of ref document: EP

Kind code of ref document: A1