CN117916392A - Methods and systems for therapy monitoring and trial design - Google Patents


Publication number
CN117916392A
Authority
CN
China
Prior art keywords
disease
gene expression
subject
therapy
genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280057506.9A
Other languages
Chinese (zh)
Inventor
苏珊·吉亚西安
维亚切斯拉夫·R·埃克麦弗
伊凡·沃伊塔罗夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Saifu Pharmaceutical Co
Original Assignee
Saifu Pharmaceutical Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Saifu Pharmaceutical Co
Priority claimed from PCT/US2022/034375 (WO2022271724A1)
Publication of CN117916392A
Legal status: Pending

Abstract

Methods and systems are described for identifying disease gene expression signatures and for determining whether the disease gene expression signature of a subject with a disease is restored to a non-disease expression signature (e.g., the gene expression of a non-diseased subject). Also provided herein are methods of designing a study (e.g., a clinical trial) comprising identifying diseased subjects that exhibit a quantifiable change in gene expression of the disease gene expression signature toward that of a non-diseased subject.

Description

Methods and systems for therapy monitoring and trial design
Cross Reference to Related Applications
The present application claims the benefit of U.S. Provisional Application No. 63/213,431, filed on June 22, 2021, and U.S. Provisional Application No. 63/329,008, filed on April 8, 2022, each of which is incorporated herein by reference in its entirety.
Background
The therapeutic response of many complex diseases may remain poorly understood by researchers and practitioners. A single stratification factor or biomarker may not be sufficient to determine whether a therapy is effective in treating a particular patient. Rather, many diseases, such as autoimmune diseases and cancers, affect many biological subsystems (see, e.g., Frohlich et al., BMC Med, 16, 150:1122-1127 (2018), which is incorporated herein by reference for all purposes). Methods for identifying a patient's responsiveness to treatment (e.g., trial-and-error methods) can be costly and carry risks of adverse side effects, potential disease progression, and delays in appropriate treatment (see, e.g., Mathur & Sutton, Biomed. Rep., 7:3-5 (2017), which is incorporated herein by reference for all purposes).
Furthermore, the inability to determine individual response and therapy efficacy in clinical trials can be costly and dangerous for subjects who are exposed to dangerous side effects without receiving any benefit from therapy. Once subjects are identified on an individual basis as non-responders to a particular therapy, they can be removed from the clinical trial, but delays in such removal can increase the risk of serious side effects, as well as the time and cost losses for those responsible for the study. Predicting early in a clinical trial which subjects may or may not respond can be challenging, and may not be possible if no predictive biomarkers are available. Instead, changes in clinical characteristics can be used to determine whether a subject is responding to therapy, but such changes can take time to appear and are subjective in nature, especially changes that are measured by the individual subject's self-assessment.
Disclosure of Invention
To date, many methods of determining the suitability of a therapy for a particular subject rely on trying multiple therapies and gauging the patient's responsiveness by assessing clinical characteristics. These methods may delay necessary treatment and, by examining only the clinical features of the response, may mischaracterize the patient's actual responsiveness to therapy. Thus, there is a need for methods and systems that provide patients with personalized therapies and that reliably quantify responsiveness to therapy.
The present disclosure provides methods and systems that encompass the insight that treating patients at the molecular level, e.g., proactively providing a treatment that converts a subset of the gene expression profile of a diseased subject to one similar to that of a healthy subject, may be a better metric for assessing molecular response to a drug and identifying effective therapies than reactive methods or the search for a single universal biomarker. The techniques provided allow a provider to identify particular treatment methods and modalities that may be applicable to a particular patient, and to monitor disease progression and treatment response without relying on subjective measures such as clinical features or patient self-assessment. In some embodiments, a change in certain gene expression patterns (a "disease gene expression signature") in a patient suffering from a disease is indicative of a response to therapy, and a reversal of gene expression within this signature is indicative of an improvement in the health of the patient. This approach differs from other approaches that compare differences in gene expression between patients with a disease (e.g., intra-group comparisons) to identify whether a patient has a biomarker or expression profile that is indicative of a response to therapy as compared to other patients that do not have the biomarker or expression profile.
In some embodiments, reversal of the expression of some or all of the genes in the disease gene expression signature may result in gene expression in the diseased subject that is similar to that of a healthy control subject. A reversal of some or all (e.g., all or substantially all) of the gene expression within the disease gene expression signature may indicate disease regression and a return of the subject to a healthy state. In some embodiments, the reversal of the disease gene expression signature is achieved by a therapy targeting one or more genes that modulate the disease gene expression signature.
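One way to make such a reversal quantifiable, offered only as a minimal sketch: for each signature gene, compute the fraction of the gap between the baseline (diseased) level and a healthy reference level that has been closed after therapy. The function name `reversal_fraction`, the gene names, and the expression values below are hypothetical, and the disclosure does not prescribe this particular score.

```python
from statistics import mean

def reversal_fraction(baseline, post, healthy):
    """Per-gene fraction of the baseline-to-healthy gap closed after therapy.

    1.0 means the gene reached the healthy level, 0.0 means no change,
    and a negative value means the gene moved further from healthy.
    """
    scores = {}
    for gene, base in baseline.items():
        gap = healthy[gene] - base
        if gap == 0:  # gene was already at the healthy level; nothing to reverse
            continue
        scores[gene] = (post[gene] - base) / gap
    return scores

# Hypothetical log-scale expression values for a three-gene signature.
baseline = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 5.0}
post = {"GENE_A": 6.5, "GENE_B": 3.5, "GENE_C": 5.5}
healthy = {"GENE_A": 6.0, "GENE_B": 4.0, "GENE_C": 4.0}

per_gene = reversal_fraction(baseline, post, healthy)
overall = mean(per_gene.values())  # average reversal across the signature
```

In this toy data, GENE_A and GENE_B move toward the healthy level while GENE_C moves away from it, so the per-gene scores carry information that a single averaged value would hide.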
In some embodiments, disease gene expression signatures are identified using a machine learning algorithm that identifies genes that are significantly differentially expressed among diseased subjects, subsets of diseased subjects, and healthy subjects. Furthermore, the present disclosure provides methods and systems that encompass the insight that, when the gene expression profile of a diseased subject is compared with that of a healthy subject, certain genes yield potential therapeutic targets that are distinct from the genes that are differentially expressed in the diseased subject relative to the healthy subject. That is, while other approaches focus on genes that are differentially expressed between diseased and healthy subjects, the methods and systems of the present disclosure can identify therapeutic targets that have significant connectivity to (and thus affect) these differentially expressed genes, but which themselves may not be differentially expressed between diseased and healthy subjects.
In some embodiments, the present disclosure provides methods and systems that encompass the insight that subjects suffering from a disease can be stratified into responders or non-responders to a particular therapy by analyzing changes in gene expression in the disease gene expression signature after administration of the therapy. Such changes may be observed sooner than the changes in clinical characteristics that may otherwise be used to determine responsiveness to therapy (e.g., non-responders may stop therapy or be removed from a clinical trial before excessive time and cost are lost).
In one aspect, the present disclosure provides a method of determining a disease gene expression signature for quantifying responsiveness of a subject suffering from a disease, disorder, or condition to a therapy, the method comprising: receiving gene expression data from a group of subjects suffering from the same disease, disorder, or condition; stratifying the group of subjects into two or more groups based at least in part on the gene expression data; calculating differences in gene expression between the two or more groups of subjects and a group of non-diseased subjects; selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of non-diseased subjects; compiling a disease gene set comprising the disease candidate genes; and selecting at least a subset of the disease gene set, thereby determining the disease gene expression signature.
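The selection of disease candidate genes can be illustrated with a short sketch. It assumes the diseased subjects have already been stratified into groups, and it uses Welch's t-statistic with an arbitrary cutoff as the test of a "significant difference"; the disclosure does not specify the statistic or the threshold. The names `welch_t`, `candidate_genes`, and `t_cutoff`, and all expression values, are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def candidate_genes(groups, healthy, t_cutoff=4.0):
    """Genes that differ from the non-diseased group in at least one diseased group.

    groups  : {group_name: {gene: [expression values]}}
    healthy : {gene: [expression values]}
    t_cutoff is an illustrative threshold, not a value from the disclosure.
    """
    selected = set()
    for expression in groups.values():
        for gene, values in expression.items():
            if abs(welch_t(values, healthy[gene])) >= t_cutoff:
                selected.add(gene)
    return sorted(selected)

# Hypothetical expression values: GENE_A is elevated in both diseased groups,
# GENE_B is essentially unchanged.
healthy = {"GENE_A": [5.0, 5.2, 4.8, 5.1], "GENE_B": [3.0, 3.1, 2.9, 3.0]}
groups = {
    "group_1": {"GENE_A": [7.9, 8.1, 8.0, 8.2], "GENE_B": [3.0, 2.9, 3.1, 3.0]},
    "group_2": {"GENE_A": [7.0, 7.2, 6.9, 7.1], "GENE_B": [3.1, 3.0, 2.9, 3.1]},
}

disease_candidates = candidate_genes(groups, healthy)
```

A real analysis would typically also correct for multiple testing across thousands of genes, which this sketch omits.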
In some embodiments, the method further comprises mapping the disease candidate genes onto a biological network, and selecting adjacent genes on the biological network that have significant connectivity to each other or to the disease candidate genes, wherein the disease gene set comprises the disease candidate genes and the adjacent genes. In some embodiments, the biological network comprises a human interactome. In some embodiments, the adjacent genes form a significant subnetwork with each other or with the disease candidate genes. In some embodiments, the adjacent genes are identified via a machine learning algorithm. In some embodiments, the machine learning algorithm includes a random walk.
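As an illustration of how a random walk can rank adjacent genes, the sketch below runs a random walk with restart from the disease candidate genes on a toy undirected network. Random walk with restart is one common variant and is an assumption here; the disclosure names only "random walk". The network, the node names, the restart probability, and the iteration count are all hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, iterations=100):
    """Score every node by its proximity to the seed genes.

    adjacency : {node: [neighbors]} for an undirected network in which
                every node has at least one neighbor
    seeds     : set of disease candidate genes where the walk restarts
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Mass arriving at n: each neighbor splits its mass evenly over its edges.
            inflow = sum(prob[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = (1 - restart) * inflow + restart * start[n]
        prob = updated
    return prob

# Toy network: D1 and D2 are disease candidate genes; N1 touches both of them,
# F1 is two steps away, and F2 is three steps away.
adjacency = {
    "D1": ["N1", "D2"],
    "D2": ["D1", "N1"],
    "N1": ["D1", "D2", "F1"],
    "F1": ["N1", "F2"],
    "F2": ["F1"],
}
seeds = {"D1", "D2"}

scores = random_walk_with_restart(adjacency, seeds)
ranked = sorted((n for n in scores if n not in seeds), key=scores.get, reverse=True)
```

Here `ranked` orders the non-seed genes from most to least connected to the candidates. Deciding which of them are "significantly" connected would require a further step, such as comparison against randomized networks, which is outside this sketch.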
In some embodiments, the disease, disorder, or condition comprises ulcerative colitis, Crohn's disease, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, plaque psoriasis, ankylosing spondylitis, Guillain-Barre syndrome, Sjogren's syndrome, scleroderma, vitiligo, bipolar disorder, Graves' disease, schizophrenia, Alzheimer's disease, multiple sclerosis, Parkinson's disease, or a combination thereof. In some embodiments, the disease, disorder, or condition comprises ulcerative colitis. In some embodiments, the disease, disorder, or condition comprises rheumatoid arthritis. In some embodiments, the disease, disorder, or condition comprises Alzheimer's disease. In some embodiments, the disease, disorder, or condition comprises multiple sclerosis.
In some embodiments, the stratifying of the group of subjects into two or more groups is random or is based at least in part on whether a subject previously responded to the therapy. In some embodiments, the therapy comprises a member selected from Table 1. In some embodiments, the therapy comprises anti-TNF therapy. In some embodiments, the group of subjects suffers from the same disease, disorder, or condition as the subject being evaluated for responsiveness to therapy. In some embodiments, the stratifying further comprises grouping subjects from the same group that have similar gene expression.
In some embodiments, the method further comprises training a machine learning classifier using the disease gene expression signature, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of a test subject suffering from the disease, disorder, or condition based at least in part on analyzing gene expression data of the test subject.
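The disclosure does not specify a classifier type. Purely for illustration, the sketch below trains a nearest-centroid classifier on the expression of signature genes and uses it to predict the response label of a test subject. The function names, the two-gene signature, and the expression values are hypothetical.

```python
from math import dist
from statistics import mean

def train_centroids(samples, labels):
    """Return one mean expression vector (centroid) per class label."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids

def predict(centroids, sample):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))

# Hypothetical training data: each row holds the expression of two signature genes.
samples = [
    [2.0, 7.0], [2.2, 6.8], [1.9, 7.1],  # subjects who responded
    [5.0, 3.0], [5.1, 3.2], [4.8, 2.9],  # subjects who did not respond
]
labels = ["responder"] * 3 + ["non-responder"] * 3

model = train_centroids(samples, labels)
prediction_1 = predict(model, [2.1, 6.9])
prediction_2 = predict(model, [4.9, 3.1])
```

A nearest-centroid model is chosen here only because it fits in a few lines; any supervised classifier trained on the signature genes would play the same role.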
In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with an accuracy of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a sensitivity of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a specificity of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a positive predictive value of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a negative predictive value of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a true positive rate of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a true negative rate of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with an area under the curve (AUC) of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%.
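The performance measures listed above follow from the counts of true and false positives and negatives. The sketch below computes them for a hypothetical set of ten predictions, with "responder" treated as the positive class; sensitivity is the true positive rate and specificity is the true negative rate. The function name and the labels are illustrative, and AUC is omitted because it requires continuous scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive="responder"):
    """Accuracy, sensitivity, specificity, PPV, and NPV from paired labels.

    Assumes both classes occur in y_true and in y_pred, so that no
    denominator below is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical outcomes: 4 true responders (R) and 6 true non-responders (N).
y_true = ["responder"] * 4 + ["non-responder"] * 6
y_pred = (["responder"] * 3 + ["non-responder"]          # 3 hits, 1 miss
          + ["non-responder"] * 5 + ["responder"])       # 5 correct, 1 false alarm

metrics = classification_metrics(y_true, y_pred)
```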
In some embodiments, the method further comprises administering a therapeutically effective amount of the therapy to the test subject when the trained machine learning classifier predicts that the test subject is responsive to the therapy. In some embodiments, the method further comprises administering to the test subject a therapeutically effective amount of a second therapy different from the therapy when the trained machine learning classifier predicts that the test subject is non-responsive to the therapy.
In another aspect, the present disclosure provides a method comprising administering to a test subject a therapeutically effective amount of (i) a therapy, when the test subject is predicted to be responsive to the therapy based at least in part on a trained machine learning classifier analyzing a disease gene expression signature, or (ii) a second therapy different from the therapy, when the test subject is predicted to be non-responsive to the therapy based at least in part on the trained machine learning classifier analyzing the disease gene expression signature, wherein the disease gene expression signature is determined at least in part by: receiving gene expression data from a group of subjects suffering from the disease, disorder, or condition; stratifying the group of subjects into two or more groups based at least in part on the gene expression data; calculating differences in gene expression between the two or more groups of subjects and a group of non-diseased subjects; selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of non-diseased subjects; compiling a disease gene set comprising the disease candidate genes; and selecting at least a subset of the disease gene set, thereby determining the disease gene expression signature.
In some embodiments, the disease gene expression signature is determined at least in part by: further mapping the disease candidate genes onto a biological network and selecting, on the biological network, adjacent genes having significant connectivity to each other or to the disease candidate genes, wherein the disease gene set comprises the disease candidate genes and the adjacent genes. In some embodiments, the biological network comprises a human interactome. In some embodiments, the adjacent genes form a significant subnetwork with each other or with the disease candidate genes. In some embodiments, the adjacent genes are identified via a machine learning algorithm. In some embodiments, the machine learning algorithm includes a random walk.
In some embodiments, the disease, disorder, or condition comprises ulcerative colitis, Crohn's disease, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, plaque psoriasis, ankylosing spondylitis, Guillain-Barre syndrome, Sjogren's syndrome, scleroderma, vitiligo, bipolar disorder, Graves' disease, schizophrenia, Alzheimer's disease, multiple sclerosis, Parkinson's disease, or a combination thereof. In some embodiments, the disease, disorder, or condition comprises ulcerative colitis. In some embodiments, the disease, disorder, or condition comprises rheumatoid arthritis. In some embodiments, the disease, disorder, or condition comprises Alzheimer's disease. In some embodiments, the disease, disorder, or condition comprises multiple sclerosis.
In some embodiments, the stratifying of the group of subjects into two or more groups is random or is based at least in part on whether a subject previously responded to the therapy. In some embodiments, the therapy comprises a member selected from Table 1. In some embodiments, the therapy comprises anti-TNF therapy. In some embodiments, the stratifying further comprises grouping subjects from the same group that have similar gene expression.
In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with an accuracy of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a sensitivity of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a specificity of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a positive predictive value of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a negative predictive value of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a true positive rate of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a true negative rate of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%. In some embodiments, the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with an area under the curve (AUC) of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%.
In another aspect, the present disclosure provides a method of verifying a response to a therapy in a subject suffering from a disease, disorder, or condition, the method comprising: analyzing changes in a disease gene expression signature in the subject after administration of the therapy, wherein the disease gene expression signature is determined to quantify responsiveness to the therapy.
In some embodiments, the disease gene expression signature is determined at least in part by: receiving gene expression data from a group of subjects suffering from the disease, disorder, or condition; stratifying the group of subjects into two or more groups based at least in part on the gene expression data; calculating differences in gene expression between the two or more groups of subjects and a group of non-diseased subjects; selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of non-diseased subjects; compiling a disease gene set comprising the disease candidate genes; and selecting at least a subset of the disease gene set, thereby determining the disease gene expression signature.
In another aspect, the present disclosure provides a method of monitoring the efficacy of a treatment of a subject suffering from a disease, disorder, or condition, the method comprising monitoring for a change in a disease gene expression signature after administration of a therapy, wherein the disease gene expression signature is determined at least in part by: analyzing gene expression data from a group of subjects suffering from the same disease, disorder, or condition as the subject; stratifying the group of subjects into two or more groups based on the gene expression data; determining differences in gene expression between the two or more groups of subjects and a group of non-diseased subjects; selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of non-diseased subjects; compiling a disease gene set comprising the disease candidate genes; and selecting at least a subset of the disease gene set, thereby determining the disease gene expression signature.
In some embodiments, the disease gene expression signature is determined at least in part by: further mapping the disease candidate genes onto a biological network and selecting, on the biological network, adjacent genes having significant connectivity to each other or to the disease candidate genes, wherein the disease gene set comprises the disease candidate genes and the adjacent genes. In some embodiments, the biological network comprises a human interactome. In some embodiments, the adjacent genes form a significant subnetwork with each other or with the disease candidate genes. In some embodiments, the adjacent genes are selected by a machine learning process.
In some embodiments, the disease, disorder, or condition comprises ulcerative colitis, Crohn's disease, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, plaque psoriasis, ankylosing spondylitis, Guillain-Barre syndrome, Sjogren's syndrome, scleroderma, vitiligo, bipolar disorder, Graves' disease, schizophrenia, Alzheimer's disease, multiple sclerosis, Parkinson's disease, or a combination thereof. In some embodiments, the disease, disorder, or condition comprises ulcerative colitis. In some embodiments, the disease, disorder, or condition comprises rheumatoid arthritis. In some embodiments, the disease, disorder, or condition comprises Alzheimer's disease. In some embodiments, the disease, disorder, or condition comprises multiple sclerosis.
In some embodiments, the stratifying of the group of subjects into two or more groups is random or is based at least in part on whether a subject previously responded to the therapy. In some embodiments, the therapy comprises a member selected from Table 1. In some embodiments, the therapy comprises anti-TNF therapy. In some embodiments, the stratifying further comprises grouping subjects from the same group that have similar gene expression.
In some embodiments, the method further comprises selecting a test subject for a clinical trial based at least in part on whether the disease gene expression signature of the test subject exhibits a quantifiable change toward the gene expression of a non-diseased subject.
In another aspect, the present disclosure provides a method of identifying and selecting a subject for a clinical trial, comprising: receiving gene expression data of a group of subjects; analyzing the gene expression data to detect the presence of a disease gene expression signature; administering at least one dose of a therapy to the group of subjects; identifying a change in the disease gene expression signature relative to gene expression in a non-diseased subject; and selecting, for the clinical trial, a subject exhibiting a quantifiable change in gene expression of the disease gene expression signature toward that of a healthy subject, wherein the disease gene expression signature is determined by any of the methods provided herein.
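The selection step can be sketched as follows, under stated assumptions: each subject's signature is represented as a vector of expression values, "change toward a healthy subject" is measured as the fractional drop in Euclidean distance to a healthy centroid after one dose, and a cutoff of 20% is applied. None of these choices comes from the disclosure; the function name `select_for_trial`, the parameter `min_reduction`, and the data are hypothetical.

```python
from math import dist

def select_for_trial(baseline, post_dose, healthy_centroid, min_reduction=0.2):
    """Return IDs of subjects whose signature moved toward the healthy centroid.

    min_reduction is an illustrative cutoff: the required fractional drop in
    Euclidean distance to the healthy centroid after at least one dose.
    """
    selected = []
    for subject, before in baseline.items():
        d_before = dist(before, healthy_centroid)
        d_after = dist(post_dose[subject], healthy_centroid)
        if d_before > 0 and (d_before - d_after) / d_before >= min_reduction:
            selected.append(subject)
    return sorted(selected)

# Hypothetical two-gene signature values per subject.
healthy_centroid = [4.0, 6.0]
baseline = {"S1": [8.0, 2.0], "S2": [7.0, 3.0], "S3": [8.0, 3.0]}
post_dose = {"S1": [6.0, 4.0], "S2": [7.1, 2.9], "S3": [9.0, 2.0]}

trial_subjects = select_for_trial(baseline, post_dose, healthy_centroid)
```

In this toy cohort only S1 halves its distance to the healthy centroid; S2 and S3 drift away from it and are not selected.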
In another aspect, the present disclosure provides a system comprising: a processor of a computing device; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor, cause the processor to perform any of the methods provided herein.
In another aspect, the present disclosure provides a method of determining a disease gene expression signature for quantifying responsiveness of a subject suffering from a disease, disorder, or condition to a therapy, the method comprising: receiving gene expression data from a group of subjects having the same disease, disorder, or condition (e.g., having the same disease, disorder, or condition as the subject being evaluated for responsiveness to therapy); stratifying the group of subjects into two or more groups based on the gene expression data (e.g., grouping subjects from the group that have similar gene expression); calculating differences in gene expression between the two or more groups of subjects and a group of healthy subjects; selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of healthy subjects; mapping the disease candidate genes onto a biological network (e.g., a human interactome); selecting adjacent genes (e.g., genes on adjacent nodes, e.g., on a human interactome) that have significant connectivity to each other (e.g., forming a significant subnetwork) or to the disease candidate genes; compiling a disease gene list comprising the disease candidate genes and the adjacent genes; and selecting some or all of the genes from the disease gene list, thereby providing the disease gene expression signature.
In some embodiments, the adjacent genes are identified via a machine learning algorithm.
In some embodiments, the machine learning algorithm includes a random walk.
In some embodiments, the disease, disorder, or condition comprises ulcerative colitis, Crohn's disease, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, plaque psoriasis, ankylosing spondylitis, Guillain-Barre syndrome, Sjogren's syndrome, scleroderma, vitiligo, bipolar disorder, Graves' disease, schizophrenia, Alzheimer's disease, multiple sclerosis, Parkinson's disease, or a combination thereof.
In some embodiments, the stratifying of the group of subjects into two or more groups is random or is based at least in part on whether a subject previously responded to the therapy.
In some embodiments, the therapy comprises a member selected from Table 1.
In some embodiments, the therapy comprises anti-TNF therapy.
In another aspect, the present disclosure provides a method of verifying a response to a therapy in a subject suffering from a disease, disorder, or condition, the method comprising: analyzing changes in a disease gene expression signature in the subject after administration of the therapy, wherein the disease gene expression signature is determined to quantify responsiveness to the therapy.
In some embodiments, the disease gene expression signature is derived by: receiving gene expression data from a group of subjects suffering from the same disease, disorder, or condition as the subject; stratifying the group of subjects into two or more groups based on the gene expression data (e.g., grouping subjects from the group that have similar gene expression); calculating differences in gene expression between the two or more groups of subjects and a group of healthy subjects; selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of healthy subjects; mapping the disease candidate genes onto a biological network (e.g., a human interactome); selecting adjacent genes (e.g., genes on adjacent nodes, e.g., on a human interactome) that have significant connectivity to the disease candidate genes; compiling a disease gene list comprising the disease candidate genes and the adjacent genes; and selecting some or all of the genes from the disease gene list, thereby providing the disease gene expression signature.
In another aspect, the present disclosure provides a method of monitoring the efficacy of a treatment of a subject suffering from a disease, disorder, or condition, the method comprising monitoring for a change in a disease gene expression signature after administration of a therapy, wherein the disease gene expression signature is derived by a process comprising: analyzing gene expression data from a group of subjects suffering from the same disease, disorder, or condition as the subject; stratifying the group of subjects into two or more groups based on the gene expression data (e.g., grouping subjects from the group that have similar gene expression); determining a difference in gene expression between the two or more groups of subjects and a group of healthy subjects; selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of healthy subjects; mapping the disease candidate genes onto a biological network (e.g., a human interactome); selecting adjacent genes (e.g., genes on adjacent nodes, e.g., on a human interactome) that have significant connectivity to the disease candidate genes; compiling a disease gene list comprising the disease candidate genes and the adjacent genes; and selecting some or all of the genes from the disease gene list, thereby providing the disease gene expression signature.
In some embodiments, the adjacent genes are selected by a machine learning algorithm.
In some embodiments, the disease, disorder, or condition comprises ulcerative colitis, Crohn's disease, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, plaque psoriasis, ankylosing spondylitis, Guillain-Barre syndrome, Sjogren's syndrome, scleroderma, vitiligo, bipolar disorder, Graves' disease, schizophrenia, Alzheimer's disease, multiple sclerosis, Parkinson's disease, or a combination thereof.
In some embodiments, the stratifying of the group of subjects into two or more groups is random or is based at least in part on whether a subject previously responded to the therapy.
In some embodiments, the therapy comprises a member selected from Table 1.
In some embodiments, the therapy comprises anti-TNF therapy.
In another aspect, the present disclosure provides a method of identifying and selecting a subject for a clinical trial, comprising: receiving gene expression data of a group of subjects; analyzing the gene expression data to detect the presence of a disease gene expression signature; administering at least one dose of a therapy to the group of subjects; identifying a change in the disease gene expression signature relative to gene expression in a healthy subject; and selecting, for the clinical trial, a subject exhibiting a quantifiable change in gene expression of the disease gene expression signature toward that of a healthy subject, wherein the disease gene expression signature is determined by the methods described herein.
In another aspect, the present disclosure provides a system for determining or verifying responsiveness of a subject having a disease to therapy, the system comprising: a processor of a computing device; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor, cause the processor to perform the operations of any of the methods described herein.
Another aspect of the present disclosure provides a non-transitory computer-readable medium comprising machine-executable code that, when executed by one or more computer processors, implements any of the methods described above or elsewhere herein.
Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory includes machine executable code that, when executed by the one or more computer processors, implements any of the methods described above or elsewhere herein.
Other aspects and advantages of the present disclosure will become readily apparent to those skilled in the art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments and its several details are capable of modification in various obvious respects, all without departing from the present disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
Incorporation by reference
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in this specification, this specification is intended to supersede or take precedence over any such contradictory material.
Drawings
FIG. 1 depicts an exemplary workflow for identifying disease expression signatures.
Figure 2 depicts a graph illustrating 2D representations of the gene expression profiles of healthy controls and of responders and non-responders to treatment, at baseline and after treatment.
Fig. 3A and 3B are a series of overlap diagrams illustrating that the non-responder biomarker set is almost completely contained within the responder biomarker set, and that, for each study cohort, the responder biomarker set is typically twice as large as the non-responder biomarker set (Fig. 3A represents study 1 of Example 1; Fig. 3B represents study 2 of Example 1).
FIG. 4 depicts an exemplary network environment and computing device for use in various embodiments.
Fig. 5 depicts an example of a computing device 500 and a mobile computing device 550 that may be used to implement the various techniques provided herein.
Fig. 6 depicts a graph illustrating nodes that are up- and down-regulated in response to anti-TNF therapy, as clustered and connected on a biological network (e.g., a human interactome map).
Fig. 7 depicts an overview of the module triplet framework. (a) Pipeline for discovering the UC module triplet on the human interactome: the response module is derived from genes that are differentially expressed before and after treatment in active UC patients who respond to TNFi therapies (infliximab and golimumab); the genotype module is derived by mapping genes associated with UC onto the human interactome; and the treatment module is derived by using experimental data in the HT29 cell line to select small-molecule compounds that alter the expression of the response module genes and mapping those compounds to their protein targets. Target prioritization based on the discovered module triplet: (b), (d) the topological relevance (proximity) of a node to the genotype module is measured by calculating the average shortest path length from the node to all genotype module nodes and comparing it, using a Z-score, with the empirical distribution of average shortest path lengths to randomized connected subnetworks of the same size as the genotype module; (c), (e) the functional similarity (selectivity) of a node to the treatment module is measured by calculating the mean diffusion state distance (DSD) from the node to all treatment module nodes and comparing it, using a Z-score, with the empirical distribution of mean DSD to randomized connected subnetworks of the same size as the treatment module. All nodes are ranked by proximity and by selectivity, and the two rankings are combined using the rank product to obtain the final target ranking.
Figure 8 depicts gene expression profiles of normal tissue controls and active UC patients before and after TNFi therapy. Shown are the first two coordinates of the UMAP embedding of the gene expression profiles, based on the set of 545 genes differentially expressed between active UC patients and normal controls, for the treatment response to (a) infliximab and (b) golimumab.
Fig. 9 depicts the recovery of approved targets for four complex diseases based on diffusion state distance (DSD). Receiver operating characteristic (ROC) curves are shown for the recovery of known approved targets for the treatment of (a) Alzheimer's disease; (b) ulcerative colitis; (c) rheumatoid arthritis; and (d) multiple sclerosis. Each individual ROC curve shows the recovery of the remaining approved targets from one given approved target, using the DSD from that target to the remaining human interactome (HI) nodes. The red line represents the average ROC curve obtained by averaging the individual ROC curves, and the area under the curve (AUC) of the average ROC curve is reported.
FIG. 10 depicts in silico validation of module triplet target prioritization. (a) Selectivity-proximity scatter plot of HI nodes, with the 23 targets approved for UC treatment highlighted. The more selective and proximal targets are located toward the lower left of the scatter plot. (b) Receiver operating characteristic (ROC) curves, and the corresponding areas under the curve (AUC), for recovery of approved UC targets using proximity to the genotype module, selectivity to the treatment module, a combination of both, and local radiality relative to the response module. (c) Violin plots of the combined selectivity-proximity ranking for marketed UC targets and for UC targets at the preclinical and clinical trial stages of development.
Fig. 11 depicts an overview of the differential expression (DE) analysis. (a) Schematic representation of the differentially expressed gene sets obtained by comparing different pairs of states among responders, non-responders, and normal controls, together with the DE gene set names used throughout the specification; (b) Venn diagrams of the R, NR, and RBA sets in the infliximab and golimumab studies; (c) overlap of the R, NR, and RBA sets across the studies.
FIG. 12 depicts KEGG pathway enrichment analysis of genes differentially expressed in responders and non-responders relative to healthy controls at baseline. (a) Venn diagram of the genes differentially expressed relative to healthy controls in responders (R) and non-responders (NR) at baseline, after combining the infliximab and golimumab cohorts. (b) Venn diagram of the same gene sets within the KEGG pathway database. (c) KEGG pathways that are significantly enriched in the NR gene set and that also have significantly more NR-exclusive genes than R-exclusive genes.
Fig. 13 depicts the number of targets per drug. Most drugs approved for UC treatment or under development have at most four simultaneous targets. Drugs with more than four targets were filtered out of the analysis.
FIG. 14 illustrates a computer system 1401 that is programmed or otherwise configured to perform analysis or operation of various methods.
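The proximity scoring and rank combination described for Fig. 7 can be illustrated with a short sketch. It is a simplified reading under stated assumptions: the network is an undirected, connected adjacency map; the null distribution is built from connected node sets grown from a random seed node (one of several possible randomization schemes); and all function names and parameters are hypothetical. The diffusion state distance used for selectivity is not implemented here; any per-node score in which lower is better can be passed to `rank_product`.

```python
import random
from collections import deque
from statistics import mean, pstdev

def shortest_path_lengths(network, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in network[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def mean_distance(network, node, module):
    """Average shortest path length from `node` to all module nodes (network must be connected)."""
    dist = shortest_path_lengths(network, node)
    return mean(dist[m] for m in module)

def random_connected_module(network, size, rng):
    """Grow a connected node set of the requested size from a random seed node."""
    start = rng.choice(sorted(network))
    module = {start}
    frontier = set(network[start]) - module
    while len(module) < size and frontier:
        nxt = rng.choice(sorted(frontier))
        module.add(nxt)
        frontier |= set(network[nxt])
        frontier -= module
    return module

def proximity_z(network, node, module, n_random=200, seed=0):
    """Z-score of the observed mean distance against random connected modules."""
    rng = random.Random(seed)
    observed = mean_distance(network, node, module)
    null = [mean_distance(network, node,
                          random_connected_module(network, len(module), rng))
            for _ in range(n_random)]
    sd = pstdev(null)
    return (observed - mean(null)) / sd if sd > 0 else 0.0

def rank_product(*score_dicts):
    """Order nodes by the product of their ranks across criteria (lower score = better)."""
    combined = {}
    for scores in score_dicts:
        for rank, node in enumerate(sorted(scores, key=scores.get), start=1):
            combined[node] = combined.get(node, 1) * rank
    return sorted(combined, key=combined.get)

# Toy path network a - b - c - d with a two-node module {c, d}.
toy_network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
toy_module = {"c", "d"}
```

A more negative Z-score means the node sits closer to the module than expected by chance, so sorting by Z-score in ascending order puts the most proximal nodes first.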
Detailed Description
While various embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
Provided herein are systems and methods, for example, that can be used to determine and verify responses to therapies. In some embodiments, the present disclosure provides systems and methods for identifying a gene set that is indicative of a response to therapy when differentially expressed compared to healthy subjects. In some embodiments, the present disclosure provides systems and methods for patient stratification (e.g., in clinical trials) to identify responders and non-responders to therapy at the molecular level without reliance on changes in clinical characteristics.
Definitions
Administration: As used herein, the term "administration" generally refers to the administration of a composition to a subject or system, for example, to achieve delivery of an agent that is the composition or that is contained in or otherwise delivered by the composition.
Agent: As used herein, the term "agent" generally refers to an entity (e.g., a lipid, a metal, a nucleic acid, a polypeptide, a polysaccharide, a small molecule, etc., or a complex, combination, mixture, or system thereof [e.g., a cell, tissue, or organism]) or a phenomenon (e.g., heat, electrical current or electric field, magnetic force or magnetic field, etc.).
Amino acid: as used herein, the term "amino acid" generally refers to any compound or substance that may be incorporated into a polypeptide chain, for example, by forming one or more peptide bonds. In some embodiments, the amino acid has the general structure H 2 N-C (H) (R) -COOH. In some embodiments, the amino acid is a naturally occurring amino acid. In some embodiments, the amino acid is a non-natural amino acid; in some embodiments, the amino acid is a D-amino acid; in some embodiments, the amino acid is an L-amino acid. As used herein, the term "standard amino acid" refers to any of the twenty L-amino acids commonly found in naturally occurring peptides. "non-standard amino acid" refers to any amino acid other than a standard amino acid, whether or not it is present or may be present in a natural source. In some embodiments, amino acids, including carboxyl or amino terminal amino acids in polypeptides, may contain structural modifications as compared to the general structures above. For example, in some embodiments, an amino acid may be modified by methylation (e.g., of an amino group, a carboxylic acid group, one or more protons, or a hydroxyl group), amidation, acetylation, pegylation, glycosylation, phosphorylation, or substitution, as compared to the general structure. In some embodiments, such modifications may, for example, alter the stability or circulation half-life of a polypeptide containing a modified amino acid as compared to a polypeptide containing an otherwise identical unmodified amino acid. In some embodiments, such modifications do not significantly alter the activity associated with a polypeptide containing a modified amino acid compared to a polypeptide containing an otherwise identical unmodified amino acid. In some embodiments, the term "amino acid" may be used to refer to a free amino acid; in some embodiments, it may be used to refer to an amino acid residue of a polypeptide, e.g., an amino acid residue within a polypeptide.
Analog: As used herein, the term "analog" generally refers to a substance that shares one or more specific structural features, elements, components, or portions with a reference substance. In some embodiments, an "analog" exhibits significant structural similarity to the reference substance, e.g., sharing a core or consensus structure, but also differs from it in certain discrete ways. In some embodiments, an analog is a substance that can be generated from the reference substance, for example, by chemical manipulation of the reference substance. In some embodiments, an analog is a substance that can be generated by performing a synthetic process that is substantially similar to (e.g., shares multiple operations with) the synthetic process that generates the reference substance. In some embodiments, an analog is or may be generated by performing a synthetic process different from that used to generate the reference substance.
Antagonists: as used herein, the term "antagonist" generally refers to an agent or condition whose presence, level, degree, type or form correlates to a decrease in the level or activity of a target. Antagonists may include any chemical class of agents including, for example, small molecules, polypeptides, nucleic acids, carbohydrates, lipids, metals, or any other entity that exhibits the relevant inhibitory activity. In some embodiments, an antagonist may be a "direct antagonist" in that it binds directly to its target; in some embodiments, an antagonist may be an "indirect antagonist" in that it exerts its effect by a mechanism other than direct binding to its target; for example, by interacting with a modulator of the target such that the level or activity of the target is altered. In some embodiments, an "antagonist" may be referred to as an "inhibitor".
Antibody: As used herein, the term "antibody" generally refers to a polypeptide comprising canonical immunoglobulin sequence elements sufficient to confer specific binding to a particular target antigen. Intact antibodies as produced in nature are approximately 150 kD tetrameric agents composed of two identical heavy chain polypeptides (each approximately 50 kD) and two identical light chain polypeptides (each approximately 25 kD) that associate with each other into what is commonly referred to as a "Y-shaped" structure. Each heavy chain comprises at least four domains (each about 110 amino acids long): an amino-terminal variable (VH) domain (located at the tips of the Y structure), followed by three constant domains, CH1, CH2, and the carboxy-terminal CH3 (located at the base of the Y's stem). A short region, known as the "switch," connects the heavy chain variable and constant regions. The "hinge" connects the CH2 and CH3 domains to the rest of the antibody. Two disulfide bonds in this hinge region connect the two heavy chain polypeptides to one another in an intact antibody. Each light chain comprises two domains, an amino-terminal variable (VL) domain followed by a carboxy-terminal constant (CL) domain, separated from one another by another "switch." Intact antibody tetramers are composed of two heavy chain-light chain dimers in which the heavy and light chains are linked to one another by a single disulfide bond; two other disulfide bonds connect the heavy chain hinge regions to one another, so that the dimers are connected to one another and the tetramer is formed. Naturally produced antibodies are also glycosylated, such as on the CH2 domain. Each domain in a natural antibody has a structure characterized by an "immunoglobulin fold" formed from two beta sheets (e.g., 3-, 4-, or 5-stranded sheets) packed against each other in a compressed antiparallel beta barrel.
Each variable domain contains three hypervariable loops known as "complementarity determining regions" (CDR1, CDR2, and CDR3) and four somewhat invariant "framework" regions (FR1, FR2, FR3, and FR4). When a natural antibody folds, the FR regions form the beta sheets that provide the structural framework for the domains, and the CDR loop regions from both the heavy and light chains are brought together in three-dimensional space so that they create a single hypervariable antigen binding site located at the tip of the Y structure. The Fc region of naturally occurring antibodies binds to elements of the complement system and also to receptors on effector cells, including, for example, effector cells that mediate cytotoxicity. The affinity or other binding properties of the Fc region for Fc receptors may be modulated by glycosylation or other modifications. In some embodiments, antibodies produced or utilized according to the present disclosure include glycosylated Fc domains, including Fc domains with modified or engineered glycosylation. For the purposes of this disclosure, in certain embodiments, any polypeptide or complex of polypeptides that includes sufficient immunoglobulin domain sequences as found in natural antibodies may be referred to or used as an "antibody," whether such polypeptide is naturally produced (e.g., generated by an organism reacting to an antigen) or produced by recombinant engineering, chemical synthesis, or another artificial system or methodology. In some embodiments, the antibody is polyclonal; in some embodiments, the antibody is monoclonal. In some embodiments, the antibody has constant region sequences that are characteristic of mouse, rabbit, primate, or human antibodies. In some embodiments, the antibody sequence elements are humanized, primatized, chimeric, or the like.
Furthermore, as used herein, the term "antibody" may, in appropriate embodiments (unless otherwise indicated or clear from context), refer to any known or developed construct or format that uses the structural and functional features of an antibody in an alternative presentation. For example, in embodiments, the format of an antibody used in accordance with the present disclosure is selected from, but is not limited to, intact IgA, IgG, IgE, or IgM antibodies; bispecific or multispecific antibodies; antibody fragments, such as Fab fragments, Fab' fragments, F(ab')2 fragments, Fd' fragments, Fd fragments, and isolated CDRs or sets thereof; single-chain Fvs; polypeptide-Fc fusions; single domain antibodies (e.g., shark single domain antibodies, such as IgNAR or fragments thereof); camelid antibodies; masked antibodies; Small Modular ImmunoPharmaceuticals ("SMIPs™"); single-chain or tandem diabodies; VHHs; minibodies; ankyrin repeat proteins; DARTs; TCR-like antibodies; and microproteins. In some embodiments, the antibody may lack a covalent modification (e.g., attachment of a glycan) that it would have if produced naturally. In some embodiments, the antibody may contain a covalent modification (e.g., attachment of a glycan, a payload [e.g., a detectable moiety, a therapeutic moiety, a catalytic moiety, etc.], or another pendant group [e.g., polyethylene glycol, etc.]).
Associated with: as used herein, two events or entities are typically "associated with" each other if the presence, level, degree, type, or form of one is correlated with the presence, level, degree, type, or form of the other. For example, if the presence, level, or form of a particular entity (e.g., polypeptide, genetic feature, metabolite, microorganism, etc.) is associated with the incidence or susceptibility of a disease, disorder, or condition (e.g., across a related population), it is considered to be associated with the particular disease, disorder, or condition. In some embodiments, two or more entities are physically "associated" with each other if they interact directly or indirectly such that they are or remain in physical proximity to each other. In some embodiments, two or more entities physically associated with each other are covalently linked to each other; in some embodiments, two or more entities that are physically associated with each other are not covalently linked to each other, but are non-covalently associated, such as by hydrogen bonding, van der waals interactions, hydrophobic interactions, magnetic properties, and combinations thereof.
Biological sample: As used herein, the term "biological sample" generally refers to a sample obtained or derived from a biological source of interest (e.g., a tissue, an organism, or a cell culture), as described herein. In some embodiments, the source of interest includes an organism, such as an animal or a human. In some embodiments, the biological sample is or includes biological tissue or fluid. In some embodiments, the biological sample may be or include bone marrow; blood; blood cells; ascites; tissue or fine needle biopsy samples; cell-containing body fluids; free-floating nucleic acids; sputum; saliva; urine; cerebrospinal fluid; peritoneal fluid; pleural fluid; feces; lymph; gynecological fluids; skin swabs; vaginal swabs; oral swabs; nasal swabs; washings or lavages, such as ductal lavages or bronchoalveolar lavages; aspirates; scrapings; bone marrow specimens; tissue biopsy specimens; surgical specimens; other body fluids, secretions, or excretions; or cells therefrom. In some embodiments, the biological sample is or includes cells obtained from an individual. In some embodiments, the cells obtained are or include cells from the individual from whom the sample was obtained. In some embodiments, the sample is a "primary sample" obtained directly from a source of interest by any suitable method. For example, in some embodiments, a primary biological sample is obtained by a method selected from biopsy (e.g., fine needle aspiration or tissue biopsy), surgery, collection of body fluid (e.g., blood, lymph, or feces), and the like. In some embodiments, the term "sample" refers to a preparation obtained by processing a primary sample (e.g., by removing one or more components of the primary sample or by adding one or more agents to it), for example, by filtering using a semi-permeable membrane. Such a "processed sample" may include, for example, nucleic acids or proteins extracted from a sample, or obtained by subjecting a primary sample to techniques such as amplification or reverse transcription of mRNA, or isolation or purification of certain components.
Biological network: As used herein, the term "biological network" generally refers to any network that applies to a biological system and has subunits (e.g., "nodes") linked into a whole, such as species units linked into a whole web. In some embodiments, the biological network is a protein-protein interaction (PPI) network, which represents the interactions between proteins present in a cell, where the proteins are nodes and their interactions are edges. In some embodiments, the connections between nodes in the PPI network are experimentally verified. In some embodiments, the connections between nodes are a combination of experimentally verified and mathematically calculated connections. In some embodiments, the biological network is a human interactome (a network of experimentally derived interactions that occur in human cells, which includes protein-protein interaction information as well as gene expression and co-expression, cellular co-localization of proteins, genetic information, metabolic and signaling pathways, and the like). In some embodiments, the biological network is a gene regulatory network, a gene co-expression network, a metabolic network, or a signaling network.
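As a concrete picture of the node-and-edge definition above, a protein-protein interaction network can be held as an undirected adjacency map. The sketch below is a minimal illustration; the `Interaction` record, its `experimentally_verified` flag, and the placeholder protein names P1 to P3 are assumptions made for the example and do not describe any particular interactome resource.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    protein_a: str
    protein_b: str
    experimentally_verified: bool  # False = computationally inferred (assumed flag)

def build_network(interactions, verified_only=False):
    """Return an undirected adjacency map {node: set of neighboring nodes}."""
    network = {}
    for edge in interactions:
        if verified_only and not edge.experimentally_verified:
            continue  # keep only experimentally verified edges when requested
        network.setdefault(edge.protein_a, set()).add(edge.protein_b)
        network.setdefault(edge.protein_b, set()).add(edge.protein_a)
    return network

# Placeholder edges: one verified, one inferred.
edges = [Interaction("P1", "P2", True), Interaction("P2", "P3", False)]
full_network = build_network(edges)
verified_network = build_network(edges, verified_only=True)
```

With `verified_only=True`, the inferred P2-P3 edge is dropped and P3 no longer appears as a node, which mirrors the distinction between networks built only from experimentally verified connections and networks that also include calculated ones.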
Combination therapy: As used herein, the term "combination therapy" generally refers to a clinical intervention in which a subject is simultaneously exposed to two or more therapeutic regimens (e.g., two or more therapeutic agents). In some embodiments, the two or more therapeutic regimens may be administered simultaneously. In some embodiments, the two or more therapeutic regimens may be administered sequentially (e.g., a first regimen is administered before any dose of a second regimen). In some embodiments, the two or more therapeutic regimens are administered in overlapping dosing regimens. In some embodiments, administration of combination therapy may involve administration of one or more therapeutic agents or modalities to a subject who is receiving the other agents or modalities. In some embodiments, combination therapy does not necessarily require that the individual agents be administered together in a single composition (or even at the same time). In some embodiments, the two or more therapeutic agents or modalities of a combination therapy are administered to a subject separately, e.g., in separate compositions, via separate routes of administration (e.g., one agent orally and another intravenously), or at different time points. In some embodiments, the two or more therapeutic agents may be administered together in a combination composition, or even in a combination compound (e.g., as part of a single chemical complex or covalent entity), via the same route of administration or at the same time.
Comparable: As used herein, the term "comparable" generally refers to two or more agents, entities, situations, sets of conditions, etc., that may not be identical to one another but that are sufficiently similar to permit comparison between them, such that conclusions may reasonably be drawn based on the differences or similarities observed. In some embodiments, comparable sets of conditions, individuals, or populations are characterized by a plurality of substantially identical features and one or a small number of varied features. The degree of identity required for two or more such agents, entities, situations, sets of conditions, etc. to be considered comparable may differ from one context to another. For example, sets of circumstances, individuals, or populations are comparable to one another when they are characterized by a sufficient number and type of substantially identical features to warrant a reasonable conclusion that differences in the results obtained or phenomena observed under or with the different sets of circumstances, individuals, or populations are caused by, or indicative of, the variation in those features that are varied.
Corresponding to: As used herein, the phrase "corresponding to" generally refers to a relationship between two entities, events, or phenomena that share sufficient features to be reasonably comparable, such that "corresponding" attributes are apparent. For example, in some embodiments, the term may be used in reference to a compound or composition to designate the position or identity of a structural element in the compound or composition by comparison with an appropriate reference compound or composition. For example, in some embodiments, a monomeric residue in a polymer (e.g., an amino acid residue in a polypeptide or a nucleic acid residue in a polynucleotide) may be identified as "corresponding to" a residue in an appropriate reference polymer. For example, for simplicity, residues in a polypeptide are often designated using a canonical numbering system based on a related reference polypeptide, so that an amino acid "corresponding to" the residue at position 190, for example, need not actually be the 190th amino acid in a particular amino acid chain but rather corresponds to the residue found at position 190 in the reference polypeptide. Various methods can be used to identify "corresponding" amino acids. For example, various sequence alignment strategies may be used, including software programs such as BLAST, CS-BLAST, CUSASW++, DIAMOND, FASTA, GGSEARCH/GLSEARCH, Genoogle, HMMER, HHpred/HHsearch, IDF, Infernal, KLAST, USEARCH, parasail, PSI-BLAST, PSI-Search, ScalaBLAST, Sequilab, SAM, SSEARCH, SWAPHI, SWAPHI-LS, SWIMM, or SWIPE, to identify "corresponding" residues in polypeptides or nucleic acids according to the present disclosure.
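The canonical-numbering idea can be shown with a small example that takes an existing pairwise alignment and finds which residue of a query chain corresponds to a given reference position. This is a sketch under assumptions: the function name is hypothetical, the two aligned strings (with "-" for gaps) are assumed to come from an alignment tool such as those listed above, and the toy sequences are invented.

```python
def corresponding_index(aligned_reference, aligned_query, reference_position):
    """Map a 1-based reference position to the 1-based position in the query.

    Both inputs are rows of the same pairwise alignment, with '-' marking gaps.
    Returns None if the reference residue is aligned to a gap in the query.
    """
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = query_pos = 0
    for ref_char, query_char in zip(aligned_reference, aligned_query):
        if ref_char != "-":
            ref_pos += 1
        if query_char != "-":
            query_pos += 1
        if ref_char != "-" and ref_pos == reference_position:
            return query_pos if query_char != "-" else None
    raise ValueError("reference_position is beyond the reference sequence")

# Toy alignment (invented): the query lacks reference residue 2 and has one insertion.
aligned_ref = "MKT-AYL"
aligned_qry = "M-TGAYL"
```

In this toy alignment, reference position 3 corresponds to the second residue of the query, so a residue described as "corresponding to position 3" is not the third amino acid of the query chain.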
Dosing regimen or treatment regimen: the terms "dosing regimen" and "therapeutic regimen" may be generally used to refer to a set of unit doses (e.g., more than one) that are administered to a subject separately, which doses may be separated by time periods. In some embodiments, a given therapeutic agent has a recommended dosing regimen, which may involve one or more doses. In some embodiments, the dosing regimen comprises a plurality of doses, each dose separated in time from the other doses. In some embodiments, the individual doses are separated from each other by a period of time that is the same length of time; in some embodiments, the dosing regimen comprises a plurality of doses, and the individual doses are separated by at least two different time periods. In some embodiments, all doses within a dosing regimen have the same unit dose amount. In some embodiments, different doses within a dosing regimen have different amounts. In some embodiments, the dosing regimen includes a first dose of a first dose amount followed by one or more additional doses of a second dose amount different from the first dose amount. In some embodiments, the dosing regimen includes a first dose of a first dose amount followed by one or more additional doses of a second dose amount that is the same as the first dose amount. In some embodiments, the dosing regimen is associated with a beneficial outcome (e.g., is a therapeutic dosing regimen) when administered in the relevant population.
Improved, increased, or reduced: As used herein, the terms "improved," "increased," or "reduced," or grammatically comparable comparative terms thereof, generally indicate values that are relative to a comparable reference measurement. For example, in some embodiments, an assessed value achieved with an agent of interest may be "improved" relative to that obtained with a comparable reference agent. Alternatively or additionally, in some embodiments, an assessed value achieved in a subject or system of interest may be "improved" relative to that obtained in the same subject or system under different conditions (e.g., before or after an event such as administration of an agent of interest), or in a different, comparable subject (e.g., in a comparable subject or system that differs from the subject or system of interest in the presence of one or more indicators of a particular disease, disorder, or condition of interest, or in prior exposure to a condition or agent, etc.). In some embodiments, comparative terms refer to statistically relevant differences (e.g., differences of a prevalence or magnitude sufficient to achieve statistical relevance). In a given context, various methods may be used to determine the degree or prevalence of difference that is required or sufficient to achieve such statistical significance.
Patient or subject: As used herein, the term "patient" or "subject" generally refers to any organism to which a provided composition is or may be administered, e.g., for experimental, diagnostic, prophylactic, cosmetic, or therapeutic purposes. Some patients or subjects include animals (e.g., mammals such as mice, rats, rabbits, non-human primates, or humans). In some embodiments, the patient is a human. In some embodiments, the patient or subject is suffering from or susceptible to one or more disorders or conditions. In some embodiments, the patient or subject exhibits one or more symptoms of a disorder or condition. In some embodiments, the patient or subject has been diagnosed with one or more disorders or conditions. In some embodiments, the patient or subject is receiving or has received a therapy to diagnose or treat a disease, disorder, or condition.
Pharmaceutical composition: As used herein, the term "pharmaceutical composition" generally refers to an active agent formulated together with one or more pharmaceutically acceptable carriers. In some embodiments, the active agent is present in a unit dose amount appropriate for administration to a subject of interest in a therapeutic regimen (e.g., in an amount that has been demonstrated to show a statistically significant probability of achieving a predetermined therapeutic effect when administered).
Pharmaceutically acceptable: as used herein, the term "pharmaceutically acceptable" generally refers to those compounds, materials, compositions, or dosage forms which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of human beings or animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio.
Prevention: as used herein, the term "preventing," when used in connection with the occurrence of a disease, disorder, or condition, generally refers to reducing the risk of developing the disease, disorder, or condition, or delaying the onset of one or more features or symptoms of the disease, disorder, or condition. Prevention may be considered complete when the onset of a disease, disorder, or condition is delayed by a predetermined period of time.
Reference: as used herein, the term "reference" generally describes a standard or control against which a comparison is made. For example, in some embodiments, an agent, animal, individual, population, sample, sequence, or value of interest is compared to a reference or control agent, animal, individual, population, sample, sequence, or value. In some embodiments, the reference or control is tested or determined substantially simultaneously with the test or determination of interest. In some embodiments, the reference or control is a historical reference or control, optionally embodied in a tangible medium. Typically, a reference or control is determined or characterized under conditions or circumstances comparable to those under assessment, with sufficient similarity to justify reliance on, or comparison to, a particular possible reference or control.
Therapeutic agent: as used herein, the phrase "therapeutic agent" generally refers to any agent that elicits a pharmacological effect when administered to an organism. In some embodiments, an agent is considered a therapeutic agent if the agent exhibits a statistically significant effect in an appropriate population. In some embodiments, the appropriate population may be a population of model organisms. In some embodiments, the appropriate population may be defined by various criteria, such as a certain age group, gender, genetic background, pre-existing clinical conditions, and the like. In some embodiments, a therapeutic agent is a substance that can be used to reduce, ameliorate, alleviate, inhibit, prevent, delay the onset of, reduce the severity of, or reduce the incidence of one or more symptoms or features of a disease, disorder, or condition. In some embodiments, a "therapeutic agent" is an agent that has been or needs to be approved by a government agency for sale for administration to humans. In some embodiments, a "therapeutic agent" is an agent that requires a medical prescription for administration to a human.
Therapeutically effective amount: as used herein, the term "therapeutically effective amount" generally refers to the amount of a substance (e.g., a therapeutic agent, composition, or formulation) that elicits a biological response when administered as part of a therapeutic regimen. In some embodiments, a therapeutically effective amount of a substance is an amount sufficient to treat, diagnose, prevent, or delay the onset of a disease, disorder, or condition when administered to a subject suffering from or susceptible to the disease, disorder, or condition. The effective amount of the substance may vary depending on factors such as the biological endpoint, the substance to be delivered, the target cell or tissue, and the like. For example, an effective amount of a compound in a formulation for treating a disease, disorder, or condition is an amount that reduces, ameliorates, alleviates, inhibits, prevents, delays the onset of, reduces the severity of, or reduces the incidence of one or more symptoms or features thereof. In some embodiments, a therapeutically effective amount is administered in a single dose; in some embodiments, multiple unit doses are required to deliver a therapeutically effective amount.
Treatment: as used herein, the term "treatment" generally refers to any method for partially or completely alleviating, ameliorating, relieving, inhibiting, preventing, delaying the onset of, reducing the severity of, or reducing the incidence of one or more symptoms or features of a disease, disorder, or condition. The treatment may be administered to a subject that does not exhibit signs of the disease, disorder, or condition. In some embodiments, the treatment may be administered to a subject exhibiting early signs of a disease, disorder, or condition, e.g., in order to reduce the risk of developing a pathology associated with the disease, disorder, or condition.
Variants: as used herein, the term "variant" refers to an entity that exhibits significant structural identity to a reference entity, but is structurally different from the reference entity in the presence or level of one or more chemical moieties as compared to the reference entity. In many embodiments, the variant is functionally different from its reference entity as well. Whether a particular entity is properly considered a "variant" of a reference entity may be based on the degree of structural identity with the reference entity. Any biological or chemical reference entity has certain characteristic structural elements. By definition, a variant is a unique chemical entity that shares one or more such characteristic structural elements. A small molecule may have a characteristic core structural element (e.g., a macrocyclic core) or one or more characteristic-dependent moieties, such that variants of the small molecule are variants that share the core structural element and the characteristic-dependent moiety, but differ in other dependent moieties or types of bonds present within the core (single and double, E and Z, etc.), a polypeptide may have a characteristic sequence element composed of multiple amino acids that have a specified position relative to one another in linear or three-dimensional space or contribute to a particular biological function, and a nucleic acid may have a characteristic sequence element composed of multiple nucleotide residues that have a specified position relative to one another in linear or three-dimensional space. For example, a variant polypeptide may differ from a reference polypeptide by one or more differences in amino acid sequence or one or more differences in chemical moieties (e.g., carbohydrates, lipids, etc.) covalently attached to the polypeptide backbone. 
In some embodiments, the variant polypeptide exhibits at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 99% overall sequence identity to the reference polypeptide. Alternatively or additionally, in some embodiments, the variant polypeptide does not share at least one characteristic sequence element with the reference polypeptide. In some embodiments, the reference polypeptide has one or more biological activities. In some embodiments, the variant polypeptides share one or more biological activities of the reference polypeptide. In some embodiments, the variant polypeptide lacks one or more biological activities of the reference polypeptide. In some embodiments, the variant polypeptide exhibits a reduced level of one or more biological activities as compared to a reference polypeptide. In many embodiments, a polypeptide of interest is considered to be a "variant" of a parent or reference polypeptide if it has the same amino acid sequence as the parent, but has a small number of sequence changes at a particular position. In some embodiments, fewer than 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2% of the residues in the variant are substituted compared to the parent. In some embodiments, the variant has 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 substituted residues compared to the parent. Variants often have very few (e.g., less than 5, 4, 3, 2, or 1) substituted functional residues (e.g., residues involved in a particular biological activity). Furthermore, variants may have no more than 5, 4, 3, 2 or 1 additions or deletions compared to the parent, and often no additions or deletions. Further, any addition or deletion may be less than about 25, about 20, about 19, about 18, about 17, about 16, about 15, about 14, about 13, about 10, about 9, about 8, about 7, about 6 residues, and typically less than about 5, about 4, about 3, or about 2 residues. In some embodiments, the parent or reference polypeptide is a polypeptide that occurs in nature.
Disease gene expression signature (response module)
The present disclosure provides, among other things, a disease gene expression signature that, when reversed (in whole or in large part, e.g., after administration of one or more doses of therapy), indicates that a subject is responding to therapy. This approach is advantageous over other approaches because the methods described in this disclosure allow a response to be quantified at the molecular level, rather than relying on observed changes in clinical characteristics. Indeed, the present disclosure provides methods and systems that encompass the insight that a specific molecular signature (e.g., the expression of specific genes), when modulated to resemble that of a healthy subject, indicates that the diseased subject is responding to therapy. In some embodiments, the disease expression signature is a pattern of genes differentially expressed in the diseased subject as compared to the healthy subject. The disease expression signature described in this disclosure accounts for subtle differences between diseased and healthy subjects at the molecular level.
In some embodiments, the present disclosure provides methods and systems that encompass the insight that gene expression indicative of a response to therapy is not necessarily derived from comparisons between subgroups of subjects with the same disease. That is, for example, in a group of subjects suffering from a disease, the present disclosure recognizes that analyzing differences in gene expression between one or more subgroups of the group of subjects may not produce a pattern of gene expression that indicates whether a subject may respond to therapy or otherwise begin recovery from the disease, disorder, or condition. In contrast, in some embodiments, the present disclosure analyzes gene expression between healthy subjects and a subset of diseased subjects having similar gene expression patterns. By analyzing the differences between a diseased subject and a healthy subject, and by identifying key gene expression targets in the diseased subject that differ from the healthy subject and also play an important role in driving the response, it will be appreciated (without being bound by theory) that modulating key differentially expressed genes can make the gene expression pattern of the diseased subject similar to that of the healthy subject, and thus lead to disease regression.
An exemplary workflow for identifying disease gene expression signatures is shown in FIG. 1. In some embodiments, gene expression data from a group of subjects suffering from a disease are analyzed (101). The subjects within the group are then stratified (102) according to a particular metric. For example, in some embodiments, subjects within the group are stratified according to whether they are responders or non-responders to a particular therapy (e.g., anti-TNF therapy). In some embodiments, the subjects within the group are stratified using a supervised or unsupervised clustering algorithm. In some embodiments, the subjects within the group are stratified using a supervised clustering algorithm. In some embodiments, the subjects within the group are stratified using an unsupervised clustering algorithm. In some embodiments, the stratification of the group of subjects into two or more groups of prior subjects is based on whether the prior subjects responded to a particular therapy. As used herein, "therapy" refers to a therapeutic agent as defined herein, gene knockout (e.g., rendering one or more specific genes of a subject inoperative), or gene overexpression (e.g., increasing expression of one or more specific genes in a subject beyond a normal amount).
In some embodiments, baseline expression profiles of subgroups within the clusters are analyzed and compared (103) to those of one or more healthy control subjects. Differentially expressed genes are identified, which are termed "disease candidate genes". In some embodiments, certain differentially expressed genes are selected as "disease candidate genes". In some embodiments, genes that are significantly differentially expressed are selected as disease candidate genes. In some embodiments, a significant difference in gene expression is defined by a p-value of 0.05 or less and an absolute fold change of 0.5 or more.
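The selection step above can be sketched in a few lines. This is a minimal illustration, not the method of the disclosure: it assumes expression values are already on a log2 scale (so that the fold change is a difference of group means), and it substitutes an exact permutation test for whatever statistical test is actually used to obtain p-values. The function names, gene labels, and expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(disease, healthy):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(disease) + list(healthy)
    n = len(disease)
    m = len(pooled) - n
    observed = abs(mean(disease) - mean(healthy))
    total = sum(pooled)
    hits = count = 0
    # Enumerate every way of relabelling n of the pooled samples as "disease".
    for idx in combinations(range(len(pooled)), n):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n - (total - group_sum) / m)
        hits += diff >= observed - 1e-12  # tolerance for float rounding
        count += 1
    return hits / count


def select_candidate_genes(disease_expr, healthy_expr, p_max=0.05, fc_min=0.5):
    """Return {gene: (fold_change, p_value)} for genes passing both thresholds."""
    selected = {}
    for gene, d_vals in disease_expr.items():
        h_vals = healthy_expr[gene]
        fold_change = mean(d_vals) - mean(h_vals)  # assumes log2-scale input
        p = permutation_p_value(d_vals, h_vals)
        if p <= p_max and abs(fold_change) >= fc_min:
            selected[gene] = (fold_change, p)
    return selected


# Hypothetical toy data: four diseased and four healthy samples per gene.
disease_expr = {
    "GENE_A": [8.1, 8.4, 7.9, 8.3],
    "GENE_B": [5.0, 5.2, 4.9, 5.1],
    "GENE_C": [3.0, 3.2, 2.9, 3.1],
}
healthy_expr = {
    "GENE_A": [6.0, 6.2, 5.9, 6.1],
    "GENE_B": [5.1, 4.9, 5.0, 5.2],
    "GENE_C": [4.4, 4.6, 4.5, 4.3],
}
candidates = select_candidate_genes(disease_expr, healthy_expr)
```

With these toy values, GENE_A (higher in disease) and GENE_C (lower in disease) pass both thresholds, while GENE_B does not. Exhaustive enumeration is only practical for small groups; larger cohorts would typically use a parametric test or sampled permutations.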
In some embodiments, the disease expression signature includes all, substantially all, or a subset of the identified disease candidate genes. In some embodiments, disease candidate genes are optionally mapped onto a biological network (104). Without being bound by theory, it will be appreciated that understanding the connectivity among the disease candidate genes allows identification of the most relevant genes, thereby excluding genes that may not be significantly involved in the response to treatment of a subject for a particular disease. For example, in some embodiments, the biological network is a human interactome. In some embodiments, genes from a disease candidate gene set that are significantly connected or otherwise clustered on a human interactome are selected as the disease gene expression signature. In some embodiments, all, substantially all, or a subset of the disease candidate genes are clustered or significantly connected on the human interactome map. In some embodiments, the disease gene expression signature comprises disease candidate genes clustered on a biological network (e.g., a human interactome). In some embodiments, the disease gene expression signature comprises disease candidate genes that are significantly connected to each other on a biological network (e.g., a human interactome). In some embodiments, the disease candidate genes are mapped onto a biological network prior to incorporation into the disease gene expression signature.
In some embodiments, the disease gene expression signature is determined by: analyzing gene expression data from a group of subjects suffering from the same disease, disorder, or condition as the subject; stratifying the group of subjects into two or more groups of prior subjects based on the gene expression data; and selecting one or more genes (e.g., "disease candidate genes") having a significant difference in gene expression between the two or more groups of prior subjects and a group of healthy subjects, thereby providing a disease gene expression signature.
As used herein, a "healthy gene expression signature" refers to the gene expression of the response genes in a healthy control (e.g., a subject who, unlike the subject to be treated as described herein, does not suffer from the disease, disorder, or condition).
In some embodiments, the present disclosure provides a method of determining a disease gene expression signature for quantifying responsiveness of a subject suffering from a disease, disorder, or condition to therapy, the method comprising: receiving gene expression data from a group of subjects having the same disease, disorder, or condition (e.g., having the same disease, disorder, or condition as the subject being evaluated for responsiveness to therapy); stratifying the group of subjects into two or more groups based on the gene expression data (e.g., grouping together subjects with similar gene expression); calculating differences in gene expression between the two or more groups of subjects and a group of healthy subjects; and selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of healthy subjects, thereby providing a disease gene expression signature.
In some embodiments, the disease candidate genes are mapped onto a biological network (e.g., a human interactome) prior to incorporation into the disease gene expression signature.
In some embodiments, a subset of genes having a close or spatial relationship on the biological network is selected from the disease candidate genes for incorporation into the disease gene expression signature. In some embodiments, the subset of genes that have a close or spatial relationship on the biological network may be, for example, genes in close proximity on a human interactome map. For example, in some embodiments, genes represented by nodes on the biological network that are connected to two or more other nodes are selected, so as to exclude outlier nodes.
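One way to read the "connected to two or more nodes" criterion is as a degree filter on the subnetwork induced by the candidate genes. The sketch below makes that assumption explicit: only links between two candidate genes are counted, which is an interpretation rather than something the text states. The gene labels and interaction list are hypothetical and do not represent real interactome data.

```python
from collections import defaultdict


def filter_connected_candidates(candidate_genes, interactions, min_links=2):
    """Keep candidates linked to at least `min_links` other candidate genes."""
    candidates = set(candidate_genes)
    neighbors = defaultdict(set)
    for a, b in interactions:
        # Count only edges whose two endpoints are both candidate genes.
        if a in candidates and b in candidates and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(g for g in candidates if len(neighbors[g]) >= min_links)


# Hypothetical candidate genes and undirected interactions.
candidate_genes = ["G1", "G2", "G3", "G4", "G5"]
interactions = [
    ("G1", "G2"),
    ("G1", "G3"),
    ("G2", "G3"),
    ("G3", "G4"),
    ("G5", "X9"),  # X9 is not a candidate, so this edge is ignored
]
core_genes = filter_connected_candidates(candidate_genes, interactions)
```

Here G4 (one link) and G5 (no links to other candidates) are treated as outlier nodes and dropped, leaving G1, G2, and G3.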
In some embodiments, a subset of the disease candidate genes having significant connections is selected for incorporation into the disease gene expression signature. For example, in some embodiments, a score is assigned to each connection between nodes within the disease candidate genes. The disease candidate genes may be ranked based on the score, and only the highest-ranked disease candidate genes (e.g., the top 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the disease candidate genes) are selected.
Thus, in some embodiments, the present disclosure provides a method of determining a disease gene expression signature for quantifying responsiveness of a subject suffering from a disease, disorder, or condition to therapy, the method comprising: receiving gene expression data from a group of subjects having the same disease, disorder, or condition (e.g., having the same disease, disorder, or condition as the subject being evaluated for responsiveness to therapy); stratifying the group of subjects into two or more groups based on the gene expression data (e.g., grouping together subjects with similar gene expression); calculating differences in gene expression between the two or more groups of subjects and a group of healthy subjects; selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of healthy subjects; mapping the disease candidate genes onto a biological network (e.g., a human interactome); selecting adjacent genes (e.g., genes on adjacent nodes, e.g., on a human interactome) that have significant connections with the disease candidate genes; compiling a disease gene list comprising the disease candidate genes and the adjacent genes; and selecting some or all of the genes from the disease gene list to provide a disease gene expression signature.
In some embodiments, some or all of the genes are selected from the disease gene list for incorporation into the disease gene expression signature by ranking them according to the strength of their connections to other nodes on the biological network. In some embodiments, the top 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the genes in the disease gene list are selected for incorporation into the disease gene expression signature.
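The ranking step can be illustrated with a small sketch. It assumes, purely for illustration, that each connection carries a numeric score and that a gene's "connection strength" is the sum of the scores of its connections to other genes in the list; the disclosure does not specify how the score is computed or aggregated. All names and weights are hypothetical.

```python
import math


def rank_by_connection_strength(genes, weighted_edges, top_fraction=0.5):
    """Rank genes by summed edge score and keep the top fraction."""
    strength = {g: 0.0 for g in genes}
    for a, b, score in weighted_edges:
        # Only connections between two genes in the list contribute.
        if a in strength and b in strength:
            strength[a] += score
            strength[b] += score
    # Sort by descending strength; break ties alphabetically for determinism.
    ranked = sorted(strength, key=lambda g: (-strength[g], g))
    keep = max(1, math.ceil(top_fraction * len(ranked)))
    return ranked[:keep]


# Hypothetical disease gene list and connection scores.
disease_gene_list = ["G1", "G2", "G3", "G4", "G5"]
weighted_edges = [
    ("G1", "G2", 0.9),
    ("G1", "G3", 0.8),
    ("G2", "G3", 0.4),
    ("G3", "G4", 0.2),
    ("G4", "G5", 0.1),
]
signature = rank_by_connection_strength(disease_gene_list, weighted_edges, top_fraction=0.5)
```

With `top_fraction=0.5` and five genes, the three most strongly connected genes are kept (G1, G3, G2, in that order).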
Gene expression of the subject is measured, as described herein, by at least one of microarray, RNA sequencing, real-time quantitative reverse transcription PCR (qRT-PCR), bead array, ELISA, and protein expression. In some embodiments, the gene expression data of the subject are processed by subtracting background data, correcting for batch effects, and dividing by the average expression of housekeeping genes (see, e.g., Eisenberg & Levanon, "Human housekeeping genes, revisited," Trends in Genetics, 29(10): 569-574 (October 2013), which is incorporated herein by reference for all purposes). In the context of microarray data analysis, background subtraction refers to subtracting the average fluorescent signal generated by probe features on the chip that are not complementary to any mRNA sequences, e.g., signals generated by non-specific binding, from the fluorescent signal intensity of each probe feature. Background subtraction can be performed using different software packages, such as Affymetrix™ Gene Expression Console. Housekeeping genes are involved in basic cell maintenance, and thus are expected to maintain constant expression levels in all cells and conditions. The expression levels of the genes of interest, such as those in response signatures, may be normalized by dividing the expression levels by the average expression levels of a selected panel of housekeeping genes. This housekeeping gene normalization procedure calibrates gene expression levels for experimental variability. In addition, normalization methods that correct for variability between different batches of microarrays, such as robust multi-array average ("RMA"), are available in R packages recommended for the Illumina™ or Affymetrix™ microarray platforms. The normalized data are log-transformed, and probes with a low detection rate across samples are removed. In addition, probes that have no available gene symbol or Entrez ID are removed from the analysis.
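As a rough sketch of the per-sample arithmetic described above, the following applies background subtraction, housekeeping-gene normalization, and log transformation to one sample. Batch-effect correction and probe filtering are deliberately omitted, and a single scalar background value stands in for the probe-level background estimate. The function name, the `floor` parameter, the choice of housekeeping panel, and the intensity values are illustrative assumptions.

```python
import math
from statistics import mean


def normalize_sample(raw, background, housekeeping, floor=1.0):
    """Background-subtract, divide by mean housekeeping expression, then log2."""
    # Clamp at `floor` so the later logarithm is always defined.
    corrected = {g: max(v - background, floor) for g, v in raw.items()}
    hk_mean = mean(corrected[g] for g in housekeeping)
    return {
        g: math.log2(v / hk_mean)
        for g, v in corrected.items()
        if g not in housekeeping
    }


# Hypothetical raw intensities for one sample.
raw = {"ACTB": 1300, "GAPDH": 900, "GENE_A": 2100, "GENE_B": 350}
normalized = normalize_sample(raw, background=100, housekeeping={"ACTB", "GAPDH"})
```

In this toy sample the housekeeping mean after background subtraction is 1000, so GENE_A comes out at log2(2) and GENE_B at log2(0.25). In practice, established packages for the relevant platform would handle these steps together with batch correction.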
Therapeutic methods and monitoring therapies
The present disclosure provides, among other things, methods of treating a subject having a disease, disorder, or condition and monitoring therapy of the subject, comprising evaluating changes in gene expression within a disease gene expression signature. For example, the present disclosure provides methods and systems that encompass the insight that a change, at the molecular level, in the expression of particular genes within a disease gene expression signature toward (fully or partially) the gene expression of a healthy subject indicates that the subject is responding to therapy, or that the disease is regressing. For example, in some embodiments, the present disclosure provides a method of treating a subject exhibiting a disease gene expression signature, the method comprising administering a therapy determined to restore (or reverse or otherwise alter) the disease gene expression signature to be similar to a healthy gene expression signature.
In some embodiments, the present disclosure provides techniques for verifying a response to a therapy in a subject suffering from a disease, disorder, or condition, comprising analyzing changes in disease gene expression signatures in the subject after administration of the therapy, wherein the disease gene expression signatures are determined to quantify the response to the therapy.
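One simple way to quantify such a change is the fraction of the disease-to-healthy expression gap that is closed after therapy, averaged over the signature genes. The sketch below is only one possible metric, chosen for illustration; the disclosure does not prescribe this formula. The function name, gene labels, and expression values are hypothetical.

```python
from statistics import mean


def reversal_score(pre, post, healthy, min_gap=1e-9):
    """Mean fraction of the pre-treatment gap to healthy closed after therapy.

    0.0 means no change, 1.0 means expression matches the healthy reference,
    and negative values mean expression moved further away from healthy.
    """
    fractions = []
    for gene, healthy_level in healthy.items():
        gap = pre[gene] - healthy_level
        if abs(gap) < min_gap:
            continue  # gene already at the healthy level; nothing to reverse
        fractions.append((pre[gene] - post[gene]) / gap)
    return mean(fractions) if fractions else 0.0


# Hypothetical log-scale expression for a two-gene signature.
healthy = {"GENE_A": 5.0, "GENE_C": 7.0}
pre = {"GENE_A": 8.0, "GENE_C": 4.0}   # GENE_A elevated, GENE_C suppressed
post = {"GENE_A": 6.5, "GENE_C": 5.5}  # both moved halfway toward healthy
score = reversal_score(pre, post, healthy)
```

Because the gap is signed, the score handles up-regulated and down-regulated signature genes in the same way.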
Furthermore, the present disclosure provides techniques for monitoring therapy for a given subject or group of subjects. Since the gene expression level of a subject may vary over time, in some cases it may be necessary or desirable to evaluate the subject at one or more points in time, e.g., at specified and/or periodic intervals.
In some embodiments, repeated monitoring over time allows or enables detection of one or more changes in gene expression profile or characteristics of a subject that may affect an ongoing treatment regimen. In some embodiments, a change is detected, in response to which a particular therapy administered to the subject is continued, altered, or paused. In some embodiments, therapy may be altered, for example, by increasing or decreasing the frequency or amount of administration of one or more agents or treatments with which a subject has been treated. Alternatively or additionally, in some embodiments, the therapy may be altered by adding therapy with one or more new agents or treatments. In some embodiments, the therapy may be altered by suspending or stopping one or more specific agents or treatments.
In some embodiments, monitoring comprises quantifying or analyzing changes in the disease gene expression signature. In some embodiments, the disease gene expression signature is determined by: analyzing gene expression data from a group of subjects suffering from the same disease, disorder, or condition as the subject; stratifying the group of subjects into two or more groups of prior subjects based on the gene expression data; and selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of prior subjects and a group of healthy subjects, thereby providing a disease gene expression signature.
In some embodiments, the present disclosure provides a method of monitoring the efficacy of a treatment of a subject suffering from a disease, disorder, or condition, the method comprising monitoring for a change in a disease gene expression signature after administration of a therapy, wherein the disease gene expression signature is derived by a process comprising: analyzing gene expression data from a group of subjects suffering from the same disease, disorder, or condition as the subject; stratifying the group of subjects into two or more groups based on the gene expression data (e.g., grouping together subjects with similar gene expression); determining a difference in gene expression between the two or more groups of subjects and a group of healthy subjects; and selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of healthy subjects, thereby providing a disease gene expression signature.
In some embodiments, stratifying the group of prior subjects into two or more groups comprises stratifying the subjects based on whether each prior subject was a responder or a non-responder to a particular therapy (e.g., anti-TNF therapy, or a therapy selected from Table 1). In some embodiments, the prior subjects are randomly stratified. In some embodiments, the prior subjects are stratified based on similarity of gene expression. In some embodiments, the analysis is performed by a machine learning process based on similarity of gene expression in the prior subjects.
In some embodiments, the therapy is selected from table 1.
TABLE 1
In some embodiments, the therapy is anti-TNF therapy. In some embodiments, the anti-TNF therapy is selected from infliximab, etanercept, adalimumab, certolizumab pegol, golimumab, and biosimilars thereof. In some embodiments, the anti-TNF therapy is infliximab. In some embodiments, the anti-TNF therapy is etanercept. In some embodiments, the anti-TNF therapy is adalimumab. In some embodiments, the anti-TNF therapy is certolizumab pegol. In some embodiments, the anti-TNF therapy is golimumab. In some embodiments, the anti-TNF therapy is a biosimilar of infliximab, etanercept, adalimumab, certolizumab pegol, or golimumab.
In some embodiments, the therapy is selected from rituximab, sarilumab, tofacitinib citrate, leflunomide, vedolizumab, tocilizumab, anakinra, and abatacept. In some embodiments, the therapy is rituximab. In some embodiments, the therapy is sarilumab. In some embodiments, the therapy is tofacitinib citrate. In some embodiments, the therapy is leflunomide. In some embodiments, the therapy is vedolizumab. In some embodiments, the therapy is tocilizumab. In some embodiments, the therapy is anakinra. In some embodiments, the therapy is abatacept.
In some embodiments, the disease, disorder, or condition comprises ulcerative colitis, Crohn's disease, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, plaque psoriasis, or ankylosing spondylitis. In some embodiments, the disease, disorder, or condition is ulcerative colitis. In some embodiments, the disease, disorder, or condition is Crohn's disease. In some embodiments, the disease, disorder, or condition is rheumatoid arthritis.
Patient stratification and trial design
The present disclosure also provides methods and systems that encompass the insight that changes in gene expression at the molecular level can occur more rapidly, and are easier to quantify, than changes in the clinical characteristics of a subject receiving therapy. For example, the present disclosure provides methods and systems that encompass the insight that patient responsiveness to therapy can be quantified early in a dosing regimen, thereby allowing a practitioner to alter the course of treatment of an individual subject, or otherwise suspend treatment of a subject, including in large-scale studies such as clinical trials. Such measures allow a study designer to determine, based on individual biology, which subjects do not respond to therapy and to remove them from the study, thereby reducing the risk of potential harm to non-responding subjects while saving time and resources for the study designer.
Accordingly, in some embodiments, the present disclosure provides a method of identifying and selecting subjects for a clinical trial, comprising: receiving gene expression data of a group of subjects; analyzing the gene expression data to detect the presence of a disease gene expression signature; administering at least one dose of therapy to the subjects; identifying a change in the disease gene expression signature relative to gene expression in a healthy subject; and selecting, for the clinical trial, subjects exhibiting a quantifiable change in gene expression of the disease gene expression signature toward that of healthy subjects.
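The computational part of this selection can be sketched as a two-stage filter, under several assumptions that are not taken from the disclosure: the signature is considered "present" when a subject's baseline profile lies at least a threshold Euclidean distance from a healthy reference profile, and a "quantifiable change" is a minimum relative reduction in that distance after the first dose. The thresholds, subject identifiers, and expression values are hypothetical.

```python
import math


def select_for_trial(subjects, healthy, genes, presence_min=1.0, improvement_min=0.25):
    """Return IDs of subjects whose signature is present and shifts toward healthy.

    `subjects` maps subject ID -> (baseline_profile, post_dose_profile).
    """
    reference = [healthy[g] for g in genes]
    selected = []
    for subject_id, (baseline, post_dose) in subjects.items():
        d_before = math.dist([baseline[g] for g in genes], reference)
        if d_before < presence_min:
            continue  # disease signature not detected at baseline
        d_after = math.dist([post_dose[g] for g in genes], reference)
        if (d_before - d_after) / d_before >= improvement_min:
            selected.append(subject_id)
    return selected


# Hypothetical two-gene signature and three subjects.
genes = ["GENE_A", "GENE_C"]
healthy = {"GENE_A": 5.0, "GENE_C": 5.0}
subjects = {
    "S1": ({"GENE_A": 8.0, "GENE_C": 9.0}, {"GENE_A": 8.0, "GENE_C": 5.0}),
    "S2": ({"GENE_A": 8.0, "GENE_C": 9.0}, {"GENE_A": 8.0, "GENE_C": 8.9}),
    "S3": ({"GENE_A": 5.3, "GENE_C": 5.4}, {"GENE_A": 5.1, "GENE_C": 5.1}),
}
enrolled = select_for_trial(subjects, healthy, genes)
```

S1 moves from a distance of 5 to 3 and is selected; S2 barely changes; S3 never showed the signature at baseline, so it is excluded regardless of its later profile.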
System and architecture
Also provided herein is a method for engineering a personalized therapy for a subject, the method comprising: receiving or generating a disease gene expression signature comprising a set of response genes; receiving or generating, by a computing device, one or more sets of potential therapies that alter the expression of one or more of the response genes; ranking each of the one or more sets of potential therapies according to the significance of the change in the one or more response genes to provide one or more sets of candidate therapies; determining one or more potential targets directly modulated by the one or more sets of candidate therapies, optionally by mapping the one or more potential targets onto a biological network; ranking the significance of the connection between each of the one or more potential targets and the set of response genes; selecting a therapeutic target from the one or more potential targets; and selecting a personalized therapy that modulates the therapeutic target.
In some embodiments, the disease gene expression signature is determined by: receiving or generating gene expression data from a group of subjects suffering from the same disease, disorder, or condition as the subject; stratifying the group of subjects into two or more groups of prior subjects based on the gene expression data; and selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of prior subjects and a group of healthy subjects, thereby providing a disease gene expression signature.
In some embodiments, the disease candidate gene is mapped onto the biological network prior to being selected as part of the disease gene expression signature.
In some embodiments, determining one or more potential targets further comprises mapping targets of one or more candidate therapies onto a biological network, and selecting the potential targets based on topology information provided by the biological network.
In some embodiments, ranking each of the one or more potential therapies comprises: calculating the difference between the expression level of the set of response genes after treatment with the one or more potential therapies and the expression level of the set of response genes prior to treatment with the one or more potential therapies; and calculating a p-value for each of the one or more potential therapies.
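The two calculations named above (a pre/post difference over the response genes, then a p-value per therapy) can be sketched as follows. The statistical test is not specified in the text; this example swaps in an exact paired sign-flip test, which is an assumption made so that the sketch needs no external libraries. Therapy names, gene labels, and expression values are hypothetical.

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip p-value for the mean of paired differences."""
    n = len(diffs)
    observed = abs(mean(diffs))
    hits = 0
    # Under the null hypothesis each difference is equally likely to be + or -.
    for signs in product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        hits += flipped >= observed - 1e-12  # tolerance for float rounding
    return hits / 2 ** n


def rank_therapies(pre, post_by_therapy):
    """Return (therapy, mean_change, p_value) tuples sorted by ascending p-value."""
    results = []
    for therapy, post in post_by_therapy.items():
        diffs = [post[g] - pre[g] for g in pre]
        results.append((therapy, mean(diffs), sign_flip_p_value(diffs)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical expression of six response genes before and after two therapies.
pre = {"G1": 8.0, "G2": 7.5, "G3": 9.0, "G4": 6.0, "G5": 8.5, "G6": 7.0}
post_by_therapy = {
    "therapy_Y": {"G1": 8.1, "G2": 7.4, "G3": 9.2, "G4": 5.8, "G5": 8.6, "G6": 6.9},
    "therapy_X": {"G1": 7.0, "G2": 6.3, "G3": 8.2, "G4": 4.9, "G5": 7.6, "G6": 6.0},
}
ranking = rank_therapies(pre, post_by_therapy)
```

In this toy example therapy_X lowers every response gene and ranks first, while therapy_Y produces small changes in both directions and ranks last. Enumerating all sign patterns is feasible only for small gene sets; larger sets would call for a sampled or parametric test.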
In some embodiments, the potential targets are identified by a machine learning process.
In some embodiments, the machine learning process includes random walk.
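The text names random walk without giving details, so the following is a sketch of one common variant, random walk with restart, computed by power iteration on a small undirected network. The response genes act as restart (seed) nodes, and the steady-state visiting probability of every other node is used as a proximity score for candidate targets. The restart probability, function name, and toy network are assumptions for illustration only.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at the seeds.

    `adjacency` maps each node to a list of neighbours; every neighbour must
    itself be a key of `adjacency`.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            nbrs = adjacency[n]
            if not nbrs:
                nxt[n] += (1 - restart) * prob[n]  # isolated node keeps its mass
                continue
            share = (1 - restart) * prob[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical network: R1 and R2 are response genes, T1..T3 candidate targets.
adjacency = {
    "R1": ["T1"],
    "R2": ["T1"],
    "T1": ["R1", "R2", "T2"],
    "T2": ["T1", "T3"],
    "T3": ["T2"],
}
scores = random_walk_with_restart(adjacency, seeds={"R1", "R2"})
targets_ranked = sorted(["T1", "T2", "T3"], key=lambda t: -scores[t])
```

Targets closer to the response genes on the network receive higher scores, so T1 ranks above T2, which ranks above T3.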
FIG. 4 shows an implementation of a network environment 400 for providing the systems, methods, and architecture described herein. In brief overview, referring now to FIG. 4, a block diagram of an exemplary cloud computing environment 400 is shown and described. Cloud computing environment 400 may include one or more resource providers 402a, 402b, 402c (collectively 402). Each resource provider 402 may include computing resources. In some implementations, the computing resources may include any hardware or software for processing data. For example, a computing resource may comprise hardware or software capable of executing an algorithm, a computer program, or a computer application. In some implementations, exemplary computing resources may include application servers or databases with storage and retrieval capabilities. Each resource provider 402 may be connected to any other resource provider 402 in the cloud computing environment 400. In some implementations, the resource providers 402 may be connected through a computer network 408. Each resource provider 402 may be connected to one or more computing devices 404a, 404b, 404c (collectively 404) through the computer network 408.
Cloud computing environment 400 may include a resource manager 406. Resource manager 406 may be connected to the resource providers 402 and the computing devices 404 through computer network 408. In some implementations, the resource manager 406 may facilitate the provision of computing resources by one or more resource providers 402 to one or more computing devices 404. The resource manager 406 may receive a request for a computing resource from a particular computing device 404. The resource manager 406 may identify one or more resource providers 402 capable of providing the computing resource requested by the computing device 404. The resource manager 406 may select a resource provider 402 to provide the computing resource. The resource manager 406 may facilitate a connection between the resource provider 402 and a particular computing device 404. In some implementations, the resource manager 406 may establish a connection between a particular resource provider 402 and a particular computing device 404. In some implementations, the resource manager 406 may redirect a particular computing device 404 to a particular resource provider 402 having the requested computing resource.
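The request flow handled by the resource manager (receive a request, find providers that can serve it, select one, and connect the device) can be sketched as below. The class names, the first-match selection rule, and the returned tuple are illustrative assumptions; the text does not prescribe a selection policy or a connection mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceProvider:
    name: str
    resources: set  # names of the computing resources this provider offers


@dataclass
class ResourceManager:
    providers: list = field(default_factory=list)

    def find_providers(self, resource):
        """Return the providers able to supply the requested resource."""
        return [p for p in self.providers if resource in p.resources]

    def connect(self, device, resource):
        """Pick a provider for the request and return a (device, provider) pair.

        Assumption: the first capable provider is chosen; returns None if
        no provider offers the resource.
        """
        candidates = self.find_providers(resource)
        if not candidates:
            return None
        return (device, candidates[0].name)


# Hypothetical providers and request, labeled after the reference numerals.
manager = ResourceManager(
    providers=[
        ResourceProvider("402a", {"database"}),
        ResourceProvider("402b", {"application_server", "database"}),
    ]
)
connection = manager.connect("404a", "application_server")
```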
FIG. 5 illustrates an example of a computing device 500 and a mobile computing device 550 that may be used to implement the techniques described herein. Computing device 500 is intended to represent various forms of digital computers, such as notebook computers, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The mobile computing device 550 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and not limiting.
Computing device 500 includes a processor 502, a memory 504, a storage device 506, a high-speed interface 508 connected to memory 504 and a plurality of high-speed expansion ports 510, and a low-speed interface 512 connected to a low-speed expansion port 514 and storage device 506. Each of the processor 502, memory 504, storage 506, high-speed interface 508, high-speed expansion port 510, and low-speed interface 512 are interconnected using various buses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 502 may process instructions for execution within the computing device 500, including instructions stored on the memory 504 or the storage device 506, to display graphical information of a GUI on an external input/output device, such as a display 516 coupled to the high speed interface 508. In other implementations, multiple processors or multiple buses may be used, as appropriate, along with multiple memories and types of memory. In addition, multiple computing devices may be connected, each providing some of the necessary operations (e.g., as a server bank, a set of blade servers, or a multiprocessor system). Thus, as the term is used herein, where multiple functions are described as being performed by a "processor," this encompasses embodiments in which the multiple functions are performed by any number of processor(s) in any number of computing device(s). Furthermore, where a function is described as being performed by a "processor," this encompasses embodiments in which the function is performed by any number of processor(s) in any number of computing device(s) (e.g., in a distributed computing system).
Memory 504 stores information within computing device 500. In some implementations, the memory 504 is one or more volatile memory units. In some implementations, the memory 504 is one or more non-volatile memory units. Memory 504 may also be another form of computer-readable medium, such as a magnetic or optical disk.
The storage 506 is capable of providing mass storage for the computing device 500. In some implementations, the storage device 506 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory, or other similar solid state storage device or array of devices, including devices in a storage area network or other configurations. The instructions may be stored in an information carrier. The instructions, when executed by one or more processing devices (e.g., processor 502), perform one or more methods, such as those described above. The instructions may also be stored by one or more storage devices, such as a computer-readable medium or machine-readable medium (e.g., memory 504, storage 506, or memory on processor 502).
High-speed interface 508 manages bandwidth-intensive operations for computing device 500, while low-speed interface 512 manages less bandwidth-intensive operations. This allocation of functions is only one example. In some implementations, the high-speed interface 508 is coupled to the memory 504, the display 516 (e.g., via a graphics processor or accelerator), and the high-speed expansion ports 510, which may accept various expansion cards (not shown). In this implementation, low-speed interface 512 is coupled to storage device 506 and low-speed expansion port 514. The low-speed expansion port 514, which may include various communication ports (e.g., USB, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
The computing device 500 may be implemented in a number of different forms, as shown. For example, it may be implemented as a standard server 520, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as notebook computer 522. It may also be implemented as part of a rack server system 524. Alternatively, components from computing device 500 may be combined with other components in a mobile device (not shown), such as mobile computing device 550. Each of such devices may include one or more of computing device 500 and mobile computing device 550, and the entire system may be made up of multiple computing devices communicating with each other.
The mobile computing device 550 includes, among other components, a processor 552, memory 564, input/output devices such as a display 554, a communication interface 566, and a transceiver 568. The mobile computing device 550 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage. Each of the processor 552, the memory 564, the display 554, the communication interface 566, and the transceiver 568 are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
Processor 552 can execute instructions within mobile computing device 550, including instructions stored in memory 564. Processor 552 may be implemented as a chipset including separate and multiple analog and digital processors. Processor 552 can provide, for example, for coordination of the other components of mobile computing device 550, such as control of user interfaces, applications run by mobile computing device 550, and wireless communication by mobile computing device 550.
Processor 552 may communicate with a user through control interface 558 and display interface 556 coupled to a display 554. The display 554 may be, for example, a TFT (thin film transistor liquid crystal display) display or an OLED (organic light emitting diode) display, or other suitable display technology. The display interface 556 may comprise appropriate circuitry for driving the display 554 to present graphical and other information to a user. The control interface 558 may receive commands from a user and convert them for submission to the processor 552. In addition, external interface 562 may provide communication with processor 552 to enable near area communication of mobile computing device 550 with other devices. External interface 562 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.
The memory 564 stores information within the mobile computing device 550. The memory 564 may be implemented as one or more of a computer-readable medium, a volatile memory unit, or a non-volatile memory unit. Expansion memory 574 may also be provided and connected to mobile computing device 550 by an expansion interface 572, which may include, for example, a SIMM (single in-line memory module) card interface. Expansion memory 574 may provide additional storage for mobile computing device 550, or may also store applications or other information for mobile computing device 550. Specifically, expansion memory 574 may include instructions for carrying out or supplementing the processes described above, and may include secure information as well. Thus, for example, expansion memory 574 may be provided as a security module for mobile computing device 550, and may be programmed with instructions that allow secure use of mobile computing device 550. In addition, secure applications may be provided via the SIMM card along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.
The memory may include, for example, flash memory or NVRAM memory (non-volatile random access memory), as discussed below. In some implementations, instructions are stored in an information carrier that, when executed by one or more processing devices (e.g., processor 552), perform one or more methods, such as those described above. The instructions may also be stored by one or more storage devices, such as one or more computer-readable media or machine-readable media (e.g., memory 564, expansion memory 574, or memory on processor 552). In some implementations, the instructions may be received in a propagated signal, e.g., through transceiver 568 or external interface 562.
The mobile computing device 550 may communicate wirelessly through a communication interface 566, which may include digital signal processing circuitry, if desired. Communication interface 566 may provide for communication under various modes or protocols, such as GSM voice calls (Global System for Mobile communications), SMS (Short Message Service), EMS (Enhanced Messaging Service) or MMS messaging (Multimedia Messaging Service), CDMA (code division multiple access), TDMA (time division multiple access), PDC (Personal Digital Cellular), WCDMA (Wideband Code Division Multiple Access), CDMA2000, or GPRS (General Packet Radio Service), and the like. Such communication may occur, for example, through transceiver 568 using radio frequencies. In addition, short-range communication may occur, such as using a Wi-Fi™ or other such transceiver (not shown). In addition, the GPS (Global Positioning System) receiver module 570 may provide additional navigation- and location-related wireless data to the mobile computing device 550, which may be used as appropriate by applications running on the mobile computing device 550.
The mobile computing device 550 may also communicate audibly using an audio codec 560 that may receive voice information from a user and convert it to usable digital information. The audio codec 560 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 550. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.), and may also include sound generated by applications operating on mobile computing device 550.
The mobile computing device 550 may be implemented in a number of different forms, as shown. For example, it may be implemented as a cellular telephone 580. It may also be implemented as part of a smart phone 582, personal digital assistant, or other similar mobile device.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, or combinations thereof. These various implementations may include implementations in one or more computer programs that are executable or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural or object-oriented programming language, or in assembly/machine language. The terms machine-readable medium and computer-readable medium as used herein refer to any computer program product, apparatus, or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term machine-readable signal refers to any signal used to provide machine instructions or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server) or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer or a Web browser having a graphical user interface through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), and the Internet.
The computing system may include clients and servers. The client and server may be remote from each other and may interact through a communication network. The relationship between client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
In some implementations, the modules described herein may be separated, combined, or incorporated into a single or combined module. The modules depicted in the figures are not intended to limit the systems described herein to the software architecture shown therein.
Elements of different implementations described herein may be combined to form other implementations not specifically set forth above. Elements may be excluded from the processes, computer programs, databases, etc. described herein without adversely affecting their operation. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. The various individual elements may be combined into one or more single elements to perform the functions described herein. In view of the structure, function, and devices of the systems and methods described herein, in some implementations.
The present disclosure provides a computer system programmed to implement the methods of the present disclosure. FIG. 14 illustrates a computer system 1401 that is programmed or otherwise configured to perform analysis or operation of various methods. The computer system 1401 may adjust various aspects of the methods and systems of the present disclosure, such as, for example, executing an algorithm, analyzing data, or outputting results of an algorithm. The computer system 1401 may be the user's electronic device or a computer system that is remotely located relative to the electronic device. The electronic device may be a mobile electronic device.
The computer system 1401 includes a central processing unit (CPU, also referred to herein as a "processor" and a "computer processor") 1405, which may be a single-core or multi-core processor, or a plurality of processors for parallel processing. The computer system 1401 also includes memory or a memory location 1410 (e.g., random access memory, read only memory, flash memory), an electronic storage unit 1415 (e.g., a hard disk), a communication interface 1420 (e.g., a network adapter) for communicating with one or more other systems, and peripheral devices 1425 such as cache, other memory, data storage, and/or electronic display adapters. The memory 1410, the storage unit 1415, the interface 1420, and the peripheral devices 1425 communicate with the CPU 1405 through a communication bus (solid lines), such as a motherboard. The storage unit 1415 may be a data storage unit (or data repository) for storing data. The computer system 1401 may be operably coupled to a computer network ("network") 1430 by way of the communication interface 1420. The network 1430 may be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. In some cases, network 1430 is a telecommunications and/or data network. Network 1430 may include one or more computer servers, which may enable distributed computing, such as cloud computing. In some cases, the network 1430 may, with the aid of the computer system 1401, implement a peer-to-peer network, which may enable devices coupled to the computer system 1401 to behave as clients or servers.
The CPU 1405 may execute a sequence of machine-readable instructions, which may be embodied in a program or software. The instructions may be stored in a memory location, such as memory 1410. The instructions may be directed to the CPU 1405 and may subsequently program or otherwise configure the CPU 1405 to implement the methods of the present disclosure. Examples of operations performed by the CPU 1405 may include fetch, decode, execute, and writeback.
CPU 1405 may be part of a circuit such as an integrated circuit. One or more other components of system 1401 may be included in the circuit. In some cases, the circuit is an Application Specific Integrated Circuit (ASIC).
The storage unit 1415 may store files such as drivers, libraries, and saved programs. The storage unit 1415 may store user data, such as user preferences and user programs. In some cases, the computer system 1401 may include one or more additional data storage units external to the computer system 1401, such as on a remote server in communication with the computer system 1401 via an intranet or the Internet.
The computer system 1401 may communicate with one or more remote computer systems over a network 1430. For example, the computer system 1401 may communicate with a remote computer system of a user (e.g., a medical professional or a patient). Examples of remote computer systems include personal computers (e.g., portable PCs), slate or tablet PCs (e.g., iPad, Galaxy Tab), telephones, smartphones (e.g., iPhone, Android-enabled devices), or personal digital assistants. A user may access the computer system 1401 via the network 1430.
The methods as described herein may be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1401 (e.g., on the memory 1410 or the electronic storage unit 1415). The machine-executable or machine-readable code may be provided in the form of software. During use, the code may be executed by the processor 1405. In some cases, the code may be retrieved from the storage unit 1415 and stored on the memory 1410 for ready access by the processor 1405. In some cases, the electronic storage unit 1415 may be omitted, and the machine-executable instructions are stored on the memory 1410.
The code may be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or may be compiled at runtime. The code may be provided in a programming language, which may be selected to enable execution of the code in a precompiled or real-time compiled manner.
Aspects of the systems and methods provided herein, such as the computer system 1401, may be embodied in programming. Various aspects of the technology may be thought of as "products" or "articles of manufacture," typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine-readable medium. The machine-executable code may be stored on an electronic storage unit such as a memory (e.g., read only memory, random access memory, flash memory) or a hard disk. "Storage" type media may include any or all of the tangible memory of computers, processors, or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives, and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, e.g., from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as those used across physical interfaces between local devices, through wired and optical landline networks, and over various air links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, may also be considered media bearing the software. As used herein, unless restricted to non-transitory, tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.
Accordingly, a machine-readable medium (such as computer-executable code) may take many forms, including but not limited to a tangible storage medium, a carrier wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer or the like, such as may be used to implement the databases shown in the figures. Volatile storage media include dynamic memory, such as the main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise a bus within a computer system. Carrier wave transmission media may take the form of electrical or electromagnetic signals, or acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Thus, common forms of computer-readable media include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
The computer system 1401 may include an electronic display 1435 or be in communication with an electronic display 1435 that includes a User Interface (UI) 1440 for providing, for example, input or output of data, or visual output related to algorithms. Examples of UIs include, but are not limited to, graphical User Interfaces (GUIs) and web-based user interfaces.
The methods and systems of the present disclosure may be implemented by way of one or more algorithms. An algorithm may be implemented by way of software upon execution by the central processing unit 1405. For example, an algorithm may perform an analysis or operation of the methods of the present disclosure.
Examples
The following non-limiting examples are intended to illustrate various embodiments of the subject matter described herein.
Example 1-System bioinformatics and network-based analysis of ulcerative colitis
Gene expression data for a group of 8 Ulcerative Colitis (UC) patients receiving anti-TNF therapy were downloaded and studied in two separate batches (study 1 and 2, described in tables 2 and 3, respectively).
TABLE 2
TABLE 3
The gene expression profiles of responders and non-responders, at baseline and after treatment, were compared to each other and to healthy controls (FIG. 2). The analysis showed that the molecular signature of treated (post-treatment) responders was similar to that of healthy controls.
The molecular differences for specific disease subgroups are subtle. Comparison of baseline expression profiles for UC responders and non-responders did not reveal any significantly differentiated genes. In contrast, the molecular differences were more pronounced for the patient subpopulations compared to the healthy control group.
The non-responder gene expression signature was derived by comparing the baseline expression profile of non-responders to that of healthy controls. The corresponding comparison was also performed for responders (e.g., comparing the baseline expression profile of responders to healthy controls). Both studies showed that the responder biomarker set was almost completely contained within the non-responder biomarker set, and that the non-responder biomarker set was typically about twice the size of the responder biomarker set, which may indicate that non-responders are in a more severe disease state (FIGS. 3A and 3B).
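The containment and relative-size comparison between the two biomarker sets can be expressed with simple set operations. The sketch below uses hypothetical gene labels (G1 to G9) and a hypothetical function name; it does not reproduce the study data.

```python
def compare_biomarker_sets(responder, non_responder):
    """Summarize how two biomarker sets overlap.

    Returns the fraction of responder biomarkers also present in the
    non-responder set, and the size of the non-responder set relative to
    the responder set.
    """
    responder, non_responder = set(responder), set(non_responder)
    shared = responder & non_responder
    return {
        "shared_fraction": len(shared) / len(responder),
        "size_ratio": len(non_responder) / len(responder),
    }


# Hypothetical biomarker sets, each derived against healthy controls.
responder_set = {"G1", "G2", "G3", "G4"}
non_responder_set = {"G1", "G2", "G3", "G5", "G6", "G7", "G8", "G9"}
summary = compare_biomarker_sets(responder_set, non_responder_set)
# summary == {"shared_fraction": 0.75, "size_ratio": 2.0}
```

A shared fraction close to 1 together with a size ratio near 2 corresponds to the pattern described above, in which the responder set is largely nested inside a larger non-responder set.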
FIG. 1 shows an exemplary workflow for identifying disease gene expression signatures (also referred to herein as response modules).
For example, in some embodiments, in the response module discovery, biomarkers associated with a particular patient subpopulation are identified as compared to healthy controls. To achieve molecular remission, for example, to make the patient's transcriptomics resemble healthy controls, a desirable downstream effect was identified in which the responder genes were reversed.
Subjects are stratified using supervised and unsupervised clustering algorithms. To identify subject subpopulation biomarkers, baseline expression profiles of different patient subpopulations were compared to healthy controls. These biomarkers are then mapped onto a map of the human interactome. The identified biomarkers were found to form significant clusters on the network; e.g., the nodes were not scattered but interacted significantly with each other, forming a sub-network (response module) consisting of subpopulation-specific biomarkers. It was also found that the post-treatment expression profile of patients responding to treatment was similar to that of healthy controls, so that response to treatment could be interpreted as restoration of response module gene expression to levels similar to those of healthy controls.
Example 2-A validated systems-based multi-omics data analysis platform for identifying novel drug targets in ulcerative colitis
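One way to check whether a biomarker set forms a significant cluster on a network is to compare the size of its largest connected component with that of randomly chosen node sets of the same size. The sketch below illustrates that idea on a toy ring network; the test statistic, the uniform random sampling, and all names are assumptions for illustration and are not stated in the text.

```python
import random
from collections import deque


def largest_component_size(adjacency, nodes):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search restricted to `nodes`
            n = queue.popleft()
            size += 1
            for m in adjacency[n]:
                if m in nodes and m not in seen:
                    seen.add(m)
                    queue.append(m)
        best = max(best, size)
    return best


def clustering_p_value(adjacency, biomarkers, n_random=1000, seed=0):
    """Empirical p-value: how often a random node set clusters at least as well."""
    rng = random.Random(seed)
    observed = largest_component_size(adjacency, biomarkers)
    universe = sorted(adjacency)
    k = len(set(biomarkers))
    hits = sum(
        largest_component_size(adjacency, rng.sample(universe, k)) >= observed
        for _ in range(n_random)
    )
    return observed, (hits + 1) / (n_random + 1)


# Toy network: 40 nodes in a ring; the hypothetical biomarkers are 6 adjacent nodes.
ring = {i: {(i - 1) % 40, (i + 1) % 40} for i in range(40)}
observed_size, p_value = clustering_p_value(ring, biomarkers=range(6))
```

A small p-value indicates that the biomarkers are more tightly connected than expected by chance, which is the sense in which they form a sub-network rather than a scattered set of nodes.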
Tumor necrosis factor-alpha inhibitors (TNFi) have been a standard treatment for ulcerative colitis (UC) for the last 20 years. However, not every patient responds to TNFi therapy, which has prompted the development of alternative UC therapies. Disclosed herein are multi-omics network biology methods for prioritizing protein targets for UC treatment. The disclosed methods can identify network modules on the human interactome, including genes that contribute to susceptibility to UC (genotype module), genes whose expression can be altered to achieve low disease activity (response module), and proteins whose perturbation can alter the expression of response module genes in a favorable direction (therapeutic module). Targets may be prioritized based on their topological relevance to the genotype module and their functional similarity to the therapeutic module. In one example, the methods described herein, applied to UC, can effectively recover protein targets associated with marketed and in-development drugs for UC treatment. The methods may provide a means for finding new therapeutic opportunities and repurposing existing ones in UC and other complex diseases.
Introduction
Ulcerative colitis (UC) is a complex disease characterized by chronic intestinal inflammation and is thought to be caused by an abnormal immune response to the intestinal microbiota in genetically susceptible patients (see, e.g., C. Abraham et al., "Inflammatory Bowel Disease," New England Journal of Medicine 361, 2066 (2009), which is incorporated herein by reference for all purposes). Treatment of UC may include aminosalicylates and steroids, and if low disease activity cannot be achieved, biological agents such as tumor necrosis factor-alpha inhibitors (TNFi) may be recommended (see, e.g., S.C. Park et al., "Current and emerging biologics for ulcerative colitis," Gut and Liver, 18 (2015); K. Hazel et al., "Emerging treatments for inflammatory bowel disease," Therapeutic Advances in Chronic Disease 11, 2040622319899297 (2020), which are incorporated herein by reference for all purposes). Nonetheless, about 40% of patients may not respond to TNFi treatment, and up to 10% of initial responders per year may lose their response to TNFi therapy (see, e.g., S.C. Park et al.; P. Rutgeerts et al., "Infliximab for induction and maintenance therapy for ulcerative colitis," New England Journal of Medicine 353, 2462 (2005), which are incorporated herein by reference for all purposes). The difficulties of TNFi therapies and economic incentives have led to the study and development of alternative therapeutic approaches, such as JAK inhibitors, IL-12/IL-23 inhibitors, S1P-receptor modulators, anti-integrin agents, or novel TNFi compounds.
(See, e.g., E. Troncone et al., "Novel therapeutic options for people with ulcerative colitis: an update on recent developments with Janus kinase (JAK) inhibitors," Clinical and Experimental Gastroenterology 13, 131 (2020); A. Kashani et al., "The Expanding Role of Anti–IL-12 or Anti–IL-23 Antibodies in the Treatment of Inflammatory Bowel Disease," Gastroenterology & Hepatology 15, 255 (2019); S. Danese et al., "Targeting S1P in inflammatory bowel disease: new avenues for modulating intestinal leukocyte migration," Journal of Crohn's and Colitis 12, S678 (2018); S.C. Park et al., "Anti-integrin therapy for inflammatory bowel disease," World Journal of Gastroenterology 24, 1868 (2018); K. Hazel et al., which are incorporated herein by reference for all purposes.) Some approaches are directed to the biological mechanisms that lead to abnormal immune responses, and may require detailed knowledge about the pathogenesis of UC. However, there is increasing interest in developing additional orally administered small molecule drugs due to concerns about immunogenicity and the inconvenience of drug delivery by injection.
Development of new drugs may require identification of molecular targets whose modulation may lead to low disease activity or remission. With the proliferation of multi-omics data, machine learning (ML) and artificial intelligence (AI) are widely used for many tasks in therapeutics, such as target prioritization, drug design, drug-target interaction prediction, or small molecule optimization. (see, e.g., J. Vamathevan et al., "Applications of machine learning in drug discovery and development," Nature Reviews Drug Discovery 18, 463 (2019), incorporated herein by reference for all purposes). Current ML/AI approaches for target prioritization may focus on searching for genes that are relevant to a given disease. Such genes can be inferred by training classifiers, for example, using features constructed from disease-specific gene expression and mutation data together with information regarding related protein-protein, metabolic, or transcriptional interactions, or by analyzing existing text databases or research literature for disease-gene associations using natural language processing (NLP) methods.
(see, e.g., P.R.Costa et al, in BMC Genomics, volume 11 (Springer, 2010) pages 1-15; J.Jeon et al ,"A systematic approach to identify novel cancer drug targets using machine learning,inhibitor design and high-throughput screening,"Genome medicine 6,1(2014);E.Ferrero et al ,"In silico prediction of novel therapeutic targets using gene-disease association data,"Journal of translational medicine 15,1(2017);P.Mamoshina et al ,"Machine learning on human muscle transcriptomic data for biomarker discovery and tissue-specific drug target identification,"Frontiers in genetics 9,242(2018);A.Bravo et al ,"Extraction of relations between genes and diseases from text and large-scaledata analysis:implications for translational research,"BMC Bioinformatics 16,1(2015);J.Kim et al ,"An analysis of disease-gene relationship from Medline abstracts by DigSee,"Scientific Reports 7,1(2017),, which are incorporated herein by reference for all purposes).
However, many ML/AI methods can suffer from exploratory bias or incomplete data. (see, e.g., T. Rolland et al., "A proteome-scale map of the human interactome network," Cell 159, 1212 (2014); J. Menche et al., "Uncovering disease-disease relationships through the incomplete interactome," Science 347, 1257601 (2015), which are incorporated herein by reference for all purposes). In addition, systematic analysis suggests that drugs approved by the U.S. Food and Drug Administration (FDA) may not be directed to protein products of disease-related genes. (see, e.g., M.A. Yıldırım et al., "Drug-target network," Nature Biotechnology, 1119 (2007); E. Guney et al., "Network-based in silico drug efficacy screening," Nature Communications 7, 1 (2016), which are incorporated herein by reference for all purposes). Network-based target prioritization methods can address these issues by aggregating proteomic, metabolomic, and transcriptomic interactions and associations between drugs, diseases, and genes in a network format, and by deriving network-based features that distinguish viable targets in an unbiased and unsupervised manner. (see, e.g., S. Zhao et al., "Network-based relating pharmacological and genomic spaces for drug target identification," PLoS ONE 5, e11764 (2010); Z. Isik et al., "Drug target prioritization by perturbed gene expression and network information," Scientific Reports 5, 1 (2015); T. Katsila et al., "Computational approaches in target identification and drug discovery," Computational and Structural Biotechnology Journal 14, 177 (2016); E. Guney et al., which are incorporated herein by reference for all purposes). Nevertheless, none of these network-based frameworks simultaneously captures the relationship between disease formation and successful treatment as a means of identifying novel potential targets.
To address at least these issues, disclosed herein are network-based methods for target prioritization in UC that use three network regions (modules) of the human interactome (HI), i.e., the protein-protein interaction network in human cells. These three modules, referred to as the module triplet, comprise:
1. Genotype module: a set of genes associated with genetic susceptibility to UC;
2. Response module: a set of genes whose expression needs to be altered to achieve low disease activity;
3. Treatment module: a set of proteins that need to be targeted to alter expression of the response module genes in a favorable direction to achieve low disease activity.
Viable targets may be both (a) topologically related to the genotype module, e.g., located in the network vicinity of genes associated with the disease, and (b) functionally similar to the treatment module, e.g., having transcriptomic downstream effects upon perturbation similar to those of the treatment module proteins. (see, e.g., E. Guney et al.). The methods disclosed herein use UC as an example to demonstrate the utility of the proposed framework by effectively recovering known targets approved for UC and by distinguishing targets at different stages of development for UC based on the network-derived ranking. The module triplet framework may be the first attempt to connect, from a network perspective, the biological mechanisms that cause complex disease progression with their treatment. The module triplet framework can be extended directly to other complex diseases with known gene-disease associations, available gene expression data for pre- and post-treatment patients, and perturbation experiments in appropriate cell lines.
Overview of the module triplet target prioritization framework
The module triplet framework includes (1) discovery of the module triplet for a given disease and (2) novel target discovery based on the identified module triplet, as illustrated in FIG. 7.
To discover the module triplet, each module may be mapped onto HI using ancillary disease-specific information. The genotype module may be constructed by analyzing gene-disease association databases to locate genes whose mutations can predetermine formation of the disease phenotype. The response module includes genes that are significantly down-regulated or up-regulated following treatment in patients who achieve low disease activity. Construction of the treatment module comprises: (1) using the Library of Integrated Network-Based Cellular Signatures (LINCS) L1000 perturbation database to identify small-molecule compounds that produce a gene expression profile similar to that observed for the response module genes after treatment; and (2) using the DrugBank and Repurposing Hub databases to extract the set of proteins targeted by these compounds. These proteins are mapped onto HI, resulting in the treatment module. (see, e.g., A. Subramanian et al., "A next generation connectivity map: L1000 platform and the first 1,000,000 profiles," Cell 171, 1437 (2017); C. Knox et al., "DrugBank 3.0: a comprehensive resource for 'omics' research on drugs," Nucleic Acids Research 39, D1035 (2010); S.M. Corsello et al., "The Drug Repurposing Hub: a next-generation drug library and information resource," Nature Medicine 23, 405 (2017), which are incorporated herein by reference for all purposes).
At least some of the proteins (nodes) of HI are ranked based at least in part on the constructed genotype and treatment modules. For each node, its topological relevance to the genotype module is evaluated by its proximity, which is calculated from the average shortest-path distance from the node to the genotype module nodes. (see, e.g., E. Guney et al.). The functional similarity of each node to the treatment module is evaluated by its selectivity, which is calculated from the average diffusion state distance (DSD) from the node to the treatment module nodes. (see, e.g., M. Cao et al., "Going the distance for protein function prediction: a new distance metric for protein interaction networks," PLoS ONE 8, e76339 (2013), incorporated herein by reference for all purposes). For detailed information on computing proximity and selectivity, see FIG. 7 and the methods described elsewhere herein. HI nodes may be ranked by their proximity and selectivity scores, and the rank product may be used to merge the two rankings into a single combined ranking. (see, e.g., R. Breitling et al., "Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments," FEBS Letters 573, 83 (2004), incorporated herein by reference for all purposes).
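The average shortest-path distance underlying the proximity measure can be sketched as follows. This is a minimal illustration only: the toy network, the node names, and the function names are hypothetical, and the full proximity calculation (including any normalization against random expectation) is defined in the methods described elsewhere herein, not here.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def average_distance_to_module(adjacency, node, module):
    """Mean shortest-path distance from node to the reachable module nodes.

    Assumption: a node that belongs to the module contributes distance 0
    to itself; unreachable module nodes are ignored.
    """
    distance = shortest_path_lengths(adjacency, node)
    reachable = [distance[m] for m in module if m in distance]
    if not reachable:
        return float("inf")
    return sum(reachable) / len(reachable)


# Hypothetical toy interactome: path A-B-C-D with E attached to B.
toy_network = {
    "A": {"B"},
    "B": {"A", "C", "E"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"B"},
}
genotype_module = {"C", "D"}  # hypothetical genotype module

for protein in sorted(toy_network):
    print(protein, average_distance_to_module(toy_network, protein, genotype_module))
```

Nodes with a smaller average distance are topologically closer to the module in this simplified sense.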
UC genotype module
The protein products of genes associated with a disease are generally not randomly scattered over HI, but rather form clusters of interconnected nodes, reflecting the existence of underlying biological mechanisms behind disease formation. (see, e.g., J. Xu et al., "Discovering disease-genes by topological features in human protein-protein interaction network," Bioinformatics 22, 2800 (2006); K.-I. Goh et al., "The human disease network," Proceedings of the National Academy of Sciences 104, 8685 (2007); T. Ideker et al., "Protein networks in disease," Genome Research 18, 644 (2008); A.-L. Barabási et al., "Network medicine: a network-based approach to human disease," Nature Reviews Genetics 12, 56 (2011), which are incorporated herein by reference for all purposes). Studying the network characteristics of these interconnected clusters provides insight into the molecular mechanisms of disease, target discovery, and drug repurposing. (see, e.g., J. Menche et al.; A. Sharma et al., "A disease module in the interactome explains disease heterogeneity, drug response and captures novel pathways and genes in asthma," Human Molecular Genetics 24, 3005 (2015); E. Guney et al.; F. Cheng et al., "Network-based approach to prediction and population-based validation of in silico drug repurposing," Nature Communications 9, 1 (2018), which are incorporated herein by reference for all purposes).
To incorporate UC gene associations into the module triplet framework, the GWAS Catalog, ClinVar, or MalaCards databases may be used to extract genes reported to be associated with UC (see methods described elsewhere herein). (see, e.g., A. Buniello et al., "The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019," Nucleic Acids Research 47, D1005 (2019); M.J. Landrum et al., "ClinVar: improving access to variant interpretations and supporting evidence," Nucleic Acids Research 46, D1062 (2018); N. Rappaport, "MalaCards: an integrated compendium for diseases and their annotation," Database 2013 (2013), which are incorporated herein by reference for all purposes). In total, 194 genes were reported to be associated with UC in at least one of the three databases, and 174 of them (89.7%) were mapped onto the corresponding protein products in HI. These protein products are not randomly dispersed over the network; 64.9% (113/174) of the proteins are linked to each other to form a largest connected component (LCC) that is significantly larger than would be expected at random (e.g., Z-score = 4.82, p < 10^-4). The methods described herein define this LCC as the genotype module representing the genetic susceptibility to UC. Viable targets may be located in the topological vicinity of the genotype module. (see, e.g., E. Guney et al.).
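One way to assess whether an LCC is larger than expected at random is to compare its size against node sets of the same size drawn at random from the network. The sketch below is illustrative only: it samples nodes uniformly (a simplification; the source does not specify the randomization scheme at this point), and the ring-shaped toy network, the function names, and the parameters `n_random` and `seed` are hypothetical.

```python
import random
from statistics import mean, pstdev


def largest_component_size(adjacency, nodes):
    """Size of the largest connected component of the subnetwork induced by nodes."""
    nodes = set(nodes)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor in nodes and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


def lcc_z_score(adjacency, nodes, n_random=1000, seed=0):
    """Observed LCC size, its Z-score, and an empirical p-value.

    Assumption: the null model draws node sets uniformly at random,
    without matching node degrees.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    observed = largest_component_size(adjacency, nodes)
    universe = sorted(adjacency)
    null_sizes = [
        largest_component_size(adjacency, rng.sample(universe, len(nodes)))
        for _ in range(n_random)
    ]
    spread = pstdev(null_sizes)
    z = (observed - mean(null_sizes)) / spread if spread > 0 else float("nan")
    p = sum(size >= observed for size in null_sizes) / n_random
    return observed, z, p


# Hypothetical toy network: a ring of 40 nodes; the "disease genes" are 8 neighbors.
ring = {i: {(i - 1) % 40, (i + 1) % 40} for i in range(40)}
disease_nodes = list(range(8))
observed_size, z_score, p_value = lcc_z_score(ring, disease_nodes)
print(observed_size, round(z_score, 2), p_value)
```

Note that an empirical p-value can only resolve down to 1/`n_random`, so a bound such as p < 10^-4 requires at least 10,000 random draws.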
Reflecting successful UC treatment at transcriptome level
In addition to being topologically close to the genes responsible for susceptibility to UC, viable targets may also be functionally related to treatment of UC. For example, UC treatment dynamics may be reflected at the transcriptome level, and perturbing a viable target may result in transcriptional changes similar to those observed upon successful UC treatment.
To examine how UC treatment is reflected at the transcriptome level, gene expression data from several studies of normal tissue controls and of active UC patients receiving TNFi drug (infliximab or golimumab) treatment can be used. (see, e.g., I. Arijs et al., "Mucosal gene expression of antimicrobial peptides in inflammatory bowel disease before and after first infliximab treatment," PLoS ONE 4, e7984 (2009); G. Toedter et al., "Gene expression profiling and response signatures associated with differential responses to infliximab treatment in ulcerative colitis," Official Journal of the American College of Gastroenterology-ACG 106, 1272 (2011); S. Pavlidis et al., "I MDS: an inflammatory bowel disease molecular activity score to classify patients with differing disease-driving pathways and therapeutic response to anti-TNF treatment," PLoS Computational Biology 15, e1006951 (2019); N. Planell et al., "Transcriptional analysis of the intestinal mucosa of patients with ulcerative colitis in remission reveals lasting epithelial cell alterations," Gut 62, 967 (2013); T. Montero-Melendez et al., "Identification of novel predictor classifiers for inflammatory bowel disease by gene expression profiling," PLoS ONE 8, e76235 (2013); J.T. Bjerrum et al., "Transcriptional analysis of left-sided colitis, pancolitis, and ulcerative colitis-associated dysplasia," Inflammatory Bowel Diseases 20, 2340 (2014); S.E. Telesco et al., "Gene expression signature for prediction of golimumab response in a phase 2a open-label trial of patients with ulcerative colitis," Gastroenterology 155, 1008 (2018), which are incorporated herein by reference for all purposes). Table 4 summarizes the TNFi treatment studies used to identify molecular signatures of UC patient response.
TABLE 4
A set of 545 genes can be identified that are differentially expressed between active UC patients and normal controls. These genes can be used as features for a Uniform Manifold Approximation and Projection (UMAP) embedding of the gene expression profiles of normal controls and of UC patients before and after treatment, with patients divided into two groups: those who achieved low disease activity after treatment (responders) and those who did not (non-responders). (see FIG. 8). (see, e.g., L. McInnes et al., "UMAP: Uniform manifold approximation and projection for dimension reduction," arXiv preprint arXiv:1802.03426 (2018), incorporated herein by reference for all purposes).
In the UMAP embedding, no significant differences may be observed between the pre-treatment gene expression profiles of infliximab or golimumab responders and non-responders. In addition, no differentially expressed genes may be found between the pre-treatment gene expression profiles of responders and non-responders. (see "differential gene expression analysis of responders and non-responders to TNFi therapy", described elsewhere herein). In contrast, the post-treatment gene expression profiles of responders cluster closely with those of the normal controls, while the post-treatment profiles of non-responders to infliximab or golimumab cluster separately from those of the normal controls, indicating that a gene expression profile highly similar to that of the normal controls may reflect successful UC treatment. Based on these observations, a "molecular response" to UC treatment is defined as the reversion of a UC patient's gene expression profile after treatment to resemble that of the normal controls.
UC response module
To further understand which transcriptional changes may make the gene expression profiles of responders more similar to those of normal controls, differential expression analysis was performed on the responders' pre- and post-treatment gene expression profiles. A small proportion of the genes that were dysregulated in responders prior to treatment relative to normal controls exhibited a significant change in expression after treatment (see "differential gene expression analysis of responders and non-responders to TNFi therapy", described elsewhere herein). Expression of these genes may be restored in the responders after treatment; e.g., genes that were down-regulated in responders relative to normal controls prior to treatment may be up-regulated after treatment, and vice versa. Nevertheless, based on the profile embedding shown in FIG. 8, these transcriptional changes may be sufficient to make the gene expression profiles of responders and normal controls similar and to indicate that patients achieved low disease activity after treatment. This set of genes indicating a molecular response to UC treatment may be referred to as the RBA (responders before and after) set. The RBA set specific to TNFi treatment of UC can be constructed by taking the union of the RBA genes determined from the infliximab-based and golimumab-based studies. (see methods described elsewhere herein).
Genes belonging to the RBA set may be related to each other via one or more biological pathways whose normal function may be restored by inhibiting TNF-α, and thus may be in proximity to each other in HI. To test this, the TNFi-RBA genes can be mapped onto HI to construct a subnetwork of nodes corresponding to the RBA genes. The RBA set forms an LCC on HI (91 of 271 nodes, 34%) that is significantly larger than that of randomly selected node sets with preserved degree sequence (Z-score = 9.24, p < 10^-4). The set of genes in the RBA LCC is defined as the response module, i.e., the region of HI whose transcription is altered when, for example, UC patients achieve low disease activity in response to therapeutic intervention.
UC treatment module
Study of the gene expression profiles of UC patients receiving TNFi therapy indicates that successful treatment of UC may require restoring the expression of the response module nodes. Inhibition of TNF-α may not be the only way to achieve the desired transcriptomic effect on the response module genes, and perturbing other proteins may achieve a similar downstream effect.
Experimentally validated alternative perturbations may be analyzed to find those that generate a molecular response similar to that observed following successful TNFi therapy. Differential gene expression signatures induced by perturbing human cell lines with small-molecule compounds may be obtained from the LINCS L1000 database. (see, e.g., A. Subramanian et al., "A next generation connectivity map: L1000 platform and the first 1,000,000 profiles," Cell 171, 1437 (2017), incorporated herein by reference for all purposes). The perturbation signatures may be derived from LINCS L1000 Level 5 data containing gene-wise Z-scores indicating the magnitude and direction of gene expression changes for 14,513 compound experiments in the HT29 cell line (e.g., a human colorectal adenocarcinoma cell line). Perturbation experiments in the HT29 cell line can be considered because this cell line is relevant to the tissue affected by UC (colon) and its coverage of small-molecule compounds is relatively broad.
To find compounds that restore expression of the response module genes, and their corresponding target proteins, each LINCS L1000 experiment can be evaluated by calculating the weighted connectivity score (WTCS) of the up- and down-regulated genes in responders using the gene-wise perturbation Z-scores of each HT29 cell line experiment. (see, e.g., A. Subramanian et al., "A next generation connectivity map: L1000 platform and the first 1,000,000 profiles," Cell 171, 1437 (2017), incorporated herein by reference for all purposes). To evaluate the statistical significance of the WTCS for a given experiment, a randomization procedure can be employed to assign a pair of p-values, p_up and p_down, associated with the enrichment scores of the up- and down-regulated genes, respectively. (see methods described elsewhere herein). Compound experiments with p_up ≥ 0.05, with p_down ≥ 0.05, or with WTCS ≥ 0 are excluded. This filtering ensures that only compounds having a positive and significant therapeutic effect in restoring expression of the response module genes are considered.
Of the 14,513 compound experiments performed in the HT29 cell line, 68 had statistically significant WTCS values, ranging from -0.642 to -0.480. According to the DrugBank™ and Repurposing Hub™ databases, 69 proteins are targets of at least one of the 25 unique compounds evaluated in these 68 experiments. Two of these proteins may not map to HI (e.g., they do not have known protein interaction partners), and 43 of the 67 remaining proteins (64%) form an LCC of significant size (Z-score = 3.39, p < 10^-4). This LCC is referred to as the treatment module.
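The filtering step can be illustrated with a short sketch. It assumes that the enrichment scores of the up- and down-regulated gene sets have already been computed for each experiment, and it combines them using the WTCS definition from the cited connectivity map publication as commonly stated (half the difference of the two enrichment scores when they have opposite signs, zero otherwise). The experiment records, field names, and function names are hypothetical.

```python
def weighted_connectivity_score(es_up, es_down):
    """Combine two enrichment scores into a WTCS.

    Assumed definition: (es_up - es_down) / 2 when the scores have
    opposite signs, and 0 otherwise.
    """
    if es_up * es_down < 0:
        return (es_up - es_down) / 2.0
    return 0.0


def keep_experiment(wtcs, p_up, p_down, alpha=0.05):
    """Retain an experiment only if both p-values are significant and WTCS is negative."""
    return p_up < alpha and p_down < alpha and wtcs < 0


# Hypothetical experiments with precomputed enrichment scores and p-values.
experiments = [
    {"id": "exp_a", "es_up": -0.6, "es_down": 0.5, "p_up": 0.01, "p_down": 0.02},
    {"id": "exp_b", "es_up": 0.4, "es_down": -0.3, "p_up": 0.01, "p_down": 0.01},
    {"id": "exp_c", "es_up": -0.5, "es_down": 0.4, "p_up": 0.20, "p_down": 0.01},
    {"id": "exp_d", "es_up": -0.5, "es_down": -0.4, "p_up": 0.01, "p_down": 0.01},
]

retained = []
for experiment in experiments:
    score = weighted_connectivity_score(experiment["es_up"], experiment["es_down"])
    if keep_experiment(score, experiment["p_up"], experiment["p_down"]):
        retained.append((experiment["id"], score))

print(retained)
```

In this toy input only `exp_a` passes: `exp_b` has a positive score, `exp_c` has a non-significant p_up, and `exp_d` has same-sign enrichment scores and therefore a score of zero.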
One of the targets belonging to the treatment module is TNF-α. Furthermore, by construction, targeting proteins belonging to the treatment module may lead to transcriptional changes within the response module similar to those observed in successful TNFi therapy. Thus, proteins belonging to the treatment module may provide opportunities for intervention in the treatment of UC patients.
Target ordering
In addition to the potential intervention opportunities suggested directly by the treatment module nodes, the genotype and treatment modules may also be used to prioritize all nodes in HI, in an unsupervised manner, according to their potential to be targets for UC treatment. Viable targets may simultaneously satisfy the following network characteristics. First, viable targets may be topologically close to the HI nodes associated with genetic susceptibility to UC (the genotype module). Target prioritization based on the network proximity of nodes to disease modules can predict the therapeutic effects of drugs with known targets in a variety of diseases. (see, e.g., E. Guney et al.). Thus, to quantify the topological relevance of a given HI node to the UC genotype module, its proximity to the genotype module may be calculated based on the average network shortest path from the node to the genotype module (see methods described elsewhere herein).
In addition, targeting viable targets may result in transcriptional changes similar to those observed following successful UC treatment. The treatment module defines a network region of nodes that, when perturbed, may result in the desired transcriptional changes in the response module genes. Thus, proteins functionally similar to the treatment module proteins may also be promising targets. To find such targets, the similarity of the downstream transcriptional effects of HI nodes may be quantified based on network structure. To this end, the diffusion state distance (DSD) may be used. DSD is a measure based on network random walks that is designed to capture the propagation-based topological similarity between each pair of nodes in the network, and it has shown superior performance in predicting protein function annotations. (see, e.g., M. Cao et al.).
To assess whether DSD reflects similarity in downstream transcriptional effects between different proteins, recovery of the targets of approved drugs for four complex diseases (e.g., Alzheimer's disease, ulcerative colitis, rheumatoid arthritis, and multiple sclerosis) can be analyzed based on the DSD between HI nodes. (see methods described elsewhere herein). The targets of the drugs approved for a given disease may produce similar therapeutic effects in that disease. Thus, given one drug target and its DSD to the other HI nodes, the remaining approved targets may be effectively recovered. This target recovery can be performed separately for each approved target and each complex disease to derive receiver operating characteristic (ROC) curves, as shown in FIG. 9. Knowing the DSD from one approved drug target to the remaining nodes in HI may be sufficient to recover the rest of the known approved targets in each complex disease.
However, nodes with a low DSD to the treatment module may be equally close to other randomly selected modules of the same size in HI. To account for this, the functional similarity between HI nodes and the treatment module may be quantified using selectivity, e.g., a DSD-based network measure that accounts for the statistical significance of the DSD between a node and a given network module. (see methods described elsewhere herein).
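The DSD between two nodes can be sketched using the finite-step definition from the cited DSD publication as commonly stated: for each start node, accumulate the expected number of visits of a random walk to every node over a fixed number of steps, then take the L1 distance between the two visit-count vectors. The sketch below is illustrative only; it assumes an unweighted network in which every node has at least one neighbor, and the toy star network, the function names, and the default of five steps are hypothetical.

```python
def expected_visit_counts(adjacency, start, steps, order):
    """Expected number of visits to each node by a random walk of `steps` steps.

    The walk starts at `start` (counted as one visit at step 0) and moves to a
    uniformly chosen neighbor at each step.
    """
    index = {node: i for i, node in enumerate(order)}
    current = [0.0] * len(order)
    current[index[start]] = 1.0
    total = current[:]
    for _ in range(steps):
        following = [0.0] * len(order)
        for node, mass in zip(order, current):
            if mass == 0.0:
                continue
            share = mass / len(adjacency[node])
            for neighbor in adjacency[node]:
                following[index[neighbor]] += share
        current = following
        total = [t + c for t, c in zip(total, current)]
    return total


def diffusion_state_distance(adjacency, u, v, steps=5):
    """L1 distance between the expected visit-count vectors of u and v."""
    order = sorted(adjacency)
    visits_u = expected_visit_counts(adjacency, u, steps, order)
    visits_v = expected_visit_counts(adjacency, v, steps, order)
    return sum(abs(a - b) for a, b in zip(visits_u, visits_v))


# Hypothetical toy network: hub H connected to leaves a, b, and c.
star = {"H": {"a", "b", "c"}, "a": {"H"}, "b": {"H"}, "c": {"H"}}

print(diffusion_state_distance(star, "a", "b"))
print(diffusion_state_distance(star, "a", "H"))
```

In the star example, two leaves differ only in where their walks start, so their visit-count vectors agree everywhere except at the two leaves themselves.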
Finally, all HI nodes may be ranked based on their proximity to the genotype module and their selectivity to the treatment module, and the rank product may be used to determine the final combined ranking of the nodes. (see methods described elsewhere herein). (see, e.g., R. Breitling et al.).
In silico validation of module triplet target prioritization
To test whether the proposed target ranking produces meaningful results, targets of drugs approved for UC treatment are obtained from the Pharma Intelligence™ Citeline database. (see methods described elsewhere herein). The resulting list includes 23 targets mapped onto HI. The approved targets were highly proximal to the genotype module and selective for the treatment module compared to the remaining HI nodes, as shown in FIG. 10, panel (a). While proximity and selectivity each effectively recovered the known approved targets on their own, the combination of the two performed better, suggesting a synergistic effect of these network measures on target prioritization, as shown in FIG. 10, panel (b). In addition to the proposed network measures for target prioritization, another measure based on a combination of network and gene expression data can be examined, i.e., local radiality, which has shown high performance in recovering known drug targets. (see, e.g., Z. Isik et al.). Local radiality is similar to the module triplet prioritization method described herein in that it uses topology and gene expression data to prioritize targets. The main difference is that local radiality assumes that the HI nodes affected by a target perturbation (downstream nodes) lie in the network vicinity of the target. Using the methods described herein, targets may be prioritized based on their local radiality relative to the response module nodes, which reflect the intended downstream effect. (see methods described elsewhere herein). Local radiality may also effectively recover approved UC targets, although not as effectively as the module triplet prioritization method described herein. The sensitivity of approved UC target recovery for all tested methods is reported in Table 5, which shows the proportion of approved UC treatment targets recovered among the top-K proteins ranked by selectivity, by proximity, by combined proximity and selectivity, and by local radiality with respect to the response module.
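Merging two rankings with a rank product can be sketched as follows. The rank product is taken here as the geometric mean of a node's ranks, following the usual formulation; the protein names and rank values are hypothetical, and ties are broken alphabetically purely to make the output deterministic.

```python
from math import prod


def rank_product(*rankings):
    """Geometric mean of each node's ranks across the given rankings.

    Each ranking maps node -> rank, with rank 1 being the best. Only nodes
    present in every ranking are scored.
    """
    nodes = set(rankings[0])
    for ranking in rankings[1:]:
        nodes &= set(ranking)
    k = len(rankings)
    return {node: prod(r[node] for r in rankings) ** (1.0 / k) for node in nodes}


def combined_ranking(*rankings):
    """Nodes ordered from best to worst by rank product (ties broken by name)."""
    scores = rank_product(*rankings)
    return sorted(scores, key=lambda node: (scores[node], node))


# Hypothetical ranks of four proteins under the two measures.
proximity_rank = {"P1": 1, "P2": 2, "P3": 3, "P4": 4}
selectivity_rank = {"P1": 3, "P2": 1, "P3": 4, "P4": 2}

print(combined_ranking(proximity_rank, selectivity_rank))
```

A node must rank well under both measures to obtain a small rank product, which is why the combination can favor nodes that neither measure places first on its own.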
TABLE 5
Finally, drugs under consideration as treatments for UC (e.g., tested in clinical trials and preclinical studies) may target nodes with a lower combined ranking based on proximity and selectivity than the targets of drugs already marketed for UC. This is because marketed targets have been evaluated in the clinical stage for their ability to improve disease activity in patients with UC, whereas not-yet-marketed targets may not necessarily be effective for UC treatment. The combined rank distributions of the targets of marketed drugs, drugs in clinical trials (phases I, II, and III), and drugs in preclinical studies can be compared, as shown in FIG. 10, panel (c). The median combined rank was highest for targets of marketed drugs, followed by targets of drugs in clinical trials, and then targets of drugs in preclinical studies.
Discussion
Described herein are a network-based framework and methods for prioritizing protein targets for novel therapies for complex diseases, using UC as an exemplary disease. The module triplet framework is the first attempt to capture disease formation and successful treatment at the network level, assuming that the mechanisms behind complex disease formation and treatment can be captured by the interactions on HI between three network modules representing genetic susceptibility, transcriptional changes, and protein targets of drugs. In the methods described herein, the development of a disease phenotype is predetermined by genetic mutations in a set of genes located in a region of HI referred to as the genotype module. These genetic changes within the genotype module are manifested as changes in gene expression in active UC patients. By tracking genes whose expression levels change significantly in patients achieving low disease activity following TNFi therapy, a set of genes can be derived whose transcription needs to be altered to achieve a positive response to treatment. These genes occupy a localized region of HI referred to as the response module.
Protein targets can be identified whose perturbation results in a transcriptional profile similar to that achieved following successful TNFi therapy. The methods described herein can do this by scanning experimental data for small-molecule compounds that perturb human cells and matching the response profile after compound perturbation to the profile achieved after successful treatment. The set of compound targets that effect the desired downstream change in gene expression also occupies a localized region in HI and is referred to as the treatment module. While the identified compounds that match the intended downstream transcriptomic effect may appear different, as shown in Table 6 (which lists the drugs and their known mechanisms of action that map to protein targets belonging to the treatment module), their targets belong to a local region of HI, reflecting the underlying biology behind UC treatment and indicating that other protein targets functionally similar to the treatment module nodes are promising UC treatment targets. By ranking the HI nodes based on their proximity to the genotype module and their selectivity for the treatment module, the methods disclosed herein can prioritize HI proteins that are both topologically related to genes associated with UC phenotype formation and functionally similar to proteins that produce the desired downstream effects of treatment when targeted.
TABLE 6
The proximity of topological relevance for quantitative targets to genotype modules has been demonstrated to provide an unbiased measure of therapeutic effects on various drugs and diseases and to distinguish between palliative and effective treatments. (see, e.g., E.Guney et al). Drugs whose targets are close to genes associated with disease may be more effective than drugs farther away. (see, e.g., E.Guney et al). The method described herein uses DSD as a surrogate for measuring the similarity between downstream effects due to interfering with a given node pair in the HI. DSD between pairs of nodes is based on the similarity between random walks starting from these nodes. The frequency of access of random walkers to each node was successfully used to assess the pattern of perturbation caused by basic mutations (e.g., single nucleotide variations and insertion/deletion mutations) of genes associated with cancer. (see, e.g., M.D. Leiserson et al ,"Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes,"Nature genetics 47,106(2015),, incorporated herein by reference for all purposes). The access frequency of the random walk from a given node may correspond to the amount of disturbance imposed by this node on the rest of the network, and the downstream disturbance contribution is reflected in the vector of the access frequency of the random walk from the given node. Since the DSD measures the distance between vectors of random walk access frequencies (see methods described elsewhere herein), node pairs with small DSDs correspond to nodes with similar downstream perturbation effects. By restoring known approved targets for 4 complex diseases including UC based on DSD, DSD does reflect the similarity between therapeutic effects of different targets.
The modular triplet framework and methods disclosed herein can utilize knowledge of the therapeutic kinetics of active UC patients that achieve low disease activity following TNFi therapies. However, patients that do not exhibit adequate response to TNFi therapy account for a significant portion of the diseased population and may have a sub-type of UC that differs or more severely disrupts normal cellular processes in terms of underlying biology. (see "pathway enrichment analysis of differentially expressed genes in responders and non responders to TNFi's therapy" described elsewhere herein). (see, e.g., P. Rutgaerts et al). While the novel targets identified using the methods described herein may help to find therapies appropriate for TNFi non-responders, it may still be desirable to study the exact biology after responding inadequately to TNFi therapy.
The module triad framework and methods described herein, which utilize patient genomic and transcriptomic data, can provide an overall network-based perspective on the development and treatment dynamics of complex diseases, and can provide an unbiased approach for novel target identification. The methods disclosed herein can be generalized to any complex disease with available gene-disease association data, transcriptome data for patients before and after treatment, and perturbation experiments in appropriate cell lines. In addition to target prioritization, the methods disclosed herein may suggest repurposing opportunities based on targets that belong to the treatment module. The module triad approach can be enhanced by considering additional available perturbation experiments, such as single-gene overexpression and knock-down, by including information about the agonist or antagonist effect of a drug on its targets, or by further refining the prioritized target list according to toxicity and druggability.
Methods
Human interactome. The human interactome (HI) map of experimentally derived protein-protein interactions is compiled from public databases. (see, e.g., T. Mellors et al, "Clinical validation of a blood-based predictive test for stratification of response to tumor necrosis factor inhibitor therapies in rheumatoid arthritis patients," Network and Systems Medicine 3, 91 (2020), incorporated herein by reference for all purposes). The HI as used herein is compiled using database versions up to, for example, March 2021.
Construction of the UC genotype module. Genes associated with UC are identified as indicated by: (1) the GWAS Catalog; (2) the ClinVar database, in particular genes annotated as "pathogenic," "likely pathogenic," or as having "conflicting interpretations of pathogenicity"; and (3) the MalaCards database. (see, e.g., A. Buniello et al; M.J. Landrum et al; N. Rappaport et al). Genes were collected from the databases up to, e.g., September 2021. All genes mentioned in at least one of the three databases may be retained, and genes not belonging to the HI network may be filtered out. The remaining genes can be used to construct a subnetwork and extract its largest connected component (LCC).
The significance of the LCC size can be assessed by randomly sampling sub-networks with the same degree sequence as the original sub-network. By repeatedly sampling 10,000 sub-networks, one can find the LCC sizes of the randomly sampled sub-networks and their empirical distribution, with mean $\mu_{LCC}$ and standard deviation $\sigma_{LCC}$. The method disclosed herein defines the LCC Z score as:
$$Z_{LCC} = \frac{S_{LCC} - \mu_{LCC}}{\sigma_{LCC}}$$
where $S_{LCC}$ is the LCC size of the original sub-network. The method disclosed herein also defines the empirical p-value of the observed $S_{LCC}$ as the fraction of randomly sampled sub-networks whose LCC size exceeds $S_{LCC}$.
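The LCC significance test above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used herein: the network is a plain adjacency dictionary, the function names `lcc_size` and `lcc_significance` are hypothetical, and, for brevity, random node sets are drawn uniformly rather than with a matched degree sequence as described above.

```python
import random
from statistics import mean, pstdev


def lcc_size(nodes, adjacency):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal restricted to `nodes`
            u = stack.pop()
            size += 1
            for v in adjacency.get(u, ()):
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def lcc_significance(nodes, adjacency, n_samples=10_000, seed=0):
    """Return (S_LCC, Z score, empirical p-value) for the given node set.

    Assumption: the null model samples node sets uniformly at random;
    a degree-preserving sampler would replace `rng.sample` here.
    """
    rng = random.Random(seed)
    universe = sorted(adjacency)
    k = len(set(nodes))
    observed = lcc_size(nodes, adjacency)
    null = [lcc_size(rng.sample(universe, k), adjacency) for _ in range(n_samples)]
    mu, sigma = mean(null), pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    # Fraction of random sub-networks whose LCC size exceeds the observed one.
    p = sum(s > observed for s in null) / n_samples
    return observed, z, p
```

A large positive Z score together with a small empirical p-value would indicate that the disease genes form a module larger than expected by chance.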
Gene expression data processing of active UC cases and normal controls. Mucosal tissue samples from normal controls and from patients with moderately to severely active UC were collected from the Gene Expression Omnibus (GEO), as shown in Table 4. (see, e.g., T. Barrett et al, "NCBI GEO: archive for functional genomics data sets-update," Nucleic Acids Research, D991 (2012), which is incorporated herein by reference for all purposes). Three studies reported the response status of patients after treatment, where response was determined by endoscopic and histological examination or by Mayo score. See Table 7 for detailed information on the response definitions (e.g., definitions of TNFi response in cohorts with specified UC patient response labels). For each study, normalized data can be obtained from, for example, the Genevestigator database. (see, e.g., T. Hruz et al, "Genevestigator v3: a reference expression database for the meta-analysis of transcriptomes," Advances in Bioinformatics 2008 (2008), incorporated herein by reference for all purposes).
TABLE 7
The methods disclosed herein can integrate expression data from 6 infliximab studies. Statistical methods can be used to correct for batch effects between the different studies. (see, e.g., J.T. Leek et al, "sva: Surrogate Variable Analysis," R package version 3.10.0, DOI 10, B9 (2014), which is incorporated herein by reference for all purposes). Some studies included both baseline samples and samples collected at follow-up. To avoid underestimating the variance introduced by analyzing longitudinally correlated samples, the methods disclosed herein may apply the statistical methods to the baseline samples to derive correction factors for the individual studies, treating response and health status as covariates. The correction factors were then applied to both baseline and follow-up samples.
Clustering and differential gene expression analysis. To reduce the dimensionality of the gene expression data, the methods disclosed herein can select a subset of signature genes that are significantly differentially expressed between normal controls and active UC samples. Genes with a fold change (FC) of FC > 2.5 and an adjusted p-value of p_adj < 0.05 (Benjamini-Hochberg correction) can be extracted. (see, e.g., Y. Benjamini et al, "Controlling the false discovery rate: a practical and powerful approach to multiple testing," Journal of the Royal Statistical Society: Series B (Methodological) 57, 289 (1995), incorporated herein by reference for all purposes). For cluster analysis, the methods disclosed herein can use UMAP to embed the gene expression vectors of the identified differentially expressed genes in an 8-dimensional space. (see, e.g., L. McInnes et al).
When comparing the gene expression profiles of active UC patients before and after treatment, thresholds of FC > 1.8 and p_adj < 0.05 can be used to identify differentially expressed genes. Differentially expressed genes with negative log fold changes were considered significantly down-regulated, while genes with positive log fold changes were considered significantly up-regulated. For more details regarding the paired analysis of differentially expressed genes, see "differential gene expression analysis of responders and non-responders to TNFi therapy" described elsewhere herein.
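The thresholding step can be illustrated with a short Python sketch. It assumes that per-gene log2 fold changes and raw p-values have already been computed by some differential expression test, and that the FC threshold is applied symmetrically to up- and down-regulation; the function names `benjamini_hochberg` and `split_degs` and the input layout are hypothetical.

```python
from math import log2


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_degs(stats, fc_threshold=1.8, alpha=0.05):
    """Split genes into up- and down-regulated sets.

    `stats` maps gene -> (log2 fold change, raw p-value).
    Assumption: a gene passes if |log2 FC| > log2(fc_threshold).
    """
    genes = list(stats)
    p_adj = benjamini_hochberg([stats[g][1] for g in genes])
    cutoff = log2(fc_threshold)
    up, down = set(), set()
    for gene, q in zip(genes, p_adj):
        lfc = stats[gene][0]
        if q < alpha and abs(lfc) > cutoff:
            (up if lfc > 0 else down).add(gene)
    return up, down
```

Passing `fc_threshold=2.5` would correspond to the stricter threshold used for the comparison between normal controls and active UC samples.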
Construction of the UC response module. To identify genes that are indicative of response to TNFi therapies, the methods disclosed herein can extract genes that are significantly differentially expressed in responders to infliximab and golimumab by comparing their gene expression profiles before and after treatment, as described above. Two RBA gene sets can be obtained from the infliximab- and golimumab-based studies (see "differential gene expression analysis of responders and non-responders to TNFi therapy," described elsewhere herein), and the union of these two sets can be used to account for possible drug-specific gene expression changes. A subnetwork of the HI based on the combined RBA gene set may then be constructed. The LCC of the resulting subnetwork can be identified as the UC response module, and the significance of its size can be evaluated in the same way as for the genotype module.
Analysis of LINCS L1000 perturbation profiles. The methods disclosed herein can use the weighted connectivity score (WTCS) to evaluate the agreement between the differential gene expression profiles obtained after perturbing HT29 cells with various compounds and the up- and down-regulated gene subsets of the response module. (see, e.g., A. Subramanian et al, "A next generation connectivity map: L1000 platform and the first 1,000,000 profiles," Cell 171, 1437 (2017), incorporated herein by reference for all purposes). WTCS measures the enrichment scores (ES) of a given pair of up- and down-regulated gene sets, referred to herein as the up query and the down query, in a ranked list of genes. (see, e.g., A. Subramanian et al, "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles," Proceedings of the National Academy of Sciences 102, 15545 (2005), which is incorporated herein by reference for all purposes). WTCS combines the ES for the up query (ES_up) and the ES for the down query (ES_down) into a single score. A positive WTCS indicates that the perturbation results in a change in gene expression consistent with the query sets of the response module, e.g., the up-query genes are predominantly up-regulated in the given perturbation, while the down-query genes are predominantly down-regulated. In contrast, a negative WTCS indicates that the down query is up-regulated and the up query is down-regulated in the given experiment. Since we are interested in restoring the expression pattern of the responder genes, we sought experiments with negative WTCS. The following is a brief overview of the procedure used to calculate this score and evaluate its statistical significance.
The LINCS L1000 data store differential gene expression profiles as gene-specific Z scores, which indicate changes in gene expression levels relative to controls. A large positive Z score indicates significant up-regulation of the gene after the perturbation, while a large negative Z score indicates significant down-regulation of the gene after the perturbation. Genes whose differential expression patterns are inferred with high fidelity belong to the best inferred gene (BING) set and are used in the WTCS calculations. (see, e.g., A. Subramanian et al, "A next generation connectivity map: L1000 platform and the first 1,000,000 profiles," Cell 171, 1437 (2017), incorporated herein by reference for all purposes). The up- and down-regulated genes observed in the response module that are also part of the BING set are denoted herein as s_up and s_down, respectively. For each set, the method disclosed herein may calculate an enrichment score (ES_up and ES_down), and WTCS is a combination of these two scores:
$$WTCS = \begin{cases} (ES_{up} - ES_{down})/2, & \text{if } \operatorname{sgn}(ES_{up}) \neq \operatorname{sgn}(ES_{down}) \\ 0, & \text{otherwise} \end{cases}$$
To evaluate the significance of the enrichment scores, gene sets of sizes |s_up| and |s_down| can be sampled uniformly from the BING genes. By repeating the sampling procedure 1,000 times, empirical distributions ρ_up(ES) and ρ_down(ES) of the up and down enrichment scores from random samples can be obtained. The obtained distributions can be compared with the observed ES_up and ES_down: if the observed ES_up is positive, the fraction of random samples with a greater or equal enrichment score is taken as the p-value p_up, and if it is negative, the fraction of random samples with a lesser or equal enrichment score is taken as p_up. p_down is calculated in a similar manner. WTCS, p_up, and p_down may be obtained for each perturbation experiment and used to filter the relevant perturbations.
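The score combination and the empirical p-value can be sketched as follows. This is a minimal illustration that assumes the enrichment scores themselves have been computed elsewhere, and that WTCS follows the standard Connectivity Map convention (half the difference of the two scores when their signs differ, zero otherwise); the function names `wtcs` and `empirical_p` are hypothetical, and an observed score of exactly zero is treated as positive by assumption.

```python
def _sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def wtcs(es_up, es_down):
    """Combine the up-query and down-query enrichment scores into one score."""
    if _sign(es_up) == _sign(es_down):
        return 0.0
    return (es_up - es_down) / 2.0


def empirical_p(observed_es, null_es):
    """One-sided empirical p-value of an observed enrichment score.

    `null_es` holds enrichment scores of randomly sampled gene sets of the
    same size as the query. The tail is chosen by the sign of the observed
    score (zero is treated as positive, an assumption).
    """
    if observed_es >= 0:
        hits = sum(s >= observed_es for s in null_es)
    else:
        hits = sum(s <= observed_es for s in null_es)
    return hits / len(null_es)
```

A perturbation that reverses the query (negative ES_up, positive ES_down) yields a negative WTCS, which is the case of interest here.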
Construction of the UC treatment module. Using LINCS L1000 data, the methods disclosed herein can identify compounds that are capable of restoring the expression pattern observed in the nodes of the response module. The filter described above (WTCS < 0, p_up < 0.05, and p_down < 0.05) can be used to extract the relevant experiments. The protein targets of the compounds remaining after filtering were identified using the DrugBank and Repurposing Hub databases. Then, we mapped the resulting set of protein targets onto the HI and constructed a subnetwork on this basis, in the same way as for the response and genotype modules. The treatment module is the LCC of this subnetwork.
Diffusion state distance. Diffusion State Distance (DSD) is a metric defined on network nodes that was originally designed to predict the function of a protein in a protein interaction network. (see, e.g., M. Cao et al). DSD captures the similarity between the final states of the network when a random walk starts from two different nodes. To define DSD, we first define $He(v_i, v_j)$, the expected number of times that a random walk (RW) starting from node $v_i$ and proceeding for $k$ steps visits node $v_j$. Next, for node $v_i$, we define the vector
$$He(v_i) = \{He(v_i, v_1), \ldots, He(v_i, v_n)\}.$$
The DSD between nodes $v_i$ and $v_j$ is then defined as
$$DSD(v_i, v_j) = \lVert He(v_i) - He(v_j) \rVert_1,$$
where $\lVert \cdot \rVert_1$ denotes the $L_1$ norm. For any fixed $k$, DSD is a metric, and it converges as $k \to \infty$. (see, e.g., M. Cao et al).
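The DSD definition can be made concrete with a small Python sketch. It is a minimal illustration for a fixed number of steps k, assuming an undirected network given as an adjacency dictionary in which every node has at least one neighbor, an unbiased random walk, and that the starting position counts as one visit; the function names `expected_visits` and `dsd` are hypothetical.

```python
def expected_visits(adjacency, start, k):
    """He(start): expected visits to every node by a k-step random walk.

    Nodes are reported in sorted order. Assumption: the start position
    (step 0) counts as one visit, and every node has at least one neighbor.
    """
    nodes = sorted(adjacency)
    prob = {v: 0.0 for v in nodes}
    prob[start] = 1.0
    visits = dict(prob)
    for _ in range(k):
        nxt = {v: 0.0 for v in nodes}
        for u, p in prob.items():
            if p == 0.0:
                continue
            neighbors = adjacency[u]
            for v in neighbors:  # the walker moves to a uniformly chosen neighbor
                nxt[v] += p / len(neighbors)
        prob = nxt
        for v in nodes:
            visits[v] += prob[v]
    return [visits[v] for v in nodes]


def dsd(adjacency, a, b, k=5):
    """L1 distance between the expected-visit vectors of nodes a and b."""
    he_a = expected_visits(adjacency, a, k)
    he_b = expected_visits(adjacency, b, k)
    return sum(abs(x - y) for x, y in zip(he_a, he_b))
```

For large networks, the same quantity would more typically be computed with matrix operations on the transition matrix; the loop form is used here only for readability.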
DSD as a measure of therapeutic similarity between targeted proteins. To quantify the relevance of DSD as a measure of similarity in therapeutic effect between proteins, a complex disease and its set of approved targets can be analyzed as follows: for each known approved target for a given disease, the DSD between that target and the remaining nodes in the HI is calculated; the remaining nodes are ranked based on their DSD to the known target; and, based on this ranking, a receiver operating characteristic (ROC) curve for the recovery of the remaining approved targets for the given disease is constructed. By iterating through all known approved targets, a set of individual ROC curves is obtained for each complex disease. Interpolation may be used to average the individual curves and obtain an average ROC curve, and the area under this curve is calculated to quantify the likelihood of finding an approved target given a single approved target and its DSD to the remaining network nodes.
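The recovery analysis for a single seed target can be sketched as below. This is a simplified illustration rather than the procedure described above: instead of interpolating and averaging ROC curves, it computes the area under the ROC curve for one seed directly from the ranking, using the pairwise (Mann-Whitney) formulation. The function name `recovery_auc` and the input layout are hypothetical.

```python
def recovery_auc(distance_to_seed, approved):
    """AUC for recovering `approved` targets from a DSD-based ranking.

    `distance_to_seed` maps each remaining node to its DSD from the seed
    target; smaller distances rank higher. The AUC equals the probability
    that a randomly chosen approved target is ranked above a randomly
    chosen non-approved node, with ties counted as one half.
    """
    pos = [d for n, d in distance_to_seed.items() if n in approved]
    neg = [d for n, d in distance_to_seed.items() if n not in approved]
    if not pos or not neg:
        raise ValueError("need at least one approved and one non-approved node")
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC near 0.5 would indicate that DSD to the seed carries no information about the other approved targets, while values near 1 would indicate that approved targets cluster at small DSD.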
Proximity to the UC genotype module. Computing the proximity of a node to the genotype module includes calculating the average shortest path length $\langle d \rangle$ from the given node to the nodes of the genotype module. The statistical significance of a node's proximity to the genotype module is assessed by comparing its average shortest path length to the genotype module with its average shortest path distance to randomized network modules of the same size. Specifically, the method disclosed herein samples connected modules of the same size as the genotype module 500 times (see below for details of the sampling) and constructs an empirical distribution of the average shortest path distances to the randomized modules, where $\mu_p$ is the mean and $\sigma_p$ is the standard deviation of this distribution. Finally, the proximity of a node is defined as the Z score of the average shortest path distance from the node to the genotype module relative to this distribution:
$$Z_p = \frac{\langle d \rangle - \mu_p}{\sigma_p}$$
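The proximity calculation can be sketched as follows. This minimal illustration assumes a connected, unweighted network given as an adjacency dictionary, and assumes that the randomized connected modules have already been sampled elsewhere and are passed in as `random_modules`; the function names are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_distance(adjacency, node, module):
    """Average shortest path length from `node` to the nodes of `module`."""
    dist = shortest_path_lengths(adjacency, node)
    return mean(dist[m] for m in module)


def proximity_z(adjacency, node, module, random_modules):
    """Z score of the observed mean distance against randomized modules.

    Assumption: `random_modules` is a list of pre-sampled connected node
    sets of the same size as `module`; population standard deviation is used.
    """
    observed = mean_distance(adjacency, node, module)
    null = [mean_distance(adjacency, node, rm) for rm in random_modules]
    return (observed - mean(null)) / pstdev(null)
```

A more negative score indicates a node that is closer to the module than expected for a random module of the same size. The same pattern would apply to selectivity if the shortest path length were replaced by DSD.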
Selectivity for the UC treatment module. Computing the selectivity of a node for the treatment module is similar to computing proximity and includes calculating the average DSD, $\langle DSD \rangle$, from the node to the nodes of the treatment module. As in the proximity calculation, the statistical significance of the observed DSD is assessed by sampling 500 randomized network modules of the same size as the treatment module. However, instead of the average shortest path distance, we calculate the average DSD from the node to each randomized module and construct an empirical distribution of the average DSD to the randomized modules, where $\mu_s$ is the mean and $\sigma_s$ is the standard deviation of this distribution. We define the selectivity as:
$$Z_s = \frac{\langle DSD \rangle - \mu_s}{\sigma_s}$$
Network module randomization. Both the proximity and the selectivity calculations may require sampling randomized modules on the HI. Since, by construction, both the genotype module and the treatment module are connected subnetworks, uniformly sampling connected subnetworks from the fixed HI network can avoid any possible bias in average shortest path length or DSD related to subnetwork connectivity. The Neighbor Reservoir Sampling (NRS) algorithm can be used to uniformly sample connected subnetworks of a fixed size. (see, e.g., X. Lu et al, "International Conference on Scientific and Statistical Database Management," Springer, (2012) pages 195-212, which is incorporated herein by reference for all purposes).
Node ranking based on proximity and selectivity. Given the genotype module and the treatment module, we calculate the proximity and selectivity scores of all nodes in the HI and derive their corresponding ranks r_p and r_s, respectively. To obtain a single combined rank r for each node, we use a rank product defined as follows:
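A combined ranking of this kind can be sketched as follows. The sketch makes two assumptions that are not stated above: lower (more negative) Z scores receive better ranks, and the two ranks are combined by their plain product. Ordering by the product is the same as ordering by its geometric mean, so the resulting order does not depend on that normalization. The function name `combined_ranking` is hypothetical.

```python
def combined_ranking(proximity, selectivity):
    """Order nodes by the product of their proximity and selectivity ranks.

    Both inputs map node -> Z score. Assumption: a lower score is better,
    so the node with the lowest score receives rank 1. Ties in the rank
    product are broken alphabetically by node name.
    """
    def ranks(scores):
        ordered = sorted(scores, key=lambda n: (scores[n], n))
        return {node: i + 1 for i, node in enumerate(ordered)}

    r_p, r_s = ranks(proximity), ranks(selectivity)
    product = {node: r_p[node] * r_s[node] for node in proximity}
    return sorted(product, key=lambda n: (product[n], n))
```

A node must rank well on both criteria to appear near the top, since a poor rank on either one inflates the product.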
Local radiality with respect to the response module. The local radiality of node i with respect to the response module may be determined using the following equation:
$$LR(i) = \frac{1}{|RM|} \sum_{g \in RM} spl(i, g, G)$$
where RM is the set of response module nodes, G is the human interactome network, and spl(i, g, G) is a function measuring the shortest path length from node i to node g in G.
Approved UC targets. To validate the proposed target prioritization framework, a list of targets approved for UC treatment may be compiled by retrieving the list of all drugs that have been marketed or are under development for UC up to month 2022, e.g., using the Pharma Intelligence™ Citeline database. All drugs marketed for UC are considered approved drugs. In addition, drugs tested for UC in clinical trials (phases I, II, and III) and in preclinical studies are also considered, in order to compare their combined ranking with that of approved drugs. For each drug, its known targets are extracted, for example, from the Pharma Intelligence™ Citeline database, the Repurposing Hub database, and the DrugBank database. Since one target may be mapped to several drugs, each target is assigned the highest status reached by any of the drugs to which it is mapped. For example, if one target is mapped to two drugs, one of which is in a phase II clinical trial and the other in preclinical testing, then this target is labeled as a clinical trial target. In addition, to avoid drugs that may have many off-target effects due to high drug promiscuity, two drugs with more than 4 targets (sulfasalazine and mesalamine) were filtered out, as shown in FIG. 13. (see, e.g., V.J. Haupt et al, "Drug promiscuity in PDB: protein binding site similarity is key," PLoS ONE 8, e65894 (2013), which is incorporated herein by reference for all purposes). Apart from these two drugs, all other drugs being developed for UC treatment have 4 or fewer targets. In addition, tetracosactide was filtered out because its indication for UC is ambiguous.
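The status assignment described above amounts to a maximum over an ordered set of labels, as the following minimal sketch shows. The status labels, the function name `assign_target_status`, and the drug and target names used with it are hypothetical placeholders, not entries from any of the databases named above.

```python
# Assumed ordering of development stages, from lowest to highest.
STATUS_ORDER = {"preclinical": 0, "clinical trial": 1, "approved": 2}


def assign_target_status(drug_status, drug_targets, max_targets=4):
    """Give each target the highest status among the drugs that hit it.

    `drug_status` maps drug -> status label; `drug_targets` maps
    drug -> list of targets. Drugs with more than `max_targets` targets
    are skipped as promiscuous.
    """
    target_status = {}
    for drug, targets in drug_targets.items():
        if len(targets) > max_targets:
            continue
        status = drug_status[drug]
        for target in targets:
            current = target_status.get(target)
            if current is None or STATUS_ORDER[status] > STATUS_ORDER[current]:
                target_status[target] = status
    return target_status
```

For example, a target shared by a phase II drug and a preclinical drug is labeled "clinical trial", matching the rule stated above.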
Further description of the module triad
Differential gene expression analysis of responders and non-responders to TNFi therapy. To assess whether responders and non-responders to TNFi therapy can be stratified based on their gene expression profiles prior to treatment, the methods disclosed herein can use their complete gene expression profiles for differential gene expression analysis. No significantly differentially expressed genes may be found at a fold change (FC) threshold of FC = 1.8 and an adjusted p-value threshold of p < 0.05 (Benjamini-Hochberg correction). Thus, there may be no significant difference between responders and non-responders prior to treatment, whether in the UMAP embedding space or in the complete gene expression profile space.
Given that the gene expression profiles of active UC patients prior to treatment are insufficient to distinguish responders from non-responders, the methods disclosed herein may use normal tissue controls as a comparative reference to derive more pronounced differences in gene expression profile between responders and non-responders. By comparing the different patient groups with the normal control group, the following four sets of differentially expressed genes can be constructed (see FIG. 11 for a description of these sets):
1. Responder before-after set (RBA): genes differentially expressed in responders between before and after treatment;
2. Non-responder before-after set (NRBA): genes differentially expressed in non-responders between before and after treatment;
3. Responder set (R): genes differentially expressed between baseline responders and normal controls;
4. Non-responder set (NR): genes differentially expressed between baseline non-responders and normal controls.
In the infliximab- and golimumab-based studies, each of these pairwise comparisons was evaluated separately.
Non-responders may not show a significant change in gene expression profile after treatment, and thus the NRBA set may not contain any significantly differentially expressed genes. The R, NR, and RBA sets are highly consistent and can have significant intersection sizes for both the infliximab and golimumab studies, as shown in FIG. 11, panel (b). In the infliximab and golimumab studies, respectively, pairwise hypergeometric tests resulted in p=9.10 10 -910 and 5.10 10 -1249 for the intersection between the NR and R sets, p=4.10 10 -64 and 8.10 10 -91 for the intersection between the NR and RBA sets, and p=2.10 10 -226 and 1.10 10 -103 for the intersection between the R and RBA sets.
Furthermore, most RBA genes were differentially expressed in baseline responder samples relative to normal controls, indicating that treatment with TNFi may result in reversal of expression of a small subset of R genes. In contrast, although the NR set contained a significant portion of RBA genes, these genes were not significantly altered in non-responders after treatment with TNFi.
The RBA gene set consists almost entirely of genes contained within the R and NR sets. In addition, as shown in the UMAP plot of FIG. 8, the gene expression profiles of responders after treatment are closer to those of the normal controls, while non-responders remain close to their initial pre-treatment positions in UMAP space after treatment. This suggests that TNFi treatment may be sufficient to restore the expression profile of the subset of differentially expressed genes that make up the RBA set in order to achieve low disease activity in responders.
Pathway enrichment analysis of differentially expressed genes in responders and non-responders to TNFi therapy. To better understand the underlying molecular mechanisms of non-response, the methods disclosed herein can conduct pathway enrichment analysis on the R and NR sets. For each KEGG pathway, the proportion of nodes that are part of the R and NR gene sets can be determined, as shown in FIG. 12. (see, e.g., M. Kanehisa et al, "KEGG: Kyoto Encyclopedia of Genes and Genomes," Nucleic Acids Research, 27 (2000), which is incorporated herein by reference for all purposes). Of the 282 KEGG pathways that include at least one gene from the R and NR sets, 40 pathways are significantly enriched for NR genes (e.g., hypergeometric test, p < 0.05). Most of the genes in these pathways are common to the NR and R sets. To identify pathways that are more enriched in NR-exclusive genes, the methods disclosed herein can apply a statistical test based on random sampling to assess the significance of the difference between the numbers of NR-exclusive and R-exclusive genes within these pathways. Of the 40 pathways, 28 pathways had significantly more NR-exclusive genes than R-exclusive genes (p < 0.05), as shown in FIG. 12, panel (c). Pathways associated with UC, such as "inflammatory bowel disease," "TNF signaling pathway," "intestinal immune network for IgA production," "rheumatoid arthritis," "cell adhesion molecules," or "IL-17 signaling pathway," are significantly more disrupted in non-responders. This observation is supported by another pathway enrichment analysis. (see, e.g., M.V. Kuleshov et al, "Enrichr: a comprehensive gene set enrichment analysis web server 2016 update," Nucleic Acids Research 44, W90 (2016), incorporated herein by reference for all purposes).
The lists of enriched biological pathways for the R and NR gene sets may be nearly identical; however, individual pathways tend to have a greater number of genes and stronger p-values and q-values for the NR gene set. In these pathways, non-responder-specific differentially expressed genes may include genes involved in cytokine signaling (e.g., IL6, OSM, IL1A, IL R1, IL11, CXCL8/IL8, or IL21R), receptor mediation (e.g., Toll-like receptors: TLR1, TLR2, or TLR8), and signaling (e.g., Src-like kinases: HCK or FYN).
UC-related KEGG pathways are more enriched for genes exclusive to non-responders than for genes exclusive to responders, as shown in FIG. 12, panel (c). The enriched pathways also include those of other inflammatory conditions, such as rheumatoid arthritis and diabetes, which may represent a general immune system dysfunction common to these conditions. It is estimated that 25%-35% of patients with autoimmune diseases may develop one or more additional autoimmune disorders. (see, e.g., M. Cojocaru et al, "Multiple autoimmune syndrome," Maedica, 132 (2010); J.-M. Anaya et al, "The autoimmune tautology: from polyautoimmunity and familial autoimmunity to the autoimmune genes," Autoimmune Diseases 2012 (2012), which are incorporated herein by reference for all purposes). Other enriched pathways highlight the role of the intestinal microbiome in ulcerative colitis. Genes annotated in the intestinal immune network for IgA production are enriched in non-responders. IgA antibodies are the predominant secretory immunoglobulins, and in inflammatory bowel disease patients, pro-inflammatory bacterial populations may be more heavily coated by IgA than in healthy controls. (see, e.g., J.M. Shapiro et al, "Immunoglobulin A targets a unique subset of the microbiota in inflammatory bowel disease," Cell Host & Microbe 29, 83 (2021), incorporated herein by reference for all purposes). In particular, Staphylococcus aureus infection is an enriched bacterial KEGG pathway. Gram-positive bacteria (such as Staphylococcus aureus) induce TNF-α secretion from macrophages, and TNF-α enhances neutrophil-mediated bacterial killing. (see, e.g., K.P. van Kessel et al, "Neutrophil-mediated phagocytosis of Staphylococcus aureus," Frontiers in Immunology, 467 (2014), which is incorporated herein by reference for all purposes). Perturbation of TNF-α affects the ability of the immune system to control Staphylococcus aureus infection, resulting in an increased risk of infection following TNFi treatment.
(see, e.g., S. Bassetti et al, "Staphylococcus aureus in patients with rheumatoid arthritis under conventional and anti-tumor necrosis factor-alpha treatment," The Journal of Rheumatology 32, 2125 (2005), incorporated herein by reference for all purposes). As highlighted by the TLR and NOD-like signaling KEGG pathways, innate immunity plays an important role in maintaining intestinal homeostasis. TLR pattern recognition receptors detect conserved structures of microorganisms, including those of the intestinal microbiota, and upon activation induce inflammatory signaling pathways and modulate antibody-producing B cell responses. (see, e.g., L.A. O'Neill et al, "The history of Toll-like receptors-redefining innate immunity," Nature Reviews Immunology 13, 453 (2013); Z. Hua et al, "TLR signaling in B-cell development and activation," Cellular & Molecular Immunology 10, 103 (2013), which are incorporated herein by reference for all purposes). TLR2, 4, 8 and 9 were upregulated in the colonic mucosa of active UC patients relative to quiescent UC or healthy control samples. (see, e.g., F. Sánchez-Muñoz et al, "Transcript levels of Toll-Like Receptors 5, 8 and 9 correlate with inflammatory activity in Ulcerative Colitis," BMC Gastroenterology 11, 1 (2011), incorporated herein by reference for all purposes). Cytokine signaling, including the TNF-α and IL-17 pathways, is enriched in non-responders. In addition to being a potent pro-inflammatory cytokine that amplifies TNF-α and IL-16 signaling, IL-17 signaling can induce genes that recruit and activate neutrophils, and can promote the expression of epithelial barrier genes. (see, e.g., T. Kinugasa et al, "Claudins regulate the intestinal barrier in response to immune mediators," Gastroenterology 118, 1001 (2000); K. Maloy et al, "IL-23 and Th17 cytokines in intestinal homeostasis," Mucosal Immunology, 339 (2008), which are incorporated herein by reference for all purposes).
Additional disruption of colonic epithelial barrier integrity in non-responders is highlighted by the enrichment of genes in the cell adhesion molecule and fluid shear stress KEGG pathways. Loss of barrier integrity increases the permeability of the epithelial barrier to nutrients, water, bacterial toxins, and pathogens. (see, e.g., S.C. Bischoff et al, "Intestinal permeability-a new target for disease prevention and therapy," BMC Gastroenterology 14, 1 (2014), incorporated herein by reference for all purposes). Overall, the more significantly enriched pathways indicate that UC disease biology, such as inflammation, barrier integrity, and microbiome imbalance, is more broadly disrupted in TNFi non-responders.
To determine whether the gene expression profiles of non-responders are more severely dysregulated across pathways than those of responders, the methods disclosed herein can use the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. A significance threshold of p_adj < 0.05 (hypergeometric test with Benjamini-Hochberg correction) was used to select pathways significantly enriched for the differentially expressed genes of non-responders. For each selected pathway, the genes exclusive to the R gene set and the genes exclusive to the NR gene set were identified. The significance of the difference between the numbers of R-exclusive and NR-exclusive genes was evaluated using random permutations of the R-exclusive and NR-exclusive labels on the remaining genes. Pathways with a significant difference between the numbers of NR-exclusive and R-exclusive genes were retained (p_adj < 0.05, random permutation test with Benjamini-Hochberg correction).
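The hypergeometric enrichment test used for pathway selection can be sketched with the Python standard library alone. This is a minimal illustration assuming the usual one-sided over-representation formulation; the function name `hypergeom_enrichment_p` and its parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap, universe_size, pathway_size, set_size):
    """P(X >= overlap) for a hypergeometric over-representation test.

    X is the number of pathway genes among `set_size` genes drawn without
    replacement from a universe of `universe_size` genes, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, set_size)
    favorable = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(universe_size, set_size)
```

The resulting p-values, one per pathway, would then be adjusted with the Benjamini-Hochberg procedure before applying the 0.05 threshold.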
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. The present invention is not intended to be limited to the specific embodiments provided within this specification. While the invention has been described with reference to the above detailed description, the descriptions and illustrations of the embodiments herein are not intended to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it should be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the present invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims (65)

1. A method of determining a disease gene expression signature for quantifying responsiveness of a subject suffering from a disease, disorder or condition to therapy, the method comprising:
receiving gene expression data from a group of subjects suffering from the same disease, disorder or condition;
stratifying the group of subjects into two or more groups based at least in part on the gene expression data;
calculating differences in gene expression between the two or more groups of subjects and a group of non-diseased subjects;
selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of non-diseased subjects;
compiling a disease gene set comprising the disease candidate genes; and
selecting at least a subset of the disease gene set to determine the disease gene expression signature.
2. The method of claim 1, further comprising mapping the disease candidate genes onto a biological network and selecting adjacent genes on the biological network that have significant connectivity with each other or with the disease candidate genes, wherein the disease gene set comprises the disease candidate genes and the adjacent genes.
3. The method of claim 2, wherein the biological network comprises a human interactome.
4. The method of claim 2 or 3, wherein the adjacent genes form a distinct sub-network with each other or with the disease candidate gene.
5. The method of claim 2, wherein the adjacent genes are identified via a machine learning algorithm.
6. The method of claim 5, wherein the machine learning algorithm comprises a random walk.
7. The method of any one of claims 1-6, wherein the disease, disorder, or condition comprises ulcerative colitis, Crohn's disease, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, plaque psoriasis, ankylosing spondylitis, Guillain-Barré syndrome, Sjögren's syndrome, scleroderma, vitiligo, bipolar disorder, Graves' disease, schizophrenia, Alzheimer's disease, multiple sclerosis, Parkinson's disease, or a combination thereof.
8. The method of claim 7, wherein the disease, disorder, or condition comprises ulcerative colitis.
9. The method of claim 7, wherein the disease, disorder, or condition comprises rheumatoid arthritis.
10. The method of claim 7, wherein the disease, disorder, or condition comprises Alzheimer's disease.
11. The method of claim 7, wherein the disease, disorder, or condition comprises multiple sclerosis.
12. The method of any one of claims 1-11, wherein stratifying the group of subjects into two or more groups is random or based at least in part on whether a previous subject responded to the therapy.
13. The method of any one of claims 1-12, wherein the therapy comprises a member selected from Table 1.
14. The method of any one of claims 1-12, wherein the therapy comprises anti-TNF therapy.
15. The method of any one of claims 1-14, wherein the subject group suffers from the same disease, disorder, or condition as the subject being evaluated for responsiveness to therapy.
16. The method of any one of claims 1-15, wherein the stratifying further comprises grouping subjects from the same group with similar gene expression.
17. The method of any one of claims 1-16, further comprising training a machine learning classifier using the disease gene expression signature, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of a test subject suffering from the disease, disorder, or condition based at least in part on analyzing gene expression data of the test subject.
18. The method of claim 17, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with an accuracy of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
19. The method of claim 17, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a sensitivity of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
20. The method of claim 17, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a specificity of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
21. The method of claim 17, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a positive predictive value of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
22. The method of claim 17, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a negative predictive value of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
23. The method of claim 17, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a true positive rate of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
24. The method of claim 17, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a true negative rate of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
25. The method of claim 17, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with an area under the curve (AUC) of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
26. The method of claim 17, further comprising administering a therapeutically effective amount of the therapy to the test subject when the trained machine learning classifier predicts that the test subject is responsive to the therapy.
27. The method of claim 17, further comprising administering to the test subject a therapeutically effective amount of a second therapy different from the therapy when the trained machine learning classifier predicts that the test subject is non-responsive to the therapy.
28. A method comprising administering to a test subject a therapeutically effective amount of (i) a therapy, when the test subject is predicted to be responsive to the therapy based at least in part on a trained machine learning classifier analyzing a disease gene expression signature, or (ii) a second therapy different from the therapy, when the test subject is predicted to be non-responsive to the therapy based at least in part on the trained machine learning classifier analyzing the disease gene expression signature,
wherein the disease gene expression signature is determined at least in part by:
receiving gene expression data from a group of subjects suffering from the disease, disorder or condition;
stratifying the group of subjects into two or more groups based at least in part on the gene expression data;
calculating differences in gene expression between the two or more groups of subjects and a group of non-diseased subjects;
selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of non-diseased subjects;
compiling a disease gene set comprising the disease candidate genes; and
selecting at least a subset of the disease gene set to determine the disease gene expression signature.
29. The method of claim 28, wherein the disease gene expression signature is determined at least in part by further mapping the disease candidate genes onto a biological network and selecting adjacent genes on the biological network that are significantly connected to each other or to the disease candidate genes, wherein the disease gene set comprises the disease candidate genes and the adjacent genes.
30. The method of claim 29, wherein the biological network comprises a human interactome.
31. The method of claim 29 or 30, wherein the adjacent genes form a distinct sub-network with each other or with the disease candidate genes.
32. The method of claim 29, wherein the adjacent genes are identified via a machine learning algorithm.
33. The method of claim 32, wherein the machine learning algorithm comprises a random walk.
34. The method of any one of claims 28-33, wherein the disease, disorder, or condition comprises ulcerative colitis, Crohn's disease, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, plaque psoriasis, ankylosing spondylitis, Guillain-Barré syndrome, Sjögren's syndrome, scleroderma, vitiligo, bipolar disorder, Graves' disease, schizophrenia, Alzheimer's disease, multiple sclerosis, Parkinson's disease, or a combination thereof.
35. The method of claim 34, wherein the disease, disorder, or condition comprises ulcerative colitis.
36. The method of claim 34, wherein the disease, disorder, or condition comprises rheumatoid arthritis.
37. The method of claim 34, wherein the disease, disorder, or condition comprises Alzheimer's disease.
38. The method of claim 34, wherein the disease, disorder, or condition comprises multiple sclerosis.
39. The method of any one of claims 28-38, wherein stratifying the group of subjects into two or more groups is random or based at least in part on whether a previous subject responded to the therapy.
40. The method of any one of claims 28-39, wherein the therapy comprises a member selected from Table 1.
41. The method of any one of claims 28-39, wherein the therapy comprises anti-TNF therapy.
42. The method of any one of claims 28-41, wherein the stratifying further comprises grouping subjects from the same group with similar gene expression.
43. The method of claim 28, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with an accuracy of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
44. The method of claim 28, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a sensitivity of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
45. The method of claim 28, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a specificity of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
46. The method of claim 28, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a positive predictive value of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
47. The method of claim 28, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a negative predictive value of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
48. The method of claim 28, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a true positive rate of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
49. The method of claim 28, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with a true negative rate of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
50. The method of claim 28, wherein the trained machine learning classifier is configured to predict responsiveness or non-responsiveness of the test subject with an area under the curve (AUC) of at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%.
51. A method of verifying a response to a therapy in a subject suffering from a disease, disorder or condition, the method comprising:
analyzing changes in a disease gene expression signature of the subject after administration of the therapy, wherein the disease gene expression signature is determined to quantify responsiveness to the therapy.
52. The method of claim 51, wherein the disease gene expression signature is determined at least in part by:
receiving gene expression data from a group of subjects suffering from the disease, disorder or condition;
stratifying the group of subjects into two or more groups based at least in part on the gene expression data;
calculating differences in gene expression between the two or more groups of subjects and a group of non-diseased subjects;
selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of non-diseased subjects;
compiling a disease gene set comprising the disease candidate genes; and
selecting at least a subset of the disease gene set to determine the disease gene expression signature.
53. A method of monitoring the efficacy of a treatment of a subject having a disease, disorder or condition, the method comprising monitoring a change in a disease gene expression signature after administration of a therapy, wherein the disease gene expression signature is determined at least in part by:
analyzing gene expression data from a group of subjects suffering from the same disease, disorder or condition as the subject;
stratifying the group of subjects into two or more groups based on the gene expression data;
determining differences in gene expression between the two or more groups of subjects and a group of non-diseased subjects;
selecting one or more genes ("disease candidate genes") having a significant difference in gene expression between the two or more groups of subjects and the group of non-diseased subjects;
compiling a disease gene set comprising the disease candidate genes; and
selecting at least a subset of the disease gene set to determine the disease gene expression signature.
54. The method of claim 53, wherein the disease gene expression signature is determined at least in part by further mapping the disease candidate genes onto a biological network and selecting adjacent genes on the biological network that are significantly connected to each other or to the disease candidate genes, wherein the disease gene set comprises the disease candidate genes and the adjacent genes.
55. The method of claim 54, wherein the biological network comprises a human interactome.
56. The method of claim 53 or 54, wherein the adjacent genes form a distinct sub-network with each other or with the disease candidate genes.
57. The method of claim 53, wherein the adjacent genes are selected by a machine learning process.
58. The method of any one of claims 53-58, wherein the disease, disorder, or condition comprises ulcerative colitis, Crohn's disease, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, plaque psoriasis, ankylosing spondylitis, Guillain-Barré syndrome, Sjögren's syndrome, scleroderma, vitiligo, bipolar disorder, Graves' disease, schizophrenia, Alzheimer's disease, multiple sclerosis, Parkinson's disease, or a combination thereof.
59. The method of any one of claims 53-58, wherein stratifying the group of subjects into two or more groups is random or based at least in part on whether a previous subject responded to the therapy.
60. The method of any one of claims 53-59, wherein the therapy comprises a member selected from Table 1.
61. The method of any one of claims 53-60, wherein the therapy comprises anti-TNF therapy.
62. The method of any one of claims 53-61, wherein the stratifying further comprises grouping subjects from the same group with similar gene expression.
63. The method of any one of claims 51-62, further comprising selecting the subject for a clinical trial based at least in part on whether the disease gene expression signature of the subject exhibits a quantifiable change toward the gene expression of a non-diseased subject.
64. A method of identifying and selecting a subject for clinical trials, comprising:
receiving gene expression data of a group of subjects;
analyzing the gene expression data to detect the presence of a disease gene expression signature;
administering at least one dose of a therapy to the group of subjects;
identifying a change in the disease gene expression signature relative to gene expression in a non-diseased subject; and
selecting, for the clinical trial, a subject exhibiting a quantifiable change in gene expression of the disease gene expression signature toward that of a healthy subject, wherein the disease gene expression signature is determined by the method of any one of claims 1-65.
65. A system, comprising:
a processor of a computing device; and
a memory having instructions stored thereon, wherein the instructions, when executed by the processor, cause the processor to perform the method of any of claims 1-66.
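
The signature-determination steps recited in claim 1, and the change-monitoring step recited in claims 51 and 53, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the function names (welch_t, stratify, disease_signature, restoration), the toy expression values, the median split used for stratification, the Welch t cutoff standing in for a "significant difference", and the restoration score itself are all hypothetical choices that do not appear in the source.

```python
from math import sqrt
from statistics import mean, median, variance

def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def stratify(diseased, genes):
    """Assumed stratification: split at the median of the most variable gene."""
    pivot = max(genes, key=lambda g: variance([s[g] for s in diseased]))
    cut = median(s[pivot] for s in diseased)
    low = [s for s in diseased if s[pivot] <= cut]
    high = [s for s in diseased if s[pivot] > cut]
    return [grp for grp in (low, high) if len(grp) >= 2]

def disease_signature(diseased, healthy, t_cutoff=3.0, top_k=2):
    """Select candidate genes per stratum, compile the set, keep the top_k."""
    genes = sorted(healthy[0])
    strongest = {}  # gene -> largest |t| seen in any stratum vs. healthy
    for stratum in stratify(diseased, genes):
        for g in genes:
            t = abs(welch_t([s[g] for s in stratum], [h[g] for h in healthy]))
            strongest[g] = max(strongest.get(g, 0.0), t)
    # Assumption: a gene is a candidate if it passes the cutoff in any stratum.
    gene_set = [g for g in genes if strongest.get(g, 0.0) >= t_cutoff]
    return sorted(gene_set, key=lambda g: -strongest[g])[:top_k]

def restoration(pre, post, healthy, signature):
    """Fraction of the pre-therapy distance to the healthy mean that was closed."""
    ref = {g: mean(h[g] for h in healthy) for g in signature}
    before = sum(abs(pre[g] - ref[g]) for g in signature)
    after = sum(abs(post[g] - ref[g]) for g in signature)
    return 1.0 - after / before if before > 0 else 0.0

# Toy data (invented for illustration): genes A and D differ from healthy.
GENES = ("A", "B", "C", "D")

def rows(*values):
    return [dict(zip(GENES, v)) for v in values]

healthy = rows((1.0, 5.0, 3.0, 8.0), (1.2, 5.1, 3.2, 8.1),
               (0.9, 4.9, 2.9, 7.9), (1.1, 5.0, 3.1, 8.2))
diseased = rows((4.0, 5.0, 3.0, 6.0), (4.2, 5.2, 3.1, 6.1), (3.9, 4.8, 2.9, 5.9),
                (6.0, 5.1, 3.2, 6.2), (6.3, 4.9, 3.0, 6.0), (5.8, 5.0, 3.1, 5.8))

signature = disease_signature(diseased, healthy)
pre = diseased[3]                   # one subject before therapy
post = dict(pre, A=2.0, D=7.5)      # the same subject after therapy (invented)
print(signature, round(restoration(pre, post, healthy, signature), 2))
```

A restoration score near 1 indicates that the signature genes have moved most of the way toward the non-diseased reference, and a score of 0 indicates no movement. A real analysis would likely replace the fixed t cutoff with a multiple-testing-corrected significance test and the median split with a clustering method; both substitutions are left open here.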
CN202280057506.9A 2021-06-22 2022-06-21 Methods and systems for therapy monitoring and trial design Pending CN117916392A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/213,431 2021-06-22
US202263329008P 2022-04-08 2022-04-08
US63/329,008 2022-04-08
PCT/US2022/034375 WO2022271724A1 (en) 2021-06-22 2022-06-21 Methods and systems for therapy monitoring and trial design

Publications (1)

Publication Number Publication Date
CN117916392A true CN117916392A (en) 2024-04-19

Family

ID=90688089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280057506.9A Pending CN117916392A (en) 2021-06-22 2022-06-21 Methods and systems for therapy monitoring and trial design

Country Status (1)

Country Link
CN (1) CN117916392A (en)

Legal Events

Date Code Title Description
PB01 Publication