CN117947166A - Marker group, product, system and application thereof for prognosis of intestinal cancer - Google Patents

Publication number
CN117947166A
Authority
CN
China
Prior art keywords
cms
gene
marker
markers
subject
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410127289.7A
Other languages
Chinese (zh)
Inventor
盛伟琪
许蜜蝶
陈丽萌
黄凯
彭俊杰
彭海翔
黄丹
陆彬彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Puen Haihui Medical Laboratory Co ltd
Fudan University Shanghai Cancer Center
Original Assignee
Shanghai Puen Haihui Medical Laboratory Co ltd
Fudan University Shanghai Cancer Center
Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided herein are marker sets, products, systems, and uses thereof for the prognosis of intestinal cancer. In particular, provided herein is a product or set of products for assessing the prognosis of a bowel cancer in a subject, comprising reagents for detecting each marker of a set of markers from a biological sample of the subject, the set of markers having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4. Low cost, high efficiency, high quality prognostic evaluation means for bowel cancer are provided herein.

Description

Marker group, product, system and application thereof for prognosis of intestinal cancer
Technical Field
The present invention relates to the biomedical field; in particular, it relates to marker sets, products, systems and uses thereof for assessing prognosis of bowel cancer.
Background
Colorectal cancer (CRC) has become the third most common malignancy in China. For colorectal cancer which has been treated by surgery, postoperative pathology assessment is the most important prognostic index and postoperative treatment basis. Patients with advanced CRC should receive adjuvant therapy post-operatively, but adjuvant therapy decisions for patients with stage II/III remain largely controversial.
Retrospective studies have found that about 10-20% of stage II patients experience postoperative recurrence and metastasis. In addition, high-risk stage II patients may benefit from adjuvant chemotherapy. The current assessment of whether patients with stage II colon cancer need adjuvant chemotherapy remains mainly at the histological level: depth of tumor invasion, degree of differentiation, presence or absence of lymphatic invasion, presence or absence of nerve invasion, total number of lymph nodes and number of positive nodes, and resection margin. Molecular-level studies suggest that patients with MSI-H (microsatellite instability-high) or dMMR (mismatch repair deficient) tumors derive reduced benefit from 5-FU chemotherapy. For patients with stage III CRC, the duration of chemotherapy is still controversial, and the latest results of the IDEA study show that the difference in 5-year survival between 3 months and 6 months of adjuvant chemotherapy for stage III patients is small; for low-risk stage III patients, the 5-year survival rate with 4 cycles of XELOX is superior to that with 8 cycles, and for high-risk stage III patients the 5-year survival rate with 4 cycles of XELOX is only 1% lower than that with 8 cycles, while toxic side effects are greatly reduced. Furthermore, current clinical treatment outcomes indicate that it is not only high-risk stage II/III patients who can benefit from adjuvant chemotherapy. The key to these problems is the lack of specific molecular markers for defining or judging "high risk". Therefore, how to screen specific molecular markers, perform accurate molecular characterization and subtype analysis of stage II/III CRC patients, and identify the truly "high-risk" patients who will benefit from adjuvant chemotherapy remains a problem to be solved in clinical practice.
Disclosure of Invention
In order to solve the above-mentioned problems, the inventors have found and refined a marker group that can be used for evaluating the prognosis of intestinal cancer in a subject through long-term and intensive research and analysis. By adopting the marker group, accurate molecular characteristics and subtype analysis can be carried out on patients with intestinal cancer (such as colorectal cancer), and the accuracy of identification and clinical prognosis of patients with high risk is greatly improved.
In one aspect, provided herein is a marker panel for assessing the prognosis of bowel cancer in a subject, having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
In another aspect, provided herein is a product or set of products for assessing the prognosis of a bowel cancer in a subject, comprising reagents for detecting each marker of a set of markers from a biological sample of the subject, the set of markers having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
In another aspect, provided herein is a system for assessing the prognosis of bowel cancer in a subject, comprising a memory and one or more processors;
wherein the memory comprises:
expression data for each gene of a marker panel from a biological sample of a subject,
CMS typing feature genome expression template data, and
one or more processor-executable instructions;
wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4; and
wherein the one or more processor-executable instructions are configured to:
(a) obtain gene expression data for the marker panel from a biological sample of a subject;
(b) calculate cosine distances between the gene expression data in (a) and the CMS typing characteristic genome expression templates;
(c) determine the CMS type based on the cosine distances calculated in (b); and
(d) determine a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
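Steps (a) to (c) can be illustrated with a minimal sketch. It assumes that each template is encoded as a 1/0 vector over the 14 markers (1 for the genes the description associates with that CMS type, 0 otherwise) and that the input expression values have already been centred and scaled; the encoding, the type labels, and the function names are illustrative assumptions, not the system's actual implementation. Step (d) is not shown because the source does not give a numeric mapping from CMS type to prognosis.

```python
import math

# Hypothetical template: CMS type -> marker genes expected to be high in that type
# (grouping follows the description of the CMS14 marker set; labels are illustrative).
TEMPLATES = {
    "CMS1-inflammatory": ["IFIT3", "CXCL13", "STAT1", "CXCL9"],
    "CMS2-enterocyte": ["CA4", "AQP8", "SLC4A4"],
    "CMS2-transit-amplifying": ["AREG", "EREG"],
    "CMS3-goblet-like": ["SPINK4", "REG4"],
    "CMS4-stem-like": ["SFRP2", "ZEB2", "SFRP4"],
}
GENES = [gene for genes in TEMPLATES.values() for gene in genes]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; a zero vector is treated as orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def template_vector(cms_type):
    """Assumed encoding: 1 for genes assigned to the type, 0 for all other markers."""
    members = set(TEMPLATES[cms_type])
    return [1.0 if gene in members else 0.0 for gene in GENES]


def classify(expression):
    """expression: dict mapping gene symbol -> centred/scaled expression value."""
    missing = [gene for gene in GENES if gene not in expression]
    if missing:
        raise KeyError(f"missing markers: {missing}")
    sample = [expression[gene] for gene in GENES]
    distances = {t: cosine_distance(sample, template_vector(t)) for t in TEMPLATES}
    # The CMS type with the smallest cosine distance is the predicted type.
    best = min(distances, key=distances.get)
    return best, distances
```

For example, a sample in which SFRP2, ZEB2 and SFRP4 are high and the remaining markers are low would be assigned to the stem-like template, since that template has the smallest cosine distance.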
In another aspect, provided herein is a computer-readable medium comprising:
expression level data for each gene of a marker panel of a biological sample from a subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4;
CMS typing feature genome expression template data, and
instructions for performing a method comprising:
(a) obtaining gene expression level data of the marker panel of a biological sample from a subject;
(b) calculating cosine distances between the gene expression level data in (a) and the CMS typing characteristic genome expression templates;
(c) determining the CMS type based on the cosine distances calculated in (b); and
(d) determining a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
In another aspect, provided herein is an electronic device loaded with the computer-readable medium.
In another aspect, provided herein is the use of a reagent for detecting each marker of a marker panel of a biological sample from a subject in the manufacture of a product or product panel for assessing the prognosis of bowel cancer in a subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
In another aspect, provided herein is a use of a product or set of products comprising reagents for detecting each marker of a set of markers from a biological sample of a subject for assessing the prognosis of bowel cancer in a subject, wherein the set of markers has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
Drawings
The invention is further described below with reference to the accompanying drawings. These drawings merely illustrate exemplary embodiments of the invention and are not intended to limit the scope of the invention.
FIG. 1 is an exemplary flow chart of a general method of screening for a prognostic marker of bowel cancer according to one of the exemplary embodiments herein;
FIG. 2 is a volcano plot of differentially expressed genes in example 1 herein;
FIG. 3 is a matrix plot of correlation coefficients between CMS40 markers in example 1 herein;
FIG. 4 is a graph showing the results of survival analysis of CMS molecular typing data obtained with the CMS40 marker set in example 1 herein;
FIG. 5 is a graph showing the results of survival analysis of CMS molecular typing data obtained with the CMS20 marker set in example 1 herein;
FIG. 6 is a graph showing the results of survival analysis of CMS molecular typing data obtained with the CMS14 marker set in example 1 herein;
FIG. 7 is a graph showing the results of three-class relapse-free survival analysis of CMS molecular typing data obtained with the CMS40 marker set in example 3 herein.
Detailed Description
The meaning of technical terms in the present application is consistent with the general understanding of those skilled in the art unless otherwise indicated. In the present application, "a" or a combination of various words thereof includes both singular and plural meanings unless specifically stated otherwise. In the present application, when a plurality of values, ranges of values, or combinations thereof are given for the same parameter or variable, it is equivalent to specifically disclose the values, the range ends, and the ranges of values formed by any combination thereof. Any numerical value, whether or not bearing modifiers such as "about", is intended to uniformly cover the approximate range, e.g., plus or minus 10%, 5%, etc., as would be understood by one of ordinary skill in the art. Each "embodiment" herein equally refers to and encompasses embodiments of the methods and systems of the present application. In the present application, one or more technical features of any embodiment may be freely combined with one or more technical features of any one or more other embodiments, and thus the resulting embodiment is also included in the present disclosure.
Some terms used in the embodiments of the present invention are enumerated below. Within the scope of the present description and claims, the relevant terms are defined as follows. Other terms not listed are defined as commonly used in the art, the meaning of which is well known to those skilled in the art.
The term "cancer" as used herein refers to the presence of cells that have characteristics typical of oncogenic cells, such as uncontrolled proliferation, immortality, metastatic potential, rapid growth and proliferation rate, as well as certain characteristic morphological features known in the art. In one embodiment, the "cancer" may be a bowel cancer or a liver cancer. In one embodiment, "cancer" may include premalignant cancer and malignant cancer.
In one embodiment, as will be appreciated by those skilled in the art, the method described herein does not involve steps performed by a clinician/physician. Thus, the results obtained by the methods described herein must be combined with clinical data and other clinical manifestations before a physician can provide a final diagnosis to the subject. The final diagnosis as to whether a subject has bowel cancer falls within the physician's purview and is not considered part of the present disclosure. Thus, the terms "determining," "detecting," and "diagnosing" as used herein refer to identifying a subject as having a probability or likelihood of having a disease at any stage of development (e.g., bowel cancer) or determining a subject's susceptibility to developing the disease. In one embodiment, "diagnosing," "determining," or "detecting" is performed prior to manifestation of symptoms. In one embodiment, "diagnosing," "determining," or "detecting" allows a clinician/physician (in combination with other clinical manifestations) to confirm bowel cancer in a subject suspected of having bowel cancer.
As used herein, the term "sample" or "biological sample" means a sample taken from a subject for detecting the type and amount of intestinal cancer markers therein. The subject sample may or may not be from the circulatory system, i.e., from blood. The subject sample may be any sample, including a wash solution, that is suitable for detecting markers of intestinal cancer; its sources include tissue, whole blood, bone marrow, pleural fluid, peritoneal fluid, cerebrospinal fluid, milk, urine, tears, sweat, saliva, organ secretions, bronchi, nasal cavity, throat, and the like. In some embodiments, the biological sample is selected from the group consisting of: fresh tissue samples, frozen tissue samples, or paraffin-embedded tissue samples (e.g., FFPE samples).
As used herein, the term "prognosis" refers to the estimation of a likely outcome (with or without treatment) over a period of time in the future based on the subject's current condition. Generally, the results are in the form of a result probability (%), such as a cure probability (%), a recurrence probability (%), a death probability (%), and the like.
As used herein, the term "correlation analysis (Correlation Analysis)" refers to a method in statistics for studying whether or not there is a relationship between two or more random variables. The primary purpose of this is to determine if there is a statistical correlation or dependence between two or more variables and to attempt to quantify the degree and form of such correlation. The basic method of correlation analysis includes: linear correlation, rank correlation, distance correlation, etc.
As used herein, the term "correlation coefficient" is a statistic used in correlation analysis to quantify how closely variables are related, reflecting the strength of the linear correlation between two variables. The larger its absolute value, the stronger the linear correlation of the two variables; a value near 0 indicates a weaker correlation. Common examples are the Pearson correlation coefficient, the Spearman correlation coefficient, and the like. The correlation coefficient referred to in this study is the Pearson correlation coefficient.
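As a minimal illustration of the Pearson correlation coefficient defined above, the sketch below computes it for two equal-length lists of expression values. The function name and the input layout are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Two marker genes whose expression rises and falls together across samples give a value near 1, while genes that vary in opposite directions give a value near -1.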
As used herein, the term "quantile normalization" refers to a commonly used method of processing sequencing data. Quantile normalization is an important preprocessing step in RNA-seq analysis; it effectively removes systematic errors between samples so that the samples are comparable. The main idea is as follows: first, the expression data of a given gene from all samples are combined and sorted, and the expression level corresponding to each quantile (e.g., the 1%, 5%, and 25% quantiles) is calculated. Then, for each sample, the original expression level is replaced with the average expression level of the quantile to which the gene's original expression level in that sample corresponds. Finally, the above procedure is repeated for all genes in the sample. Through this quantile transformation, the expression distributions of different batches and different samples become consistent, and the effects of differing sequencing depths and technical errors between samples are removed. The advantages of quantile normalization are that it is simple and effective, easy to implement, requires no reference sample, is robust to missing values and outliers, and keeps the relative magnitude relationships of expression levels among samples unchanged.
As used herein, the term "consensus molecular subtypes (Consensus Molecular Subtypes)" or "CMS" refers to a method of molecular typing of colorectal cancer based on gene expression. CMS typing is determined by calculating the gene expression similarity between each sample and the four modes, and determining which type the sample belongs to based on cosine similarity and a classification model. CMS typing is related to patient prognosis and treatment efficacy and can guide precision treatment. Compared with a single biomarker, CMS typing makes comprehensive use of genome-wide expression information and reflects the biological characteristics of a tumor more fully. It is currently an important tool for precision medicine in intestinal cancer.
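The sketch below shows the widely used rank-based formulation of quantile normalization, which may differ in detail from the procedure described above: values are ranked within each sample, the values at each rank are averaged across samples, and every value is replaced by the average for its rank. The data layout (a dict of equal-length lists in the same gene order) and the simple tie handling are assumptions.

```python
def quantile_normalize(samples):
    """samples: dict sample_name -> list of expression values (same gene order)."""
    names = list(samples)
    n_genes = len(samples[names[0]])
    if any(len(samples[name]) != n_genes for name in names):
        raise ValueError("all samples must cover the same genes")

    # For each sample, gene indices ordered by ascending expression.
    # Ties are broken by gene order (a simplification).
    order = {
        name: sorted(range(n_genes), key=lambda i, name=name: samples[name][i])
        for name in names
    }

    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(samples[name][order[name][k]] for name in names) / len(names)
        for k in range(n_genes)
    ]

    # Replace each value by the reference value of its within-sample rank.
    normalized = {}
    for name in names:
        values = [0.0] * n_genes
        for rank, gene_index in enumerate(order[name]):
            values[gene_index] = reference[rank]
        normalized[name] = values
    return normalized
```

After normalization every sample has the same set of values, and the rank order of genes within each sample is preserved.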
The term "CMS 1-inflammatory type (CMS 1-Inflammatory)" is a consensus subtype of intestinal cancer, representing inflammatory intestinal cancer. The main characteristics are that: increased immune cell infiltration, particularly T lymphocyte and macrophage infiltration; up-regulation of inflammation-associated genes, such as inflammatory cytokines IL-6, IL-8, etc.; increased expression of immune checkpoints such as PD-L1; natural killer T cell and TH1 type T cell-related gene activation; often microsatellite instability is high; the prognosis is better and the sensitivity to immunotherapy such as anti-PD-1 is higher.
The term "CMS 2-transient proliferative type (CMS 2-Transit amplifying)" is a consensus molecular subtype of intestinal cancer, representing a highly proliferative, poorly differentiated intestinal cancer subtype. Its main characteristics are: high expression of genes associated with intestinal epithelial cell proliferation and transfer, such as cyclins, proliferation-associated antigens, etc.; down-regulation of genes associated with intestinal epithelial cell differentiation; abnormal activation of the WNT pathway; mutations in tumorigenesis driver genes such as APC, TP53, KRAS, etc.; pathologically, poorly differentiated adenocarcinoma; poor prognosis; sensitivity to standard chemotherapy.
The term "CMS 2-intestinal epithelial cell type (CMS 2-Enterocyte)" is a consensus molecular subtype of intestinal cancer, representing a differentiated intestinal cancer subtype of intestinal epithelium, associated with the differentiation and absorption function of normal intestinal epithelial cells. The main characteristics of the method include: the intestinal epithelial cell differentiation and absorption related genes are highly expressed, such as alkaline phosphatase, intestinal alkaline phosphatase and the like; up-regulation of cell adhesion related proteins E-cadherin, etc.; WNT pathway down-regulation; the driving genes comprise APC, KRAS, TP and the like; the pathological type is highly differentiated adenocarcinoma; the prognosis is better; is sensitive to standard first-line chemotherapy.
The term "CMS 3-goblet-like type (CMS 3-Goblet like)" is a consensus molecular subtype of intestinal cancer, representing a goblet cell-like subtype of intestinal cancer. Its main characteristics are: high expression of goblet cell differentiation-related genes, such as MUC2, TFF3, etc.; activation of mucus production-related pathways; frequent activation of the RAS pathway, with KRAS mutations being more common; a pathological type that is usually mucinous adenocarcinoma; less immune infiltration; more frequent occurrence in the right colon; poor sensitivity to standard first-line chemotherapy; poor prognosis.
The term "CMS 4-stem-like type (CMS 4-Stem like)" is a consensus molecular subtype of intestinal cancer, representing a stem cell-like subtype of intestinal cancer. Its main characteristics are: high expression of stem cell marker genes, such as LGR5, EPHB2, etc.; upregulation of EMT-related genes; activation of the WNT and Notch pathways; high tumor heterogeneity and poor differentiation; pathological types that are mainly signet ring cell carcinoma and mucinous carcinoma; proliferation of tumor stem cells and resistance to treatment; poor prognosis; a high risk of tumor recurrence.
As used herein, a "p-value" is the probability, assuming that the null hypothesis of a hypothesis test is true, of obtaining data equal to or more extreme than the observed data. In general, if the p-value is very small, e.g., less than 0.01, the result is highly unlikely to be a random event under the null hypothesis, so the null hypothesis is rejected, i.e., the result is statistically significant. If the p-value is large, e.g., greater than 0.05, the null hypothesis cannot be rejected, i.e., the result is not statistically significant. Common significance thresholds are 0.05 and 0.01. The p-value is an important basis for judging whether the result of a hypothesis test is significant: the smaller the p-value, the higher the statistical significance of the result.
As used herein, the term "t-test" refers to a statistical method for testing whether the means of two samples differ significantly. The basic idea of the t-test is to construct a hypothesis, calculate the t value of the observed statistic, determine the p-value from the t distribution, and finally decide, based on the p-value, whether to reject the null hypothesis.
As used herein, the term "survival analysis" refers to the analysis of time-to-event data. Survival analysis requires survival time (time), status (status), and other characteristic data, where the status generally indicates with 0 (no) or 1 (yes) whether an event (e.g., death) has occurred.
As used herein, the terms "subject" and "patient" are used interchangeably and generally refer to a mammal, such as a bovine, equine, ovine, porcine, canine, feline, rodent, or primate, e.g., a human or a non-human mammal.
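The t value mentioned above can be sketched as follows. The source does not say which variant of the t-test is used, so this example assumes Welch's unequal-variance form and returns the statistic with its approximate degrees of freedom. Converting these into a p-value requires the t distribution, which is not part of the Python standard library and is therefore omitted here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard errors of the two sample means.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("t is undefined when both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```

A t value far from 0 relative to the degrees of freedom corresponds to a small p-value, i.e., stronger evidence that the two group means differ.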
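To show how the time and status data described above are used, here is a minimal sketch of the Kaplan-Meier survival estimate. The source does not name the estimator used for its survival curves, so Kaplan-Meier is an assumption, as are the function name and input layout.

```python
def kaplan_meier(times, status):
    """Kaplan-Meier estimate.

    times:  follow-up time for each subject
    status: 1 if the event (e.g., death or relapse) occurred, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    if len(times) != len(status):
        raise ValueError("times and status must have the same length")
    data = sorted(zip(times, status))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= leaving
    return curve
```

Computing one such curve per predicted CMS type is a common way to compare survival between groups.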
A. Marker panel for assessing prognosis of intestinal cancer
In one aspect, provided herein is a marker panel for assessing the prognosis of bowel cancer in a subject, having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
Herein, a marker set having the 14 gene markers described above may also be equivalently referred to as "CMS14" or "CMS14 marker set".
Herein, a "product set" refers to two or more products that may be provided together (e.g., in the same kit or package) or separately (e.g., not in the same kit or package), and which are used in combination and cannot be used alone (e.g., for assessing a subject's prognosis of bowel cancer).
In some embodiments, the bowel cancer is colorectal cancer. In some embodiments, the intestinal cancer is stage II/III colorectal cancer. In some embodiments, the bowel cancer may be selected from rectal cancer, left-half colon cancer, or right-half colon cancer.
In some embodiments, the expression (e.g., amount of expression) of each gene marker in the marker set can be used to determine CMS typing of a subject.
In some embodiments, the comparison of the expression (e.g., amount of expression) of each genetic marker in the marker set to the CMS typing feature genome expression template can be used to determine CMS typing of a subject to further assess the prognosis of the subject's bowel cancer.
As used herein, the term "CMS-typed feature genome expression template" refers to a preset classification template, e.g., a preset classification template data table. The classification template may be a set of genes labeled with different CMS classes. For a given gene, if the gene is labeled with a specific class, the gene has a higher expected expression in samples belonging to that class than in samples not belonging to it. In one exemplary embodiment, the classification template data table may include at least two sets (e.g., two columns) of information: probes (e.g., Entrez ID) and categories (e.g., CMS categories). In yet another exemplary embodiment, the classification template data table may include three sets (e.g., three columns) of information: probes (e.g., Entrez ID), categories (e.g., CMS categories), and gene symbols.
In some embodiments, the comparison of the expression (e.g., amount of expression) of each gene marker in the marker set to the CMS-typed feature genome expression template is expressed as the cosine distance of the expression (e.g., amount of expression) of each gene marker in the CMS14 marker set to the CMS-typed feature genome expression template.
The term "cosine distance" is defined herein as a feature distance d, which is the default feature distance of the nearest template prediction model. A smaller d indicates that the sample is closer to the characteristic genome expression template of a given CMS type, i.e., more likely to be of that type. A random permutation test is used to obtain a typing p-value that measures statistical significance. The p-value of the significance test is calculated by randomly drawing feature genes (1000 times by default) to generate a random distribution of feature distances, comparing the distance between the tested sample and the typing feature template with the randomly generated distance distribution, and correcting for the false discovery rate (FDR). A smaller p-value indicates stronger statistical significance of the shortest cosine feature distance, and thus a more reliable predicted CMS type (the threshold for statistical significance is typically p < 0.05).
In some embodiments, CMS typing may include one or more of the following: CMS 1-inflammatory type (CMS 1-Inflammatory), CMS 2-transient proliferative type (CMS 2-Transit amplifying), CMS 2-intestinal epithelial cell type (CMS 2-Enterocyte), CMS 3-goblet-like type (CMS 3-Goblet like) and CMS 4-stem-like type (CMS 4-Stem like).
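The permutation test and FDR correction described above can be sketched as follows. This is one plausible reading, not the authors' code: the null distribution is built by drawing random gene sets of the same size from a background vector of expression values, and the Benjamini-Hochberg procedure stands in for the unspecified FDR correction. All function and parameter names are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity; a zero vector is treated as orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def permutation_p_value(sample, template, background, n_perm=1000, seed=0):
    """Empirical p-value for the observed sample-to-template distance.

    sample:     expression values of the feature genes
    template:   template vector of the same length
    background: expression values from which random gene sets are drawn (assumed)
    """
    rng = random.Random(seed)
    observed = cosine_distance(sample, template)
    hits = 0
    for _ in range(n_perm):
        random_genes = rng.sample(background, len(sample))
        if cosine_distance(random_genes, template) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice one p-value would be computed per sample for its nearest template, and the correction would be applied across all samples in a batch.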
In a specific embodiment, the IFIT3 gene has a nucleotide structure as shown in ENSG00000119917, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CXCL13 gene has a nucleotide structure as shown in ENSG00000156234, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the STAT1 gene has a nucleotide structure as shown in ENSG00000115415, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CXCL9 gene has a nucleotide structure as shown in ENSG00000138755, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CA4 gene has a nucleotide structure as shown in ENSG00000167434, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AQP8 gene has a nucleotide structure as shown in ENSG00000103375, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
In a specific embodiment, the SLC4A4 gene has a nucleotide structure as shown in ENSG00000080493, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the EREG gene has a nucleotide structure as shown in ENSG00000124882, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AREG gene has a nucleotide structure as shown in ENSG00000109321, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the REG4 gene has a nucleotide structure as shown in ENSG00000134193, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
In a specific embodiment, the SPINK4 gene has a nucleotide structure as shown in ENSG00000122711, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the SFRP2 gene has a nucleotide structure as shown in ENSG00000145423, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the ZEB2 gene has a nucleotide structure as shown in ENSG00000169554, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the SFRP4 gene has a nucleotide structure as shown in ENSG00000106483, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
Without intending to be bound by a particular theory, it was found that, in the CMS14 marker set, IFIT3, CXCL13, STAT1 and CXCL9 help to specifically distinguish the CMS 1-inflammatory type; CA4, AQP8 and SLC4A4 help to specifically distinguish the CMS 2-intestinal epithelial cell type; EREG and AREG help to specifically distinguish the CMS 2-transient proliferative type; SPINK4 and REG4 help to specifically distinguish the CMS 3-goblet-like type; and SFRP2, ZEB2 and SFRP4 help to specifically distinguish the CMS 4-stem-like type.
In some embodiments, the marker set for assessing the prognosis of intestinal cancer in a subject may further comprise the following 6 gene markers in addition to the CMS14 marker set: CA1, CLCA4, MS4A12, CLDN8, MUC2 and ZEB1. The marker set thus formed with 20 gene markers may also be equivalently referred to herein as "CMS20" or "CMS20 marker set".
In some embodiments, the marker panel for assessing the prognosis of bowel cancer in a subject has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4.
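The gene-to-subtype assignment above can be written out as a three-column classification template table (probe, category, gene symbol). In this sketch the probe column holds the Ensembl gene identifiers quoted in this description rather than Entrez IDs, and the category labels are illustrative strings; both choices are assumptions made for the example.

```python
from collections import defaultdict

# (probe, CMS category, gene symbol); probes are the Ensembl IDs given in the text.
TEMPLATE_ROWS = [
    ("ENSG00000119917", "CMS1-inflammatory", "IFIT3"),
    ("ENSG00000156234", "CMS1-inflammatory", "CXCL13"),
    ("ENSG00000115415", "CMS1-inflammatory", "STAT1"),
    ("ENSG00000138755", "CMS1-inflammatory", "CXCL9"),
    ("ENSG00000167434", "CMS2-enterocyte", "CA4"),
    ("ENSG00000103375", "CMS2-enterocyte", "AQP8"),
    ("ENSG00000080493", "CMS2-enterocyte", "SLC4A4"),
    ("ENSG00000124882", "CMS2-transit-amplifying", "EREG"),
    ("ENSG00000109321", "CMS2-transit-amplifying", "AREG"),
    ("ENSG00000122711", "CMS3-goblet-like", "SPINK4"),
    ("ENSG00000134193", "CMS3-goblet-like", "REG4"),
    ("ENSG00000145423", "CMS4-stem-like", "SFRP2"),
    ("ENSG00000169554", "CMS4-stem-like", "ZEB2"),
    ("ENSG00000106483", "CMS4-stem-like", "SFRP4"),
]


def genes_by_category(rows):
    """Group gene symbols by their CMS category, preserving table order."""
    grouped = defaultdict(list)
    for _probe, category, symbol in rows:
        grouped[category].append(symbol)
    return dict(grouped)


def missing_markers(rows, expression):
    """Return template genes absent from a sample's expression dict."""
    return [symbol for _probe, _category, symbol in rows if symbol not in expression]
```

A table in this form makes it straightforward to check that a sample's expression data covers all 14 markers before any distance to the templates is calculated.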
In a specific embodiment, the CA1 gene has a nucleotide structure as shown in ENSG00000133742, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CLCA4 gene has a nucleotide structure as shown in ENSG00000016602, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the MS4A12 gene has a nucleotide structure as shown in ENSG00000071203, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CLDN8 gene has a nucleotide structure as shown in ENSG00000156284, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
In a specific embodiment, the MUC2 gene has a nucleotide structure as shown in ENSG00000198788, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the ZEB1 gene has a nucleotide structure as shown in ENSG00000148516, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
Without intending to be bound by a particular theory, it was found that, within the CMS20 marker set, IFIT3, CXCL13, STAT1 and CXCL9 are beneficial for specifically distinguishing the CMS 1-inflammatory type; CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8 and SLC4A4 are beneficial for specifically distinguishing the CMS 2-intestinal epithelial cell type; EREG and AREG are beneficial for specifically distinguishing the CMS 2-transient proliferative type; SPINK4, REG4 and MUC2 are beneficial for specifically distinguishing the CMS 3-goblet-like type; and SFRP2, ZEB1, ZEB2 and SFRP4 are beneficial for specifically distinguishing the CMS 4-stem-like type.
In some embodiments, the marker set for assessing the prognosis of intestinal cancer in a subject may further comprise the following 20 gene markers on the basis of the CMS20 marker set: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1. The marker set thus formed with 40 gene markers may also be equivalently referred to herein as "CMS40" or "CMS40 marker set".
In some embodiments, the marker panel for assessing the prognosis of bowel cancer in a subject has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
In a specific embodiment, the CXCL10 gene has a nucleotide structure as shown in ENSG00000169245, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AIM2 gene has a nucleotide structure as shown in ENSG00000163568, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
In a specific embodiment, the GBP5 gene has a nucleotide structure as shown in ENSG00000154451, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CXCL11 gene has a nucleotide structure as shown in ENSG00000169248, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the KRT20 gene has a nucleotide structure as shown in ENSG00000171431, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
In a specific embodiment, the SLC26A3 gene has a nucleotide structure as shown in ENSG00000091138, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CA2 gene has a nucleotide structure as shown in ENSG00000104267, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the ASCL2 gene has a nucleotide structure as shown in ENSG00000183734, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the VAV3 gene has a nucleotide structure as shown in ENSG00000134215, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CELP gene has a nucleotide structure as shown in ENSG00000170827, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the RNF43 gene has a nucleotide structure as shown in ENSG00000108375, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the MLPH gene has a nucleotide structure as shown in ENSG00000115648, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the TFF3 gene has a nucleotide structure as shown in ENSG00000160180, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AQP3 gene has a nucleotide structure as shown in ENSG00000165272, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
In a specific embodiment, the COL3A1 gene has a nucleotide structure as shown in ENSG00000168542, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the SNAI2 gene has a nucleotide structure as shown in ENSG00000019549, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CCDC80 gene has a nucleotide structure as shown in ENSG00000091986, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AEBP1 gene has a nucleotide structure as shown in ENSG00000106624, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the TIMP2 gene has a nucleotide structure as shown in ENSG00000035862, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the TWIST1 gene has a nucleotide structure as shown in ENSG00000122691, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
Without intending to be bound by a particular theory, it was found that, within the CMS40 marker set, CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11 and CXCL9 are beneficial for specifically distinguishing the CMS 1-inflammatory type; CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4 and CA2 are beneficial for specifically distinguishing the CMS 2-intestinal epithelial cell type; ASCL2, VAV3, CELP, EREG, RNF43 and AREG are beneficial for specifically distinguishing the CMS 2-transient proliferative type; SPINK4, REG4, MLPH, TFF3, MUC2 and AQP3 are beneficial for specifically distinguishing the CMS 3-goblet-like type; and COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1 are beneficial for specifically distinguishing the CMS 4-stem-like type.
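By way of non-limiting illustration, the nested relationship between the CMS14, CMS20 and CMS40 marker sets described above may be sketched as follows. The gene symbols and their assignment to CMS types are taken from the text; the dictionary layout, the short type keys and the helper names `extend` and `flatten` are assumptions made only for this sketch.

```python
# Gene lists are copied from the description; the data layout is illustrative.
CMS14 = {
    "CMS1_inflammatory": ["IFIT3", "CXCL13", "STAT1", "CXCL9"],
    "CMS2_enterocyte": ["CA4", "AQP8", "SLC4A4"],
    "CMS2_transit_amplifying": ["EREG", "AREG"],
    "CMS3_goblet_like": ["SPINK4", "REG4"],
    "CMS4_stem_like": ["SFRP2", "ZEB2", "SFRP4"],
}
# The 6 markers added to CMS14 to form CMS20.
CMS20_ADDED = {
    "CMS2_enterocyte": ["CA1", "CLCA4", "MS4A12", "CLDN8"],
    "CMS3_goblet_like": ["MUC2"],
    "CMS4_stem_like": ["ZEB1"],
}
# The 20 markers added to CMS20 to form CMS40.
CMS40_ADDED = {
    "CMS1_inflammatory": ["CXCL10", "AIM2", "GBP5", "CXCL11"],
    "CMS2_enterocyte": ["KRT20", "SLC26A3", "CA2"],
    "CMS2_transit_amplifying": ["ASCL2", "VAV3", "CELP", "RNF43"],
    "CMS3_goblet_like": ["MLPH", "TFF3", "AQP3"],
    "CMS4_stem_like": ["COL3A1", "SNAI2", "CCDC80", "AEBP1", "TIMP2", "TWIST1"],
}


def extend(base, added):
    """Return a new per-type mapping with the added genes appended."""
    return {cms_type: base[cms_type] + added.get(cms_type, []) for cms_type in base}


def flatten(panel):
    """Collect all genes of a marker set into one set."""
    return {gene for genes in panel.values() for gene in genes}


CMS20 = extend(CMS14, CMS20_ADDED)
CMS40 = extend(CMS20, CMS40_ADDED)

for name, panel in (("CMS14", CMS14), ("CMS20", CMS20), ("CMS40", CMS40)):
    print(name, len(flatten(panel)), {t: len(g) for t, g in panel.items()})
```

Organizing the sets by CMS type keeps the per-type gene groups available for later template construction, while `flatten` gives the plain 14-, 20- or 40-gene list.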
B. Products for assessing prognosis of intestinal cancer
In another aspect, provided herein is a product or set of products for assessing the prognosis of a bowel cancer in a subject, comprising reagents for detecting each marker of a set of markers from a biological sample of the subject, the set of markers having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4 (i.e., "CMS14 marker panel").
The term "product set" as used herein is intended to mean a combination of more than one product, e.g., two, three, four or more products. The gene markers in the set of markers described herein may be present separately in different products in the set of products. More than one product of the product group can be combined to assess the prognosis of bowel cancer in a subject.
In some embodiments, the marker set may further comprise the following 6 gene markers on the basis of the CMS14 marker set: CA1, CLCA4, MS4A12, CLDN8, MUC2, and ZEB1 (i.e., forming a "CMS20 marker set"). In some such embodiments, the marker set has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4.
In some embodiments, the marker set may further include the following 20 gene markers on the basis of the CMS20 marker set: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1 (i.e., forming a "CMS40 marker set"). In some such embodiments, the marker set has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
In some embodiments, the product may be a reagent product or a kit. In some embodiments, the product may be a combination of products selected from the group consisting of: a reagent product or kit.
In some embodiments, the reagents for detecting each marker of the set of markers from the biological sample of the subject can be packaged in a container contained in the product. In some embodiments, the container is a sealed container (e.g., a capped sealed container).
In some embodiments, the reagent for detecting each marker of a set of markers in a biological sample from a subject is a reagent that facilitates detection of expression of each gene marker in the set of markers in the biological sample.
Herein, "expression" includes the production of mRNA from a gene or gene portion, the production of a protein encoded by an RNA, a gene or a gene portion, and the presence of a detectable substance associated with expression. For example, cDNA, the binding of a binding ligand (e.g., an antibody) to a gene or other oligonucleotide, protein, or protein fragment, and the chromogenic portion of the binding ligand are included within the scope of the term "expression"; an increase in band density on an immunoblot such as a Western blot is likewise within the scope of the term "expression" as applied to biological molecules.
In some embodiments, the agent is an agent capable of detecting mRNA levels of the marker. Such reagents are well known in the art and include, but are not limited to, nucleic acid probes that specifically bind to a target sequence, primers that amplify a target sequence, non-specific fluorescent dyes (e.g., SYBR Green I), or combinations thereof.
In some embodiments, the nucleic acid probe may be a single-labeled nucleic acid probe, such as a radionuclide (e.g., 32P, 3H, 35S, etc.) labeled probe, biotin-labeled probe, horseradish peroxidase-labeled probe, digoxin-labeled probe, or a fluorophore (e.g., FITC, FAM, TET, HEX, TAMRA, cy, cy5, etc.) labeled probe; the nucleic acid probe may also be a double-labeled nucleic acid probe, such as a Taqman probe, a molecular beacon, a displacement probe, a scorpion primer probe, a QUAL probe, a FRET probe, or the like.
In some embodiments, the agent is an agent capable of detecting the protein level of the marker. In some embodiments, the reagent for detecting the protein level of the marker comprises a reagent required for immunological detection; the immunological detection is selected from an ELISA assay, an ELISpot assay, a Western blot or surface plasmon resonance. Reagents required for immunological detection are well known in the art and include, but are not limited to, antibodies and targeting polypeptides capable of specifically binding to at least one of ZNF33B, PRKX, LEF1, FKBP1A, SERPINB8 and SULT1B1.
In some embodiments, the reagent carries a detectable label, such as an enzyme (e.g., horseradish peroxidase, alkaline phosphatase, etc.), a radionuclide (e.g., 3H, 125I, 35S, 14C, 32P, etc.), a fluorescent dye (e.g., FITC, TRITC, PE, Texas Red, quantum dots, Cy7, Alexa 750, etc.), an acridine ester compound, magnetic beads, colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads, and biotin for binding to avidin (e.g., streptavidin) modified with the above labels.
In some embodiments, the product may further comprise a reagent for pre-treating the sample. In some embodiments, reagents for pre-treating a sample include, but are not limited to, the following: a diluent (e.g., phosphate buffer or physiological saline) for diluting the sample; anticoagulants for preventing blood coagulation (e.g., heparin).
In some embodiments, the product further comprises equipment (e.g., a tool and/or an instrument) for detecting the gene expression level of the subject.
In some embodiments, the product further comprises reagents and/or equipment (e.g., a tool and/or an instrument) for detecting other disease markers.
C. System for assessing prognosis of intestinal cancer
In another aspect, provided herein is a system for assessing the prognosis of bowel cancer in a subject, comprising a memory and one or more processors;
wherein the memory comprises:
Each gene expression data for a marker panel from a biological sample of a subject,
CMS typing feature genome expression template data, and
One or more processor-executable instructions;
Wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4 (i.e., "CMS14 marker set"); and
Wherein the one or more processor-executable instructions are configured to:
(a) Obtaining gene expression data of a marker set CMS14 from a biological sample of a subject;
(b) Calculating cosine distances between the gene expression data in (a) and the CMS typing characteristic genome expression templates;
(c) Determining CMS typing based on the computed cosine distances of (b); and
(d) Determining a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
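By way of non-limiting illustration, steps (b) and (c) may be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation of the system: the template encoding (weight 1 for genes characteristic of a type, 0 otherwise), the two-type toy templates, the centered toy expression values and the function names are all assumptions introduced for the example. The cosine distance is taken as one minus the cosine similarity.

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 1 - (u . v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: a zero vector is treated as uninformative
    return 1.0 - dot / (norm_u * norm_v)


def assign_cms_type(expression, templates):
    """Return the CMS type whose template is nearest to the sample.

    expression: dict mapping gene symbol -> normalized, centered expression.
    templates:  dict mapping CMS type -> dict of gene symbol -> template weight.
    """
    genes = sorted(expression)
    sample = [expression[g] for g in genes]
    distances = {
        cms_type: cosine_distance(sample, [weights.get(g, 0.0) for g in genes])
        for cms_type, weights in templates.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Toy templates with two types only (hypothetical weights).
templates = {
    "CMS1": {"STAT1": 1.0, "CXCL9": 1.0},
    "CMS4": {"SFRP2": 1.0, "ZEB2": 1.0},
}
# Hypothetical centered expression values for one sample.
sample = {"STAT1": 2.0, "CXCL9": 1.5, "SFRP2": -1.0, "ZEB2": -0.5}

best, distances = assign_cms_type(sample, templates)
print(best, distances)
```

In this toy input the genes characteristic of the first template are elevated, so that template yields the smallest distance and its type is returned.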
In some embodiments, the marker set may further comprise the following 6 gene markers on the basis of the CMS14 marker set: CA1, CLCA4, MS4A12, CLDN8, MUC2, and ZEB1 (i.e., forming a "CMS20 marker set"). In some such embodiments, the marker set has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4.
In some embodiments, the marker set may further include the following 20 gene markers on the basis of the CMS20 marker set: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1 (i.e., forming a "CMS40 marker set"). In some such embodiments, the marker set has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
In some embodiments, the bowel cancer is colorectal cancer. In some embodiments, the intestinal cancer is stage II/III colorectal cancer. In some embodiments, the bowel cancer may be selected from rectal cancer, left-half colon cancer, or right-half colon cancer.
In some embodiments, each gene expression data of a marker set from a biological sample of a subject is derived from RNA sequencing data of the sample tissue.
Sequencing may include any suitable sequencing technique known to those skilled in the art. In some embodiments, sequencing comprises high throughput sequencing.
In some embodiments, each gene expression data for a marker panel from a biological sample of a subject is obtained via data normalization processing based on RNA sequencing data for the sample tissue.
Normalization may include any suitable normalization method known to those skilled in the art. In some embodiments, the normalization comprises quantile normalization. In some embodiments, the quantile normalization comprises calculating LOG2 values of the original sequenced molecular numbers for each sample, then sorting the LOG2 molecular numbers for each sample, calculating an arithmetic average of all sample LOG2 molecular numbers corresponding to the order, and replacing the LOG2 values to form a normalized corrected molecular number matrix.
In some embodiments, CMS typing may include one or more of the following: CMS 1-inflammatory, CMS 2-transient proliferative, CMS 2-intestinal epithelial cell, CMS 3-goblet-like and CMS 4-stem-like.
In some embodiments, the system comprises the following modules:
sequencing library construction module: this module is used for constructing a sequencing library from sample RNA;
quantification and sequencing module: this module is used for quantifying and sequencing the sequencing library;
data normalization module: this module is used for normalizing the quantification and sequencing results; and
CMS molecular typing module: this module is used for molecular typing based on the data normalization results.
D. Computer readable medium
In another aspect, provided herein is a computer-readable medium comprising:
Each gene expression level data for a marker panel of biological samples from a subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4 (i.e., "CMS14 marker panel"),
CMS typing feature genome expression template data, and
Instructions for performing a method comprising:
(a) Obtaining gene expression level data of a marker set of a biological sample from a subject;
(b) Calculating cosine distances between the gene expression quantity data in the step (a) and the CMS typing characteristic genome expression templates;
(c) Determining CMS typing based on the computed cosine distances of (b); and
(d) Determining a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
In some embodiments, the marker set may further comprise the following 6 gene markers on the basis of the CMS14 marker set: CA1, CLCA4, MS4A12, CLDN8, MUC2, and ZEB1 (i.e., forming a "CMS20 marker set"). In some such embodiments, the marker set has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4.
In some embodiments, the marker set may further include the following 20 gene markers on the basis of the CMS20 marker set: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1 (i.e., forming a "CMS40 marker set"). In some such embodiments, the marker set has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
In some embodiments, the bowel cancer is colorectal cancer. In some embodiments, the intestinal cancer is stage II/III colorectal cancer. In some embodiments, the bowel cancer may be selected from rectal cancer, left-half colon cancer, or right-half colon cancer.
In some embodiments, each gene expression data of a marker set from a biological sample of a subject is derived from RNA sequencing data of the sample tissue.
Sequencing may include any suitable sequencing technique known to those skilled in the art. In some embodiments, sequencing comprises high throughput sequencing.
In some embodiments, each gene expression data for a marker panel from a biological sample of a subject is obtained via data normalization processing based on RNA sequencing data for the sample tissue.
Normalization may include any suitable normalization method known to those skilled in the art. In some embodiments, the normalization comprises quantile normalization. In some embodiments, the quantile normalization comprises calculating LOG2 values of the original sequenced molecular numbers for each sample, then sorting the LOG2 molecular numbers for each sample, calculating an arithmetic average of all sample LOG2 molecular numbers corresponding to the order, and replacing the LOG2 values to form a normalized corrected molecular number matrix.
In some embodiments, CMS typing may include one or more of the following: CMS 1-inflammatory, CMS 2-transient proliferative, CMS 2-intestinal epithelial cell, CMS 3-goblet-like and CMS 4-stem-like.
In another aspect, provided herein is an electronic device loaded with the computer-readable medium.
E. Use of the same
In another aspect, provided herein is the use of a reagent for detecting each marker of a marker panel of a biological sample from a subject in the manufacture of a product or product panel for assessing the prognosis of bowel cancer in a subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4 (i.e., "CMS14 marker panel").
In another aspect, provided herein is a use of a product or set of products comprising reagents for detecting each marker of a set of markers from a biological sample of a subject for assessing the prognosis of bowel cancer in a subject, wherein the set of markers has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4 (i.e., "CMS14 marker panel").
In some embodiments, the marker set may further comprise the following 6 gene markers on the basis of the CMS14 marker set: CA1, CLCA4, MS4A12, CLDN8, MUC2, and ZEB1 (i.e., forming a "CMS20 marker set"). In some such embodiments, the marker set has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4.
In some embodiments, the marker set may further include the following 20 gene markers on the basis of the CMS20 marker set: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1 (i.e., forming a "CMS40 marker set"). In some such embodiments, the marker set has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
Through long-term, in-depth research and extensive screening and analysis, the inventors determined a list of genes associated with CMS typing, performed differentially expressed gene analysis on the expression data to obtain the differentially expressed genes of each CMS type, selected the overexpressed genes for each CMS type, and refined these genes to obtain the specific marker sets described herein. The inventors have surprisingly found that, with the marker sets described herein, a better CMS typing effect can be obtained with a smaller number of markers, and that this typing can be used for accurate assessment of the prognosis of intestinal cancer. Thus, low-cost, high-efficiency, high-quality means for prognostic evaluation of bowel cancer are provided herein.
Examples
The invention is further elucidated below in connection with specific exemplary embodiments. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Appropriate modifications and variations of the invention may be made by those skilled in the art, and are within the scope of the invention.
Example 1 Screening of prognostic markers for intestinal cancer
The present study used 1,116 colorectal cancer patient samples from 3 hospitals in Shanghai for the screening and analysis of prognostic markers. FIG. 1 shows a general method for screening prognostic markers for intestinal cancer, comprising the following steps:
(I) Determining a list of genes associated with CMS typing
1) Data conversion and normalization: RNA sequencing data from fresh tissue and FFPE tissue are preprocessed by quantile normalization, as follows. Because fresh tissue and FFPE tissue differ in their technical characteristics during sample preparation and sequencing, the distributions of the raw sequencing molecule counts also differ significantly. To eliminate the technical bias caused by the sample source (e.g., fresh tissue or FFPE), the data of samples from different sources need to be normalized so that they are comparable. First, the base-2 logarithm (LOG2 value) of the raw sequencing molecule count is calculated for each sample to make the value distribution more symmetric. Because zero values exist, a small constant of 0.25 is uniformly added to the zero values so that they do not become NA or missing values after the logarithm is taken. The LOG2 values of each sample are then sorted, and the average of the LOG2 values of all samples at the same rank position is calculated. In each sample, the original LOG2 value of each gene is replaced with the average LOG2 value at the corresponding rank position. Samples from different sources are thus processed by the same normalization method, and the resulting values follow the same distribution. This eliminates the technical bias due to sample source, so that differential analysis between groups based on the normalized data yields more reliable conclusions.
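By way of non-limiting illustration, the normalization flow described in 1) may be sketched as follows. The function name, the input layout (one list of raw counts per sample, with genes in the same order in every sample) and the toy counts are assumptions for this sketch; ties are broken by gene order here, which is a simplification.

```python
import math


def quantile_normalize(counts):
    """Quantile-normalize log2 counts across samples.

    counts: dict mapping sample name -> list of raw molecule counts,
            with genes in the same order in every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # log2 transform; zero counts are replaced by 0.25 before taking the log.
    logged = {
        s: [math.log2(c if c > 0 else 0.25) for c in counts[s]] for s in samples
    }
    # Mean of the sorted log2 values at each rank position across samples.
    sorted_values = [sorted(logged[s]) for s in samples]
    rank_means = [
        sum(values[i] for values in sorted_values) / len(samples)
        for i in range(n_genes)
    ]
    # Replace each gene's value by the rank mean at that gene's rank.
    normalized = {}
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: logged[s][i])
        result = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            result[gene_index] = rank_means[rank]
        normalized[s] = result
    return normalized


# Hypothetical raw counts for four genes in one fresh and one FFPE sample.
raw = {"fresh": [0, 8, 64, 2], "ffpe": [4, 0, 16, 1]}
normalized = quantile_normalize(raw)
print(normalized)
```

After normalization every sample contains the same set of values, only assigned to different genes according to each gene's rank within that sample, which is what makes samples from different sources comparable.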
2) Construction of CMS templates: the CMS typing model uses the expression changes (e.g., up- or down-regulation) of the characteristic gene groups of each CMS type, which are associated with gene pathway activity, signaling, and cellular biological processes, to determine whether a test sample has the characteristic pattern of a particular CMS type. CMS typing herein is based on the reported 786 genes (Sadanandam, A. et al. (2013) Nat Med. 19(5): 619-25) and their related information (including updated information for these genes in databases such as NCBI), together with subsequently discovered related gene groups, which were comprehensively screened to build gene sequencing combinations for training the typing model. The typing model algorithm uses nearest template prediction (Nearest Template Prediction) to calculate the cosine distance between the distribution of normalized molecule counts of the characteristic genome of each sample and the CMS typing characteristic genome expression template.
The cosine distance is defined as the feature distance d, which is the default feature distance of the nearest template prediction model. The smallest d identifies the CMS type whose characteristic genome expression template is closest to the sample, i.e., the type to which the sample most likely belongs. A random permutation test is used to obtain the typing P value, which measures statistical significance. Specifically, feature genes are randomly drawn (1000 times by default) to generate a random distribution of feature distances, the distance between the test sample and the typing feature template is compared with this randomly generated distance distribution, and the resulting P value is corrected for the false discovery rate (FDR). A smaller P value indicates stronger statistical significance of the shortest cosine feature distance and hence a more reliable predicted CMS type (the threshold for statistical significance is typically P < 0.05).
Marker genes associated with CMS typing are predefined. Without loss of generality, it is assumed that the group A genes (nA genes) are up-regulated in CMS type A but unchanged or down-regulated in CMS type B. Likewise, the group B genes (nB genes) are up-regulated in CMS type B but unchanged or down-regulated in CMS type A. The group A and group B genes together constitute the characteristic genome templates A and B for the gene expression patterns of CMS types A and B. The normalized values of the nA+nB feature genes are extracted from the N genes of the test sample and compared with template A and template B respectively, and the feature cosine distance relative to A or B is calculated. The type of the closest template is the predicted CMS type. To calculate the statistical significance of the feature distance, nA+nB genes are randomly drawn from the N genes, repeated 1000 times, to produce a null distribution of the feature distance d. The corrected P value of statistical significance is calculated by comparing the feature distance of the test sample with the null distribution. Red and blue in the heatmap represent up-regulated and down-regulated gene expression, respectively. All samples are classified into the following five CMS types: CMS 1-inflammatory, CMS 2-intestinal epithelial cell, CMS 2-transient proliferative, CMS 3-goblet-like and CMS 4-stem-like.
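By way of non-limiting illustration, the random permutation test for the feature distance may be sketched as follows. This is a simplified sketch under stated assumptions: templates A and B are encoded with weight 1 for their own genes and 0 otherwise, the empirical P value is computed as (number of random distances not exceeding the observed distance + 1) / (number of draws + 1), and FDR correction across samples is omitted. The function names, the toy gene names and the toy expression values are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """Cosine distance: 1 - (u . v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def nearest_distance(values, templates):
    """Smallest cosine distance between the values and any template."""
    return min(cosine_distance(values, t) for t in templates)


def permutation_p_value(sample, feature_genes, templates, n_perm=1000, seed=0):
    """Empirical P value of the observed nearest-template distance.

    sample:        dict gene -> normalized expression over all N genes.
    feature_genes: ordered list of the nA+nB feature genes.
    templates:     list of template vectors aligned with feature_genes.
    """
    rng = random.Random(seed)
    observed = nearest_distance([sample[g] for g in feature_genes], templates)
    all_genes = sorted(sample)
    hits = 0
    for _ in range(n_perm):
        # Draw nA+nB genes at random from the N genes to build the null.
        drawn = rng.sample(all_genes, len(feature_genes))
        if nearest_distance([sample[g] for g in drawn], templates) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy sample: 196 background genes plus 4 feature genes (hypothetical values).
sample = {f"G{i}": math.sin(i) for i in range(1, 197)}
sample.update({"A1": 3.0, "A2": 3.0, "B1": 0.0, "B2": 0.0})
feature_genes = ["A1", "A2", "B1", "B2"]
templates = [[1, 1, 0, 0], [0, 0, 1, 1]]  # template A, template B

observed, p_value = permutation_p_value(sample, feature_genes, templates)
print(observed, p_value)
```

Adding one to both numerator and denominator keeps the empirical P value strictly positive, so a sample is never reported with P = 0 merely because the number of draws is finite.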
(II) Differentially expressed gene analysis of the expression data to obtain the differentially expressed genes of each CMS type
838 samples from a Grade III, Class A hospital in Shanghai were taken as the training data set for screening the prognostic marker set. Differentially expressed gene (DEG) analysis was performed using the R package limma with the eBayes algorithm, and the differentially expressed genes of each CMS type were obtained through the following steps: reading the expression data, setting up the expression data matrix, transforming the data with the voom function, fitting a linear model with lmFit, adjusting the variances and p values with eBayes, obtaining the topTable results, drawing MD (mean-difference) plots, and drawing volcano plots (e.g., FIG. 2).
(III) Selection of overexpressed genes for each CMS type to give the marker set CMS40
Selection of overexpressed genes for each CMS type: genes with a fold change greater than 1.3 were first selected from the differentially expressed genes of each CMS type, with up to 10 genes selected per CMS type. Then, through gene ontology annotation analysis, 40 genes were finally determined to represent the 5 CMS types and were named the "CMS40" marker set. As shown in Table 1, the CMS40 marker set comprises: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1. Among them, CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11 and CXCL9 specifically distinguish the CMS1-inflammatory (CMS1_Inflammatory) type; CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4 and CA2 specifically distinguish the CMS2-enterocyte (CMS2_Enterocyte) type; ASCL2, VAV3, CELP, EREG, RNF43 and AREG specifically distinguish the CMS2-transit-amplifying (CMS2_Transit.amplifying) type; SPINK4, REG4, MLPH, TFF3, MUC2 and AQP3 specifically distinguish the CMS3-goblet-like (CMS3_Goblet.like) type; COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1 specifically distinguish the CMS4-stem-like (CMS4_Stem.like) type.
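The fold-change filter described above can be sketched in Python as follows. The function name and the input layout are hypothetical, and ranking the candidates by fold change before keeping up to 10 per type is an assumption: the text states only the threshold and the per-type cap, and the subsequent gene ontology annotation step is not modeled here.

```python
from collections import defaultdict

def select_overexpressed(deg_rows, min_fold_change=1.3, max_per_type=10):
    """Keep up to `max_per_type` genes per CMS type with fold change > threshold.

    deg_rows: iterable of (cms_type, gene, fold_change) tuples, e.g. taken from
    a differential expression result table (hypothetical input layout).
    """
    by_type = defaultdict(list)
    for cms_type, gene, fold_change in deg_rows:
        if fold_change > min_fold_change:
            by_type[cms_type].append((gene, fold_change))
    selected = {}
    for cms_type, rows in by_type.items():
        # Assumption: candidates are ranked by descending fold change.
        rows.sort(key=lambda r: r[1], reverse=True)
        selected[cms_type] = [gene for gene, _ in rows[:max_per_type]]
    return selected
```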
Table 1 list of 40 significant differential genes with fold changes greater than 1.3
(IV) Further comprehensive verification of the screened marker set by intersection to obtain the marker set CMS20
We intersected the marker set obtained by the above screening (40 candidate genes) with validated intestinal cancer molecular typing genes (38 candidate genes serving as a positive control marker set); the resulting intersection contained 20 gene markers. These 20 consensus genes are gene markers identified as associated with colorectal cancer by both algorithmic prediction and empirical verification, and are therefore more accurate and reliable than markers supported by algorithmic prediction or empirical verification alone. These 20 genes were designated the high-confidence intestinal cancer molecular typing marker set, named the "CMS20" marker set. Specifically, the CMS20 marker set comprises: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4. Of these, IFIT3, CXCL13, STAT1 and CXCL9 specifically distinguish the CMS1-inflammatory (CMS1_Inflammatory) type; CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8 and SLC4A4 specifically distinguish the CMS2-enterocyte (CMS2_Enterocyte) type; EREG and AREG specifically distinguish the CMS2-transit-amplifying (CMS2_Transit.amplifying) type; SPINK4, REG4 and MUC2 specifically distinguish the CMS3-goblet-like (CMS3_Goblet.like) type; SFRP2, ZEB1, ZEB2 and SFRP4 specifically distinguish the CMS4-stem-like (CMS4_Stem.like) type.
TABLE 2 CMS20 marker set
CMS type Gene EntrezID Fold change t P value adj.P.Val
CMS1_Inflammatory IFIT3 3437 1.933130837 9.657039362 1.90086E-21 1.81612E-20
CMS1_Inflammatory CXCL13 10563 2.529919558 14.07397641 2.38204E-42 3.77791E-40
CMS1_Inflammatory STAT1 6772 2.518828709 13.92543994 1.51279E-41 1.49955E-39
CMS1_Inflammatory CXCL9 4283 2.398874858 13.11158448 2.89751E-37 2.08884E-35
CMS2_Enterocyte CA4 762 2.366377764 17.35527122 1.23907E-61 9.82581E-59
CMS2_Enterocyte CA1 759 2.340091454 17.08073666 6.50317E-60 2.57851E-57
CMS2_Enterocyte CLCA4 22802 2.249182095 16.21269278 1.34659E-54 3.55949E-52
CMS2_Enterocyte MS4A12 54860 2.215540592 15.90077477 9.83181E-53 1.55933E-50
CMS2_Enterocyte AQP8 343 2.197753239 15.6196028 4.47165E-51 5.91003E-49
CMS2_Enterocyte CLDN8 9073 2.118084287 14.70039096 8.33649E-46 8.26355E-44
CMS2_Enterocyte SLC4A4 8671 2.091970501 14.62326613 2.25184E-45 1.98412E-43
CMS2_Transit.amplifying EREG 2069 2.058542252 14.52587039 7.85432E-45 1.03808E-42
CMS2_Transit.amplifying AREG 374 2.004373762 13.90426091 1.96663E-41 1.94942E-39
CMS3_Goblet.like SPINK4 27290 2.530018071 12.07638181 4.11256E-32 3.26126E-29
CMS3_Goblet.like REG4 83998 2.50208751 11.93404367 1.97628E-31 7.83596E-29
CMS3_Goblet.like MUC2 4583 2.110217066 9.715236529 1.11207E-21 2.20468E-19
CMS4_Stem.like SFRP2 6423 1.649504053 9.299433542 4.81886E-20 4.19929E-19
CMS4_Stem.like ZEB1 6935 1.570443115 8.203515337 4.96341E-16 3.0992E-15
CMS4_Stem.like ZEB2 9839 1.587685106 8.452358663 6.65993E-17 4.51395E-16
CMS4_Stem.like SFRP4 6424 2.205499829 15.19508982 1.29815E-48 1.71572E-46
(V) Use of correlation analysis to find substitution relationships between prognostic markers
Substitution relationships among the 40 prognostic markers in CMS40 were identified by correlation analysis in order to further refine the marker set: Pearson correlation coefficients between the markers were calculated in R and displayed with the corrplot function, forming a correlation coefficient matrix plot (see FIG. 3). The following can be concluded from the correlation coefficient matrix plot:
The correlation coefficient between CA1 and CA4 is 0.74, so CA1 and CA4 can substitute for each other;
The correlation coefficient between CA1 and CA2 is 0.69, so CA1 and CA2 can substitute for each other;
The correlation coefficient between CA1 and CLCA4 is 0.69, so CA1 and CLCA4 can substitute for each other;
The correlation coefficient between CA1 and MS4A12 is 0.75, so CA1 and MS4A12 can substitute for each other;
The correlation coefficient between CA1 and CLDN8 is 0.71, so CA1 and CLDN8 can substitute for each other;
The correlation coefficient between CA4 and MS4A12 is 0.69, so CA4 and MS4A12 can substitute for each other;
The correlation coefficient between CA4 and CLCA4 is 0.67, so CA4 and CLCA4 can substitute for each other;
The correlation coefficient between CA2 and MS4A12 is 0.70, so CA2 and MS4A12 can substitute for each other;
The correlation coefficient between CLCA4 and MS4A12 is 0.70, so CLCA4 and MS4A12 can substitute for each other;
The correlation coefficient between REG4 and SPINK4 is 0.69, so REG4 and SPINK4 can substitute for each other;
The correlation coefficient between REG4 and MUC2 is 0.67, so REG4 and MUC2 can substitute for each other;
The correlation coefficient between SPINK4 and MUC2 is 0.75, so SPINK4 and MUC2 can substitute for each other;
The correlation coefficient between EREG and AREG is 0.73, so EREG and AREG can substitute for each other;
The correlation coefficients among CXCL9, CXCL10 and CXCL11 are at least 0.69, so the three can substitute for one another.
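A minimal Python sketch of how such substitutable pairs could be read off a Pearson correlation matrix is given below. The text performs this step in R; NumPy is used here purely for illustration. The function name is hypothetical, and the cutoff of 0.67 is an assumption taken from the lowest coefficient listed above, since no explicit threshold is stated.

```python
import numpy as np

def substitutable_pairs(expr, genes, threshold=0.67):
    """List gene pairs whose Pearson correlation is at least `threshold`.

    expr  : samples x genes array of normalized expression values
    genes : gene names, one per column of `expr`
    """
    r = np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if r[i, j] >= threshold:
                pairs.append((genes[i], genes[j], round(float(r[i, j]), 2)))
    return pairs
```

Only positive correlations are treated as substitution candidates in this sketch, matching the coefficients reported in the list.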
(VI) Further refining the marker set based on the substitution relationships between markers to obtain the marker set CMS14
On the basis of CMS20, the marker set was further refined by combining the inter-gene correlation results with the differential analysis results, giving a list of 14 genes named the "CMS14" marker set. Specifically, the CMS14 marker set comprises: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2 and SFRP4. Of these, IFIT3, CXCL13, STAT1 and CXCL9 specifically distinguish the CMS1-inflammatory (CMS1_Inflammatory) type; CA4, AQP8 and SLC4A4 specifically distinguish the CMS2-enterocyte (CMS2_Enterocyte) type; AREG and EREG specifically distinguish the CMS2-transit-amplifying (CMS2_Transit.amplifying) type; SPINK4 and REG4 specifically distinguish the CMS3-goblet-like (CMS3_Goblet.like) type; SFRP2, ZEB2 and SFRP4 specifically distinguish the CMS4-stem-like (CMS4_Stem.like) type.
TABLE 3 list of CMS14 genes
(VII) Evaluation of the CMS molecular typing ability of the marker sets using all 1,116 colorectal cancer samples (including 278 from 2 other hospitals)
Based on the data of the 1,116 samples, CMS molecular typing was performed on all samples using the CMS40, CMS20 and CMS14 marker sets, respectively. At the same time, recurrence-free survival data, including recurrence status and time, were collected for the 1,116 samples.
Recurrence-free survival analysis was performed using the CMS molecular typing results, recurrence status and time of all samples obtained with the CMS40 marker set. The results are shown in FIG. 4, where the abscissa is the recurrence-free survival time in months and the ordinate is the recurrence-free survival rate. The light green curve represents CMS1_Inflammatory, the yellow curve represents CMS2_Enterocyte, and the blue curve represents CMS2_Transit.amplifying. The five curves lie close together while the survival rate is near 1 and diverge increasingly as the survival rate decreases, indicating clear prognostic differences between the groups. The overall difference among the five curves was significant (p < 0.0001), suggesting that the CMS40 signature genes can serve as important biomarkers for accurately predicting the survival of colorectal cancer patients.
Recurrence-free survival analysis was performed using the CMS molecular typing results, recurrence status and time of all samples obtained with the CMS20 marker set. The results are shown in FIG. 5, where the abscissa is the recurrence-free survival time in months and the ordinate is the recurrence-free survival rate. The light green curve represents CMS1_Inflammatory, the yellow curve represents CMS2_Enterocyte, and the blue curve represents CMS2_Transit.amplifying. The five curves lie close together while the survival rate is near 1 and diverge increasingly as the survival rate decreases, indicating clear prognostic differences between the groups. The overall difference among the five curves was significant (p < 0.017), indicating that the CMS20 signature genes can serve as important biomarkers for accurately predicting the survival of colorectal cancer patients.
Disease-free survival analysis was performed using the CMS molecular typing results, disease status and time of all samples obtained with the CMS14 marker set. The results are shown in FIG. 6, where the abscissa is the disease-free survival time in months and the ordinate is the disease-free survival rate. The red curve represents CMS1_Inflammatory, the blue curve represents CMS2_Enterocyte, the green curve represents CMS2_Transit.amplifying, the orange curve represents CMS3_Goblet.like, and the purple curve represents CMS4_Stem.like. The five curves lie close together while the survival rate is near 1 and diverge increasingly as the survival rate decreases, indicating clear prognostic differences between the groups. The overall difference among the five curves was significant (p=0.0015), indicating that the CMS14 signature genes can serve as important biomarkers for accurately predicting the survival of colorectal cancer patients.
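Survival curves of the kind shown in FIGS. 4 to 6 are commonly drawn with the Kaplan-Meier estimator; the text does not name the estimator used, so the following Python sketch is an assumption offered for illustration only. The function name and the input layout are hypothetical, and the significance test between curves is not included.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for one CMS group.

    times  : follow-up time of each sample, in months
    events : 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Samples sharing this time point, and how many of them had an event.
        n_at_t = sum(1 for tt, _ in data if tt == t)
        d_at_t = sum(1 for tt, e in data if tt == t and e == 1)
        if d_at_t > 0:
            survival *= 1.0 - d_at_t / at_risk
            curve.append((t, survival))
        # Events and censored samples both leave the risk set after time t.
        at_risk -= n_at_t
        i += n_at_t
    return curve
```

Running the function once per CMS type gives one step curve per type; censored samples lower the number at risk without lowering the survival estimate.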
Example 2: Molecular typing system for assessing the prognosis of intestinal cancer
(I) Sequencing library construction
(1) Sample RNA extraction
RNA was extracted and purified from FFPE samples using an RNA extraction kit according to the manufacturer's instructions. The extracted RNA was accurately quantified (a Qubit fluorometer is recommended) and stored at -70 ℃.
(2) Sequencing library construction
1) Reverse transcription of RNA samples into cDNA: complementary DNA (cDNA) is synthesized from the RNA sample by a reverse transcriptase reaction.
2) Adding a molecular tag: unique molecular barcodes are added to the cDNA of each sample. A molecular barcode is a short sequence tag that marks each original template molecule during amplification so that the expression level can later be calculated accurately.
3) Purification: the barcoded cDNA product is purified from the reaction system by magnetic bead or silica column purification to remove impurities such as proteins.
4) First-round PCR: a first round of polymerase chain reaction (PCR) is performed, in which gene-specific primers amplify the marker genes to obtain a sufficient amount of template.
5) PCR product purification: the first-round PCR product is purified again, by magnetic bead or silica column purification, to prevent impurities from affecting subsequent reactions.
6) Second-round adapter PCR: a second round of PCR is performed to add the sample multiplexing index and the universal adapter sequences of the sequencing platform, which are used to distinguish samples and to bind to the sequencing chip.
7) Sequencing library purification: a final purification by the magnetic bead method yields the final sequencing library.
8) Sequencing library quantification: the library is accurately quantified and the amount loaded onto the sequencer is controlled, so that each sample reaches the required sequencing depth. Common quantification methods include fluorescent quantitative PCR and chip electrophoresis.
(II) Quantitative data acquisition
The detection technologies for acquiring gene expression data of the intestinal cancer prognostic marker set include, but are not limited to, real-time fluorescent quantitative PCR (qPCR), gene chip technology, and high-throughput whole-transcriptome (or targeted gene transcriptome) sequencing. This example uses targeted gene sequencing for illustration.
The process includes the following steps:
1. Take any one of the CMS14, CMS20 and CMS40 marker sets as the sequencing target according to the screening results for intestinal cancer prognostic markers.
2. Extract total RNA from tumor samples using an RNA extraction kit, and assess the quality and concentration of the RNA.
3. Obtain a cDNA library of this set of target genes by reverse transcription and PCR amplification. Unique molecular barcodes are added during the amplification process for calculating expression levels.
4. Perform single-end or paired-end sequencing of each sample on a high-throughput sequencing platform such as Illumina.
5. Align the reads to a reference genome and obtain the expression matrix of each gene from the unique molecular tag counts.
6. Comprehensively analyze the expression matrices of multiple samples to obtain a combined prognostic model.
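Step 5 above, counting unique molecular tags per gene, can be sketched in Python as follows. The function name and the (sample, gene, tag) input layout are hypothetical, and read alignment is assumed to have been done already. The sketch also ignores sequencing errors within the tags, which a production pipeline would normally correct.

```python
from collections import defaultdict

def umi_expression(reads):
    """Collapse aligned reads into per-gene molecule counts.

    reads: iterable of (sample, gene, umi) tuples, one per aligned read
    (hypothetical input layout). Reads sharing the same sample, gene and
    molecular tag are PCR copies of one original template molecule.
    Returns {sample: {gene: number of distinct molecular tags}}.
    """
    seen = defaultdict(set)
    for sample, gene, umi in reads:
        seen[(sample, gene)].add(umi)
    matrix = defaultdict(dict)
    for (sample, gene), umis in seen.items():
        matrix[sample][gene] = len(umis)
    return dict(matrix)
```

Counting distinct tags rather than reads is what removes the amplification bias introduced by the two PCR rounds.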
(III) Data normalization: To eliminate technical bias caused by sample source (e.g., fresh tissue or FFPE), data from samples of different sources must be normalized to make them comparable. First, the base-2 logarithm (LOG2 value) of the raw sequencing molecule counts of each sample is calculated to make the value distribution more symmetric. Because zero values exist, a small constant of 0.25 is added so that zero values do not become NA or missing values after the logarithm is taken. The LOG2 values of each sample are then sorted, and the average of the LOG2 values of all samples at the same rank position is calculated. For each gene in each sample, the original LOG2 value is replaced with the average LOG2 value at its rank position. In this way, samples from different sources are processed by the same normalization method and the resulting values follow the same distribution. This eliminates the technical bias of sample source, so that differential analysis between groups based on the normalized data yields more reliable conclusions.
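The normalization just described is a log2 transform followed by quantile normalization, and can be sketched in Python with NumPy as follows. The function name is hypothetical. Adding the constant 0.25 to every count rather than only to zeros is an assumption about how the text should be read, and tied values within a sample are simply ranked in array order.

```python
import numpy as np

def log2_quantile_normalize(counts, pseudocount=0.25):
    """Log2-transform and quantile-normalize a genes x samples count matrix."""
    # Assumption: the small constant is added to all counts so that zeros stay finite.
    logged = np.log2(np.asarray(counts, dtype=float) + pseudocount)
    order = np.argsort(logged, axis=0)   # gene order within each sample
    ranked = np.sort(logged, axis=0)     # sorted LOG2 values per sample
    rank_means = ranked.mean(axis=1)     # average across samples at each rank
    normalized = np.empty_like(logged)
    for j in range(logged.shape[1]):
        # Each gene receives the cross-sample average of its rank position.
        normalized[order[:, j], j] = rank_means
    return normalized
```

After this step every sample column contains exactly the same set of values, only assigned to different genes, which is what "the resulting values follow the same distribution" means in practice.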
(IV) CMS molecular typing: Based on the normalized expression matrix, the CMS14, CMS20 or CMS40 marker set is used to calculate the cosine similarity of each sample to the 5 characteristic expression patterns of CMS typing (CMS1 inflammatory, CMS2 enterocyte, CMS2 transit-amplifying, CMS3 goblet-like, CMS4 stem-like). The CMS molecular subtype of each sample is determined by the principle of maximum cosine similarity.
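The maximum-cosine-similarity rule can be sketched in Python for the CMS14 marker set as follows. The gene-to-type assignment is taken from the CMS14 description; the 1/0 coding of each characteristic pattern, the assumption that the input values are centered so that over-expression is positive, and the function name are assumptions made for illustration, since the actual template values are not given in the text.

```python
import numpy as np

# Gene-to-type assignment of the CMS14 marker set, as listed in the description.
CMS14 = {
    "CMS1_Inflammatory": ["IFIT3", "CXCL13", "STAT1", "CXCL9"],
    "CMS2_Enterocyte": ["CA4", "AQP8", "SLC4A4"],
    "CMS2_Transit.amplifying": ["AREG", "EREG"],
    "CMS3_Goblet.like": ["SPINK4", "REG4"],
    "CMS4_Stem.like": ["SFRP2", "ZEB2", "SFRP4"],
}
GENES = [gene for genes in CMS14.values() for gene in genes]

def classify(expr):
    """Assign one sample to the CMS type with maximum cosine similarity.

    expr: dict mapping each CMS14 gene to its normalized, centered value.
    """
    x = np.array([expr[g] for g in GENES], dtype=float)
    scores = {}
    for cms_type, genes in CMS14.items():
        # Assumed pattern: 1 for the type's own marker genes, 0 for the others.
        t = np.array([1.0 if g in genes else 0.0 for g in GENES])
        scores[cms_type] = float(x @ t / (np.linalg.norm(x) * np.linalg.norm(t)))
    best = max(scores, key=scores.get)
    return best, scores
```

The same function would apply to CMS20 or CMS40 by swapping in the corresponding gene-to-type dictionary.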
Example 3: Study of the correlation between CMS40 and survival of colon cancer patients
(I) Sample source: FFPE samples from 229 patients with stage III colon cancer at a tertiary Grade A hospital in Shanghai.
(II) Clinical sample data: progression-free survival data were collected for the 229 samples.
(III) Obtaining gene expression data:
Sequencing library construction and quantitative data acquisition were performed as in the molecular typing system for assessing intestinal cancer prognosis of Example 2, yielding expression data for all 40 signature genes of the CMS40 marker set in the 229 clinical samples.
(IV) Data normalization: all data were first log2-transformed, and the expression data were then normalized by quantile normalization.
(V) CMS molecular typing: Based on the normalized expression matrix, the CMS40 marker set was used to calculate the cosine similarity of each sample to the 5 characteristic expression patterns of CMS typing (CMS1 inflammatory, CMS2 enterocyte, CMS2 transit-amplifying, CMS3 goblet-like, CMS4 stem-like). The CMS subtype of each sample was determined by the principle of maximum cosine similarity.
(VI) Survival analysis:
All samples were divided into three groups according to their CMS molecular type (5 types):
① Low-risk group: samples of the CMS1 inflammatory type (CMS1_Inflammatory) and the CMS2 transit-amplifying type (CMS2_Transit.amplifying);
② Medium-risk group: samples of the CMS2 enterocyte type (CMS2_Enterocyte) and the CMS3 goblet-like type (CMS3_Goblet.like);
③ High-risk group: samples of the CMS4 stem-like type (CMS4_Stem.like).
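The grouping above amounts to a fixed lookup from CMS type to risk group, sketched below in Python. The dictionary contents follow the three groups just listed; the function name and the input layout are hypothetical.

```python
# Risk group of each CMS type, following the three groups listed above.
RISK_GROUP = {
    "CMS1_Inflammatory": "low",
    "CMS2_Transit.amplifying": "low",
    "CMS2_Enterocyte": "medium",
    "CMS3_Goblet.like": "medium",
    "CMS4_Stem.like": "high",
}

def group_by_risk(sample_types):
    """Split samples into risk groups.

    sample_types: dict mapping a sample ID to its predicted CMS type.
    """
    groups = {"low": [], "medium": [], "high": []}
    for sample_id, cms_type in sample_types.items():
        groups[RISK_GROUP[cms_type]].append(sample_id)
    return groups
```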
The results of the patient progression-free survival analysis are shown in FIG. 7, where the abscissa is the disease-free survival time in months and the ordinate is the disease-free survival rate. The red curve represents the low-risk group, consisting of samples of the two CMS subtypes CMS1_Inflammatory and CMS2_Transit.amplifying; the blue curve represents the medium-risk group, consisting of samples of the two CMS subtypes CMS2_Enterocyte and CMS3_Goblet.like; the green curve represents the high-risk group, consisting of samples of the single CMS subtype CMS4_Stem.like. The three curves lie close together while the survival rate is near 1 and diverge increasingly as the survival rate decreases, indicating clear prognostic differences between the groups. The difference between the survival curves of the low-risk group (CMS1_Inflammatory and CMS2_Transit.amplifying) and the medium-risk group (CMS2_Enterocyte and CMS3_Goblet.like) was significant (p=0.0069), while the difference between the survival curves of the low-risk group and the high-risk group (CMS4_Stem.like) was highly significant (p < 0.0001), indicating that the CMS40 gene marker set can serve as an important biomarker set for accurately predicting the survival of colorectal cancer patients.
The foregoing description is only of preferred embodiments of the invention and is not intended to limit the invention in any way. It should be noted that those skilled in the art may make several modifications and additions without departing from the method of the invention, and such modifications and additions should also be considered within the scope of the invention. Equivalent embodiments of the present invention, obtained by those skilled in the art through changes, modifications and evolution based on the technical content disclosed above, do not depart from the spirit and scope of the invention; likewise, any equivalent changes, modifications and evolution of the above embodiments made according to the essential technology of the present invention still fall within the scope of the technical solution of the present invention.
Meanwhile, the present application uses specific words to describe embodiments of the present application. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the application. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the application may be combined as suitable.
In some embodiments, numbers describing quantities of components and attributes are used. It should be understood that such numbers used in the description of embodiments are, in some examples, modified by the modifier "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows for a variation of 20%. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by the individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and employ a general method for retaining digits. Although the numerical ranges and parameters used to define the breadth of the range are approximations in some embodiments, in particular embodiments such numerical values are set as precisely as practicable.

Claims (10)

1. A product or set of products for assessing the prognosis of intestinal cancer in a subject, comprising reagents for detecting each marker of a set of markers from a biological sample of the subject, the set of markers having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
2. A system for assessing a prognosis of bowel cancer in a subject comprising a memory and one or more processors;
wherein the memory comprises:
Expression data for each gene of a marker panel from a biological sample of a subject,
CMS typing feature genome expression template data, and
One or more processor-executable instructions;
Wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4; and
Wherein the one or more processor-executable instructions are configured to:
(a) Obtaining gene expression data for the marker set from a biological sample of a subject;
(b) Calculating cosine distances between the gene expression data in (a) and the CMS typing characteristic genome expression templates;
(c) Determining CMS typing based on the computed cosine distances of (b); and
(d) Determining a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
3. A computer-readable medium, comprising:
Expression level data for each gene of a marker panel of a biological sample from a subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2 and SFRP4,
CMS typing feature genome expression template data, and
Instructions for performing a method comprising:
(a) Obtaining gene expression level data of a marker set of a biological sample from a subject;
(b) Calculating cosine distances between the gene expression quantity data in the step (a) and the CMS typing characteristic genome expression templates;
(c) Determining CMS typing based on the computed cosine distances of (b); and
(d) Determining a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
4. An electronic device loaded with the computer-readable medium of claim 3.
5. Use of a reagent for detecting each marker of a marker panel in a biological sample from a subject in the manufacture of a product or product panel for assessing the prognosis of bowel cancer in the subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
6. The product or group of products of claim 1, the system of claim 2, the computer readable medium of claim 3, the electronic device of claim 4 or the use of claim 5, wherein the set of markers further has the following 6 gene markers: CA1, CLCA4, MS4A12, CLDN8, MUC2 and ZEB1.
7. The product or set of products, system, computer readable medium, electronic device or use of claim 6, wherein said set of markers further has the following 20 gene markers: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1.
8. The product or group of products of claim 1, the system of claim 2, the computer readable medium of claim 3, the electronic device of claim 4 or the use of claim 5, wherein the bowel cancer is colorectal cancer; specifically, the bowel cancer is stage II/III colorectal cancer.
9. The system of claim 2, comprising the following modules:
a sequencing library construction module: the module is used for constructing a sequencing library from sample RNA;
a quantitative sequencing module: the module is used for quantifying and sequencing the sequencing library;
a data normalization module: the module is used for carrying out data normalization on the quantification and sequencing results; and
a CMS molecular typing module: the module is used for molecular typing of the data normalization result.
10. The product or set of products of claim 1, or the use of claim 5, wherein the reagent is a reagent for detecting expression of each gene marker in the set of markers in the biological sample.
CN202410127289.7A 2024-01-30 2024-01-30 Marker group, product, system and application thereof for prognosis of intestinal cancer Pending CN117947166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410127289.7A CN117947166A (en) 2024-01-30 2024-01-30 Marker group, product, system and application thereof for prognosis of intestinal cancer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410127289.7A CN117947166A (en) 2024-01-30 2024-01-30 Marker group, product, system and application thereof for prognosis of intestinal cancer

Publications (1)

Publication Number Publication Date
CN117947166A true CN117947166A (en) 2024-04-30

Family

ID=90803393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410127289.7A Pending CN117947166A (en) 2024-01-30 2024-01-30 Marker group, product, system and application thereof for prognosis of intestinal cancer

Country Status (1)

Country Link
CN (1) CN117947166A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination