CN113921079A - MSI prediction model construction method based on immune related gene - Google Patents

MSI prediction model construction method based on immune related gene

Info

Publication number
CN113921079A
Authority
CN
China
Prior art keywords
immune
msi
genes
irmsis
prediction model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111481486.1A
Other languages
Chinese (zh)
Other versions
CN113921079B (en)
Inventor
路顺
邓思瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Cancer Hospital
Original Assignee
Sichuan Cancer Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Cancer Hospital
Priority to CN202111481486.1A
Publication of CN113921079A
Application granted
Publication of CN113921079B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/20 Screening of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Abstract

The invention relates to a method for constructing an MSI prediction model based on immune-related genes, comprising the following steps: collecting, from The Cancer Genome Atlas (TCGA) database, a training set and a validation set for constructing the immune-related MSI prediction model irMSIs; selecting immune-related genes from an immunological database and screening differential genes from them; constructing the immune-related MSI prediction model irMSIs from the screened differential genes with a LASSO logistic regression algorithm; and validating prognostic risk using the immune-related MSI prediction model irMSIs. The invention applies immune-related genes to the prediction of MSI status. By combining immune-related genes, it identifies a set of signature genes that stably predict MSI in digestive tract tumors, particularly colon cancer, and that also predict the prognostic risk of colon cancer well.

Description

MSI prediction model construction method based on immune related gene
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a method for constructing an MSI prediction model for colon cancer based on immune-related genes.
Background
In recent years, immunotherapy has become a treatment for colon cancer that cannot be ignored; it aims to recognize, control and eliminate cancer by activating the patient's own immune system. Immune checkpoint inhibitors (ICIs), such as monoclonal antibodies against cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) and against programmed death protein 1 and its ligand (PD-1/PD-L1), have brought new hope for the treatment of various tumors, including advanced melanoma, non-small cell lung cancer and bladder cancer. Colon cancer patients can also benefit from immunotherapy: in the United States, the FDA has approved the immunotherapeutic monoclonal antibodies pembrolizumab and nivolumab (anti-PD-1) and ipilimumab (anti-CTLA-4) as effective drugs for treating colon cancer patients.
Tumor immunotherapy is one of the first-line treatment schemes, and biomarker selection is particularly important. Microsatellite instability (MSI), one of the most closely watched biomarkers, refers to changes in the length of microsatellite sequences caused by insertion or deletion mutations during DNA replication. It usually results from defective mismatch repair and is closely associated with the formation of malignant tumors.
The colon cancer guidelines issued by the NCCN in the United States recommend MSI testing for all patients with a history of colon cancer in order to guide clinical medication. Research has shown that patients with advanced colon cancer and high microsatellite instability (MSI-H) are markedly more sensitive to ICIs than patients who are microsatellite stable (MSS) or have low microsatellite instability (MSI-L). In these patients, targeted inhibition of PD-1/PD-L1 promotes the immune system's attack on and killing of tumor cells; MSI itself, however, does not directly treat or diagnose tumors. Furthermore, MSI is closely related to the prognosis of colon cancer, that is, the prediction of the final outcome of the disease. Although MSI-H colon cancer patients present with poorer clinical features than MSS/MSI-L patients, they have a significant survival advantage, with markedly longer overall survival and disease-free survival.
Immune-related genes therefore play a crucial role in the occurrence and development of colon cancer. The traditional methods for detecting MSI are mainly immunohistochemistry (IHC) and polymerase chain reaction (PCR). However, because IHC and PCR testing must be performed in large medical institutions, is costly and is complex to operate, it is difficult to extend to every patient in clinical practice. As a result, timely ICI treatment cannot be offered to many patients who are potentially sensitive to immunotherapy, and the opportunity for clinical benefit is lost.
Disclosure of Invention
The invention aims to overcome the shortcomings of traditional MSI detection methods by providing a method for constructing an MSI prediction model based on immune-related genes. The method requires no additional laboratory IHC or PCR testing, and obtains differentially expressed immune-related genes from The Cancer Genome Atlas (TCGA) and the immunological database ImmPort.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
The method for constructing an MSI prediction model based on immune-related genes comprises the following steps:
step S1: collecting, from The Cancer Genome Atlas (TCGA) database, a training set and a validation set for constructing the immune-related MSI prediction model irMSIs;
step S2: selecting immune-related genes from an immunological database, and screening differential genes from the immune-related genes;
step S3: constructing the immune-related MSI prediction model irMSIs from the screened differential genes with a LASSO logistic regression algorithm;
step S4: validating prognostic risk using the immune-related MSI prediction model irMSIs.
The step of collecting, from the TCGA database, a training set and a validation set for constructing the immune-related MSI prediction model irMSIs comprises:
downloading four cancer cohorts from the TCGA database, the four cohorts comprising the mRNA expression profiles and clinical information of colon cancer (COAD), rectal cancer (READ), gastric cancer (STAD) and esophageal cancer (ESCA);
the colon cancer COAD cohort is used as the training set for screening the differential genes and for building the immune-related MSI prediction model irMSIs, and the other cohorts are used as the validation set for the immune-related MSI prediction model irMSIs.
The step of selecting immune-related genes from an immunological database and screening differential genes therefrom, comprising:
downloading N immune-related genes from an immunological database and selecting M matched genes for analysis, wherein N > M; using the edgeR software package to screen for differential genes in the colon cancer COAD cohort between the microsatellite instability-high (MSI-H) group and the microsatellite stable (MSS) group, or between the MSI-H group and the microsatellite instability-low (MSI-L) group, with the screening criteria:
false discovery rate FDR < 0.05
|log2(Fold Change)| ≥ 1
wherein FDR is the false discovery rate, i.e. the P value adjusted for multiple testing; Fold Change is the fold difference in the sequencing read counts of a given gene between the two groups;
thereby identifying m differential genes, wherein m < M; the m differential genes include a up-regulated genes and b down-regulated genes, and m = a + b.
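The two screening criteria can be illustrated with a minimal Python sketch. It assumes the differential-expression statistics (log2 fold change and FDR per gene) have already been produced by a tool such as edgeR; the function name `screen_differential_genes`, the gene names and the numeric values are hypothetical and are not taken from the patent.

```python
def screen_differential_genes(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Apply FDR < fdr_cutoff and |log2FC| >= lfc_cutoff.

    results maps a gene name to a (log2_fold_change, fdr) tuple.
    Returns (up_regulated, down_regulated) as sorted lists of gene names.
    """
    up, down = [], []
    for gene, (log2_fc, fdr) in results.items():
        if fdr < fdr_cutoff and abs(log2_fc) >= lfc_cutoff:
            # The sign of the log2 fold change separates up from down.
            (up if log2_fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Hypothetical statistics, for illustration only.
toy_results = {
    "GENE_A": (1.8, 0.001),    # passes, up-regulated
    "GENE_B": (-1.2, 0.010),   # passes, down-regulated
    "GENE_C": (0.6, 0.0001),   # fold change too small
    "GENE_D": (-2.5, 0.200),   # FDR too high
    "GENE_E": (1.0, 0.049),    # on the fold-change boundary, kept
}
up_genes, down_genes = screen_differential_genes(toy_results)
m = len(up_genes) + len(down_genes)  # m = a + b in the notation above
```

Here `len(up_genes)` and `len(down_genes)` play the roles of a and b.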
The step of constructing an immune-related MSI prediction model irMSIs by a LASSO logistic regression algorithm according to the screened differential genes comprises the following steps:
randomly dividing the colon cancer COAD cohort into a training set and a test set at a ratio of 7:3, identifying c robust genes with a random forest recursive feature elimination algorithm, selecting the top 5 most robust genes as the input of the least absolute shrinkage and selection operator (LASSO), and calculating the score with the LASSO logistic regression algorithm, wherein c ≥ 5;
validating the immune-related MSI prediction model irMSIs in the test set of the colon cancer COAD cohort and in the rectal cancer READ, gastric cancer STAD and esophageal cancer ESCA cohorts; the predictive performance of the immune-related MSI prediction model irMSIs is evaluated by the area under the ROC curve (AUC).
In the above scheme, the top 5 most robust genes selected are TGFBR2, GNLY, ULBP2, SEMA5A and R3HDML, with LASSO coefficients of -0.077, 0.084, 0.070, -0.064 and -0.055, respectively, so that the score of the LASSO logistic regression algorithm is calculated as:
irMSIs = 0.683 - 0.077 × TGFBR2 + 0.084 × GNLY + 0.070 × ULBP2 - 0.064 × SEMA5A - 0.055 × R3HDML, where each gene symbol denotes the expression level of that gene.
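The score is a linear combination of five expression levels, which a short Python sketch makes concrete. The intercept and coefficients are those stated above; the function name `irmsis_score` and the input format (a mapping from gene symbol to expression level) are assumptions made for illustration, and the expression scale expected by the model is not restated in this passage.

```python
IRMSIS_INTERCEPT = 0.683
IRMSIS_COEFFICIENTS = {
    "TGFBR2": -0.077,
    "GNLY": 0.084,
    "ULBP2": 0.070,
    "SEMA5A": -0.064,
    "R3HDML": -0.055,
}


def irmsis_score(expression):
    """Return the irMSIs score for one sample.

    expression maps a gene symbol to its expression level; the five
    model genes must all be present.
    """
    missing = [g for g in IRMSIS_COEFFICIENTS if g not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return IRMSIS_INTERCEPT + sum(
        coef * expression[gene] for gene, coef in IRMSIS_COEFFICIENTS.items()
    )


# With every expression level at zero the score equals the intercept.
baseline = irmsis_score({gene: 0.0 for gene in IRMSIS_COEFFICIENTS})
```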
The step of validating prognostic risk using an immune-related MSI prediction model, irMSIs, comprises:
in the colon cancer COAD cohort, dividing patients into an irMSIs high group and an irMSIs low group at the cutoff value that maximizes the Youden index of the ROC curve of the immune-related MSI prediction model irMSIs;
dividing patients into an MSS/MSI-L high group and an MSS/MSI-L low group according to the median corresponding to the highest Youden index of the ROC curve of the immune-related MSI prediction model irMSIs;
based on the cutoff value of the highest Youden index of the ROC curve and the median corresponding to the highest Youden index of the ROC curve, dividing patients into an irMSIs high group, an irMSIs medium group and an irMSIs low group, and comparing the differences in prognosis among the three groups of patients.
Compared with the prior art, the invention has the beneficial effects that:
The invention applies immune-related genes to the prediction of MSI status. By combining immune-related genes, it identifies a set of signature genes that stably predict MSI in digestive tract tumors, particularly colon cancer, and that also predict the prognostic risk of colon cancer well.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a volcano plot of differential genes selected according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the establishment and evaluation of the prediction model irMSIs in accordance with an embodiment of the present invention; FIG. 3(A) is a parameter diagram of the prediction model irMSIs established using the LASSO logistic regression algorithm; FIG. 3(B) is a coefficient diagram of the prediction model irMSIs established using the LASSO logistic regression algorithm; FIG. 3(C) is a schematic diagram of the evaluation of the prediction model irMSIs in the colon cancer COAD cohort by ROC curves of the training and validation sets; FIG. 3(D) is a schematic diagram of the evaluation of the prediction model irMSIs by ROC curves in the rectal cancer READ, gastric cancer STAD and esophageal cancer ESCA cohorts.
FIG. 4 is a schematic diagram of the survival analysis of OS and DSS between groups according to the embodiment of the present invention; wherein FIG. 4(A) is a schematic representation of OS and DSS survival for MSS/MSI-L in a colon cancer COAD cohort; FIG. 4(B) is a schematic representation of the OS and DSS survival for the MSI-H group; FIG. 4(C) is a graph showing the survival of OS between irMSIs high and low in the colon cancer COAD cohort; FIG. 4(D) is a graph showing the survival of DSS between irMSIs high and low in the colon cancer COAD cohort; FIG. 4(E) is a schematic representation of OS survival between the high group in MSS/MSI-L and the low group in MSS/MSI-L in the colon cancer COAD cohort; FIG. 4(F) is a graph showing the survival of DSS between the high group in MSS/MSI-L and the low group in MSS/MSI-L in the colon cancer COAD cohort; FIG. 4(G) is a graph showing the survival of OS between irMSIs high, irMSIs medium and irMSIs low in the colon cancer COAD cohort; FIG. 4(H) is a graph showing the survival of DSS between irMSIs high, irMSIs medium and irMSIs low in the colon cancer COAD cohort.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Also, in the description of the present invention, the terms "first", "second", and the like are used for distinguishing between descriptions and not necessarily for describing a relative importance or implying any actual relationship or order between such entities or operations.
Example:
The invention is realized by the following technical scheme. Referring to FIG. 1, the method for constructing an MSI prediction model based on immune-related genes comprises the following steps:
Step S1: a training set and a validation set for constructing the immune-related MSI prediction model irMSIs are collected from The Cancer Genome Atlas database.
Four cancer cohorts were downloaded from The Cancer Genome Atlas database (hereinafter TCGA), comprising the mRNA expression profiles and clinical information of colon cancer COAD (n = 551), rectal cancer READ (n = 177), gastric cancer STAD (n = 407) and esophageal cancer ESCA (n = 173). The colon cancer COAD cohort is used as the training set for screening the differential genes and for building the immune-related MSI prediction model irMSIs, and the other cohorts are used as the validation set for the immune-related MSI prediction model irMSIs.
The fragments per kilobase of transcript per million mapped reads (FPKM) values in each cohort were converted to transcripts per million (TPM), and the expression data were normalized as log2(TPM + 1). After excluding repeated, recurrent and normal tissue samples, as well as tissue samples lacking MSI status, a total of 1028 samples were included.
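The conversion described above can be sketched in a few lines of Python. The sketch uses the standard per-sample relation TPM_i = FPKM_i / sum(FPKM) × 10^6 followed by log2(TPM + 1); the function name `fpkm_to_log2_tpm` and the toy values are hypothetical, and the patent does not state which software was used for this step.

```python
import math


def fpkm_to_log2_tpm(fpkm_by_gene):
    """Convert one sample's FPKM values to log2(TPM + 1).

    fpkm_by_gene maps a gene name to its FPKM value in a single sample.
    """
    total = sum(fpkm_by_gene.values())
    if total <= 0:
        raise ValueError("sample has no expressed genes")
    # Rescale so the values of the sample sum to one million, then
    # add a pseudocount of 1 before taking log2.
    return {
        gene: math.log2(value / total * 1e6 + 1.0)
        for gene, value in fpkm_by_gene.items()
    }


# Hypothetical three-gene sample, for illustration only.
log_tpm = fpkm_to_log2_tpm({"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 0.0})
```

A gene with zero FPKM maps to exactly 0 after the transformation, which is the reason for the pseudocount.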
Step S2: immune-related genes are selected from an immunological database, and differential genes are screened from the immune-related genes.
N immune-related genes are downloaded from the immunological database ImmPort (hereinafter ImmPort), and M matched genes are selected for analysis, wherein N > M; the edgeR software package is used to screen for differential genes in the colon cancer COAD cohort between the microsatellite instability-high (MSI-H) group and the microsatellite stable (MSS) or microsatellite instability-low (MSI-L) group.
In this example, 2428 immune-related genes were downloaded and 1229 matched genes were selected for further analysis. The R software package edgeR was used to screen, in the colon cancer COAD cohort of step S1, for differential genes between the MSI-H group and the MSS group, or between the MSI-H group and the MSI-L group.
It should be noted that, in the following, the microsatellite instability-high group is referred to as MSI-H, the microsatellite stable group as MSS, and the microsatellite instability-low group as MSI-L; MSS/MSI-L denotes the microsatellite stable group or the microsatellite instability-low group.
The screening proceeds as follows: counts per million (CPM) are calculated from the raw sequencing read counts, the data are normalized with the TMM method, and a size factor is calculated for each sample; differentially expressed genes between the MSI-H and MSS/MSI-L groups are then compared using a likelihood ratio test, with the screening criteria FDR < 0.05 and |log2(Fold Change)| ≥ 1. Here FDR is the false discovery rate, i.e. the P value adjusted for multiple testing (by the Benjamini-Hochberg method); Fold Change is the fold difference in the sequencing read counts of a given gene between the two groups. In this way, 233 differential genes were identified, comprising 112 up-regulated genes and 121 down-regulated genes; see the volcano plot in FIG. 2.
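Two of the quantities named above, CPM and the Benjamini-Hochberg adjusted P value, are simple enough to sketch directly in Python. This is an illustration of the definitions only: it does not reproduce edgeR's TMM normalization or its likelihood ratio test, and the function names and toy inputs are hypothetical.

```python
def counts_per_million(counts):
    """Scale one sample's raw read counts to counts per million."""
    library_size = sum(counts)
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return [c / library_size * 1e6 for c in counts]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs, for illustration only.
cpm = counts_per_million([10, 30, 60])
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes whose adjusted value falls below 0.05 would then be checked against the fold-change criterion.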
Step S3: and constructing an immune-related MSI prediction model irMSIs by a LASSO logistic regression algorithm according to the screened differential genes.
The colon cancer COAD cohort was randomly divided into a training set and a test set at a ratio of 7:3. Low-variance sparse variables and highly correlated variables were removed from the 233 identified differential genes using the 'caret' package, with the cutoff coefficient set to 0.8 in each case, and 65 robust genes were then identified with a random forest recursive feature elimination algorithm using the 'randomForest' package. The top 5 most robust genes, shown in Table 1, were selected as the input of the least absolute shrinkage and selection operator (LASSO), see FIG. 3(A) and FIG. 3(B), and the score of the LASSO logistic regression algorithm was calculated as:
irMSIs = 0.683 - 0.077 × TGFBR2 + 0.084 × GNLY + 0.070 × ULBP2 - 0.064 × SEMA5A - 0.055 × R3HDML, where each gene symbol denotes the expression level of that gene.
TABLE 1
[Table 1 is provided as an image in the original publication and is not reproduced in this text.]
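The 7:3 random split and the removal of highly correlated variables in step S3 can be sketched as follows. This is a simplified stand-in rather than the procedure of the 'caret' package: it assumes the 0.8 value is a cutoff on the absolute Pearson correlation and drops a feature greedily when it correlates with one already kept. All function names and toy data are hypothetical.

```python
import math
import random


def split_7_3(sample_ids, seed=0):
    """Shuffle sample IDs and split them 7:3 into train and test lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 7 // 10
    return ids[:n_train], ids[n_train:]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def drop_highly_correlated(features, cutoff=0.8):
    """Keep a feature only if |r| <= cutoff against every kept feature."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical data, for illustration only.
train_ids, test_ids = split_7_3([f"S{i}" for i in range(10)], seed=1)
kept_genes = drop_highly_correlated({
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],   # nearly collinear with g1, dropped
    "g3": [4.0, 1.0, 3.0, 2.0],   # r = -0.4 against g1, kept
})
```

The surviving features would then be passed to the feature-selection and LASSO steps, which are not implemented here.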
The immune-related MSI prediction model irMSIs was validated in the test set of the colon cancer COAD cohort and in the rectal cancer READ, gastric cancer STAD and esophageal cancer ESCA cohorts, and its predictive performance was evaluated by the area under the ROC curve (AUC). The AUC was 0.974 (95% CI: 0.954-0.994) in the training set and 0.999 (95% CI: 0.985-1.000) in the validation set, indicating that the immune-related MSI prediction model irMSIs has strong predictive performance.
In addition, referring to FIG. 3(C) and FIG. 3(D), the immune-related MSI prediction model irMSIs was also used for prediction in the rectal cancer READ, gastric cancer STAD and esophageal cancer ESCA cohorts, with AUC values of 0.845 (95% CI: 0.800-0.899), 0.855 (95% CI: 0.608-1.000) and 0.824 (95% CI: 0.582-1.000), respectively.
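As a reminder of what the AUC measures, the following Python sketch computes it through its rank interpretation: the probability that a randomly chosen positive sample (here, MSI-H) receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores are hypothetical; confidence intervals such as those reported above are not computed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels are 1 for the positive class and 0 for the negative class;
    scores are the corresponding model outputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores, for illustration only.
auc = roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.3, 0.1])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of the size described here.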
Step S4: prognostic risk was validated using an immune-related MSI prediction model irMSIs.
In the colon cancer COAD cohort, when patients were divided into an irMSIs high group and an irMSIs low group at the cutoff value that maximizes the Youden index of the ROC curve of the immune-related MSI prediction model irMSIs (0.325), the survival difference between the two groups was not statistically significant, which is consistent with the result for actual MSI status; see FIG. 4(A) to FIG. 4(D).
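The cutoff selection can be illustrated with a short Python sketch. The Youden index is J = sensitivity + specificity - 1, and the cutoff is the score threshold at which J is largest. The sketch assumes a sample is called positive when its score is greater than or equal to the threshold; the function name and toy data are hypothetical and do not reproduce the 0.325 value reported above.

```python
def youden_cutoff(labels, scores):
    """Return (threshold, J) maximizing the Youden index.

    A sample is predicted positive when score >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0  # sensitivity + specificity - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j


# Hypothetical labels and scores, for illustration only.
cutoff, j_stat = youden_cutoff([0, 0, 0, 1, 1], [0.1, 0.2, 0.4, 0.35, 0.8])
```

When several thresholds tie for the maximum, this sketch keeps the smallest one.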
When patients were instead divided into an MSS/MSI-L high group and an MSS/MSI-L low group according to the median corresponding to the highest Youden index of the ROC curve of the immune-related MSI prediction model irMSIs, overall survival (OS) and disease-specific survival (DSS) differed significantly within 5 years. Survival in the MSS/MSI-L low group was significantly higher than in the MSS/MSI-L high group (OS: P = 0.0063; DSS: P = 0.0026; P denotes the significance of the between-group difference in the survival analysis); see FIG. 4(E) and FIG. 4(F).
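The text reports P values from a survival analysis but does not name the estimator or the test used. As an assumption, the survival curves of each group could be estimated with the Kaplan-Meier product-limit method, sketched below in Python; the function name and the toy follow-up data are hypothetical, and no significance test is implemented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times are follow-up durations; events are 1 if the event (for
    example death) was observed and 0 if the sample was censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather all samples sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored samples leave the risk set
    return curve


# Hypothetical follow-up data, for illustration only.
km_curve = kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 0, 1])
```

Curves computed per group in this way could then be compared with a test such as the log-rank test.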
Patients were therefore classified into an irMSIs high group, an irMSIs medium group and an irMSIs low group based on the cutoff value of the highest Youden index of the ROC curve and the median corresponding to the highest Youden index of the ROC curve. The comparison of prognosis among the three groups showed that patients in the irMSIs low group had the best prognosis (OS: P = 0.0130; DSS: P = 0.0055); see FIG. 4(G) and FIG. 4(H).
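One possible reading of the three-group stratification is sketched below in Python. It assumes that patients at or above the Youden cutoff (0.325 in this example) form the high group, and that the remaining patients are split at the median of their own scores into medium and low groups. This interpretation, the function name and the toy scores are assumptions made for illustration; the text does not spell out the rule in this form.

```python
import statistics


def stratify_irmsis(scores, cutoff=0.325):
    """Assign each patient to 'high', 'medium' or 'low'.

    scores maps a patient ID to an irMSIs score. Patients below the
    cutoff are split at the median of the below-cutoff scores.
    """
    below = [s for s in scores.values() if s < cutoff]
    median_below = statistics.median(below) if below else None
    groups = {}
    for patient, score in scores.items():
        if score >= cutoff:
            groups[patient] = "high"
        elif score >= median_below:
            groups[patient] = "medium"
        else:
            groups[patient] = "low"
    return groups


# Hypothetical scores, for illustration only.
groups = stratify_irmsis({
    "P1": 0.90, "P2": 0.40,   # at or above the cutoff
    "P3": 0.30, "P4": 0.20,   # below the cutoff, at or above the median 0.15
    "P5": 0.10, "P6": 0.05,   # below the median
})
```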
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A method for constructing an MSI prediction model based on immune-related genes, characterized in that the method comprises the following steps:
step S1: collecting, from The Cancer Genome Atlas (TCGA) database, a training set and a validation set for constructing the immune-related MSI prediction model irMSIs;
step S2: selecting immune-related genes from an immunological database, and screening differential genes from the immune-related genes;
step S3: constructing the immune-related MSI prediction model irMSIs from the screened differential genes with a LASSO logistic regression algorithm;
step S4: validating prognostic risk using the immune-related MSI prediction model irMSIs.
2. The method for constructing an MSI prediction model based on immune-related genes according to claim 1, characterized in that the step of collecting, from the TCGA database, a training set and a validation set for constructing the immune-related MSI prediction model irMSIs comprises:
downloading four cancer cohorts from the TCGA database, the four cohorts comprising the mRNA expression profiles and clinical information of colon cancer (COAD), rectal cancer (READ), gastric cancer (STAD) and esophageal cancer (ESCA);
the colon cancer COAD cohort is used as the training set for screening the differential genes and for building the immune-related MSI prediction model irMSIs, and the other cohorts are used as the validation set for the immune-related MSI prediction model irMSIs.
3. The method for constructing an MSI prediction model based on immune-related genes according to claim 2, characterized in that the step of selecting immune-related genes from an immunological database and screening differential genes from them comprises:
downloading N immune-related genes from an immunological database and selecting M matched genes for analysis, wherein N > M; using the edgeR software package to screen for differential genes in the colon cancer COAD cohort between the microsatellite instability-high (MSI-H) group and the microsatellite stable (MSS) group, or between the MSI-H group and the microsatellite instability-low (MSI-L) group, with the screening criteria:
false discovery rate FDR < 0.05
|log2(Fold Change)| ≥ 1
wherein FDR is the false discovery rate, i.e. the P value adjusted for multiple testing; Fold Change is the fold difference in the sequencing read counts of a given gene between the two groups;
thereby identifying m differential genes, wherein m < M; the m differential genes include a up-regulated genes and b down-regulated genes, and m = a + b.
4. The method for constructing an MSI prediction model based on immune-related genes according to claim 3, characterized in that the step of constructing the immune-related MSI prediction model irMSIs from the screened differential genes with a LASSO logistic regression algorithm comprises:
randomly dividing the colon cancer COAD cohort into a training set and a test set at a ratio of 7:3, identifying c robust genes with a random forest recursive feature elimination algorithm, wherein c ≥ 5, selecting the top 5 most robust genes as the input of the least absolute shrinkage and selection operator (LASSO), and calculating the score with the LASSO logistic regression algorithm;
validating the immune-related MSI prediction model irMSIs in the test set of the colon cancer COAD cohort and in the rectal cancer READ, gastric cancer STAD and esophageal cancer ESCA cohorts; the predictive performance of the immune-related MSI prediction model irMSIs is evaluated by the area under the ROC curve (AUC).
5. The method for constructing an MSI prediction model based on immune-related genes according to claim 4, characterized in that the step of validating prognostic risk using the immune-related MSI prediction model irMSIs comprises:
in the colon cancer COAD cohort, dividing patients into an irMSIs high group and an irMSIs low group at the cutoff value that maximizes the Youden index of the ROC curve of the immune-related MSI prediction model irMSIs;
dividing patients into an MSS/MSI-L high group and an MSS/MSI-L low group according to the median corresponding to the highest Youden index of the ROC curve of the immune-related MSI prediction model irMSIs;
based on the cutoff value of the highest Youden index of the ROC curve and the median corresponding to the highest Youden index of the ROC curve, dividing patients into an irMSIs high group, an irMSIs medium group and an irMSIs low group, and comparing the differences in prognosis among the three groups of patients.
CN202111481486.1A 2021-12-06 2021-12-06 MSI prediction model construction method based on immune related gene Active CN113921079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111481486.1A CN113921079B (en) 2021-12-06 2021-12-06 MSI prediction model construction method based on immune related gene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111481486.1A CN113921079B (en) 2021-12-06 2021-12-06 MSI prediction model construction method based on immune related gene

Publications (2)

Publication Number Publication Date
CN113921079A CN113921079A (en) 2022-01-11
CN113921079B CN113921079B (en) 2022-03-18

Family

ID=79248730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111481486.1A Active CN113921079B (en) 2021-12-06 2021-12-06 MSI prediction model construction method based on immune related gene

Country Status (1)

Country Link
CN (1) CN113921079B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324846A (en) * 2013-06-13 2013-09-25 浙江加州国际纳米技术研究院绍兴分院 Screening method of colorectal cancer treatment prognosis biomarkers
US20190226030A1 (en) * 2018-01-22 2019-07-25 Liquid Biopsy Research LLC Methods for colon cancer detection and treatment
CN110791565A (en) * 2019-09-29 2020-02-14 浙江大学 Prognostic marker gene for colorectal cancer recurrence prediction in stage II and random survival forest model
CN111028223A (en) * 2019-12-11 2020-04-17 大连医科大学附属第一医院 Microsatellite unstable intestinal cancer energy spectrum CT iodine water map image omics feature processing method
CN111304303A (en) * 2020-02-18 2020-06-19 福建和瑞基因科技有限公司 Method for predicting instability of microsatellite and application thereof
US20200273576A1 (en) * 2019-02-26 2020-08-27 Tempus Systems and methods for using sequencing data for pathogen detection
CN112183557A (en) * 2020-09-29 2021-01-05 山西医科大学 MSI prediction model construction method based on gastric cancer histopathology image texture features
CN112687342A (en) * 2020-11-16 2021-04-20 徐同鹏 Application of a group of immune-related molecular markers identified based on TCGA (TCGA) database in esophageal cancer prognosis prediction
CN113421609A (en) * 2021-08-08 2021-09-21 上海市嘉定区中心医院 Colorectal cancer prognosis prediction model based on lncRNA pair and construction method thereof

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324846A (en) * 2013-06-13 2013-09-25 浙江加州国际纳米技术研究院绍兴分院 Screening method of colorectal cancer treatment prognosis biomarkers
US20190226030A1 (en) * 2018-01-22 2019-07-25 Liquid Biopsy Research LLC Methods for colon cancer detection and treatment
US20200273576A1 (en) * 2019-02-26 2020-08-27 Tempus Systems and methods for using sequencing data for pathogen detection
CN110791565A (en) * 2019-09-29 2020-02-14 浙江大学 Prognostic marker gene for colorectal cancer recurrence prediction in stage II and random survival forest model
CN111028223A (en) * 2019-12-11 2020-04-17 大连医科大学附属第一医院 Microsatellite unstable intestinal cancer energy spectrum CT iodine water map image omics feature processing method
CN111304303A (en) * 2020-02-18 2020-06-19 福建和瑞基因科技有限公司 Method for predicting instability of microsatellite and application thereof
CN112183557A (en) * 2020-09-29 2021-01-05 山西医科大学 MSI prediction model construction method based on gastric cancer histopathology image texture features
CN112687342A (en) * 2020-11-16 2021-04-20 徐同鹏 Application of a group of immune-related molecular markers identified based on TCGA (TCGA) database in esophageal cancer prognosis prediction
CN113421609A (en) * 2021-08-08 2021-09-21 上海市嘉定区中心医院 Colorectal cancer prognosis prediction model based on lncRNA pair and construction method thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TAO CHEN et al.: "A gastric cancer LncRNAs model for MSI and survival prediction based on support vector machine", 《BMC GENOMICS》 *
TOSHIAKI WATANABE et al.: "Distal Colorectal Cancers with Microsatellite Instability (MSI) Display Distinct Gene Expression Profiles that Are Different from Proximal MSI Cancers", 《CANCER RES》 *
徐莹 et al.: "基于差异表达基因组合构建高度微卫星不稳定结直肠癌转移预测模型", 《上海交通大学学报(医学版)》 *
梁小龙: "基于TCGA数据库筛选结肠癌枢纽基因及预后风险模型的建立", 《中国优秀博硕士学位论文全文数据库(硕士) 医药卫生科技辑》 *

Also Published As

Publication number Publication date
CN113921079B (en) 2022-03-18

Similar Documents

Publication Publication Date Title
US20210381062A1 (en) Nasal epithelium gene expression signature and classifier for the prediction of lung cancer
Robertson et al. Comprehensive molecular characterization of muscle-invasive bladder cancer
Chan et al. Assessment of myometrial transcriptome changes associated with spontaneous human labour by high‐throughput RNA‐seq
Mitra et al. Prediction of postoperative recurrence-free survival in non–small cell lung cancer by using an internationally validated gene expression model
ES2656487T3 (en) Evaluation of the response to therapy of gastroenteropancreatic neuroendocrine neoplasms (GEP-NEN)
JP2014509189A (en) Colon cancer gene expression signature and methods of use
Kwon et al. Prognosis of stage III colorectal carcinomas with FOLFOX adjuvant chemotherapy can be predicted by molecular subtype
WO2019204576A1 (en) Methods and kits for diagnosis and triage of patients with colorectal liver metastases
AU2015317893B2 (en) Compositions, methods and kits for diagnosis of a gastroenteropancreatic neuroendocrine neoplasm
EP3149209B1 (en) Methods for typing of lung cancer
CN112779338B (en) Gene marker for esophageal cancer prognosis evaluation
WO2020237184A1 (en) Systems and methods for determining whether a subject has a cancer condition using transfer learning
WO2019157345A1 (en) Compositions and methods for characterizing bladder cancer
US20210358626A1 (en) Systems and methods for cancer condition determination using autoencoders
CN113066585A (en) Method for efficiently and quickly evaluating prognosis of stage II colorectal cancer patient based on immune gene expression profile
CN115410713A (en) Hepatocellular carcinoma prognosis risk prediction model construction based on immune-related gene
JP2019514344A (en) Epigenetic profiling of cancer
JP6181638B2 (en) Genomic signature of metastasis in prostate cancer
WO2014066984A1 (en) Method for identifying a target molecular profile associated with a target cell population
Rahmatallah et al. Platform-independent gene expression signature differentiates sessile serrated adenomas/polyps and hyperplastic polyps of the colon
CN113921079B (en) MSI prediction model construction method based on immune related gene
Bao et al. Comprehensive analysis of the function, immune profiles, and clinical implication of m1A regulators in lung adenocarcinoma
CN114507732B (en) Composition for evaluating cell aging characteristics in tissues and application thereof
CN109609649B (en) lncRNA for diagnosing and treating rectal adenocarcinoma
US20230348990A1 (en) Prognostic and treatment response predictive method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant