CN112133365B - Gene set for evaluating tumor microenvironment, scoring model and application of gene set

Info

Publication number: CN112133365B
Application number: CN202010918662.2A
Authority: CN
Inventors: 廖旺军; 曾东强
Original assignee: Southern Hospital Southern Medical University
Current assignee: Guangdong Longyu Medical Technology Co.,Ltd.
Priority date: 2020-09-03
Filing date: 2020-09-03
Publication date: 2022-05-10
Anticipated expiration: 2040-09-03
Also published as: CN112133365A

Abstract

The invention discloses a gene set and a scoring model for evaluating a tumor microenvironment, and applications of the gene set and the scoring model, and belongs to the field of bioinformatics. The invention demonstrates for the first time that gastric cancer has distinct tumor microenvironment subtypes and that the prognosis of gastric cancer patients differs markedly between subtypes. The invention provides a method for constructing a multi-gene scoring model for evaluating the gastric cancer tumor microenvironment, in which a set of 44 genes capable of representing the different microenvironment subtypes is obtained by screening. The predictive performance of the tumor microenvironment multi-gene scoring model built on this gene set is significantly better than that of other existing models; the model is a prognostic marker for gastric cancer patients and also has a marked predictive effect across cancer types (pan-cancer). By effectively evaluating the tumor microenvironment, the model of the invention can serve as a good prognostic biomarker and can also be used to predict the response of various tumors to immune checkpoint inhibitor therapy, which is of significance and clinical value for better selecting the population that benefits from immunotherapy.

Description

Gene set for evaluating tumor microenvironment, scoring model and application of gene set

Technical Field

The invention belongs to the field of bioinformatics, and relates to a gene set and a scoring model for evaluating a tumor microenvironment and to applications of the gene set and the scoring model.

Background

The morbidity and mortality of gastric cancer in China are very high: about 400,000 new cases occur in China each year, accounting for 42% of all cases worldwide; Chinese gastric cancer patients are mostly diagnosed at an advanced stage, and the mortality rate ranks third among all tumors. Gastric cancer has numerous prognostic factors, and the prognosis of patients with the same clinical stage and histological grade who receive the same treatment regimen may still differ. In recent years, immunotherapy represented by immune checkpoint inhibitors has ended the dominance of traditional chemotherapy and targeted therapy in gastric cancer treatment and has clearly improved survival for some gastric cancer patients. However, the biggest challenge of this therapy is that efficacy varies greatly between patients: multiple studies show that only 11-25% of gastric cancer patients benefit from it. Finding biomarkers that accurately predict treatment response, elucidating the resistance mechanisms of the therapy, formulating corresponding individualized treatment regimens, and avoiding the harm and burden that over-treatment and inappropriate treatment place on patients are problems that urgently need to be solved in the clinic.

How to accurately identify the patient population that will benefit from checkpoint inhibitors is the biggest challenge facing gastric cancer immunotherapy. To date, biomarkers for predicting the efficacy of PD-1/PD-L1 monoclonal antibodies include the immunohistochemical expression level of PD-L1 (CPS), high microsatellite instability (MSI-H) and tumor mutational burden (TMB), among others. However, these biomarkers currently predict the efficacy of PD-1/PD-L1 monoclonal antibodies with poor stability. They all focus on intrinsic features of the tumor, which is markedly heterogeneous, and therefore have limitations: they omit any evaluation of the tumor microenvironment and do not comprehensively reflect the mechanism of action of immunotherapy, which partly explains why the response rate of gastric cancer to checkpoint inhibitors is low in clinical trials using these biomarkers. Therefore, more accurate biomarkers are needed for predicting the efficacy of the immune checkpoint inhibitor PD-1/PD-L1 monoclonal antibodies.

In addition, as tumor immunology research has progressed, it has gradually been recognized that tumor cells do not exist in isolation and that their microenvironment is an active participant in tumor development. Compared with the single biomarkers PD-L1, MSI-H and TMB, the infiltration patterns and intricate interactions of the various immune cells and stromal cells in the tumor microenvironment reflect the evolution of the response to tumor immunotherapy more comprehensively. Therefore, predicting responsiveness to immune checkpoint inhibitor PD-1/PD-L1 monoclonal antibodies from the relevant tumor immune infiltration characteristics is an important measure for improving the success rate of current immune checkpoint inhibitor therapy for gastric cancer and for developing the next generation of immunotherapy.

Disclosure of Invention

In view of the above problems, the invention aims to provide a method that uses a multi-gene scoring model to evaluate the tumor microenvironment of gastric cancer patients efficiently and in multiple dimensions, to screen patients who can benefit from immunotherapy, to monitor the dynamic evolution of a patient's tumor microenvironment, and to guide precisely the remodeling of the patient's tumor microenvironment, so as to truly achieve individualized therapy for gastric cancer patients.

In order to achieve the purpose, the invention adopts the technical scheme that: a construction method of a multi-gene scoring model for evaluating a tumor microenvironment comprises the following steps:

(1) collecting sequencing data of a tumor patient;

(2) performing component evaluation of the immune cells of the tumor microenvironment on the sequencing data using the CIBERSORT and MCP-counter algorithms;

(3) performing unsupervised clustering of the immune cells in the tumor microenvironment by consensus clustering (ConsensusClusterPlus) to obtain different tumor microenvironment subtypes;

(4) finally determining a gene set capable of evaluating the tumor microenvironment score through pairwise differential analysis and random forest dimension reduction;

(5) collecting immunotherapy cohort data;

(6) evaluating, in each immunotherapy cohort, the significance of the difference of each gene in the gene set obtained in step (4) for predicting immunotherapy response to obtain a corresponding P value; dividing the P value by the number of samples of the corresponding cohort to obtain the feature importance of each gene in that cohort; adding the feature importances of each gene across all cohorts to obtain the summed feature importance of each gene; confirming by the Shapiro-Wilk test that the importance indexes follow a normal distribution; and taking the genes lying outside the two-sided 95% interval as the important genes obtained by screening;

(7) transforming the expression data of the genes obtained in step (6) by Z-score and constructing a multi-gene scoring model for evaluating the tumor microenvironment using principal component analysis (PCA), wherein the model is TMEscore_plus = TMEscoreA - TMEscoreB, TMEscoreA being the immune score and TMEscoreB being the stromal score.
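The importance screening of step (6) can be illustrated with a minimal Python sketch. It follows the wording of the step literally (P value divided by cohort size, summed over cohorts, genes outside the central 95% interval of a fitted normal distribution are kept). The function names, the input layout (`p_values[cohort][gene]`, `cohort_sizes[cohort]`) and the use of mean ± z·sd to define the interval are assumptions made for illustration; the Shapiro-Wilk normality check is not implemented here, and this is not the code of the TMEscore package.

```python
from statistics import NormalDist, mean, stdev


def importance_sums(p_values, cohort_sizes):
    """Sum, for each gene, the P value divided by the cohort sample count."""
    totals = {}
    for cohort, gene_p in p_values.items():
        n = cohort_sizes[cohort]
        for gene, p in gene_p.items():
            totals[gene] = totals.get(gene, 0.0) + p / n
    return totals


def outside_central_interval(totals, coverage=0.95):
    """Return genes whose summed importance lies outside the central
    `coverage` interval of a normal distribution fitted to all sums.
    Needs at least two genes, because a sample standard deviation is used."""
    values = list(totals.values())
    mu, sigma = mean(values), stdev(values)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    lower, upper = mu - z * sigma, mu + z * sigma
    return sorted(g for g, v in totals.items() if v < lower or v > upper)
```

In this sketch a gene is retained only when its summed index falls in one of the two tails; how the sign of the index is defined is not stated in the step itself and is therefore left out.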

Using tumor transcriptome data, the invention discovers the tumor microenvironment infiltration patterns of gastric cancer through machine learning algorithms, obtains the genes capable of evaluating the tumor microenvironment through differential gene analysis and a random forest algorithm, and quantitatively evaluates a patient's pattern of tumor microenvironment cell infiltration by feeding these tumor microenvironment genes into a principal component analysis algorithm, thereby establishing a multi-gene scoring model for evaluating the tumor microenvironment.

Different data types need to be standardized: Affymetrix chip data can be normalized with RMA, and RNA-seq data need to be converted to TPM/FPKM, to serve as the input file for the next step.
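As an illustration of the RNA-seq branch, the following Python sketch converts the raw read counts of one sample to TPM. The function name and the per-gene length table `lengths_bp` are assumptions made for illustration (the text does not say which gene lengths are used); RMA normalization of chip data is not shown.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts of one sample to TPM.

    counts and lengths_bp are dicts keyed by gene symbol; lengths are in
    base pairs. Reads are first scaled by gene length in kilobases, then
    the rates are rescaled so that they sum to one million.
    """
    rate = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rate.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {g: r / total * 1e6 for g, r in rate.items()}
```

Because every sample sums to one million after the conversion, TPM values are comparable between samples before the Z-score step.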

References 1 to 3 for the CIBERSORT, MCP-counter and consensus clustering algorithms, respectively:

1. A. M. Newman et al., Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 12, 453-457 (2015).

2. E. Becht et al., Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol 17, 218 (2016).

3. S. Monti, P. Tamayo, J. Mesirov, T. Golub, Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning 52, 91-118 (2003).

References 4 and 5 for pairwise differential analysis and random forest dimension reduction:

4. M. E. Ritchie et al., limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47 (2015).

5. M. B. Kursa, W. R. Rudnicki, Feature Selection with the Boruta Package. Journal of Statistical Software 36 (2010).

The required sequencing data comprise mRNA expression levels of the genes measured by technologies such as real-time fluorescent quantitative PCR, gene chips, second-generation high-throughput sequencing, Panomics or NanoString. When data are to be generated on a second-generation sequencing platform or with Affymetrix chip technology, note that FFPE samples suffer from RNA degradation; for tissue samples older than 2 years, detection with Affymetrix chips or NanoString technology is recommended.

The raw data of each sample are filtered, normalized, batch-corrected and gene-annotated, and duplicate genes are removed according to their average expression.
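The removal of duplicate genes can be sketched in Python as follows. The text only says that duplicates are removed "according to the average expression"; keeping the row with the highest mean expression for each gene symbol is an assumption made for this sketch, as are the function name and the `(symbol, values)` row layout.

```python
from statistics import mean


def drop_duplicate_genes(rows):
    """Keep one expression row per gene symbol.

    rows is an iterable of (symbol, values) pairs, e.g. several probes
    annotated to the same gene. For each symbol the row with the highest
    mean expression is retained (assumed rule).
    """
    best = {}
    for symbol, values in rows:
        if symbol not in best or mean(values) > mean(best[symbol]):
            best[symbol] = list(values)
    return best
```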

TMEscore_plus is calculated from the screened gene set representing the different microenvironment subtypes by Z-score transformation of the gene expression data followed by quantification with principal component analysis (PCA). The code for the TMEscore_plus calculation is as follows:

# Install the TMEscore R package
devtools::install_github("DongqiangZeng0808/TMEscore")
# Load the R package
library('TMEscore')
# Input the gene expression matrix and calculate the tumor microenvironment score
tmescore <- tmescore(eset = eset_stad, classify = TRUE)

For the detailed code and parameters, see: https://github.com/DongqiangZeng0808/TMEscore

The core of the code is the PCA algorithm, whose calculation principle is as follows:

1) arrange the original data by columns into a matrix X with n rows and m columns;

2) zero-center each row of X (each row represents one attribute field), i.e. subtract the mean of that row;

3) compute the covariance matrix;

4) compute the eigenvalues of the covariance matrix and the corresponding eigenvectors;

5) arrange the eigenvectors as rows of a matrix from top to bottom in descending order of their eigenvalues, and take the first k rows to form a matrix P;

6) Y = PX is the data reduced to k dimensions.
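The six steps above can be sketched in standard-library Python for the case k = 1, which is the case used for the scores (PC1). As a simplification, the sketch replaces the full eigendecomposition of step 4 with power iteration, which only finds the eigenvector of the largest eigenvalue; the function name and the starting vector are illustrative choices, and this is not the implementation used by the TMEscore package.

```python
import math


def first_principal_component(X, iterations=500):
    """PCA with k = 1 on X given as n rows (attributes) by m columns (samples).

    Returns (p, y): the unit eigenvector p (length n) of the covariance
    matrix with the largest eigenvalue, and the projections y = pX (length m).
    """
    n, m = len(X), len(X[0])
    # Step 2: zero-center each row.
    centered = []
    for row in X:
        row_mean = sum(row) / m
        centered.append([v - row_mean for v in row])
    # Step 3: covariance matrix C = (1/m) * Xc * Xc^T (n x n).
    cov = [[sum(a * b for a, b in zip(ri, rj)) / m for rj in centered]
           for ri in centered]
    # Steps 4-5 with k = 1: power iteration from a non-constant start vector.
    p = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(v * v for v in p))
    p = [v / norm for v in p]
    for _ in range(iterations):
        q = [sum(c * v for c, v in zip(row, p)) for row in cov]
        norm = math.sqrt(sum(v * v for v in q))
        if norm == 0.0:
            break  # start vector orthogonal to all variance, or no variance
        p = [v / norm for v in q]
    # Step 6: Y = PX, here a single row of sample scores.
    y = [sum(p[i] * centered[i][j] for i in range(n)) for j in range(m)]
    return p, y
```

The sign of a principal component is arbitrary, so in practice the orientation of PC1 has to be fixed by convention, for example so that higher values correspond to higher expression of the input genes.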

As a preferred embodiment of the present invention, the construction method further comprises the step (8) of evaluating the efficacy of the model.

More preferably, the method of evaluating the efficacy of the model comprises evaluating the model using a non-parametric test (Wilcoxon rank-sum test) and ROC analysis.
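The two evaluation tools are closely related: the area under the ROC curve equals the Mann-Whitney U statistic of the rank-sum test divided by the product of the group sizes. A minimal Python sketch under stated assumptions follows: the function name is hypothetical, ties receive mid-ranks, and the P value uses the normal approximation without tie or continuity correction, so it will differ slightly from the exact test of statistical packages on small samples.

```python
import math


def rank_sum_and_auc(responders, non_responders):
    """Return (U, AUC, two-sided P) for the scores of two groups."""
    n1, n2 = len(responders), len(non_responders)
    pooled = sorted([(v, 1) for v in responders] +
                    [(v, 0) for v in non_responders])
    rank_sum = 0.0  # sum of mid-ranks of the responder scores
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum += midrank * sum(label for _, label in pooled[i:j])
        i = j
    u = rank_sum - n1 * (n1 + 1) / 2.0
    auc = u / (n1 * n2)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, auc, p_value
```

An AUC of 0.5 means the score does not separate the groups, and an AUC of 1.0 means every responder scores higher than every non-responder.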

The invention also claims a multi-gene scoring model for evaluating the tumor microenvironment, which is constructed according to the construction method.

The invention also discloses a gene set for evaluating the tumor microenvironment, which comprises tumor microenvironment immune-related genes and tumor microenvironment stroma-related genes; the tumor microenvironment immune-related genes comprise: CDT1, PSAT1, IDO1, BRIP1, WARS, DTL, GBP5, KIF18A, UBD, GZMB, CCL4, CCL5, RCC1, HELLS, GNLY, TAP1, CD8A, CXCL11, GBP4, ETV7, ZNF367, CXCL10, KLRC2, CXCL9, and IFNG; the tumor microenvironment stroma-related genes comprise: MIR100HG, PRICKLE2, PAPLN, PLEKHH2, TNS1, MCC, MAMDC2, C14orf132, SYNPO, PID1, MCEMP1, ZCCHC24, PROS1, JAM3, TGFB1I1, MXRA7, FILIP1, CRYAB, and ANTXR2.

Affymetrix chip-normalized data of 300 gastric cancer patients (from GSE62254) were used for screening and modeling; a total of 244 genes were obtained through steps (1) to (4) above, and the 44 important genes listed above were obtained by the reduction in steps (5) and (6). Compared with the gene set before this further screening, the gene set is more compact; on evaluation, the reduced gene set retains consistent performance in evaluating the tumor microenvironment and is significantly improved in predicting immunotherapy response.

The invention also claims the application of the gene set in constructing a multi-gene scoring model for evaluating a tumor microenvironment.

The invention also provides a multi-gene scoring model for evaluating a tumor microenvironment, which comprises TMEscore_plus, wherein TMEscore_plus = TMEscoreA - TMEscoreB; TMEscoreA is the immune score and TMEscoreB is the stromal score; the immune score and the stromal score are calculated from the expression data of the gene set as follows: the gene expression data are Z-score transformed, and the immune score and the stromal score are each calculated by PCA; the genes comprise tumor microenvironment immune-related genes and tumor microenvironment stroma-related genes; the tumor microenvironment immune-related genes comprise: CDT1, PSAT1, IDO1, BRIP1, WARS, DTL, GBP5, KIF18A, UBD, GZMB, CCL4, CCL5, RCC1, HELLS, GNLY, TAP1, CD8A, CXCL11, GBP4, ETV7, ZNF367, CXCL10, KLRC2, CXCL9, and IFNG; the tumor microenvironment stroma-related genes comprise: MIR100HG, PRICKLE2, PAPLN, PLEKHH2, TNS1, MCC, MAMDC2, C14orf132, SYNPO, PID1, MCEMP1, ZCCHC24, PROS1, JAM3, TGFB1I1, MXRA7, FILIP1, CRYAB, and ANTXR2.

Tumor microenvironment immune related genes were used to calculate immune scores, tumor microenvironment stroma related genes were used to calculate stroma scores.
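The Z-score transformation that precedes the PCA step can be sketched in Python as follows. The function name and the dict-of-lists layout are assumptions made for illustration; the sketch uses the sample standard deviation and maps genes with constant expression to zeros, a choice the text does not specify.

```python
from statistics import mean, stdev


def zscore_rows(expr):
    """Standardize each gene to mean 0 and standard deviation 1.

    expr maps a gene symbol to its expression values across samples
    (at least two samples per gene).
    """
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            out[gene] = [0.0] * len(values)  # constant gene: no information
        else:
            out[gene] = [(v - mu) / sd for v in values]
    return out
```

After this step every gene contributes on the same scale, so highly expressed genes do not dominate the first principal component.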

The invention also claims the application of the gene set in the preparation of a kit for evaluating a tumor microenvironment.

The invention also provides a kit for evaluating a tumor microenvironment, which comprises primers or probes for detecting the expression level of each gene in the gene set.

As a preferred embodiment of the present invention, the kit detects the expression level of each gene in the gene set by one of the following technologies: real-time fluorescent quantitative PCR, gene chip, second-generation high-throughput sequencing, Panomics or NanoString.

In a preferred embodiment of the present invention, the tumor in the above construction method, model, gene set, use and kit is gastric cancer.

The advantages of the invention are as follows. Using gastric cancer data from a large population, the invention demonstrates for the first time that gastric cancer has distinct tumor microenvironment subtypes and that the prognosis of gastric cancer patients differs markedly between subtypes. The invention also provides a method for constructing a multi-gene scoring model for evaluating the gastric cancer tumor microenvironment, in which a set of 44 genes capable of representing the different microenvironment subtypes is screened out by machine learning algorithms; this gene set can be used to build a multi-gene scoring model for evaluating a patient's tumor microenvironment. By effectively evaluating the tumor microenvironment, the model can serve as a good prognostic biomarker and can also be used to predict the response of various tumors to immune checkpoint inhibitor therapy; it allows the population that benefits from immunotherapy to be selected more accurately and saves medical resources while enabling individualized treatment, and is therefore of significance and clinical value.

Drawings

FIG. 1 shows the results of screening and evaluating the important tumor microenvironment genes in the method of Example 1 of the present invention for constructing a multi-gene scoring model for evaluating the gastric cancer tumor microenvironment.

FIG. 1A shows the distribution of the importance of the 244 tumor microenvironment genes for predicting immunotherapy efficacy; 1B shows the importance ranking of the 44 selected genes; 1C shows the consistency of tumor microenvironment evaluation between the scoring model after reduction (TMEscore_plus) and before reduction (TMEscore).

FIG. 2 is a graph of the efficacy of the scoring model of the present invention in predicting patient prognosis in gastric cancer chip data.

FIG. 3 is a prognostic efficacy assessment in pan-cancer data for the scoring model of the present invention.

FIG. 4 shows the analysis of the scoring model of the present invention in accurately predicting the efficacy of immunotherapy for gastric cancer, and the comparison with other models.

FIG. 4A shows the statistical test result: the tumor microenvironment scores of CR/PR patients (CR: complete response; PR: partial response) are significantly higher than those of SD/PD patients (SD: stable disease; PD: progressive disease), P = 1.7e-5; 4B shows that the accuracy of the tumor microenvironment score in predicting efficacy reaches 0.891; 4C and 4D show that the accuracy of the tumor microenvironment score in predicting immunotherapy response is clearly higher than that of other indexes in common clinical use, including TMB (tumor mutational burden), PD-L1 CPS (PD-L1 combined positive score of tumor tissue), EBV infection, MSI (high microsatellite instability) and other gene scoring methods.

FIG. 5 shows the evaluation of the efficacy of the scoring model of the present invention in predicting immunotherapy response; 5A-D compare the predictive performance of different models in different cohorts.

FIG. 6 shows the efficacy of the scoring model of the present invention in predicting patient prognosis and molecular typing in second-generation sequencing (RNA-seq) data.

Detailed Description

To better illustrate the objects, aspects and advantages of the present invention, the present invention will be further described with reference to the accompanying drawings and specific embodiments.

Example 1 Multi-gene scoring model of the tumor microenvironment and construction method thereof

The construction method of the multi-gene scoring model for evaluating the tumor microenvironment comprises the following steps:

(1) collecting Affymetrix chip-normalized data of 300 gastric cancer patients; the data come from GSE62254 (obtained from the NCBI public database);

(2) performing component evaluation of the immune cells in the tumor microenvironment on the data using the CIBERSORT and MCP-counter algorithms to obtain component evaluation results for 23 immune cell types;

(4) finally determining a set of 244 genes capable of evaluating the tumor microenvironment score through pairwise differential analysis and random forest dimension reduction;

(5) to further enable clinical translation and better cost-effectiveness of the scoring model, collecting data from 6 immunotherapy cohorts to screen for important genes; references 6 to 11 for the data of the 6 immunotherapy cohorts:

6. S. T. Kim et al., Comprehensive molecular characterization of clinical responses to PD-1 inhibition in metastatic gastric cancer. Nat Med 24, 1449-1458 (2018).

7. S. Mariathasan et al., TGFbeta attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells. Nature 554, 544-548 (2018).

8. J. D. Minna et al., Contribution of systemic and somatic factors to clinical response and resistance to PD-L1 blockade in urothelial cancer: An exploratory multi-omic analysis. PLOS Medicine 14 (2017).

9. F. Ulloa-Montoya et al., Predictive gene signature in MAGE-A3 antigen-specific cancer immunotherapy. J Clin Oncol 31, 2388-2395 (2013).

10. W. Hugo et al., Genomic and Transcriptomic Features of Response to Anti-PD-1 Therapy in Metastatic Melanoma. Cell 165, 35-44 (2016).

11. P. L. Chen et al., Analysis of Immune Signatures in Longitudinal Tumor Samples Yields Insight into Biomarkers of Response and Mechanisms of Resistance to Immune Checkpoint Blockade. Cancer Discov 6, 827-837 (2016);

(6) first evaluating, in each cohort, the significance of the difference of each gene for predicting immunotherapy response to obtain the P value of each gene in that cohort; dividing the P value by the number of samples of the corresponding cohort to obtain the feature importance; then adding the feature importance of each gene across the 6 cohorts to obtain the summed feature importance of the 244 genes; confirming by the Shapiro-Wilk test that the importance indexes follow a normal distribution; and taking the genes lying outside the two-sided 95% interval as the important genes obtained by screening;

FIG. 1A shows the distribution of the feature importance of the 244 genes; according to the distribution of the importance index, the genes with an index in the ranges (-∞, -90] and [80, +∞) are taken as the input genes of the reduced model, as shown in FIG. 1B; the input genes comprise tumor microenvironment immune-related genes and tumor microenvironment stroma-related genes; the tumor microenvironment immune-related genes comprise: CDT1, PSAT1, IDO1, BRIP1, WARS, DTL, GBP5, KIF18A, UBD, GZMB, CCL4, CCL5, RCC1, HELLS, GNLY, TAP1, CD8A, CXCL11, GBP4, ETV7, ZNF367, CXCL10, KLRC2, CXCL9, and IFNG; the tumor microenvironment stroma-related genes comprise: MIR100HG, PRICKLE2, PAPLN, PLEKHH2, TNS1, MCC, MAMDC2, C14orf132, SYNPO, PID1, MCEMP1, ZCCHC24, PROS1, JAM3, TGFB1I1, MXRA7, FILIP1, CRYAB, and ANTXR2;

(7) constructing a multi-gene scoring model for evaluating the tumor microenvironment based on the input genes obtained in step (6), wherein the model is TMEscore_plus = TMEscoreA (immune score) - TMEscoreB (stromal score); the calculation is as follows: dimension reduction by PCA is performed on the expression matrix of the 25 genes of group A, and PC1 is taken as the immune microenvironment score (TMEscoreA); dimension reduction by PCA is performed on the expression matrix of the 19 genes of group B, and PC1 is taken as the stromal microenvironment score (TMEscoreB); the stromal score is subtracted from the immune score to obtain TMEscore_plus, i.e. the tumor microenvironment score.

The code for step (7) is as follows:

# Install the TMEscore R package
devtools::install_github("DongqiangZeng0808/TMEscore")
# Load the R package
library('TMEscore')
# Input the gene expression matrix and calculate the tumor microenvironment score
tmescore <- tmescore(eset = eset_stad, classify = TRUE)

The performance of the two gene sets before and after reduction (TMEscore, TMEscore_plus) in evaluating the tumor microenvironment was assessed in the datasets GSE15459, GSE57303, GSE62254, GSE84437 and TCGA-STAD; the results are shown in FIG. 1C. The results show that although the number of genes was reduced, the correlation between the two scores remained high, indicating that the performance of the two gene sets in evaluating the tumor microenvironment is almost unchanged.

TMEscore denotes the score calculated from the gene set obtained by the above construction method without performing steps (5) and (6); see reference 12 for details.

References 12 and 13 for the TMEscore and GEPs scoring models:

12. D. Zeng et al., Tumor Microenvironment Characterization in Gastric Cancer Identifies Prognostic and Immunotherapeutically Relevant Gene Signatures. Cancer Immunology Research 7, 737-750 (2019).

13. R. Cristescu et al., Pan-tumor genomic biomarkers for PD-1 checkpoint blockade-based immunotherapy. Science 362 (2018).

Example 3 Evaluation of the efficacy of the tumor microenvironment scoring model in predicting patient prognosis

The tumor microenvironment scoring system was applied to several gastric cancer patient cohorts with survival data to analyze the prognostic value of the tumor microenvironment, with the following steps:

(i) transcriptome chip data of gastric cancer patients (GSE84437, GSE15459, GSE34942, GSE57303, GSE62254) were obtained from https://www.ncbi.nlm.nih.gov/geo/; the patients' raw chip data were downloaded and normalized and corrected using affy (reference 14: L. Gautier, L. Cope, B. M. Bolstad, R. A. Irizarry, affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20, 307-);

(ii) the data of the 44 tumor microenvironment genes were extracted from the gene expression matrix and Z-score transformed (standardized to mean 0 and standard deviation 1), and the tumor immune score, tumor stromal score and tumor microenvironment score of each patient were calculated; for the TMEscore_plus calculation, refer to Example 1;

(iii) the patients were divided into a high-score group and a low-score group, and survival analysis and the log-rank test were performed.
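The log-rank comparison of the high-score and low-score groups in step (iii) can be sketched in standard-library Python. This is an illustrative two-group implementation, not the code used to produce the reported P values; the function name and argument layout are assumptions, and `events` uses 1 for an observed death and 0 for a censored observation.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value), 1 d.f."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
            [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0  # accumulated for group A
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        at_risk_b = sum(1 for u, _, g in data if u >= t and g == 1)
        deaths_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        deaths_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = at_risk_a + at_risk_b, deaths_a + deaths_b
        observed_minus_expected += deaths_a - d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A at time t.
            variance += (d * (at_risk_a / n) * (at_risk_b / n)
                         * (n - d) / (n - 1))
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```

A small P value indicates that the survival curves of the two score groups differ by more than chance would explain.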

The results (FIG. 2) show, across the tumor microenvironment score data of a large number of gastric cancer patients, that the prognosis of patients with high scores is significantly better than that of patients with low scores (log-rank test results: GSE84437: P < 0.0001; GSE15459: P = 0.0132; GSE34942: P = 0.067; GSE57303: P = 0.112; GSE62254: P < 0.001), indicating that the model of the invention has a significant prognostic effect in multiple gastric cancer cohorts.

The prognostic effect of the model in TCGA pan-cancer data was further analyzed systematically. The results (FIG. 3), from analysis of 7450 TCGA pan-cancer cases, show that in most cancer types the prognosis of patients with high scores is significantly better than that of patients with low scores (log-rank test results: melanoma TCGA-SKCM: P < 0.0001; colon cancer TCGA-COAD: P = 0.0001; lung squamous cell carcinoma TCGA-LUSC: P = 0.0231; bladder cancer TCGA-BLCA: P = 0.001; head and neck squamous cell carcinoma TCGA-HNSC: P = 0.0074), indicating that the model has a significant prognostic effect in multiple cancer types and can serve as a good prognostic biomarker.

Example 4 Evaluation of the efficacy of the tumor microenvironment scoring model in predicting the immunotherapy response of tumor patients

To verify the efficacy of TMEscore_plus in predicting immunotherapy response, the procedure was as follows:

(i) the RNA-seq second-generation sequencing data of PRJEB25780 (https://www.ebi.ac.uk/ena/browser/view/PRJEB25780) were downloaded and the samples were normalized using the TPM method;

(ii) the data of the 44 tumor microenvironment genes were extracted from the gene expression matrix, the sample data were standardized so that each gene has mean 0 and standard deviation 1, and the expression matrix was assigned to the object eset in R;

(iii) the TMEscore R package was installed and the tumor microenvironment score was calculated with the following code:

# Install the TMEscore R package
devtools::install_github("DongqiangZeng0808/TMEscore")
# Load the R package
library('TMEscore')
# Input the gene expression matrix and calculate the tumor microenvironment score
tmescore <- tmescore(eset = eset, classify = TRUE)

eset is the gene expression matrix; pdata is the patient phenotype data associated with the project and is optional;

(iv) the efficacy of the tumor microenvironment score in predicting immunotherapy response was evaluated using a non-parametric test (Wilcoxon rank-sum test) and ROC analysis; as shown in FIGS. 4A and 4B, patients with high tumor microenvironment scores are significantly enriched among the patients who benefit from the treatment (P = 1.7e-5), and the prediction accuracy by ROC analysis is 0.891.

The model was further compared with the predictive performance of several current immune markers: the PRJEB25780 data were used to evaluate systematically the predictive performance of TMEscore_plus and other molecular markers (including MSI, TMB, GEP, Pan-F-TBRS and other scores). The results (FIGS. 4C and 4D) show that the predictive performance of TMEscore_plus is clearly higher than that of the other molecular markers, so the model of the invention is better at selecting the population that benefits from immunotherapy.

References 7 and 14 to 17 for MSI, TMB, GEP, Pan-F-TBRS and other markers:

15. D. T. Le et al., PD-1 Blockade in Tumors with Mismatch-Repair Deficiency. N Engl J Med 372, 2509-2520 (2015).

16. R. M. Samstein et al., Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat Genet 51, 202-206 (2019).

17. M. Ayers et al., IFN-gamma-related mRNA profile predicts clinical response to PD-1 blockade. J Clin Invest 127, 2930-2940 (2017).

The efficacy of the models constructed by the different methods in predicting immunotherapy response was further compared using data from multiple immunotherapy cohorts, see references 7, 9, 10 and 18 (reference 18: N. Riaz et al., Tumor and Microenvironment Evolution during Immunotherapy with Nivolumab. Cell 171, 934-949.e16 (2017)); the results are shown in FIG. 5.

As can be seen from FIG. 5, in the data of multiple immunotherapy cohorts, the AUC of the gene-reduced model TMEscore_plus (i.e. the model of the present invention) for predicting immunotherapy response is significantly greater than that of the TMEscore and GEPs scoring models, indicating that the TMEscore_plus model constructed in the present invention performs better.

Example 5 efficacy of tumor microenvironment score model to predict patient prognosis and molecular typing

The tumor microenvironment scoring model is applied to 367 TCGA gastric cancer patient queues with survival data and molecular typing, and the relationship between the tumor microenvironment and the prognosis of the patient and the molecular typing of the patient is systematically evaluated, and the method comprises the following steps:

First, transcriptome data of TCGA gastric cancer patients are obtained from the website (https://portal.gdc.cancer.gov/); the HTSeq-Counts data are downloaded and converted into microarray-like data using the voom function of the R package limma (reference 19: C. W. Law, Y. Chen, W. Shi, G. K. Smyth, voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29 (2014));

Second, the data of the 44 tumor microenvironment genes are extracted from the gene expression matrix and Z-score transformed (standardized to a mean of 0 and a standard deviation of 1), and the tumor immune score, the tumor stroma score and the tumor microenvironment score (TMEscore_plus) of each patient are calculated following the procedure of Example 1;

Third, the relationship between the tumor microenvironment score and molecular typing is evaluated using nonparametric tests and the ROC method, respectively;

Fourth, the optimal cutoff value of TMEscore_plus is obtained using the R package survminer, and the patients are divided into a high-risk group and a low-risk group for survival analysis and the log-rank test (FIG. 6C).
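The scoring step above (Z-score transformation followed by a PCA-based immune score and stroma score) can be sketched as follows. This is an illustrative sketch only: it assumes that each score is the projection onto the first principal component of the standardized genes, computed here by power iteration, and that the component is oriented so that its loadings sum to a positive value. The function names and the toy expression values are hypothetical, and the toy matrices use two genes per signature instead of the full gene lists.

```python
import math


def zscore_columns(matrix):
    """Standardize each gene (column) to mean 0 and standard deviation 1.

    matrix: list of rows, one row per patient, one column per gene.
    Genes with zero variance are not handled in this sketch.
    """
    n = len(matrix)
    out_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*out_cols)]


def first_pc_scores(z, iterations=200):
    """Project each patient onto the first principal component."""
    n, p = len(z), len(z[0])
    # Covariance matrix of the standardized genes (p x p).
    cov = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
            for b in range(p)] for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):  # power iteration towards the top eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Assumption: orient the component so that the loadings sum to > 0.
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum(row[a] * v[a] for a in range(p)) for row in z]


def tmescore_plus(immune_expr, stroma_expr):
    """TMEscore_plus = TMEscoreA (immune score) - TMEscoreB (stroma score)."""
    score_a = first_pc_scores(zscore_columns(immune_expr))
    score_b = first_pc_scores(zscore_columns(stroma_expr))
    return [a - b for a, b in zip(score_a, score_b)]


# Hypothetical toy data: 4 patients, 2 immune genes and 2 stroma genes.
immune = [[1.0, 2.0], [2.0, 2.5], [3.0, 4.0], [4.0, 4.5]]
stroma = [[5.0, 3.0], [4.0, 2.0], [2.5, 1.5], [1.0, 1.0]]
scores = tmescore_plus(immune, stroma)
```

In this toy input the immune genes rise and the stroma genes fall from the first patient to the last, so the resulting scores increase along the same order.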

The results show that patients with high tumor microenvironment scores are significantly enriched among the MSI and EBV subtypes, which can benefit from immunotherapy (P < 2.2e-16, FIG. 6A), with an ROC AUC of 0.88 for prediction accuracy (FIG. 6B). The results also show that TMEscore_plus identifies the population likely to benefit from immunotherapy (MSI + EBV) significantly more accurately than the TMB and GEP scoring models. Here, CIN denotes chromosomal instability, EBV denotes Epstein-Barr virus positive, GS denotes genomically stable, and MSI denotes microsatellite instability.
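The cutoff selection and log-rank comparison used in the survival analysis above can be sketched in simplified form. The sketch below is not the survminer implementation, which is based on maximally selected rank statistics; it simply scans every observed score as a candidate cutoff and keeps the one giving the largest two-group log-rank statistic. The function names, the `min_group` parameter and the toy data are hypothetical.

```python
def logrank_statistic(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 = death observed, 0 = censored;
    in_high: True for patients in the high-score group.
    """
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        died = [i for i in at_risk if times[i] == t and events[i] == 1]
        d = len(died)
        d_high = sum(1 for i in died if in_high[i])
        # Observed minus expected deaths in the high-score group at time t.
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutoff(scores, times, events, min_group=2):
    """Return (cutoff, statistic) maximizing the log-rank statistic."""
    best = None
    for c in sorted(set(scores)):
        in_high = [s > c for s in scores]
        n_high = sum(in_high)
        # Skip cutoffs that leave either group with too few patients.
        if n_high < min_group or len(scores) - n_high < min_group:
            continue
        stat = logrank_statistic(times, events, in_high)
        if best is None or stat > best[1]:
            best = (c, stat)
    return best


# Hypothetical toy cohort: ten patients, all with an observed event.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_times = [1, 2, 3, 4, 5, 20, 21, 22, 23, 24]
toy_events = [1] * 10
cutoff, statistic = best_cutoff(toy_scores, toy_times, toy_events)
```

Because the cutoff is chosen to maximize the test statistic, a P value computed from the same data is optimistic; this is a known property of data-driven cutoffs and should be kept in mind when interpreting such a split.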

Example 6

Example of applying the model to assess the tumor microenvironment of a patient: the expression levels of the tumor microenvironment immune-related genes (CDT1, PSAT1, IDO1, BRIP1, WARS, DTL, GBP5, KIF18A, UBD, GZMB, CCL4, CCL5, RCC1, HELLS, GNLY, TAP1, CD8A, CXCL11, GBP4, ETV7, ZNF367, CXCL10, KLRC2, CXCL9, IFNG) and the tumor stroma-related genes (MIR100HG, PRICKLE2, PAPLN, PLEKHH2, TNS1, MCC, MAMDC2, C14orf132, SYNPO, PID1, MCEMP1, ZCCHC24, PROS1, JAM3, TGFB1I1, MXRA7, FILIP1, CRYAB, ANTXR2) of the patient are detected; the expression data are Z-score transformed, and the tumor microenvironment immune score and stroma score are calculated from the expression data of the respective genes; the TMEscore_plus score of the patient is obtained according to the formula TMEscore_plus = TMEscoreA - TMEscoreB (TMEscoreA is the immune score and TMEscoreB is the stroma score); and the score is compared with the cutoff value (or with the patient's position among a large number of samples) to determine whether the patient has a high or a low tumor microenvironment score.

If the patient sample is measured with an Affymetrix microarray, the data can be merged and standardized with the data of GSE62254 for scoring (cutoff-1.174); if the patient sample is measured with RNA-seq technology, the data can be merged and normalized with the data of TCGA-STAD for scoring (cutoff is 0.631). A score greater than the cutoff value is defined as a high tumor microenvironment score, indicating that the patient has a better prognosis and may benefit from immunotherapy.
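The final comparison step can be sketched as below. The sketch assumes that the patient's TMEscore_plus has already been computed after merging with the matching reference cohort; the function names, the patient score and the reference scores are hypothetical, and only the RNA-seq cutoff of 0.631 is taken from the text.

```python
from bisect import bisect_right


def classify_tme(score, cutoff):
    """Return 'high' when the score is greater than the cutoff, else 'low'."""
    return "high" if score > cutoff else "low"


def percentile_in_reference(score, reference_scores):
    """Fraction of reference-cohort scores less than or equal to the score."""
    ordered = sorted(reference_scores)
    return bisect_right(ordered, score) / len(ordered)


# Cutoff stated in the text for RNA-seq data merged with TCGA-STAD.
RNASEQ_CUTOFF = 0.631

# Hypothetical patient score and reference-cohort scores, for illustration only.
patient_score = 1.2
reference = [-2.0, -1.1, -0.3, 0.2, 0.631, 0.9, 1.5, 2.4]

label = classify_tme(patient_score, RNASEQ_CUTOFF)
rank = percentile_in_reference(patient_score, reference)
```

The percentile corresponds to the alternative mentioned above of locating the patient within a large number of samples rather than applying a fixed cutoff.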

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit its scope of protection. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from their spirit and scope.

Claims

1. A construction method of a multi-gene scoring model for evaluating a tumor microenvironment is characterized by comprising the following steps:

(1) collecting sequencing data of a tumor patient;

(5) collecting immunotherapy cohort data;

(6) evaluating, for each gene in the gene set obtained in step (4), the significance of its difference in predicting immunotherapy response in each immunotherapy cohort to obtain a corresponding P value; dividing the P value by the number of samples in the corresponding cohort to obtain a feature importance; summing the feature importance of each gene over all cohorts to obtain the total feature importance of each gene; confirming by the Shapiro-Wilk test that the importance indices follow a normal distribution; and taking the genes outside the two-sided 95% interval as the important genes obtained by screening;

(7) transforming the expression data of the genes obtained in step (6) by Z-score, and constructing, by a PCA (principal component analysis) method, a polygene scoring model for evaluating a tumor microenvironment, wherein the model is TMEscore_plus = TMEscoreA - TMEscoreB, TMEscoreA being the immune score and TMEscoreB being the stroma score.
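The screening in step (6) can be illustrated with the following sketch. It assumes that the per-gene P values in each cohort are already available and that the two-sided 95% interval is taken as the mean plus or minus 1.96 standard deviations of the summed importance values; the Shapiro-Wilk normality check itself is not performed here. The function names, cohort names, gene names and numbers are hypothetical.

```python
import statistics


def summed_importance(p_values_by_cohort, cohort_sizes):
    """Sum P / n over all cohorts for each gene.

    p_values_by_cohort: {cohort: {gene: P value}}
    cohort_sizes: {cohort: number of samples in that cohort}
    """
    totals = {}
    for cohort, p_values in p_values_by_cohort.items():
        n = cohort_sizes[cohort]
        for gene, p in p_values.items():
            totals[gene] = totals.get(gene, 0.0) + p / n
    return totals


def outside_95_interval(importance):
    """Genes whose summed importance lies outside mean +/- 1.96 SD.

    Assumption: the importance values are approximately normal, so that
    mean +/- 1.96 SD approximates the two-sided 95% interval.
    """
    values = list(importance.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sorted(g for g, v in importance.items() if abs(v - mean) > 1.96 * sd)


# Hypothetical toy input: two cohorts, ten genes, one clearly different gene.
sizes = {"cohort_1": 20, "cohort_2": 50}
p_vals = {
    "cohort_1": {f"gene_{i}": 0.4 for i in range(9)},
    "cohort_2": {f"gene_{i}": 0.5 for i in range(9)},
}
p_vals["cohort_1"]["gene_9"] = 0.001
p_vals["cohort_2"]["gene_9"] = 0.002

importance = summed_importance(p_vals, sizes)
selected = outside_95_interval(importance)
```

In this toy input only gene_9 has a summed importance far from the others, so it is the only gene retained by the interval rule.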

2. The method of constructing according to claim 1, further comprising the step of (8) evaluating the efficacy of the model.

3. A device for constructing a multi-gene scoring model for evaluating a tumor microenvironment by the construction method of claim 1 or 2.

4. The gene set determined by the construction method of claim 1, wherein the gene set comprises tumor microenvironment immune-related genes and tumor microenvironment stroma-related genes; the tumor microenvironment immune-related genes comprise: CDT1, PSAT1, IDO1, BRIP1, WARS, DTL, GBP5, KIF18A, UBD, GZMB, CCL4, CCL5, RCC1, HELLS, GNLY, TAP1, CD8A, CXCL11, GBP4, ETV7, ZNF367, CXCL10, KLRC2, CXCL9, and IFNG; the tumor microenvironment stroma-related genes comprise: MIR100HG, PRICKLE2, PAPLN, PLEKHH2, TNS1, MCC, MAMDC2, C14orf132, SYNPO, PID1, MCEMP1, ZCCHC24, PROS1, JAM3, TGFB1I1, MXRA7, FILIP1, CRYAB, and ANTXR2.

5. Use of the gene set of claim 4 in the preparation of a kit for assessing a tumor microenvironment.

6. A kit for assessing the microenvironment of a tumor, comprising primers or probes for detecting the expression level of each gene in the gene set of claim 4.

7. The kit of claim 6, wherein the kit is used for detecting the expression level of each gene in a gene set by a technique comprising: real-time fluorescent quantitative PCR, gene chip, second-generation high-throughput sequencing, Panomics or NanoString.

8. The method of claim 1 or 2, wherein the tumor is gastric cancer.

9. The device of claim 3, wherein the tumor is gastric cancer.

10. The gene set of claim 4, wherein the tumor is gastric cancer.

11. The use according to claim 5, wherein the tumor is gastric cancer.

12. The kit of claim 6 or 7, wherein the tumor is gastric cancer.