CN116656820A

CN116656820A - Prognosis model based on breast tumor stem cell related genes and application thereof

Info

Publication number: CN116656820A
Application number: CN202310557532.4A
Authority: CN
Inventors: 张陶蓝; 胡海红
Original assignee: First Affiliated Hospital of University of South China
Current assignee: First Affiliated Hospital of University of South China
Priority date: 2023-05-17
Filing date: 2023-05-17
Publication date: 2023-08-29

Abstract

The invention belongs to the technical field of medical artificial intelligence, and particularly relates to a prognosis model based on breast tumor stem cell related genes and application thereof. The invention provides breast tumor stem cell prognostic signature genes (BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24) for constructing a breast cancer prognosis model, and their application in constructing a prognosis model based on breast tumor stem cell related genes; the invention also comprises the application of reagents for detecting the expression levels of these prognostic signature genes in preparing a kit for evaluating the prognostic survival of breast cancer. The invention judges the prognosis of a patient based on the prognostic signature genes and a corresponding prediction model (namely, according to the sum of the products of the expression levels of the prognostic signature genes and their coefficients, together with the clinical characteristics of the patient). It predicts the prognosis of breast cancer patients efficiently and accurately, gives clinicians effective guidance for treatment decisions, and reduces ineffective treatment, thereby reducing the treatment cost of patients and promoting the clinical development and application of precision treatment.

Description

Prognosis model based on breast tumor stem cell related genes and application thereof

Technical Field

The invention belongs to the technical field of medical artificial intelligence, and particularly relates to a prognosis model based on breast tumor stem cell related genes and application thereof.

Background

Breast cancer is a highly heterogeneous malignancy arising in breast tissue. Although traditional treatments such as surgery, chemotherapy and radiotherapy, together with emerging immunotherapies, have significantly improved prognosis, cancer heterogeneity leads to recurrence, metastasis, drug resistance and immune escape, which remains a major challenge for the clinical treatment of breast cancer. A full understanding of cancer heterogeneity, and its use in clinical diagnosis and treatment, would therefore help to further improve therapeutic efficacy for cancer patients.

Existing reports construct prognostic models from immune-related genes (CN 113862363A), ferroptosis-related genes, autophagy-related genes (CN 113593648A), apoptosis-related genes, lactate metabolism genes, macrophage signature genes, copper-dependence related genes and the like, or build a prediction model from single-cell transcriptome sequencing analysis of breast cancer (CN 109072481B; the constructed model involves 95 genes and is too complex). Some prognostic models require the detection of hundreds of genes, so their industrialization cost is too high and they are unsuitable for clinical popularization. Most prognostic models are built around a specific signaling pathway or target, and in practical application they often show low accuracy and low precision, probably because of breast cancer heterogeneity. As a result, no accurate and reliable breast cancer prognosis model is currently in clinical use.

Earlier studies have suggested that breast tumor stem cells may be a source of this heterogeneity; however, no breast cancer prognostic model built on breast tumor stem cell related genes has been reported or applied clinically.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a prognosis model based on breast tumor stem cell related genes and application thereof.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

The invention provides prognostic signature genes for constructing a breast cancer prognosis model, wherein the prognostic signature genes are breast tumor stem cell related genes. The prognostic signature genes are BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24.

Based on the application of the prognostic signature genes in constructing a breast cancer prognosis model, the specific formula of the model is as follows:

Risk score of breast tumor stem cell related genes (BCSCRS) = (-0.53045×BRD4)+(-0.26259×RPS24)+(-0.31334×SERPINA3)+(0.434039×SKP1)+(-0.53742×NTRK3)+(-0.23344×CD79A)+(-0.40628×JAK1)+(0.192005×NT5E)+(0.152866×NDRG1)+(0.194872×CD24).

When BCSCRS is less than 0.985569, the risk state is judged to be low; when BCSCRS is greater than 0.985569, the risk state is judged to be high.
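As an illustration of the formula, the following Python sketch computes the weighted sum and applies the stated cutoff. The coefficients and the cutoff are copied from the text; the function names, the input format (a mapping from gene symbol to expression value) and the handling of a score exactly equal to the cutoff are assumptions made for the example. The text does not state the expression scale or whether the R predict function centres the score, so a score computed by hand in this way may not be directly comparable with the reported cutoff.

```python
# Coefficients copied from the BCSCRS formula in the text.
BCSCRS_COEFFICIENTS = {
    "BRD4": -0.53045,
    "RPS24": -0.26259,
    "SERPINA3": -0.31334,
    "SKP1": 0.434039,
    "NTRK3": -0.53742,
    "CD79A": -0.23344,
    "JAK1": -0.40628,
    "NT5E": 0.192005,
    "NDRG1": 0.152866,
    "CD24": 0.194872,
}
BCSCRS_CUTOFF = 0.985569  # median risk score reported in the text


def bcscrs(expression):
    """Sum of coefficient x expression over the 10 signature genes.

    `expression` maps gene symbol -> expression value (hypothetical input
    format; the expression scale is not specified in the text).
    """
    missing = [g for g in BCSCRS_COEFFICIENTS if g not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(c * expression[g] for g, c in BCSCRS_COEFFICIENTS.items())


def risk_state(score, cutoff=BCSCRS_CUTOFF):
    """'low' below the cutoff, 'high' above it.

    The text defines only the < and > cases, so an exact tie is reported
    as 'undetermined' here (an assumption of this sketch).
    """
    if score < cutoff:
        return "low"
    if score > cutoff:
        return "high"
    return "undetermined"
```

Calling `risk_state(bcscrs(sample))` on one patient's expression values returns that patient's risk state under these assumptions.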

Further, a nomogram was constructed based on risk scores and clinical variables for breast tumor stem cell-related genes to predict overall survival and prognostic treatment of patients.

The clinical variables include gender, TNM staging and age.

Summing the points assigned to the risk score of the breast tumor stem cell related genes and to the clinical variables gives a total score for each patient, from which the probability of survival at 1 year, 3 years and 5 years is estimated through a conversion function; the lower the total score, the higher the patient's survival probability.

The construction method of the prognosis model comprises the following steps:

(1) Data acquisition: transcriptome data and clinical data of breast cancer are obtained from TCGA and GEO; normal tissue samples and repeated samples from the same patient are excluded; the TCGA patients are randomly divided into a training cohort and an internal validation cohort at a ratio of 7:3 using the createDataPartition function in the R package caret, and the patients in GSE20685 are used as an external validation cohort;

(2) Screening of breast tumor stem cell genes: breast tumor stem cell related genes are collected from the GeneCards database, genes with a relevance score greater than 30 are retained, and breast tumor stem cell genes related to breast cancer prognosis are screened by univariate Cox analysis;

(3) Construction of the prognosis model: prognostic signature genes of breast cancer are further screened in the TCGA training cohort by least absolute shrinkage and selection operator (LASSO) Cox regression, and the breast cancer prognosis model is constructed by multivariate Cox regression analysis; the risk score of the prognosis model is BCSCRS = (-0.53045×BRD4)+(-0.26259×RPS24)+(-0.31334×SERPINA3)+(0.434039×SKP1)+(-0.53742×NTRK3)+(-0.23344×CD79A)+(-0.40628×JAK1)+(0.192005×NT5E)+(0.152866×NDRG1)+(0.194872×CD24); the risk score of each sample is calculated with the predict function in the survival package from the expression levels of the genes in the Cox regression model and their corresponding regression coefficients; all patients are classified into a high-risk group and a low-risk group according to the median BCSCRS (0.985569), i.e. patients with BCSCRS < 0.985569 are assigned to the low-risk group and patients with BCSCRS > 0.985569 to the high-risk group;

(4) Validation of the prognosis model: the TCGA test cohort and the GSE20685 cohort are used as the internal and external validation cohorts, respectively, and the accuracy of the prognosis model is evaluated with ROC, C-index and DCA;

(5) Construction of the nomogram: a nomogram and calibration curves including patient age, sex, TNM stage and the breast tumor stem cell related gene risk score (BCSCRS) are constructed using the rms package; receiver operating characteristic (ROC) analysis and decision curve analysis (DCA) are performed using the timeROC and ggDCA packages to compare the predictive accuracy of the nomogram with that of other prognostic factors; the nomogram incorporating BCSCRS and clinical variables is used as a quantitative method for predicting the survival of breast cancer patients.

The invention also provides the use of a detection reagent in preparing a kit for evaluating the prognostic survival of breast cancer, wherein the detection reagent consists of reagents for detecting the expression levels of the following 10 genes: BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24, these reagents being the only key components the kit needs for evaluating the prognostic survival of breast cancer. The kit further comprises instructions giving the following: BCSCRS = (-0.53045×BRD4)+(-0.26259×RPS24)+(-0.31334×SERPINA3)+(0.434039×SKP1)+(-0.53742×NTRK3)+(-0.23344×CD79A)+(-0.40628×JAK1)+(0.192005×NT5E)+(0.152866×NDRG1)+(0.194872×CD24); when BCSCRS is less than 0.985569, the risk state is judged to be low, and when BCSCRS is greater than 0.985569, the risk state is judged to be high; the prognosis of a patient in the low-risk state is better than that of a patient in the high-risk state.

Preferably, the sample detected using the kit is a fresh tissue tumor sample.

The invention has the advantages that:

1. The invention obtains transcriptome data and clinical data of breast cancer from TCGA and GEO, screens survival-related breast tumor stem cell genes, constructs a breast tumor stem cell related prognosis model by LASSO regression and multivariate Cox regression, and compares the accuracy of the model with that of other existing prognosis models using ROC, C-index, DCA and similar indices. On this basis, to apply the prognosis model better in the clinic, a nomogram based on the risk score and clinical variables is constructed to predict the overall survival of patients. The constructed prognosis model based on breast tumor stem cell related genes shows good prediction accuracy in both the training cohort and the validation cohorts (AUC values at 1, 3 and 5 years all reach approximately 0.7) and predicts more accurately than the other models compared; the nomogram incorporating clinical variables performs better still (AUC=0.758, C-index=0.744). Applying BCSCRS to the analysis of the tumor microenvironment and immune infiltration landscape, and to the assessment of responsiveness to immune checkpoint inhibitor treatment, shows that the low-risk group responds better to immunotherapy and is more sensitive to chemotherapy, so the prognosis model constructed from breast tumor stem cell related genes can be effectively applied to clinical practice in breast cancer treatment and prognosis.

2. The prognosis model based on breast tumor stem cell related genes obtained by the method has high sensitivity, good specificity and high accuracy. It can give clinicians effective guidance for treatment decisions in breast cancer patients and reduce ineffective treatment, thereby reducing the treatment cost of patients and promoting the clinical development and application of precision treatment.

3. The nomogram constructed based on the risk scores and clinical variables improves the prediction performance of the prognosis model in breast cancer prognosis.

Drawings

FIG. 1 is a flow chart of construction of a prognostic model based on breast tumor stem cell-related genes.

FIG. 2 shows the development and validation of the BCSCs-related prognostic model: (A-B) least absolute shrinkage and selection operator (LASSO) regression to further screen breast cancer prognostic signature genes; (C) forest plot of the 10 BCSCs prognosis-related genes; (D) regression coefficient of each gene in the prognosis model; (E-H) survival scatter plots, Kaplan-Meier analysis and time-dependent ROC curve analysis at 1, 3 and 5 years for the TCGA training cohort (E), TCGA test cohort (F), TCGA total cohort (G) and GSE20685 test cohort (H).

FIG. 3 shows the development and validation of the nomogram: (A) forest plot of the univariate Cox regression analysis; (B) forest plot of the multivariate Cox regression analysis; (C) nomogram predicting the 1-year, 3-year and 5-year survival probabilities of breast cancer patients from the risk score and clinical factors; (D) calibration curve of the nomogram; (E-G) 1-year, 3-year and 5-year ROC curves for the nomogram, BCSCRS and clinical factors; (H-J) decision curve analysis (DCA) of the nomogram, BCSCRS and clinical factors.

FIG. 4 shows the comparison of the prognostic value of different breast cancer prognostic models: (A-D) Kaplan-Meier survival curves of BCSCRS and of the prognosis models constructed by Li et al., Wang et al. and Zhang et al., respectively; (E-H) ROC curves for overall survival at 1 year, 3 years and 5 years; (I) comparison of the overall ROC curves of the prognostic models involved in the present study; (J-L) C-index, RMS and DCA analysis.

FIG. 5 shows the results of the immune landscape analysis of the tumor microenvironment and immune infiltration: (A) GSEA enrichment analysis of the high-risk and low-risk groups; (B) heatmap of the overall immune landscape of the different risk groups; (C) differential analysis of the tumor microenvironment between the risk groups; (D) differential analysis of immune infiltrating cells between the two risk groups; (E) correlation analysis of BCSCRS and immune infiltrating cells.

FIG. 6 shows the results for responsiveness to immune checkpoint inhibitor treatment: (A) expression of 27 immune checkpoint molecules; (B) IPS analysis between the two risk groups.

FIG. 7 shows the relationship of BCSCRS with tumor progression: (A) relationship between BCSCRS and stage, age, gender and TNM stage in the TCGA cohort; (B) relationship between BCSCRS and TNM stage in the GSE20685 cohort.

Detailed Description

Abbreviations involved in the present invention:

BCSCs: breast tumor stem cells; IARC: International Agency for Research on Cancer; WHO: World Health Organization; PD-1: programmed death 1; CTLA-4: cytotoxic T lymphocyte-associated antigen-4; GEO: Gene Expression Omnibus; TCGA: The Cancer Genome Atlas; NMF: non-negative matrix factorization; PCA: principal component analysis; TMB: tumor mutational burden; OS: overall survival; NES: normalized enrichment score; WGCNA: weighted gene co-expression network analysis; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; LASSO: least absolute shrinkage and selection operator; ROC: receiver operating characteristic; AUC: area under the curve; DCA: decision curve analysis; C-index: concordance index; GSEA: gene set enrichment analysis; TME: tumor microenvironment; ICIs: immune checkpoint inhibitors; IPS: immunophenoscore; IC50: half-maximal inhibitory concentration; BCSCRS: breast tumor stem cell related risk score; NK: natural killer cells; TAMs: tumor-associated macrophages; APC: antigen-presenting cell; TCR: T cell receptor.

Embodiment one: Construction of a prognosis model based on breast tumor stem cell related genes.

The construction flow of the prognosis model is shown in figure 1.

1. Acquisition of data:

Transcriptome data and clinical phenotype data were obtained from the GDC-TCGA-BRCA project in the UCSC Xena Browser database (https://xenabrowser.net/datapages/) and from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). After excluding normal tissue samples and repeated samples from the same patient, transcriptome data for a total of 1069 breast cancer patients with complete clinical information were obtained from the TCGA database. The TCGA patients were randomly divided into a training cohort (n=749) and an internal validation cohort (n=320) at a ratio of 7:3 using the createDataPartition function in the R package caret, and the 327 patients in GSE20685 were used as an external validation cohort. Table 1 gives the clinical features of the breast cancer patients in all cohorts. Breast tumor stem cell related genes (BCSCGs) were then collected from the GeneCards database (https://www.genecards.org/), and genes with a relevance score greater than 30 were retained for subsequent analysis. Finally, stem cell genes related to breast cancer prognosis were screened by univariate Cox analysis for use in constructing the model.
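The 7:3 division can be pictured with a short Python sketch. It is not the caret implementation, only an illustration of sampling a fixed fraction within each outcome class, which is the general idea behind createDataPartition for categorical outcomes. The stratification label (here a hypothetical 0/1 outcome flag), the random seed and the rule of rounding up within each stratum are assumptions of the example; the text does not state which variable was used for stratification.

```python
import random


def stratified_split(labels, train_parts=7, total_parts=10, seed=0):
    """Split indices into train/test, sampling within each label group.

    The training share is train_parts/total_parts (7:3 by default),
    rounded up within each stratum using integer arithmetic.
    """
    rng = random.Random(seed)
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)

    train = []
    for label in sorted(by_label):
        members = by_label[label]
        # Ceiling division without floating point.
        k = -(-len(members) * train_parts // total_parts)
        train.extend(rng.sample(members, k))

    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test
```

The exact cohort sizes produced this way depend on the sizes of the strata, so the sketch reproduces the reported 749/320 division only for some choices of stratification variable.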

Table 1 clinical features of breast cancer patients in all cohorts.

2. Constructing and verifying a prognosis model:

Following the univariate Cox screening, prognostic signature genes of breast cancer were further screened in the TCGA training cohort (N=749) by least absolute shrinkage and selection operator (LASSO) Cox regression, and a breast cancer prognosis model was constructed by multivariate Cox regression analysis. The TCGA test cohort (N=320) and the GSE20685 cohort (N=327) were used as the internal and external validation cohorts, respectively, to verify the accuracy of the prognosis model. The risk score of each sample was then calculated with the predict function in the survival package from the expression levels of the genes in the Cox regression model and their corresponding regression coefficients.

Among the 749 patients in the training cohort, univariate Cox analysis identified 45 survival-related breast tumor stem cell genes. LASSO regression analysis then selected 17 breast tumor stem cell related genes for constructing the multivariate Cox regression model (Fig. 2: A and B). Multivariate Cox regression analysis was then used to construct the prognostic model, which determined 10 genes as prognostic signature genes (Fig. 2-C): BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24. As shown in Fig. 2-D and Table 2, the following formula was used to calculate the risk score of breast tumor stem cell related genes (BCSCRS):

BCSCRS = (-0.53045×BRD4)+(-0.26259×RPS24)+(-0.31334×SERPINA3)+(0.434039×SKP1)+(-0.53742×NTRK3)+(-0.23344×CD79A)+(-0.40628×JAK1)+(0.192005×NT5E)+(0.152866×NDRG1)+(0.194872×CD24).

TABLE 2 regression coefficients of 10 characteristic genes in prognosis model

The accuracy of the Cox regression model was assessed by calculating the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The risk score distribution for all cohorts and the survival status plot for all patients were drawn using the pheatmap package. Overall survival (OS) in the high-risk and low-risk groups was compared using Kaplan-Meier (KM) analysis.
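The Kaplan-Meier comparison rests on the product-limit estimator, which can be sketched in a few lines of Python. This is a generic illustration rather than the analysis code used in the study (which was run in R); the function name and input layout are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Running the function separately on the high-risk and low-risk groups gives the two curves that a KM plot compares; the significance test between the curves (typically a log-rank test) is not included in this sketch.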

All patients were classified into a high-risk group and a low-risk group according to the median BCSCRS: patients with BCSCRS < 0.985569 (the median risk score) were assigned to the low-risk group, and patients with BCSCRS > 0.985569 to the high-risk group.
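A median split of this kind can be sketched in Python as follows. The text defines only the strictly-less and strictly-greater cases; in this sketch a score equal to the median is placed in the high-risk group, an assumption that is at least consistent with the group sizes of 374 and 375 reported for the 749-patient training cohort. The function name is hypothetical.

```python
import statistics


def split_by_median(scores):
    """Return (cutoff, low_indices, high_indices) for a median split.

    Scores below the median go to the low-risk group; scores at or above
    it go to the high-risk group (tie handling is an assumption).
    """
    cutoff = statistics.median(scores)
    low = [i for i, s in enumerate(scores) if s < cutoff]
    high = [i for i, s in enumerate(scores) if s >= cutoff]
    return cutoff, low, high
```

With an odd number of distinct scores the median is one patient's own score, which is why the two groups differ in size by one.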

In the TCGA training cohort (n=749), overall survival was significantly higher in the low-risk group (n=374) than in the high-risk group (n=375). Time-dependent ROC curve analysis showed that BCSCRS had good prediction accuracy in the TCGA training cohort, with AUCs of 0.733 (1 year), 0.742 (3 years) and 0.741 (5 years) (Fig. 2-E). The 1-year, 3-year and 5-year AUCs were 0.808, 0.689 and 0.646 in the TCGA test cohort (Fig. 2-F), 0.751, 0.728 and 0.707 in the TCGA total cohort (Fig. 2-G), and 0.765, 0.718 and 0.692 in the GSE20685 external validation cohort (Fig. 2-H), indicating that BCSCRS predicts survival with good accuracy. The prognosis model constructed from the 10 breast tumor stem cell related genes therefore has good accuracy and can be used in clinical practice for breast cancer prognosis and treatment.

Characteristics of the target gene screening in the invention: a two-stage dimension-reduction screening is used. After the genes related to breast cancer prognosis are obtained by univariate Cox analysis, the prognostic genes are further screened by LASSO regression, which reduces the dimensionality of the data further and yields a more accurate set of prognostic genes.
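For reference, the second screening stage can be written in the standard textbook form of L1-penalized Cox regression. Here x_i is the vector of expression values of the candidate genes for patient i, δ_i is the event indicator, R(t_i) is the set of patients still at risk at time t_i, and λ is the penalty weight. The value of λ and the way it was chosen are not given in the text, and tied event times are ignored in this form.

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1} \left[ x_i^{\top}\beta
  - \log \sum_{j \in R(t_i)} \exp\!\left( x_j^{\top}\beta \right) \right],
\qquad
\hat{\beta} = \arg\max_{\beta} \left\{ \ell(\beta)
  - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert \right\}
```

Genes whose estimated coefficient is shrunk exactly to zero are dropped, which is how a penalty of this kind reduces a candidate list before the multivariate Cox fit.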

Embodiment two: Construction of a nomogram.

To improve the accuracy of the prognostic model and apply it better in clinical practice, we further developed a nomogram model comprising the risk score (i.e. the risk score of breast tumor stem cell related genes, BCSCRS) and clinical variables (such as age and tumor stage).

First, univariate and multivariate Cox regression analyses were performed to assess whether the risk score and clinical variables could serve as independent prognostic factors. Then, the rms package was used to construct the nomogram and calibration curves including patient age, sex, TNM stage and risk score. To compare the predictive accuracy of the nomogram with that of other prognostic factors, receiver operating characteristic (ROC) analysis and decision curve analysis (DCA) were performed using the timeROC and ggDCA packages, respectively.

The potential independence of BCSCRS as a prognostic factor was investigated by univariate and multivariate Cox regression analysis (Table 3).

Table 3 Univariate and multivariate Cox regression analysis of risk scores and clinical features.

* Independent prognostic factors.

The results indicate that risk score, age, overall stage, and T, N and M stage are significantly correlated with cancer prognosis (Fig. 3-A, p < 0.001), and multivariate Cox regression analysis indicates that both risk score and age can serve as independent prognostic factors for breast cancer patients (Fig. 3-B, p < 0.001).

To investigate the potential association between BCSCRS and multiple clinical variables, Wilcoxon and Kruskal-Wallis tests were performed. In the TCGA cohort, BCSCRS increased with tumor stage, with significant differences between stages (Fig. 7-A). Risk scores rose across the T and N stages, with significant differences between groups, except at stage N3, where the trend reversed. In addition, BCSCRS was significantly higher in patients with advanced M stage and in patients over 65 years old. There was no statistically significant difference in risk scores between the different categories. Similar results were obtained in the GSE20685 cohort, where risk scores were significantly higher at advanced TNM stages (Fig. 7-B).

These findings indicate that BCSCRS differs significantly between groups defined by clinical variables, and that a higher risk score indicates a worse pathological state in breast cancer patients.
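For readers unfamiliar with the Kruskal-Wallis test used for the multi-group comparisons, the following Python sketch computes its H statistic from rank sums. It is a generic illustration with hypothetical function names, it omits the correction for tied values, and it stops short of the p-value, which is obtained by referring H to a chi-squared distribution with (number of groups - 1) degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction).

    groups: list of lists, e.g. risk scores grouped by tumor stage.
    """
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
```

A larger H means the rank distributions of the groups differ more; with two groups the test is equivalent to the Wilcoxon rank-sum comparison.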

By including the risk score and clinical variables, a nomogram was constructed as a quantitative method for predicting the survival of breast cancer patients (Fig. 3-C). The total score of each patient was calculated by summing the points for the risk score and for the clinical variables, which include gender, TNM stage and age. Patients with lower total scores have higher survival probabilities. The accuracy of the nomogram was assessed by the calibration curve (Fig. 3-D) and the area under the ROC curve. The nomogram predicted more accurately than the individual clinical features and the original risk score. The 1-year, 3-year and 5-year AUCs of the nomogram in the TCGA cohort were 0.805, 0.746 and 0.758, respectively (Fig. 3: E, F, G). In addition, decision curve analysis (DCA) showed that the nomogram predicted more accurately than the other predictors (Fig. 3: H, I, J). The nomogram thus improves the accuracy of the prognosis model and allows it to be applied better in clinical practice.

Embodiment three: Analysis and comparison with existing breast cancer prediction models.

While we have demonstrated the accuracy of BCSCRS from a number of perspectives, the most important aspect of the clinical prognosis model is its superiority in predictive performance in clinical practice. To verify that the breast cancer prognosis model constructed in this study has better predictive performance, we compared it to three different prognosis models, respectively.

The first model is the ferroptosis-related prognosis model established by Wang et al. [1], involving 9 genes: ALOX15, CISD1, CS, GCLC, GPX4, SLC7A11, EMC2, G6PD and ACSF2; the second is the macrophage signature gene model constructed by Li et al. [2], involving 7 genes: SERPINA1, CD74, STX11, ADAM9, CD24, NFKBIA and PGK1; the third is the lactate metabolism-related prognosis model proposed by Zhang et al. [3], involving three genes: LDHD, LYRM7 and PNKD. Each model was constructed as described in the corresponding reference. To reduce errors caused by differences in data scale, all analyses were carried out on the same transcriptome data: the expression levels of the genes in each model were extracted, and multivariate Cox regression was performed to obtain their regression coefficients. A risk score was then calculated for each sample, and the predictive power and clinical utility of each model were assessed by the concordance index (C-index), decision curve analysis (DCA), receiver operating characteristic (ROC) curves and survival analysis. All analyses were performed using the timeROC and survival packages in R.
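The concordance index used in this comparison can be illustrated with a small Python sketch of Harrell's C. It is a generic version with a hypothetical function name, not the R code used in the study: a pair of patients is comparable when the one with the shorter follow-up had an observed event, and the pair is concordant when that patient also has the higher risk score.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       : follow-up times
    events      : 1 if death observed, 0 if censored
    risk_scores : model output, higher = worse predicted prognosis
    Pairs with equal follow-up times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the C-index values reported for the compared models.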

The references for the construction of the three different prognosis models compared by the invention are as follows:

[1] Wang D, Wei G, Ma J, et al. Identification of the prognostic value of ferroptosis-related gene signature in breast cancer patients. BMC Cancer 2021;21:645.

[2] Li Y, Zhao X, Liu Q, et al. Bioinformatics reveal macrophages marker genes signature in breast cancer to predict prognosis. Ann Med 2021;53:1019-1031.

[3] Zhang Z, Fang T, Lv Y. A novel lactate metabolism-related signature predicts prognosis and tumor immune microenvironment of breast cancer. Front Genet 2022;13:934830.

As shown in FIG. 4, the survival curves showed higher survival in the low-risk group (Fig. 4: A-D). With the exception of the signature of Zhang et al. (AUC = 0.502, 0.522, 0.568), the signatures showed good potential for predicting 1-year, 3-year and 5-year survival in breast cancer, as judged by the area under the ROC curve (Fig. 4: E-H). The accuracy of the BCSCRS (AUC=0.694) and the nomogram (AUC=0.758) established in this study was higher than that of the other signatures (Fig. 4-I). The nomogram, which is optimized with clinical variables, is not part of the signature comparison and is used only for auxiliary validation. The results of the C-index, RMS and DCA analyses further demonstrated that BCSCRS predicts breast cancer survival with excellent accuracy (Fig. 4: J-L).

Embodiment four: Application of the prognosis model based on breast tumor stem cell related genes: BCSCRS-based analysis of the cancer immune landscape.

In view of the significant correlation between BCSC core genes and immune activity observed in our studies and analyses, we performed GSVA and GSEA to explore this correlation further. In the high-risk group, signaling pathways were significantly enriched in biological processes such as steroid biosynthesis, fructose and mannose metabolism, protein export, proteasome and the citrate cycle (TCA cycle). In contrast, the pathways of the low-risk group were characterized by primary immunodeficiency and T cell receptor signaling (Fig. 5-A), suggesting a potential link between the low-risk group and immunity.

To further explore this relationship, we studied features of the immune landscape, including the TME and immune infiltration (Fig. 5-B). The results showed that the ESTIMATE score, immune score and stromal score were significantly higher in the low-risk group than in the high-risk group, whereas tumor purity showed the opposite pattern (Fig. 5-C). These findings indicate that the TME of the low-risk group contains more stromal cells and immune cells than that of the high-risk group. Subsequently, ssGSEA showed that immune cell levels were higher in the TME of the low-risk group, except for macrophages (Fig. 5-B). The infiltration abundance of 22 specific immune cell types in the high-risk and low-risk groups was further analyzed using the CIBERSORT algorithm: B cells, plasma cells, activated memory CD4 T cells, CD8 T cells and gamma-delta T cells infiltrated more in the low-risk group, while M0 and M2 macrophages infiltrated more in the high-risk group (Fig. 5-D). In addition, the infiltration levels of B cells, plasma cells, activated memory CD4 T cells, CD8 T cells and gamma-delta T cells were inversely correlated with the risk score (Fig. 5-E).

These results indicate a close relationship between BCSCRS and immune cells: a lower risk score indicates a higher abundance of stromal cells and immune cells in the TME.

Embodiment five: Application of the prognosis model based on breast tumor stem cell related genes: BCSCRS-based assessment of immunotherapy response.

To further investigate the relationship between BCSCRS and immunotherapy response, we evaluated several indicators:

First, we analyzed the expression of immune checkpoint molecules. The expression of 27 immune checkpoints was significantly higher in the low-risk group, suggesting that these patients may be more responsive to immune checkpoint inhibitors (Fig. 6-A).

We also used the IPS scores for PD-1 and CTLA-4 as quantitative indicators to further evaluate the effectiveness of immune checkpoint inhibitors. The low-risk group scored significantly higher on IPS-CTLA4, IPS-PD1 and IPS-PD1-CTLA4, indicating that these patients may benefit more from treatment with PD-1 and CTLA-4 inhibitors (Fig. 6-B).

In addition, since BCSCs have been reported to be involved in the drug resistance process of cancer, we also analyzed drug sensitivity of common chemotherapeutics for breast cancer in different risk groups. The study results indicate that the low risk group is more sensitive to chemotherapeutic agents, such as cisplatin, doxorubicin, gemcitabine, methotrexate, paclitaxel, and vinorelbine, indicating that these patients in the low risk group will have better efficacy when treated with chemotherapeutic agents with less likelihood of developing resistance.

Therefore, the research results show that the low-risk group can have better response to immunotherapy and chemotherapy, which has very important clinical practical significance, and also proves that the prognosis model based on the breast tumor stem cell related genes constructed by the invention can be effectively applied to the clinical practice of breast cancer treatment and prognosis.

Breast cancer stem cells (BCSCs) may be the origin of breast cancer heterogeneity and are thought to be involved in regulating the response to breast cancer immunotherapy. Understanding the prognostic value and immunoreactivity of BCSCs is therefore crucial for identifying patients who are likely to benefit from immunotherapy. In the invention, transcriptome data and clinical data of breast cancer are first obtained from TCGA and GEO; survival-related breast tumor stem cell genes are then screened out, a breast cancer stem cell prognosis model is constructed by LASSO regression and multivariate Cox regression, and the accuracy of the model is compared with that of other existing prognosis models using ROC, C-index, DCA and similar indices. On this basis, to apply the prognosis model better in the clinic, a nomogram is constructed from the risk score and clinical variables to predict the overall survival of patients, which improves the accuracy of the prognosis model and allows it to be applied better in clinical practice.

Results: this study constructed a 10-gene breast cancer stem cell-related prognostic model. The model showed good prediction accuracy in both the training and validation cohorts (AUC values at 1, 3 and 5 years all reached 0.7), and its prediction accuracy was better than that of other models. The nomogram incorporating clinical variables showed still better prediction accuracy (AUC = 0.758, C-index = 0.744).
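To make the C-index figure above easier to interpret, the following is a minimal sketch of a concordance index calculation. It assumes Harrell's definition, which the text does not specify, and it skips pairs with tied survival times for simplicity. The function name `harrell_c_index` and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score should mean shorter survival
    Assumption: pairs with identical times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            # The shorter time is censored, so the true order is unknown.
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [3.0, 2.0, 1.5, 0.5]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, so a C-index of 0.744 means that in roughly three out of four comparable patient pairs the model ranks the patient with the shorter survival as the higher-risk one.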

The invention judges the prognosis of the patient based on the prognosis characteristic genes of breast tumor stem cells and a corresponding prediction model (namely, according to the sum of the products of the expression levels of the prognosis characteristic genes and their coefficients, together with the clinical characteristics of the patient). The invention has the advantages of efficiently and accurately predicting the prognosis of breast cancer patients, provides clinicians with effective guidance for treatment decisions for breast cancer patients, and reduces the occurrence of ineffective treatment, thereby reducing the treatment cost and discomfort of the patient.
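The sum of products described above can be sketched as follows. This is a minimal illustration, not the inventors' implementation: the coefficients and the cutoff are those stated in claim 3 of this document, while the function names, the dictionary layout, the assumption that expression values are already normalized in the same way as the model's training data, and the handling of a score exactly equal to the cutoff (which the claims do not define) are all illustrative assumptions.

```python
# Coefficients and cutoff as stated in claim 3 of this document.
BCSCRS_COEFFICIENTS = {
    "BRD4": -0.53045,
    "RPS24": -0.26259,
    "SERPINA3": -0.31334,
    "SKP1": 0.434039,
    "NTRK3": -0.53742,
    "CD79A": -0.23344,
    "JAK1": -0.40628,
    "NT5E": 0.192005,
    "NDRG1": 0.152866,
    "CD24": 0.194872,
}
BCSCRS_CUTOFF = 0.985569


def bcsc_risk_score(expression):
    """Return the sum of coefficient * expression over the 10 genes.

    `expression` maps gene symbol -> expression value (hypothetical input
    format; values are assumed to be normalized as in model training).
    """
    missing = sorted(set(BCSCRS_COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene]
               for gene, coef in BCSCRS_COEFFICIENTS.items())


def risk_group(score, cutoff=BCSCRS_CUTOFF):
    """Classify a score as 'low' or 'high' relative to the cutoff."""
    if score < cutoff:
        return "low"
    if score > cutoff:
        return "high"
    # The claims do not say how a score equal to the cutoff is handled.
    return "indeterminate"


# Hypothetical patient with every gene expressed at 1.0.
example_expression = {gene: 1.0 for gene in BCSCRS_COEFFICIENTS}
example_score = bcsc_risk_score(example_expression)
example_group = risk_group(example_score)
```

Keeping the coefficients in a single mapping makes it straightforward to check that all 10 genes were measured before a score is reported.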

The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not all described in detail nor are they intended to limit the invention to the specific embodiments described. Obviously, other relevant modifications may be made in view of the present description. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims

1. A prognostic signature gene for breast cancer prognosis model construction, characterized in that: the prognostic signature gene is a breast tumor stem cell related gene.

2. The prognostic signature gene for breast cancer prognosis model construction according to claim 1, characterized in that the prognostic signature genes are BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24.

3. Use of the prognostic signature gene according to claim 2 in the construction of a breast cancer prognostic model, wherein the prognostic model includes: risk score of breast tumor stem cell-related genes (BCSCRS) = (-0.53045 × BRD4) + (-0.26259 × RPS24) + (-0.31334 × SERPINA3) + (0.434039 × SKP1) + (-0.53742 × NTRK3) + (-0.23344 × CD79A) + (-0.40628 × JAK1) + (0.192005 × NT5E) + (0.152866 × NDRG1) + (0.194872 × CD24); when BCSCRS is less than 0.985569, the risk state is judged to be low, and when BCSCRS is greater than 0.985569, the risk state is judged to be high.

4. The use according to claim 3, characterized in that: a nomogram is constructed based on the risk score of breast tumor stem cell-related genes and clinical variables to predict the overall survival and prognostic treatment of patients.

5. The use according to claim 4, characterized in that: the clinical variables include gender, TNM staging and age.

6. The use according to claim 5, characterized in that: the total score of each patient is calculated by adding the risk score of the breast tumor stem cell related genes and the scores of all clinical variables, so that the survival probability of breast cancer patients in 1 year, 3 years and 5 years is predicted, and the survival probability of patients with lower total score is higher.

7. The use according to claim 6, wherein the method of constructing a prognostic model comprises the steps of:

(4) Verifying the prognosis model: the TCGA test cohort and the GSE20685 cohort are used as the internal validation cohort and the external validation cohort, respectively, to verify the accuracy of the prognosis model, and the ROC, C-index and DCA indexes are used to evaluate the accuracy of the prognosis model;

(5) Constructing a nomogram: a nomogram and calibration curves including patient age, sex, TNM stage and the breast tumor stem cell related gene risk score (BCSCRS) are constructed using the rms package; receiver operating characteristic (ROC) analysis and decision curve analysis (DCA) are performed using the timeROC and ggDCA software packages to compare the predictive accuracy of the nomogram with that of other prognostic factors; the nomogram constructed by incorporating BCSCRS and clinical variables is used as a quantitative method for predicting the survival of breast cancer patients.

8. Use of a detection reagent in the preparation of a kit for evaluating the prognostic survival of breast cancer, characterized in that the detection reagent consists of reagents for detecting the expression levels of the following 10 genes: BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24, these reagents being the only key components of the kit for evaluating the prognostic survival of breast cancer; the kit further comprises instructions stating: BCSCRS = (-0.53045 × BRD4) + (-0.26259 × RPS24) + (-0.31334 × SERPINA3) + (0.434039 × SKP1) + (-0.53742 × NTRK3) + (-0.23344 × CD79A) + (-0.40628 × JAK1) + (0.192005 × NT5E) + (0.152866 × NDRG1) + (0.194872 × CD24); when BCSCRS is less than 0.985569, the risk state is judged to be low, and when BCSCRS is greater than 0.985569, the risk state is judged to be high; the prognosis of a patient in the low risk state is better than that of a patient in the high risk state.

9. The use according to claim 8, characterized in that: the specimen detected by the kit is a fresh tissue tumor specimen.