CN115497562A - Pancreatic cancer prognosis prediction model construction method based on copper death-related gene - Google Patents

Info

Publication number
CN115497562A
Authority
CN
China
Prior art keywords
copper
death
prognosis
genes
pancreatic cancer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211330510.6A
Other languages
Chinese (zh)
Other versions
CN115497562B (en)
Inventor
梁智勇
刘启贤
尹香琳
吴焕文
张卉
李瑞玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Original Assignee
Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Priority to CN202211330510.6A
Publication of CN115497562A
Application granted
Publication of CN115497562B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Biochemistry (AREA)
  • Library & Information Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention belongs to the field of medicine, and particularly relates to a pancreatic cancer prognosis prediction model construction method based on a copper death-related gene. The method comprises the following steps: collecting clinical follow-up data and gene expression data of pancreatic cancer samples; removing samples without survival state, survival time and clinical follow-up information from the data set; obtaining 35 copper death related genes from the literature; screening three significant genes from the copper death related genes according to a LASSO-Cox regression analysis method; constructing a prediction prognosis model consisting of three genes according to the expression quantity of the significant genes and pancreatic cancer prognosis information on an ICGC database; and carrying out prognostic analysis on the case data of the pancreatic cancer based on the predictive prognosis model. The invention constructs a new model based on three copper death-related genes, and the model has good performance in predicting the prognosis of pancreatic adenocarcinoma patients and identifying pancreatic adenocarcinoma patients suitable for immunotherapy, targeted therapy and chemotherapy.

Description

Pancreatic cancer prognosis prediction model construction method based on copper death-related gene
Technical Field
The invention belongs to the field of medicine, and particularly relates to a pancreatic cancer prognosis prediction model construction method based on a copper death-related gene.
Background
Pancreatic adenocarcinoma (PAAD) is a disease with a high mortality rate, with a 5-year survival rate of only about 10%. The high mortality may be largely due to the high aggressiveness of the disease and to multi-level differences in treatment. Traditional staging systems, such as the American Joint Committee on Cancer (AJCC) classification, assess risk and guide treatment primarily on the basis of clinical analysis of patients, without regard to their molecular biological properties. It is therefore difficult to identify "high risk" patients early. Despite the great advances made in recent years in therapeutic approaches to pancreatic adenocarcinoma, including chemotherapy, radiation therapy, immunotherapy and targeted therapy, only a few patients benefit from these therapies, and effective biomarkers for identifying the patients who may benefit have not yet been developed. Early identification of "high risk" patients and individualized treatment are crucial to improving the clinical outcome of pancreatic adenocarcinoma. More accurate and effective indicators are therefore urgently required to improve the prognosis and treatment of pancreatic adenocarcinoma.
Copper is an important nutrient involved in many biological processes. Recent studies suggest that dysregulation of copper homeostasis may play an important role in cancer. On the one hand, copper accumulation has been reported to promote tumor proliferation, growth, angiogenesis and metastasis in several tumor types, including pancreatic neuroendocrine tumors. On the other hand, following the global success of platinum(II) compounds in cancer chemotherapy, copper complexes have also been regarded as potential antitumor agents, although their underlying mechanisms remain largely unclear. Recently, a new pathway of programmed cell death, known as copper death (cuproptosis), has been discovered. Excess copper induces a form of cell death that differs from all other known forms, such as apoptosis, necrosis, ferroptosis and autophagy. During copper death, excess copper binds directly to lipoylated proteins of the tricarboxylic acid (TCA) cycle, leading to proteotoxic stress and cell death. Clinical trials have shown that copper ionophore-induced death contributes to improved therapeutic response in cancer patients with low lactate dehydrogenase (LDH). Because cancer cells have a stronger preference for copper ions than normal cells, copper death is of therapeutic significance in cancer. Notably, a growing number of studies report that copper death-related genes are closely related to tumor development, progression, prognosis and drug sensitivity. However, the role of copper death-related genes in pancreatic adenocarcinoma is not clear.
Disclosure of Invention
In view of the above problems, the invention provides a method for constructing a pancreatic cancer prognosis prediction model based on a copper death-related gene, so as to establish an effective index for predicting the prognosis of pancreatic adenocarcinoma and guiding treatment selection. According to the invention, a predictive prognosis model for pancreatic adenocarcinoma is constructed in a training set by LASSO-Cox regression analysis. The accuracy of the model is then verified on three other independent sets. The prognostic power of the model and its relevance to other clinicopathological characteristics are analyzed. In addition, the relationship of the model with the molecular and immune environment, and the response of patients in different risk groups to immunotherapy, targeted therapy and chemotherapy, are comprehensively analyzed. Furthermore, the expression of TSC22D2, a novel independent prognostic gene for pancreatic adenocarcinoma, is verified by public databases and experiments. The invention provides an effective predictive prognosis model based on copper death-related genes and promotes the personalized treatment of pancreatic adenocarcinoma patients.
The main technical scheme of the invention is as follows:
a method for constructing a pancreatic cancer prognosis prediction model based on a copper death-related gene comprises the following steps:
step 1, collecting clinical follow-up data and gene expression data of pancreatic cancer samples to form a data set, removing samples without survival state, survival time and clinical follow-up information from the data set;
step 2, obtaining a copper death related gene;
step 3, screening three genes which obviously influence the pancreatic cancer prognosis from the copper death-related genes according to a LASSO-Cox regression analysis method;
step 4, constructing a prediction prognosis model consisting of three genes according to the expression quantity of the three significant genes in the ICGC database and the pancreatic cancer prognosis information of the ICGC database;
step 5, carrying out prognostic predictive analysis on pancreatic cancer case data with an enlarged sample number based on the prognostic predictive model.
In the method for constructing a pancreatic cancer prognosis prediction model, the step 2 of obtaining the copper death-related genes comprises the following steps: 35 copper death-related genes are obtained from the literature, and genes that do not match the data set are then removed, the remaining genes being the copper death-related genes used for analytical screening.
In the method for constructing a pancreatic cancer prognosis prediction model, the expression level of the copper death-related gene is converted into a log2 (TPM + 1) format.
In the method for constructing the pancreatic cancer prognosis prediction model, in step 3, single genes having an independent prognostic value for the overall survival of the patients in the training set are screened as significant genes.
In the method for constructing a pancreatic cancer prognosis prediction model, the LASSO-Cox regression analysis method is a combination of a LASSO regression analysis method, a single-factor Cox regression analysis method and a multi-factor Cox regression analysis method;
a preliminary screening is performed by the single-factor Cox regression analysis, which selects, among the copper death-related genes, the genes that are significantly related to survival prognosis;
the LASSO regression analysis further narrows down the reliable copper death-related genes that are significantly associated with prognosis.
In the pancreatic cancer prognosis prediction model construction method, the three significant genes are PRKDC, C6orf136 and TSC22D2, which are all key genes involved in the copper death process.
In the method for constructing a pancreatic cancer prognosis prediction model, in step 4, a multi-factor Cox regression analysis is used for constructing the prognosis prediction model, and the correlation between the prognosis prediction model and other clinical characteristics is evaluated, wherein the multi-factor Cox regression analysis is used for simultaneously detecting whether a plurality of genes are significantly correlated with survival prognosis.
In the method for constructing a pancreatic cancer prognosis prediction model, the other clinical characteristics are tumor grade, T stage, N stage, sex, age and AJCC stage.
In the method for constructing the pancreatic cancer prognosis prediction model, a nomogram is created based on independent prognosis factors, the prediction prognosis model and other clinical characteristics are all variables in the nomogram, each variable in a nomogram scoring system is given a score, and all the scores are added to calculate a total score, so that the total score is used for predicting the prognosis survival probability of a pancreatic cancer patient after 1 year, 2 years and 3 years;
in the method for constructing the pancreatic cancer prognosis prediction model, a calibration curve is drawn to evaluate the consistency between the 1-year, 2-year and 3-year prognosis survival rates of pancreatic cancer predicted by the nomogram and clinical observation.
In the method for constructing the pancreatic cancer prognosis prediction model, the coefficient of each gene is obtained through the multi-factor Cox regression analysis, a risk scoring formula is established according to the expression quantity of each gene,
$$\text{risk score} = 0.035 \times e^{\,(0.292 \times \mathrm{PRKDC} - 0.347 \times \mathrm{C6orf136} + 0.613 \times \mathrm{TSC22D2})}$$
The risk scoring formula is the prediction prognosis model, and the median of the risk scores is used as the critical value, wherein: samples with a risk score greater than or equal to the critical value form the high risk group, and samples with a risk score less than the critical value form the low risk group.
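As a minimal sketch of how the risk scoring formula and the median cut-off could be applied, the following Python fragment may be helpful. It assumes that the leading constant 0.035 multiplies the exponential and that the expression values are in log2 (TPM + 1) units; the function names `risk_score` and `split_by_median` and the sample values are hypothetical and are not part of the invention.

```python
import math
from statistics import median

# Coefficients taken from the risk scoring formula in the text.
COEFFICIENTS = {"PRKDC": 0.292, "C6orf136": -0.347, "TSC22D2": 0.613}
SCALE = 0.035  # leading constant, assumed to multiply the exponential


def risk_score(expression):
    """Return SCALE * e^(sum of coefficient * expression) for one sample."""
    linear = sum(COEFFICIENTS[g] * expression[g] for g in COEFFICIENTS)
    return SCALE * math.exp(linear)


def split_by_median(scores):
    """Label each score 'high' (>= median) or 'low' (< median)."""
    cutoff = median(scores)
    return ["high" if s >= cutoff else "low" for s in scores]


# Hypothetical expression values, for illustration only.
samples = [
    {"PRKDC": 4.0, "C6orf136": 2.0, "TSC22D2": 3.0},
    {"PRKDC": 5.0, "C6orf136": 1.0, "TSC22D2": 4.0},
    {"PRKDC": 3.0, "C6orf136": 3.0, "TSC22D2": 2.0},
]
scores = [risk_score(s) for s in samples]
groups = split_by_median(scores)
```

Because the exponential is monotonic, ranking samples by the risk score gives the same order as ranking them by the linear combination of the three expression values.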
In the pancreatic cancer prognosis prediction model construction method, the categorical variables are analyzed by Fisher's exact test or the chi-square test;
in the method for constructing the pancreatic cancer prognosis prediction model, the Wilcoxon rank sum test or the Kruskal-Wallis H test is adopted for continuous variable analysis;
in the pancreatic cancer prognosis prediction model construction method, the correlation between two continuous variables is tested by Spearman correlation analysis;
in the method for constructing the pancreatic cancer prognosis prediction model, all statistical analyses are performed by using R software.
By the technical scheme, the invention at least has the following advantages:
the invention constructs a new model based on three copper death-related genes, and the model has good performance in predicting the prognosis of pancreatic cancer patients and identifying pancreatic adenocarcinoma patients suitable for immunotherapy, targeted therapy and chemotherapy. The TSC22D2 gene related to copper death has good prediction efficiency on the overall survival time of pancreatic adenocarcinoma patients. The invention lays a good foundation for further discussing the action of copper death in pancreatic adenocarcinoma.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
FIG. 1 is a general study design and workflow of the predictive prognostic model for patients with pancreatic adenocarcinoma of the present invention.
FIG. 2 shows the construction and validation of the predictive prognosis model in the training set and validation set: (A) risk score, survival time, survival status and 3-gene expression trend in the training set (ICGC, 152 samples), (B) ROC curves evaluating the sensitivity and specificity of the risk score in predicting one-, two- and three-year OS of pancreatic cancer patients in the training set, (C) Kaplan-Meier curve for OS between high and low risk groups in the training set, (D) risk score, survival time, survival status and 3-gene expression trend in the validation set (ICGC, 61 samples), (E) ROC curves evaluating the sensitivity and specificity of the risk score in predicting one-, two- and three-year OS of pancreatic cancer patients in the validation set, (F) Kaplan-Meier curve for OS between high and low risk groups in the validation set.
FIG. 3 shows the evaluation of the predictive prognostic model in the TCGA test set and the functional enrichment analysis between the high and low risk groups: (A) risk score, survival time, survival status and 3-gene expression trend in the test set (TCGA, 176 samples), (B) ROC curves evaluating the sensitivity and specificity of the risk score in predicting one-, two- and three-year OS of pancreatic cancer patients in the test set, (C) Kaplan-Meier curve of overall survival between high and low risk groups in the test set, (D) volcano plot of differentially expressed genes between high and low risk groups in the test set, (E) chord plot showing 8 significantly enriched GO pathways between high and low risk groups.
FIG. 4 shows the clinical significance and prognostic role of the model: (A-F) correlations between the risk score and various clinical characteristics, (G) single-factor Cox analysis of OS for the risk score and various clinical characteristics in the TCGA cohort, (H) multi-factor Cox analysis of OS for the risk score and various clinical characteristics in the TCGA cohort, (I) nomogram combining the risk score and N stage to predict one-, two- and three-year OS in the TCGA cohort, (J) calibration curves of the nomogram in the TCGA cohort, assessing the agreement between predicted and actual one-, two- and three-year overall survival. * P < 0.05; ** P < 0.01; *** P < 0.001; **** P < 0.0001.
FIG. 5 shows the molecular profile and immune landscape of the high and low risk groups: (A) waterfall plot of somatic mutations in high and low risk patients in the TCGA cohort, (B) comparison of tumor mutation burden between high and low risk patients in the TCGA cohort, (C) proportions of immune infiltrating cells compared between high and low risk groups using the CIBERSORT algorithm, (D) IPS scores of high and low risk patients in the TCGA cohort, (E) TIDE scores of high and low risk patients in the TCGA cohort. * P < 0.05; ** P < 0.01; *** P < 0.001.
FIG. 6 is a graph of the relationship between the risk score and drug sensitivity, covering chemotherapy drugs, targeted drugs and a proteasome inhibitor. IC50: half maximal inhibitory concentration.
FIG. 7 shows the expression of TSC22D2 and its prognostic value: (A) Kaplan-Meier curve of overall survival between the high and low TSC22D2 expression groups in the ICGC cohort, (B) differences in TSC22D2 expression at the protein level between cancer and normal tissues according to the CPTAC website, (C) Western blot analysis showing differences in TSC22D2 protein expression between pancreatic cancer cell lines and a human pancreatic ductal epithelial cell line, (D) relative mRNA expression of TSC22D2 in pancreatic adenocarcinoma and normal samples according to the GEPIA website, (E) relationship between TSC22D2 relative mRNA expression and clinical staging according to the GEPIA website, (F) RT-qPCR assay showing relative mRNA expression of TSC22D2 in pancreatic cancer cell lines and human pancreatic ductal epithelium. * P < 0.05; ** P < 0.01; *** P < 0.001; **** P < 0.0001.
FIG. 8 shows the prognostic model constructed using LASSO-Cox regression analysis: (A) LASSO regression analysis in the training set, (B) confidence interval for each lambda, (C) hazard ratios of the three genes in the predictive prognosis model calculated using multivariate Cox regression analysis.
FIG. 9 is an evaluation of the predictive prognostic model in the GEO test set: (A) risk score, survival time, survival status and 3-gene expression trend in the test set (GSE85916, 80 samples), (B) ROC curves evaluating the sensitivity and specificity of the risk score in predicting one-, two- and three-year OS of pancreatic cancer patients in the GEO test set, and (C) Kaplan-Meier curve of OS between high and low risk groups in the GEO test set.
FIG. 10 shows the molecular profile and immune landscape of the high and low risk groups: (A) correlation between the risk score and tumor mutation burden, showing the Spearman correlation coefficient (R) and the corresponding P value, (B) correlation between the risk score and MSI in the TCGA pancreatic adenocarcinoma cohort, (C) expression levels of representative immune checkpoint genes in high and low risk pancreatic adenocarcinoma patients in the TCGA cohort. * P < 0.05; ** P < 0.01; *** P < 0.001; **** P < 0.0001.
FIG. 11 shows the prognostic value and expression of TSC22D2: (A) Kaplan-Meier curve for OS between high and low TSC22D2 expression groups in the TCGA cohort, (B) TSC22D2 expression of pancreatic cancer cells in the Cancer Cell Line Encyclopedia database.
Detailed Description
To further explain the technical means and effects of the present invention adopted to achieve the predetermined object, the following detailed description of the embodiments, structures, features and effects according to the present invention will be made with reference to the accompanying drawings and preferred embodiments.
The method for constructing the pancreatic cancer prognosis prediction model based on the copper death-related gene shown in FIGS. 1 to 11 comprises the following steps:
step 1, collecting clinical follow-up data and gene expression data of a pancreatic cancer sample to form a data set, and removing the sample without survival state, survival time and clinical follow-up information from the data set;
step 2, obtaining a copper death related gene;
step 3, screening three genes which obviously influence the pancreatic cancer prognosis from the copper death-related genes according to a LASSO-Cox regression analysis method;
step 4, constructing a prediction prognosis model consisting of three genes according to the expression quantity of the three significant genes and the pancreatic cancer prognosis information of the ICGC database;
step 5, carrying out prognostic predictive analysis on pancreatic cancer case data with an enlarged sample number based on the prognostic predictive model.
In this example, clinical follow-up data and gene expression data were collected from 472 pancreatic cancer samples from 3 independent public datasets, including 216 International Cancer Genome Consortium (ICGC) PACA-CA samples (https://dcc.icgc.org/releases/current/Projects; 213 samples remained after samples with insufficient information were removed), 80 GSE85916 samples (GEO, https://www.ncbi.nlm.nih.gov/geo/; 79 samples remained after samples with insufficient information were removed), and 176 samples from The Cancer Genome Atlas (TCGA) pancreatic adenocarcinoma cohort. The latest TCGA gene expression profile and the corresponding clinical follow-up data were downloaded via the UCSC Xena genome browser database (https://xenabrowser.net/datapages/) on April 14, 2022.
These data were processed in the following steps: 1) Samples without survival state, survival time and clinical follow-up information were removed from the analysis; specifically, these samples were screened out using R software. 2) For genes with multiple entries under one gene symbol, the median expression level was taken. 3) Genes identified by Ensembl IDs were converted to gene symbols. 4) Gene expression data were converted to log2 (TPM + 1) format.
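The processing steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used in this example: the record layout, sample identifiers and expression values are hypothetical, step 2 is read as taking the median over rows that share one gene symbol, and the Ensembl ID conversion of step 3 is omitted because it depends on an external annotation table.

```python
import math
from statistics import median

# Hypothetical clinical records: (sample_id, survival_status, survival_time_days)
clinical = [
    ("S1", "dead", 420),
    ("S2", None, 300),      # missing survival status, dropped
    ("S3", "alive", None),  # missing survival time, dropped
    ("S4", "alive", 910),
]


def drop_incomplete(records):
    """Keep only samples with both a survival status and a survival time."""
    return [r for r in records if r[1] is not None and r[2] is not None]


def collapse_by_symbol(rows):
    """Take the median TPM of rows that share one gene symbol."""
    grouped = {}
    for symbol, tpm in rows:
        grouped.setdefault(symbol, []).append(tpm)
    return {symbol: median(values) for symbol, values in grouped.items()}


def log2_tpm_plus_one(tpm):
    """Convert a TPM value to log2(TPM + 1)."""
    return math.log2(tpm + 1.0)


kept = drop_incomplete(clinical)

# Hypothetical TPM rows for one sample; PRKDC appears twice.
expr_rows = [("PRKDC", 7.0), ("PRKDC", 15.0), ("TSC22D2", 3.0)]
collapsed = collapse_by_symbol(expr_rows)
transformed = {g: log2_tpm_plus_one(v) for g, v in collapsed.items()}
```

Adding 1 before taking the logarithm keeps genes with zero TPM at a finite value of 0.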
35 copper death-related genes were obtained from the Science article "Copper induces cell death by targeting lipoylated TCA cycle proteins"; one gene that did not match the dataset was removed, leaving 34 genes.
The single-factor Cox regression analysis uses R software to screen which single genes among the 34 copper death-related genes are significantly related to survival prognosis (p < 0.05), while the multi-factor Cox regression analysis tests simultaneously whether multiple genes are significantly related to survival prognosis (p < 0.05). The calculation steps of the above analyses are implemented by corresponding code: the single-factor and multi-factor Cox analyses are implemented with the "survival" R package, and the LASSO regression analysis is implemented with the "glmnet" R package.
Multiple genes that are significantly related to survival prognosis can be selected by the method of the application to construct the model for predicting prognosis.
Single-factor Cox regression analysis, using the "survival" package in the R language, was applied to screen single genes with independent prognostic value for the overall survival of the patients in the training set, and the following 3 genes were finally obtained: TSC22D2, C6orf136 and PRKDC. LASSO regression analysis (the "glmnet" package in the R language) then confirmed that all 3 genes are key genes, so that the number of genes could not be reduced further. Finally, multi-factor Cox regression analysis was used to obtain the coefficient of each of the three genes, and a risk scoring formula, namely the prediction prognosis model, was constructed from these coefficients and the expression quantity of each gene.
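To make the single-factor screening step more concrete, the following is a minimal Python sketch of a one-covariate Cox regression fitted by Newton-Raphson on the partial likelihood. It is not the code of this example, which relies on the "survival" R package; the function name `cox_univariate` and the toy data are hypothetical, and ties are handled with the Breslow approximation as an assumption of the sketch.

```python
import math


def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit a single-covariate Cox model by Newton-Raphson.

    Ties are handled with the Breslow approximation.
    Returns (beta, standard_error, two_sided_wald_p_value).
    """
    beta = 0.0
    for _ in range(max_iter):
        grad, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored samples only appear in risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # sample j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            grad += x_i - mean             # score contribution
            info += s2 / s0 - mean * mean  # information contribution
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Hypothetical follow-up times, event flags (1 = death) and expression values.
times = [5, 8, 12, 14, 20, 25, 30, 36]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expr = [2.9, 3.4, 2.1, 2.6, 1.8, 2.8, 1.2, 1.5]

beta, se, p = cox_univariate(times, events, expr)
hazard_ratio = math.exp(beta)
```

In a screening loop, this fit would be repeated for each of the 34 candidate genes, and genes with p < 0.05 would be retained for the LASSO step.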
The categorical variables were analyzed using Fisher's exact test or the chi-square test. Continuous variables were analyzed using the Wilcoxon rank sum test or the Kruskal-Wallis H test. The correlation between two continuous variables was examined using Spearman correlation analysis. All statistical analyses were performed using R software. All statistical tests were two-tailed unless otherwise noted. A difference was considered statistically significant at P < 0.05.
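As an illustration of one of these tests, the Spearman correlation coefficient can be sketched in Python as the Pearson correlation of ranks. The function names and the sample values below are hypothetical; the analyses in this example were performed in R.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical risk scores and a second continuous variable.
rho = spearman([0.2, 0.5, 0.9, 1.4], [3.0, 4.1, 4.0, 7.5])
```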
The 213 samples of the ICGC-PACA-CA dataset were randomly divided into a training set (n = 152, 70%) and a validation set (n = 61, 30%). To guard against random partitioning bias, the 213 samples were randomized 1000 times in advance using the createDataPartition function of the "caret" R package. The partitioning of the training and validation sets was not biased, because there were no differences between the two groups in characteristics such as overall survival, gender, age, AJCC staging and histological staging (see Table 1). In addition, the test sets included samples from the TCGA pancreatic adenocarcinoma cohort and the GSE85916 cohort.
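A 70%/30% partition of this kind can be sketched in Python as a plain stratified random split. This is not a port of the createDataPartition function; the function name `stratified_split`, the stratification by survival status, the seed and the label counts are all assumptions made for illustration, so the resulting set sizes differ slightly from the 152/61 split reported above.

```python
import math
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/validation sets within each label."""
    rng = random.Random(seed)
    by_label = {}
    for index, label in enumerate(labels):
        by_label.setdefault(label, []).append(index)
    train = []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = math.ceil(train_fraction * len(indices))
        train.extend(indices[:n_train])
    train_set = set(train)
    validation = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), validation


# Hypothetical survival-status labels for 213 samples.
labels = ["dead"] * 141 + ["alive"] * 72
train_idx, valid_idx = stratified_split(labels)
```

Repeating the split with different seeds and comparing the clinical characteristics of the two sets is one way to check that a given partition is not biased.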
TABLE 1 difference in clinical pathology between training and validation sets
(Table 1 is provided as an image in the original publication.)
The predictive prognosis model divides patients into high-risk and low-risk cohorts based on three significant genes (TSC22D2, C6orf136, PRKDC). Patients with pancreatic adenocarcinoma in the high-risk cohort had a poor prognosis, and consistent results were obtained when the model was validated in three external cohorts. The predictive prognostic model is an independent predictor of overall survival (HR = 10.7, p < 0.001) and was used to create a nomogram with good prognostic value. High-risk patients have a high TP53 mutation rate and benefit less from immunotherapy, but respond well to a variety of targeted therapies and chemotherapeutic drugs. Finally, the hub gene TSC22D2 in the predictive prognostic model was found to be an independent prognostic factor for overall survival (p < 0.001). TSC22D2 was highly expressed in pancreatic cancer tissues/cells compared to normal tissues/cells, which has been validated by public databases and experiments.
The model constructed by the pancreatic cancer prognosis prediction model construction method based on the copper death related gene is a new model based on three key genes participating in the copper death process, and provides a reliable biomarker for predicting the prognosis and treatment response of a pancreatic adenocarcinoma patient. The potential prognostic role of the TSC22D2 gene in pancreatic adenocarcinoma remains to be further explored.
As shown in Table 2, the present invention includes 35 copper death-related genes and their associated false discovery rates (FDRs). Of these, 34 copper death-related genes were successfully matched to the training set (the TMEM191B gene could not be matched). Single-factor Cox regression analysis was used in the training set to investigate the prognostic relevance of these genes to overall survival through the coxph function of the "survival" R package. Then, a LASSO regression model was constructed using the "glmnet" R package, and the confidence interval for each lambda was analyzed under 5-fold cross-validation to prevent overfitting. Finally, the coefficient of each gene was obtained by multi-factor Cox regression analysis, and a risk scoring formula was established. Patients were divided into two risk groups according to the median of the risk scores. The "timeROC" software package was used on the three additional datasets to analyze the specificity and sensitivity of the model for predicting 1-, 2- and 3-year overall survival. The ROC performance was evaluated by calculating the area under the curve (AUC). Kaplan-Meier (KM) curve analysis was carried out with the "survminer" software package to compare overall survival between high and low risk groups in the training, validation and test sets.
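The Kaplan-Meier estimate that underlies these survival curves can be sketched in Python as follows. The sketch shows only the product-limit estimator, not the plotting or the comparison between groups performed with "survminer"; the function name `kaplan_meier` and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    curve = []
    survival = 1.0
    event_times = sorted(set(t for t, e in zip(times, events) if e))
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times and event flags (1 = death, 0 = censored).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

For the toy data, the survival probability drops at times 2, 3 and 5; the censored samples at times 3 and 8 reduce the number at risk without producing a drop of their own.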
TABLE 2 35 copper death-related genes and their associated False Discovery Rates (FDR)
(Table 2 is provided as an image in the original publication.)
The risk score and the clinical feature correlation of the model constructed by the method are visualized by a violin diagram by using a ggplot2 software package. Independent prognostic value of 3 gene signature models and other clinical features (tumor grade, T stage, N stage, gender, age and AJCC stage) were assessed using single and multifactorial Cox regression analysis. The results of univariate and multifactor Cox regression analysis based on TCGA cohorts were analyzed by the "forest" software package and displayed with random forest plots.
The "rms" software package was used to create nomograms based on the independent prognostic factors identified in multivariate Cox regression analysis. Each variable in the nomogram scoring system is assigned a score, and the total score is calculated by adding all the scores. Calibration curves were plotted to show the consistency between the nomogram predictions and clinical observations for 1-, 2- and 3-year overall survival.
In the analysis of molecular and immunological properties, the "DESeq2" package was used to identify differentially expressed genes (DEGs) between the high and low risk groups in the TCGA dataset. |log2FoldChange| > 1 and a false discovery rate adjusted P value < 0.01 were set as the thresholds for differential genes. The result was visualized as a volcano plot using the "ggplot2" software package. GO functional enrichment analysis was performed on these DEGs using the "clusterProfiler" software package. Data on the mutated genes were downloaded from the TCGA data portal (https://portal.gdc.cancer.gov). The MutSig2.0 method was used to identify significantly mutated genes, and the mutated genes in both risk groups were visualized using the "maftools" software package. The correlation between risk score and tumor mutation burden (TMB) was analyzed and plotted using the "ggplot2" software package. The differences in TMB and microsatellite instability (MSI) scores between the two risk groups were also visualized as boxplots with the "ggplot2" package. Twenty-two tumor-infiltrating immune cell subsets were analyzed using the CIBERSORT algorithm (https://cibersort.stanford.edu/), and the differential expression of selected immune checkpoints, such as CD274, CD276, CD44 and CD40, between the high and low risk groups was examined.
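The differential gene thresholds above can be sketched as follows. This is a simplified plain-Python illustration, not the DESeq2 workflow itself: it assumes that raw P values and log2 fold changes are already available, applies the Benjamini-Hochberg adjustment (assumed here as the false discovery rate correction), and labels each gene as up-regulated, down-regulated or not significant. The gene names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_genes(log2_fold_changes, adjusted_p, lfc_cutoff=1.0, fdr_cutoff=0.01):
    """Label each gene 'up', 'down' or 'ns' using |log2FC| > 1 and adjusted P < 0.01."""
    labels = []
    for lfc, q in zip(log2_fold_changes, adjusted_p):
        if q < fdr_cutoff and lfc > lfc_cutoff:
            labels.append("up")
        elif q < fdr_cutoff and lfc < -lfc_cutoff:
            labels.append("down")
        else:
            labels.append("ns")
    return labels


# Hypothetical genes and statistics (high-risk versus low-risk group).
genes = ["geneA", "geneB", "geneC", "geneD"]
log2fc = [2.3, -1.8, 0.4, 1.5]
raw_p = [0.0001, 0.002, 0.0005, 0.2]
adj_p = benjamini_hochberg(raw_p)
labels = classify_genes(log2fc, adj_p)
```

The resulting labels are what a volcano plot typically encodes by color, with log2 fold change on the horizontal axis and the negative log10 of the adjusted P value on the vertical axis.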
In the immunotherapy response analysis, the immunophenoscore (IPS) is a good predictor of response to anti-cytotoxic T lymphocyte antigen 4 (CTLA-4) and anti-programmed cell death protein 1 (PD-1) antibodies. IPS can be obtained from The Cancer Immunome Atlas (TCIA) (https://tcia.at/) and is derived from four categories: effector cells (activated CD4+ T cells, activated CD8+ T cells, effector memory CD4+ T cells and effector memory CD8+ T cells), suppressor cells (Tregs and MDSCs), MHC-related molecules, and checkpoints or immunomodulators. The Tumor Immune Dysfunction and Exclusion (TIDE) online tool (http://tide.dfci.harvard.edu/) can assess the potential clinical efficacy of treatment with immune checkpoint inhibitors (ICIs) in patients of different risk groups. The TIDE score has been reported to outperform accepted immunotherapy biomarkers (PD-L1 levels and interferon gamma) in assessing the effectiveness of anti-PD-1 and anti-CTLA-4 therapy. The response to chemotherapy and targeted therapy was evaluated using the "pRRophetic" software package based on the Genomics of Drug Sensitivity in Cancer (GDSC) website (https://www.cancerrxgene.org/). A lower half maximal inhibitory concentration (IC50) indicates a higher sensitivity to drug treatment.
Furthermore, the TSC22D2 is taken as an example in the invention, and the prognosis effect of the pancreatic cancer prognosis prediction model construction method is further verified.
1) Expression and prognostic value of TSC22D2 in public datasets
GEPIA (http://gepia.cancer-pku.cn/) was used to compare mRNA expression of the TSC22D2 gene from the prognostic model between tumor samples (n = 179) and normal samples (n = 171). Patients were divided into two groups based on the median TSC22D2 gene expression. The prognostic power of TSC22D2 was evaluated in the TCGA and ICGC datasets using KM curves, respectively. To assess the expression of the TSC22D2 gene at the protein level, we obtained a TCGA-pancreatic adenocarcinoma proteomics cohort from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) (https://proteomics.cancer.gov/programs/cptac), including tumor samples (n = 137) and normal samples (n = 74).
2) Validation of TSC22D2 RNA and protein expression in cell lines
Pancreatic cancer cell lines (AsPC-1 and BxPC-3) were cultured in RPMI-1640 (Corning, NY, USA) containing 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin. Two additional pancreatic cancer cell lines (PANC-1, MIA PaCa-2) were cultured in DMEM (Dulbecco's modified Eagle medium) (Gibco, Grand Island, NY, USA) containing 10% FBS and 1% penicillin-streptomycin. Human pancreatic ductal epithelial (hTERT-HPNE) cells were cultured in medium D, a mixture of M3 and DMEM media consisting of one volume of M3 Base F medium (InCell Corp., San Antonio, TX, USA), three volumes of glucose-free DMEM, 5% FBS, 5.5 mM glucose, 10 ng/ml EGF and 50 μg/ml gentamicin. All cells were cultured at 37 °C in an atmosphere containing 5% carbon dioxide. RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) and reverse-transcribed into cDNA using PrimeScript RT Master Mix (Takara, Otsu, Shiga, Japan). RT-qPCR was performed using PowerUp SYBR Green Master Mix (Applied Biosystems, Austin, TX, USA), and expression levels were normalized to GAPDH expression levels. Proteins were extracted in RIPA buffer supplemented with protease and phosphatase inhibitors (Thermo Scientific). Proteins were separated by SDS-PAGE and transferred onto a PVDF membrane. Anti-TSC22D2 (1 dilution, #25418-1-AP, Proteintech) was used as the primary antibody for immunoblotting. Detection was performed using a chemiluminescent detection system.
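The normalization of RT-qPCR expression levels to GAPDH can be illustrated with the commonly used 2^(-ΔΔCt) calculation. The text above states only that expression was normalized to GAPDH; the use of the ΔΔCt method, the function name relative_expression and the Ct values below are assumptions made for illustration.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene versus a control sample by the 2^(-ddCt) method.

    ct_target, ct_reference           -- Ct of target gene and reference gene in the sample
    ct_target_ctrl, ct_reference_ctrl -- the same two Ct values in the control sample
    """
    delta_sample = ct_target - ct_reference              # normalize sample to reference gene
    delta_control = ct_target_ctrl - ct_reference_ctrl   # normalize control to reference gene
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: TSC22D2 and GAPDH in a cancer cell line versus hTERT-HPNE.
fold = relative_expression(ct_target=24.0, ct_reference=18.0,
                           ct_target_ctrl=26.0, ct_reference_ctrl=18.0)
```

With these hypothetical values the target gene reaches threshold two cycles earlier in the cancer line after normalization, which corresponds to a four-fold higher relative expression.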
The specific test results of the prediction model constructed according to the method of the application are as follows:
1) Workflow chart and construction of prognosis model
The main workflow of the present invention is shown in FIG. 1. Of the 34 copper death-related genes, only 3 genes (TSC22D2, C6orf136 and PRKDC) significantly affected the overall survival of pancreatic adenocarcinoma patients in the training set, based on univariate Cox regression analysis. LASSO regression analysis was then performed, and the trajectory of the coefficient of each gene against log(lambda) is shown in FIG. 8A. The confidence interval for each lambda is shown in FIG. 8B, which illustrates that the model is optimal when lambda is 0.04134, that is, when three genes are retained. The coefficients of the three genes were then obtained by multivariate Cox regression analysis, and the predictive prognosis model was constructed. FIG. 8C shows the risk scores, survival times, survival status and expression trends of the 3 genes in the test set (GSE85916, 80 samples).
2) Risk assessment
FIG. 10 shows the molecular profile and immune landscape of the high and low risk groups: (A) correlation between risk score and tumor mutation burden, showing the Spearman correlation coefficient (R) and the corresponding P value; (B) correlation between risk score and MSI in the TCGA-pancreatic adenocarcinoma cohort; (C) expression levels of representative immune checkpoint genes in high and low risk pancreatic adenocarcinoma patients in the TCGA cohort. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001.
FIG. 11 shows the prognostic value and expression of TSC22D 2: (A) Kaplan-Meier curve for high and low TSC22D2 expression across the groups in TCGA cohort, (B) TSC22D2 expression of pancreatic cancer cells in cancer cell line encyclopedia database.
In the predictive prognosis model, the hazard ratios of the three genes were calculated by multivariate Cox regression analysis. The results show that PRKDC (hazard ratio (HR) = 1.34, 95% CI 1.10-1.63, p = 0.003), C6orf136 (HR = 0.71, 95% CI 0.54-0.92, p = 0.011) and TSC22D2 (HR = 1.85, 95% CI 1.35-2.53, p < 0.001) each had a significant effect on overall survival.
PRKDC and TSC22D2 are both risk factors (HR > 1), while C6orf136 is a protective factor (HR < 1). Then, a predictive prognosis model for all samples was constructed using a formula based on the risk coefficients of these three genes:
"Risk Score=0.035×e^(0.292×PRKDC-0.347×C6orf136+0.613×TSC22D2)"
The 152 samples in the ICGC training set were divided into a high risk group (n = 76) and a low risk group (n = 76), with the median risk score as the cutoff. The distribution of risk scores showed that more deaths were observed in the high risk group (FIG. 2A). As the risk score increased, the expression of TSC22D2 and PRKDC also increased significantly, while the expression of C6orf136 showed a significant downward trend. Based on ROC analysis, our model had good predictive performance for overall survival in the training set (AUC for 1-year overall survival = 0.64). KM curve analysis of the training set showed a poorer prognosis for patients in the high risk group than for patients in the low risk group (p = 0.00011, FIG. 2C).
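The risk scoring formula and the median-based grouping can be sketched as follows. The coefficients and the scaling factor 0.035 are taken from the formula given in the text; the expression values are hypothetical and are assumed to be in log2(TPM + 1) format, and the helper names risk_score and split_by_median are illustrative only.

```python
import math
from statistics import median

# Coefficients and scaling factor from the risk scoring formula in the text.
COEFFICIENTS = {"PRKDC": 0.292, "C6orf136": -0.347, "TSC22D2": 0.613}
SCALE = 0.035


def risk_score(expression):
    """Risk score = 0.035 * e^(sum of coefficient * expression over the three genes)."""
    linear = sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)
    return SCALE * math.exp(linear)


def split_by_median(scores):
    """Scores greater than or equal to the median are 'high' risk, the rest 'low'."""
    cutoff = median(scores)
    return ["high" if s >= cutoff else "low" for s in scores]


# Hypothetical expression values for four patients, assumed to be log2(TPM + 1).
patients = [
    {"PRKDC": 5.0, "C6orf136": 3.0, "TSC22D2": 4.0},
    {"PRKDC": 4.0, "C6orf136": 4.0, "TSC22D2": 3.0},
    {"PRKDC": 6.0, "C6orf136": 2.0, "TSC22D2": 5.0},
    {"PRKDC": 3.5, "C6orf136": 4.5, "TSC22D2": 2.5},
]
scores = [risk_score(p) for p in patients]
groups = split_by_median(scores)
```

In this toy example the patients with higher PRKDC and TSC22D2 expression and lower C6orf136 expression receive the higher scores and fall into the high risk group, in line with the signs of the coefficients.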
3) Predictive prognostic models in evaluation validation set and test set
The 61 patients in the ICGC validation set were divided into high risk (n = 30) and low risk (n = 31) groups according to the median risk score of the risk formula. The risk score distribution in the validation set indicated that higher risk scores were associated with more deaths (FIG. 2D). ROC curves were then plotted; in the validation set, the 1-year, 2-year and 3-year AUCs for overall survival prediction were 0.66, 0.71 and 0.75, respectively (FIG. 2E). KM curve analysis then showed that the prognosis of patients with higher risk scores was significantly worse than that of the low risk group (p = 0.00062, FIG. 2F).
To further validate the robustness of the 3-gene predictive prognosis model, we selected 176 samples from the TCGA-PAAD dataset and 79 samples from the GSE85916 dataset as test sets, and performed the above analysis in these two independent datasets. The distribution of risk scores again showed that the high risk population tended to have a higher risk of death. The expression trends of the three genes in the test sets (TCGA-PAAD cohort, FIG. 3A; GSE85916 cohort, FIG. 9A) were consistent with those in the training and validation sets. In the TCGA test set, the 1-year, 2-year and 3-year AUCs of this model were 0.61, 0.64 and 0.71, respectively (FIG. 3B). In addition, the 1-year, 2-year and 3-year AUCs for overall survival prediction in the GSE85916 test set were 0.53, 0.62 and 0.68, respectively (FIG. 9B). KM curve analysis showed a significantly poorer clinical prognosis for patients in the high risk group (TCGA-PAAD dataset, p = 0.024, FIG. 3C; GSE85916 dataset, p = 0.05, FIG. 9C), consistent with the results in the training and validation sets described above.
To investigate the potential mechanisms underlying the different outcomes of the groups stratified by the predictive prognosis model, we performed GO enrichment analysis in the TCGA cohort. As shown in the volcano plot (FIG. 3D), a total of 342 up-regulated genes and 1149 down-regulated genes were identified in the high risk group relative to the low risk group (|log2FoldChange| > 1, p < 0.01). Significantly enriched GO pathways were plotted using p-value < 0.01 and FDR < 0.01 as thresholds (FIG. 3E).
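The meaning of the AUC values reported above can be illustrated with a simplified calculation. The sketch below assumes that the survival status of every patient at the chosen time point (for example, 1 year) is known, and computes the AUC as the probability that a patient who died has a higher risk score than a patient who survived. It does not handle censoring, which time-dependent ROC methods such as those in the "timeROC" package are designed to address; the data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores -- risk score per patient
    labels -- 1 if the patient died before the time point, 0 if alive at the time point
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and 1-year survival status for six patients.
risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
died_within_horizon = [1, 1, 0, 1, 0, 1]
auc = roc_auc(risk, died_within_horizon)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported values between 0.53 and 0.75.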
4) Correlation between risk score and clinical pathology characteristics
Risk scores were calculated in the TCGA cohort to check whether they correlated with clinicopathological characteristics (FIG. 3D-I). The risk score was significantly correlated (p < 0.05) with tumor grade, AJCC stage, T stage and N stage, whereas gender was not associated with the risk score. This suggests that high risk patients have higher tumor grade, higher AJCC stage, higher risk of lymphatic metastasis and larger tumor volume. Patients under 65 years of age had an increased risk of death (p < 0.05). This is consistent with the previous understanding that young pancreatic adenocarcinoma patients may have more malignant tumors and a poorer prognosis. Therefore, our predictive prognosis model reliably reflects clinical features.
5) Evaluating the independent prognostic value and the prognostic accuracy of the model
To investigate whether clinical indices and the predictive prognosis model are independent prognostic factors, we performed univariate and multivariate Cox regression analyses in the TCGA-pancreatic adenocarcinoma cohort. Univariate Cox regression analysis showed that T stage, N stage and the predictive prognosis model were significantly associated with overall survival (p < 0.05, FIG. 4A). Meanwhile, multivariate Cox regression analysis showed that the predictive prognosis model was the only independent prognostic indicator for overall survival (HR = 10.7, p < 0.001, FIG. 4B).
To make the predictive prognosis model more applicable clinically, we constructed a nomogram model using N stage and risk score, which are considered independent factors in the multivariate Cox regression analysis (FIG. 4C). Using these two factors, a score is assigned for each and summed into a total score, which predicts the 1-year, 2-year and 3-year survival probabilities of pancreatic adenocarcinoma patients. Calibration curves were plotted to analyze the consistency between the values predicted by the nomogram model and the actually observed values. As shown in FIG. 4D, the calibration curves for 1-year, 2-year and 3-year overall survival in the TCGA cohort are close to the ideal curve, indicating that the nomogram model has good predictive performance.
6) Exploring molecular signatures and immune landscape between populations of high and low risk groups
To further understand the immunological nature, we analyzed the genetic mutations of the two risk groups. FIG. 4E shows the top 17 mutated genes among the known driver genes of pancreatic adenocarcinoma. Missense mutations are the most common type of mutation, followed by frameshift deletions and nonsense mutations. The mutation rates of the top four mutated genes (KRAS, TP53, SMAD4, CDKN2A) were all above 15% in both groups, with TP53 mutations being more common in the high risk group and TTN, RYR1 and GNAS mutations being more common in the low risk group.
7) Relationship between Risk score and TMB and MSI
There was a slight negative correlation between risk score and TMB (R = -0.166, p < 0.05, FIG. 10A). The TMB score of the high risk group was significantly lower than that of the low risk group (p < 0.05, FIG. 5B), while the MSI scores of the two groups were not statistically different (FIG. 10B).
Since immune response-related pathways correlate with this model, we compared the distribution of 22 immune cell types to analyze the tumor microenvironment (TME) of both risk groups (FIG. 5C). Eosinophils and activated natural killer (NK) cells were more abundant in the low risk group, while activated dendritic cells and activated memory CD4 T cells were more abundant in the high risk group. In view of the importance of ICIs in the treatment of pancreatic adenocarcinoma, we collected immune modulatory genes of pancreatic adenocarcinoma from previous literature and analyzed their expression in the two risk groups separately (FIG. 10C). The results show that the expression of almost all immune checkpoint molecules was significantly higher in the high risk group than in the low risk group (all p < 0.05).
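The Spearman correlation used for the relationship between risk score and TMB can be sketched in plain Python as the Pearson correlation of the ranks. This is an illustrative implementation, not the R routine used in the analysis, and the paired values below are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical risk scores and TMB values for five patients.
risk_scores = [0.2, 0.5, 0.9, 1.4, 2.0]
tmb = [3.1, 2.4, 2.8, 1.0, 0.7]
rho = spearman(risk_scores, tmb)
```

A negative coefficient, as in this toy example and in the reported R = -0.166, means that higher risk scores tend to accompany lower TMB; the magnitude indicates how consistently the two rankings oppose each other.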
8) Predicting treatment response between high risk group and low risk group
The relationship between patient risk group and immune response was analyzed using the IPS method. In the TCGA cohort, IPS was significantly higher in the low risk group than in the high risk group (p < 0.001, FIG. 5D), indicating that patients in the low risk group may respond better to ICIs. Moreover, TIDE was also used to assess the potential clinical efficacy of immunotherapy in the different risk groups. A higher TIDE score indicates a higher likelihood of tumor immune escape and a lower likelihood of benefit from ICIs. In addition, a higher TIDE score correlates with poorer outcomes. As shown in FIG. 5E, the high risk group had a higher predicted TIDE score, and therefore may have a lower sensitivity to ICIs and a poorer prognosis.
Chemotherapy and targeted therapy are important components of pancreatic cancer treatment that improve the symptoms and survival of pancreatic cancer patients, and their combination is more effective in most pancreatic cancers. Many patients may fail chemotherapy and targeted therapy due to drug resistance. Sensitivity to a total of 251 chemotherapeutic and targeted drugs from the GDSC website was analyzed in the high risk and low risk groups of the TCGA cohort. FIG. 6 shows the relationship between risk score and drug sensitivity, including chemotherapy, targeted drugs and proteasome inhibitors. IC50: half maximal inhibitory concentration. As shown in FIG. 6A-I, 9 drugs had significantly different sensitivities in the two risk groups: 4 chemotherapeutic drugs (paclitaxel, ji Boteng tan, vinorelbine, bei Shaluo tikitin), 4 cancer-targeted drugs (midostaurin, pazopanib, sorafenib, imatinib) and 1 proteasome inhibitor (bortezomib). The IC50 values of these chemotherapeutic and targeted drugs were significantly lower in the high risk group, indicating that patients in the high risk group were more sensitive to these drugs.
9) TSC22D2 is a prognostic gene for pancreatic adenocarcinoma
We screened each gene in the 3-gene predictive prognosis model separately by KM curve analysis. In the ICGC and TCGA cohorts, TSC22D2 was found to be significantly correlated with the prognosis of pancreatic adenocarcinoma. High expression of TSC22D2 in ICGC-PACA-CA samples was associated with a poor prognosis (p < 0.001, FIG. 7A), and the result was similar in TCGA-PAAD samples (p < 0.05, FIG. 11A). Next, we obtained data from CPTAC and studied the expression of TSC22D2 at the protein level. As shown in FIG. 7B, protein expression of TSC22D2 was significantly higher in pancreatic adenocarcinoma samples (n = 137) than in normal samples (n = 74) (p < 0.001). We then analyzed the expression of TSC22D2 at the RNA level. As shown in FIG. 7D, the GEPIA data showed that the expression of the TSC22D2 gene was significantly higher in pancreatic cancer tissue (n = 179) than in normal tissue (n = 171) (p < 0.001). High expression of TSC22D2 was associated with higher AJCC stage (p = 0.026, FIG. 7E), and higher AJCC stage generally indicates poorer prognosis.
We confirmed by western blot that the pancreatic cancer cell lines (BxPC-3, PANC-1, MIA PaCa-2) expressed higher levels of TSC22D2 protein than hTERT-HPNE (FIG. 7C). We also confirmed by qRT-PCR that TSC22D2 expression in pancreatic cancer cell lines (BxPC-3, PANC-1) was higher than in hTERT-HPNE (p < 0.001, FIG. 7F). The differential expression was consistent with the results of the Cancer Cell Line Encyclopedia (CCLE) database (FIG. 11B). Therefore, TSC22D2 was highly expressed in pancreatic cancer tissues/cells compared to normal tissues/cells and was confirmed to be an independent prognostic gene for pancreatic adenocarcinoma.
The invention adopts LASSO-Cox regression analysis to establish a 3-gene prediction prognosis model which is an independent prognosis factor of the overall survival period. We then explored the clinical significance of this predictive prognostic model and analyzed its predictive power for various therapeutic responses, including immunotherapy, chemotherapy and targeted therapy. In addition, the expression of the newly discovered prognostic gene TSC22D2 was also confirmed by public databases and experiments.
There are several reasons for using the median risk score as the cutoff for dividing the training, validation and test sets into high and low risk groups. First, the median is a conservative choice that avoids data manipulation and presents an objective comparison. Second, not relying on an optimized threshold reduces the influence of a small number of patients on the prognostic value and improves the generalizability of the developed model. Genetic mutations were analyzed to further understand the immunological properties of the two risk groups. In line with previous studies, missense mutations are the most common type of mutation, followed by frameshift deletions and nonsense mutations. The largest difference in mutations between the two risk groups was in TP53, which was mutated more often in the high risk group than in the low risk group (72% vs 57%). Furthermore, of the 4 major driver genes of pancreatic adenocarcinoma, the high risk group had a higher frequency of KRAS, TP53 and CDKN2A mutations. KRAS and CDKN2A mutations and alterations are early events in pancreatic tumor development. TP53 is a tumor suppressor gene that is mutated in more than 70% of pancreatic ductal adenocarcinomas and is often associated with aggressive and metastatic phenotypes. Patients with KRAS or TP53 mutations were shown to have a poorer prognosis than patients without these mutations (KRAS p = 0.0092, TP53 p = 0.013). In KPC cells from the mouse PDAC model, KRAS/TP53 mutations were shown to be closely associated with immune escape. Thus, the relatively high TP53 mutation frequency in the high risk group may indicate a poor prognosis and a low sensitivity to ICIs.
The present invention analyzes the relationship between the predictive prognosis model and known biomarkers used to predict immunotherapy response, such as TMB and MSI. TMB is considered a potential biomarker for predicting response to ICIs in many tumor types, including pancreatic adenocarcinoma. High TMB leads to more mutation-derived neoantigens recognized by the immune system, making TMB-high tumors more likely to respond to anti-PD-1/PD-L1 therapy. According to our study, the TMB of high risk patients is relatively low, which may explain why high risk patients are less likely to respond to ICIs and face a greater risk of death. Furthermore, immunotherapy has been shown to be effective in rare cases of MSI-high pancreatic adenocarcinoma. Therefore, we also compared the MSI scores of the high risk and low risk groups. However, the difference in MSI between the two risk groups was not significant. This is probably due to the very low prevalence of MSI in pancreatic adenocarcinoma (1%-2%), which is too low for differences between the two risk groups to be detected.
The different TMEs of the two risk groups may also be of significant interest for immunotherapy. Some immune cell types differed in abundance between the two risk groups. The low risk group had a higher proportion of activated NK cells and eosinophils than the high risk group. Eosinophil migration from the bone marrow, localization and activation are promoted by infection, tissue injury or inflammatory states. In addition, activated NK cells can mediate bone marrow rejection, promote engraftment and elicit potent anti-tumor effects. Thus, the higher proportion of eosinophils and activated NK cells in the low risk group indicates that low risk patients may have a good prognosis. Furthermore, we analyzed the expression of immune checkpoint molecules, most of which were found to be up-regulated in high risk patients, including PD-1/PD-L1 and CTLA4. Thus, immune responses may be suppressed in high risk patients with pancreatic adenocarcinoma, with immune escape achieved by "hijacking" the immune checkpoint pathway.
Both TIDE and IPS have been shown to be good predictors of therapeutic response to anti-CTLA-4 and anti-PD-1 antibodies. IPS was developed by machine learning, using TCGA data to analyze tumor immunogenicity and tumor escape mechanisms. TIDE is a computational method that models two tumor immune escape mechanisms: induction of T cell dysfunction in cytotoxic T lymphocyte (CTL)-high tumors and T cell exclusion in CTL-low tumors. High risk patients have higher TIDE scores and lower IPS scores than low risk patients, suggesting that high risk patients have a higher level of immune escape and may be less sensitive to ICI treatment. Furthermore, we analyzed drug sensitivity in the high risk and low risk groups. Analysis of 9 drugs showed that patients in the high risk group are likely to benefit more from these chemotherapeutic and molecularly targeted drugs. In clinical trials, most of these drugs have been shown to be effective in treating patients with pancreatic adenocarcinoma. For example, albumin-bound paclitaxel plus gemcitabine is a first-line treatment for advanced pancreatic cancer. Sorafenib is an inhibitor of B-raf, VEGFR2 and PDGFR-beta, and showed a good treatment effect on pancreatic cancer in a phase I trial. Thus, based on our predictive prognosis model, individualized treatment of pancreatic adenocarcinoma patients can be improved by optimizing the regimen of immunotherapy, chemotherapy and targeted therapy.
The predictive prognosis model consists of 3 key genes (PRKDC, C6orf136, TSC22D2) involved in the copper death process. PRKDC encodes the DNA-dependent protein kinase catalytic subunit (DNA-PKcs), which is critical for DNA double strand break repair and V(D)J recombination. Missense mutations in PRKDC lead to DNA-PKcs deficiency, which is associated with organ-specific autoimmune inflammatory diseases and loss of mature T and B cells, as well as JAK3 in T and putative natural killer cells. Chromosome 6 open reading frame 136 (C6orf136) is a conserved gene that is hypermethylated in head and neck squamous cell carcinoma cells in association with expression of the oncogene FOXM1. In addition, C6orf136 was reported as a core gene in a predictive prognosis model and was identified as a novel biomarker for bladder cancer. Notably, transforming growth factor β-stimulated clone 22 domain family member 2 (TSC22D2) was found to be an independent prognostic factor for pancreatic adenocarcinoma. Higher TSC22D2 expression was associated with poorer prognosis and higher AJCC tumor stage. Results from experiments and public databases confirmed that TSC22D2 expression was significantly higher in pancreatic cancer tissues/cells than in normal tissues/cells. Previous studies identified TSC22D2 as a novel cancer-associated gene in a rare multiple cancer family. TSC22D2 expression is significantly down-regulated in colorectal cancer, and over-expression of TSC22D2 inhibits cell growth. Given the critical role of TSC22D2 in the copper death process, depletion of TSC22D2 may reduce copper-induced cell death, which may point to a potential treatment for pancreatic adenocarcinoma. Thus, the role of TSC22D2 in cancer, particularly in pancreatic adenocarcinoma, is controversial, and its underlying mechanisms remain to be further investigated.
In the calculation formula of the risk score, the coefficients of PRKDC and TSC22D2 are both positive numbers, which indicates that the risk score is positively correlated with PRKDC and TSC22D 2. And C6orf136 is negatively correlated with risk score.
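The link between the signs of the coefficients and the hazard ratios reported earlier can be checked numerically. In a Cox model the hazard ratio for a one-unit increase in a variable equals e raised to its coefficient, so a positive coefficient gives HR > 1 and a negative coefficient gives HR < 1. The short sketch below applies this standard relationship to the three coefficients of the risk scoring formula; the variable names are illustrative.

```python
import math

# Coefficients from the risk scoring formula in the text.
coefficients = {"PRKDC": 0.292, "C6orf136": -0.347, "TSC22D2": 0.613}

# Hazard ratio per unit increase in expression: HR = exp(coefficient).
hazard_ratios = {gene: round(math.exp(beta), 2) for gene, beta in coefficients.items()}

# A gene is a risk factor when HR > 1 and a protective factor when HR < 1.
roles = {gene: ("risk" if hr > 1 else "protective") for gene, hr in hazard_ratios.items()}
```

Rounded to two decimals, the exponentiated coefficients reproduce the hazard ratios reported for PRKDC, C6orf136 and TSC22D2 in the multivariate Cox regression results.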
The invention constructs a new model based on three copper death-related genes, and the model has good performance in predicting prognosis and identifying pancreatic adenocarcinoma patients suitable for immunotherapy, targeted therapy and chemotherapy. The TSC22D2 gene related to copper death has good prediction efficiency on the overall survival time of pancreatic adenocarcinoma patients. The invention lays a good foundation for further discussing the action of copper death in pancreatic adenocarcinoma.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and any simple modification, equivalent change and modification made to the above embodiment according to the technical spirit of the present invention are still within the scope of the technical solution of the present invention.

Claims (10)

1. A method for constructing a pancreatic cancer prognosis prediction model based on a copper death-related gene is characterized by comprising the following steps of:
step 1, collecting clinical follow-up data and gene expression data of a pancreatic cancer sample to form a data set, and removing the sample without survival state, survival time and clinical follow-up information from the data set;
step 2, obtaining a copper death related gene;
step 3, screening three genes which obviously influence the pancreatic cancer prognosis from the copper death-related genes according to a LASSO-Cox regression analysis method;
step 4, constructing a prediction prognosis model consisting of three genes according to the expression quantity of the three significant difference genes and pancreatic cancer prognosis information on the ICGC database;
step 5, performing prognosis prediction analysis on pancreatic cancer case data with an enlarged sample size based on the predictive prognosis model.
2. The method for constructing a pancreatic cancer prognosis prediction model based on a copper death-related gene according to claim 1,
the step of obtaining the copper death-related genes in step 2 comprises: obtaining 35 copper death-related genes from the literature and then removing the genes that do not match the data set, the remaining genes being the copper death-related genes used for analysis and screening.
3. The method for constructing a pancreatic cancer prognosis prediction model based on copper death-related gene according to claim 1,
and converting the expression quantity of the copper death related gene into a log2 (TPM + 1) format.
4. The method for constructing a pancreatic cancer prognosis prediction model based on copper death-related gene according to claim 3,
in step 3, single genes with independent prediction and prognosis values for the overall survival rate of the patients in the training set are screened as significant genes.
5. The method for constructing a pancreatic cancer prognosis prediction model based on copper death-related gene according to claim 1,
the LASSO-Cox regression analysis method is a combination of LASSO regression analysis, single-factor Cox regression analysis and multi-factor Cox regression analysis;
primarily screening through single-factor Cox regression analysis, wherein the single-factor Cox regression analysis method screens genes which are remarkably related to survival prognosis in copper death related genes;
the LASSO regression analysis further narrowed down reliable copper death-related genes that were significantly associated with prognosis.
6. The method for constructing a pancreatic cancer prognosis prediction model based on copper death-related gene according to claim 1,
the three significant genes are PRKDC, C6orf136 and TSC22D2, which are all key genes participating in the copper death process;
in step 4, the predictive prognosis model is constructed using multivariate Cox regression analysis, and the correlation between the predictive prognosis model and other clinical characteristics is evaluated, wherein the multivariate Cox regression analysis is used to simultaneously test whether a plurality of genes are significantly correlated with survival and prognosis.
7. The method for constructing a pancreatic cancer prognosis prediction model based on copper death-related gene according to claim 6,
the other clinical features are tumor grade, T stage, N stage, sex, age and AJCC stage.
8. The method for constructing a pancreatic cancer prognosis prediction model based on copper death-related gene according to claim 6,
creating a nomogram based on independent prognostic factors, wherein the predictive prognosis model and other clinical features are variables in the nomogram, giving a score to each variable in a nomogram scoring system, and adding all the scores to calculate a total score for predicting the prognostic survival probability of the pancreatic cancer patient 1 year, 2 years and 3 years later;
a calibration curve was plotted to assess the consistency of nomogram predicted pancreatic cancer survival and clinical observations at 1, 2 and 3 year prognosis.
9. The method for constructing a pancreatic cancer prognosis prediction model based on copper death-related gene according to claim 6,
the coefficient of each gene is obtained through the multivariate Cox regression analysis, and a risk scoring formula is established from the expression level of each gene:
risk score = 0.035 × e^(0.292 × PRKDC - 0.347 × C6orf136 + 0.613 × TSC22D2)
the risk scoring formula is the prognosis prediction model, and the median risk score is used as the cutoff value, wherein patients with a score greater than or equal to the cutoff form the high-risk group and patients with a score below the cutoff form the low-risk group.
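A short sketch of the risk score and the median split may make the claim concrete. It reads the formula as 0.035 multiplied by e raised to the weighted sum of the three expression values, which is an interpretation of the published text. The function names and the expression values are hypothetical; only the three coefficients and the constant 0.035 come from the claim.

```python
import math
from statistics import median

# Coefficients and leading constant as stated in the claim.
COEFFS = {"PRKDC": 0.292, "C6orf136": -0.347, "TSC22D2": 0.613}
SCALE = 0.035

def risk_score(expression):
    """Risk score for one patient from the three gene expression values."""
    linear = sum(coef * expression[gene] for gene, coef in COEFFS.items())
    return SCALE * math.exp(linear)

def stratify(scores):
    """Split a cohort at the median: 'high' if score >= median, else 'low'."""
    cutoff = median(scores)
    return cutoff, ["high" if s >= cutoff else "low" for s in scores]

# Hypothetical expression values, for illustration only.
cohort = [
    {"PRKDC": 2.0, "C6orf136": 1.0, "TSC22D2": 1.0},
    {"PRKDC": 1.0, "C6orf136": 2.0, "TSC22D2": 0.5},
    {"PRKDC": 3.0, "C6orf136": 0.5, "TSC22D2": 2.0},
    {"PRKDC": 0.5, "C6orf136": 3.0, "TSC22D2": 1.0},
]
scores = [risk_score(patient) for patient in cohort]
cutoff, groups = stratify(scores)
```

The negative coefficient on C6orf136 means that higher expression of that gene lowers the score, while higher PRKDC or TSC22D2 expression raises it.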
10. The method for constructing a pancreatic cancer prognosis prediction model based on copper death-related gene according to claim 6,
categorical variables are analyzed by Fisher's exact test or the chi-square test;
continuous variables are analyzed by the Wilcoxon rank-sum test or the Kruskal-Wallis H test;
the correlation between two continuous variables is tested by Spearman correlation analysis;
all statistical analyses are performed using R software.
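Two of the named tests are small enough to sketch directly. The claim specifies R; the Python below is a stand-in for illustration, with hypothetical function names and made-up inputs. It shows a two-sided Fisher's exact test for a 2x2 table (for example, risk group against a binary clinical feature) and Spearman's rank correlation with average ranks for ties.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(k) for k in range(low, high + 1)
               if prob(k) <= observed * (1 + 1e-9))

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    var_x = sum((p - mx) ** 2 for p in rx)
    var_y = sum((q - my) ** 2 for q in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (var_x * var_y) ** 0.5

# Made-up inputs, for illustration only.
p_value = fisher_exact_2x2(3, 1, 1, 3)
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

In R the corresponding calls would be `fisher.test` and `cor.test(..., method = "spearman")`.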
CN202211330510.6A 2022-10-27 2022-10-27 Pancreatic cancer prognosis prediction model construction method based on copper death related gene Active CN115497562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211330510.6A CN115497562B (en) 2022-10-27 2022-10-27 Pancreatic cancer prognosis prediction model construction method based on copper death related gene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211330510.6A CN115497562B (en) 2022-10-27 2022-10-27 Pancreatic cancer prognosis prediction model construction method based on copper death related gene

Publications (2)

Publication Number Publication Date
CN115497562A CN115497562A (en) 2022-12-20
CN115497562B CN115497562B (en) 2023-04-14

Family

ID=85115023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211330510.6A Active CN115497562B (en) 2022-10-27 2022-10-27 Pancreatic cancer prognosis prediction model construction method based on copper death related gene

Country Status (1)

Country Link
CN (1) CN115497562B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160090638A1 (en) * 2013-05-17 2016-03-31 National Health Research Institutes Methods of prognostically classifying and treating glandular cancers
CN113035358A (en) * 2021-04-08 2021-06-25 南京市第一医院 Model construction method for predicting prognosis risk of early colon cancer patient
CN113948154A (en) * 2021-11-19 2022-01-18 湘南学院附属医院 Pancreatic cancer prognosis analysis method, system and device based on iron death related gene
CN114875149A (en) * 2022-06-02 2022-08-09 中国人民解放军空军军医大学 Application of reagent for detecting biomarkers in preparation of product for predicting gastric cancer prognosis

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116564421A (en) * 2023-06-08 2023-08-08 苏州卫生职业技术学院 Method for constructing prognosis model related to copper death of acute myelogenous leukemia patient
CN116564421B (en) * 2023-06-08 2024-01-30 苏州卫生职业技术学院 Method for constructing prognosis model related to copper death of acute myelogenous leukemia patient
CN116844685A (en) * 2023-07-03 2023-10-03 广州默锐医药科技有限公司 Immunotherapeutic effect evaluation method, device, electronic equipment and storage medium
CN117038092A (en) * 2023-08-21 2023-11-10 中山大学孙逸仙纪念医院 Pancreatic cancer prognosis model construction method based on Cox regression analysis
CN117524486A (en) * 2024-01-04 2024-02-06 北京市肿瘤防治研究所 TTE model establishment method for predicting non-progressive survival probability of postoperative patient
CN117524486B (en) * 2024-01-04 2024-04-05 北京市肿瘤防治研究所 TTE model establishment method for predicting non-progressive survival probability of postoperative patient

Also Published As

Publication number Publication date
CN115497562B (en) 2023-04-14

Similar Documents

Publication Publication Date Title
Long et al. Development and validation of a TP53-associated immune prognostic model for hepatocellular carcinoma
CN115497562B (en) Pancreatic cancer prognosis prediction model construction method based on copper death related gene
Chen et al. Prognostic value of SLC4A4 and its correlation with immune infiltration in colon adenocarcinoma
Zhang et al. Identification of an IRGP signature to predict prognosis and immunotherapeutic efficiency in bladder cancer
Zhong et al. Characterization of hypoxia-related molecular subtypes in clear cell renal cell carcinoma to aid immunotherapy and targeted therapy via multi-omics analysis
Peng et al. Identification of a novel prognostic signature of genome instability-related LncRNAs in early stage lung adenocarcinoma
Chang et al. Distinct immune and inflammatory response patterns contribute to the identification of poor prognosis and advanced clinical characters in bladder cancer patients
Peng et al. Identification of disulfidptosis-related subtypes and development of a prognosis model based on stacking framework in renal clear cell carcinoma
Zhang et al. Inflammation‐Related Gene Signature: An Individualized Risk Prediction Model for Kidney Renal Clear Cell Carcinoma
Liu et al. A novel cuproptosis-related gene model predicts outcomes and treatment responses in pancreatic adenocarcinoma
Pineda et al. DUX4 is a common driver of immune evasion and immunotherapy failure in metastatic cancers
Wu et al. Identification of a novel signature and construction of a nomogram predicting overall survival in clear cell renal cell carcinoma
CN109735619B (en) Molecular marker related to non-small cell lung cancer prognosis and application thereof
Sun et al. Comprehensive analysis and reinforcement learning of hypoxic genes based on four machine learning algorithms for estimating the immune landscape, clinical outcomes, and therapeutic implications in patients with lung adenocarcinoma
Ma et al. Predicting the survival and immune landscape of colorectal cancer patients using an immune-related lncRNA pair model
Li et al. Identification and validation of anoikis-associated gene SNCG as a prognostic biomarker in gastric cancer
Wang et al. A bioinformatics-based immune-related prognostic index for lung adenocarcinoma that predicts patient response to immunotherapy and common treatments
Zhao et al. Combination of immune-related genomic alterations reveals immune characterization and prediction of different prognostic risks in ovarian cancer
Liu et al. A novel fatty acid metabolism‐related gene signature predicts the prognosis, tumor immune properties, and immunotherapy response of colon adenocarcinoma patients
Liang et al. Immune Signature-Based Risk Stratification and Prediction of Immunotherapy Efficacy for Bladder Urothelial Carcinoma
Liu et al. Development and validation of an immune-related gene prognostic index for lung adenocarcinoma
Shi et al. Cuproptosis-related lncRNAs predict prognosis and immune response of thyroid carcinoma
Li et al. [Retracted] Bioinformatic Analysis of PTTG Family and Prognosis and Immune Cell Infiltration in Gastric Cancer
Tu et al. Identification and Experimental Verification of a Cuproptosis-Associated Gene Signature for Overall Survival Prediction in Patients with Non-Small Cell Lung Cancer
Jiang et al. Identification and validation an anoikis-related gene signature for clinical diagnosis, prognosis and treatment of patients with hepatocellular carcinoma

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant