CN115762792A - Method for predicting survival prognosis of bladder cancer patient based on lncRNA optimization model - Google Patents
- Publication number
- CN115762792A (application CN202211565423.9A)
- Authority
- CN
- China
- Prior art keywords
- model
- lncrna
- data
- bladder cancer
- prognosis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Investigating Or Analysing Biological Materials (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to the technical field of bladder cancer prognosis prediction, and discloses a method for predicting the survival prognosis of a bladder cancer patient based on an lncRNA optimization model, which comprises the following steps: S1: data collection and pre-processing, in which bladder cancer lncRNA data from TCGA are analyzed using FPKM data, mRNA data from TCGA Level 3 are analyzed using an RSEM-normalized, log2-transformed count expression matrix, TCGA clinical data are taken from corrected phenotype data, and the data are pre-processed, quality-controlled, normalized and transformed to obtain a unified expression matrix. In research, incomplete or missing data are common problems that limit the application of a model; because this model is built on relatively complete data, prognosis can be predicted along multiple dimensions and model performance is more stable.
Description
Technical Field
The invention relates to the technical field of prognosis prediction of bladder cancer, in particular to a method for predicting survival prognosis of a bladder cancer patient based on an lncRNA optimization model.
Background
Bladder Cancer (BC) is one of the most common malignancies worldwide, with significant tumor heterogeneity. Muscle-invasive Bladder Cancer (MIBC) generally has a poor prognosis, while Non-Muscle-invasive Bladder Cancer (NMIBC) has a relatively better prognosis. Prognosis prediction of bladder cancer is of great significance to the selection of clinical treatment regimens. However, accurately assessing the risk of a poor prognosis for a patient remains a challenge.
A number of predictive models for bladder cancer have been established. For non-muscle invasive bladder cancer, these models focus primarily on predicting disease recurrence and progression, patient responsiveness to neoadjuvant chemotherapy, lymph node metastasis, and prognosis of survival. However, for muscle-invasive bladder cancer, the prediction effect of these models on the survival prognosis of patients is not satisfactory, and this may be related to tumor heterogeneity, patient treatment response, and other unexplained mechanisms of action that affect risk factors related to bladder cancer development.
As for lymph node metastasis, several models such as KNN51, RF15 and LN20 have been reported. The AUC of KNN51 for predicting lymph node positive cases is reported to be 0.82 (range 0.71-0.93). In addition, a model for predicting lymph node metastasis before surgery has been proposed, which uses genomic and clinicopathological features to identify bladder cancer patients at risk of lymph node metastasis and shows good discriminative power. Our studies indicate that this composite model is effective in predicting lymph node metastasis. However, the models for predicting lymph node metastasis of urothelial cancer from previous research still cannot be applied clinically, and accurate prediction of bladder cancer prognosis remains a difficult challenge.
Previous studies show that long non-coding RNA (lncRNA) has marked tissue specificity and is widely involved in epigenetic regulation in cells. Many studies show that lncRNA plays an important role in the regulation of bladder cancer and influences treatment response, tumor metastasis and tumor progression. A number of long non-coding RNAs are related to the metastasis of bladder cancer, such as H19, DLX6-AS1 and BLACAT2. Most of these studies have focused on the biological function of a single long non-coding RNA in tumors and its correlation with prognosis. Models that predict bladder cancer prognosis from multiple lncRNAs are still few and lack validation and systematic study. Here, an lncRNA model is constructed using a statistical, unbiased method, and its prognosis prediction performance is compared with that of models built from lncRNAs discovered through functional research.
Due to significant tumor heterogeneity in bladder cancer, patients with similar urothelial cancer may have different outcomes after treatment. Recent studies have revealed that different molecular subtypes of bladder cancer have different clinical outcomes, and that bladder cancers of different molecular classifications exhibit specific tumor microenvironment characteristics that are significantly correlated with patient prognosis. Among them, cancer-associated fibroblasts (CAF) in the tumor microenvironment are closely related to specific tumor cell differentiation subtypes, and subtypes rich in stromal cells are usually associated with poor prognosis of bladder cancer and are also accompanied by abundant lymphocyte infiltration, indicating that immunosuppression exists in bladder cancer tissues. In addition, several studies have evaluated the role of immune molecules as biomarkers for cancer prognosis, suggesting that immune status may affect the prognosis of bladder cancer. Integrating the immune and stromal characteristics of the tumor microenvironment into a bladder cancer prognosis prediction model can improve prediction accuracy and help guide the formulation of clinical treatment plans.
Based on this, we constructed and optimized a long noncoding RNA fusion model in this study. The model integrates gene expression information of clinical risk factors, tumor microenvironment interstitial cells and immune cell subtypes and is used for predicting the prognosis of bladder cancer patients. The model has excellent performance in accurately predicting bladder cancer prognosis and has expandability.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the defects of the prior art, the invention provides a method for predicting the survival prognosis of a bladder cancer patient based on an lncRNA optimization model.
(II) Technical solution
In order to achieve the purpose, the invention provides the following technical scheme: a method for predicting survival prognosis of a patient with bladder cancer based on an lncRNA optimization model, comprising the following steps:
S1: Data collection and pre-processing
lncRNA data from TCGA are analyzed using FPKM data; mRNA data from TCGA Level 3 are analyzed using an RSEM-normalized, log2-transformed count expression matrix; TCGA clinical data are taken from corrected phenotype data. The data are pre-processed, quality-controlled, normalized and transformed to obtain a unified expression matrix;
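The transformation and normalization in S1 can be sketched as follows. This is a minimal illustration only: the patent does not state the pseudocount or the normalization scheme, so `log2(x + 1)` and per-gene z-scoring are assumptions, and the expression values are made up.

```python
import math
from statistics import mean, pstdev

def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every value (rows = genes, columns = samples)."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]

def zscore_rows(matrix):
    """Standardize each gene (row) to mean 0 and unit variance across samples."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # A gene with constant expression carries no signal; map it to zeros.
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out

# Hypothetical expression values for two lncRNAs across four samples.
fpkm = [[0.0, 1.0, 3.0, 7.0],
        [15.0, 15.0, 15.0, 15.0]]
logged = log2_transform(fpkm)   # assumed pseudocount of 1
scaled = zscore_rows(logged)    # assumed per-gene standardization
```

The pseudocount keeps zero-expression values finite after the log transform; any quality-control filtering (for example, dropping genes with constant expression) would happen before this step.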
S2: Statistical analysis
The 394 patients included in the analysis are randomly divided into a training set and a validation set at a ratio of 7:3. Independent prognostic factors are first identified using the training set; the variables are then further reduced by lasso regression and stepwise selection to construct a multivariate Cox proportional hazards model, which is applied to the validation cohort to evaluate the specificity, sensitivity and clinical effectiveness of the prediction model. To optimize the model, a model is built in the mRNA data set for each given gene expression signature, and the resulting risk score is used to optimize the fusion model. The fused and optimized model is presented as a nomogram, and its predictive value and clinical effectiveness are assessed with receiver operating characteristic (ROC) curves and decision curve analysis (DCA), respectively;
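The 7:3 random division in S2 can be illustrated with a short sketch. The patient identifiers, the seed and the rounding rule are assumptions; the patent gives only the cohort size (394) and the ratio.

```python
import random

def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs reproducibly and split them into training and validation sets."""
    rng = random.Random(seed)          # fixed seed so the split can be reproduced
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers for the 394 patients.
patients = [f"P{i:03d}" for i in range(394)]
train_ids, valid_ids = split_patients(patients)
print(len(train_ids), len(valid_ids))
```

With this rounding rule, 394 patients yield 276 training and 118 validation cases; the patent does not report the exact set sizes.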
S3: Framework design and data pre-processing
After data preprocessing and lasso dimension-reduction screening, an lncRNA prognosis prediction model is constructed. Clinical risk factors that affect bladder cancer prognosis, namely the clinically significant indexes T stage, N stage and tumor grade, are then incorporated to construct a clinical factor-lncRNA composite model. Next, risk scores are calculated separately from the cancer-associated fibroblast (CAF) specific expression signature in the tumor microenvironment and from immune cell subpopulation information, and these scores are used as optimization variables for the clinical factor-lncRNA composite model. Finally, the optimized model is compared with published tumor-related lncRNA models;
S4: Construction of a prognosis prediction model based on lncRNA
An lncRNA model containing 12 molecules is obtained by combining the lasso algorithm with multivariate Cox regression analysis. ROC curves show that the lncRNA model predicts bladder cancer prognosis well. In the training data set, the 5-year survival prediction AUC is 0.894, the risk score calculated by the model separates patients into two significantly different groups, and the risk of death for patients with a high risk score is 7.5 times higher than for patients with a low risk score. In the validation data set, the 5-year survival prediction AUC is 0.755, and the risk of death for patients with a high risk score is 2.7 times higher than for patients with a low risk score;
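The risk score used in S4 to separate patients into two groups can be sketched as a linear combination of expression values weighted by Cox coefficients. The lncRNA names and coefficients below are hypothetical (the patent does not list the 12 molecules or their weights), and the median cutoff is an assumed, commonly used choice.

```python
from statistics import median

# Hypothetical Cox coefficients; placeholders, not the patent's 12-lncRNA signature.
coefficients = {"lncRNA_A": 0.8, "lncRNA_B": -0.5, "lncRNA_C": 0.3}

def risk_score(expression, coefs):
    """Linear predictor of a Cox model: sum of coefficient * expression."""
    return sum(beta * expression[name] for name, beta in coefs.items())

def stratify(scores):
    """Label each patient 'high' or 'low' risk relative to the median score (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Made-up normalized expression values for four patients.
patients = {
    "P1": {"lncRNA_A": 2.0, "lncRNA_B": 1.0, "lncRNA_C": 0.0},
    "P2": {"lncRNA_A": 0.0, "lncRNA_B": 2.0, "lncRNA_C": 1.0},
    "P3": {"lncRNA_A": 1.0, "lncRNA_B": 0.0, "lncRNA_C": 2.0},
    "P4": {"lncRNA_A": 0.5, "lncRNA_B": 1.0, "lncRNA_C": 0.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify(scores)
```

The hazard ratio between the two groups (reported in the text as 7.5 and 2.7) would then come from fitting a Cox model with the group label as the covariate, which is outside the scope of this sketch.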
S5: Integration of the lncRNA-based model with clinical risk factors
Clinical risk factors, comprising bladder cancer T stage, N stage and tumor grade, are integrated to construct a clinical risk factor-lncRNA composite model. The clinical risk factor model alone and the lncRNA model alone both perform well in predicting bladder cancer prognosis, but neither reaches an excellent level: in the validation set, the 5-year survival prediction AUC is 0.774 for the clinical risk factor model and 0.764 for the lncRNA model. In comparison, after the lncRNA model is fused with the clinical risk factors (the clinical risk factor-lncRNA composite model), the 5-year survival prediction AUC in the validation set is 0.882, which is an excellent level. A fused model that combines lncRNA with clinical risk factors can therefore greatly improve the performance of the prediction model;
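The AUC comparison between single and composite models can be illustrated with a simplified calculation. This sketch treats 5-year death as a binary label and ignores censoring, which a time-dependent ROC analysis on survival data would have to handle; the scores and labels are toy values, not results from the patent.

```python
def auc(scores, labels):
    """Probability that a random positive case outscores a random negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data: 1 = died within 5 years, 0 = alive at 5 years.
died_within_5y = [1, 1, 0, 1, 0, 0]
lncrna_only = [0.9, 0.4, 0.5, 0.7, 0.2, 0.6]   # hypothetical risk scores
composite = [0.9, 0.6, 0.5, 0.8, 0.1, 0.3]     # hypothetical scores after adding clinical factors

auc_single = auc(lncrna_only, died_within_5y)
auc_composite = auc(composite, died_within_5y)
```

In this toy example the composite scores rank every death above every survivor, so the composite AUC exceeds that of the single model, mirroring the direction of the improvement described in S5.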
S6: Prognostic value of tumor microenvironment stromal cell characteristic genes and immune cell subsets in bladder cancer
We built a model from the characteristic gene expression signature of stromal cells, calculated its risk score and integrated the score into the lncRNA model. The results indicate that the stromal cell characteristic gene expression risk score improves the performance of the model: in the validation data set, the 5-year survival prediction AUC was 0.789. Analysis of immune cell components, obtained by deconvolution of the mRNA data with CIBERSORT, shows that immune cell components alone can predict the prognosis of bladder cancer. The risk score of the immune cell components was then calculated and integrated into the lncRNA-CAF composite model. The resulting lncRNA-CAF-Immune composite model performs excellently in the training set (5-year survival prediction AUC = 0.924), and its 5-year predictive value in the validation set (AUC = 0.787) is also superior to that of the lncRNA model alone;
S7: Performance of the optimized lncRNA fusion model in predicting the survival prognosis of bladder cancer patients
Because a prediction model that combines multidimensional biological information can achieve better predictive performance, a fusion model is established that takes the lncRNA model as its framework and fuses in clinical risk factors and the stromal cell/immune cell subtype gene expression information of the tumor microenvironment. The results show that the ROC curves of the fusion model are excellent in both the training and validation data sets; in the validation data set, the 5-year survival prediction AUC is 0.913;
S8: Exploration of the clinical application of the optimized lncRNA fusion model
A nomogram is drawn based on the constructed fusion model. The nomogram shows visually how the most feasible lncRNA markers, the CAF risk score, the immune risk score and the clinical risk factors contribute to the outcome predicted by the fusion model. In addition, the DCA curves we plotted show that the constructed fusion model has clinical application value.
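The decision curve analysis mentioned in S8 rests on the net benefit of acting on the model at a given threshold probability. Below is a minimal sketch of the standard net-benefit formula; the predicted probabilities and outcomes are toy values, and the patent does not describe which thresholds were examined.

```python
def net_benefit(probabilities, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy predicted 5-year death probabilities and observed outcomes (1 = died).
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.7, 0.4]
died = [1, 1, 0, 1, 0, 0, 0, 1]

nb_model = net_benefit(probs, died, threshold=0.5)
nb_all = net_benefit_treat_all(died, threshold=0.5)
```

A decision curve plots these quantities over a range of thresholds; a model is considered clinically useful where its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit 0).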
Preferably, in S3, the workflow for data processing and model construction starts from the public TCGA bladder cancer Level 3 data and obtains a unified data matrix through quality control, normalization and transformation; the data matrix is randomly divided into a training data set and a validation data set at a ratio of 7:3, and lasso regression is used for dimensionality reduction and screening to construct an lncRNA prognosis prediction model; after the model is constructed, clinical risk factors are added first, and the influence of the tumor microenvironment stromal cell/immune cell characteristic gene expression signatures on model performance is then explored.
Preferably, in S6, a model is constructed from the characteristic gene expression signature of stromal cells, and the risk score is calculated and integrated into the lncRNA model; the result indicates that the stromal cell characteristic gene expression risk score improves the prognostic prediction performance of the model, and in the validation data set the 5-year survival prediction AUC is 0.789.
Preferably, in S6, the stromal cell characteristic gene risk score and the immune cell component risk score are further fused into the model, and the performance of the composite model of lncRNA and tumor microenvironment stroma/immunity is close to excellent.
Preferably, in S8, the scoring method includes the most feasible lncRNA markers and the clinical risk factor variables, and also provides the CAF risk score and the immune cell subpopulation risk score for further optimization; the nomogram can be used for potential future validation and diagnosis, and in addition the plotted DCA curves indicate that the constructed fusion model has clinical application value.
Preferably, in S3, the lncRNA model comprises 12 lncRNA molecules.
(III) Advantageous effects
Compared with the prior art, the invention provides a method for predicting the survival prognosis of the bladder cancer patient based on an lncRNA optimization model, which has the following beneficial effects:
1. The method for predicting the survival prognosis of a bladder cancer patient based on an lncRNA optimization model constructs an lncRNA model that can be used to predict bladder cancer prognosis. By integrating clinical risk factors and the gene expression information of tumor microenvironment stromal cells and immune cell subtypes, the lncRNA model's prediction of long-term survival of bladder cancer patients is optimized. The optimized fusion model has excellent performance, is extensible, and has a certain clinical application value.
2. The method for predicting the survival prognosis of a bladder cancer patient based on an lncRNA optimization model can incorporate molecular features from multi-omics data. In research, incomplete or missing data are common problems that limit the application of a model; because this model is built on relatively complete data, prognosis can be predicted along multiple dimensions and model performance is more stable. Secondly, the framework of the invention is scalable and can be adjusted to the clinical and genetic data available at different centers. Thirdly, the framework is also applicable to other cancer types, and the availability of multi-omics data makes it useful for building models for other cancers. The same framework can be applied to prognosis prediction for other cancer types, ultimately making models constructed under the concept of the invention suitable for clinical application.
3. In the method for predicting the survival prognosis of a bladder cancer patient based on an lncRNA optimization model, the model incorporates molecular features and is therefore more biologically interpretable. Research shows that the molecular features of the tumor microenvironment can also influence treatment efficacy and might be used to build a model that predicts treatment response, which is worth further study. This highlights the advantage of integrating functional signaling networks derived from omics data into a computational model for predicting cancer prognosis. Previous studies have shown that a variety of signaling pathways are involved in the development of bladder cancer, including MAPK signaling and ERBB signaling. This study examined only the role of immune/stromal cells in the bladder cancer microenvironment; other genetic factors may also explain bladder cancer prognosis. Such a multi-dimensional integrated model performs better in prognostic prediction and is easier to interpret from a biological perspective.
4. The method for predicting the survival prognosis of a bladder cancer patient based on an lncRNA optimization model is helpful for predicting the survival of patients with MIBC and for predicting the prognosis of early bladder cancer.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of data processing and lncRNA model construction according to the present invention;
FIG. 2 shows the ROC curves and survival curves of the lncRNA model in the training data set and the validation data set (A: ROC curve of the lncRNA model in the training set; B: survival curve of the lncRNA model in the training set; C: ROC curve of the lncRNA model in the validation data set; D: survival curve of the lncRNA model in the validation data set);
FIG. 3 shows the prognostic performance for bladder cancer of the lncRNA model of the invention after clinical risk factors are incorporated (A: ROC curve for prognosis prediction at 36 months in the training set; B: ROC curve for prognosis prediction at 60 months in the training set; C: ROC curve for prognosis prediction at 36 months in the validation data set; D: ROC curve for prognosis prediction at 60 months in the validation data set);
FIG. 4 shows the lncRNA prediction model optimized with the tumor microenvironment stromal/immune cell characteristic gene expression signatures; in each panel, the risk score calculated from the immune cell characteristic gene expression signature and the risk score calculated from the stromal cell characteristic gene expression signature are each integrated into the lncRNA model (A: ROC curve for 3-year survival in the training set; B: ROC curve for 5-year survival in the training set; C: ROC curve for 3-year survival in the validation set; D: ROC curve for 5-year survival in the validation set);
FIG. 5 shows the prognosis prediction performance of the lncRNA fusion model of the invention, which integrates clinical risk factors and tumor microenvironment gene expression signatures (A: ROC curve for prognosis prediction at 36 months in the training set; B: ROC curve for prognosis prediction at 60 months in the training set; C: ROC curve for prognosis prediction at 36 months in the validation set; D: ROC curve for prognosis prediction at 60 months in the validation set);
FIG. 6 is a nomogram for predicting the prognosis of a patient with bladder cancer based on the lncRNA optimization model of the invention;
FIG. 7 shows DCA curves of the optimized lncRNA fusion model of the invention (A: plotted in the training data set; B: plotted in the validation data set for 12, 24, 36, 48 and 60 months).
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
Examples of the embodiments are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary, are intended to illustrate the invention, and are not to be construed as limiting the invention.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and are not to be considered limiting of the invention.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
The invention provides a method for predicting survival prognosis of a bladder cancer patient based on an lncRNA optimization model, which comprises the following steps:
S1: Data collection and pre-processing
lncRNA data from TCGA are analyzed using FPKM data; mRNA data from TCGA Level 3 are analyzed using an RSEM-normalized, log2-transformed count expression matrix; TCGA clinical data are taken from corrected phenotype data. The data are pre-processed, quality-controlled, normalized and transformed to obtain a unified expression matrix;
S2: Statistical analysis
The 394 bladder cancer patients included in the analysis are randomly divided into a training set and a validation set at a ratio of 7:3. Independent prognostic factors are first identified using the training set; the variables are then further reduced by lasso regression and stepwise selection to construct a multivariate Cox proportional hazards model, which is applied to the validation cohort to evaluate the specificity, sensitivity and clinical effectiveness of the prediction model. To optimize the model, a model is built in the mRNA data set for each given gene expression signature, and the resulting risk score is used to optimize the fusion model. The fused and optimized model is presented as a nomogram, and its predictive value and clinical effectiveness are assessed with receiver operating characteristic (ROC) curves and decision curve analysis (DCA), respectively;
S3: Framework design and data pre-processing
After data preprocessing and lasso dimension-reduction screening, an lncRNA prognosis prediction model comprising 12 lncRNA molecules is constructed. Clinical risk factors that affect bladder cancer prognosis, namely the clinically significant indexes T stage, N stage and tumor grade, are then incorporated to construct a clinical factor-lncRNA composite model. Next, risk scores are calculated separately from the cancer-associated fibroblast (CAF) specific expression signature in the tumor microenvironment and from immune cell subpopulation information, and these scores are used as optimization variables for the clinical factor-lncRNA composite model. Finally, the optimized model is compared with published tumor-related lncRNA models;
In the workflow for data processing and model construction, a unified data matrix is obtained from the public TCGA bladder cancer Level 3 data through quality control, normalization and transformation; the data matrix is randomly divided into a training data set and a validation data set at a ratio of 7:3, and lasso regression is used for dimensionality reduction and screening to construct the lncRNA prognosis prediction model; after the model is constructed, clinical risk factors are added first, and the influence on model performance of the risk scores calculated from the tumor microenvironment stromal cell/immune cell characteristic gene expression signatures is then explored;
S4: Construction of a prognosis prediction model based on lncRNA
An lncRNA model containing 12 molecules is obtained by combining the lasso algorithm with multivariate Cox regression analysis. ROC curves show that the lncRNA model predicts bladder cancer prognosis well. In the training data set, the 5-year survival prediction AUC is 0.894, the risk score calculated by the model separates patients into two significantly different groups, and the risk of death for patients with a high risk score is 7.5 times higher than for patients with a low risk score. In the validation data set, the 5-year survival prediction AUC is 0.755, and the risk of death for patients with a high risk score is 2.7 times higher than for patients with a low risk score;
S5: Integration of the lncRNA-based model with clinical risk factors
Clinical risk factors, including bladder cancer T stage, N stage and tumor grade, are integrated to construct a clinical risk factor-lncRNA composite model. The clinical risk factor model alone and the lncRNA model alone both perform well in predicting bladder cancer prognosis, but neither reaches an excellent level: in the validation set, the 5-year survival prediction AUC is 0.774 for the clinical risk factor model and 0.764 for the lncRNA model;
S6: Prognostic value of tumor microenvironment stromal cell characteristic genes and immune cell subsets in bladder cancer
We built a model from the characteristic gene expression signature of stromal cells, calculated its risk score and integrated the score into the lncRNA model. The results indicate that the stromal cell characteristic gene expression risk score improves the performance of the model: in the validation data set, the 5-year survival prediction AUC was 0.789. Analysis of immune cell components, obtained by deconvolution of the mRNA data with CIBERSORT, shows that immune cell components alone can predict the prognosis of bladder cancer. The risk score of the immune cell components was then calculated and integrated into the lncRNA-CAF composite model. The resulting lncRNA-CAF-Immune composite model performs excellently in the training set (5-year survival prediction AUC = 0.924), and its 5-year predictive value in the validation set (AUC = 0.787) is also superior to that of the lncRNA model alone;
S7: Performance of the optimized lncRNA fusion model in predicting the survival prognosis of bladder cancer patients
Because a prediction model that combines multidimensional biological information can achieve better predictive performance, a fusion model is established that takes the lncRNA model as its framework and fuses in clinical risk factors and the stromal cell/immune cell subtype gene expression information of the tumor microenvironment. The results show that the ROC curves of the fusion model are excellent in the validation data set, where the 5-year survival prediction AUC is 0.913;
S8: Exploration of the clinical application of the optimized lncRNA fusion model
A nomogram is drawn based on the constructed fusion model;
The scoring method includes the most feasible lncRNA markers and the clinical risk factor variables, and also provides the CAF risk score and the immune cell subpopulation risk score for further optimization. After validation in a rigorous randomized controlled trial, the nomogram can be used to predict the survival and prognosis of patients with bladder cancer. In addition, the plotted DCA curves show that the constructed fusion model has good clinical application value.
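How a nomogram turns model variables into points can be sketched as follows. This is one common construction, assumed here for illustration: each variable's contribution to the Cox linear predictor is rescaled so that the most influential variable spans 0 to 100 points. The variable names, coefficients and ranges are hypothetical and are not taken from the patent.

```python
def make_point_scale(coefs, ranges):
    """Return a function mapping (variable, value) to nomogram points on a 0-100 scale."""
    # Largest possible contribution of any single variable to the linear predictor.
    max_effect = max(abs(coefs[name]) * (hi - lo) for name, (lo, hi) in ranges.items())

    def points(name, value):
        lo, hi = ranges[name]
        beta = coefs[name]
        # Measure from the lowest-risk end of the range so points are never negative.
        baseline = lo if beta >= 0 else hi
        return 100.0 * beta * (value - baseline) / max_effect

    return points

# Hypothetical coefficients and observed ranges for three fused-model variables.
coefs = {"lncRNA_score": 1.0, "CAF_score": 0.5, "T_stage": 0.4}
ranges = {"lncRNA_score": (0.0, 2.0), "CAF_score": (0.0, 2.0), "T_stage": (1.0, 4.0)}

points = make_point_scale(coefs, ranges)
patient = {"lncRNA_score": 2.0, "CAF_score": 1.0, "T_stage": 3.0}
total_points = sum(points(name, value) for name, value in patient.items())
```

In a full nomogram the total points would then be mapped to a predicted survival probability at each time point using the baseline survival estimated by the Cox model, a step that requires the fitted model and is omitted here.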
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Claims (6)
1. A method for predicting the survival prognosis of a patient with bladder cancer based on an lncRNA optimization model is characterized by comprising the following steps:
s1: data collection and pre-processing
Bladder cancer lncRNA data from TCGA are analyzed as FPKM values; mRNA data from TCGA Level 3 are analyzed as RSEM-normalized count data, further log2-transformed into an expression matrix; TCGA clinical data use the corrected phenotype data. The data are pre-processed by quality control, normalization and transformation to obtain a unified expression matrix;
s2: statistical analysis
The 394 patients included in the analysis are randomly divided into a training set and a validation set at a ratio of 7:3. The training set is first used to search for independent prognostic factors; the variables are further reduced in dimension by lasso regression and a stepwise method to construct a multivariate Cox proportional hazards model, and the model is then applied to the validation cohort to evaluate the specificity, sensitivity and clinical effectiveness of the prediction model. To optimize the model, a model is constructed in the mRNA dataset for given gene expression signatures related to the tumor microenvironment, and a risk score is calculated and used to optimize the fusion model. The fused and optimized model is displayed as a nomogram, and its predictive value and clinical effectiveness are evaluated with receiver operating characteristic (ROC) curves and decision curves, respectively;
s3: framework design and data pre-processing
After data pre-processing and lasso dimension-reduction screening, an lncRNA prognosis prediction model is constructed. Clinically significant risk factors affecting bladder cancer prognosis, including T stage, N stage and tumor grade, are then brought into the model to construct a clinical factor-lncRNA composite model. Next, risk scores are calculated separately from the specific expression signature of cancer-associated fibroblasts (CAFs) in the microenvironment and from immune cell subset information, and are used as optimization variables to optimize the clinical factor-lncRNA composite model. The optimized model is then compared with published tumor-related lncRNA models;
s4: construction of prognosis prediction model based on lncRNA
An lncRNA model containing 12 molecules is obtained by combining the lasso algorithm with multivariate Cox regression analysis. The ROC curve indicates that the lncRNA model predicts bladder cancer prognosis well: in the training dataset, the AUC for 5-year survival prediction is 0.894, and the risk score calculated by the model separates patients into two groups with significantly different outcomes, the risk of death of patients with high risk scores being increased 7.5-fold compared with patients with low risk scores. In the validation dataset, the AUC for 5-year survival prediction is 0.755, and the risk of death of patients with high risk scores is 2.7 times that of patients with low risk scores;
s5: integration of lncRNA-based model with clinical risk factors
Clinical risk factors, including the T stage, N stage and tumor stage of bladder cancer, are integrated to construct a clinical risk factor-lncRNA composite model. The clinical risk factor model alone and the lncRNA model alone both predict bladder cancer prognosis well, but neither reaches an excellent level: the AUC for 5-year survival prediction is 0.774 for the clinical risk factor model and 0.764 for the lncRNA model. After the lncRNA model is fused with the clinical risk factors (the clinical risk factor-lncRNA composite model), model performance as measured by AUC reaches an excellent level, showing that a fusion model combining lncRNA with clinical risk factors can greatly improve the performance of the prediction model;
s6: prognosis prediction effect of tumor microenvironment interstitial cell characteristic genes and immune cell subsets on bladder cancer
A model is built from the characteristic gene expression signature of mesenchymal cells, a risk score is calculated from it, and that score is integrated into the lncRNA model. The results indicate that the mesenchymal-cell gene expression risk score improves the performance of the model: in the validation dataset, the AUC for predicting 5-year survival is 0.789. Immune cell fractions are then obtained from the mRNA data by deconvolution with CIBERSORT, and the analysis shows that individual immune cell fractions can predict the prognosis of bladder cancer. Risk scores for the immune cell fractions are calculated and integrated into the lncRNA-CAF composite model. The results show that the lncRNA-CAF-Immune composite model performs excellently in the training set (AUC = 0.924 for 5-year survival prediction), and that its 5-year predictive value in the validation set is also superior to that of the lncRNA-only model (AUC = 0.787);
s7: performance of the optimized lncRNA fusion model for predicting the survival prognosis of bladder cancer patients
Combining multidimensional biological information in one prediction model can improve predictive performance. A fusion model is therefore established that uses the lncRNA model as its framework and incorporates clinical risk factors together with gene expression information on the interstitial cells and immune cell subtypes of the tumor microenvironment. The results show that the ROC curve of the fusion model is excellent in both the training set and the validation dataset, and that the AUC for predicting 5-year survival is 0.913 in the validation dataset;
s8: clinical application exploration of optimized lncRNA fusion model
A nomogram is drawn based on the constructed fusion model. The nomogram visually demonstrates the impact of the most feasible lncRNA markers, the CAF risk score, the immune risk score and the clinical risk factors on the outcome predicted by the fusion model. In addition, the plotted DCA curve shows that the constructed fusion model has clinical application value.
2. The method for predicting the survival prognosis of a patient with bladder cancer based on the lncRNA optimization model as claimed in claim 1, wherein in S3, data processing and model construction are carried out as follows: quality control, standardization and transformation operations are performed on the acquired public TCGA bladder cancer Level 3 data to obtain a uniform data matrix; the data matrix is randomly divided into a training dataset and a validation dataset at a ratio of 7:3; dimension reduction and screening are performed on the data using lasso regression to construct the lncRNA prognosis prediction model; and after the model is constructed, clinical risk factors are added first, and the influence on model performance of the risk scores calculated from the tumor microenvironment interstitial cell/immune cell characteristic gene expression signatures is then explored.
3. The method for predicting the survival prognosis of a patient with bladder cancer based on the lncRNA optimization model according to claim 1, wherein in S6, the characteristic gene expression signature of the mesenchymal cells is modeled, the risk score is calculated and integrated into the lncRNA model, and the result shows that the characteristic gene expression risk score of the mesenchymal cells can improve the prognosis prediction performance of the model, the AUC for predicting 5-year survival in the validation dataset being 0.789.
4. The method for predicting the survival prognosis of a patient with bladder cancer based on the lncRNA optimization model according to claim 1, wherein in S6, the risk score of the characteristic genes of mesenchymal cells and the risk score of the immune cell components are further fused into the model, and the performance of the composite model of lncRNA and tumor microenvironment stroma/immunity is close to optimal.
5. The method for predicting the survival prognosis of a patient with bladder cancer based on the lncRNA optimization model according to claim 1, wherein in S8, the scoring method includes the most feasible lncRNA markers and the clinical risk factor variables, and additionally provides the CAF risk score and the risk score calculated from immune cell subsets, which can be used for further optimization; the nomogram can be used for potential future verification and diagnosis; and in addition, the plotted DCA curve shows that the constructed fusion model has clinical application value.
6. The method for predicting survival prognosis of a patient with bladder cancer based on an lncRNA optimization model of claim 1, wherein in S3, the lncRNA model comprises 12 lncRNA molecules.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211565423.9A CN115762792A (en) | 2022-12-07 | 2022-12-07 | Method for predicting survival prognosis of bladder cancer patient based on lncRNA optimization model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211565423.9A CN115762792A (en) | 2022-12-07 | 2022-12-07 | Method for predicting survival prognosis of bladder cancer patient based on lncRNA optimization model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115762792A (en) | 2023-03-07 |
Family
ID=85344162
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211565423.9A Pending CN115762792A (en) | 2022-12-07 | 2022-12-07 | Method for predicting survival prognosis of bladder cancer patient based on lncRNA optimization model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115762792A (en) |
-
2022
- 2022-12-07 CN CN202211565423.9A patent/CN115762792A/en active Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117038092A (en) * | 2023-08-21 | 2023-11-10 | 中山大学孙逸仙纪念医院 | Pancreatic cancer prognosis model construction method based on Cox regression analysis |
CN117637185A (en) * | 2024-01-25 | 2024-03-01 | 首都医科大学宣武医院 | Image-based craniopharyngeal tube tumor treatment auxiliary decision-making method, system and equipment |
CN117637185B (en) * | 2024-01-25 | 2024-04-23 | 首都医科大学宣武医院 | Image-based craniopharyngeal tube tumor treatment auxiliary decision-making method, system and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Caudai et al. | AI applications in functional genomics | |
Califano et al. | Analysis of gene expression microarrays for phenotype classification. | |
Patruno et al. | A review of computational strategies for denoising and imputation of single-cell transcriptomic data | |
CN115762792A (en) | Method for predicting survival prognosis of bladder cancer patient based on lncRNA optimization model | |
US20200239965A1 (en) | Source of origin deconvolution based on methylation fragments in cell-free dna samples | |
CN107025384A (en) | A kind of construction method of complex data forecast model | |
Su et al. | Identification of expression signatures for non-small-cell lung carcinoma subtype classification | |
CN111312334A (en) | Method for analyzing receptor-ligand system influencing intercellular communication | |
Dou et al. | Single-nucleotide variant calling in single-cell sequencing data with Monopogen | |
Qu et al. | Quantitative trait associated microarray gene expression data analysis | |
CN108320797B (en) | Nasopharyngeal carcinoma database and comprehensive diagnosis and treatment decision method based on database | |
KR101090892B1 (en) | Method of providing information for predicting enzyme selectivity of metabolism phase ii reactions | |
CN111944902A (en) | Early prediction method of renal papillary cell carcinoma based on lincRNA expression profile combination characteristics | |
Shi et al. | An application based on bioinformatics and machine learning for risk prediction of sepsis at first clinical presentation using transcriptomic data | |
CN110942808A (en) | Prognosis prediction method and prediction system based on gene big data | |
KR102462746B1 (en) | Method And System For Constructing Cancer Patient Specific Gene Networks And Finding Prognostic Gene Pairs | |
Zubi et al. | Sequence mining in DNA chips data for diagnosing cancer patients | |
Li et al. | Using the SVM Method for Lung Adenocarcinoma Prognosis Based on Expression Level | |
LU103183B1 (en) | Method for building prognosis model of lung adenocarcinoma based on cuproptosis-related genes | |
Cozzini et al. | Model-based clustering with gene ranking using penalized mixtures of heavy-tailed distributions | |
Blazadonakis et al. | The linear neuron as marker selector and clinical predictor in cancer gene analysis | |
Zhang et al. | CFC: a Cascade Forest approach to discover Cancer driver genes using multi-omics data | |
Pramana et al. | A comparative assessment on gene expression classification methods of RNA-seq data generated using next-generation sequencing (NGS) | |
Perino | Hybrid gene selection framework for predicting breast cancer relapse | |
Wei | Survival-Related Clustering of Cancer Patients by Integrating Clinical and Biological Datasets |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||