CN115762792A - Method for predicting the survival prognosis of bladder cancer patients using an optimized lncRNA-based model

Publication number
CN115762792A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211565423.9A
Other languages
Chinese (zh)
Inventor
覃海德
阮红莲
裴璐
杨美华
黄龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen Memorial Hospital Sun Yat Sen University
Original Assignee
Sun Yat Sen Memorial Hospital Sun Yat Sen University

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to the technical field of bladder cancer prognosis prediction and discloses a method for predicting the survival prognosis of bladder cancer patients based on an optimized lncRNA model, comprising the following steps. S1: data collection and preprocessing. Bladder cancer lncRNA data from TCGA are analyzed as FPKM values; mRNA data from TCGA Level 3 are analyzed as RSEM-normalized counts that are further log2-transformed into an expression matrix; TCGA clinical data use the corrected phenotype data. The data are preprocessed through quality control, normalization, and transformation to obtain a unified expression matrix. In research, incomplete or missing data are a common problem limiting the application of a model. The model of the invention is built on relatively complete data, so survival can be predicted along multiple dimensions and model performance is more stable.

Description

Method for predicting the survival prognosis of bladder cancer patients using an optimized lncRNA-based model

Technical field

The invention relates to the technical field of bladder cancer prognosis prediction, and in particular to a method for predicting the survival prognosis of bladder cancer patients based on an optimized lncRNA model.

Background

Bladder cancer (BC) is one of the most common malignant tumors worldwide and shows marked tumor heterogeneity. Muscle-invasive bladder cancer (MIBC) usually has a poor prognosis, whereas non-muscle-invasive bladder cancer (NMIBC) has a relatively better one. Prognosis prediction in bladder cancer matters for choosing a clinical treatment plan. However, accurately assessing a patient's risk of a poor outcome remains a challenge.

Several bladder cancer prediction models have been established. For non-muscle-invasive bladder cancer, these models mainly focus on predicting disease recurrence and progression, patient response to neoadjuvant chemotherapy, lymph node metastasis, and survival prognosis. For muscle-invasive bladder cancer, however, these models predict patient survival unsatisfactorily. This may be related to tumor heterogeneity, patient response to treatment, and the not-yet-elucidated mechanisms of other risk factors that affect bladder cancer development.

For lymph node metastasis, several models have been reported, including KNN51, RF15, and LN20. KNN51 was reported to predict lymph-node-positive cases with an AUC of 0.82 (range 0.71 to 0.93). In addition, a preoperative lymph node metastasis prediction model has been proposed that uses genomic and clinicopathological features to identify patients at risk of lymph node metastasis from bladder cancer, and it showed good discriminative ability. Our research indicates that this composite model is effective in predicting lymph node metastasis. However, the models for predicting lymph node metastasis of urothelial carcinoma in previous studies have still not reached clinical application, and accurately predicting bladder cancer prognosis remains a formidable challenge.

Previous studies have shown that long non-coding RNAs (lncRNAs) are markedly tissue-specific and widely involved in epigenetic regulation in cells. Multiple studies indicate that lncRNAs play an important regulatory role in bladder cancer, affecting treatment response, metastasis, and progression. Several lncRNAs are associated with bladder cancer metastasis, such as H19, DLX6-AS1, and BLACAT2. Most of these studies, however, focus on the biological function of a single lncRNA in tumors and its correlation with prognosis. Models that predict bladder cancer prognosis from multiple lncRNAs are still few and lack validation and systematic study. We set out to build an lncRNA model with a statistical, unbiased method and to compare its prognostic performance with models built from lncRNAs identified in functional studies.

Because of the marked tumor heterogeneity in bladder cancer, similar patients with urothelial carcinoma may have different outcomes after treatment. Recent studies have revealed that different molecular subtypes of bladder cancer have different clinical prognoses, and that each subtype shows specific tumor microenvironment features that correlate significantly with patient prognosis. In particular, cancer-associated fibroblasts (CAFs) in the tumor microenvironment are closely tied to specific tumor cell differentiation subtypes. Stroma-rich subgroups are usually associated with a poor prognosis in bladder cancer; these subgroups also show abundant lymphocyte infiltration, suggesting immunosuppression in bladder cancer tissue. In addition, some studies have evaluated immune molecules as prognostic biomarkers in cancer, suggesting that immune status may affect bladder cancer prognosis. We hypothesized that a prognosis prediction model integrating the immune and stromal features of the tumor microenvironment could improve prediction accuracy and help guide clinical treatment planning.

On this basis, we constructed and optimized a long non-coding RNA fusion model in this study. The model incorporates clinical risk factors and gene expression information for tumor microenvironment stromal cells and immune cell subtypes to predict the prognosis of bladder cancer patients. It performs well in accurately predicting bladder cancer prognosis and is extensible.

Summary of the invention

(1) Technical problem solved

In view of the deficiencies of the prior art, the invention provides a method for predicting the survival prognosis of bladder cancer patients based on an optimized lncRNA model.

(2) Technical solution

To achieve the above object, the invention provides the following technical solution: a method for predicting the survival prognosis of bladder cancer patients based on an optimized lncRNA model, comprising the following steps:

S1: Data collection and preprocessing

lncRNA data from TCGA are analyzed as FPKM values. mRNA data from TCGA Level 3 are analyzed as RSEM-normalized counts that are further log2-transformed into an expression matrix. TCGA clinical data use the corrected phenotype data. The data are preprocessed through quality control, normalization, and transformation to obtain a unified expression matrix.
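The transformation in S1 can be sketched as follows. This is an illustration only, not the applicant's pipeline: the pseudocount of 1, the dictionary layout, and the function names `log2_transform` and `shared_samples` are assumptions, and the values are toy numbers rather than TCGA data.

```python
import math


def log2_transform(matrix, pseudocount=1.0):
    """Return log2(value + pseudocount) for a gene -> {sample: value} mapping.

    The pseudocount is an assumption; the source only says "log2 transformed".
    """
    return {
        gene: {sample: math.log2(value + pseudocount) for sample, value in row.items()}
        for gene, row in matrix.items()
    }


def shared_samples(*tables):
    """Sorted sample IDs present in every table (each table is keyed by sample ID)."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return sorted(common)


# Toy inputs (hypothetical gene and patient identifiers).
rsem_counts = {"GENE_A": {"P1": 0.0, "P2": 3.0, "P3": 15.0}}
clinical = {"P1": {"t_stage": "T2"}, "P2": {"t_stage": "T3"}}
lncrna_fpkm_samples = {"P1": None, "P2": None, "P3": None}

log_expr = log2_transform(rsem_counts)
# Only patients with lncRNA, mRNA and clinical data enter the unified matrix.
keep = shared_samples(clinical, lncrna_fpkm_samples, rsem_counts["GENE_A"])
```

Restricting the matrix to samples present in every data type is one simple way to obtain the "relatively complete" data the description relies on.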

S2: Statistical analysis

The 394 patients included in the analysis are randomly divided into a training set and a validation set at a ratio of 7:3. The training set is first used to identify independent prognostic factors; lasso regression and a stepwise method then further reduce the number of variables to build a multivariate Cox risk model. The model is then applied to the validation cohort to assess its specificity, sensitivity, and clinical validity. For model optimization, a model is built in the mRNA dataset for each given gene expression signature and a risk score is calculated, which is used to optimize the fusion model. The fused and optimized model is presented as a nomogram, and its predictive value and clinical validity are assessed by receiver operating characteristic (ROC) curves and decision curve analysis (DCA), respectively.
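A minimal sketch of the 7:3 random split, for illustration. The source does not give the resulting group sizes, the random seed, or whether the split was stratified; the rounding rule, the seed, and the name `split_train_validation` are assumptions.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and validation lists.

    Sorting before shuffling makes the split reproducible for a given seed,
    whatever order the IDs arrive in.
    """
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 394 placeholder IDs stand in for the patients included in the analysis.
patients = [f"patient_{i:03d}" for i in range(394)]
train_ids, validation_ids = split_train_validation(patients)
```

With this rounding rule, 394 patients give 276 training and 118 validation patients. All variable selection and coefficient fitting would then use `train_ids` only, so that the validation set stays untouched until evaluation.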

S3: Framework design and data preprocessing

After data preprocessing and lasso-based dimensionality reduction and screening, an lncRNA prognosis prediction model is constructed. Clinical risk factors that affect bladder cancer prognosis, namely the clinically meaningful indicators T stage, N stage, and tumor grade, are then added to build a clinical factor-lncRNA composite model. Next, risk scores are calculated separately from the signature specifically expressed by cancer-associated fibroblasts (CAFs, a stromal cell type) in the microenvironment and from immune cell subset information, and these scores serve as optimization variables for the clinical factor-lncRNA composite model. Finally, the optimized model is compared with published tumor-related lncRNA models.

S4: Construction of the lncRNA-based prognosis prediction model

Combining the lasso algorithm with multivariate Cox regression analysis yields an lncRNA model containing 12 molecules. ROC curves show that the lncRNA model predicts bladder cancer prognosis well. In the training dataset, the AUC for 5-year survival prediction is 0.894, and the risk score calculated by the model separates patients into two significantly different groups: compared with patients with low risk scores, the risk of death in patients with high risk scores is increased 7.5-fold. In the validation dataset, the AUC for 5-year survival prediction is 0.755, and the risk of death in patients with high risk scores is 2.7 times that of patients with low risk scores.
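For illustration, a risk score in a Cox model is conventionally the linear predictor, that is, the coefficient-weighted sum of expression values. The sketch below assumes that convention and a median cutoff for the high/low grouping; the source states neither the coefficients nor the cutoff, so the lncRNA names, coefficient values, and cutoff rule here are all hypothetical.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the model's lncRNAs."""
    return sum(coef * expression[name] for name, coef in coefficients.items())


def assign_risk_groups(scores, cutoff=None):
    """Label each patient 'high' or 'low'; defaults to the median score as cutoff."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical two-lncRNA model; the model in the source has 12 lncRNAs.
coefficients = {"lnc_1": 0.5, "lnc_2": -0.25}
cohort = {
    "P1": {"lnc_1": 2.0, "lnc_2": 4.0},
    "P2": {"lnc_1": 4.0, "lnc_2": 0.0},
    "P3": {"lnc_1": 0.0, "lnc_2": 4.0},
    "P4": {"lnc_1": 8.0, "lnc_2": 4.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in cohort.items()}
groups = assign_risk_groups(scores)
```

When a model is carried over to a validation set, passing the training-set cutoff explicitly through `cutoff` keeps the grouping rule fixed instead of re-deriving it from the validation data.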

S5: Integration of the lncRNA model with clinical risk factors

Clinical risk factors, including bladder cancer T stage, N stage, and tumor grade, are integrated to build a clinical risk factor-lncRNA composite model. The clinical risk factor model alone and the lncRNA model alone both predict bladder cancer prognosis well, but neither reaches the excellent level: in the validation set, the AUC for 5-year survival prediction is 0.774 for the clinical risk factor model and 0.764 for the lncRNA model. By contrast, after clinical risk factors are fused into the lncRNA model (the clinical risk factor-lncRNA composite model), the AUC for 5-year survival prediction in the validation set is 0.882, which reaches the excellent level. The fusion model built by combining lncRNAs with clinical risk factors can therefore substantially improve the performance of the prediction model.
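To make the "AUC for 5-year survival prediction" concrete, the sketch below computes a simplified AUC at a fixed time horizon. It is not the estimator used in the source, which is not named: here patients who died by 60 months are cases, patients followed beyond 60 months are controls, and patients censored earlier are simply dropped, whereas dedicated time-dependent ROC methods account for censoring. The function name and toy records are assumptions.

```python
def auc_at_horizon(records, horizon_months=60):
    """Simplified AUC at a fixed horizon.

    records: iterable of (risk_score, follow_up_months, died) tuples.
    Cases died at or before the horizon; controls were followed past it.
    Patients censored before the horizon are excluded (a simplification).
    """
    records = list(records)
    cases = [score for score, months, died in records if died and months <= horizon_months]
    controls = [score for score, months, _ in records if months > horizon_months]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy follow-up data: (risk score, months of follow-up, died?).
toy_records = [
    (2.1, 14, True),
    (1.7, 40, True),
    (0.4, 75, False),
    (0.9, 90, True),   # died after the horizon, so counted as a control
    (1.9, 66, False),
    (1.2, 20, False),  # censored before the horizon, excluded
]
auc_5y = auc_at_horizon(toy_records)
```

In this toy cohort, five of the six case-control pairs are ranked correctly, so the value is 5/6. Comparing such values between a lncRNA-only score and a fused score on the same validation patients mirrors the comparison described in S5.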

S6: Prognostic value of tumor microenvironment stromal cell signature genes and immune cell subsets in bladder cancer

We built a model on the signature gene expression of stromal cells, calculated a risk score, and integrated it into the lncRNA model. The results show that the stromal signature risk score improves model performance: in the validation dataset, the AUC for 5-year survival prediction is 0.789. Analysis of immune cell fractions obtained from the mRNA data by deconvolution with CIBERSORT shows that immune cell fractions alone can predict bladder cancer prognosis. An immune cell fraction risk score was then calculated and integrated into the lncRNA-CAF composite model. The resulting lncRNA-CAF-Immune composite model performs excellently in the training set (AUC for 5-year survival prediction = 0.924), and in the validation set its 5-year survival prediction is likewise better than that of the lncRNA model alone (AUC = 0.787).

S7: Performance of the optimized lncRNA fusion model in predicting the survival prognosis of bladder cancer patients

A prediction model that combines multidimensional biological information may improve predictive performance. We therefore built a fusion model that uses the lncRNA model as its backbone and incorporates clinical risk factors and gene expression information for stromal cell and immune cell subtypes in the tumor microenvironment. The ROC curves of the fusion model are excellent in both the training set and the validation dataset; in the validation dataset, the AUC for 5-year survival prediction is 0.913.
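One plausible way to write the fusion model, given as a sketch under stated assumptions, is a Cox proportional hazards model whose covariates are the sub-model risk scores and the clinical factors. The source does not say whether the 12 lncRNAs enter as a single score or as individual terms; the single-score form and all symbols below are illustrative.

```latex
% R_lnc, R_CAF, R_imm: risk scores from the lncRNA, CAF-signature and
% immune-cell-fraction sub-models (x_i: expression of lncRNA i, b_i: its coefficient).
% c: clinical covariates (T stage, N stage, tumor grade); h_0(t): baseline hazard.
% All coefficient symbols are illustrative, not values from the source.
R_{\mathrm{lnc}} = \sum_{i=1}^{12} b_i \, x_i ,
\qquad
h(t \mid R, \mathbf{c}) = h_0(t)\,
  \exp\!\bigl( \beta_{\mathrm{lnc}} R_{\mathrm{lnc}}
             + \beta_{\mathrm{CAF}} R_{\mathrm{CAF}}
             + \beta_{\mathrm{imm}} R_{\mathrm{imm}}
             + \boldsymbol{\gamma}^{\top} \mathbf{c} \bigr)
```

Written this way, each added information source contributes one term to the exponent, which is what lets the framework be extended or trimmed according to the data available at a given center.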

S8: Exploring the clinical application of the optimized lncRNA fusion model

A nomogram is drawn from the constructed fusion model. It shows at a glance how the most feasible lncRNA markers in the optimized fusion model, the CAF risk score, the Immune risk score, and the clinical risk factors affect survival prognosis. In addition, the DCA curves we plotted indicate that the fusion model has clinical application value.
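Decision curve analysis plots net benefit against a threshold probability. The sketch below uses the standard binary-outcome definition, net benefit = TP/n - (FP/n) * pt/(1 - pt), evaluated at a single time point. Treating the outcome as binary and ignoring censoring is a simplifying assumption, and the function names and toy values are hypothetical.

```python
def net_benefit(predicted_risk, event, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(event)
    true_pos = sum(1 for p, y in zip(predicted_risk, event) if p >= threshold and y)
    false_pos = sum(1 for p, y in zip(predicted_risk, event) if p >= threshold and not y)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(event, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(event) / len(event)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy predicted 5-year death risks and observed outcomes (1 = died by 5 years).
predicted = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1]
observed = [1, 1, 0, 1, 0, 0]

nb_model = net_benefit(predicted, observed, threshold=0.5)
nb_all = net_benefit_treat_all(observed, threshold=0.5)
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all value and zero (the treat-none strategy). In this toy case the model gives 1/6 at a threshold of 0.5, against 0 for treating everyone.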

Preferably, in S3, data processing and model construction start from the publicly available TCGA bladder cancer Level 3 data. Quality control, standardization, and transformation yield a unified data matrix, which is randomly divided into a training dataset and a validation dataset at a ratio of 7:3. Lasso regression is used to reduce dimensionality and screen the data, and the lncRNA prognosis prediction model is constructed. Once the model is built, clinical risk factors are added first, and the effect of the tumor microenvironment stromal/immune cell signature gene expression on model performance is then explored.

Preferably, in S6, a model is built on the signature gene expression of stromal cells, and a risk score is calculated and integrated into the lncRNA model. The results show that the stromal signature risk score improves the prognostic performance of the model; in the validation dataset, the AUC for 5-year survival prediction is 0.789.

Preferably, in S6, the risk score of the stromal cell signature genes and the risk score of the immune cell fractions are further fused into the model, and the performance of the composite model of lncRNA and tumor microenvironment stroma/immunity approaches the excellent level.

Preferably, in S8, the scoring method includes the most feasible lncRNA markers and the clinical risk factor variables, and also provides the CAF risk score and the risk score calculated from immune cell subsets, which can be used for further optimization. The nomogram can be used for potential future validation and diagnosis. In addition, the plotted DCA curves indicate that the constructed fusion model has clinical application value.

Preferably, in S3, the lncRNA model includes 12 lncRNA molecules.

(3) Beneficial effects

Compared with the prior art, the invention provides a method for predicting the survival prognosis of bladder cancer patients based on an optimized lncRNA model, with the following beneficial effects:

1. The method constructs an lncRNA model that can be used to predict bladder cancer prognosis. By incorporating clinical risk factors and gene expression information for tumor microenvironment stromal cells and immune cell subtypes, it improves the performance of the lncRNA model in predicting the long-term survival of bladder cancer patients. The optimized fusion model performs excellently, is extensible, and has a degree of clinical application value.

2. First, the method can include molecular features from multi-omics data. In research, incomplete or missing data are a common problem that limits the application of a model. The model of the invention is built on relatively complete data, which allows survival to be predicted along multiple dimensions and makes model performance more stable. Second, the framework of the invention is extensible and can be adjusted according to the availability of clinical and genetic data at different centers. Third, the framework is also applicable to other cancer types: the availability of multi-omics data makes it useful for building models for other cancers. The same framework can be developed for prognosis prediction in other cancer types, ultimately making models built on the concept of the invention suitable for clinical use.

3. The model in this method incorporates molecular features, which gives it more biological explanatory power. Studies have shown that these molecular features of the tumor microenvironment can also affect treatment efficacy and might be used to build models that predict treatment response, which merits further study. This highlights the advantage, in computational models for predicting cancer prognosis, of systematically integrating functional signaling networks from multi-omics data. Previous studies have shown that multiple signaling pathways, including MAPK and ERBB signaling, are involved in the pathogenesis of bladder cancer. The present study examined only the role of immune and stromal cells in the bladder cancer microenvironment; other genetic factors may also help explain bladder cancer prognosis. Such a multidimensional integrated model performs better in prognosis prediction and is easy to explain from a biological perspective, which makes it more interpretable.

4. The method helps predict the survival of MIBC patients and also helps predict the prognosis of early-stage bladder cancer.

Description of the drawings

The accompanying drawings provide a further understanding of the invention and form part of the specification. Together with the embodiments of the invention, they serve to explain the invention and do not limit it. In the drawings:

Fig. 1 is a flow chart of data processing and lncRNA model construction according to the invention;

Fig. 2 shows the ROC curves and survival curves of the lncRNA model of the invention in the training and validation datasets (A: ROC curve of the lncRNA model predicting prognosis in the training set; B: survival curve of the lncRNA model predicting prognosis in the training set; C: ROC curve of the lncRNA model predicting prognosis in the validation dataset; D: survival curve of the lncRNA model predicting prognosis in the validation dataset);

Fig. 3 shows the effect of the lncRNA model integrated with clinical risk factors on bladder cancer prognosis prediction (A: ROC curve for prognosis at 36 months in the training set after clinical risk factors are integrated into the lncRNA model; B: ROC curve for prognosis at 60 months in the training set; C: ROC curve for prognosis at 36 months in the validation dataset; D: ROC curve for prognosis at 60 months in the validation dataset);

Fig. 4 shows the optimization of the lncRNA prediction model with tumor microenvironment stromal/immune cell signature gene expression. In each panel, the risk score calculated from the immune cell signature genes and the risk score calculated from the stromal cell signature genes are each integrated into the lncRNA model (A: training set, ROC curves for 3-year survival; B: training set, ROC curves for 5-year survival; C: validation set, ROC curves for 3-year survival; D: validation set, ROC curves for 5-year survival);

Fig. 5 shows the prognostic performance of the lncRNA fusion model of the invention, which integrates clinical risk factors and tumor microenvironment gene expression signatures (A: ROC curve for prognosis at 36 months in the training set; B: ROC curve for prognosis at 60 months in the training set; C: ROC curve for prognosis at 36 months in the validation set; D: ROC curve for prognosis at 60 months in the validation set);

Fig. 6 is a nomogram of the optimized lncRNA-based model of the invention for predicting the survival prognosis of bladder cancer patients;

Fig. 7 shows DCA curves of the optimized lncRNA fusion model of the invention (A: plotted in the training dataset; B: plotted in the validation dataset; 12, 24, 36, 48, and 60 denote months).

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention.

Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and should not be construed as limiting it.

In the description of the invention, it should be understood that terms indicating orientation or positional relationships, such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential", are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the invention and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the invention.

In the invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected", "coupled", and "fixed" should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the invention according to the specific circumstances.

The present invention provides a method for predicting the survival prognosis of bladder cancer patients using an lncRNA-based optimized model, comprising the following steps:

S1: Data collection and preprocessing

lncRNA data from TCGA are analyzed as FPKM values. mRNA data from TCGA Level 3 are analyzed as RSEM-normalized count data that are further log2-transformed into an expression matrix. TCGA clinical data use the corrected phenotype data. The data are preprocessed through quality control, normalization and transformation to obtain a unified expression matrix;
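
The log2 transformation mentioned in S1 can be sketched as follows. This is a minimal illustration, not the applicants' pipeline: the pseudocount of 1 and the rule of dropping genes that are zero in every sample are assumptions, since the source does not specify either, and the function names are hypothetical.

```python
import math

def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to each value of a genes-by-samples matrix.

    The pseudocount (assumed here to be 1) avoids log2(0) for unexpressed genes.
    """
    return [[math.log2(value + pseudocount) for value in row] for row in matrix]

def drop_unexpressed(matrix, gene_ids):
    """Assumed quality-control step: remove genes with zero expression in all samples."""
    kept = [(gene, row) for gene, row in zip(gene_ids, matrix) if any(v > 0 for v in row)]
    return [gene for gene, _ in kept], [row for _, row in kept]

# Toy RSEM-like counts: 3 genes (rows) by 3 samples (columns).
counts = [[0, 1, 3], [0, 0, 0], [7, 15, 31]]
genes, filtered = drop_unexpressed(counts, ["gene_a", "gene_b", "gene_c"])
expression = log2_transform(filtered)
```

With these toy values, `gene_b` is removed and the remaining rows become small integers on the log2 scale, which makes the compression of large counts easy to see.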

S2: Statistical analysis

The 394 bladder cancer patients included in the analysis are randomly divided into a training set and a validation set at a ratio of 7:3. The training set is first used to identify independent prognostic factors; lasso regression and stepwise selection are then used to further reduce the number of variables and construct a multivariate Cox risk model. The model is then applied to the validation cohort to evaluate its specificity, sensitivity and clinical validity. To optimize the model, a model is built in the mRNA data set for a given gene expression signature and a risk score is calculated, which is used to optimize the fusion model. The fused and optimized model is presented as a nomogram, and its predictive value and clinical validity are evaluated by receiver operating characteristic (ROC) curve analysis and decision curve analysis (DCA), respectively;
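
The 7:3 random split can be sketched as below. The seed, the rounding rule and the function name are assumptions made for reproducibility of the illustration; the source states only the ratio and the cohort size.

```python
import random

def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient identifiers into training and validation sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (assumed) so the split is repeatable
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 394 placeholder identifiers standing in for the TCGA bladder cancer patients.
train_ids, valid_ids = split_patients(range(394))
```

Under this rounding rule the split yields 276 training and 118 validation patients. All variable selection and coefficient estimation would then use `train_ids` only, so that the validation set stays untouched until evaluation.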

S3: Framework design and data preprocessing

After data preprocessing and lasso-based dimensionality reduction and screening, an lncRNA prognosis prediction model is constructed; this lncRNA model comprises 12 lncRNA molecules. Clinical risk factors that affect bladder cancer prognosis, namely the clinically meaningful indicators T stage, N stage and tumor grade, are then incorporated to construct a clinical factor-lncRNA composite model. Next, risk scores are calculated separately from the specific expression signature of cancer-associated fibroblast (CAF) stromal cells in the microenvironment and from immune cell subset information, and these scores are used as optimization variables to optimize the clinical factor-gene composite model. The optimized model is then compared with published tumor-related lncRNA models;

In the workflow for data processing and model construction, the publicly available TCGA bladder cancer Level 3 data are subjected to quality control, standardization and transformation to obtain a unified data matrix. The data matrix is randomly divided into a training data set and a validation data set at a ratio of 7:3, and lasso regression is used to reduce dimensionality and screen the data in order to construct the lncRNA prognosis prediction model. Once the model is built, clinical risk factors are added first; the effect on model performance of risk scores calculated from the characteristic gene expression signatures of tumor microenvironment stromal cells and immune cells is then explored;

S4: Construction of the lncRNA-based prognosis prediction model

By combining the lasso algorithm with multivariate Cox regression analysis, an lncRNA model containing 12 molecules is obtained. The ROC curve shows that the lncRNA model performs well in predicting bladder cancer prognosis: in the training data set, the AUC for 5-year survival prediction is 0.894, and the risk score calculated by the model separates patients into two significantly different groups, with a 7.5-fold increase in the risk of death for patients with high risk scores compared with patients with low risk scores. In the validation data set, the AUC for 5-year survival prediction is 0.755, and the risk of death for patients with high risk scores is 2.7 times that of patients with low risk scores;
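
A Cox-model risk score is conventionally the linear predictor, that is, the sum of each coefficient multiplied by the corresponding expression value. The sketch below illustrates that calculation and a high/low split. The lncRNA names and coefficients are hypothetical (the source does not list the 12 molecules or their coefficients), and the median cutoff is an assumption, since the source does not state how the two groups were defined.

```python
from statistics import median

# Hypothetical coefficients for illustration only; not taken from the source.
COEFFICIENTS = {"lncRNA_A": 0.8, "lncRNA_B": -0.5, "lncRNA_C": 0.3}

def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor of a Cox model: sum of coefficient * expression value."""
    return sum(beta * expression[name] for name, beta in coefficients.items())

def stratify(scores, cutoff=None):
    """Label each score 'high' or 'low' relative to a cutoff (median by default)."""
    if cutoff is None:
        cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]

patients = [
    {"lncRNA_A": 2.0, "lncRNA_B": 1.0, "lncRNA_C": 0.0},
    {"lncRNA_A": 0.5, "lncRNA_B": 2.0, "lncRNA_C": 1.0},
    {"lncRNA_A": 1.0, "lncRNA_B": 0.0, "lncRNA_C": 2.0},
]
scores = [risk_score(p) for p in patients]
groups = stratify(scores)
```

In a training/validation design, the cutoff would normally be fixed on the training set and passed explicitly as `cutoff` when stratifying validation patients, rather than recomputed from the validation scores.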

S5: Integration of the lncRNA-based model with clinical risk factors

Clinical risk factors, including bladder cancer T stage, N stage and tumor grade, are integrated to construct a clinical risk factor-lncRNA composite model. The clinical risk factor model alone and the lncRNA model alone both predict bladder cancer prognosis well, but neither reaches an excellent level: in the validation set, the AUC for 5-year survival prediction is 0.774 for the clinical risk factor model and 0.764 for the lncRNA model. By comparison, after the lncRNA model is fused with the clinical risk factors (the clinical risk factor-lncRNA composite model), the AUC for 5-year survival prediction in the validation set is 0.882, which reaches an excellent level. Combining lncRNA with clinical risk factors can therefore greatly improve model performance;
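
To make the "AUC for 5-year survival prediction" concrete, the sketch below computes a simplified AUC at a fixed horizon. It is an assumption-laden illustration rather than the method used in the source: patients who died within the horizon are treated as cases, patients followed beyond the horizon as controls, and patients censored before the horizon are excluded. Dedicated time-dependent ROC estimators handle censoring more carefully.

```python
def horizon_auc(times, events, scores, horizon=5.0):
    """Simplified AUC at a fixed time horizon (in years).

    times  : follow-up time for each patient
    events : 1 if the patient died at `times`, 0 if censored
    scores : model risk scores (higher means higher predicted risk)
    """
    cases = [s for t, e, s in zip(times, events, scores) if e == 1 and t <= horizon]
    controls = [s for t, e, s in zip(times, events, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    # Probability that a random case scores higher than a random control;
    # ties count as one half (Mann-Whitney formulation of the AUC).
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Toy data: the patient censored at 3 years is excluded from the calculation.
auc = horizon_auc(
    times=[1, 2, 6, 7, 3],
    events=[1, 1, 0, 1, 0],
    scores=[0.9, 0.4, 0.5, 0.1, 0.8],
)
```

Computing this quantity for each candidate model on the same validation patients is one way to compare a clinical-only, an lncRNA-only and a composite model on equal footing.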

S6: Prognostic value of tumor microenvironment stromal cell signature genes and immune cell subsets in bladder cancer

We built a model from the characteristic gene expression signature of stromal cells, calculated a risk score, and integrated it into the lncRNA model. The results show that the stromal cell signature risk score improves model performance: in the validation data set, the AUC for 5-year survival prediction is 0.789. Analysis of immune cell fractions obtained from the mRNA data by deconvolution with CIBERSORT shows that immune cell fractions alone can predict bladder cancer prognosis. An immune cell risk score was then calculated and integrated into the lncRNA-CAF composite model. The resulting lncRNA-CAF-Immune composite model performs excellently in the training set (AUC for 5-year survival prediction = 0.924), and its predictive value for 5-year survival in the validation set (AUC = 0.787) is likewise better than that of the lncRNA model alone;

S7: Performance of the optimized lncRNA fusion model in predicting the survival prognosis of bladder cancer patients

A prediction model that combines multidimensional biological information may improve predictive performance. A fusion model was therefore established that uses the lncRNA model as its backbone and incorporates clinical risk factors together with gene expression information on tumor microenvironment stromal cells and immune cell subtypes. The results show that the ROC curve of the fusion model is excellent in the validation data set, where the AUC for 5-year survival prediction is 0.913;

S8: Exploring the clinical application of the optimized lncRNA fusion model

A nomogram is drawn based on the constructed fusion model;
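
A nomogram conventionally maps each predictor's contribution to the linear predictor onto a 0 to 100 point scale, with 100 points assigned to the full range of the most influential predictor. The sketch below shows that scaling under stated assumptions: the variable names, coefficients and value ranges are hypothetical and do not come from the source.

```python
def make_point_scale(coefficients, ranges):
    """Return a function that converts a predictor value into nomogram points.

    coefficients : variable name -> regression coefficient
    ranges       : variable name -> (minimum, maximum) of the plotted axis
    """
    spans = {v: abs(coefficients[v]) * (ranges[v][1] - ranges[v][0]) for v in coefficients}
    max_span = max(spans.values())
    if max_span == 0:
        raise ValueError("all predictors have zero effect over their ranges")

    def points(variable, value):
        low, high = ranges[variable]
        beta = coefficients[variable]
        # Zero points sit at the lowest-risk end of each axis.
        reference = low if beta >= 0 else high
        return 100.0 * beta * (value - reference) / max_span

    return points

# Hypothetical fusion-model terms for illustration only.
points = make_point_scale(
    coefficients={"lncRNA_score": 0.5, "T_stage": 1.0},
    ranges={"lncRNA_score": (0.0, 4.0), "T_stage": (1.0, 4.0)},
)
total = points("lncRNA_score", 2.0) + points("T_stage", 4.0)
```

The total points are then read against a survival-probability axis derived from the fitted model's baseline survival, which this sketch does not attempt to reproduce.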

The scoring method includes the most feasible lncRNA markers and the clinical risk factor variables, and also provides the CAF risk score and the risk score calculated from immune cell subsets, which can be used for further optimization. Once validated in rigorous randomized controlled trials, the nomogram could be used to predict the survival and prognosis of bladder cancer patients. In addition, the plotted DCA curves indicate that the constructed fusion model has good clinical application value.
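
Decision curve analysis plots the net benefit of acting on a model's predictions across a range of threshold probabilities. The sketch below uses the standard net benefit definition, true positives per patient minus false positives per patient weighted by the odds of the threshold. It assumes binary outcomes with no censoring, which is a simplification for survival data, and the example values are made up.

```python
def net_benefit(outcomes, predicted_risks, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold.

    outcomes        : 1 if the event occurred, 0 otherwise
    predicted_risks : model-predicted event probabilities
    threshold       : threshold probability, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for y, p in zip(outcomes, predicted_risks) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(outcomes, predicted_risks) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)

def treat_all_net_benefit(outcomes, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    return net_benefit(outcomes, [1.0] * len(outcomes), threshold)

outcomes = [1, 0, 1, 0]
risks = [0.8, 0.6, 0.3, 0.1]
model_nb = net_benefit(outcomes, risks, threshold=0.2)
treat_all_nb = treat_all_net_benefit(outcomes, threshold=0.2)
```

A model is usually judged clinically useful over the thresholds where its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero.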

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a referenced structure" does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.

Claims (6)

1. A method for predicting the survival prognosis of a patient with bladder cancer based on an lncRNA optimization model is characterized by comprising the following steps:
S1: data collection and preprocessing
Bladder cancer lncRNA data from TCGA are analyzed as FPKM values; mRNA data from TCGA Level 3 are analyzed as RSEM-normalized count data further log2-transformed into an expression matrix; TCGA clinical data use the corrected phenotype data; and the data are preprocessed through quality control, normalization and transformation to obtain a unified expression matrix;
S2: statistical analysis
The 394 patients included in the analysis are randomly divided into a training set and a validation set at a ratio of 7:3. The training set is first used to identify independent prognostic factors, and lasso regression and stepwise selection are used to further reduce the number of variables and construct a multivariate Cox risk model; the model is then applied to the validation cohort to evaluate its specificity, sensitivity and clinical validity. To optimize the model, a model is built in the mRNA data set for a given gene expression signature related to the tumor microenvironment, and a risk score is calculated and used to optimize the fusion model; the fused and optimized model is presented as a nomogram, and its predictive value and clinical validity are evaluated by receiver operating characteristic (ROC) curve analysis and decision curve analysis (DCA), respectively;
S3: framework design and data preprocessing
After data preprocessing and lasso-based dimensionality reduction and screening, an lncRNA prognosis prediction model is constructed; clinical risk factors that affect bladder cancer prognosis, namely the clinically meaningful indicators T stage, N stage and tumor grade, are then incorporated to construct a clinical factor-lncRNA composite model; risk scores are then calculated separately from the specific expression signature of cancer-associated fibroblast (CAF) stromal cells in the microenvironment and from immune cell subset information, and are used as optimization variables to optimize the clinical factor-lncRNA composite model; and the optimized model is then compared with published tumor-related lncRNA models;
S4: construction of the lncRNA-based prognosis prediction model
By combining the lasso algorithm with multivariate Cox regression analysis, an lncRNA model containing 12 molecules is obtained; the ROC curve shows that the lncRNA model performs well in predicting bladder cancer prognosis; in the training data set, the AUC for 5-year survival prediction is 0.894, and the risk score calculated by the model separates patients into two significantly different groups, with a 7.5-fold increase in the risk of death for patients with high risk scores compared with patients with low risk scores; in the validation data set, the AUC for 5-year survival prediction is 0.755, and the risk of death for patients with high risk scores is 2.7 times that of patients with low risk scores;
S5: integration of the lncRNA-based model with clinical risk factors
Clinical risk factors, including bladder cancer T stage, N stage and tumor grade, are integrated to construct a clinical risk factor-lncRNA composite model; the clinical risk factor model alone and the lncRNA model alone both predict bladder cancer prognosis well, but neither reaches an excellent level, the AUC for 5-year survival prediction in the validation set being 0.774 for the clinical risk factor model and 0.764 for the lncRNA model; by comparison, after the lncRNA model is fused with the clinical risk factors (the clinical risk factor-lncRNA composite model), performance for 5-year survival prediction in the validation set reaches an excellent level, and the fusion model constructed by combining lncRNA with clinical risk factors can greatly improve the performance of the prediction model;
S6: prognostic value of tumor microenvironment stromal cell signature genes and immune cell subsets in bladder cancer
A model is built from the characteristic gene expression signature of stromal cells, and a risk score is calculated and integrated into the lncRNA model; the results show that the stromal cell signature risk score improves model performance, the AUC for 5-year survival prediction in the validation data set being 0.789; analysis of immune cell fractions obtained from the mRNA data by deconvolution with CIBERSORT shows that immune cell fractions alone can predict bladder cancer prognosis; an immune cell risk score is then calculated and integrated into the lncRNA-CAF composite model; and the results show that the lncRNA-CAF-Immune composite model performs excellently in the training set (AUC for 5-year survival prediction = 0.924) and that its predictive value for 5-year survival in the validation set (AUC = 0.787) is likewise better than that of the lncRNA model alone;
S7: performance of the optimized lncRNA fusion model in predicting the survival prognosis of bladder cancer patients
A prediction model that combines multidimensional biological information may improve predictive performance, so a fusion model is established that uses the lncRNA model as its backbone and incorporates clinical risk factors together with gene expression information on tumor microenvironment stromal cells and immune cell subtypes; the results show that the ROC curve of the fusion model is excellent in both the training set and the validation data set, and the AUC for 5-year survival prediction in the validation data set is 0.913;
S8: exploring the clinical application of the optimized lncRNA fusion model
A nomogram is drawn based on the constructed fusion model. The nomogram visually shows the contribution of the most feasible lncRNA markers, the CAF risk score, the immune risk score and the clinical risk factors to the outcome predicted by the fusion model. In addition, the plotted DCA curves show that the constructed fusion model has clinical application value.
2. The method for predicting the survival prognosis of a patient with bladder cancer based on the lncRNA optimization model as claimed in claim 1, wherein in S3, data processing and model construction start from the publicly available TCGA bladder cancer Level 3 data, which are subjected to quality control, standardization and transformation to obtain a unified data matrix; the data matrix is randomly divided into a training data set and a validation data set at a ratio of 7:3, and lasso regression is used to reduce dimensionality and screen the data in order to construct the lncRNA prognosis prediction model; after the model is constructed, clinical risk factors are added first, and the effect on model performance of the risk scores calculated from the characteristic gene expression signatures of tumor microenvironment stromal cells and immune cells is then explored.
3. The method for predicting the survival prognosis of a patient with bladder cancer based on the lncRNA optimization model according to claim 1, wherein in S6, a model is built from the characteristic gene expression signature of the stromal cells, and the risk score is calculated and integrated into the lncRNA model; the results show that the stromal cell signature risk score improves the prognosis prediction performance of the model, and the AUC for 5-year survival prediction in the validation data set is 0.789.
4. The method for predicting the survival prognosis of a patient with bladder cancer based on the lncRNA optimization model according to claim 1, wherein in S6, the risk score of the stromal cell signature genes and the risk score of the immune cell fractions are further fused into the model, and the performance of the composite model combining lncRNA with the tumor microenvironment stromal and immune scores is close to optimal.
5. The method for predicting the survival prognosis of a patient with bladder cancer based on the lncRNA optimization model of claim 1, wherein in S8, the scoring method includes the most feasible lncRNA markers and the clinical risk factor variables, and also provides the CAF risk score and the risk score calculated from immune cell subsets, which can be used for further optimization; the nomogram can be used for potential future validation and diagnosis; and in addition, the plotted DCA curves show that the constructed fusion model has clinical application value.
6. The method for predicting survival prognosis of a patient with bladder cancer based on an lncRNA optimization model of claim 1, wherein in S3, the lncRNA model comprises 12 lncRNA molecules.
CN202211565423.9A 2022-12-07 2022-12-07 A lncRNA-based optimization model for predicting survival and prognosis of bladder cancer patients Pending CN115762792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211565423.9A CN115762792A (en) 2022-12-07 2022-12-07 A lncRNA-based optimization model for predicting survival and prognosis of bladder cancer patients

Publications (1)

Publication Number Publication Date
CN115762792A true CN115762792A (en) 2023-03-07

Family

ID=85344162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211565423.9A Pending CN115762792A (en) 2022-12-07 2022-12-07 A lncRNA-based optimization model for predicting survival and prognosis of bladder cancer patients

Country Status (1)

Country Link
CN (1) CN115762792A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117038092A (en) * 2023-08-21 2023-11-10 中山大学孙逸仙纪念医院 Pancreatic cancer prognosis model construction method based on Cox regression analysis
CN117637185A (en) * 2024-01-25 2024-03-01 首都医科大学宣武医院 An image-based auxiliary decision-making method, system and equipment for craniopharyngioma treatment
CN117637185B (en) * 2024-01-25 2024-04-23 首都医科大学宣武医院 Image-based craniopharyngeal tube tumor treatment auxiliary decision-making method, system and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination