CN112086199A

CN112086199A - Liver cancer data processing system based on multi-omics data

Info

Publication number: CN112086199A
Application number: CN202010963978.3A
Authority: CN
Inventors: 任菲; 王忠烈; 谭光明; 刘玉东; 段勃; 张春明
Original assignee: Western Institute Of Advanced Technology Institute Of Computing Chinese Academy Of Sciences
Current assignee: Western Research Institute Of China Science And Technology Computing Technology
Priority date: 2020-09-14
Filing date: 2020-09-14
Publication date: 2020-12-15
Anticipated expiration: 2040-09-14
Also published as: CN112086199B

Abstract

The invention provides a liver cancer data processing system based on multi-omics data, comprising a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module. The preprocessing module screens liver cancer multi-omics data and outputs the screened target data to the data dimensionality reduction processing module. The data dimensionality reduction processing module receives the target data output by the preprocessing module, performs dimensionality reduction on it, and outputs the dimensionality-reduced target data to the classification processing module. The classification processing module receives the dimensionality-reduced target data, performs classification processing on it, and outputs classification labels. The classifier module receives the classification labels and is trained with them; once trained, it receives real-time multi-omics liver cancer data and predicts liver cancer survival time. The system fuses liver cancer multi-omics data well by exploiting the complementarity of the data, which avoids the loss of feature information during data processing, ensures the accuracy of data processing, and supports the accuracy of subsequent liver cancer survival prediction.

Description

Liver cancer data processing system based on multi-omics data

Technical field

The invention relates to a data processing system, and in particular to a liver cancer data processing system based on multi-omics data.

Background art

Early-stage liver cancer is treated mainly by surgical resection, but clinical data show that the postoperative recurrence rate is about 70%, which seriously hinders the long-term survival of patients. Establishing a subtyping standard for hepatocellular carcinoma (HCC), managing patients at high risk of recurrence with finer stratification, and screening out the patients likely to benefit before surgery is performed may be of greater significance for improving patient survival and achieving precision treatment of HCC. Establishing a classification standard for liver cancer based on multi-omics data, and managing the prognosis and treatment of different patients more accurately, will improve patient survival. Fusing multi-omics data to subtype patients at the molecular level and to predict their prognosis is therefore of considerable importance, including clinical importance for treatment.

In recent years, methods have appeared that fuse RNA sequencing data, miRNA data, methylation data and the clinical survival data of liver cancer patients to subtype liver cancer and predict prognosis. In the prior art, however, few researchers consider the survival status of patients when studying molecular subtypes. Survival is clinically important to the study of molecular subtypes, and large differences in survival often have a strong influence on them. Using multi-omics data fusion for molecular subtyping and prognosis prediction has the following two characteristics: (1) the fusion stage is generally divided into early, intermediate and late fusion, and the choice of stage strongly affects the fusion result; (2) the fusion method also has a strong influence. Prior-art fusion methods and systems have the following defects: on the one hand, they use an autoencoder to integrate the input data, which easily causes loss of feature data; on the other hand, they simply stack the data directly, so that the different data types fuse poorly, cannot complement one another, and do not yield accurate information.

Therefore, a new technical means is urgently needed to solve the above technical problems.

Summary of the invention

In view of this, the object of the present invention is to provide a liver cancer data processing system based on multi-omics data that fuses liver cancer multi-omics data well by exploiting the complementarity of the data, thereby avoiding the loss of feature information during data processing, ensuring the accuracy of data processing, and supporting the accuracy of subsequent liver cancer survival prediction.

The invention provides a liver cancer data processing system based on multi-omics data, comprising a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module;

The preprocessing module is used to screen liver cancer multi-omics data and to output the screened target data to the data dimensionality reduction processing module;

The data dimensionality reduction processing module is used to receive the target data output by the preprocessing module, perform dimensionality reduction on the target data, and output the dimensionality-reduced target data to the classification processing module;

The classification processing module is used to receive the dimensionality-reduced target data output by the data dimensionality reduction processing module, perform classification processing on the dimensionality-reduced target data, and output classification labels;

The classifier module is used to receive the classification labels and is trained with them; the trained classifier module then receives real-time multi-omics liver cancer data and predicts liver cancer survival time.

Further, the screening of liver cancer multi-omics data by the preprocessing module includes:

The preprocessing module scores each feature of the liver cancer multi-omics data with a univariate Cox-PH model, compares the score Per1 with a set threshold P_y, selects the features with Per1 < P_y, and fuses the selected data to form the target data.

Further, the dimensionality reduction performed by the data dimensionality reduction processing module on the target data specifically includes:

SA1. A K-layer autoencoder is built in the data dimensionality reduction processing module, where the output function of the K-layer autoencoder is:

x' = Relu(W_i · Relu(W_i x + b_i)); where W_i is the weight matrix between adjacent autoencoders, b_i is the bias associated with the weight matrix W_i, and x is a feature value of the m-dimensional target data X = (x₁, x₂, …, x_m);

SA2. The data dimensionality reduction processing module constructs a loss function, where the loss function is:

where L(x, x') is the loss function and β_w is the regularization penalty coefficient,

SA3. The loss function is minimized iteratively, updating the weight matrix W_i and its bias b_i until the set number of iterations is reached, after which the data dimensionality reduction processing module outputs the dimensionality-reduced target data.
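Steps SA1 to SA3 can be illustrated with a minimal sketch. The formula for the loss function is not reproduced in this text, so the sketch assumes a mean squared reconstruction error plus an L2 weight penalty scaled by β_w; that choice, the use of separate encoder and decoder matrices, and the names `matvec`, `encode`, `reconstruct` and `regularized_loss` are illustrative assumptions, not the patented formula. Only the forward pass and the loss evaluation are shown; the iterative update of SA3 is omitted.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]

def matvec(W, x):
    """Multiply a row-major matrix W by a vector x."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def encode(x, W_enc, b_enc):
    """Inner term of SA1: Relu(W x + b), the reduced representation."""
    return relu([h + b for h, b in zip(matvec(W_enc, x), b_enc)])

def reconstruct(x, W_enc, b_enc, W_dec):
    """Outer term of SA1: x' = Relu(W_dec . Relu(W_enc x + b_enc)).
    Separate encoder/decoder matrices are an assumption of this sketch."""
    return relu(matvec(W_dec, encode(x, W_enc, b_enc)))

def regularized_loss(x, x_rec, weights, beta_w):
    """ASSUMED loss: mean squared error plus beta_w times the squared
    L2 norm of all weights (the source formula is not available)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)
    l2 = sum(w * w for W in weights for row in W for w in row)
    return mse + beta_w * l2

# Toy example: reduce a 3-dimensional sample to 2 dimensions.
x = [1.0, 2.0, 3.0]
W_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_enc = [0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
code = encode(x, W_enc, b_enc)
x_rec = reconstruct(x, W_enc, b_enc, W_dec)
loss = regularized_loss(x, x_rec, [W_enc, W_dec], beta_w=0.5)
```

In a full implementation the output of `encode` at the bottleneck layer would serve as the dimensionality-reduced target data passed to the classification processing module.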

Further, the survival prediction of the classification processing module specifically includes:

SB1. The classification processing module scores the features of the dimensionality-reduced target data again with a univariate Cox-PH model, compares each feature's score Per2 with the set threshold P_y, selects the features with Per2 < P_y, and fuses the selected data;

SB2. The classification processing module constructs a normalization model and normalizes the data produced by step SB1, where the normalization model is:

where p is the feature data output by step SB1, P is the normalized feature data, Var(p) is the variance of the feature data p, and E(p) is the empirical mean of the feature data p;
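The normalization formula itself is not reproduced in this text. The symbols defined for it (feature data p, normalized data P, variance Var(p) and empirical mean E(p)) are consistent with standard z-score normalization, P = (p − E(p)) / √Var(p). The following sketch assumes that form; the function name `standardize` and the use of the population variance are assumptions.

```python
import math
import statistics

def standardize(p):
    """Assumed SB2 normalization: subtract the empirical mean and
    divide by the square root of the (population) variance."""
    mean = statistics.fmean(p)
    var = statistics.pvariance(p, mu=mean)
    if var == 0.0:
        # A constant feature has no spread; map it to all zeros.
        return [0.0 for _ in p]
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in p]

normalized = standardize([1.0, 2.0, 3.0])
```

After this step each feature has zero mean and unit variance, so that features from different omics types contribute on a comparable scale to the similarity computed in step SB3.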

SB3. The classification processing module constructs a similarity function:

where W(i,j) is the similarity between the i-th sample z_i and the j-th sample z_j, and θ_ij is the normalization factor; where:

λ_i denotes the k nearest neighbors of the i-th sample z_i, and λ_j denotes the k nearest neighbors of the j-th sample z_j; z_r denotes the r-th sample in λ_i.
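The similarity formula and the definition of θ_ij are not reproduced in this text. The symbols that remain (a pairwise similarity W(i,j), a normalization factor θ_ij, and k-nearest-neighbor sets λ_i and λ_j) resemble the scaled exponential kernel used in similarity network fusion. The sketch below assumes that form, W(i,j) = exp(−d(z_i, z_j)² / (μ·θ_ij)), with θ_ij the average of the mean neighbor distance of z_i, the mean neighbor distance of z_j, and d(z_i, z_j). The kernel form, the hyperparameter `mu`, the Euclidean distance and all function names are assumptions, not the patented formula.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def mean_knn_distance(dist, i, k):
    """Average distance from sample i to its k nearest neighbours
    (the set called lambda_i in step SB3). Requires k <= n - 1."""
    others = sorted(d for j, d in enumerate(dist[i]) if j != i)
    return sum(others[:k]) / k

def similarity_matrix(samples, k=2, mu=0.5):
    """ASSUMED scaled exponential kernel; mu is a hypothetical
    hyperparameter that does not appear in the source."""
    n = len(samples)
    dist = [[euclidean(a, b) for b in samples] for a in samples]
    knn = [mean_knn_distance(dist, i, k) for i in range(n)]
    W = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            theta = (knn[i] + knn[j] + dist[i][j]) / 3.0
            if theta > 0.0:
                W[i][j] = math.exp(-dist[i][j] ** 2 / (mu * theta))
    return W

# Two tight pairs of samples that lie far apart from each other.
samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
W = similarity_matrix(samples, k=1)
```

Because θ_ij adapts to the local neighborhood density, samples in a sparse region are not penalized as strongly as a fixed-bandwidth kernel would penalize them.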

SB4. The classification processing module determines the classification labels from the similarity function and outputs them to the classifier module.

Beneficial effects of the invention: the invention fuses liver cancer multi-omics data well by exploiting the complementarity of the data, thereby avoiding the loss of feature information during data processing, ensuring the accuracy of data processing, and supporting the accuracy of subsequent liver cancer survival prediction.

Description of drawings

The invention is further described below with reference to the accompanying drawings and embodiments:

FIG. 1 is a schematic structural diagram of the invention.

FIG. 2 is a schematic diagram of the classification labels of the invention.

FIG. 3 is a comparison diagram for a specific example of the invention.

Detailed description

The invention is described in further detail below with reference to the accompanying drawings:

The classifier module is used to receive the classification labels and is trained with them; the trained classifier module then receives real-time multi-omics liver cancer data and predicts liver cancer survival time. The invention fuses liver cancer multi-omics data well by exploiting the complementarity of the data, thereby avoiding the loss of feature information during data processing, ensuring the accuracy of data processing, and supporting the accuracy of subsequent liver cancer survival prediction.

In this embodiment, the screening of liver cancer multi-omics data by the preprocessing module includes:

The preprocessing module scores each feature of the liver cancer multi-omics data with a univariate Cox-PH model, compares the score Per1 with the set threshold P_y, selects the features with Per1 < P_y, and fuses the selected data to form the target data, where the threshold P_y is generally set to 0.5. This effectively prevents the loss of information during processing and thereby ensures the accuracy of the final result.
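The screening step can be sketched as follows, assuming the score Per1 is the p-value of a univariate Cox-PH test for each feature. For brevity the sketch uses the score test of the Cox partial likelihood evaluated at β = 0 instead of fitting the model and computing a Wald or likelihood-ratio test. That substitution, and the names `cox_score_test_pvalue` and `screen_features`, are assumptions of this illustration.

```python
import math

def cox_score_test_pvalue(x, time, event):
    """Score-test p-value for a single covariate x in a Cox-PH model,
    evaluated at beta = 0 (ties handled as in the Breslow method)."""
    n = len(x)
    score = 0.0  # first derivative of the partial log-likelihood
    info = 0.0   # observed information (negative second derivative)
    for i in range(n):
        if not event[i]:
            continue  # censored samples contribute only via risk sets
        # Risk set: every sample still under observation at time[i].
        risk = [x[j] for j in range(n) if time[j] >= time[i]]
        mean = sum(risk) / len(risk)
        score += x[i] - mean
        info += sum((v - mean) ** 2 for v in risk) / len(risk)
    if info == 0.0:
        return 1.0  # a constant feature carries no survival signal
    chi2 = score * score / info
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def screen_features(features, time, event, threshold):
    """Return the names of features whose p-value is below threshold."""
    return [name for name, values in features.items()
            if cox_score_test_pvalue(values, time, event) < threshold]

# Toy cohort: eight patients, all with an observed event.
time = [1, 2, 3, 4, 5, 6, 7, 8]
event = [1, 1, 1, 1, 1, 1, 1, 1]
features = {
    "risk": [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0],     # high value, early event
    "noise": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
selected = screen_features(features, time, event, threshold=0.05)
```

The threshold is passed explicitly because the choice of P_y determines how many features survive the screen.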

In this embodiment, the dimensionality reduction performed by the data dimensionality reduction processing module on the target data specifically includes:

where L(x, x') is the loss function and β_w is the regularization penalty coefficient,

In this embodiment, the survival prediction of the classification processing module specifically includes:

SB1. The classification processing module scores the features of the dimensionality-reduced target data again with a univariate Cox-PH model, compares each feature's score Per2 with the set threshold P_y, selects the features with Per2 < P_y, and fuses the selected data, where the fusion combines the multiple features into a single feature matrix;

SB4. The classification processing module determines the classification labels from the similarity function and outputs them to the classifier module. The classifier module uses an XGBoost classifier, and the multi-omics liver cancer data include RNA sequencing data, miRNA data and DNA methylation data. Taking RNA sequencing data as an example: during screening, the preprocessing module selects from the RNA sequencing data the feature data that meet the screening criterion, and the selected data are then recombined to form a new RNA sequencing data set.

In step SB1, the features selected from the three types of omics data are fused into a single data matrix of order n×n, and each column of the matrix is treated as one sample, so clustering operates on n samples {z₁, z₂, …, z_n}. The classifier module performs the cluster analysis described above on these samples to obtain the final classification labels; the number of classification labels is generally set to 2.

The data sets GSE14520 and GSE31384 from the GEO database were used as validation cohorts for the classifiers trained on RNA-seq and miRNA data, respectively. For these two validation cohorts, we first selected the features they share with the training set samples and then normalized the data with the same method used for the multi-omics data. For the training set and the two cohorts, M features were selected on the basis of the cluster labels. The two cohorts then served as validation data sets on which the model was tested to obtain the final classification results. We varied M between 50 and 100 and found that the trained model gave the best predictions with M set to 50.

With TCGA as the training data set, RNA-seq, miRNA-seq and DNA methylation data of liver cancer were obtained. The preprocessing module built univariate Cox-PH models to select the features with Per1 < 0.05; the screened multi-omics data were passed through the dimensionality reduction processing module and then into the classification processing module, where univariate Cox-PH models were built again to select the features with Per2 < 0.05. Finally, the classifier module used spectral clustering to obtain two subtypes with significantly different survival. The classifier module then trained an XGBoost classifier on the resulting cluster labels, after which real-time multi-omics liver cancer data were input for survival prediction. To verify the effectiveness of the classifier in predicting survival, we validated the model on the two GEO data sets, GSE14520 and GSE31384, as shown in FIG. 2. For the survival curves of the two survival subtypes, our results were better than those of other published models, indicating a marked improvement in predictive performance.
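Whether two subtypes differ significantly in survival is commonly assessed with a two-group log-rank test. The sketch below shows that test on toy data. It is an illustration of the general statistic rather than the evaluation code used for the embodiment, and the function name `logrank_pvalue` and the 0/1 group encoding are assumptions.

```python
import math

def logrank_pvalue(time, event, group):
    """Two-group log-rank test. `group` holds 0 or 1 per sample,
    `event` holds 1 for an observed death and 0 for censoring."""
    observed_minus_expected = 0.0
    variance = 0.0
    # Walk through the distinct times at which a death was observed.
    for t in sorted({ti for ti, e in zip(time, event) if e}):
        at_risk = [g for ti, g in zip(time, group) if ti >= t]
        n = len(at_risk)          # total number at risk
        n1 = sum(at_risk)         # number at risk in group 1
        deaths = [g for ti, e, g in zip(time, event, group)
                  if e and ti == t]
        d = len(deaths)           # deaths at time t
        d1 = sum(deaths)          # deaths at time t in group 1
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0  # e.g. every sample belongs to the same group
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

time = list(range(1, 11))
event = [1] * 10
separated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # group 1 dies first
interleaved = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
p_separated = logrank_pvalue(time, event, separated)
p_interleaved = logrank_pvalue(time, event, interleaved)
```

A small p-value, as for the `separated` labelling, indicates that the survival curves of the two groups are unlikely to come from the same distribution.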

Finally, we compared our results with those of other models. In terms of both the log-rank P value and the C-index, our experimental results are clearly better than the others, as shown in FIG. 3.
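For reference, the C-index mentioned above measures how often a model ranks a pair of patients in the correct order of survival. A minimal sketch of Harrell's concordance index follows; the function name `concordance_index` and the convention that a higher `risk` value means shorter expected survival are assumptions of this illustration.

```python
def concordance_index(time, event, risk):
    """Harrell's C-index: fraction of comparable pairs in which the
    sample that fails earlier has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i is observed to fail before j.
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

time = [1, 2, 3, 4]
event = [1, 1, 1, 0]            # the last patient is censored
perfect = concordance_index(time, event, [4.0, 3.0, 2.0, 1.0])
reversed_order = concordance_index(time, event, [1.0, 2.0, 3.0, 4.0])
uninformative = concordance_index(time, event, [1.0, 1.0, 1.0, 1.0])
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ranking, and values below 0.5 indicate systematically inverted predictions.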

In the differential gene expression analysis, we identified 1465 up-regulated genes and 930 down-regulated genes, including the tumor marker gene BIRC5 (P = 2.07e-41) and the stem cell marker genes CD24 (P = 2.83e-11), KRT19 (P = 2.82e-26) and EPCAM (P = 1.01e-6). In addition, we found 28 genes (SLC2A2, AQP9, RGN, SULT2A1, CRYL1, SERPINC1, PAH, CDO1, PLG, APOC3, CYP27A1, PFKFB3, TM4SF1, ACSL5, RGS2, HN1, SERPINA10, CYB5A, EPHX2, SPHX2, RGS1, ADH1B, LECT2, TBX3, RNASE4, ALDOA, ADH6, SLC38A1) that differ between the two survival risk groups we identified and are strongly associated with liver cancer survival.

For the differentially expressed genes obtained by the differential analysis, we also performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the two subgroups. Tumor-related pathways, including the PI3K-Akt signaling pathway, the cell cycle signaling pathway and the p53 signaling pathway, are enriched in the aggressive subtype (C2); the PI3K-Akt signaling pathway is also associated with CD8+ T cell infiltration. The low-risk survival subtype (C1) is associated with pathways such as drug metabolism, cytochrome P450, metabolic pathways and fatty acid degradation. These pathways are important for studying the prognosis of liver cancer.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the invention may be modified or replaced with equivalents without departing from their spirit and scope, and all such modifications and replacements fall within the scope of the claims of the invention.

Claims

1. A liver cancer data processing system based on multi-omics data, characterized in that it comprises a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module;

The preprocessing module is used for screening liver cancer multi-omics data, and outputting the screened target data to the data dimension reduction processing module;

The data dimensionality reduction processing module is used to receive the target data output by the preprocessing module, perform dimensionality reduction processing on the target data, and output the dimensionality reduction processed target data to the classification processing module;

The classification processing module is configured to receive the dimensionally reduced target data output by the data dimensionality reduction processing module, perform classification processing according to the dimensionally reduced target data, and output a classification label;

The classifier module is used for receiving classification labels, and using the classification labels to train the classifier module, and then the classifier module receives real-time multi-omics liver cancer data and predicts the survival time of liver cancer.

2. The liver cancer data processing system based on multi-omics data according to claim 1, characterized in that the screening of liver cancer multi-omics data by the preprocessing module comprises:

The preprocessing module scores each feature of the liver cancer multi-omics data with a univariate Cox-PH model, compares the score Per1 with a set threshold P_y, selects the features with Per1 < P_y, and fuses the selected data to form the target data.

3. The liver cancer data processing system based on multi-omics data according to claim 2, characterized in that the dimensionality reduction performed by the data dimensionality reduction processing module on the target data specifically comprises:

SA1. A K-layer autoencoder is built in the data dimensionality reduction processing module, where the output function of the K-layer autoencoder is:

x' = Relu(W_i · Relu(W_i x + b_i)); where W_i is the weight matrix between adjacent autoencoders, b_i is the bias associated with the weight matrix W_i, and x is a feature value of the m-dimensional target data X = (x₁, x₂, …, x_m);

SA2. The data dimensionality reduction processing module constructs a loss function, where the loss function is:

SA3. The loss function is minimized iteratively, updating the weight matrix W_i and its bias b_i until the set number of iterations is reached, after which the data dimensionality reduction processing module outputs the dimensionality-reduced target data.

4. The liver cancer data processing system based on multi-omics data according to claim 3, characterized in that the survival prediction of the classification processing module specifically comprises:

SB1. The classification processing module scores the features of the dimensionality-reduced target data again with a univariate Cox-PH model, compares each feature's score Per2 with the set threshold P_y, selects the features with Per2 < P_y, and fuses the selected data;

SB2. The classification processing module constructs a normalization processing model, and normalizes the data processed in step SB1, wherein the normalization processing model is:

SB3. The classification processing module builds the similarity function:

SB4. The classification processing module determines the classification label according to the similarity function, and outputs it to the classifier module.