CN112086199A - Liver cancer data processing system based on multi-omics data - Google Patents
- Publication number
- CN112086199A (application CN202010963978.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- processing module
- module
- dimensionality reduction
- liver cancer
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a liver cancer data processing system based on multi-omics data, comprising a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module. The preprocessing module screens the liver cancer multi-omics data and outputs the screened target data to the data dimensionality reduction processing module. The data dimensionality reduction processing module receives the target data output by the preprocessing module, reduces its dimensionality, and outputs the dimension-reduced target data to the classification processing module. The classification processing module receives the dimension-reduced target data, performs classification processing on it, and outputs classification labels. The classifier module receives the classification labels and is trained with them; it then receives real-time multi-omics liver cancer data and predicts liver cancer survival. The system fuses liver cancer multi-omics data well by exploiting the complementarity of the data, which avoids loss of feature information during data processing, safeguards the accuracy of the processing, and so supports the accuracy of the subsequent survival prediction.
Description
Technical field
The invention relates to a data processing system, and in particular to a liver cancer data processing system based on multi-omics data.
Background
Early-stage liver cancer is treated mainly by surgical resection, but clinical data show that the postoperative recurrence rate is about 70%, which seriously impairs patients' long-term survival. Establishing a subtyping standard for hepatocellular carcinoma (HCC), managing patients at high risk of recurrence with finer stratification, and screening out the population likely to benefit before surgery is performed may matter even more for improving patient survival and achieving precision treatment of HCC. A classification standard for liver cancer built on multi-omics data, together with more accurate prognosis-guided treatment and management of different patients, would improve survival. Fusing multi-omics data to subtype patients at the molecular level and predict their prognosis is therefore important, and it is also clinically relevant to treatment.
In recent years, methods have appeared that fuse RNA sequencing data, miRNA data, methylation data and the clinical survival data of liver cancer patients to subtype liver cancer and predict prognosis. In the prior art, however, few researchers take the patient's survival status into account when studying molecular subtypes. Survival is clinically important for the study of molecular subtypes, and large differences in survival often have a strong effect on them. Molecular subtyping and prognosis prediction by multi-omics fusion have two characteristics: (1) the fusion stage is generally classed as early, intermediate or late fusion, and the choice of stage strongly affects the fusion result; (2) the fusion method also has a strong effect. Existing fusion methods and systems have the following defects. On the one hand, they integrate the input data with an autoencoder, which easily loses feature data. On the other hand, they simply stack the data directly, so that the different data types fuse poorly, cannot complement one another, and do not yield accurate information.
A new technical means is therefore urgently needed to solve the above technical problems.
Summary of the invention
In view of this, the object of the invention is to provide a liver cancer data processing system based on multi-omics data that fuses liver cancer multi-omics data well by exploiting the complementarity of the data, thereby avoiding loss of feature information during data processing, safeguarding the accuracy of the processing, and supporting the accuracy of the subsequent liver cancer survival prediction.
The liver cancer data processing system based on multi-omics data provided by the invention comprises a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module;
the preprocessing module is used to screen the liver cancer multi-omics data and to output the screened target data to the data dimensionality reduction processing module;
the data dimensionality reduction processing module is used to receive the target data output by the preprocessing module, to reduce the dimensionality of the target data, and to output the dimension-reduced target data to the classification processing module;
the classification processing module is used to receive the dimension-reduced target data output by the data dimensionality reduction processing module, to perform classification processing on it, and to output classification labels;
the classifier module is used to receive the classification labels and is trained with them; the classifier module then receives real-time multi-omics liver cancer data and predicts liver cancer survival.
Further, the screening of the liver cancer multi-omics data by the preprocessing module comprises:
The preprocessing module scores each feature of the liver cancer multi-omics data with a univariate Cox-PH model, compares the score Per1 with a set threshold P_y, keeps the features with Per1 < P_y, and fuses the kept data to form the target data.
Further, the dimensionality reduction performed on the target data by the data dimensionality reduction processing module specifically comprises:
SA1. A K-layer autoencoder is built in the data dimensionality reduction processing module, where the output function of the K-layer autoencoder is:
$x' = \mathrm{ReLU}\left(W_i \cdot \mathrm{ReLU}\left(W_i x + b_i\right)\right)$; where $W_i$ is the weight matrix between adjacent autoencoder layers, $b_i$ is the bias associated with the weight matrix $W_i$, and $x$ is a feature value in the m-dimensional target data $X = (x_1, x_2, \ldots, x_m)$;
SA2. The data dimensionality reduction processing module builds a loss function, where the loss function is:
[loss-function formula not reproduced in this text] where $L(x, x')$ is the loss function and $\beta_w$ is the regularization penalty coefficient;
SA3. The loss function is minimized iteratively, updating the weight matrix $W_i$ and its bias $b_i$ until the set number of iterations is reached, after which the data dimensionality reduction processing module outputs the dimension-reduced target data.
Further, the survival prediction of the classification processing module specifically comprises:
SB1. The classification processing module scores the features of the dimension-reduced target data again with a univariate Cox-PH model, compares each feature's score Per2 with the set threshold P_y, keeps the features with Per2 < P_y, and fuses the kept data;
SB2. The classification processing module builds a normalization model and normalizes the data produced by step SB1, where the normalization model is:
[normalization formula not reproduced in this text] where $p$ is the feature data output by step SB1, $P$ is the normalized feature data, $\mathrm{Var}(p)$ is the variance of the feature data $p$, and $E(p)$ is the empirical mean of the feature data $p$;
SB3. The classification processing module builds a similarity function:
[similarity formula not reproduced in this text] where $W(i,j)$ is the similarity between the i-th sample $z_i$ and the j-th sample $z_j$, and $\theta_{ij}$ is a normalization factor, for which:
[normalization-factor formula not reproduced in this text] $\lambda_i$ is the set of k nearest neighbours of the i-th sample $z_i$, $\lambda_j$ is the set of k nearest neighbours of the j-th sample $z_j$, and $z_r$ denotes the r-th sample in $\lambda_i$.
SB4. The classification processing module determines the classification labels from the similarity function and outputs them to the classifier module.
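Steps SB2 and SB3 can be illustrated with a minimal Python sketch. Because the two formulas are not reproduced in this text, the sketch rests on assumptions: the normalization is taken to be the usual z-score, P = (p - E(p)) / sqrt(Var(p)), with the population variance, and the similarity is taken to be a scaled exponential kernel of the kind used in similarity network fusion, W(i,j) = exp(-d(z_i, z_j)^2 / (mu * theta_ij)), with theta_ij the average of the two mean k-nearest-neighbour distances and d(z_i, z_j). The Euclidean distance, the parameter `mu` and all function names are hypothetical and do not come from the patent.

```python
import math


def zscore(p):
    """Assumed SB2 normalization: subtract the mean, divide by the std."""
    mean = sum(p) / len(p)
    var = sum((v - mean) ** 2 for v in p) / len(p)  # population variance
    if var == 0:
        return [0.0] * len(p)  # constant feature carries no information
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in p]


def euclid(a, b):
    """Euclidean distance between two samples (assumed distance measure)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def similarity_matrix(samples, k=2, mu=0.5):
    """Assumed SB3 kernel; requires at least two samples."""
    n = len(samples)
    dist = [[euclid(a, b) for b in samples] for a in samples]
    # Mean distance from each sample to its k nearest neighbours (not itself).
    knn_mean = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        knn_mean.append(sum(nearest) / len(nearest))
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # theta_ij: local scale that normalizes the squared distance.
            theta = (knn_mean[i] + knn_mean[j] + dist[i][j]) / 3.0
            W[i][j] = math.exp(-dist[i][j] ** 2 / (mu * theta)) if theta > 0 else 1.0
    return W
```

Scaling the kernel by a neighbourhood-dependent factor, rather than by one global bandwidth, lets dense and sparse regions of the sample space be compared on a similar footing, which is the usual motivation for this form.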
Beneficial effects of the invention: the invention fuses liver cancer multi-omics data well by exploiting the complementarity of the data, which avoids loss of feature information during data processing, safeguards the accuracy of the processing, and supports the accuracy of the subsequent liver cancer survival prediction.
Description of drawings
The invention is further described below with reference to the drawings and an embodiment:
Figure 1 is a schematic structural diagram of the invention.
Figure 2 is a schematic diagram of the classification labels of the invention.
Figure 3 is a comparison diagram for a specific example of the invention.
Detailed description
The invention is described in further detail below with reference to the drawings:
The liver cancer data processing system based on multi-omics data provided by the invention comprises a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module;
the preprocessing module is used to screen the liver cancer multi-omics data and to output the screened target data to the data dimensionality reduction processing module;
the data dimensionality reduction processing module is used to receive the target data output by the preprocessing module, to reduce the dimensionality of the target data, and to output the dimension-reduced target data to the classification processing module;
the classification processing module is used to receive the dimension-reduced target data output by the data dimensionality reduction processing module, to perform classification processing on it, and to output classification labels;
the classifier module is used to receive the classification labels and is trained with them; the classifier module then receives real-time multi-omics liver cancer data and predicts liver cancer survival. The invention thus fuses liver cancer multi-omics data well by exploiting the complementarity of the data, which avoids loss of feature information during data processing, safeguards the accuracy of the processing, and supports the accuracy of the subsequent liver cancer survival prediction.
In this embodiment, the screening of the liver cancer multi-omics data by the preprocessing module comprises:
The preprocessing module scores each feature of the liver cancer multi-omics data with a univariate Cox-PH model, compares the score Per1 with a set threshold P_y, keeps the features with Per1 < P_y, and fuses the kept data to form the target data. The threshold P_y is generally set to 0.5. Screening in this way prevents loss of information during processing and so safeguards the accuracy of the final result.
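The screening step can be sketched in plain Python as follows. This is an illustrative sketch, not the patented implementation: it assumes that the score Per1 is the Wald-test p-value of a one-covariate Cox proportional hazards fit, which the text does not state explicitly, and it fits the model by a damped Newton iteration on the partial likelihood with Breslow handling of ties. The function names and the toy data are hypothetical, and feature values are assumed to be on a moderate scale so that the exponential does not overflow.

```python
import math


def cox_univariate_pvalue(x, time, event, n_iter=50, tol=1e-9):
    """Wald-test p-value of a one-covariate Cox-PH model (Breslow ties)."""
    n = len(x)
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not event[i]:
                continue  # censored samples only contribute to risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if time[j] >= time[i]:  # sample j is still at risk
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            grad += x[i] - mean            # score contribution
            info += s2 / s0 - mean * mean  # observed information
        if info <= 1e-12:
            return 1.0  # constant feature: no evidence of association
        step = max(-2.0, min(2.0, grad / info))  # damped Newton step
        beta += step
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)  # Wald statistic: beta / standard error
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value


def screen_features(features, time, event, threshold):
    """Return the names of features whose score is below the threshold."""
    return [name for name, values in features.items()
            if cox_univariate_pvalue(values, time, event) < threshold]


# Toy data: ten patients, all with an observed event.
time = list(range(1, 11))
event = [1] * 10
features = {
    "risk": [1, 1, 1, 1, 0, 1, 0, 0, 0, 0],   # high values die early
    "noise": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to survival
    "flat": [1] * 10,                         # constant feature
}
print(screen_features(features, time, event, threshold=0.05))
```

In practice a survival-analysis library would replace the hand-written fit; the sketch only shows how one score per feature, compared against P_y, yields the reduced feature set that is passed on to the dimensionality reduction module.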
In this embodiment, the dimensionality reduction performed on the target data by the data dimensionality reduction processing module specifically comprises:
SA1. A K-layer autoencoder is built in the data dimensionality reduction processing module, where the output function of the K-layer autoencoder is:
$x' = \mathrm{ReLU}\left(W_i \cdot \mathrm{ReLU}\left(W_i x + b_i\right)\right)$; where $W_i$ is the weight matrix between adjacent autoencoder layers, $b_i$ is the bias associated with the weight matrix $W_i$, and $x$ is a feature value in the m-dimensional target data $X = (x_1, x_2, \ldots, x_m)$;
SA2. The data dimensionality reduction processing module builds a loss function, where the loss function is:
[loss-function formula not reproduced in this text] where $L(x, x')$ is the loss function and $\beta_w$ is the regularization penalty coefficient;
SA3. The loss function is minimized iteratively, updating the weight matrix $W_i$ and its bias $b_i$ until the set number of iterations is reached, after which the data dimensionality reduction processing module outputs the dimension-reduced target data.
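The forward pass of step SA1 and a loss of the kind described in step SA2 can be sketched as below for a single hidden layer. Several points are assumptions: the loss formula is not reproduced in this text, so the sketch uses mean squared reconstruction error plus a squared-weight penalty scaled by `beta_w`; the decoder is given its own matrix `W_dec`, whereas the output function above writes the same symbol in both positions; and all function names are hypothetical.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def encode(W, b, x):
    """Inner part of the output function: ReLU(W x + b)."""
    return relu([s + bi for s, bi in zip(matvec(W, x), b)])


def decode(W_dec, h):
    """Outer part of the output function: ReLU(W_dec h)."""
    return relu(matvec(W_dec, h))


def loss(x, x_rec, weight_matrices, beta_w):
    """Assumed loss: mean squared error plus beta_w times the sum of squared weights."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)
    penalty = sum(w * w for W in weight_matrices for row in W for w in row)
    return mse + beta_w * penalty


# A 3-dimensional input compressed to a 2-dimensional code and reconstructed.
W = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
b = [0.0, 0.5]
W_dec = [[0.25, 0.0], [0.5, 1.0], [0.75, 0.0]]
x = [1.0, 2.0, 3.0]
code = encode(W, b, x)      # the dimension-reduced representation
x_rec = decode(W_dec, code)
print(code, x_rec, loss(x, x_rec, [W, W_dec], beta_w=0.1))
```

Step SA3 would then adjust the weights and biases over a fixed number of iterations, for example by gradient descent, so as to reduce this loss; the hidden code returned by `encode` is what the module passes on as the dimension-reduced target data.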
本实施例中,所述分类处理模块的生存期预测具体包括:In this embodiment, the survival prediction of the classification processing module specifically includes:
SB1.分类处理模块采用单变量Cox-PH模型对降维处理后的目标数据中的特征再次进行评分,然后将特征的评分值Per2与设定阈值Py进行比较,筛选出Per2<Py的特征,并将筛选出的数据进行融合处理,其中,该数据融合过程中为将多个特征组合形成一个特征矩阵;SB1. The classification processing module uses the univariate Cox-PH model to score the features in the target data after dimensionality reduction processing again, and then compares the score value Per2 of the feature with the set threshold P y , and filters out Per2 < P y . features, and fuse the filtered data, wherein, in the data fusion process, a feature matrix is formed by combining multiple features;
SB2.分类处理模块构建归一化处理模型,并对步骤SB1处理后的数据进行归一化处理,其中,归一化处理模型为:SB2. The classification processing module constructs a normalization processing model, and normalizes the data processed in step SB1, wherein the normalization processing model is:
p为步骤SB1输出的特征数据,P为归一化处理后的特征数据,Var(p)为特征数据p的方差,E(p)为特征数据p的经验平均值; p is the characteristic data output in step SB1, P is the normalized characteristic data, Var(p) is the variance of the characteristic data p, and E(p) is the empirical average value of the characteristic data p;
SB3.分类处理模块构建相似性函数:SB3. The classification processing module builds the similarity function:
其中,W(i,j)为第i个样本zi与第j个样本zj的相似性,θij为归一化因子;其中: Among them, W(i,j) is the similarity between the ith sample zi and the jth sample z j , and θ ij is the normalization factor; where:
λi为第i个样本zi的k个近邻,λj为第j个样本zj的k个近邻;zr表示λi里的第r个样本。 λ i is the k nearest neighbors of the ith sample zi i , λ j is the k nearest neighbors of the j th sample z j ; z r represents the r th sample in λ i .
SB4.分类处理模块根据相似性函数确定出分类标签,并输出至分类器模块。其中,分类器模块采用XGBoost分类器,多组学肝癌数据包括RNA测序数据、miRNA数据、DNA甲基化数据;以RNA测序数据为例:在预处理模块进行筛选时,从RNA测序数据中筛选出符合筛选标准的特征数据,然后各个RNA测序数据的筛选数据进行重新组合,形成一个新的RNA测序数据。SB4. The classification processing module determines the classification label according to the similarity function, and outputs it to the classifier module. Among them, the classifier module uses the XGBoost classifier, and the multi-omics liver cancer data includes RNA sequencing data, miRNA data, and DNA methylation data; taking RNA sequencing data as an example: when the preprocessing module performs screening, the RNA sequencing data is screened. Feature data that meet the screening criteria are obtained, and then the screening data of each RNA sequencing data are recombined to form a new RNA sequencing data.
In step SB1, the features selected from the three types of omics data are fused into a single data matrix of order n×n. Each column of this matrix is treated as one sample, so the clustering operates on n samples {z1, z2, …, zn}. The classifier module performs cluster analysis on these samples as described above to obtain the final classification labels; in general, the number of classification labels is set to 2.
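A two-label clustering of the n samples from a similarity matrix W can be sketched as a spectral bipartition. This is a simplified stand-in, assuming a symmetric W with positive row sums: it approximates the second eigenvector of the random-walk matrix D^-1 W by power iteration and splits the samples by sign, whereas full spectral clustering would typically run k-means on several eigenvectors. The function name, the start vector, and the iteration count are hypothetical.

```python
import math


def spectral_bipartition(W, iters=500):
    """Split samples into two groups using the sign of an approximate
    second eigenvector of the random-walk matrix D^-1 W."""
    n = len(W)
    deg = [sum(row) for row in W]
    total = sum(deg)
    # Deterministic, non-constant start vector (an arbitrary choice).
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        # Lazy random-walk step, v <- (v + D^-1 W v) / 2, which keeps
        # all eigenvalues non-negative so power iteration is well behaved.
        v = [
            0.5 * (v[i] + sum(W[i][j] * v[j] for j in range(n)) / deg[i])
            for i in range(n)
        ]
        # Remove the trivial constant eigenvector (degree-weighted mean).
        m = sum(d * x for d, x in zip(deg, v)) / total
        v = [x - m for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            break
        v = [x / norm for x in v]
    return [1 if x > 0 else 0 for x in v]


# Hypothetical similarity matrix with two obvious groups: {0, 1} and {2, 3}.
toy_W = [
    [1.00, 0.90, 0.01, 0.01],
    [0.90, 1.00, 0.01, 0.01],
    [0.01, 0.01, 1.00, 0.90],
    [0.01, 0.01, 0.90, 1.00],
]
toy_labels = spectral_bipartition(toy_W)
```

Which group receives label 0 or 1 is arbitrary; only the grouping itself is meaningful.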
The datasets GSE14520 and GSE31384, mined from the GEO database, serve as validation cohorts for the classifiers trained on RNA-seq and miRNA data, respectively. For both validation cohorts, we first select the features they share with the training set samples and then normalize the data using the same method as for the multi-omics data. We then select M features, based on the cluster labels, for the training set and the two cohorts. The two cohorts are thus used as validation datasets to test the model and obtain the final classification results. We vary M over the range 50-100 and find that the trained model achieves its best prediction results when M is set to 50.
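Selecting M features based on two cluster labels can be sketched as a ranking by a two-group separation score. This is an illustrative assumption, since the text does not name the ranking statistic: the score below is proportional to the two-group ANOVA F statistic (squared difference of group means over pooled within-group variance), so it yields the same ordering of features. The function name and the toy data are hypothetical.

```python
def top_m_features(X, labels, m):
    """Return the indices of the m features that best separate the two
    label groups. X is a list of samples, each a list of feature values;
    labels holds 0 or 1 for each sample."""
    n_features = len(X[0])
    scores = []
    for f in range(n_features):
        g0 = [row[f] for row, y in zip(X, labels) if y == 0]
        g1 = [row[f] for row, y in zip(X, labels) if y == 1]
        m0 = sum(g0) / len(g0)
        m1 = sum(g1) / len(g1)
        # Pooled within-group variance.
        ss = sum((x - m0) ** 2 for x in g0) + sum((x - m1) ** 2 for x in g1)
        within = ss / max(len(g0) + len(g1) - 2, 1)
        between = (m0 - m1) ** 2
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-group spread: perfect separator or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    order = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return order[:m]


# Hypothetical toy data: feature 0 separates the groups, features 1 and 2 do not.
toy_X = [[1.0, 5.0, 0.0], [1.1, 5.0, 1.0], [3.0, 5.0, 0.0], [3.1, 5.0, 1.0]]
toy_selected = top_m_features(toy_X, [0, 0, 1, 1], m=1)
```

Applying the same selected indices to the training set and to both validation cohorts keeps the classifier's input features aligned across datasets.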
With TCGA as the training dataset, RNA-seq, miRNA-seq, and DNA methylation data for liver cancer are obtained. The preprocessing module builds a univariate Cox-PH model to obtain the features with Per1 < 0.05. The processed multi-omics data are then passed through the dimensionality reduction processing module and into the classification processing module, which builds a univariate Cox-PH model again and screens for the features with Per2 < 0.05. Finally, the classifier module uses spectral clustering to obtain two subtypes with significantly different survival. Based on the resulting cluster labels, the classifier module trains an XGBoost classifier, after which real-time multi-omics liver cancer data are input for survival prediction. To verify the effectiveness of this classifier in predicting survival, we validated the model on two datasets from GEO, GSE14520 and GSE31384, as shown in Figure 2. For the survival curves of the two survival subtypes, our results are better than those of other models, indicating that the predictive performance of our model is significantly improved compared with other published models.
Finally, we also compared our results with those of other models. In terms of both the log-rank P value and the C-index, our experimental results are clearly better than the other results, as shown in Figure 3.
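For reference, the C-index used in this comparison can be computed as sketched below. This is a minimal sketch of Harrell's concordance index, assuming right-censored data in which `events[i] == 1` marks an observed death and a higher risk score means shorter expected survival; the function name and the toy values are hypothetical and are not taken from the experiments.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: the fraction of comparable pairs in which the
    sample that failed earlier also has the higher predicted risk.
    Ties in predicted risk count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event before time j.
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: survival times, event flags, predicted risks.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a higher C-index indicates a better survival model.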
In the differential gene expression analysis, we identified 1465 up-regulated genes and 930 down-regulated genes, including the tumor marker gene BIRC5 (P=2.07e-41) and the stem cell marker genes CD24 (P=2.83e-11), KRT19 (P=2.82e-26), and EPCAM (P=1.01e-6). In addition, we found 28 genes (SLC2A2, AQP9, RGN, SULT2A1, CRYL1, SERPINC1, PAH, CDO1, PLG, APOC3, CYP27A1, PFKFB3, TM4SF1, ACSL5, RGS2, HN1, SERPINA10, CYB5A, EPHX2, SPHX2, RGS1, ADH1B, LECT2, TBX3, RNASE4, ALDOA, ADH6, SLC38A1) that differ between the two survival risk groups we identified and are strongly associated with liver cancer survival.
For the differentially expressed genes obtained from the differential analysis, we also performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the two subgroups. Tumor-related pathways such as the PI3K-Akt signaling pathway, the cell cycle signaling pathway, and the p53 signaling pathway are enriched in the aggressive subtype (C2); the PI3K-Akt signaling pathway is also associated with CD8+ T cell infiltration. The low-risk survival subtype (C1) is associated with pathways such as drug metabolism, cytochrome P450, metabolic pathways, and fatty acid degradation. These pathways are important for studying the prognosis of liver cancer.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may be modified or equivalently replaced without departing from its spirit and scope, and all such modifications and replacements fall within the scope of the claims of the present invention.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010963978.3A CN112086199B (en) | 2020-09-14 | 2020-09-14 | Liver cancer data processing system based on multiple groups of study data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112086199A | 2020-12-15 |
CN112086199B | 2023-06-09 |
Family
ID=73738141
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010963978.3A Active CN112086199B (en) | 2020-09-14 | 2020-09-14 | Liver cancer data processing system based on multiple groups of study data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112086199B (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: 401121 No. 53, middle section of Huangshan Avenue, Yubei District, Chongqing; Patentee after: Western Research Institute of China Science and technology computing technology; Country or region after: China; Address before: 401121 No. 53, middle section of Huangshan Avenue, Yubei District, Chongqing; Patentee before: Western Institute of advanced technology, Institute of computing, Chinese Academy of Sciences; Country or region before: China |