CN112086199A - Liver cancer data processing system based on multiple groups of mathematical data - Google Patents

Publication number
CN112086199A
Authority
CN
China
Prior art keywords
data
processing module
module
dimensionality reduction
liver cancer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010963978.3A
Other languages
Chinese (zh)
Other versions
CN112086199B (en
Inventor
任菲
王忠烈
谭光明
刘玉东
段勃
张春明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Western Research Institute Of China Science And Technology Computing Technology
Original Assignee
Western Institute Of Advanced Technology Institute Of Computing Chinese Academy Of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Western Institute Of Advanced Technology Institute Of Computing Chinese Academy Of Sciences filed Critical Western Institute Of Advanced Technology Institute Of Computing Chinese Academy Of Sciences
Priority to CN202010963978.3A priority Critical patent/CN112086199B/en
Publication of CN112086199A publication Critical patent/CN112086199A/en
Application granted granted Critical
Publication of CN112086199B publication Critical patent/CN112086199B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Bioethics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a liver cancer data processing system based on multi-omics data, comprising a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module. The preprocessing module screens liver cancer multi-omics data and outputs the screened target data to the data dimensionality reduction processing module. The data dimensionality reduction processing module receives the target data output by the preprocessing module, reduces its dimensionality, and outputs the dimensionality-reduced target data to the classification processing module. The classification processing module receives the dimensionality-reduced target data, performs classification processing on it, and outputs classification labels. The classifier module receives the classification labels, is trained with them, and then receives real-time multi-omics liver cancer data and predicts liver cancer survival. The system fuses liver cancer multi-omics data well by exploiting the complementarity of the data, which avoids loss of feature information during data processing, ensures the accuracy of data processing, and supports the accuracy of subsequent liver cancer survival prediction.

Figure 202010963978

Description

Liver cancer data processing system based on multi-omics data

Technical field

The invention relates to a data processing system, and in particular to a liver cancer data processing system based on multi-omics data.

Background

Early-stage liver cancer is treated mainly by surgical resection, but clinical data show that the postoperative recurrence rate is about 70%, which seriously limits patients' long-term survival. Establishing a subtyping standard for hepatocellular carcinoma (HCC), managing patients at high risk of recurrence with finer stratification, and screening out the patients likely to benefit before operating may matter even more for improving patient survival and achieving precision treatment of HCC. A classification standard for liver cancer built on multi-omics data, with more accurate prognosis-based treatment and management for different patients, will improve patient survival. Fusing multi-omics data to subtype patients at the molecular level and predict their prognosis is therefore important, and it also has clinical significance for treatment.

In recent years, methods have appeared that fuse RNA sequencing data, miRNA data, methylation data and the clinical survival data of liver cancer patients to subtype liver cancer and predict prognosis. In the prior art, however, few researchers consider the patient's survival status when studying molecular subtypes. Survival has important clinical significance for the study of molecular subtypes, and large differences in survival often strongly affect the subtypes. Molecular subtyping and prognosis prediction by multi-omics fusion have two characteristics: (1) the stage of fusion is generally divided into early, intermediate and late fusion, and the stage chosen strongly affects the fusion result; (2) the fusion method also has a strong effect. Existing fusion methods and systems have the following defects: on the one hand, they integrate the input data with an autoencoder, which easily loses feature data; on the other hand, they simply stack the data directly, so that the different data types fuse poorly, cannot complement one another, and do not yield accurate information.

Therefore, a new technical means is urgently needed to solve the above technical problems.

Summary of the invention

In view of this, the object of the invention is to provide a liver cancer data processing system based on multi-omics data that fuses liver cancer multi-omics data well by exploiting the complementarity of the data, thereby avoiding loss of feature information during data processing, ensuring the accuracy of data processing, and supporting the accuracy of subsequent liver cancer survival prediction.

The liver cancer data processing system based on multi-omics data provided by the invention comprises a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module;

the preprocessing module screens liver cancer multi-omics data and outputs the screened target data to the data dimensionality reduction processing module;

the data dimensionality reduction processing module receives the target data output by the preprocessing module, reduces the dimensionality of the target data, and outputs the dimensionality-reduced target data to the classification processing module;

the classification processing module receives the dimensionality-reduced target data output by the data dimensionality reduction processing module, performs classification processing on it, and outputs classification labels;

the classifier module receives the classification labels, is trained with the classification labels, and then receives real-time multi-omics liver cancer data and predicts liver cancer survival.

Further, the screening of liver cancer multi-omics data by the preprocessing module includes:

The preprocessing module scores each feature of the liver cancer multi-omics data with a univariate Cox-PH model, compares the score Per1 with a set threshold P_y, selects the features with Per1 < P_y, and fuses the selected data to form the target data.

Further, the dimensionality reduction performed on the target data by the data dimensionality reduction processing module specifically includes:

SA1. A K-layer autoencoder is constructed in the data dimensionality reduction processing module, where the output function of the K-layer autoencoder is:

$x' = \mathrm{ReLU}(W_i \cdot \mathrm{ReLU}(W_i x + b_i))$

where $W_i$ is the weight matrix between adjacent autoencoder layers, $b_i$ is the bias associated with the weight matrix $W_i$, and $x$ is a feature value in the m-dimensional target data $X = (x_1, x_2, \ldots, x_m)$;

SA2. The data dimensionality reduction processing module constructs a loss function:

[Equation image BDA0002681542370000031]

where $L(x, x')$ is the loss function, $\beta_w$ is the regularization penalty coefficient, and

[Equation image BDA0002681542370000032]

SA3. The loss function is minimized iteratively, updating the weight matrices $W_i$ and their biases $b_i$; once the set number of iterations is reached, the data dimensionality reduction processing module outputs the dimensionality-reduced target data.

Further, the survival prediction of the classification processing module specifically includes:

SB1. The classification processing module scores the features of the dimensionality-reduced target data again with a univariate Cox-PH model, compares each feature's score Per2 with the set threshold P_y, selects the features with Per2 < P_y, and fuses the selected data;

SB2. The classification processing module constructs a normalization model and normalizes the data produced by step SB1, where the normalization model is:

[Equation image BDA0002681542370000033]

where $p$ is the feature data output by step SB1, $P$ is the normalized feature data, $\mathrm{Var}(p)$ is the variance of the feature data $p$, and $E(p)$ is the empirical mean of the feature data $p$;

SB3. The classification processing module constructs a similarity function:

[Equation image BDA0002681542370000034]

where $W(i,j)$ is the similarity between the i-th sample $z_i$ and the j-th sample $z_j$, and $\theta_{ij}$ is a normalization factor, given by:

[Equation image BDA0002681542370000041]

where $\lambda_i$ is the set of k nearest neighbors of the i-th sample $z_i$, $\lambda_j$ is the set of k nearest neighbors of the j-th sample $z_j$, and $z_r$ denotes the r-th sample in $\lambda_i$.

SB4. The classification processing module determines the classification labels from the similarity function and outputs them to the classifier module.

Beneficial effects of the invention: the invention fuses liver cancer multi-omics data well by exploiting the complementarity of the data, thereby avoiding loss of feature information during data processing, ensuring the accuracy of data processing, and supporting the accuracy of subsequent liver cancer survival prediction.

Brief description of the drawings

The invention is further described below with reference to the drawings and embodiments:

FIG. 1 is a schematic structural diagram of the invention.

FIG. 2 is a schematic diagram of the classification labels of the invention.

FIG. 3 is a comparison diagram for a specific example of the invention.

Detailed description

The invention is described in further detail below with reference to the drawings:

The liver cancer data processing system based on multi-omics data provided by the invention comprises a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module;

the preprocessing module screens liver cancer multi-omics data and outputs the screened target data to the data dimensionality reduction processing module;

the data dimensionality reduction processing module receives the target data output by the preprocessing module, reduces the dimensionality of the target data, and outputs the dimensionality-reduced target data to the classification processing module;

the classification processing module receives the dimensionality-reduced target data output by the data dimensionality reduction processing module, performs classification processing on it, and outputs classification labels;

the classifier module receives the classification labels, is trained with the classification labels, and then receives real-time multi-omics liver cancer data and predicts liver cancer survival. The invention thus fuses liver cancer multi-omics data well by exploiting the complementarity of the data, thereby avoiding loss of feature information during data processing, ensuring the accuracy of data processing, and supporting the accuracy of subsequent liver cancer survival prediction.

In this embodiment, the screening of liver cancer multi-omics data by the preprocessing module includes:

The preprocessing module scores each feature of the liver cancer multi-omics data with a univariate Cox-PH model, compares the score Per1 with a set threshold P_y, selects the features with Per1 < P_y, and fuses the selected data to form the target data. The threshold P_y is generally set to 0.5. This screening prevents loss of information during processing and thereby ensures the accuracy of the final result.
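The screening step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes that the score Per1 is the p-value of a univariate Cox-PH model, and it uses the score test at β = 0 (Breslow handling of tied event times) instead of fitting the model. The function names `cox_score_pvalue` and `screen_features`, the toy data and the threshold passed in are all hypothetical.

```python
import math


def cox_score_pvalue(x, time, event):
    """Score-test p-value of a univariate Cox-PH model (H0: beta = 0).

    x: feature values, time: follow-up times, event: 1 = death, 0 = censored.
    """
    n = len(x)
    score = 0.0  # first derivative of the partial log-likelihood at beta = 0
    info = 0.0   # observed information at beta = 0
    for i in range(n):
        if not event[i]:
            continue
        # Risk set: every sample still under observation at time[i].
        risk = [x[j] for j in range(n) if time[j] >= time[i]]
        mean = sum(risk) / len(risk)
        score += x[i] - mean
        info += sum((v - mean) ** 2 for v in risk) / len(risk)
    if info == 0.0:
        return 1.0  # a constant feature carries no survival information
    stat = score * score / info
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def screen_features(features, time, event, threshold):
    """Keep the names of features whose p-value is below the threshold."""
    return [name for name, values in features.items()
            if cox_score_pvalue(values, time, event) < threshold]


if __name__ == "__main__":
    time = [1, 2, 3, 4, 5, 6, 7, 8]
    event = [1, 1, 1, 1, 1, 1, 1, 1]
    features = {
        "good": [8, 7, 6, 5, 4, 3, 2, 1],   # high value, early death
        "noise": [1, 2, 1, 2, 1, 2, 1, 2],  # unrelated to survival
    }
    print(screen_features(features, time, event, threshold=0.05))
```

In practice a survival-analysis library would normally replace the hand-written score test; the sketch only shows how a per-feature score and a threshold combine into a filter.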

In this embodiment, the dimensionality reduction performed on the target data by the data dimensionality reduction processing module specifically includes:

SA1. A K-layer autoencoder is constructed in the data dimensionality reduction processing module, where the output function of the K-layer autoencoder is:

$x' = \mathrm{ReLU}(W_i \cdot \mathrm{ReLU}(W_i x + b_i))$

where $W_i$ is the weight matrix between adjacent autoencoder layers, $b_i$ is the bias associated with the weight matrix $W_i$, and $x$ is a feature value in the m-dimensional target data $X = (x_1, x_2, \ldots, x_m)$;

SA2. The data dimensionality reduction processing module constructs a loss function:

[Equation image BDA0002681542370000051]

where $L(x, x')$ is the loss function, $\beta_w$ is the regularization penalty coefficient, and

[Equation image BDA0002681542370000052]

SA3. The loss function is minimized iteratively, updating the weight matrices $W_i$ and their biases $b_i$; once the set number of iterations is reached, the data dimensionality reduction processing module outputs the dimensionality-reduced target data.
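Steps SA1 and SA2 can be illustrated with a forward pass through a one-hidden-layer autoencoder. The sketch below makes several assumptions that the text does not confirm: the encoder and decoder use separate weight matrices (`W_enc`, `W_dec`), the loss is a mean squared reconstruction error plus an L2 weight penalty scaled by `beta_w` (the patent's loss is given only as an equation image), and all names and numbers are hypothetical. The training loop of step SA3 is omitted.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode(x, W_enc, b_enc):
    """Inner term of SA1: ReLU(W x + b), the low-dimensional code."""
    return relu([a + b for a, b in zip(matvec(W_enc, x), b_enc)])


def reconstruct(x, W_enc, b_enc, W_dec):
    """Outer term of SA1: x' = ReLU(W_dec * ReLU(W_enc x + b_enc))."""
    return relu(matvec(W_dec, encode(x, W_enc, b_enc)))


def loss(x, x_rec, weights, beta_w):
    """Assumed loss: mean squared error plus beta_w times the squared weights."""
    error = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)
    penalty = sum(w * w for W in weights for row in W for w in row)
    return error + beta_w * penalty


if __name__ == "__main__":
    x = [1.0, 2.0]
    W_enc, b_enc = [[1.0, 1.0]], [-1.0]   # 2 features -> 1 code unit
    W_dec = [[0.5], [1.0]]                # 1 code unit -> 2 features
    code = encode(x, W_enc, b_enc)
    x_rec = reconstruct(x, W_enc, b_enc, W_dec)
    print(code, x_rec, loss(x, x_rec, [W_enc, W_dec], beta_w=0.1))
```

After training, the output of `encode` is what a dimensionality reduction step of this kind would hand on to the next module.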

In this embodiment, the survival prediction of the classification processing module specifically includes:

SB1. The classification processing module scores the features of the dimensionality-reduced target data again with a univariate Cox-PH model, compares each feature's score Per2 with the set threshold P_y, selects the features with Per2 < P_y, and fuses the selected data; here, data fusion means combining the multiple features into one feature matrix;

SB2. The classification processing module constructs a normalization model and normalizes the data produced by step SB1, where the normalization model is:

[Equation image BDA0002681542370000061]

where $p$ is the feature data output by step SB1, $P$ is the normalized feature data, $\mathrm{Var}(p)$ is the variance of the feature data $p$, and $E(p)$ is the empirical mean of the feature data $p$;

SB3. The classification processing module constructs a similarity function:

[Equation image BDA0002681542370000062]

where $W(i,j)$ is the similarity between the i-th sample $z_i$ and the j-th sample $z_j$, and $\theta_{ij}$ is a normalization factor, given by:

[Equation image BDA0002681542370000063]

where $\lambda_i$ is the set of k nearest neighbors of the i-th sample $z_i$, $\lambda_j$ is the set of k nearest neighbors of the j-th sample $z_j$, and $z_r$ denotes the r-th sample in $\lambda_i$.

SB4. The classification processing module determines the classification labels from the similarity function and outputs them to the classifier module. The classifier module uses an XGBoost classifier, and the multi-omics liver cancer data comprise RNA sequencing data, miRNA data and DNA methylation data. Taking RNA sequencing data as an example: during screening in the preprocessing module, the feature data that meet the screening criterion are selected from the RNA sequencing data, and the selected data from each RNA sequencing dataset are then recombined to form a new RNA sequencing dataset.
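Steps SB2 and SB3 can be sketched together. The equations themselves appear only as images, so the code below rests on assumptions: the normalization is taken to be the usual z-score P = (p − E(p)) / √Var(p), which matches the symbols the text defines, and the similarity is taken to be the scaled exponential kernel used in similarity network fusion, W(i,j) = exp(−ρ²(z_i, z_j) / (μ·θ_ij)) with θ_ij the average of the mean distance from z_i to its k nearest neighbors, the mean distance from z_j to its k nearest neighbors, and ρ(z_i, z_j). The Euclidean distance ρ, the hyperparameter `mu` and all function names are hypothetical.

```python
import math


def standardize(p):
    """Assumed form of SB2: subtract the mean, divide by the standard deviation."""
    mean = sum(p) / len(p)
    var = sum((v - mean) ** 2 for v in p) / len(p)
    if var == 0.0:
        return [0.0] * len(p)
    return [(v - mean) / math.sqrt(var) for v in p]


def similarity_matrix(samples, k, mu):
    """Assumed form of SB3: scaled exponential similarity between samples."""
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Mean distance from each sample to its k nearest neighbors (self excluded).
    knn_mean = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        knn_mean.append(sum(nearest) / len(nearest))
    W = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            theta = (knn_mean[i] + knn_mean[j] + dist[i][j]) / 3.0
            if theta > 0.0:  # theta is 0 only when the samples coincide
                W[i][j] = math.exp(-dist[i][j] ** 2 / (mu * theta))
    return W


if __name__ == "__main__":
    print(standardize([1.0, 2.0, 3.0]))
    samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    for row in similarity_matrix(samples, k=1, mu=0.5):
        print(row)
```

`standardize` is meant to be applied to one feature across all samples before the distances are computed, so that features on different scales contribute comparably.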

In step SB1, the features selected from the three types of omics data are fused into one data matrix of order n×n. Each column of the matrix is treated as a sample, so clustering operates on n samples {z_1, z_2, ..., z_n}. The classifier module performs cluster analysis on the samples as described above and derives the final classification labels; in general, the number of classification labels is set to 2.
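As a small illustration of the fusion in step SB1, the sketch below concatenates the selected features of each omics type per sample, keeping only samples present in every type. The input layout (a dictionary per omics type keyed by sample identifier), the function name `fuse_omics` and the alphabetical ordering of omics types are assumptions made for the example.

```python
def fuse_omics(blocks):
    """Concatenate per-sample feature lists from several omics types.

    blocks: {omics_name: {sample_id: [feature values]}}
    Returns the sorted common sample ids and {sample_id: fused feature list}.
    """
    common = set.intersection(*(set(block) for block in blocks.values()))
    samples = sorted(common)
    fused = {
        sample: [value
                 for name in sorted(blocks)      # fixed order of omics types
                 for value in blocks[name][sample]]
        for sample in samples
    }
    return samples, fused


if __name__ == "__main__":
    blocks = {
        "rna": {"s1": [1.0, 2.0], "s2": [3.0, 4.0]},
        "mirna": {"s1": [5.0], "s2": [6.0], "s3": [7.0]},
        "methylation": {"s2": [8.0], "s1": [9.0]},
    }
    print(fuse_omics(blocks))
```

Each fused list corresponds to one sample z_i, that is, one column of the data matrix described above.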

The datasets GSE14520 and GSE31384, mined from the GEO database, serve as validation cohorts for the classifiers trained on RNA-seq and miRNA data, respectively. For these two validation cohorts, we first select the features they share with the training set samples and then normalize the data with the same method used for the multi-omics data. We then select M features, based on the cluster labels, for the training set and the two cohorts. The two cohorts serve as validation datasets to test the model and obtain the final classification results. We varied M over the range 50 to 100 and found that the trained model gives the best prediction when M is set to 50.

With TCGA as the training dataset, RNA-seq, miRNA-seq and DNA methylation data for liver cancer are obtained. The preprocessing module builds univariate Cox-PH models and keeps the features with Per1 < 0.05. The processed multi-omics data then pass through the dimensionality reduction processing module and into the classification processing module, which again builds univariate Cox-PH models and keeps the features with Per2 < 0.05. Finally, the classifier module uses spectral clustering to obtain two subtypes with significantly different survival. Based on the resulting cluster labels, the classifier module trains an XGBoost classifier, and real-time multi-omics liver cancer data are then input for survival prediction. To verify the classifier's effectiveness in predicting survival, we validated the model on two datasets from GEO, GSE14520 and GSE31384, as shown in FIG. 2. For the survival curves of the two survival subtypes, our results are better than those of other models, which indicates that our model predicts markedly better than other published models.
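The text names spectral clustering but does not say which variant is used. As one possible illustration, the sketch below splits samples into two groups by the sign of the Fiedler vector of the unnormalized graph Laplacian L = D − W, found by power iteration on a shifted matrix. The function name `spectral_bipartition`, the iteration count and the toy similarity matrix are assumptions, and the graph is assumed to be connected.

```python
import math


def spectral_bipartition(W, iterations=500):
    """Two-way spectral partition of a symmetric similarity matrix W.

    Returns a 0/1 label per sample; the first sample always gets label 0.
    """
    n = len(W)
    # Node degrees, ignoring self-similarity on the diagonal.
    degree = [sum(row) - row[i] for i, row in enumerate(W)]
    # Eigenvalues of L lie in [0, 2 * max degree], so shift * I - L is
    # positive semidefinite and power iteration finds the smallest
    # eigenvectors of L.
    shift = 2.0 * max(degree)
    v = [i - (n - 1) / 2.0 for i in range(n)]  # start orthogonal to constants
    for _ in range(iterations):
        Lv = [degree[i] * v[i]
              - sum(W[i][j] * v[j] for j in range(n) if j != i)
              for i in range(n)]
        v = [shift * a - b for a, b in zip(v, Lv)]
        mean = sum(v) / n
        v = [a - mean for a in v]              # remove the constant eigenvector
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    if v[0] > 0:
        v = [-a for a in v]                    # fix the sign ambiguity
    return [1 if a > 0 else 0 for a in v]


if __name__ == "__main__":
    strong, weak = 1.0, 0.01
    W = [[strong if (i < 3) == (j < 3) else weak for j in range(6)]
         for i in range(6)]
    print(spectral_bipartition(W))
```

The resulting labels play the role of the two survival subtypes that are then used as training targets for the classifier.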

Finally, we compared our results with those of other models. On both the log-rank P value and the C-index, our experimental results are clearly better than the others, as shown in FIG. 3.
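For readers unfamiliar with the C-index used in this comparison, the sketch below computes Harrell's concordance index: the fraction of comparable patient pairs in which the patient who died earlier was assigned the higher predicted risk. The function name, the tie handling (0.5 for equal risks) and the toy data are assumptions for illustration and are not taken from the patent.

```python
def concordance_index(time, event, risk):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when sample i has an observed event and
    time[i] < time[j]. It is concordant when risk[i] > risk[j].
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    time = [1, 2, 3, 4]
    event = [1, 1, 0, 1]
    print(concordance_index(time, event, risk=[4, 3, 2, 1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a higher C-index indicates better survival prediction.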

In the differential gene expression analysis, we identified 1465 up-regulated genes and 930 down-regulated genes, including the tumor marker gene BIRC5 (P = 2.07e-41) and the stem cell marker genes CD24 (P = 2.83e-11), KRT19 (P = 2.82e-26) and EPCAM (P = 1.01e-6). In addition, we found 28 genes (SLC2A2, AQP9, RGN, SULT2A1, CRYL1, SERPINC1, PAH, CDO1, PLG, APOC3, CYP27A1, PFKFB3, TM4SF1, ACSL5, RGS2, HN1, SERPINA10, CYB5A, EPHX2, SPHX2, RGS1, ADH1B, LECT2, TBX3, RNASE4, ALDOA, ADH6, SLC38A1) that differ between the two survival risk groups we identified and are strongly associated with liver cancer survival.

For the differentially expressed genes obtained by the differential analysis, we also performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the two subgroups. Tumor-related pathways such as the PI3K-Akt signaling pathway, the cell cycle signaling pathway and the p53 signaling pathway are enriched in the aggressive subtype (C2); the PI3K-Akt signaling pathway is also associated with CD8+ T cell infiltration. The low-risk survival subtype (C1) shows pathways related to drug metabolism, cytochrome P450, metabolic pathways and fatty acid degradation. These pathways are important for studying the prognosis of liver cancer.

Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the invention may be modified or replaced with equivalents without departing from its spirit and scope, and all such modifications and replacements fall within the scope of the claims of the invention.

Claims (4)

1. A liver cancer data processing system based on multi-omics data, characterized by comprising a preprocessing module, a data dimensionality reduction processing module, a classification processing module and a classifier module;
the preprocessing module is configured to screen liver cancer multi-omics data and to output the screened target data to the data dimensionality reduction processing module;
the data dimensionality reduction processing module is configured to receive the target data output by the preprocessing module, perform dimensionality reduction on the target data, and output the dimensionality-reduced target data to the classification processing module;
the classification processing module is configured to receive the dimensionality-reduced target data output by the data dimensionality reduction processing module, perform classification processing on it, and output classification labels;
the classifier module is configured to receive the classification labels and to be trained with them, after which the classifier module receives real-time multi-omics liver cancer data and predicts the liver cancer survival time.
2. The liver cancer data processing system based on multi-omics data according to claim 1, characterized in that the screening of the liver cancer multi-omics data by the preprocessing module comprises:
the preprocessing module scores each feature of the liver cancer multi-omics data with a univariate Cox-PH model, compares each score Per1 with a set threshold $P_y$, selects the features with $\mathrm{Per1} < P_y$, and fuses the selected data to form the target data.
3. The liver cancer data processing system based on multi-omics data according to claim 2, characterized in that the dimensionality reduction performed on the target data by the data dimensionality reduction processing module specifically comprises:
SA1. constructing a K-layer autoencoder in the data dimensionality reduction processing module, wherein the output function of the K-layer autoencoder is:
$x' = \mathrm{ReLU}(W_i \cdot \mathrm{ReLU}(W_i x + b_i))$;
wherein $W_i$ is the weight matrix between adjacent autoencoders, $b_i$ is the bias of the weight matrix $W_i$, and $x$ is a feature value of the m-dimensional target data $X = (x_1, x_2, \ldots, x_m)$;
SA2. the data dimensionality reduction processing module constructs a loss function, wherein the loss function is:
Figure FDA0002681542360000021
where $L(x, x')$ is the loss function, $\beta_w$ is the regularization penalty coefficient,
Figure FDA0002681542360000022
SA3. performing iterative computation on the loss function, updating the weight matrices $W_i$ and their biases $b_i$ until the set number of iterations is reached, after which the data dimensionality reduction processing module outputs the dimensionality-reduced target data.
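Steps SA1 to SA3 can be sketched roughly as follows. The loss function of SA2 appears only as a figure, so this sketch assumes a mean squared reconstruction error plus an L2 weight penalty scaled by `beta_w`. It also assumes a single hidden layer with separate encoder and decoder weights rather than the K-layer structure of the claim. The function names, learning rate and initialization are hypothetical and not taken from the patent.

```python
import numpy as np


def relu(a):
    """Element-wise ReLU activation."""
    return np.maximum(a, 0.0)


def train_autoencoder(X, hidden_dim, beta_w=1e-4, lr=0.01, n_iter=200, seed=0):
    """Train a one-hidden-layer ReLU autoencoder by full-batch gradient descent.

    Assumed loss (not from the patent figure):
        mean_i ||x'_i - x_i||^2 + beta_w * (||W1||^2 + ||W2||^2)
    Returns the low-dimensional codes and the loss value at each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W1 = rng.normal(0.0, 0.1, size=(m, hidden_dim))  # encoder weights
    W2 = rng.normal(0.0, 0.1, size=(hidden_dim, m))  # decoder weights
    b1 = np.full(hidden_dim, 0.1)  # small positive biases keep ReLU units active
    b2 = np.full(m, 0.1)
    history = []
    for _ in range(n_iter):  # SA3: fixed number of iterations
        # Forward pass: x' = ReLU(W2 . ReLU(W1 x + b1) + b2)
        a1 = X @ W1 + b1
        h = relu(a1)
        a2 = h @ W2 + b2
        err = relu(a2) - X
        loss = np.mean(np.sum(err ** 2, axis=1)) + beta_w * (
            np.sum(W1 ** 2) + np.sum(W2 ** 2)
        )
        history.append(float(loss))
        # Backward pass through both ReLU layers
        d_a2 = (2.0 / n) * err * (a2 > 0)
        d_a1 = (d_a2 @ W2.T) * (a1 > 0)
        # Gradient step on weights and biases, including the L2 penalty term
        W2 -= lr * (h.T @ d_a2 + 2.0 * beta_w * W2)
        b2 -= lr * d_a2.sum(axis=0)
        W1 -= lr * (X.T @ d_a1 + 2.0 * beta_w * W1)
        b1 -= lr * d_a1.sum(axis=0)
    codes = relu(X @ W1 + b1)  # dimensionality-reduced target data
    return codes, history


# Usage on synthetic data: 40 samples, 6 features, reduced to 3 dimensions
X_demo = np.random.default_rng(1).random((40, 6))
codes_demo, history_demo = train_autoencoder(X_demo, hidden_dim=3)
print(codes_demo.shape, history_demo[0], history_demo[-1])
```

The hidden-layer activations returned as `codes` play the role of the dimensionality-reduced target data that is passed on to the classification processing module.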
4. The liver cancer data processing system based on multi-omics data according to claim 3, characterized in that the survival prediction of the classification processing module specifically comprises:
SB1. the classification processing module uses the univariate Cox-PH model to score the features of the dimensionality-reduced target data again, compares each feature score Per2 with the set threshold $P_y$, selects the features with $\mathrm{Per2} < P_y$, and fuses the selected data;
SB2. the classification processing module constructs a normalization model and normalizes the data processed in step SB1, wherein the normalization model is:
Figure FDA0002681542360000023
where p is the feature data output by step SB1, P is the normalized feature data, Var(p) is the variance of the feature data p, and E(p) is the empirical mean of the feature data p;
SB3. the classification processing module constructs a similarity function:
Figure FDA0002681542360000024
where $W(i,j)$ is the similarity between the i-th sample $z_i$ and the j-th sample $z_j$, and $\theta_{ij}$ is a normalization factor, wherein:
Figure FDA0002681542360000025
$\lambda_i$ denotes the k nearest neighbors of the i-th sample $z_i$, $\lambda_j$ denotes the k nearest neighbors of the j-th sample $z_j$, and $z_r$ denotes the r-th sample in $\lambda_i$.
SB4. the classification processing module determines the classification labels according to the similarity function and outputs them to the classifier module.
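Steps SB2 and SB3 can be illustrated with a short sketch. The normalization and similarity formulas appear only as figures, so both are assumptions here: the normalization is taken to be the standard z-score (p - E(p)) / sqrt(Var(p)), and the similarity is taken to be a scaled exponential kernel whose normalization factor averages the pairwise distance with each sample's mean distance to its k nearest neighbors. The parameter `mu` and all function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def zscore_columns(rows):
    """Assumed SB2 normalization: (p - E(p)) / sqrt(Var(p)) for each feature column."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero
    stds = [math.sqrt(pvariance(c)) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def knn_mean_distance(i, samples, k):
    """Mean Euclidean distance from sample i to its k nearest neighbors."""
    dists = sorted(math.dist(samples[i], z) for j, z in enumerate(samples) if j != i)
    return fmean(dists[:k])


def similarity_matrix(samples, k=3, mu=0.5):
    """Assumed SB3 similarity: W(i,j) = exp(-d(z_i, z_j)^2 / (mu * theta_ij)).

    theta_ij averages the two k-nearest-neighbor mean distances and d(z_i, z_j).
    """
    n = len(samples)
    knn = [knn_mean_distance(i, samples, k) for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(samples[i], samples[j])
            theta = (knn[i] + knn[j] + d) / 3.0
            W[i][j] = math.exp(-d * d / (mu * theta)) if theta > 0 else 1.0
    return W


# Usage: two well-separated groups of samples
raw = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
Z = zscore_columns(raw)
W = similarity_matrix(Z, k=2)
print(W[0][1] > W[0][3])
```

A matrix of this kind could then be passed to a clustering step to obtain the classification labels of SB4. The claims do not name the clustering method, so that step is left out of the sketch.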
CN202010963978.3A 2020-09-14 2020-09-14 Liver cancer data processing system based on multiple groups of study data Active CN112086199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010963978.3A CN112086199B (en) 2020-09-14 2020-09-14 Liver cancer data processing system based on multiple groups of study data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010963978.3A CN112086199B (en) 2020-09-14 2020-09-14 Liver cancer data processing system based on multiple groups of study data

Publications (2)

Publication Number Publication Date
CN112086199A true CN112086199A (en) 2020-12-15
CN112086199B CN112086199B (en) 2023-06-09

Family

ID=73738141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010963978.3A Active CN112086199B (en) 2020-09-14 2020-09-14 Liver cancer data processing system based on multiple groups of study data

Country Status (1)

Country Link
CN (1) CN112086199B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100292303A1 (en) * 2007-07-20 2010-11-18 Birrer Michael J Gene expression profile for predicting ovarian cancer patient survival
CN105512477A (en) * 2015-12-03 2016-04-20 万达信息股份有限公司 Unplanned readmission risk assessment prediction model based on dimension reduction combination classification algorithm
US20170039345A1 (en) * 2015-07-13 2017-02-09 Biodesix, Inc. Predictive test for melanoma patient benefit from antibody drug blocking ligand activation of the T-cell programmed cell death 1 (PD-1) checkpoint protein and classifier development methods
JP6080184B1 (en) * 2016-02-29 2017-02-15 常雄 小林 Data collection method used to classify cancer life
CN107066781A (en) * 2016-11-03 2017-08-18 西南大学 Analysis method based on the related colorectal cancer data model of h and E
CN107132268A (en) * 2017-06-21 2017-09-05 佛山科学技术学院 A kind of data processing equipment and system for being used to recognize cancerous lung tissue
CN107169535A (en) * 2017-07-06 2017-09-15 谈宜勇 The deep learning sorting technique and device of biological multispectral image
US20180357377A1 (en) * 2017-06-13 2018-12-13 Alexander Bagaev Systems and methods for generating, visualizing and classifying molecular functional profiles
CN110010250A (en) * 2019-04-29 2019-07-12 青岛科技大学 Classification method of frailty in patients with cardiovascular disease based on data mining technology
CN110580956A (en) * 2019-09-19 2019-12-17 青岛市市立医院 A group of liver cancer prognostic markers and their application
CN110852291A (en) * 2019-11-15 2020-02-28 太原科技大学 A Palatal Wrinkle Recognition Method Using Gabor Transform and Block Dimensionality Reduction
CN111161882A (en) * 2019-12-04 2020-05-15 深圳先进技术研究院 Breast cancer life prediction method based on deep neural network
US20200211716A1 (en) * 2018-12-31 2020-07-02 Tempus Labs Method and process for predicting and analyzing patient cohort response, progression, and survival

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
TONG,DY: "《Improving prediction performance of colon cancer prognosis based on the integration of clinical and multi-omics data》", 《BMC MEDICAL INFORMATICS AND DECISION MAKING》, vol. 20, no. 1 *
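潘浩;王昭;姚佳文;: "Application of deep learning to survival prediction of lung cancer patients", Computer Engineering and Applications, no. 14 *
田梓君;崔新于;: "Tumor gene selection system based on data processing", Wireless Internet Technology, no. 08 *
陈景安: "《Dimensionality reduction and survival prediction analysis of clinical data of breast cancer patients》", 《Medicine and Health Science and Technology Series》, pages 072 - 1918 *
齐惠颖: "《Constructing a breast cancer survival prediction model based on multi-omics data fusion》", 《Data Analysis and Knowledge Discovery》, no. 8, pages 88 - 93 *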

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112820403A (en) * 2021-02-25 2021-05-18 中山大学 Deep learning method for predicting prognosis risk of cancer patient based on multiple groups of mathematical data
CN112820403B (en) * 2021-02-25 2024-03-29 中山大学 A deep learning method to predict the prognostic risk of cancer patients based on multi-omics data
CN115497561A (en) * 2022-09-01 2022-12-20 北京吉因加医学检验实验室有限公司 Method and device for layering screening of methylation markers
CN115497561B (en) * 2022-09-01 2023-08-29 北京吉因加医学检验实验室有限公司 Methylation marker layered screening method and device
CN115982644A (en) * 2023-01-19 2023-04-18 中国医学科学院肿瘤医院 A classification model construction and data processing method for esophageal squamous cell carcinoma
CN115982644B (en) * 2023-01-19 2024-04-30 中国医学科学院肿瘤医院 Esophageal squamous cell carcinoma classification model construction and data processing method

Also Published As

Publication number Publication date
CN112086199B (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 401121 No. 53, middle section of Huangshan Avenue, Yubei District, Chongqing

Patentee after: Western Research Institute of China Science and technology computing technology

Country or region after: China

Address before: 401121 No. 53, middle section of Huangshan Avenue, Yubei District, Chongqing

Patentee before: Western Institute of advanced technology, Institute of computing, Chinese Academy of Sciences

Country or region before: China
