CN112086199A - Liver cancer data processing system based on multi-omics data - Google Patents

Liver cancer data processing system based on multi-omics data

Info

Publication number
CN112086199A
Authority
CN
China
Prior art keywords
data
liver cancer
module
processing module
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010963978.3A
Other languages
Chinese (zh)
Other versions
CN112086199B (en)
Inventor
任菲
王忠烈
谭光明
刘玉东
段勃
张春明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Western Institute Of Advanced Technology Institute Of Computing Chinese Academy Of Sciences
Original Assignee
Western Institute Of Advanced Technology Institute Of Computing Chinese Academy Of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Western Institute Of Advanced Technology Institute Of Computing Chinese Academy Of Sciences
Priority to CN202010963978.3A
Publication of CN112086199A
Application granted
Publication of CN112086199B
Legal status: Active

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Bioethics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a liver cancer data processing system based on multi-omics data, comprising a preprocessing module, a data dimension reduction processing module, a classification processing module and a classifier module. The preprocessing module screens the liver cancer multi-omics data and outputs the screened target data to the data dimension reduction processing module. The data dimension reduction processing module receives the target data output by the preprocessing module, performs dimension reduction on it, and outputs the dimension-reduced target data to the classification processing module. The classification processing module receives the dimension-reduced target data, classifies it, and outputs classification labels. The classifier module receives the classification labels, is trained with them, and then receives real-time liver cancer multi-omics data and predicts liver cancer survival. The system fuses the liver cancer multi-omics data by exploiting the complementarity of the data, which avoids loss of feature information during data processing, ensures the accuracy of the data processing, and supports accurate subsequent prediction of liver cancer survival.

Description

Liver cancer data processing system based on multi-omics data
Technical Field
The invention relates to a data processing system, in particular to a liver cancer data processing system based on multi-omics data.
Background
Early-stage liver cancer is treated mainly by surgical resection, but clinical data show that the postoperative recurrence rate of liver cancer is about 70%, which seriously hinders the long-term survival of patients. Establishing a typing standard for hepatocellular carcinoma (HCC) would allow more detailed stratified management of patients at high risk of recurrence, so that the patients likely to benefit are screened first and then operated on; this matters greatly for improving patient survival and achieving precise treatment of HCC. A liver cancer classification standard based on multi-omics data would support more accurate prognostic treatment and management of different patients and so improve survival rates. Fusing multi-omics data to classify patients at the molecular level and predict their prognosis is therefore important, and it is also clinically significant for patient treatment.
In recent years, methods have appeared that type liver cancer and predict prognosis by fusing RNA sequencing data, miRNA data, methylation data and clinical survival data of liver cancer patients. However, few researchers in the prior art have considered the survival status of patients when studying molecular subtypes. Survival has important clinical significance for the study of molecular subtypes, and large differences in survival have a great influence on the molecular subtypes. Fusing multi-omics data for molecular typing and prognosis prediction has the following two characteristics: (1) the fusion stage of multi-omics data is generally divided into early fusion, intermediate fusion and late fusion, and the choice of stage greatly affects the fusion result; (2) the fusion method also has a great influence. Fusion methods and systems in the prior art have the following defects: on the one hand, an autoencoder is used to integrate the input data, but feature data are easily lost; on the other hand, the prior art simply superposes the data directly, so that the different data fuse poorly and cannot complement one another, and accurate information cannot be extracted.
Therefore, in order to solve the above technical problems, it is necessary to provide a new technical means.
Disclosure of Invention
In view of this, the present invention provides a liver cancer data processing system based on multi-omics data, which fuses the liver cancer multi-omics data by exploiting the complementarity of the data, thereby avoiding loss of feature information during data processing, ensuring the accuracy of the data processing, and supporting accurate subsequent prediction of liver cancer survival.
The invention provides a liver cancer data processing system based on multi-omics data, comprising a preprocessing module, a data dimension reduction processing module, a classification processing module and a classifier module;
the preprocessing module is used for screening the liver cancer multi-omics data and outputting the screened target data to the data dimension reduction processing module;
the data dimension reduction processing module is used for receiving the target data output by the preprocessing module, performing dimension reduction on the target data, and outputting the dimension-reduced target data to the classification processing module;
the classification processing module is used for receiving the dimension-reduced target data output by the data dimension reduction processing module, performing classification according to the dimension-reduced target data, and outputting classification labels;
the classifier module is used for receiving the classification labels, being trained with the classification labels, receiving real-time liver cancer multi-omics data and predicting liver cancer survival.
Further, the screening of the liver cancer multi-omics data by the preprocessing module comprises:
the preprocessing module scores each feature of the liver cancer multi-omics data based on a univariate Cox-PH model, compares the score Per1 with a set threshold P_y, selects the features with Per1 < P_y, and fuses the selected data to form the target data.
Further, the dimension reduction performed on the target data by the data dimension reduction processing module specifically comprises:
SA1, constructing a K-layer autoencoder in the data dimension reduction processing module, wherein the output function of the K-layer autoencoder is:
x' = ReLU(W_i · ReLU(W_i x + b_i)); where W_i is the weight matrix between adjacent autoencoder layers, b_i is the bias of the weight matrix W_i, and x is a feature value of the m-dimensional target data X = (x_1, x_2, …, x_m);
SA2, the data dimension reduction processing module constructs a loss function, as follows:
Figure BDA0002681542370000031
where L(x, x') is the loss function and β_w is the regularization penalty coefficient, with:
Figure BDA0002681542370000032
SA3, iterating on the loss function to update the weight matrix W_i and its bias b_i; once the set number of iterations is reached, the data dimension reduction processing module outputs the dimension-reduced target data.
Further, the processing by the classification processing module for survival prediction specifically comprises:
SB1, the classification processing module uses a univariate Cox-PH model to score the features in the dimension-reduced target data again, compares the feature score Per2 with the set threshold P_y, selects the features with Per2 < P_y, and fuses the selected data;
SB2, the classification processing module constructs a normalization model and normalizes the data processed in step SB1, the normalization model being:
Figure BDA0002681542370000033
where p is the feature data output in step SB1, P is the normalized feature data, Var(p) is the variance of the feature data p, and E(p) is the empirical mean of the feature data p;
SB3, the classification processing module constructs a similarity function:
Figure BDA0002681542370000034
where W(i, j) is the similarity between the i-th sample z_i and the j-th sample z_j, and θ_ij is a normalization factor, where:
Figure BDA0002681542370000041
λ_i is the set of k nearest neighbors of the i-th sample z_i, λ_j is the set of k nearest neighbors of the j-th sample z_j, and z_r denotes the r-th sample in λ_i.
SB4, the classification processing module determines the classification labels according to the similarity function and outputs them to the classifier module.
The beneficial effects of the invention are as follows: the liver cancer multi-omics data are fused by exploiting the complementarity of the data, which avoids loss of feature information during data processing, ensures the accuracy of the data processing, and supports accurate subsequent prediction of liver cancer survival.
Drawings
The invention is further described below with reference to the following figures and examples:
FIG. 1 is a schematic structural diagram of the present invention.
FIG. 2 is a schematic diagram of the present invention.
FIG. 3 is a comparison of an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings of the specification:
the invention provides a liver cancer data processing system based on multi-omics data, comprising a preprocessing module, a data dimension reduction processing module, a classification processing module and a classifier module;
the preprocessing module is used for screening the liver cancer multi-omics data and outputting the screened target data to the data dimension reduction processing module;
the data dimension reduction processing module is used for receiving the target data output by the preprocessing module, performing dimension reduction on the target data, and outputting the dimension-reduced target data to the classification processing module;
the classification processing module is used for receiving the dimension-reduced target data output by the data dimension reduction processing module, performing classification according to the dimension-reduced target data, and outputting classification labels;
the classifier module is used for receiving the classification labels, being trained with the classification labels, receiving real-time liver cancer multi-omics data and predicting liver cancer survival. The invention fuses the liver cancer multi-omics data by exploiting the complementarity of the data, which avoids loss of feature information during data processing, ensures the accuracy of the data processing, and supports accurate subsequent prediction of liver cancer survival.
In this embodiment, the screening of the liver cancer multi-omics data by the preprocessing module comprises:
the preprocessing module scores each feature of the liver cancer multi-omics data based on a univariate Cox-PH model, compares the score Per1 with a set threshold P_y, selects the features with Per1 < P_y, and fuses the selected data to form the target data, wherein the threshold P_y is generally set to 0.5, which effectively prevents information loss during processing and ensures the accuracy of the final result.
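The screening step can be illustrated with a minimal sketch. It assumes that the score Per1 is a p-value from a univariate Cox-PH fit; the text does not say which statistic is used, so the score test at β = 0 (with the Breslow handling of ties) stands in for it here. The function names, the toy feature names and the toy data are hypothetical, and the threshold is left as a parameter (0.05 is the value used in the worked example later in the description).

```python
import math

def cox_score_pvalue(x, time, event):
    """Score-test p-value for one feature in a univariate Cox-PH model.

    x: feature values; time: follow-up times; event: 1 = death, 0 = censored.
    """
    n = len(x)
    score = 0.0  # score statistic U at beta = 0
    info = 0.0   # Fisher information at beta = 0
    for i in range(n):
        if not event[i]:
            continue
        # Risk set: every subject still under observation at time[i].
        risk = [x[j] for j in range(n) if time[j] >= time[i]]
        mean = sum(risk) / len(risk)
        score += x[i] - mean
        info += sum((v - mean) ** 2 for v in risk) / len(risk)
    if info == 0.0:
        return 1.0  # a constant feature carries no survival information
    chi2 = score * score / info
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 d.o.f.

def screen_features(features, time, event, threshold=0.05):
    """Keep only the features whose p-value is below the threshold."""
    return {name: vals for name, vals in features.items()
            if cox_score_pvalue(vals, time, event) < threshold}

# Toy data (hypothetical): ten patients, all with an observed event.
time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
event = [1] * 10
features = {
    "risk_marker": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],  # high value, early death
    "alternating": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],   # unrelated to survival
}
selected = screen_features(features, time, event)
print(sorted(selected))
```

In practice a survival-analysis library would normally be used to fit the Cox model; the score test is chosen here only because it needs no iterative fitting.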
In this embodiment, the dimension reduction performed on the target data by the data dimension reduction processing module specifically comprises:
SA1, constructing a K-layer autoencoder in the data dimension reduction processing module, wherein the output function of the K-layer autoencoder is:
x' = ReLU(W_i · ReLU(W_i x + b_i)); where W_i is the weight matrix between adjacent autoencoder layers, b_i is the bias of the weight matrix W_i, and x is a feature value of the m-dimensional target data X = (x_1, x_2, …, x_m);
SA2, the data dimension reduction processing module constructs a loss function, as follows:
Figure BDA0002681542370000051
where L(x, x') is the loss function and β_w is the regularization penalty coefficient, with:
Figure BDA0002681542370000052
SA3, iterating on the loss function to update the weight matrix W_i and its bias b_i; once the set number of iterations is reached, the data dimension reduction processing module outputs the dimension-reduced target data.
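Steps SA1 and SA2 can be sketched as a forward pass and a loss evaluation. The loss formula itself is only available as an image in this text, so the form used below (squared reconstruction error plus β_w times the sum of squared weights) is an assumption, as are the separate encoder and decoder weights, the decoder bias, and all function names. The sketch uses a single hidden layer rather than K layers, and the iterative update of step SA3 is left to a training framework.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def affine(w, x, b):
    """Return w @ x + b, with w given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode x into a low-dimensional code, then reconstruct x' from it."""
    code = relu(affine(w_enc, x, b_enc))
    recon = relu(affine(w_dec, code, b_dec))
    return code, recon

def loss(x, recon, weights, beta_w):
    """Assumed loss: squared error plus beta_w * sum of squared weights."""
    err = sum((a - b) ** 2 for a, b in zip(x, recon))
    penalty = sum(w ** 2 for mat in weights for row in mat for w in row)
    return err + beta_w * penalty

# Toy example (hypothetical weights): reduce 3 features to a 2-value code.
x = [1.0, 2.0, 3.0]
w_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_enc = [0.0, 0.0]
w_dec = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b_dec = [0.0, 0.0, 0.0]

code, recon = forward(x, w_enc, b_enc, w_dec, b_dec)
value = loss(x, recon, [w_enc, w_dec], beta_w=0.5)
print(code, recon, value)
```

The code vector produced by the encoder is what a dimension reduction module of this kind would pass on as the dimension-reduced target data.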
In this embodiment, the processing by the classification processing module for survival prediction specifically comprises:
SB1, the classification processing module uses a univariate Cox-PH model to score the features in the dimension-reduced target data again, compares the feature score Per2 with the set threshold P_y, selects the features with Per2 < P_y, and fuses the selected data, wherein, in the data fusion process, a plurality of features are combined to form a feature matrix;
SB2, the classification processing module constructs a normalization model and normalizes the data processed in step SB1, the normalization model being:
Figure BDA0002681542370000061
where p is the feature data output in step SB1, P is the normalized feature data, Var(p) is the variance of the feature data p, and E(p) is the empirical mean of the feature data p;
SB3, the classification processing module constructs a similarity function:
Figure BDA0002681542370000062
where W(i, j) is the similarity between the i-th sample z_i and the j-th sample z_j, and θ_ij is a normalization factor, where:
Figure BDA0002681542370000063
λ_i is the set of k nearest neighbors of the i-th sample z_i, λ_j is the set of k nearest neighbors of the j-th sample z_j, and z_r denotes the r-th sample in λ_i.
SB4, the classification processing module determines the classification labels according to the similarity function and outputs them to the classifier module. The classifier module uses an XGBoost classifier, and the liver cancer multi-omics data comprise RNA sequencing data, miRNA data and DNA methylation data. Taking RNA sequencing data as an example: when the preprocessing module performs screening, feature data meeting the screening criterion are selected from the RNA sequencing data, and the selected data of each RNA sequencing data set are then recombined to form new RNA sequencing data.
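Steps SB2 and SB3 can be sketched as follows. The normalization and similarity formulas are only available as images in this text, so both forms below are assumptions chosen to match the symbol definitions: a z-score, P = (p − E(p)) / sqrt(Var(p)), and the scaled exponential kernel used in similarity network fusion, W(i, j) = exp(−ρ²(z_i, z_j) / (μ θ_ij)), with θ_ij the average of the mean neighbor distances of z_i and z_j and their mutual distance. The Euclidean distance ρ, the parameter μ and all function names are hypothetical.

```python
import math
from statistics import fmean, pvariance

def zscore(p):
    """Assumed SB2 form: subtract the mean, divide by the standard deviation.

    Raises ZeroDivisionError for a constant feature.
    """
    mu, sd = fmean(p), math.sqrt(pvariance(p))
    return [(v - mu) / sd for v in p]

def dist(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def similarity_matrix(samples, k=2, mu=0.5):
    """Assumed SB3 form: scaled exponential similarity kernel."""
    n = len(samples)
    d = [[dist(a, b) for b in samples] for a in samples]
    # Mean distance from each sample to its k nearest neighbors (self excluded).
    nn = [fmean(sorted(row[:i] + row[i + 1:])[:k]) for i, row in enumerate(d)]
    w = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            theta = (nn[i] + nn[j] + d[i][j]) / 3.0  # normalization factor
            if theta > 0.0:
                w[i][j] = math.exp(-d[i][j] ** 2 / (mu * theta))
    return w

z = zscore([1.0, 2.0, 3.0])

# Toy samples (hypothetical): two tight pairs that are far from each other.
samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
w = similarity_matrix(samples)
print(round(w[0][1], 3), w[0][2] < w[0][1])
```

A similarity matrix of this kind is the usual input to spectral clustering, which the description names as the clustering method used to derive the labels.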
In step SB1, the features selected from the three types of omics data are fused to form a data matrix of order n × n, and each column of the matrix is taken as a sample, so that the clustering is performed on n samples z_1, z_2, …, z_n; the classifier module performs cluster analysis on the samples to obtain the final classification labels, and the number of classification labels is generally set to 2.
The data sets GSE14520 and GSE31384, obtained from the GEO database, serve as validation cohorts for the classifiers trained on RNA-seq and miRNA data, respectively. For both validation cohorts, we first select the features they share with the training set samples and then normalize the data using the same method as for the multi-omics data. In the study, we select M features based on the cluster labels for the training set and both cohorts. The two cohorts are then used as validation data sets to test the model and obtain the classification results. Here we set M in the range 50-100 and found that the trained model gives the best prediction results when M is 50.
With TCGA as the training data set, RNA-seq, miRNA-seq and DNA methylation data of liver cancer are obtained. The preprocessing module constructs a univariate Cox-PH model and keeps the features with Per1 < 0.05; the processed multi-omics data are input to the dimension reduction processing module, and its output is input to the classification processing module, which constructs the univariate Cox-PH model again and keeps the features with Per2 < 0.05; finally, the classifier module obtains, through spectral clustering, two subtypes with a significant survival difference. Based on the resulting cluster labels, the classifier module (an XGBoost classifier) is trained, and real-time liver cancer multi-omics data are then input for survival prediction. To verify the effectiveness of the classifier in predicting survival, we validated the model on two GEO data sets, GSE14520 and GSE31384, as shown in FIG. 2. For the survival curves of the two survival subtypes, the results of this model are better than those of other published models, and its prediction performance is clearly improved.
Finally, we compared our results with those of other models. In terms of both the log-rank P value and the C index, our experimental results are clearly better than the others, as shown in FIG. 3.
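For readers unfamiliar with the C index mentioned in the comparison, the following sketch computes Harrell's concordance index, the usual definition of that metric; the text does not state which variant was used, so this choice is an assumption. The function name and the toy data are hypothetical and are not results from the patent.

```python
def concordance_index(time, event, risk):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    time: follow-up times; event: 1 = death, 0 = censored;
    risk: predicted risk, where a higher value means shorter expected survival.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event before time[j].
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable

# Toy data (hypothetical): the third patient is censored.
time = [2, 4, 6, 8]
event = [1, 1, 0, 1]
perfect = concordance_index(time, event, [0.9, 0.7, 0.2, 0.1])
reversed_ = concordance_index(time, event, [0.1, 0.2, 0.7, 0.9])
constant = concordance_index(time, event, [0.5, 0.5, 0.5, 0.5])
print(perfect, reversed_, constant)
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and values below 0.5 indicate a ranking that is systematically inverted.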
In the differential gene expression analysis, we identified 1465 up-regulated genes and 930 down-regulated genes, including the tumor marker gene BIRC5 (P = 2.07e-41) and the stem cell marker genes CD24 (P = 2.83e-11), KRT19 (P = 2.82e-26) and EPCAM (P = 1.01e-6). Furthermore, we found that 28 genes (SLC2A2, AQP9, RGN, SULT2A1, CRYL1, SERPINC1, PAH, CDO1, PLG, APOC3, CYP27A1, PFKFB3, TM4SF1, ACSL5, RGS2, HN1, SERPINA10, CYB5A, EPHX2, SPHX2, RGS1, ADH1B, LECT2, TBX3, RNASE4, ALDOA, ADH6, SLC38A1) differ between the two survival risk groups we identified and are strongly related to liver cancer survival.
For the differentially expressed genes obtained by the differential analysis, we also performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on both subgroups. Tumor-related pathways such as the PI3K-Akt signaling pathway, the cell cycle signaling pathway and the p53 signaling pathway are enriched in the aggressive subtype (C2), and the PI3K-Akt signaling pathway is also related to CD8+ T cell infiltration. The low-risk survival subtype (C1) is associated with pathways such as drug metabolism, cytochrome P450, metabolic pathways and fatty acid degradation. These pathways are important for studying the prognosis of liver cancer.
Finally, the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications and substitutions shall be covered by the claims of the present invention.

Claims (4)

1. A liver cancer data processing system based on multi-omics data, characterized in that: the system comprises a preprocessing module, a data dimension reduction processing module, a classification processing module and a classifier module;
the preprocessing module is used for screening the liver cancer multi-omics data and outputting the screened target data to the data dimension reduction processing module;
the data dimension reduction processing module is used for receiving the target data output by the preprocessing module, performing dimension reduction on the target data, and outputting the dimension-reduced target data to the classification processing module;
the classification processing module is used for receiving the dimension-reduced target data output by the data dimension reduction processing module, performing classification according to the dimension-reduced target data, and outputting classification labels;
the classifier module is used for receiving the classification labels, being trained with the classification labels, receiving real-time liver cancer multi-omics data and predicting liver cancer survival.
2. The liver cancer data processing system based on multi-omics data according to claim 1, wherein the screening of the liver cancer multi-omics data by the preprocessing module comprises:
the preprocessing module scores each feature of the liver cancer multi-omics data based on a univariate Cox-PH model, compares the score Per1 with a set threshold P_y, selects the features with Per1 < P_y, and fuses the selected data to form the target data.
3. The liver cancer data processing system based on multi-omics data according to claim 2, wherein the dimension reduction performed on the target data by the data dimension reduction processing module specifically comprises:
SA1, constructing a K-layer autoencoder in the data dimension reduction processing module, wherein the output function of the K-layer autoencoder is:
x' = ReLU(W_i · ReLU(W_i x + b_i)); where W_i is the weight matrix between adjacent autoencoder layers, b_i is the bias of the weight matrix W_i, and x is a feature value of the m-dimensional target data X = (x_1, x_2, …, x_m);
SA2, the data dimension reduction processing module constructs a loss function, as follows:
Figure FDA0002681542360000021
where L(x, x') is the loss function and β_w is the regularization penalty coefficient, with:
Figure FDA0002681542360000022
SA3, iterating on the loss function to update the weight matrix W_i and its bias b_i; once the set number of iterations is reached, the data dimension reduction processing module outputs the dimension-reduced target data.
4. The liver cancer data processing system based on multi-omics data according to claim 3, wherein the processing by the classification processing module for survival prediction specifically comprises:
SB1, the classification processing module uses a univariate Cox-PH model to score the features in the dimension-reduced target data again, compares the feature score Per2 with the set threshold P_y, selects the features with Per2 < P_y, and fuses the selected data;
SB2, the classification processing module constructs a normalization model and normalizes the data processed in step SB1, the normalization model being:
Figure FDA0002681542360000023
where p is the feature data output in step SB1, P is the normalized feature data, Var(p) is the variance of the feature data p, and E(p) is the empirical mean of the feature data p;
SB3, the classification processing module constructs a similarity function:
Figure FDA0002681542360000024
where W(i, j) is the similarity between the i-th sample z_i and the j-th sample z_j, and θ_ij is a normalization factor, where:
Figure FDA0002681542360000025
λ_i is the set of k nearest neighbors of the i-th sample z_i, λ_j is the set of k nearest neighbors of the j-th sample z_j, and z_r denotes the r-th sample in λ_i.
SB4, the classification processing module determines the classification labels according to the similarity function and outputs them to the classifier module.
CN202010963978.3A 2020-09-14 2020-09-14 Liver cancer data processing system based on multi-omics data Active CN112086199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010963978.3A CN112086199B (en) 2020-09-14 2020-09-14 Liver cancer data processing system based on multi-omics data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010963978.3A CN112086199B (en) 2020-09-14 2020-09-14 Liver cancer data processing system based on multi-omics data

Publications (2)

Publication Number Publication Date
CN112086199A true CN112086199A (en) 2020-12-15
CN112086199B CN112086199B (en) 2023-06-09

Family

ID=73738141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010963978.3A Active CN112086199B (en) 2020-09-14 2020-09-14 Liver cancer data processing system based on multi-omics data

Country Status (1)

Country Link
CN (1) CN112086199B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100292303A1 (en) * 2007-07-20 2010-11-18 Birrer Michael J Gene expression profile for predicting ovarian cancer patient survival
CN105512477A (en) * 2015-12-03 2016-04-20 万达信息股份有限公司 Unplanned readmission risk assessment prediction model based on dimension reduction combination classification algorithm
US20170039345A1 (en) * 2015-07-13 2017-02-09 Biodesix, Inc. Predictive test for melanoma patient benefit from antibody drug blocking ligand activation of the T-cell programmed cell death 1 (PD-1) checkpoint protein and classifier development methods
JP6080184B1 (en) * 2016-02-29 2017-02-15 常雄 小林 Data collection method used to classify cancer life
CN107066781A (en) * 2016-11-03 2017-08-18 西南大学 Analysis method based on the related colorectal cancer data model of h and E
CN107132268A (en) * 2017-06-21 2017-09-05 佛山科学技术学院 A kind of data processing equipment and system for being used to recognize cancerous lung tissue
CN107169535A (en) * 2017-07-06 2017-09-15 谈宜勇 The deep learning sorting technique and device of biological multispectral image
US20180357377A1 (en) * 2017-06-13 2018-12-13 Alexander Bagaev Systems and methods for generating, visualizing and classifying molecular functional profiles
CN110010250A (en) * 2019-04-29 2019-07-12 青岛科技大学 Cardiovascular patient weakness disease stage division based on data mining technology
CN110580956A (en) * 2019-09-19 2019-12-17 青岛市市立医院 liver cancer prognosis markers and application thereof
CN110852291A (en) * 2019-11-15 2020-02-28 太原科技大学 Palate wrinkle identification method adopting Gabor transformation and blocking dimension reduction
CN111161882A (en) * 2019-12-04 2020-05-15 深圳先进技术研究院 Breast cancer life prediction method based on deep neural network
US20200211716A1 (en) * 2018-12-31 2020-07-02 Tempus Labs Method and process for predicting and analyzing patient cohort response, progression, and survival

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
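TONG, DY: "Improving prediction performance of colon cancer prognosis based on the integration of clinical and multi-omics data", BMC Medical Informatics and Decision Making, vol. 20, no. 1 *
Pan Hao; Wang Zhao; Yao Jiawen: "Research on the application of deep learning in survival prediction of lung cancer patients", Computer Engineering and Applications, no. 14 *
Tian Zijun; Cui Xinyu: "Tumor gene selection system based on data processing", Wireless Internet Technology, no. 08 *
Chen Jing'an: "Dimensionality reduction and survival prediction analysis of clinical data of breast cancer patients", Medicine and Health Sciences series, pages 072-1918 *
Qi Huiying: "Constructing a breast cancer survival prediction model based on multi-omics data fusion", Data Analysis and Knowledge Discovery, no. 8, pages 88-93 *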

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112820403A (en) * 2021-02-25 2021-05-18 中山大学 Deep learning method for predicting prognosis risk of cancer patient based on multi-omics data
CN112820403B (en) * 2021-02-25 2024-03-29 中山大学 Deep learning method for predicting prognosis risk of cancer patient based on multi-omics data
CN115497561A (en) * 2022-09-01 2022-12-20 北京吉因加医学检验实验室有限公司 Method and device for layering screening of methylation markers
CN115497561B (en) * 2022-09-01 2023-08-29 北京吉因加医学检验实验室有限公司 Methylation marker layered screening method and device
CN115982644A (en) * 2023-01-19 2023-04-18 中国医学科学院肿瘤医院 Esophageal squamous cell carcinoma classification model construction and data processing method
CN115982644B (en) * 2023-01-19 2024-04-30 中国医学科学院肿瘤医院 Esophageal squamous cell carcinoma classification model construction and data processing method

Also Published As

Publication number Publication date
CN112086199B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
Liu et al. DeepCDR: a hybrid graph convolutional network for predicting cancer drug response
Caudai et al. AI applications in functional genomics
CN112086199A (en) Liver cancer data processing system based on multi-omics data
Zhang et al. CircRNA-disease associations prediction based on metapath2vec++ and matrix factorization
WO2018136888A1 (en) Methods for non-invasive assessment of genetic alterations
Arslan et al. Machine learning in epigenomics: Insights into cancer biology and medicine
Zeng et al. couple CoC+: An information-theoretic co-clustering-based transfer learning framework for the integrative analysis of single-cell genomic data
CN115116624A (en) Drug sensitivity prediction method and device based on semi-supervised transfer learning
Dou et al. Single-nucleotide variant calling in single-cell sequencing data with Monopogen
Titus et al. Unsupervised deep learning with variational autoencoders applied to breast tumor genome-wide DNA methylation data with biologic feature extraction
Ming et al. LPM: a latent probit model to characterize the relationship among complex traits using summary statistics from multiple GWASs and functional annotations
Sun et al. Molecular subtyping of cancer based on distinguishing co-expression modules and machine learning
Thibodeau et al. CoRE-ATAC: A deep learning model for the functional classification of regulatory elements from single cell and bulk ATAC-seq data
CN114360642A (en) Cancer transcriptome data processing method based on gene co-expression network analysis
Kalyakulina et al. Disease classification for whole-blood DNA methylation: meta-analysis, missing values imputation, and XAI
Shi et al. Fundamental and practical approaches for single-cell ATAC-seq analysis
CN112037863B (en) Early NSCLC prognosis prediction system
KR20210110241A (en) Prediction system and method of cancer immunotherapy drug Sensitivity using multiclass classification A.I based on HLA Haplotype
CN110211634B (en) Method for joint analysis of multi-omics data
CN117457065A (en) Method and system for identifying phenotype-associated cell types based on single-cell multi-omics data
CN113921084B (en) Multi-dimensional target prediction method and system for disease-related non-coding RNA (ribonucleic acid) regulation and control axis
Gong et al. Interpretable single-cell transcription factor prediction based on deep learning with attention mechanism
Poinsignon et al. Working with Omics Data: An Interdisciplinary Challenge at the Crossroads of Biology and Computer Science
Shanan et al. Using alignment-free methods as preprocessing stage to classification whole genomes
Chowdhury et al. Predicting High-Risk Individuals for Common Diseases Using Multi-Omics and Epidemiological Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant