CN110942808A - Prognosis prediction method and prediction system based on gene big data - Google Patents

Prognosis prediction method and prediction system based on gene big data

Info

Publication number
CN110942808A
Authority
CN
China
Prior art keywords
gene
data
prognosis
algorithm
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911256723.7A
Other languages
Chinese (zh)
Inventor
张海霞
刘艺迪
袁东风
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN201911256723.7A
Publication of CN110942808A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The invention relates to a prognosis prediction method and a prediction system based on gene big data, belonging to the technical field of artificial intelligence. The method mainly comprises the following steps: extracting gene information from tissue samples to form a training set; ranking the genes by importance using the Relief algorithm; fitting machine learning algorithm models that classify the prognosis time; and selecting the algorithm model with the highest accuracy, together with its corresponding gene feature number, as the prediction method and gene feature number for the specific disease. Once model training is completed, the method can quickly evaluate new gene data and assist prognosis evaluation.

Description

Prognosis prediction method and prediction system based on gene big data
Technical Field
The invention relates to a cancer prognosis prediction method and a prediction system based on gene big data, belonging to the technical field of artificial intelligence.
Background
According to annual statistics reported by the American Cancer Society, 1 out of 4 cancer deaths is caused by lung cancer. Although researchers have acquired a large amount of data from microarray technology and next-generation sequencing (NGS), the information in these data may not have been fully explored. Traditional survival prediction depends on the clinicopathological characteristics of the patient and is sometimes inaccurate.
In recent years, with the development of next-generation sequencing technology, gene sequencing data from large-scale cancer sample cohorts have become available, and the development of big data and artificial intelligence makes it possible to mine valuable latent information from these massive data. At present, cancer prognosis is generally predicted from intuitive clinical characteristics combined with traditional statistical methods. Although some studies have shifted their focus to gene-level characteristics, they still use traditional statistical methods that select gene features by differential expression, so genes with small expression differences but a large influence on prognosis cannot be found. To be more accurate, the present application correlates gene features selected from the above data with patient survival time and determines the association between the genes and survival time, yielding a calibrated prediction model.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a novel cancer prognosis modeling method and a novel prediction system, and relates to a method for predicting and classifying cancer prognosis time based on gene big data and for finding the related pathogenic genes. The method is simple and efficient, and is applicable to a wide range of cancers on the basis of gene expression. The method comprises the steps of screening and cleaning sample data, screening and cleaning gene data, ranking gene importance, training and selecting a model, and finally predicting a new sample. It helps doctors estimate the condition of cancer patients in advance and assists treatment.
The technical scheme of the invention is as follows:
A prognosis prediction method based on gene big data comprises the following steps:
(1) Collecting and fusing data: collecting fresh or frozen cancer tissue samples from patients and sequencing them to obtain gene data, and obtaining clinical data on patient survival time and survival status through follow-up investigation; fusing the gene data with the clinical data by matching each sample to its corresponding clinical data, namely the survival time data, according to the sample name, and deleting samples whose survival time is missing; the raw counts values obtained from sequencing are standardized into FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format data for subsequent processing;
(2) Screening samples according to the specified conditions of the clinical data: selecting the samples whose survival status in the clinical data is death, and the samples whose survival status is survival and whose survival time is more than two years; samples that are alive but have a survival time of less than two years are discarded because it cannot be determined whether their final survival time belongs to the longer group (> 3 years) or the shorter group (≤ 3 years);
(3) Screening samples according to the specified conditions of the gene data:
deleting the gene features whose expression is undetected in too many samples, and normalizing the gene data; specifically, if the expression of a gene is zero in more than 85 percent of the samples, the gene is deemed undetected in most samples and the feature is discarded; the normalization divides the FPKM value of each gene by the maximum expression value of that gene, so that the FPKM value of each gene lies between 0 and 1; gene expression refers to the measured amount of functional gene product synthesized from the genetic information of a gene, and is data in FPKM format;
then deleting the non-primary tumor samples, i.e., the non-cancer tissue samples, from the data set, leaving only the cancer tissue samples;
(4) Dividing the samples screened in step (2) and step (3) into two classes according to prognosis time, namely prognosis time of more than three years and prognosis time of less than or equal to three years, and ranking the genes by importance using the Relief algorithm; taking a certain number of gene data, sequentially using at least two machine learning algorithms to cross-validate cancer prognosis while gradually increasing the gene feature number, and selecting the optimal model and gene feature number by comparing the results, wherein the gene feature number is the number of gene data.
Preferably, in step (4), the importance ranking is as follows: the Relief algorithm is trained to generate a corresponding weight for each feature, namely each gene; the higher the weight, the more the gene contributes to distinguishing the two groups of samples (prognosis time of more than three years and prognosis time of less than or equal to three years), and the higher its ranking.
Preferably, in step (4), at least one gene data is selected.
Preferably, in step (4), gene data are taken and 8 machine learning algorithm models are sequentially used to cross-validate cancer prognosis while gradually increasing the gene feature number, the 8 machine learning algorithm models being a support vector machine, a random forest, logistic regression, naive Bayes, linear regression, support vector regression with a polynomial kernel function, support vector regression with a linear kernel function, and ridge regression;
the 8 algorithm models are trained separately and the results are recorded; when each algorithm model is trained, 1 gene data is first taken for training, then two gene data are taken for training, and the number of gene data used for training is increased in sequence; each training of the algorithm model yields an accuracy, which is recorded, wherein the accuracy is obtained by comparing the prognosis time given by the algorithm model with the survival time recorded in the actual clinical data, and is the ratio of the number of samples with a correct prognosis to the total number of samples; the number of selected gene data corresponding to the highest accuracy under each algorithm model is recorded; and the results of the 8 algorithm models are compared, and the algorithm model with the highest accuracy and its corresponding number of selected gene data are selected.
Preferably, in step (4), ten-fold cross-validation (10-fold cross-validation) is adopted for each training to test the accuracy of the algorithm model.
A prognosis prediction system based on gene big data comprises a data preprocessing module, a screening module and a training verification module, wherein the data preprocessing module is used for downloading data from the public database TCGA (The Cancer Genome Atlas) and standardizing the data into FPKM format, the data comprising gene data and clinical data; the screening module is used for screening the data according to two types of conditions, namely the specified conditions of the clinical data and the specified conditions of the gene data; the training verification module comprises at least two algorithm models and is used for classifying the samples screened by the screening module, ranking the genes by importance using the Relief algorithm, and training the different algorithm models on the input data respectively; the training verification module compares the results of the different algorithm models and selects the algorithm model with the highest accuracy and its corresponding number of selected gene data.
After the main pathogenic genes and the model for a given cancer have been determined, new gene data can be fed directly into the trained model for prediction, providing a reference for clinical judgment.
The invention has the beneficial effects that:
The invention provides a method for modeling cancer patient prognosis that combines a feature importance ranking algorithm with a plurality of classification and fitting models. The method ranks gene features by their importance in distinguishing two groups with a large difference in survival time (the > 3 years group and the ≤ 3 years group) and then screens them in combination with different machine learning models, so that not only can the prognosis time of different cancers be predicted accurately, but supplement and support can also be provided for the discovery of oncogenes of different cancers and of key genes influencing prognosis.
Drawings
FIG. 1 is a schematic diagram of a data processing flow;
FIG. 2 is an overall flowchart.
Detailed Description
The present invention will be further described by way of examples, but not limited thereto, with reference to the accompanying drawings.
Example 1:
A prognosis prediction method based on gene big data comprises the following steps:
(1) Collecting and fusing data:
collecting fresh or frozen cancer tissue samples from patients and sequencing them to obtain gene data, and obtaining clinical data on patient survival time and survival status through follow-up investigation; the present embodiment uses a public data set: taking lung adenocarcinoma as an example, the lung adenocarcinoma (LUAD) data, including gene data and clinical data, are downloaded from the public database TCGA (https://portal.gdc.cancer.gov/);
fusing the gene data with the clinical data by matching each sample to its corresponding clinical data, namely the survival time data, according to the sample name, and deleting samples whose survival time is missing; the raw counts values obtained from sequencing are standardized into FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format data for subsequent processing.
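As a minimal sketch of step (1), the conversion from raw counts to FPKM and the fusion with clinical data might look as follows. The patent does not give a formula; the standard definition FPKM = count × 10^9 / (gene length in bp × total mapped reads) is used here, with the total approximated by the sum of counts in the sample. The function names, the `gene_length_bp` table and the `survival_days` field are assumptions made for illustration.

```python
def counts_to_fpkm(counts, gene_length_bp):
    """Convert one sample's raw read counts to FPKM.

    counts: dict mapping gene -> raw read count for a single sample
    gene_length_bp: dict mapping gene -> gene length in base pairs (assumed available)
    """
    # Assumption: total mapped reads is approximated by the sum of gene counts.
    total_reads = sum(counts.values())
    if total_reads == 0:
        return {gene: 0.0 for gene in counts}
    # FPKM = count * 1e9 / (gene length in bp * total mapped reads)
    return {
        gene: count * 1e9 / (gene_length_bp[gene] * total_reads)
        for gene, count in counts.items()
    }


def fuse(expression, clinical):
    """Match expression and clinical records by sample name.

    expression: dict mapping sample name -> FPKM dict
    clinical: dict mapping sample name -> record with a 'survival_days' field
    Samples without a clinical record or without a survival time are dropped.
    """
    fused = {}
    for sample, fpkm in expression.items():
        record = clinical.get(sample)
        if record is None or record.get("survival_days") is None:
            continue
        fused[sample] = {"fpkm": fpkm, **record}
    return fused
```

Matching on the sample name alone mirrors the text; a real TCGA workflow would likely need to map aliquot barcodes to patient identifiers first.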
(2) Screening samples according to the specified conditions of the clinical data:
selecting the samples whose survival status in the clinical data is death, and the samples whose survival status is survival and whose survival time is more than two years; samples that are alive but have a survival time of less than two years are discarded because it cannot be determined whether their final survival time belongs to the longer group (> 3 years) or the shorter group (≤ 3 years).
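The clinical screening rule can be sketched as below, reading it as "keep every deceased sample, and keep a living sample only if its follow-up exceeds two years". The status strings, the record layout and the assumption that survival time is recorded in days (365 days per year) are illustrative and not taken from the patent.

```python
DAYS_PER_YEAR = 365  # assumption: survival time is recorded in days


def keep_sample(status, survival_days, min_alive_years=2):
    """Return True if a sample passes the clinical screening rule."""
    if status == "dead":
        # The final survival time of a deceased patient is known, so keep it.
        return True
    # A living patient is kept only with more than two years of follow-up.
    return survival_days > min_alive_years * DAYS_PER_YEAR


def screen_clinical(clinical):
    """Filter a dict of sample name -> {'status', 'survival_days'} records."""
    return {
        name: record
        for name, record in clinical.items()
        if keep_sample(record["status"], record["survival_days"])
    }
```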
(3) Screening samples according to the specified conditions of the gene data:
deleting the gene features whose expression is undetected in too many samples, and normalizing the gene data; specifically, if the expression of a gene is zero in more than 85 percent of the samples, the gene is deemed undetected in most samples and the feature is discarded; the normalization divides the FPKM value of each gene by the maximum expression value of that gene, so that the FPKM value of each gene lies between 0 and 1; gene expression refers to the measured amount of functional gene product synthesized from the genetic information of a gene, and is data in FPKM format;
then deleting the non-primary tumor samples, i.e., the non-cancer tissue samples, from the data set, leaving only the cancer tissue samples.
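The gene-level filtering and normalization of step (3) can be illustrated with a short sketch. It assumes the expression matrix is held as a dict mapping each gene to a non-empty list of non-negative FPKM values, one per sample; that layout and the function name are hypothetical.

```python
def filter_and_normalize(expression, max_zero_fraction=0.85):
    """Drop mostly-undetected genes and scale the rest to [0, 1].

    expression: dict mapping gene -> list of FPKM values, one per sample
    A gene is discarded if it is zero in more than 85 percent of samples.
    Each remaining gene is divided by its own maximum FPKM value.
    """
    normalized = {}
    for gene, values in expression.items():
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction > max_zero_fraction:
            continue  # undetected in most samples
        peak = max(values)  # positive, because at least one value is non-zero
        normalized[gene] = [v / peak for v in values]
    return normalized
```

Dividing by the per-gene maximum keeps zeros at zero, which is why this scaling (rather than min-max scaling with an offset) preserves the "not detected" meaning of a zero value.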
(4) Dividing the samples screened in step (2) and step (3) into two classes according to prognosis time: samples with a prognosis time of more than three years and samples with a prognosis time of less than or equal to three years.
The genes are ranked by importance using the Relief algorithm: the Relief algorithm is trained to generate a corresponding weight for each feature, namely each gene; the higher the weight, the more the gene contributes to distinguishing the two groups of samples (prognosis time of more than three years and prognosis time of less than or equal to three years), and the higher its ranking.
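The text does not say which Relief variant is used, so the following is only a sketch of one basic form: for every sample, the nearest same-class sample (near hit) and nearest other-class sample (near miss) are found by Manhattan distance, and each gene's weight grows with its difference to the near miss and shrinks with its difference to the near hit. Iterating over all samples instead of a random subset, the distance metric and the function names are assumptions.

```python
DAYS_PER_YEAR = 365  # assumption: survival time is recorded in days


def label_by_prognosis(survival_days, threshold_years=3):
    """Return 1 for prognosis time > 3 years, 0 for <= 3 years."""
    return 1 if survival_days > threshold_years * DAYS_PER_YEAR else 0


def relief_weights(X, y):
    """Basic Relief weights for a two-class problem.

    X: list of samples, each a list of gene values already scaled to [0, 1]
    y: list of class labels, one per sample
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def distance(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        near_hit = min(hits, key=lambda j: distance(X[i], X[j]))
        near_miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            diff_miss = abs(X[i][f] - X[near_miss][f])
            diff_hit = abs(X[i][f] - X[near_hit][f])
            weights[f] += (diff_miss - diff_hit) / n
    return weights


def rank_genes(gene_names, weights):
    """Sort gene names from highest to lowest Relief weight."""
    order = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return [gene_names[f] for f in order]
```

With values scaled to [0, 1], every weight falls between -1 and 1, so weights are comparable across genes.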
Then 8 machine learning algorithm models are sequentially used to cross-validate cancer prognosis while gradually increasing the gene feature number, the 8 machine learning algorithm models being a support vector machine, a random forest, logistic regression, naive Bayes, linear regression, support vector regression with a polynomial kernel function, support vector regression with a linear kernel function, and ridge regression; all of these algorithm models are existing models.
The 8 algorithm models are trained separately and the results are recorded; when each algorithm model is trained, 1 gene data is first taken for training, then two gene data are taken for training, and the number of gene data is increased in sequence up to 200; each training of the algorithm model yields an accuracy, which is recorded, wherein the accuracy is obtained by comparing the prognosis time given by the algorithm model with the survival time recorded in the actual clinical data, and is the ratio of the number of samples with a correct prognosis to the total number of samples; the number of selected gene data corresponding to the highest accuracy under each algorithm model is recorded; and the results of the 8 algorithm models are compared, and the algorithm model with the highest accuracy and its corresponding number of selected gene data are selected.
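The search over models and gene counts described above can be sketched independently of any particular learning library. Here `evaluate` is a hypothetical callback that trains the named model on the given genes and returns its cross-validated accuracy; the tie-breaking rule (keep the smallest gene count that reaches the best accuracy) is an assumption, since the text does not say how ties are resolved.

```python
def select_model_and_gene_count(model_names, ranked_genes, evaluate, max_genes=200):
    """Sweep the top-k ranked genes for every model and pick the best pair.

    model_names: iterable of model identifiers (e.g. the 8 models in the text)
    ranked_genes: gene names sorted from most to least important
    evaluate: callable (model_name, gene_subset) -> accuracy in [0, 1]
    Returns (best_model, best_gene_count, per_model_best).
    """
    limit = min(max_genes, len(ranked_genes))
    per_model_best = {}
    for name in model_names:
        best_accuracy, best_count = -1.0, 0
        for k in range(1, limit + 1):
            accuracy = evaluate(name, ranked_genes[:k])
            if accuracy > best_accuracy:  # strict: first (smallest) k wins ties
                best_accuracy, best_count = accuracy, k
        per_model_best[name] = (best_accuracy, best_count)
    best_model = max(per_model_best, key=lambda name: per_model_best[name][0])
    return best_model, per_model_best[best_model][1], per_model_best
```

Because each model is scored on up to 200 gene counts and the maximum is then reported, the winning accuracy is an optimistic estimate; a held-out set would be needed for an unbiased figure.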
Ten-fold cross-validation is adopted for each training to test the accuracy of the algorithm model; the specific implementation is to divide the data set into ten parts and, in turn, take 9 parts as training data and 1 part as test data.
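A minimal sketch of the ten-fold procedure follows. Accuracy is pooled over all folds as correctly predicted samples divided by total samples. The interleaved, unshuffled fold assignment is an assumption, and the nearest-centroid classifier is only a stand-in so the example is self-contained; it is not one of the eight models named in the text.

```python
def ten_fold_accuracy(X, y, fit, predict, n_folds=10):
    """Pooled accuracy from k-fold cross-validation.

    fit: callable (train_X, train_y) -> model
    predict: callable (model, sample) -> predicted label
    """
    n = len(X)
    # Fold f holds samples f, f + n_folds, f + 2 * n_folds, ...
    folds = [list(range(start, n, n_folds)) for start in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(1 for i in test_idx if predict(model, X[i]) == y[i])
    return correct / n


def fit_centroids(X, y):
    """Stand-in classifier: store the mean vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, sample):
    """Assign the label of the closest class centroid (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(sample, centroids[label])),
    )
```

A call such as `ten_fold_accuracy(X, y, fit_centroids, predict_centroid)` returns a value between 0 and 1 that can be compared across models and gene counts.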
Example 2:
A prognosis prediction system based on gene big data comprises a data preprocessing module, a screening module and a training verification module, wherein the data preprocessing module is used for downloading data from the public database TCGA (The Cancer Genome Atlas) and standardizing the data into FPKM format, the data comprising gene data and clinical data; the screening module is used for screening the data according to two types of conditions, namely the specified conditions of the clinical data and the specified conditions of the gene data; the training verification module comprises at least two algorithm models and is used for classifying the samples screened by the screening module, ranking the genes by importance using the Relief algorithm, and training the different algorithm models on the input data respectively; the training verification module compares the results of the different algorithm models and selects the algorithm model with the highest accuracy and its corresponding number of selected gene data.
From the training results of the algorithm models, the optimal prognosis model for each cancer and its corresponding number of gene data can be selected. For a new sample, sequencing is carried out to obtain the gene expression values, the corresponding gene features are then selected according to the determined optimal gene feature number, and the trained model is used for prediction.
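Preparing a new sample for the trained model could be sketched as follows. The text only says that the selected gene features are taken from the new sample; scaling by the per-gene maxima observed during training, clipping at 1, and treating a missing gene as zero are assumptions added so that the new vector matches the training scale. All names are hypothetical.

```python
def build_feature_vector(new_sample_fpkm, selected_genes, training_max):
    """Turn a new sample's FPKM values into the model's input vector.

    new_sample_fpkm: dict mapping gene -> FPKM value for the new sample
    selected_genes: the top-ranked genes chosen during model selection, in order
    training_max: dict mapping gene -> maximum FPKM seen in the training set
    """
    vector = []
    for gene in selected_genes:
        # Assumption: a gene absent from the new sample counts as not expressed.
        scaled = new_sample_fpkm.get(gene, 0.0) / training_max[gene]
        # Assumption: values above the training maximum are clipped to 1.
        vector.append(min(scaled, 1.0))
    return vector
```

The resulting vector would then be passed to the prediction routine of whichever model was selected, yielding the > 3 years or ≤ 3 years class.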

Claims (6)

1. A prognosis prediction method based on gene big data, characterized by comprising the following steps:
(1) collecting and fusing data: collecting fresh or frozen cancer tissue samples from patients and sequencing them to obtain gene data, and obtaining clinical data on patient survival time and survival status through follow-up investigation; fusing the gene data with the clinical data by matching the corresponding clinical data according to sample names, deleting samples with missing survival time, and standardizing the raw counts values obtained from sequencing into FPKM format data for subsequent processing;
(2) screening samples according to the specified conditions of the clinical data: selecting the samples whose survival status in the clinical data is death, and the samples whose survival status is survival and whose survival time is more than two years;
(3) screening samples according to the specified conditions of the gene data:
deleting the gene features whose expression is undetected in too many samples, and normalizing the gene data; specifically, if the expression of a gene is zero in more than 85 percent of the samples, the gene is deemed undetected in most samples; the normalization divides the FPKM value of each gene by the maximum expression value of that gene, so that the FPKM value of each gene lies between 0 and 1; gene expression refers to the measured amount of functional gene product synthesized from the genetic information of a gene, and is data in FPKM format;
then deleting the non-cancer tissue samples from the data set and keeping only the cancer tissue samples;
(4) dividing the samples screened in step (2) and step (3) into two classes according to prognosis time, namely prognosis time of more than three years and prognosis time of less than or equal to three years, and ranking the genes by importance using the Relief algorithm; taking a certain number of gene data, sequentially using at least two machine learning algorithms to cross-validate cancer prognosis while gradually increasing the gene feature number, and selecting the optimal model and gene feature number by comparing the results, wherein the gene feature number is the number of gene data.
2. The prognosis prediction method based on gene big data according to claim 1, wherein in step (4) the importance ranking is as follows: the Relief algorithm is trained to generate a corresponding weight for each gene; the higher the weight, the more the gene contributes to distinguishing the two groups of samples, and the higher its ranking.
3. The prognosis prediction method based on gene big data according to claim 1, wherein in step (4) at least one gene data is selected.
4. The prognosis prediction method based on gene big data according to claim 1, wherein in step (4) gene data are taken and 8 machine learning algorithm models are sequentially used to cross-validate cancer prognosis while gradually increasing the gene feature number, the 8 machine learning algorithm models being a support vector machine, a random forest, logistic regression, naive Bayes, linear regression, support vector regression with a polynomial kernel function, support vector regression with a linear kernel function, and ridge regression;
the 8 algorithm models are trained separately and the results are recorded; when each algorithm model is trained, 1 gene data is first taken for training, then two gene data are taken for training, and the number of gene data used for training is increased in sequence; each training of the algorithm model yields an accuracy, which is recorded, wherein the accuracy is obtained by comparing the prognosis time given by the algorithm model with the survival time recorded in the actual clinical data, and is the ratio of the number of samples with a correct prognosis to the total number of samples; the number of selected gene data corresponding to the highest accuracy under each algorithm model is recorded; and the results of the 8 algorithm models are compared, and the algorithm model with the highest accuracy and its corresponding number of selected gene data are selected.
5. The prognosis prediction method based on gene big data according to claim 4, wherein in step (4) ten-fold cross-validation is adopted for each training to test the accuracy of the algorithm model, the specific implementation being to divide the data set into ten parts and, in turn, take 9 parts as training data and 1 part as test data.
6. A prognosis prediction system based on gene big data, characterized by comprising a data preprocessing module, a screening module and a training verification module, wherein the data preprocessing module is used for downloading data from the public database TCGA (The Cancer Genome Atlas) and standardizing the data into FPKM format, the data comprising gene data and clinical data; the screening module is used for screening the data according to two types of conditions, namely the specified conditions of the clinical data and the specified conditions of the gene data; the training verification module comprises at least two algorithm models and is used for classifying the samples screened by the screening module, ranking the genes by importance using the Relief algorithm, and training the different algorithm models on the input data respectively; the training verification module compares the results of the different algorithm models and selects the algorithm model with the highest accuracy and its corresponding number of selected gene data.
CN201911256723.7A 2019-12-10 2019-12-10 Prognosis prediction method and prediction system based on gene big data Pending CN110942808A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911256723.7A CN110942808A (en) 2019-12-10 2019-12-10 Prognosis prediction method and prediction system based on gene big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911256723.7A CN110942808A (en) 2019-12-10 2019-12-10 Prognosis prediction method and prediction system based on gene big data

Publications (1)

Publication Number Publication Date
CN110942808A true CN110942808A (en) 2020-03-31

Family

ID=69910354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911256723.7A Pending CN110942808A (en) 2019-12-10 2019-12-10 Prognosis prediction method and prediction system based on gene big data

Country Status (1)

Country Link
CN (1) CN110942808A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070105142A1 (en) * 2005-10-31 2007-05-10 Scott Wilhelm Methods for prognosis and monitoring cancer therapy
CN103761451A (en) * 2014-01-02 2014-04-30 中国科学院数学与系统科学研究院 Biomarker combination identification method and system based on biomedical big data
CN107574243A (en) * 2016-06-30 2018-01-12 博奥生物集团有限公司 The construction method of molecular marker, reference gene and its application, detection kit and detection model
CN106407689A (en) * 2016-09-27 2017-02-15 牟合(上海)生物科技有限公司 Stomach cancer prognostic marker screening and classifying method based on gene expression profile
US20190241972A1 (en) * 2017-04-24 2019-08-08 Novomics Co., Ltd. Cluster classification and prognosis prediction system based on biological characteristics of gastric cancer
CN107463798A (en) * 2017-08-02 2017-12-12 南京高新生物医药公共服务平台有限公司 Predict the 12 gene expressions classification device and its construction method of adenocarcinoma of colon prognosis
CN108130372A (en) * 2018-01-17 2018-06-08 华中科技大学鄂州工业技术研究院 A kind of method and device for the instruction of acute myeloid leukemia drug
CN109136370A (en) * 2018-05-31 2019-01-04 广州表观生物科技有限公司 A kind of prognostic markers object of lung cancer and its application
CN109887600A (en) * 2019-04-16 2019-06-14 上海理工大学 A kind of analysis method of pair of non-small cell lung cancer prognosis Survival

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG HY等: "Cancers Screening in an Asymptomatic Population by Using Multiple Tumour Markers", 《PLOS ONE》 *
张飞: "机器学习算法在非小型细胞肺癌癌症阶段分类上的应用", 《中国优秀硕士学位论文全文数据库:医药卫生科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111540419A (en) * 2020-04-28 2020-08-14 上海交通大学 Anti-senile dementia drug effectiveness prediction system based on deep learning
CN112820403A (en) * 2021-02-25 2021-05-18 中山大学 Deep learning method for predicting prognosis risk of cancer patient based on multiple groups of mathematical data
CN112820403B (en) * 2021-02-25 2024-03-29 中山大学 Deep learning method for predicting prognosis risk of cancer patient based on multiple sets of learning data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200331