CN110942808A - Prognosis prediction method and prediction system based on gene big data - Google Patents
- Publication number
- CN110942808A
- Authority
- CN
- China
- Prior art keywords
- gene
- data
- prognosis
- algorithm
- training
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
The invention relates to a prognosis prediction method and a prediction system based on gene big data, belonging to the technical field of artificial intelligence. The method mainly comprises the following steps: extracting gene information from tissue samples to form a training set; ranking gene importance with the Relief algorithm; fitting and classifying prognosis time with machine learning models; and selecting the model and the number of gene features that give the highest accuracy as the prediction method and gene-feature count for the specific disease. Once model training is complete, new gene data can be tested quickly, which helps with prognosis evaluation.
Description
Technical Field
The invention relates to a cancer prognosis prediction method and prediction system based on gene big data, and belongs to the technical field of artificial intelligence.
Background
According to annual statistics reported by the American Cancer Society, one in four cancer deaths is caused by lung cancer. Although researchers have accumulated large amounts of data from microarray technology and next-generation sequencing (NGS), the information in these data may not have been fully explored. Traditional survival prediction depends on the patient's clinical pathology and is sometimes inaccurate.
In recent years, advances in next-generation sequencing have made large-scale gene sequencing data from cancer samples available, and big-data artificial intelligence makes it possible to mine valuable latent information from these massive data. At present, cancer prognosis is generally predicted from intuitive clinical characteristics combined with traditional statistical methods. Although some studies have shifted their focus to gene features, they use traditional statistical methods to select genes according to differences in expression, so genes with small expression differences but a large influence on prognosis cannot be found. To improve accuracy, the present application correlates gene features selected from these data with patient survival time, determines the relationship between the genes and survival time, and obtains a calibrated predictive model.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a new cancer prognosis modeling method and prediction system: a method, based on gene big data, for predicting and classifying cancer prognosis time and for discovering the associated pathogenic genes. The method is simple and efficient and, being based on gene expression, is applicable to a wide range of cancers. It comprises screening and cleaning the sample data, screening and cleaning the gene data, ranking gene importance, training and selecting a model, and finally predicting new samples. It helps doctors estimate the condition of cancer patients in advance and assists in treatment.
The technical scheme of the invention is as follows:
A prognosis prediction method based on gene big data comprises the following steps:
(1) collecting and fusing data: collecting fresh or frozen cancer tissue samples from patients, sequencing them to obtain gene data, and obtaining clinical data on the patients' survival time and survival status through follow-up; fusing the gene data with the clinical data by matching each sample to its clinical record (i.e., its survival time data) by sample name, and deleting samples with missing survival time; after sequencing yields raw count values, normalizing the gene data into FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format for subsequent processing;
(2) screening samples according to the prescribed clinical-data conditions: retaining samples whose survival status is dead, and samples whose survival status is alive with a survival time of more than two years. Samples that are alive but have a survival time of less than two years are discarded, because it cannot be determined whether the final survival time of such samples belongs to the longer group (>3 years) or the shorter group (≤3 years);
(3) Screening samples according to the specified conditions of the genetic data:
deleting gene features whose expression is undetectable in most samples, and normalizing the gene data. Specifically, if a gene's expression is zero in more than 85% of the samples, the gene is considered undetected in most samples and the feature is discarded. Normalization divides each gene's FPKM values by that gene's maximum expression value, so that every value lies between 0 and 1. Here, gene expression refers to the amount of functional gene product synthesized from the genetic information of a gene, measured as FPKM data;
then deleting non-primary-tumor samples, i.e., non-cancer tissue samples, from the data set, so that only cancer tissue samples remain;
(4) dividing the samples screened in steps (2) and (3) by prognosis time into two classes, prognosis time of more than three years and prognosis time of three years or less, and ranking the genes by importance with the Relief algorithm; then, taking the ranked genes and gradually increasing the gene feature number, performing cross-validation of cancer prognosis with at least two machine learning algorithms in turn, and selecting the optimal model and gene feature number by comparing the results, where the gene feature number is the number of genes used.
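Steps (1) to (3), together with the two-class grouping of step (4), can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the dictionary and list layouts, the `"Alive"`/`"Dead"` status strings and survival time expressed in years are all hypothetical (TCGA clinical tables report days, so a conversion would come first). FPKM uses its standard definition, count × 10^9 / (gene length in bp × total mapped reads).

```python
def counts_to_fpkm(counts, gene_lengths_bp):
    """Step (1): raw counts {gene: count} -> FPKM = count * 1e9 / (length_bp * total)."""
    total = sum(counts.values())
    return {g: (c * 1e9 / (gene_lengths_bp[g] * total) if total else 0.0)
            for g, c in counts.items()}


def fuse(expression, clinical):
    """Step (1): join expression profiles to clinical records by sample name,
    dropping samples whose survival time (hypothetical key "years") is missing."""
    return {s: (profile, clinical[s]) for s, profile in expression.items()
            if s in clinical and clinical[s].get("years") is not None}


def keep_sample(status, survival_years):
    """Step (2): keep deceased samples and living samples followed for > 2 years."""
    if status == "Dead":
        return True
    return status == "Alive" and survival_years > 2


def prognosis_label(survival_years):
    """Step (4) grouping: 1 if prognosis time > 3 years, otherwise 0."""
    return 1 if survival_years > 3 else 0


def filter_and_normalize(matrix, genes, max_zero_fraction=0.85):
    """Step (3): drop genes that are zero in more than 85% of samples, then
    divide each remaining gene by its maximum so values lie in [0, 1].

    matrix: non-empty list of samples, each a list of FPKM values ordered like `genes`.
    """
    n = len(matrix)
    kept = [j for j in range(len(genes))
            if sum(1 for row in matrix if row[j] == 0) / n <= max_zero_fraction]
    maxima = {j: max(row[j] for row in matrix) for j in kept}
    normalized = [[row[j] / maxima[j] if maxima[j] > 0 else 0.0 for j in kept]
                  for row in matrix]
    return [genes[j] for j in kept], normalized
```

For example, a gene that is zero in 9 of 10 samples (90%) is dropped, while a gene whose largest FPKM value is 10.0 is rescaled so that this sample becomes 1.0.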
Preferably, in step (4), the importance ranking is as follows: training the Relief algorithm yields a weight for each feature, i.e., each gene; the higher the weight, the more the gene contributes to distinguishing the two groups of samples (prognosis time >3 years and ≤3 years), and the higher its rank.
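The patent does not state which Relief variant it uses. The sketch below is the basic two-class Relief of Kira and Rendell, offered as an assumption: every sample is used once as the reference instance (the original algorithm samples reference instances at random), and distances are Manhattan distances on features already scaled to [0, 1]. A gene gains weight when it differs from the nearest sample of the other class (nearest miss) and loses weight when it differs from the nearest sample of the same class (nearest hit). Function names are hypothetical.

```python
def relief_weights(X, y):
    """Basic two-class Relief weights, one per feature (gene).

    X: list of samples, each a list of features scaled to [0, 1].
    y: list of class labels (1 = prognosis > 3 years, 0 = <= 3 years).
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        # Manhattan distance between two samples
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # cannot compare without both a hit and a miss
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # reward separation from the other class, penalize spread within the class
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return weights


def rank_genes(genes, weights):
    """Return gene names sorted by descending Relief weight."""
    return [g for g, _ in sorted(zip(genes, weights), key=lambda t: t[1], reverse=True)]
```

With two samples per class where only the first gene separates the classes, that gene receives a positive weight and ranks first, while a gene that varies randomly within each class receives a negative weight.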
Preferably, in step (4), at least one gene is selected.
Preferably, in step (4), the gene data are taken and 8 machine learning models are used in turn to cross-validate cancer prognosis while gradually increasing the gene feature number; the 8 models are a support vector machine, a random forest, logistic regression, naive Bayes, linear regression, support vector regression with a polynomial kernel, support vector regression with a linear kernel, and ridge regression;
training the 8 models separately and recording the results: for each model, training first with 1 gene, then with 2 genes, and so on, increasing the number of genes step by step; recording the accuracy obtained in each training run, where accuracy is the number of samples whose prognosis class predicted by the model agrees with the class given by the survival time recorded in the clinical data, divided by the total number of samples; recording, for each model, the number of genes that gives its highest accuracy; and comparing the results of the 8 models to select the model with the highest accuracy together with its corresponding number of genes.
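The model and gene-count search can be sketched as a double loop over models and top-k gene subsets. This is a hypothetical sketch: `evaluate(model_name, genes)` stands in for training one model on the given genes and returning its cross-validated accuracy. One possible mapping of the eight models onto scikit-learn estimators is `SVC`, `RandomForestClassifier`, `LogisticRegression`, `GaussianNB`, `LinearRegression`, `SVR(kernel="poly")`, `SVR(kernel="linear")` and `Ridge`; the patent names neither a library nor a naive Bayes variant, nor how regression outputs become classes (thresholding at 0.5 would be one option). The cap of 200 genes follows Example 1.

```python
def select_model_and_gene_count(ranked_genes, model_names, evaluate, max_genes=200):
    """Try the top-k ranked genes (k = 1, 2, ...) with every model and keep the best.

    evaluate(model_name, genes) must return a cross-validated accuracy in [0, 1].
    Returns (best_model, best_k, best_accuracy, per_model), where per_model maps
    each model name to its own best (k, accuracy).
    """
    per_model = {}
    for name in model_names:
        best_k, best_acc = None, -1.0
        for k in range(1, min(max_genes, len(ranked_genes)) + 1):
            acc = evaluate(name, ranked_genes[:k])
            if acc > best_acc:  # strict '>' keeps the smaller gene count on ties
                best_k, best_acc = k, acc
        per_model[name] = (best_k, best_acc)
    best_model = max(per_model, key=lambda m: per_model[m][1])
    best_k, best_acc = per_model[best_model]
    return best_model, best_k, best_acc, per_model
```

Keeping the smallest gene count on ties is a design choice of this sketch; it favors the more compact gene signature when accuracy does not improve.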
Preferably, in step (4), every training run uses ten-fold cross-validation to test the accuracy of the model.
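Ten-fold cross-validation, as described in Example 1 (ten parts, each used once as test data while the other nine train the model), can be sketched as follows. The `fit` and `predict` callables are hypothetical placeholders for any of the eight models. For determinism the folds here are contiguous; in practice the samples would usually be shuffled, or stratified by class, before splitting, which the patent does not specify.

```python
def k_fold_indices(n_samples, k=10):
    """Split sample indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(X, y, fit, predict, k=10):
    """Each fold is the test set once; the remaining folds train the model.

    Accuracy = correctly classified test samples / total number of samples.
    """
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            correct += int(predict(model, X[i]) == y[i])
    return correct / len(X)
```

Because every sample is tested exactly once, the pooled count of correct predictions divided by the total sample count matches the accuracy definition given above.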
A prognosis prediction system based on gene big data comprises a data preprocessing module, a screening module and a training verification module. The data preprocessing module downloads data, comprising gene data and clinical data, from the public database TCGA (The Cancer Genome Atlas) and normalizes them into FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format. The screening module screens the data according to two types of conditions: the prescribed conditions on the clinical data and the prescribed conditions on the gene data. The training verification module comprises at least two algorithm models; it divides the samples screened by the screening module into the two prognosis classes, ranks gene importance with the Relief algorithm, trains each model on the input data, compares the results of the different models, and selects the model with the highest accuracy together with its corresponding number of genes.
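The method keeps only primary tumor samples but does not say how they are identified. For TCGA data, one common approach, assumed here, reads the sample-type code from the sample barcode: in a barcode of the form `TCGA-AB-1234-01A` (an illustrative value), the first two digits of the fourth field give the sample type, where `01` denotes a primary solid tumor and `11` a solid tissue normal sample. The helper name is hypothetical.

```python
def is_primary_tumor(barcode):
    """Return True if a TCGA sample barcode carries sample-type code 01
    (primary solid tumor); malformed barcodes are rejected."""
    parts = barcode.split("-")
    if len(parts) < 4 or len(parts[3]) < 2:
        return False
    return parts[3][:2] == "01"
```

Filtering a list of samples then reduces to `[s for s in samples if is_primary_tumor(s)]`.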
Once the main pathogenic genes and the model for a given cancer have been determined, new gene data can be fed directly into the trained model for prediction, providing a reference for clinical judgment.
The beneficial effects of the invention are as follows:
the invention provides a method for modeling the prognosis of cancer patients that combines a feature importance ranking algorithm with multiple classification and fitting models. Genes are ranked by their importance in separating two groups with clearly different survival times (>3 years and ≤3 years), and the ranking is then combined with different machine learning models for screening. This not only enables accurate prediction of prognosis time for different cancers, but also supports the discovery of oncogenes and of key genes affecting prognosis in different cancers.
Drawings
FIG. 1 is a schematic diagram of the data processing flow;
FIG. 2 is an overall flowchart.
Detailed Description
The invention is further described below by way of examples with reference to the accompanying drawings, but is not limited thereto.
Example 1:
A prognosis prediction method based on gene big data comprises the following steps:
(1) collecting and fusing data;
collecting fresh or frozen cancer tissue samples from patients, sequencing them to obtain gene data, and obtaining clinical data on the patients' survival time and survival status through follow-up; this embodiment uses a public data set, taking lung adenocarcinoma (LUAD) as an example: LUAD data, including gene data and clinical data, are downloaded from the public database TCGA (https://portal.gdc.cancer.gov/);
fusing the gene data with the clinical data by matching each sample to its clinical record (i.e., its survival time data) by sample name, and deleting samples with missing survival time; after sequencing yields raw count values, normalizing the gene data into FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format for subsequent processing.
(2) Screening samples according to the prescribed conditions of clinical data:
retaining samples whose survival status is dead, and samples whose survival status is alive with a survival time of more than two years. Samples that are alive but have a survival time of less than two years are discarded, because it cannot be determined whether the final survival time of such samples belongs to the longer group (>3 years) or the shorter group (≤3 years).
(3) Screening samples according to the specified conditions of the genetic data:
deleting gene features whose expression is undetectable in most samples, and normalizing the gene data. Specifically, if a gene's expression is zero in more than 85% of the samples, the gene is considered undetected in most samples and the feature is discarded. Normalization divides each gene's FPKM values by that gene's maximum expression value, so that every value lies between 0 and 1. Here, gene expression refers to the amount of functional gene product synthesized from the genetic information of a gene, measured as FPKM data;
then deleting non-primary-tumor samples, i.e., non-cancer tissue samples, from the data set, so that only cancer tissue samples remain;
(4) dividing the samples screened in steps (2) and (3) by prognosis time into two classes: prognosis time of more than three years and prognosis time of three years or less;
the genes are ranked by importance with the Relief algorithm: training the Relief algorithm yields a weight for each feature, i.e., each gene; the higher the weight, the more the gene contributes to distinguishing the two groups of samples (prognosis time >3 years and ≤3 years), and the higher its rank.
Then 8 machine learning models, all of them existing models, are used in turn to cross-validate cancer prognosis while gradually increasing the gene feature number: a support vector machine, a random forest, logistic regression, naive Bayes, linear regression, support vector regression with a polynomial kernel, support vector regression with a linear kernel, and ridge regression.
The 8 models are trained separately and the results recorded. For each model, training is performed first with 1 gene, then with 2 genes, and so on, increasing the number of genes up to 200; the accuracy of each training run is recorded, where accuracy is the number of samples whose prognosis class predicted by the model agrees with the class given by the survival time recorded in the clinical data, divided by the total number of samples. For each model, the number of genes giving the highest accuracy is recorded; the results of the 8 models are then compared, and the model with the highest accuracy is selected together with its corresponding number of genes.
Each training run uses ten-fold cross-validation: the data set is divided into ten parts, and in turn 9 parts are used as training data and 1 part as test data.
Example 2:
A prognosis prediction system based on gene big data comprises a data preprocessing module, a screening module and a training verification module. The data preprocessing module downloads data, comprising gene data and clinical data, from the public database TCGA (The Cancer Genome Atlas) and normalizes them into FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format. The screening module screens the data according to two types of conditions: the prescribed conditions on the clinical data and the prescribed conditions on the gene data. The training verification module comprises at least two algorithm models; it divides the samples screened by the screening module into the two prognosis classes, ranks gene importance with the Relief algorithm, trains each model on the input data, compares the results of the different models, and selects the model with the highest accuracy together with its corresponding number of genes.
The optimal prognosis model for each cancer, and its corresponding number of genes, can be selected from the training results. For a new sample, sequencing yields gene expression values; the corresponding gene features are then selected according to the determined optimal gene feature number, and the trained model is used for prediction.
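Preparing a new sample for the trained model can be sketched as below. The patent only describes normalizing the training data, so the following are assumptions of this sketch: the per-gene maxima from training are reused to scale the new sample, values above the training maximum are clipped to 1.0, and genes missing from the new sample are treated as zero. The function name is hypothetical.

```python
def prepare_new_sample(fpkm, ranked_genes, best_k, training_max):
    """Build the feature vector for a new sample.

    fpkm: {gene: FPKM value} for the new sample.
    ranked_genes: genes ordered by Relief weight; only the top best_k are used.
    training_max: {gene: maximum FPKM seen in training}, reused so the new
    sample is scaled like the training data.
    """
    features = []
    for gene in ranked_genes[:best_k]:
        value = fpkm.get(gene, 0.0)        # missing gene treated as not expressed
        scale = training_max.get(gene, 0.0)
        features.append(min(value / scale, 1.0) if scale > 0 else 0.0)
    return features
```

The resulting list can then be passed to the selected model, for example `model.predict([features])` if the model is a scikit-learn estimator.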
Claims (6)
1. A prognosis prediction method based on gene big data, characterized by comprising the following steps:
(1) collecting and fusing data: collecting fresh or frozen cancer tissue samples from patients, sequencing them to obtain gene data, and obtaining clinical data on the patients' survival time and survival status through follow-up; fusing the gene data with the clinical data by matching the corresponding clinical data by sample name, and deleting samples with missing survival time; after sequencing yields raw count values, normalizing the gene data into FPKM format for subsequent processing;
(2) screening samples according to the prescribed clinical-data conditions: retaining samples whose survival status is dead, and samples whose survival status is alive with a survival time of more than two years;
(3) screening samples according to the specified conditions of the genetic data:
deleting gene features whose expression is undetectable in most samples, and normalizing the gene data; a gene is considered undetected in most samples if its expression is zero in more than 85% of the samples; normalization divides each gene's FPKM values by that gene's maximum expression value, so that every value lies between 0 and 1; gene expression refers to the amount of functional gene product synthesized from the genetic information of a gene, measured as FPKM data;
then deleting non-cancer tissue samples from the data set, keeping only cancer tissue samples;
(4) dividing the samples screened in steps (2) and (3) by prognosis time into two classes, prognosis time of more than three years and prognosis time of three years or less, and ranking the genes by importance with the Relief algorithm; then, taking the ranked genes and gradually increasing the gene feature number, performing cross-validation of cancer prognosis with at least two machine learning algorithms in turn, and selecting the optimal model and gene feature number by comparing the results, where the gene feature number is the number of genes used.
2. The prognosis prediction method based on gene big data according to claim 1, wherein in step (4) the importance ranking is as follows: training the Relief algorithm yields a weight for each gene; the higher the weight, the greater the gene's contribution to distinguishing the two groups of samples, the more important the gene, and the higher its rank.
3. The prognosis prediction method based on gene big data according to claim 1, wherein in step (4) at least one gene is selected.
4. The prognosis prediction method based on gene big data according to claim 1, wherein in step (4) the gene data are taken and 8 machine learning models are used in turn to cross-validate cancer prognosis while gradually increasing the number of gene features, the 8 models being a support vector machine, a random forest, logistic regression, naive Bayes, linear regression, support vector regression with a polynomial kernel, support vector regression with a linear kernel, and ridge regression;
training the 8 models separately and recording the results: for each model, training first with 1 gene, then with 2 genes, and so on, increasing the number of genes step by step; recording the accuracy obtained in each training run, where accuracy is the number of samples whose prognosis class predicted by the model agrees with the class given by the survival time recorded in the clinical data, divided by the total number of samples; recording, for each model, the number of genes that gives its highest accuracy; and comparing the results of the 8 models to select the model with the highest accuracy together with its corresponding number of genes.
5. The prognosis prediction method based on gene big data according to claim 4, wherein in step (4) every training run uses ten-fold cross-validation to test the accuracy of the model; specifically, the data set is divided into ten parts, and in turn 9 parts are used as training data and 1 part as test data.
6. A prognosis prediction system based on gene big data, characterized by comprising a data preprocessing module, a screening module and a training verification module, wherein the data preprocessing module is used for downloading data, comprising gene data and clinical data, from the public database TCGA (The Cancer Genome Atlas) and normalizing the data into FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format; the screening module is used for screening the data according to two types of conditions, namely the prescribed conditions on the clinical data and the prescribed conditions on the gene data; and the training verification module comprises at least two algorithm models and is used for dividing the samples screened by the screening module into the two prognosis classes, ranking gene importance with the Relief algorithm, training each model on the input data, comparing the results of the different models, and selecting the model with the highest accuracy together with its corresponding number of genes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911256723.7A CN110942808A (en) | 2019-12-10 | 2019-12-10 | Prognosis prediction method and prediction system based on gene big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110942808A true CN110942808A (en) | 2020-03-31 |
Family ID: 69910354
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911256723.7A Pending CN110942808A (en) | 2019-12-10 | 2019-12-10 | Prognosis prediction method and prediction system based on gene big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110942808A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070105142A1 (en) * | 2005-10-31 | 2007-05-10 | Scott Wilhelm | Methods for prognosis and monitoring cancer therapy |
CN103761451A (en) * | 2014-01-02 | 2014-04-30 | 中国科学院数学与系统科学研究院 | Biomarker combination identification method and system based on biomedical big data |
CN106407689A (en) * | 2016-09-27 | 2017-02-15 | 牟合(上海)生物科技有限公司 | Stomach cancer prognostic marker screening and classifying method based on gene expression profile |
CN107463798A (en) * | 2017-08-02 | 2017-12-12 | 南京高新生物医药公共服务平台有限公司 | 12-gene expression classifier for predicting colon adenocarcinoma prognosis and its construction method |
CN107574243A (en) * | 2016-06-30 | 2018-01-12 | 博奥生物集团有限公司 | Construction method and application of molecular markers and reference genes, detection kit and detection model |
CN108130372A (en) * | 2018-01-17 | 2018-06-08 | 华中科技大学鄂州工业技术研究院 | Method and device for acute myeloid leukemia medication guidance |
CN109136370A (en) * | 2018-05-31 | 2019-01-04 | 广州表观生物科技有限公司 | Prognostic marker for lung cancer and application thereof |
CN109887600A (en) * | 2019-04-16 | 2019-06-14 | 上海理工大学 | Method for analyzing the prognostic survival of non-small cell lung cancer |
US20190241972A1 (en) * | 2017-04-24 | 2019-08-08 | Novomics Co., Ltd. | Cluster classification and prognosis prediction system based on biological characteristics of gastric cancer |
- 2019-12-10: CN201911256723.7A (CN110942808A), CN, active, pending
Non-Patent Citations (2)
Title |
---|
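WANG HY et al.: "Cancers Screening in an Asymptomatic Population by Using Multiple Tumour Markers", PLOS ONE * |
Zhang Fei (张飞): "Application of Machine Learning Algorithms to Cancer Stage Classification of Non-Small Cell Lung Cancer", China Master's Theses Full-text Database: Medicine and Health Sciences * |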
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111540419A (en) * | 2020-04-28 | 2020-08-14 | 上海交通大学 | Anti-senile dementia drug effectiveness prediction system based on deep learning |
CN112820403A (en) * | 2021-02-25 | 2021-05-18 | 中山大学 | Deep learning method for predicting prognosis risk of cancer patients based on multi-omics data |
CN112820403B (en) * | 2021-02-25 | 2024-03-29 | 中山大学 | Deep learning method for predicting prognosis risk of cancer patients based on multi-omics data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Birnbaum et al. | Model-assisted cohort selection with bias analysis for generating large-scale cohorts from the EHR for oncology research | |
Zhang et al. | An efficient feature selection strategy based on multiple support vector machine technology with gene expression data | |
CN108198621B (en) | Neural-network-based comprehensive diagnosis and treatment decision method using database data |
CN112635063B (en) | Comprehensive lung cancer prognosis prediction model, construction method and device | |
SG194594A1 (en) | Analyzing the expression of biomarkers in cells with clusters | |
CN108335756B (en) | Nasopharyngeal carcinoma database and comprehensive diagnosis and treatment decision method based on database | |
CN108206056B (en) | Artificial-intelligence-assisted diagnosis and treatment decision-making terminal for nasopharyngeal carcinoma |
CN111440869A (en) | DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof | |
CN110942808A (en) | Prognosis prediction method and prediction system based on gene big data | |
Sahu et al. | Efficient role of machine learning classifiers in the prediction and detection of breast cancer | |
CN113362894A (en) | Method for predicting syndromal cancer driver gene | |
Zolfaghari et al. | Cancer prognosis and diagnosis methods based on ensemble learning | |
CN110010204B (en) | Fusion network and multi-scoring strategy based prognostic biomarker identification method | |
CN107480441A (en) | Modeling method and system for predicting the prognosis of pediatric septic shock based on support vector machines |
CN108320797B (en) | Nasopharyngeal carcinoma database and comprehensive diagnosis and treatment decision method based on database | |
KR101990430B1 (en) | System and method of biomarker identification for cancer recurrence prediction | |
CN115715416A (en) | Medical data inspector based on machine learning | |
Zhang et al. | Deep learning-based methods for classification of microsatellite instability in endometrial cancer from HE-stained pathological images | |
CN116864011A (en) | Colorectal cancer molecular marker identification method and system based on multi-omics data |
CN116680594A (en) | Method for improving the accuracy of thyroid cancer classification from multi-omics data using a deep feature selection algorithm |
CN116312800A (en) | Lung cancer feature identification method, device and storage medium based on whole-transcriptome sequencing of circulating RNA in plasma |
Zhao et al. | RFE-based feature selection improves performance of classifying multiple-causes deaths in colorectal cancer |
Rohimat et al. | Implementation of Genetic Algorithm-Support Vector Machine on Gene Expression Data in Identification of Non-Small Cell Lung Cancer in Nonsmoking Female | |
US20220044762A1 (en) | Methods of assessing breast cancer using machine learning systems | |
Irigoien et al. | Identification of differentially expressed genes by means of outlier detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200331 |