CN112382342A - Cancer methylation data classification method based on integrated feature selection - Google Patents

Cancer methylation data classification method based on integrated feature selection

Info

Publication number
CN112382342A
Authority
CN
China
Prior art keywords
data
feature selection
cancer
samples
methylation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011329335.XA
Other languages
Chinese (zh)
Inventor
潘晓光
田奇
董虎弟
陈智娇
白丽霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Sanyouhe Smart Information Technology Co Ltd
Original Assignee
Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi Sanyouhe Smart Information Technology Co Ltd filed Critical Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority to CN202011329335.XA priority Critical patent/CN112382342A/en
Publication of CN112382342A publication Critical patent/CN112382342A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Abstract

The invention belongs to the technical field of data processing, and particularly relates to a cancer methylation data classification method based on integrated feature selection, which comprises the following steps: inputting a data set of cancer and normal samples measured at methylation sites, wherein each row in the data set represents a tested individual labeled as normal or cancer, and each column represents a feature site; preprocessing the data and filtering out missing values in the data set; selecting robust differential methylation sites by an integrated feature selection method; training a multi-classifier model based on the robust differential methylation sites, and voting on the prediction results of the classifiers to obtain a final classification decision; and outputting the final classification result. The method can effectively address the identification of differential sites in high-throughput methylation data and the classification of potentially uncertain samples. The invention is used for classification of cancer methylation data.

Description

Cancer methylation data classification method based on integrated feature selection
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a cancer methylation data classification method based on integrated feature selection.
Background
With the development of computing and sequencing technologies, more and more large-scale biological data are being generated, and mining the value contained in these data is an important means of advancing precision medicine. DNA methylation, a widely studied epigenetic marker, plays a crucial role in tumorigenesis. Advances in high-throughput technologies, such as the Infinium 450K platform, have made it possible to obtain genome-scale DNA methylation data at single-CpG-site resolution. On this basis, identifying sites that are differentially methylated between normal and cancer samples, and thereby characterizing the epigenetic differences between cancer patients and normal persons, can improve the early detection and prevention of cancer. However, in the data currently available there is a strong imbalance between the number of samples and the number of sites (about 1:1000), which makes large-scale analysis of methylation data from cancer patients and normal persons particularly difficult. Methods for distinguishing cancer from normal samples based on large-scale methylation data exist, but most rely on simple feature preprocessing and a single classifier, so it is difficult for them to distinguish cancer from normal samples accurately or to obtain the differential methylation sites that are crucial for that distinction.
Disclosure of Invention
To address the technical problem that existing methods based on large-scale methylation data have difficulty accurately distinguishing cancer from normal samples, the invention provides a cancer methylation data classification method based on integrated feature selection that has high classification accuracy, strong identification capability and high efficiency.
In order to solve the above technical problem, the invention adopts the following technical scheme:
A method for classifying cancer methylation data based on integrated feature selection, comprising the following steps:
S1, inputting a data set of cancer and normal samples measured at methylation sites, wherein each row in the data set represents a tested individual labeled as normal or cancer, and each column represents a feature site;
S2, preprocessing the data and filtering out missing values in the data set;
S3, selecting robust differential methylation sites by an integrated feature selection method;
S4, training a multi-classifier model based on the robust differential methylation sites, and voting on the prediction results of the classifiers to obtain a final classification decision;
S5, outputting the final classification result.
The data preprocessing in S2 comprises the following steps:
S2.1, searching for missing values in the data and, if missing values exist in the original data, filtering out the columns (features) that contain them;
S2.2, correcting the batch effect in the data that contain no missing values;
S2.3, filtering out the set of sites with the smallest variance: the variance of the methylation values of each site across all measured samples is calculated, all sites are sorted by variance from largest to smallest, and roughly the last third of the sorted sites is discarded.
In S2.2, an empirical Bayes (EB) method is adopted to eliminate the influence of the batch effect.
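As a rough illustration of S2.1 and S2.3, the following minimal sketch filters a small toy matrix using only the Python standard library. The function names, the `drop_fraction` parameter and the toy values are assumptions made for this example and do not come from the source; the batch-effect correction of S2.2 is deliberately left out.

```python
from statistics import pvariance

def drop_sites_with_missing(rows):
    """S2.1: keep only the columns (sites) that have a value in every sample."""
    n_sites = len(rows[0])
    keep = [j for j in range(n_sites)
            if all(row[j] is not None for row in rows)]
    return keep, [[row[j] for j in keep] for row in rows]

def drop_low_variance_sites(rows, drop_fraction=1 / 3):
    """S2.3: rank sites by variance (descending) and discard the trailing fraction."""
    n_sites = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_sites)]
    order = sorted(range(n_sites), key=lambda j: variances[j], reverse=True)
    n_keep = n_sites - round(n_sites * drop_fraction)
    keep = sorted(order[:n_keep])  # restore the original column order
    return keep, [[row[j] for j in keep] for row in rows]

# Toy matrix: 4 samples (rows) x 4 sites (columns); None marks a missing value.
beta = [
    [0.10, 0.50, None, 0.90],
    [0.12, 0.55, 0.30, 0.20],
    [0.11, 0.45, 0.31, 0.85],
    [0.09, 0.60, 0.29, 0.15],
]
kept_after_missing, complete = drop_sites_with_missing(beta)
kept_after_variance, filtered = drop_low_variance_sites(complete)
```

The indices returned by the second call refer to columns of the already filtered matrix, so a real pipeline would need to map them back to site identifiers.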
The integrated feature selection method in S3 comprises the following steps:
S3.1, introducing sample diversity: different sample subsets are obtained by repeatedly and randomly sampling the original data in equal proportion, and a feature selection method is then applied to the sample subsets to obtain different feature site sets;
S3.2, introducing function diversity: different differential methylation site sets are obtained by applying different feature selection methods to the same sample subset;
S3.3, extracting the differential site sets with a plurality of feature selection methods, so that each sample subset yields two feature site subsets; taking the union of the two to obtain the feature subset corresponding to each sample subset; and finally taking the intersection of the feature subsets of all sample subsets to obtain the robust differential site set.
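The union-then-intersection rule of S3.3 can be sketched with plain Python sets. This is only an illustration under assumptions: the function name `stable_site_set` and the site identifiers are made up, and the two selections per subset stand in for whatever the two feature selection methods would return.

```python
def stable_site_set(selections_per_subset):
    """Union the two selections within each sample subset, then intersect across subsets.

    selections_per_subset: list of (sites_method_a, sites_method_b) pairs,
    one pair per sample subset.
    """
    if not selections_per_subset:
        return set()
    per_subset = [set(a) | set(b) for a, b in selections_per_subset]
    return set.intersection(*per_subset)

# Made-up site identifiers for three sample subsets.
selections = [
    ({"cg01", "cg02"}, {"cg02", "cg03"}),
    ({"cg02", "cg03"}, {"cg03", "cg04"}),
    ({"cg01", "cg03"}, {"cg02", "cg05"}),
]
stable = stable_site_set(selections)
```

The union keeps a site if either method finds it in a given subset, while the intersection requires it to reappear in every subset, so a site selected in only some subsets is dropped.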
Obtaining the final classification decision in S4 comprises the following steps:
S4.1, training a logistic regression classifier on the result of the integrated feature selection method; the classifier is fitted by maximizing the likelihood function and maps its output through the sigmoid function to a probability over the classes {0, 1}, thereby dividing the samples;
S4.2, classifying the samples with a support vector machine, which divides the samples by finding the support vectors among them and maximizing the margin between the two classes of samples;
S4.3, classifying the samples with a random forest classifier, which divides the samples step by step through tree structures according to the values of the feature parameters;
S4.4, integrating the prediction results of the three classifiers by voting.
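The voting in S4.4 can be sketched as a simple majority vote over the three per-sample predictions. The helper names and the example predictions below are hypothetical; the classifiers themselves are not trained here.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most classifiers for one sample."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label

def vote_samples(*per_classifier_predictions):
    """Combine aligned prediction lists (one list per classifier) sample by sample."""
    return [majority_vote(votes) for votes in zip(*per_classifier_predictions)]

# Hypothetical outputs of the three classifiers for four samples.
logistic_pred = ["normal", "cancer", "cancer", "normal"]
svm_pred      = ["normal", "cancer", "normal", "cancer"]
forest_pred   = ["cancer", "cancer", "normal", "normal"]
final = vote_samples(logistic_pred, svm_pred, forest_pred)
```

With three classifiers and two classes a tie cannot occur, which is one practical reason to use an odd number of voters.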
Compared with the prior art, the invention has the following beneficial effects:
The method can effectively address the identification of differential sites in high-throughput methylation data and the classification of potentially uncertain samples. Through the integrated feature selection method, robust differential methylation sites in the input methylation data can be identified efficiently, and samples are classified based on these sites. Compared with traditional methods based on a single feature selection method and a single classifier, the method introduces integrated feature selection into differential site identification, which yields more reliable and more discriminative differential methylation sites, and it improves the classification accuracy for samples to be evaluated through voting fusion of multiple classifiers.
Drawings
FIG. 1 is a flow chart of the operation of the present invention;
FIG. 2 is a schematic diagram of the main steps of the present invention;
FIG. 3 is a flowchart illustrating an integrated feature selection method according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A method for classifying cancer methylation data based on integrated feature selection, as shown in FIG. 1, comprises the following steps:
Step 1, taking Infinium 450K platform data as an example, a data set of cancer and normal samples containing large-scale methylation sites is input, in which each row represents a sample, i.e., a tested individual labeled as normal or cancer, and each column represents a feature, i.e., a site;
Step 2, after the data are input, preprocessing is carried out first. The first step is to search for missing values in the data. If missing values exist in the original data, the columns (features) containing them are filtered out; this is acceptable because the data are high-dimensional, containing hundreds of thousands of measured sites. The second step is to correct the batch effect in the data that contain no missing values. The batch effect arises because, in practice, the number of samples that can be measured in a single run is limited, so additional samples may be measured days or months apart. This introduces systematic, non-biological differences between batches, so that samples from different batches are not directly comparable and the data may contain errors caused by such biologically irrelevant factors. Here we use an empirical Bayes (EB) method to eliminate the influence of the batch effect. EB methods perform very well on microarray problems because they can robustly process high-dimensional data when the sample size is small. The data processed by the EB method can be used for subsequent computational analysis. The third step is to filter out the set of sites with the smallest variance. The variance of the methylation values of each column (feature, or site) across all measured samples is calculated, all sites are sorted by variance from largest to smallest, and roughly the last third of the sorted sites is discarded. On the one hand, sites with small variance show little difference between normal and cancer samples and therefore cannot guide the subsequent classification; on the other hand, filtering them out reduces the dimensionality of the data and thereby saves computing resources in the subsequent analysis.
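The text does not say which empirical Bayes method is used. One widely used choice for array data is ComBat (Johnson, Li and Rabinovic, 2007); assuming that method purely for illustration, its location-scale model for the value of site g in sample j of batch i can be written as follows. All symbols belong to the ComBat formulation and do not appear in the source.

```latex
% Location-scale batch model (ComBat; assumed here, not named in the source)
Y_{ijg} = \alpha_g + X_j \beta_g + \gamma_{ig} + \delta_{ig}\,\varepsilon_{ijg},
\qquad \varepsilon_{ijg} \sim \mathcal{N}(0, \sigma_g^2)

% Standardize each site, then remove the shrunken batch terms
Z_{ijg} = \frac{Y_{ijg} - \hat{\alpha}_g - X_j \hat{\beta}_g}{\hat{\sigma}_g},
\qquad
Y^{*}_{ijg} = \frac{\hat{\sigma}_g}{\hat{\delta}^{*}_{ig}}
              \left( Z_{ijg} - \hat{\gamma}^{*}_{ig} \right)
              + \hat{\alpha}_g + X_j \hat{\beta}_g
```

Here \alpha_g is the overall level of site g, X_j \beta_g covers known covariates such as the cancer or normal label, and \gamma_{ig} and \delta_{ig} are the additive and multiplicative batch effects. The starred estimates are posterior means obtained by pooling information across sites within a batch, which is what keeps the correction stable when each batch holds only a few samples.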
Step 3, after the above preprocessing is completed, robust differential methylation sites are selected by the integrated feature selection method shown in FIG. 3. The integrated feature selection method achieves robust feature selection from two angles. First, "sample diversity" is introduced: the original data are randomly sampled multiple times in equal proportion to obtain different sample subsets, and a feature selection method is then applied to the sample subsets to obtain different feature site sets. Second, "function diversity" is introduced: different feature selection methods are applied to the same sample subset to obtain different sets of differential methylation sites. Specifically, we combine cross-validation with multiple feature selection methods to extract a robust site set. Borrowing the idea of multi-fold cross-validation, we divide the preprocessed data evenly into m parts while preserving the original proportion of normal and cancer samples, use one part as a test set to evaluate the classification performance of the feature selection result, and use the remaining m-1 parts as the training set that serves as input. The differential site sets are then extracted with a plurality of feature selection methods; here we use the elastic net (ElasticNet) and Relief feature selection algorithms. The former combines L1 and L2 regularization to filter out irrelevant and redundant features; the latter assigns different weights to the features according to their relevance to the class labels and selects the feature sites most relevant to the classification result.
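The proportional split into m parts described above can be sketched as follows. This is a minimal standard-library illustration, not the patented implementation: the function names, the fixed `seed` and the toy label list are assumptions, and the ElasticNet and Relief selectors that would run on each training subset are not shown.

```python
import random

def stratified_folds(labels, m, seed=0):
    """Assign sample indices to m folds, keeping the class ratio roughly equal per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(m)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % m].append(i)  # deal each class out round-robin
    return folds

def training_subsets(labels, m, seed=0):
    """Yield m (train, test) index pairs: one fold held out, m-1 folds for training."""
    folds = stratified_folds(labels, m, seed)
    subsets = []
    for held_out in range(m):
        train = sorted(i for k, fold in enumerate(folds)
                       if k != held_out for i in fold)
        subsets.append((train, sorted(folds[held_out])))
    return subsets

# Toy labels: 6 cancer and 3 normal samples, split into m = 3 parts.
labels = ["cancer"] * 6 + ["normal"] * 3
splits = training_subsets(labels, m=3)
```

Each of the m training subsets would then be passed to both feature selection methods, giving the per-subset selections that are later combined.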
For each sample subset, two feature site subsets are obtained; their union gives the feature subset corresponding to that sample subset; and finally the feature subsets corresponding to the m sample subsets are intersected to obtain the robust differential site set. The principle of the algorithm is shown in FIG. 2.
Step 4, based on the robust differential methylation site set obtained, a classification model is built to predict whether an unknown sample is a cancer sample or a normal sample. Specifically, a logistic regression classifier, a support vector machine and a random forest classifier are trained on the result of the integrated feature selection method to classify the samples. The logistic regression classifier divides the samples by maximizing the likelihood function and mapping its output through the sigmoid function to a probability over the classes {0, 1}. The support vector machine divides the samples by finding the support vectors among them and maximizing the margin between the two classes of samples. The random forest divides the samples step by step through tree structures according to the values of the feature parameters. Because the three classifiers analyze different aspects of the samples when partitioning them, their decisions for the same sample may be inconsistent. The prediction results of the classifiers are therefore integrated by voting. For example, if for a given sample to be evaluated the three classifiers output normal, normal and cancer respectively, the final prediction for that sample under majority voting is normal.
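To make the logistic regression description concrete, the sketch below shows how a linear score is mapped through the sigmoid to a probability, how the log-likelihood that training maximizes is computed, and how a label in {0, 1} is derived. The weights, the 0.5 threshold and the two-site toy samples are assumptions for illustration; no fitting procedure is shown.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability of class 1 (for example, cancer) for one sample x."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def log_likelihood(weights, bias, X, y):
    """Log-likelihood of labels y in {0, 1}; training seeks weights that maximize it."""
    total = 0.0
    for x, label in zip(X, y):
        p = predict_proba(weights, bias, x)
        total += label * math.log(p) + (1 - label) * math.log(1 - p)
    return total

def predict_label(weights, bias, x, threshold=0.5):
    """Turn the probability into a hard 0/1 decision."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

# Two toy samples described by two selected sites each.
X = [[1.0, 0.5], [0.1, 0.9]]
y = [1, 0]
informative = log_likelihood([2.0, -1.0], 0.0, X, y)
uninformative = log_likelihood([0.0, 0.0], 0.0, X, y)
```

Weights that separate the two samples give a higher log-likelihood than all-zero weights, which is the sense in which maximizing the likelihood drives the division of the samples.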
Step 5, after the classifiers have been constructed, the class of an unknown sample to be evaluated is predicted by inputting its methylation data.
Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are encompassed in the scope of the present invention.

Claims (5)

1. A method for classifying cancer methylation data based on integrated feature selection, characterized by comprising the following steps:
S1, inputting a data set of cancer and normal samples measured at methylation sites, wherein each row in the data set represents a tested individual labeled as normal or cancer, and each column represents a feature site;
S2, preprocessing the data and filtering out missing values in the data set;
S3, selecting robust differential methylation sites by an integrated feature selection method;
S4, training a multi-classifier model based on the robust differential methylation sites, and voting on the prediction results of the classifiers to obtain a final classification decision;
S5, outputting the final classification result.
2. The method for classifying cancer methylation data based on integrated feature selection according to claim 1, characterized in that the data preprocessing in S2 comprises the following steps:
S2.1, searching for missing values in the data and, if missing values exist in the original data, filtering out the columns (features) that contain them;
S2.2, correcting the batch effect in the data that contain no missing values;
S2.3, filtering out the set of sites with the smallest variance: the variance of the methylation values of each site across all measured samples is calculated, all sites are sorted by variance from largest to smallest, and roughly the last third of the sorted sites is discarded.
3. The method for classifying cancer methylation data based on integrated feature selection according to claim 2, characterized in that in S2.2 an empirical Bayes (EB) method is adopted to eliminate the influence of the batch effect.
4. The method for classifying cancer methylation data based on integrated feature selection according to claim 1, characterized in that the integrated feature selection method in S3 comprises the following steps:
S3.1, introducing sample diversity: different sample subsets are obtained by repeatedly and randomly sampling the original data in equal proportion, and a feature selection method is then applied to the sample subsets to obtain different feature site sets;
S3.2, introducing function diversity: different differential methylation site sets are obtained by applying different feature selection methods to the same sample subset;
S3.3, extracting the differential site sets with a plurality of feature selection methods, so that each sample subset yields two feature site subsets; taking the union of the two to obtain the feature subset corresponding to each sample subset; and finally taking the intersection of the feature subsets of all sample subsets to obtain the robust differential site set.
5. The method for classifying cancer methylation data based on integrated feature selection according to claim 1, characterized in that obtaining the final classification decision in S4 comprises the following steps:
S4.1, training a logistic regression classifier on the result of the integrated feature selection method; the classifier is fitted by maximizing the likelihood function and maps its output through the sigmoid function to a probability over the classes {0, 1}, thereby dividing the samples;
S4.2, classifying the samples with a support vector machine, which divides the samples by finding the support vectors among them and maximizing the margin between the two classes of samples;
S4.3, classifying the samples with a random forest classifier, which divides the samples step by step through tree structures according to the values of the feature parameters;
S4.4, integrating the prediction results of the three classifiers by voting.
CN202011329335.XA 2020-11-24 2020-11-24 Cancer methylation data classification method based on integrated feature selection Pending CN112382342A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011329335.XA CN112382342A (en) 2020-11-24 2020-11-24 Cancer methylation data classification method based on integrated feature selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011329335.XA CN112382342A (en) 2020-11-24 2020-11-24 Cancer methylation data classification method based on integrated feature selection

Publications (1)

Publication Number Publication Date
CN112382342A true CN112382342A (en) 2021-02-19

Family

ID=74588999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011329335.XA Pending CN112382342A (en) 2020-11-24 2020-11-24 Cancer methylation data classification method based on integrated feature selection

Country Status (1)

Country Link
CN (1) CN112382342A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926640A (en) * 2021-02-22 2021-06-08 齐鲁工业大学 Cancer gene classification method and equipment based on two-stage depth feature selection and storage medium
CN117059165A (en) * 2023-07-27 2023-11-14 上海睿璟生物科技有限公司 Differential methylation region selection and screening method, system, terminal and medium based on ensemble learning

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100279879A1 (en) * 2007-09-17 2010-11-04 Koninklijke Philips Electronics N.V. Method for the analysis of breast cancer disorders
WO2015181330A1 (en) * 2014-05-28 2015-12-03 Syvänen Ann-Christine Method for all cancer category determination by means of methylation profiling
CN105550715A (en) * 2016-01-22 2016-05-04 大连理工大学 Affinity propagation clustering-based integrated classifier constructing method
CN107066781A (en) * 2016-11-03 2017-08-18 西南大学 Analysis method based on the related colorectal cancer data model of h and E
CN107247873A (en) * 2017-03-29 2017-10-13 电子科技大学 A kind of recognition methods of differential methylation site
CN109119167A (en) * 2018-07-11 2019-01-01 山东师范大学 Pyemia anticipated mortality system based on integrated model
CN109686414A (en) * 2018-12-28 2019-04-26 陈洪亮 It is only used for the choosing method of the special DNA methylation assay Sites Combination of Hepatocarcinoma screening
CN109685107A (en) * 2018-11-22 2019-04-26 东软集团股份有限公司 Feature selection approach, system, computer readable storage medium and electronic equipment
CN111094590A (en) * 2017-07-12 2020-05-01 大学健康网络 Cancer detection and classification using methylation component analysis
CN111378754A (en) * 2020-04-23 2020-07-07 嘉兴市第一医院 TCGA (TCGA-based genetic algorithm) database-based breast cancer methylation biomarker and screening method thereof
CN111461354A (en) * 2019-12-24 2020-07-28 武汉大学 Machine learning integration classification method and software system for high-dimensional data
CN111863250A (en) * 2020-08-14 2020-10-30 中国科学院大学温州研究院(温州生物材料与工程研究所) Combined diagnosis model and system for early breast cancer

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100279879A1 (en) * 2007-09-17 2010-11-04 Koninklijke Philips Electronics N.V. Method for the analysis of breast cancer disorders
WO2015181330A1 (en) * 2014-05-28 2015-12-03 Syvänen Ann-Christine Method for all cancer category determination by means of methylation profiling
CN105550715A (en) * 2016-01-22 2016-05-04 大连理工大学 Affinity propagation clustering-based integrated classifier constructing method
CN107066781A (en) * 2016-11-03 2017-08-18 西南大学 Analysis method based on the related colorectal cancer data model of h and E
CN107247873A (en) * 2017-03-29 2017-10-13 电子科技大学 A kind of recognition methods of differential methylation site
CN111094590A (en) * 2017-07-12 2020-05-01 大学健康网络 Cancer detection and classification using methylation component analysis
CN109119167A (en) * 2018-07-11 2019-01-01 山东师范大学 Pyemia anticipated mortality system based on integrated model
CN109685107A (en) * 2018-11-22 2019-04-26 东软集团股份有限公司 Feature selection approach, system, computer readable storage medium and electronic equipment
CN109686414A (en) * 2018-12-28 2019-04-26 陈洪亮 It is only used for the choosing method of the special DNA methylation assay Sites Combination of Hepatocarcinoma screening
CN111461354A (en) * 2019-12-24 2020-07-28 武汉大学 Machine learning integration classification method and software system for high-dimensional data
CN111378754A (en) * 2020-04-23 2020-07-07 嘉兴市第一医院 TCGA (TCGA-based genetic algorithm) database-based breast cancer methylation biomarker and screening method thereof
CN111863250A (en) * 2020-08-14 2020-10-30 中国科学院大学温州研究院(温州生物材料与工程研究所) Combined diagnosis model and system for early breast cancer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KANG LENG CHIEW等: "A new hybrid ensemble feature selection framework for machine learning-based phishing detection system", 《INFORMATION SCIENCES》 *
刘超: "基于DNA甲基化不平衡数据的胃癌分类模型研究", 《中国优秀硕士学位论文全文数据库 医药卫生科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926640A (en) * 2021-02-22 2021-06-08 齐鲁工业大学 Cancer gene classification method and equipment based on two-stage depth feature selection and storage medium
CN112926640B (en) * 2021-02-22 2023-02-28 齐鲁工业大学 Cancer gene classification method and equipment based on two-stage depth feature selection and storage medium
CN117059165A (en) * 2023-07-27 2023-11-14 上海睿璟生物科技有限公司 Differential methylation region selection and screening method, system, terminal and medium based on ensemble learning

Similar Documents

Publication Publication Date Title
CN109801680B (en) Tumor metastasis and recurrence prediction method and system based on TCGA database
US20020095260A1 (en) Methods for efficiently mining broad data sets for biological markers
US20110246409A1 (en) Data set dimensionality reduction processes and machines
Bhargava et al. DNA barcoding in plants: evolution and applications of in silico approaches and resources
Shukla et al. Identification of potential biomarkers on microarray data using distributed gene selection approach
CN111710364B (en) Method, device, terminal and storage medium for acquiring flora marker
CN112382342A (en) Cancer methylation data classification method based on integrated feature selection
Benso et al. A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory
Shaker et al. Information retrieval for cancer cell detection based on advanced machine learning techniques
Pyingkodi et al. Informative gene selection for cancer classification with microarray data using a metaheuristic framework
CN110246544B (en) Biomarker selection method and system based on integration analysis
US7272583B2 (en) Using supervised classifiers with unsupervised data
Grabski et al. Bayesian combinatorial MultiStudy factor analysis
CN111863135B (en) False positive structure variation filtering method, storage medium and computing device
CN116864011A (en) Colorectal cancer molecular marker identification method and system based on multiple sets of chemical data
US8140456B2 (en) Method and system of extracting factors using generalized Fisher ratios
CN111105041A (en) Machine learning method and device for intelligent data collision
Bawankar et al. Implementation of ensemble method on dna data using various cross validation techniques
CN110502669A (en) The unsupervised chart dendrography learning method of lightweight and device based on the side N DFS subgraph
Khalilabad et al. Fully automatic classification of breast cancer microarray images
Yoon et al. Direct integration of microarrays for selecting informative genes and phenotype classification
CN105095689A (en) Data mining method of electronic noses based on Wayne prediction
Mohanty et al. Cancer tumor detection using genetic mutated data and machine learning models
CN111383717A (en) Method and system for constructing biological information analysis reference data set
Fouodo et al. Effect of hyperparameters on variable selection in random forests

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210219