CN115662504A - Multi-angle fusion-based biological omics data analysis method - Google Patents
Multi-angle fusion-based biological omics data analysis method
- Publication number
- CN115662504A (application CN202211361898.6A)
- Authority
- CN
- China
- Prior art keywords
- feature
- scaled
- spearman
- subset
- correlation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A multi-angle fusion-based biological omics data analysis method systematically analyzes, from multiple angles, the association between diseases and genomics, metabolomics and other omics data, and constructs several feature subspaces rich in biological information. To counter the effect that the small sample size and high dimensionality of biological omics data have on the effectiveness of analysis methods, and in view of the diverse relationships among the components of a living organism, three feature selection methods working from different angles are used to construct three representative feature subspaces rich in biological information, and a fusion classification model is built on these feature subspaces to analyze the data. Results on several public data sets from different omics show that the multi-angle fusion method yields valid analysis results and better classification performance. The method provides a practical and effective means of analysis for research on genomics, metabolomics, proteomics and other biological omics data, and has high application value.
Description
Technical Field
The invention belongs to the technical field of biological omics data analysis, and relates to a biological omics data analysis method based on multi-angle fusion.
Background
With the rapid development of science and technology and the continuous progress of omics technology, a great amount of biological omics data is continuously emerging. Common omics data include genomic, transcriptomic, proteomic and metabolomic data. Genomics is the collective, quantitative study of all genes in an organism and the comparison of differences between genes, and is currently among the most mature fields of biology. It focuses on entire genomes rather than on the single gene or few genes of interest studied in traditional genetics, and it provides a reliable basis for deciphering genetic information and for studying complex diseases and specific genetic variations. Genes are expressed as proteins, the functional embodiment of life, through transcription, translation and related processes, and proteins are closely involved in the biochemical reactions within cells. Proteomics has therefore attracted a great deal of attention from researchers after genomics. Proteomics studies protein expression levels, post-translational modifications and protein interactions. Proteins in the human body change dynamically and are inherently complex, so analyzing the information contained in proteomic data is crucial to understanding the processes of life. However, genomics and proteomics alone are not sufficient to decode human life: the same genotype may show different characteristics, because characteristics result from both genetic and environmental factors. The occurrence of a disease, for example, may be related to a mutation in a gene, or to an error in the transcription, translation or other processing of a gene. The roles of the other biological omics in the human body therefore cannot be ignored.
Transcriptomics studies genome-wide transcription and the rules of transcriptional regulation; epigenomics studies, as a whole, the modifications of genomic DNA and of DNA-binding proteins; metabolomics quantitatively analyzes all metabolites in an organism (for example amino acids, fatty acids and carbohydrates) and relates them to the corresponding diseases. Biological omics data are therefore of great significance for understanding the phenomena of life, analyzing organisms, finding features rich in biological information, and exploring the occurrence and development of diseases.
However, most biological omics data share a serious problem: high dimensionality, high noise and a small number of samples. These characteristics limit researchers when analyzing and mining the data, so an effective way of analyzing and mining data with these characteristics is of great biological significance for disease research, medical treatment and related directions.
Taking multi-angle fusion as its direction and starting from the determination of feature subspaces of omics data, the invention uses three different feature selection methods, namely ERGS, mRMR and a feature selection method based on a Spearman difference correlation network, to screen features rich in biological information from three different angles and to determine feature subspaces reflecting different physiological and pathological states of an organism. A fusion classifier is then established on the three feature subspaces, so that the biological omics data are effectively analyzed and mined. In this way an effective analysis model is constructed, feature subspaces with discriminative ability are screened out of the original data set, and good classification performance is obtained.
Disclosure of Invention
Biological omics data are high-dimensional, have small sample sizes and much noise, and contain complex and varied relationships among features. In view of these characteristics, the invention aims to mine feature subspaces rich in biological information by using three different feature selection methods from multiple angles, and thereby to analyze the data effectively. The model is suitable for the analysis and study of biological omics data, can mine important information in omics data from different angles, and can be used in fields such as omics data analysis and precision medicine. The core of the method is the determination of different feature subspaces based on multi-angle fusion.
In order to achieve the above object, the technical scheme adopted by the invention is as follows:
a multi-angle fusion-based method for analyzing biological omics data comprises the following steps:
step one, data preprocessing
Preprocessing of the data set consists of two parts. The first part handles missing values: features whose number of missing values within a class of samples exceeds eighty percent of the total number of samples of that class are deleted, and the missing values of the remaining features are filled with the mean value of that feature over the samples of the same class. Let $F = \{f_1, f_2, \ldots, f_m\}$ be the feature set of the input data, where $m$ is the number of features; let $Y = \{y_j;\ j = 1, 2\}$ be the set of class labels; and let $S = \{s_1, s_2, \ldots, s_n\}$ be the sample set, where $n$ is the number of samples.
The second part standardizes the data using the Z-Score method, calculated as in formula (1):
$$f^{scaled}_{ik} = \frac{f_{ik} - u_i}{\sigma_i} \tag{1}$$
where $f^{scaled}_{ik}$ is the value of feature $f_i$ on sample $s_k$ after Z-Score standardization, $f_{ik}$ is the original value of feature $f_i$ on sample $s_k$, $u_i$ is the mean of feature $f_i$ over all samples, and $\sigma_i$ is the standard deviation of feature $f_i$ over all samples. This yields the standardized feature set $F = \{f^{scaled}_1, f^{scaled}_2, \ldots, f^{scaled}_m\}$.
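The preprocessing in step one can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `preprocess`, the use of `None` for a missing value, the choice to drop a feature as soon as any one class exceeds the eighty percent threshold, and the use of the population standard deviation are all assumptions made for the example.

```python
import math

def preprocess(X, y, max_missing=0.8):
    """X: list of samples, each a list of feature values (None marks a missing value).
    y: class label of each sample. Returns (kept feature indices, scaled matrix)."""
    n, m = len(X), len(X[0])
    rows_by_class = {c: [k for k in range(n) if y[k] == c] for c in sorted(set(y))}

    def too_sparse(i):
        # Assumption: drop the feature if ANY class has more than 80% missing values.
        return any(
            sum(X[k][i] is None for k in rows) > max_missing * len(rows)
            for rows in rows_by_class.values()
        )

    kept = [i for i in range(m) if not too_sparse(i)]

    # Fill each remaining gap with the mean of the same feature within the same class.
    filled = [[None] * len(kept) for _ in range(n)]
    for col, i in enumerate(kept):
        for rows in rows_by_class.values():
            present = [X[k][i] for k in rows if X[k][i] is not None]
            class_mean = sum(present) / len(present)
            for k in rows:
                filled[k][col] = class_mean if X[k][i] is None else X[k][i]

    # Z-Score: subtract the feature mean and divide by the feature standard deviation.
    scaled = [[0.0] * len(kept) for _ in range(n)]
    for col in range(len(kept)):
        values = [filled[k][col] for k in range(n)]
        mu = sum(values) / n
        sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)  # population std (assumed)
        for k in range(n):
            scaled[k][col] = (values[k] - mu) / sigma if sigma > 0 else 0.0
    return kept, scaled
```

Constant features (zero standard deviation) are mapped to 0.0 here to avoid division by zero; the text does not say how such features are treated.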
Step two, determining a feature subspace from multiple angles
Determining three different feature subspaces from multiple angles by using an ERGS feature selection method, an mRMR feature selection method and a feature selection method based on a Spearman difference correlation network;
The first feature subspace (Subset-ERGS) is determined using the ERGS feature selection method, whose formulas are as follows:
$$R_{ij} = [r^-_{ij},\ r^+_{ij}] = [u_{ij} - (1 - p_j) \cdot 1.732\,\sigma_{ij},\ \ u_{ij} + (1 - p_j) \cdot 1.732\,\sigma_{ij}] \tag{2}$$
$$w_i = 1 - AC_i / \max\{AC_u : u = 1, 2, \ldots, m\} \tag{3}$$
$$AC_i = \frac{OA_i}{\max_j r^+_{ij} - \min_j r^-_{ij}} \tag{4}$$
In formula (2):
$R_{ij}$ is the effective range of feature $f^{scaled}_i$ in class $y_j$ ($j = 1, 2$);
$r^+_{ij}$ and $r^-_{ij}$ are the upper and lower bounds of the effective range $R_{ij}$;
$u_{ij}$ is the mean of feature $f^{scaled}_i$ in class $y_j$;
$\sigma_{ij}$ is the standard deviation of feature $f^{scaled}_i$ in class $y_j$;
$p_j$ is the prior probability of class $y_j$;
the coefficient 1.732 is determined by the Chebyshev inequality and ensures that the effective range contains at least 2/3 of the samples;
In formula (3):
$w_i$ is the weight of feature $f^{scaled}_i$;
In formula (4):
$OA_i$ is the overlapping area of the effective ranges of feature $f^{scaled}_i$ across the different classes;
$AC_i$ is the intermediate value used to compute $w_i$ and represents the overlapping-area ratio of the effective ranges;
finally, the ERGS method selects the features with high weight $w_i$.
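The ERGS weighting described above can be sketched for the two-class case as follows. The sketch is illustrative only: the names `effective_range` and `ergs_weights` are hypothetical, the overlap is taken as the length of the intersection of the two effective ranges, the population standard deviation is used, and a feature whose two ranges collapse to a single point is treated as fully overlapping. The constant 1.732 is about $\sqrt{3}$, for which Chebyshev's inequality bounds the mass outside $\pm 1.732\sigma$ by $1/3$.

```python
import math

GAMMA = 1.732  # about sqrt(3); Chebyshev gives at least 1 - 1/GAMMA**2 = 2/3 inside

def effective_range(values, prior):
    """Effective range of one feature within one class: mean -/+ (1 - prior) * GAMMA * std."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    half = (1 - prior) * GAMMA * sigma
    return mu - half, mu + half

def ergs_weights(X, y):
    """X: scaled samples (rows) by features (columns); y: binary class labels.
    Returns one weight per feature; higher means less overlap between classes."""
    n, m = len(X), len(X[0])
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    area_coefficients = []
    for i in range(m):
        bounds = []
        for c in classes:
            values = [X[k][i] for k in range(n) if y[k] == c]
            bounds.append(effective_range(values, len(values) / n))
        (lo_a, hi_a), (lo_b, hi_b) = bounds
        overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))  # overlapping area
        span = max(hi_a, hi_b) - min(lo_a, lo_b)
        # Assumption: ranges that collapse to one point count as fully overlapping.
        area_coefficients.append(overlap / span if span > 0 else 1.0)
    largest = max(area_coefficients)
    if largest == 0:
        return [1.0] * m  # no feature overlaps at all
    return [1 - ac / largest for ac in area_coefficients]
```

A feature whose class ranges do not overlap receives weight 1, and the feature with the largest overlap ratio receives weight 0.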
The second feature subspace (Subset-mRMR) is determined using the mRMR feature selection method, whose score is given by formula (5).
In formula (5):
$w_i$ is the final score of feature $f^{scaled}_i$ calculated by the mRMR method;
$I(f^{scaled}_i; y_j)$ is the mutual information between feature $f^{scaled}_i$ and class label $y_j$;
$I(f^{scaled}_i; x)$ is the mutual information between feature $f^{scaled}_i$ and an already selected feature $x$, where $X$ denotes the set of selected features;
the mRMR feature selection method selects features according to the final score $w$, choosing the features with high scores.
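A greedy mRMR selection consistent with the symbols defined above might look like the sketch below. It assumes the common difference form of the criterion (relevance to the class label minus mean mutual information with the already selected features); the quotient form is an equally plausible reading. The equal-width binning used to estimate mutual information and the names `discretize`, `mutual_info` and `mrmr` are likewise assumptions for illustration.

```python
import math
from collections import Counter

def discretize(values, bins=3):
    """Equal-width binning so that mutual information can be estimated by counting."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_info(a, b):
    """Plug-in estimate of I(a; b) in nats for two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[x] / n) * (pb[z] / n)))
        for (x, z), c in pab.items()
    )

def mrmr(X, y, k, bins=3):
    """Greedily pick k feature indices maximizing relevance minus mean redundancy."""
    n, m = len(X), len(X[0])
    cols = [discretize([X[r][i] for r in range(n)], bins) for i in range(m)]
    relevance = [mutual_info(col, y) for col in cols]
    selected = []
    while len(selected) < min(k, m):
        best, best_score = None, -math.inf
        for i in range(m):
            if i in selected:
                continue
            redundancy = (
                sum(mutual_info(cols[i], cols[j]) for j in selected) / len(selected)
                if selected else 0.0
            )
            score = relevance[i] - redundancy  # assumed difference form of the score
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

With a duplicated feature in the input, the redundancy term steers the second pick away from the duplicate and towards a less relevant but complementary feature.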
The third feature subspace (Subset-Spearman) is determined using a feature selection method based on a Spearman difference correlation network: the Spearman correlation coefficients among features are calculated, and a difference correlation network is established for feature screening, which completes the determination of the feature subspace. The specific procedure is as follows:
First, according to the set of class labels $Y = \{y_j;\ j = 1, 2\}$, the whole sample set is divided into two classes, and the correlation between features is calculated on each of the two classes separately using the Spearman correlation formula:
$$q_{ij} = \frac{\sum_k (f^{scaled}_{ik} - u_i)(f^{scaled}_{jk} - u_j)}{\sqrt{\sum_k (f^{scaled}_{ik} - u_i)^2 \, \sum_k (f^{scaled}_{jk} - u_j)^2}} \tag{6}$$
In formula (6):
$q_{ij}$ is the resulting Spearman correlation score between features $f^{scaled}_i$ and $f^{scaled}_j$, $u_i$ and $u_j$ are the means of features $f^{scaled}_i$ and $f^{scaled}_j$ over all samples, and $f^{scaled}_{ik}$ and $f^{scaled}_{jk}$ are the values of features $f^{scaled}_i$ and $f^{scaled}_j$ on sample $s_k$. The larger the final value of $q_{ij}$, the higher the Spearman correlation between the two features $f^{scaled}_i$ and $f^{scaled}_j$;
the difference between the Spearman correlations obtained on the two classes is then calculated according to formula (7),
where $o_{ij}$ is the final Spearman correlation difference score of the two features $f^{scaled}_i$ and $f^{scaled}_j$ over the two classes, and $q^+_{ij}$ and $q^-_{ij}$ are the Spearman correlation scores of the two features on the two different classes. After the Spearman correlation difference $o_{ij}$ has been calculated, the final difference correlation network is constructed. In the difference correlation network, each feature in the feature set $F = \{f^{scaled}_1, f^{scaled}_2, \ldots, f^{scaled}_m\}$ is defined as a node, and the $o_{ij}$ obtained from formula (7) is the weight of the edge between two nodes. The edge weights and node weights of the constructed difference network are netEdge_Weight and netNode_Weight respectively:
$$netEdge\_Weight_{ij} = o_{ij} \tag{8}$$
$$netNode\_Weight_i = \sum_{j \neq i} netEdge\_Weight_{ij} \tag{9}$$
where $netNode\_Weight_i$ is the final weight of the network node corresponding to feature $f^{scaled}_i$, obtained by summing the weights of all edges connected to that node. The final difference correlation network netS is thus constructed; each node of netS is evaluated by its final weight netNode_Weight, the nodes with high weight scores are selected, and the feature subspace Subset-Spearman rich in biological information is constructed;
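The difference correlation network can be sketched as below. Two points are assumptions rather than statements of the method: the edge weight is taken as the absolute difference of the two per-class Spearman coefficients, and the coefficient is computed as a Pearson-type correlation on average ranks. The names `ranks`, `spearman`, `node_weights` and `top_features` are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            result[order[t]] = average
        pos = end + 1
    return result

def spearman(a, b):
    """Spearman coefficient: Pearson-type correlation computed on ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    num = sum((x - ma) * (z - mb) for x, z in zip(ra, rb))
    den = math.sqrt(sum((x - ma) ** 2 for x in ra) * sum((z - mb) ** 2 for z in rb))
    return num / den if den > 0 else 0.0

def node_weights(X, y):
    """Node weight = sum of edge weights, where each edge weight is the
    (assumed absolute) difference between the two per-class Spearman scores."""
    n, m = len(X), len(X[0])
    neg, pos = sorted(set(y))

    def column(i, label):
        return [X[k][i] for k in range(n) if y[k] == label]

    weights = [0.0] * m
    for i in range(m):
        for j in range(i + 1, m):
            edge = abs(
                spearman(column(i, pos), column(j, pos))
                - spearman(column(i, neg), column(j, neg))
            )
            weights[i] += edge
            weights[j] += edge
    return weights

def top_features(weights, k):
    """Indices of the k nodes with the highest weight."""
    return sorted(range(len(weights)), key=lambda i: -weights[i])[:k]
```

The pairwise loop is quadratic in the number of features, which is manageable for a few hundred features such as the 556 in the embodiment.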
The three feature selection methods thus build the Subset-ERGS, Subset-mRMR and Subset-Spearman feature subspaces from three angles: scoring each feature individually; weighing the correlation between features and class labels against the redundancy between feature pairs; and establishing a difference correlation network that accounts for the synergy among all feature variables;
step three, establishing a fusion classifier on different feature subspaces
On the obtained three feature subspaces, establishing a fusion classifier by using a Support Vector Machine (SVM) method and a Deep Neural Network (DNN) to classify the data;
The feature subspaces obtained by the feature selection methods of the three different angles are Subset-ERGS, Subset-mRMR and Subset-Spearman. Because the three feature subspaces carry different biological information from different angles, the information richness of the selected feature subspaces is ensured. Subset-ERGS is the feature subspace obtained by ranking features by their individual scores; Subset-mRMR is the feature subspace established by jointly considering the redundancy between feature pairs and the correlation between features and class labels; and Subset-Spearman is the feature subspace obtained by constructing an overall difference correlation network from the Spearman correlation coefficients, which accounts for the synergy among all features;
The Support Vector Machine (SVM) and Deep Neural Network (DNN) classification methods are applied to the three feature subspaces Subset-ERGS, Subset-mRMR and Subset-Spearman to establish a classifier on each, the classification results are integrated by Majority Voting to establish an overall fusion classifier, and the complete data analysis is carried out to obtain the final classification result.
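The fusion step can be sketched as a majority vote over the per-subspace predictions. The sketch assumes one trained classifier per feature subspace (three voters of the same type, so a binary vote cannot tie) and represents each classifier as a plain Python callable; `majority_vote` and `fuse` are hypothetical names, and training of the SVM or DNN models is outside the scope of the example.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of predicted labels per classifier, all of equal length.
    Returns the most frequent label for each sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def fuse(classifiers, subsets, samples):
    """Apply each classifier to its own feature subspace, then vote.
    classifiers: callables mapping a feature vector to a label (stand-ins for
                 trained SVM or DNN models).
    subsets:     for each classifier, the feature indices of its subspace.
    samples:     full-length feature vectors."""
    per_classifier = [
        [clf([sample[i] for i in subset]) for sample in samples]
        for clf, subset in zip(classifiers, subsets)
    ]
    return majority_vote(per_classifier)
```

For instance, `majority_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]])` gives `[1, 1, 1, 0]`.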
Taking into account gene regulation, metabolic reactions, protein synthesis and other processes in organisms, the invention systematically analyzes, from multiple angles, the association between diseases and genomics, metabolomics and other omics data, constructs several feature subspaces rich in biological information, and ensures the information richness of the selected feature subspaces. To counter the effect that the small sample size and high dimensionality of biological omics data have on the effectiveness of analysis methods, and in view of the diverse relationships among the components of a living organism, the invention uses feature selection methods of three different angles to construct three representative feature subspaces rich in biological information, and establishes a fusion classification model on these feature subspaces to analyze the data. Results on several public data sets from different omics show that, compared with other commonly used data analysis methods, the multi-angle fusion method provided by the invention gives valid analysis results and better classification performance. Theoretical and experimental analysis indicates that the invention provides a practical and effective means of analysis for research on genomics, metabolomics, proteomics and other biological omics data, and has strong application value.
Drawings
Fig. 1 is the overall architecture diagram of the integrated data analysis model established by the invention.
Fig. 2 is the network structure diagram of the DNN classifier used in the invention.
Fig. 3 is a PCA plot of the training set of the human gastric cancer miRNA data set after the feature subspace is screened with the ERGS feature selection method.
Fig. 4 is a PCA plot of the training set of the human gastric cancer miRNA data set after the feature subspace is screened with the mRMR feature selection method.
Fig. 5 is a PCA plot of the training set of the human gastric cancer miRNA data set after the feature subspace is screened with the Spearman-based difference correlation network feature selection method.
Detailed Description
The following further describes the specific embodiments of the present invention in conjunction with the technical solutions. The miRNA dataset of human gastric cancer is taken as an example to briefly explain the execution process.
The public omics data set used in this example is a human gastric cancer miRNA data set. After the samples are processed with the relevant bioanalytical techniques, the data set contains 44 samples in total, 22 diseased and 22 non-diseased, and 556 features, which fully matches the small sample size and high dimensionality of the omics data that the invention targets.
Step one, preprocessing a human gastric cancer miRNA data set. Before specific data analysis, missing value filling and Z-Score standardization data preprocessing steps are carried out on the data set, and finally standardized data capable of being further analyzed are obtained.
Step two, determining a feature subspace from multiple angles
In this embodiment there are 44 samples and 556 features. To remove invalid features that give little guidance about the disease and to screen out effective features, the three feature selection methods of different angles, namely ERGS, mRMR and the Spearman-based difference correlation network method, are applied to the data set. The number of features in each feature subspace is uniformly set to 100, so the three methods yield the feature subspaces Subset-ERGS, Subset-mRMR and Subset-Spearman, each containing 100 features. Figs. 3 to 5 show the PCA plots of the two classes of samples in the training set of the human gastric cancer miRNA data set after the corresponding feature subspace is determined by each of the three methods. In all three plots the two classes of samples show a relatively clear separation trend, which indicates that the three feature subspaces determined from three angles have relatively strong discriminative ability.
Step three, establishing a fusion classifier on the three feature subspaces
On each of the three feature subspaces Subset-ERGS, Subset-mRMR and Subset-Spearman constructed on the data set, classifiers are established with SVM and DNN, and the final classification results are integrated by majority voting. The SVM classifier uses a linear kernel function. For the DNN, grid search is used to tune several parameters, including the network structure, learning rate, activation function, number of training epochs and batch size. Fifty repetitions of five-fold cross-validation are used to evaluate the classification performance indices: accuracy (AUC), specificity (SPE) and sensitivity (SEN). Fig. 1 is the complete structure diagram of the integrated classification model established by the invention, and Fig. 2 shows the network architecture of the DNN classifier after optimization.
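The evaluation protocol can be sketched as follows: fifty reshuffled five-fold splits, with accuracy, sensitivity and specificity computed from the confusion counts. The splits here are not stratified by class and the random seed is arbitrary; both are simplifications for the example, and `repeated_kfold` and `binary_metrics` are hypothetical helper names.

```python
import random

def repeated_kfold(n_samples, n_folds=5, n_repeats=50, seed=0):
    """Yield (train_indices, test_indices) for n_repeats reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test = order[fold::n_folds]  # every n_folds-th shuffled index
            test_set = set(test)
            train = [i for i in order if i not in test_set]
            yield train, test

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

For 44 samples this produces 250 train/test splits with test folds of 8 or 9 samples. In a leakage-free evaluation, feature selection would be refitted on the training part of every split before the classifiers are trained.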
The following tables compare the classification performance of the methods of the invention (EMS-SVM and EMS-DNN) with other data analysis methods commonly used for biological omics data, namely SVM-RFE, RF and XGBoost, under fifty repetitions of five-fold cross-validation on ten public data sets; bold font marks the best performance obtained on each data set. The results show that the method of the invention achieves considerably higher classification performance than the other techniques on the AUC, SPE and SEN indices alike, which demonstrates its effectiveness.
Table 1. Accuracy comparison of EMS-DNN and EMS-SVM with other effective methods
Table 2. Sensitivity comparison of EMS-DNN and EMS-SVM with other effective methods
Table 3. Specificity comparison of EMS-DNN and EMS-SVM with other effective methods
Claims (1)
1. A multi-angle fusion-based method for analyzing biological omics data comprises the following steps:
step one, data preprocessing
Preprocessing of the data set consists of two parts, the first of which handles missing values: features whose number of missing values within a class of samples exceeds eighty percent of the total number of samples of that class are deleted, and the missing values of the remaining features are filled with the mean value of that feature over the samples of the same class; let $F = \{f_1, f_2, \ldots, f_m\}$ be the feature set of the input data, where $m$ is the number of features; let $Y = \{y_j;\ j = 1, 2\}$ be the set of class labels; let $S = \{s_1, s_2, \ldots, s_n\}$ be the sample set, where $n$ is the number of samples;
the second part standardizes the data using the Z-Score method, calculated as in formula (1):
$$f^{scaled}_{ik} = \frac{f_{ik} - u_i}{\sigma_i} \tag{1}$$
where $f^{scaled}_{ik}$ is the value of feature $f_i$ on sample $s_k$ after Z-Score standardization, $f_{ik}$ is the original value of feature $f_i$ on sample $s_k$, $u_i$ is the mean of feature $f_i$ over all samples, and $\sigma_i$ is the standard deviation of feature $f_i$ over all samples; this yields the standardized feature set $F = \{f^{scaled}_1, f^{scaled}_2, \ldots, f^{scaled}_m\}$;
Step two, determining the feature subspace from multiple angles
Determining three different feature subspaces from multiple angles by using an ERGS feature selection method, an mRMR feature selection method and a feature selection method based on a Spearman difference correlation network;
the first feature subspace (Subset-ERGS) is determined using the ERGS feature selection method, whose formulas are as follows:
$$R_{ij} = [r^-_{ij},\ r^+_{ij}] = [u_{ij} - (1 - p_j) \cdot 1.732\,\sigma_{ij},\ \ u_{ij} + (1 - p_j) \cdot 1.732\,\sigma_{ij}] \tag{2}$$
$$w_i = 1 - AC_i / \max\{AC_u : u = 1, 2, \ldots, m\} \tag{3}$$
$$AC_i = \frac{OA_i}{\max_j r^+_{ij} - \min_j r^-_{ij}} \tag{4}$$
in formula (2):
$R_{ij}$ is the effective range of feature $f^{scaled}_i$ in class $y_j$ ($j = 1, 2$);
$r^+_{ij}$ and $r^-_{ij}$ are the upper and lower bounds of the effective range $R_{ij}$;
$u_{ij}$ is the mean of feature $f^{scaled}_i$ in class $y_j$;
$\sigma_{ij}$ is the standard deviation of feature $f^{scaled}_i$ in class $y_j$;
$p_j$ is the prior probability of class $y_j$;
the coefficient 1.732 is determined by the Chebyshev inequality and ensures that the effective range contains at least 2/3 of the samples;
in formula (3):
$w_i$ is the weight of feature $f^{scaled}_i$;
in formula (4):
$OA_i$ is the overlapping area of the effective ranges of feature $f^{scaled}_i$ across the different classes;
$AC_i$ is the intermediate value used to compute $w_i$ and represents the overlapping-area ratio of the effective ranges;
finally, the ERGS method selects the features with high weight $w_i$;
the second feature subspace (Subset-mRMR) is determined using the mRMR feature selection method, whose score is given by formula (5);
in formula (5):
$w_i$ is the final score of feature $f^{scaled}_i$ calculated by the mRMR method;
$I(f^{scaled}_i; y_j)$ is the mutual information between feature $f^{scaled}_i$ and class label $y_j$;
$I(f^{scaled}_i; x)$ is the mutual information between feature $f^{scaled}_i$ and an already selected feature $x$, where $X$ denotes the set of selected features;
the mRMR feature selection method selects features according to the final score $w$, choosing the features with high scores;
the third feature subspace (Subset-Spearman) is determined using a feature selection method based on a Spearman difference correlation network: the Spearman correlation coefficients among features are calculated, and a difference correlation network is established for feature screening, which completes the determination of the feature subspace; the specific procedure is as follows:
first, according to the set of class labels $Y = \{y_j;\ j = 1, 2\}$, the whole sample set is divided into two classes, and the correlation between features is calculated on each of the two classes separately using the Spearman correlation formula:
$$q_{ij} = \frac{\sum_k (f^{scaled}_{ik} - u_i)(f^{scaled}_{jk} - u_j)}{\sqrt{\sum_k (f^{scaled}_{ik} - u_i)^2 \, \sum_k (f^{scaled}_{jk} - u_j)^2}} \tag{6}$$
in formula (6):
$q_{ij}$ is the resulting Spearman correlation score between features $f^{scaled}_i$ and $f^{scaled}_j$, $u_i$ and $u_j$ are the means of features $f^{scaled}_i$ and $f^{scaled}_j$ over all samples, and $f^{scaled}_{ik}$ and $f^{scaled}_{jk}$ are the values of features $f^{scaled}_i$ and $f^{scaled}_j$ on sample $s_k$; the larger the final value of $q_{ij}$, the higher the Spearman correlation between the two features $f^{scaled}_i$ and $f^{scaled}_j$;
the difference between the Spearman correlations obtained on the two classes is then calculated according to formula (7),
where $o_{ij}$ is the final Spearman correlation difference score of the two features $f^{scaled}_i$ and $f^{scaled}_j$ over the two classes, and $q^+_{ij}$ and $q^-_{ij}$ are the Spearman correlation scores of the two features on the two different classes; after the Spearman correlation difference $o_{ij}$ has been calculated, the final difference correlation network is constructed; in the difference correlation network, each feature in the feature set $F = \{f^{scaled}_1, f^{scaled}_2, \ldots, f^{scaled}_m\}$ is defined as a node, and the $o_{ij}$ obtained from formula (7) is the weight of the edge between two nodes; the edge weights and node weights of the constructed difference network are netEdge_Weight and netNode_Weight respectively:
$$netEdge\_Weight_{ij} = o_{ij} \tag{8}$$
$$netNode\_Weight_i = \sum_{j \neq i} netEdge\_Weight_{ij} \tag{9}$$
where $netNode\_Weight_i$ is the final weight of the network node corresponding to feature $f^{scaled}_i$, obtained by summing the weights of all edges connected to that node; the final difference correlation network netS is thus constructed, each node of netS is evaluated by its final weight netNode_Weight, the nodes with high weight scores are selected, and the feature subspace Subset-Spearman rich in biological information is constructed;
the three feature selection methods thus build the Subset-ERGS, Subset-mRMR and Subset-Spearman feature subspaces from three angles: scoring each feature individually; weighing the correlation between features and class labels against the redundancy between feature pairs; and establishing a difference correlation network that accounts for the synergy among all feature variables;
step three, establishing a fusion classifier on different feature subspaces
On the obtained three feature subspaces, establishing a fusion classifier by using a Support Vector Machine (SVM) method and a Deep Neural Network (DNN) to classify the data;
the feature subspaces obtained by the feature selection methods of the three different angles are Subset-ERGS, Subset-mRMR and Subset-Spearman; because the three feature subspaces carry different biological information from different angles, the information richness of the selected feature subspaces is ensured; Subset-ERGS is the feature subspace obtained by ranking features by their individual scores, Subset-mRMR is the feature subspace established by jointly considering the redundancy between feature pairs and the correlation between features and class labels, and Subset-Spearman is the feature subspace obtained by constructing an overall difference correlation network from the Spearman correlation coefficients, which accounts for the synergy among all features;
the Support Vector Machine (SVM) and Deep Neural Network (DNN) classification methods are applied to the three feature subspaces Subset-ERGS, Subset-mRMR and Subset-Spearman to establish a classifier on each, the classification results are integrated by Majority Voting to establish an overall fusion classifier, and the complete data analysis is carried out to obtain the final classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211361898.6A CN115662504A (en) | 2022-11-02 | 2022-11-02 | Multi-angle fusion-based biological omics data analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211361898.6A CN115662504A (en) | 2022-11-02 | 2022-11-02 | Multi-angle fusion-based biological omics data analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115662504A true CN115662504A (en) | 2023-01-31 |
Family
ID=84995424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211361898.6A Pending CN115662504A (en) | 2022-11-02 | 2022-11-02 | Multi-angle fusion-based biological omics data analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115662504A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117218433A (en) * | 2023-09-13 | 2023-12-12 | 珠海圣美生物诊断技术有限公司 | Household multi-cancer detection device and multi-mode fusion model construction method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||