CN114927162A - Multi-omics associated phenotype prediction method based on hypergraph representation and Dirichlet distribution - Google Patents
- Publication number
- CN114927162A (application CN202210544114.7A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- hypergraph
- omics
- data
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a multi-omics associated phenotype prediction method based on hypergraph representation and Dirichlet distribution, comprising four modules. The omics data preprocessing module cleans the raw omics data and pre-screens features, removing noise, errors, and redundant features that may degrade association-mining performance. The omics data hypergraph representation module computes cosine similarity within each omics dataset and constructs a hypergraph incidence matrix from it. The feature extraction module builds a hypergraph convolutional neural network to extract features from each omics dataset. The multi-omics integration prediction module constructs Dirichlet distribution parameters from the initial results produced by each omics-specific hypergraph convolutional neural network and feeds them to a multi-omics integration algorithm for final label prediction. Based on multi-omics data and the corresponding phenotype labels, the method mines the latent correlations within each omics dataset, effectively integrates the feature information of each omics, and accurately predicts the association between omics data and human phenotypes.
Description
Technical Field
The invention belongs to the technical field of bioinformatics, and particularly relates to a multi-omics associated phenotype prediction method based on hypergraph representation and Dirichlet distribution.
Background
In recent years, biological technologies have developed rapidly, and high-throughput sequencing in particular has made breakthrough progress in throughput, speed, accuracy, diversity, and application value. Omics data can now be obtained more efficiently and at lower cost than with conventional methods. Research on DNA, mRNA, and methylation can be broadly divided into genomics, transcriptomics, and epigenomics, and integrating these data provides the basis for multi-omics research on various human phenotypes. The complexity of an organism is spread across these data types: a study of any single omics can reveal only part of that complexity, so integrating multiple omics datasets gives a better understanding of it and a more comprehensive view of life-science processes.
A phenotype is a quantifiable characteristic expressed during biological activity, that is, a characteristic index that can be objectively evaluated in a specific state of an organism, such as height, skin color, or disease. Traditional statistical methods take the detection results for genes, proteins, and other substances in body fluids or tissues such as human blood and urine, compute statistics on the data, compare them against set thresholds to infer biomarkers for the corresponding omics, and infer the phenotype of the data under test from those biomarkers. For example, the GWAS method compares P values of sample data to identify disease-related DNA fragments, SNP sites, and so on in genomics. However, the associations between omics and phenotype mined by traditional statistical methods have clear limitations. On the one hand, these methods compute statistics only for single markers within each omics, yet several markers with low individual statistical values can jointly play a decisive role in the phenotype, so the influence of such markers on the phenotype cannot be ruled out. On the other hand, because regulation in organisms is a multi-level dynamic process, methods restricted to a single omics are fundamentally limited and cannot account for the influence of regulation between upstream and downstream omics levels.
In view of the above, a comprehensive multi-omics approach is needed to make full use of these data in understanding biological systems. With increasingly affordable computing power and high-throughput omics data, and with the success of artificial intelligence in various fields, machine learning has become popular in biology. Machine learning can be used to mine information hidden in experimental data. Conventional statistical models are typically designed around statistical assumptions and make inferences about a particular phenomenon from a given dataset, whereas machine learning methods aim to learn from historical or existing data and use what is learned to make predictions or decisions about unseen data. For example, Xu et al. developed the HI-DFNForest framework, which learns high-level feature representations from three omics datasets using stacked autoencoders and integrates these representations to predict cancer subtypes. MOGONET, proposed by Wang et al., constructs a graph structure for each omics dataset, performs initial prediction with a graph convolutional neural network, and then integrates the results through the multi-view integration network VCDN to achieve multi-omics integrated phenotype classification. However, these methods still leave room for improvement in prediction accuracy and module design.
Disclosure of Invention
To solve the above problems, the invention provides a method for predicting the association between human phenotypes and omics data based on hypergraphs and Dirichlet distribution. First, the raw data are cleaned and screened by a preprocessing module. Second, a neural network model based on hypergraph representation is developed from the combined data matrix formed by multiple omics datasets: each omics dataset is given a hypergraph representation using a KNN (K-Nearest Neighbor) algorithm based on cosine similarity, which mines the associations among different positions within the omics in depth. Then, features are extracted efficiently from the represented data by a hypergraph convolutional neural network, which also supports association prediction between a single omics and the phenotype. Finally, the feature matrices are combined into a multi-omics joint matrix to build a Dirichlet-distribution-based fusion algorithm for two or more omics; a loss function constructed from the Dirichlet distribution integrates information across the omics, and information is shared among the omics on the basis of the feature matrices so as to predict the human phenotype accurately.
In order to achieve the purpose, the specific technical scheme of the invention is as follows:
A multi-omics associated phenotype prediction method based on hypergraph representation and Dirichlet distribution comprises the following steps:
Step (1): omics data cleaning and preprocessing
Redundant noise in the raw data of each omics is removed by conventional preprocessing. For example, for miRNA data only data with a chip detection success rate of at least 95% are retained, and for methylation data normalized beta values are computed as the expression level of each methylation site. The screened data may still contain redundant features or noise that degrade prediction performance. To solve this problem, features are pre-selected as follows.
First, features in the data set having a variance less than a threshold α are filtered out.
Second, the t-test of formula (1) is performed for each phenotype label in turn to test whether the omics data of samples with the same label differ significantly, and samples whose t value exceeds the threshold γ are deleted:
$$t = \frac{\bar{x} - \mu}{\sigma(x)/\sqrt{n}} \tag{1}$$
where $\bar{x}$ is the sample mean, μ is the sample expectation, σ(x) is the standard deviation of the samples, and n is the number of samples.
Finally, because different omics data types have different expression ranges, the expression values are scaled to [0,1] by linear transformation so that the model can process them. The output of this step is the preprocessed feature matrix X.
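The three pre-selection operations of step (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the function names, the example matrix, the per-feature scaling, the population variance in the filter, and the one-sample form of the t statistic with an (n - 1) standard deviation are all assumptions made here.

```python
import numpy as np

def variance_filter(X, alpha):
    """Keep only the columns (features) whose variance is at least alpha."""
    keep = X.var(axis=0) >= alpha          # population variance, an assumption
    return X[:, keep], keep

def t_statistic(x, mu):
    """One-sample t statistic: (mean - mu) / (std / sqrt(n))."""
    x = np.asarray(x, dtype=float)
    # ddof=1 gives the sample standard deviation; this choice is an assumption.
    return (x.mean() - mu) / (x.std(ddof=1) / np.sqrt(x.size))

def min_max_scale(X):
    """Linearly rescale each column to [0, 1]; constant columns map to 0."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # avoid division by zero
    return (X - lo) / span

# Hypothetical raw matrix: 3 samples x 3 features.
X_raw = np.array([[1.0, 5.0, 0.10],
                  [3.0, 5.0, 0.12],
                  [5.0, 5.0, 0.11]])
X_kept, mask = variance_filter(X_raw, alpha=0.3)
X = min_max_scale(X_kept)                  # preprocessed feature matrix
```

Only the first feature survives the variance filter in this toy example, and it is then mapped onto [0, 1].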
Step (2): constructing the hypergraph structure of the omics data
(2.1) A hypergraph is defined as G = (V, E, W), consisting of the vertex set V = {v_1, v_2, …, v_m}, the hyperedge set E = {e_1, e_2, …, e_l}, and the hyperedge weight matrix W, which represents the importance of each hyperedge. In the hypergraph, each vertex corresponds to a sample, and each hyperedge contains an arbitrary subset of V. A cosine similarity operation is applied to the feature matrix X output by step (1) to measure the relationships among the data within each omics.
The traditional method of constructing a hypergraph structure usually uses the Euclidean distance of formula (2) to compute the straight-line distance between vectors and so measure how close different samples are:
$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{R} (x_{ir} - x_{jr})^2} \tag{2}$$
The Euclidean distance is better suited to reflecting absolute numerical differences and does not fully fit the implicit correlations between features in omics data. In the present invention, different samples are regarded as different vectors, and the cosine similarity matrix obtained with formula (3) measures how close the vectors are in angle:
$$\cos(x_i, x_j) = \frac{\sum_{r=1}^{R} x_{ir} x_{jr}}{\sqrt{\sum_{r=1}^{R} x_{ir}^2}\,\sqrt{\sum_{r=1}^{R} x_{jr}^2}} \tag{3}$$
where $x_i$ is the feature vector of the i-th sample in the feature matrix X, $x_{ir}$ is the r-th feature value of the i-th sample in X, and R is the total number of features. This measure is theoretically more consistent with the way features act in omics, and its effectiveness has been verified by a control experiment.
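The difference between the two measures can be seen on a small example. The sketch below computes both matrices with NumPy; the function names and the three sample vectors are hypothetical and chosen only so that two samples point in the same direction but differ in magnitude.

```python
import numpy as np

def cosine_similarity_matrix(X, eps=1e-12):
    """Pairwise cosine similarity between the rows (samples) of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, eps)        # guard against all-zero rows
    return Xn @ Xn.T

def euclidean_distance_matrix(X):
    """Pairwise Euclidean distance between the rows (samples) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Sample 1 is sample 0 scaled by 2; sample 2 has a different profile.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 2.0, 1.0]])
S = cosine_similarity_matrix(X)
D = euclidean_distance_matrix(X)
```

Here the cosine similarity between samples 0 and 1 is 1 because they share the same expression profile up to scale, while the Euclidean distance ranks sample 2 as the closer neighbour of sample 0.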
(2.2) KNN clustering is performed on the samples according to the cosine similarity matrix obtained above. Because the cosine value between two vectors decreases as the angle between them increases, the KNN process in the present invention returns the indices of the k largest values in each row of the similarity matrix. These indices form the hyperedge e of the corresponding hypergraph vertex; the matrix entries at these k indices are set to 1 and the remaining entries to 0. The matrix H constructed in this way is the incidence matrix of the hypergraph G, defined as:
$$h(v, e) = \begin{cases} 1, & v \in e \\ 0, & v \notin e \end{cases} \tag{4}$$
Accordingly, the vertex degree $D_v$ is defined as:
$$D_v = \sum_{e \in E} w(e)\, h(v, e) \tag{5}$$
where w(e) is the weight of hyperedge e in the weight matrix. The hyperedge degree $D_e$ is defined as:
$$D_e = \sum_{v \in V} h(v, e) \tag{6}$$
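A minimal sketch of the KNN construction and the two degree computations follows. The orientation of H (rows are vertices, columns are hyperedges, one hyperedge per sample) and the names `knn_incidence`, `vertex_degree`, and `edge_degree` are assumptions made for illustration; the similarity matrix is a made-up example.

```python
import numpy as np

def knn_incidence(S, k):
    """Incidence matrix H: hyperedge j contains the k vertices most similar to sample j."""
    n = S.shape[0]
    H = np.zeros((n, n))
    for j in range(n):
        nearest = np.argsort(-S[j])[:k]    # indices of the k largest values in row j
        H[nearest, j] = 1.0
    return H

def vertex_degree(H, w=None):
    """Weighted number of hyperedges that contain each vertex."""
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    return H @ w

def edge_degree(H):
    """Number of vertices contained in each hyperedge."""
    return H.sum(axis=0)

# Hypothetical similarity matrix with two obvious groups of samples.
S = np.array([[1.0, 0.9, 0.2, 0.1],
              [0.9, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.8],
              [0.1, 0.2, 0.8, 1.0]])
H = knn_incidence(S, k=2)
Dv = vertex_degree(H)   # equal hyperedge weights assumed
De = edge_degree(H)
```

With k = 2 every hyperedge contains exactly two vertices, so each hyperedge degree equals k.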
Step (3): building a hypergraph convolutional neural network for single-omics feature extraction:
(3.1) First, the Laplacian matrix of the hypergraph incidence matrix is constructed according to the normalized Laplacian formula, converting the abstract node relations in the hypergraph into a matrix that can be used as neural network input. For a traditional graph structure, the Laplacian matrix is constructed as:
where I is the identity matrix, D is the vertex degree matrix of the graph, and A is the adjacency matrix of the graph structure.
Similarly, the Laplacian matrix of the hypergraph structure constructed in step (2) is defined as:
where $D_v$ is the vertex degree matrix of the hypergraph obtained from formula (5), $D_e$ is the hyperedge degree matrix obtained from formula (6), and H is the incidence matrix obtained from formula (4). For a dataset with no given weight matrix W, W defaults to the identity matrix I, meaning that all hyperedges have equal weight.
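Formula (8) is not reproduced in this text, so the following sketch is an assumption rather than the patent's definition. It builds the normalized operator D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} that is commonly used for hypergraph convolution from exactly the quantities named above (D_v, D_e, H, W); the function name `hypergraph_operator` and the example incidence matrix are hypothetical.

```python
import numpy as np

def hypergraph_operator(H, w=None):
    """Normalized operator Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} (assumed form).

    Requires every vertex to lie in at least one hyperedge and every
    hyperedge to be non-empty, otherwise a degree of zero is inverted.
    """
    n_edges = H.shape[1]
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)  # W = I by default
    dv = H @ w                      # vertex degrees
    de = H.sum(axis=0)              # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    return Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt

# Hypothetical incidence matrix: 4 vertices, 4 hyperedges, two groups.
H = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
L_h = hypergraph_operator(H)
```

Some formulations define the hypergraph Laplacian as the identity minus this operator, which would be `np.eye(H.shape[0]) - L_h`; which of the two the method intends cannot be determined from the text here.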
(3.2) The hypergraph Laplacian matrix of the single-omics data and the preprocessed feature data are input to a hypergraph convolutional neural network to perform the initial prediction task. The training goal of each hypergraph convolutional neural network is to learn the association between the input data and the corresponding labels. Specifically, the model requires two inputs. One is the result of step (1), the preprocessed feature matrix $X \in \mathbb{R}^{n \times d}$, where n is the number of samples and d is the number of omics features. The other is the description of the hypergraph structure, the hypergraph Laplacian matrix $L_h \in \mathbb{R}^{n \times n}$ obtained from formula (8).
A hypergraph convolutional neural network (HGCN) model is constructed by stacking 3 convolutional layers and 1 fully connected layer. The dimension of each convolutional layer is set according to the dimension of the feature matrix X, and the output dimension of the fully connected layer is the number of label classes. The convolutional layer is defined as:
$$HGConv^{(l+1)} = f(HGConv^{(l)}, L_h) = \sigma\left(L_h\, HGConv^{(l)}\, Z^{(l)}\right) \tag{9}$$
where $HGConv^{(l)}$ is the output of the l-th layer, $Z^{(l)}$ is the weight matrix of the l-th layer, and $HGConv^{(0)} = X$. σ(·) is the activation function of the hidden layers, set to the LeakyReLU function in this method, where k is the negative slope parameter of the activation function, used to avoid the vanishing gradients caused by dead neurons:
$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x \ge 0 \\ kx, & x < 0 \end{cases} \tag{10}$$
A dropout mechanism is added after each of the first two convolutional layers to reduce the risk of overfitting. The fully connected layer after the third convolutional layer performs feature integration. The model output $F_o \in \mathbb{R}^{n \times b}$ is the feature extraction result, where n is the number of samples and b is the number of label classes.
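The forward pass of this architecture can be sketched with NumPy as three applications of formula (9) followed by a fully connected layer. Everything concrete below is hypothetical: the layer widths, the random weights, the negative slope k = 0.25, and the uniform placeholder used for L_h. Dropout is omitted, as it would be at inference time.

```python
import numpy as np

def leaky_relu(x, k=0.25):
    """LeakyReLU with negative slope k."""
    return np.where(x >= 0, x, k * x)

def hgconv(F, L_h, Z, k=0.25):
    """One hypergraph convolution layer, sigma(L_h @ F @ Z), following formula (9)."""
    return leaky_relu(L_h @ F @ Z, k)

def hgcn_forward(X, L_h, conv_weights, fc_weight, fc_bias):
    """Three stacked convolution layers, then a fully connected output layer."""
    F = X
    for Z in conv_weights:
        F = hgconv(F, L_h, Z)
    return F @ fc_weight + fc_bias

rng = np.random.default_rng(0)
n, d, hidden, b = 4, 6, 5, 3          # samples, features, hidden width, label classes
X = rng.random((n, d))
L_h = np.full((n, n), 1.0 / n)        # placeholder operator that averages all samples
dims = [d, hidden, hidden, hidden]
conv_weights = [rng.normal(scale=0.1, size=(dims[i], dims[i + 1])) for i in range(3)]
fc_weight = rng.normal(scale=0.1, size=(hidden, b))
fc_bias = np.zeros(b)
F_o = hgcn_forward(X, L_h, conv_weights, fc_weight, fc_bias)   # shape (n, b)
```

Because the placeholder operator averages over all samples, every row of `F_o` is identical in this toy run; a real hypergraph operator would mix only the samples that share hyperedges.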
Meanwhile, the invention also supports predicting the corresponding phenotype from single-omics data through the HGCN; that is, a single HGCN is trained by back-propagation with a cross-entropy loss function:
$$Loss_{HGCN} = Loss_{CE}(F_o, y) \tag{11}$$
where $Loss_{CE}(\cdot)$ is the cross-entropy loss function and y is the sample label. The gradient is computed from the loss value $Loss_{HGCN}$ and the network weights Z are updated to complete one back-propagation pass. After several training iterations, the saved model is used to predict the association between single-omics data and the phenotype.
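For reference, a cross-entropy loss over a batch of network outputs can be computed as below. The sketch assumes that a softmax is applied to the outputs before taking the log-likelihood and that labels are integer class indices; the function name and the inputs are illustrative only.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; labels are integer class indices."""
    shifted = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical outputs for 2 samples and 3 classes.
logits = np.array([[4.0, 0.5, 0.1],
                   [0.2, 0.3, 3.5]])
labels = np.array([0, 2])
loss = cross_entropy(logits, labels)
```

An uninformative output (all logits equal) gives a loss of log(b) for b classes, which is a convenient baseline when checking that training reduces the loss.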
Step (4): multi-omics integration algorithm based on Dirichlet distribution:
An HGCN is constructed for each omics dataset using step (3), and each network outputs a feature result matrix $F_o \in \mathbb{R}^{n \times b}$. The Dirichlet distribution parameter matrix $\alpha_o$ of $F_o$ is constructed with formula (12), where $\alpha_{ij}^o$ denotes an element of $\alpha_o$. From these parameters, the reliability $p_{ij}^o$ of each element $f_{ij}^o$ of $F_o$ is computed to form the matrix $P_o$, and the uncertainty $u_i^o$ of the prediction under that omics is computed to form the vector $U_o$:
$$\alpha_o = F_o + 1 \tag{12}$$
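The text gives formula (12) but not the expressions for the reliability and the uncertainty. The sketch below fills them in with the mapping commonly used in evidential deep learning, which is an assumption here: with S_i the sum of the Dirichlet parameters of sample i, the reliability is (α_ij - 1) / S_i and the uncertainty is b / S_i. It also assumes the entries of F_o are non-negative; the function name and example values are hypothetical.

```python
import numpy as np

def dirichlet_opinion(F):
    """Map non-negative outputs F (n x b) to Dirichlet parameters, reliabilities, uncertainty."""
    F = np.asarray(F, dtype=float)
    if (F < 0).any():
        raise ValueError("outputs used as evidence must be non-negative")
    alpha = F + 1.0                          # formula (12)
    S = alpha.sum(axis=1, keepdims=True)     # Dirichlet strength per sample
    P = (alpha - 1.0) / S                    # assumed reliability mapping
    U = F.shape[1] / S[:, 0]                 # assumed uncertainty mapping
    return alpha, P, U

# Sample 0 has strong evidence for class 0; sample 1 has no evidence at all.
F_o = np.array([[8.0, 1.0, 0.0],
                [0.0, 0.0, 0.0]])
alpha_o, P_o, U_o = dirichlet_opinion(F_o)
```

Under this mapping the reliabilities of a sample and its uncertainty always sum to 1, and a sample with no evidence receives the maximum uncertainty of 1.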
Based on the reliability distribution matrix $P_o$ and the uncertainty vector $U_o$ of the single-omics prediction results obtained above, multi-omics fusion prediction is performed. This process uses classical D-S evidence theory, formula (13), to fuse information between omics pairwise:
where $p_i$ denotes the i-th row of the matrix P and m is an integer not less than 0. Specifically, when m = 0 the formula fuses $P_0$, $U_0$ (the prediction of the first omics) with $P_1$, $U_1$ (the prediction of the second omics) to obtain $P_2$, $U_2$ as the fusion result of the two omics; when m = 1 the formula fuses $P_2$, $U_2$ (the fusion of the first two omics) with $P_3$, $U_3$ (the prediction of the third omics) to obtain $P_4$, $U_4$ as the fusion result of the three omics. Fusion continues in the same way until all omics have been fused, yielding $P_{2m+2}$, $U_{2m+2}$.
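Formula (13) is not reproduced in this text. As an illustration of pairwise D-S fusion, the sketch below uses the reduced Dempster combination rule for opinions made of per-class reliabilities plus a single uncertainty mass; treating this as the form of formula (13) is an assumption, and the names `ds_combine` and `fuse_all` and the example values are hypothetical.

```python
import numpy as np

def ds_combine(P1, U1, P2, U2):
    """Fuse two opinions (P: n x b reliabilities, U: n uncertainties) per sample."""
    # Conflict: mass assigned to different classes by the two sources.
    conflict = P1.sum(axis=1) * P2.sum(axis=1) - (P1 * P2).sum(axis=1)
    scale = 1.0 / (1.0 - conflict)   # assumes conflict < 1 (true when both U > 0)
    P = scale[:, None] * (P1 * P2 + P1 * U2[:, None] + P2 * U1[:, None])
    U = scale * U1 * U2
    return P, U

def fuse_all(opinions):
    """Fold the pairwise rule over a list of (P, U) pairs, one per omics."""
    P, U = opinions[0]
    for P_next, U_next in opinions[1:]:
        P, U = ds_combine(P, U, P_next, U_next)
    return P, U

# One sample, two classes, three hypothetical omics opinions.
P_a, U_a = np.array([[0.6, 0.1]]), np.array([0.3])
P_b, U_b = np.array([[0.5, 0.2]]), np.array([0.3])
P_c, U_c = np.array([[0.4, 0.2]]), np.array([0.4])
P_ab, U_ab = ds_combine(P_a, U_a, P_b, U_b)
P_all, U_all = fuse_all([(P_a, U_a), (P_b, U_b), (P_c, U_c)])
```

When two omics agree, the fused uncertainty is lower than either input uncertainty, and the fused reliabilities and uncertainty still sum to 1 for every sample.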
After all omics have been fused, the Dirichlet distribution parameter α and the fused prediction result F for the multi-omics case are derived in reverse from formula (12).
Finally, the multi-omics fusion prediction is trained. Unlike the cross-entropy calculation of formula (11), the fusion loss is computed with formula (14):
$$Loss_{MOIA} = Loss_{right} + \lambda_{epoch}\, Loss_{wrong} \tag{14}$$
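The reverse derivation can be sketched as follows, under the same assumed mapping in which the uncertainty equals the number of classes divided by the Dirichlet strength S. Only the final step, α = F + 1, comes from formula (12); the function name and example values are hypothetical.

```python
import numpy as np

def opinion_to_dirichlet(P, U):
    """Recover fused outputs F and Dirichlet parameters alpha from (P, U)."""
    b = P.shape[1]
    S = b / U                    # assumed: U = b / S
    F = P * S[:, None]           # assumed: P = F / S
    alpha = F + 1.0              # formula (12)
    return alpha, F

P_fused = np.array([[8.0 / 12.0, 1.0 / 12.0, 0.0]])
U_fused = np.array([0.25])
alpha, F = opinion_to_dirichlet(P_fused, U_fused)
```

The predicted label of each sample can then be read off as the class with the largest entry in the corresponding row of `F`.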
where $Loss_{right}$ is the loss function for the correct label, $Loss_{wrong}$ is the loss function for the wrong labels, and $Loss_{MOIA}$ is the total loss function; $\lambda_{epoch}$ is a loss weight that changes dynamically with the current training epoch and takes a value in (0, 1); K is the number of label classes; $y_i$ is the one-hot encoded label vector of the i-th sample, and $y_{ij}$ is its j-th element; $\alpha_i$ is the set of Dirichlet distribution parameters of the i-th sample, and $\alpha_{ij}$ is the Dirichlet distribution parameter estimate for the j-th class of the i-th sample; Γ(·) is the gamma function, in which t is the integration parameter. The method makes full use of the Dirichlet distribution parameter estimate α: computing $Loss_{right}$ drives the model to predict the correct label as strongly as possible, computing $Loss_{wrong}$ further suppresses predictions of wrong labels, and $Loss_{MOIA}$ thus improves model accuracy from both directions. The loss value is used to compute gradients, complete the back-propagation process, and update the neuron weights of the hypergraph convolutional neural networks. The trained model can then predict phenotypes accurately from omics-specific information and cross-omics association learning.
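The way formula (14) weights the two terms over training can be illustrated with a short sketch. The expressions for the two component losses are not reproduced in this text, so they are passed in as plain numbers; the linear warm-up schedule and the names `lambda_epoch`, `moia_loss`, and `warmup_epochs` are assumptions chosen only so that the weight stays inside (0, 1).

```python
def lambda_epoch(epoch, warmup_epochs=50):
    """Hypothetical schedule: grows with the epoch and always stays inside (0, 1)."""
    return min(epoch + 1, warmup_epochs) / (warmup_epochs + 1)

def moia_loss(loss_right, loss_wrong, epoch, warmup_epochs=50):
    """Formula (14): Loss_MOIA = Loss_right + lambda_epoch * Loss_wrong."""
    return loss_right + lambda_epoch(epoch, warmup_epochs) * loss_wrong

# Early in training the wrong-label term contributes little; later it counts almost fully.
early = moia_loss(loss_right=1.2, loss_wrong=0.8, epoch=0)
late = moia_loss(loss_right=1.2, loss_wrong=0.8, epoch=200)
```

A schedule of this kind lets the model first fit the correct labels and only gradually penalize evidence assigned to wrong labels.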
The invention has the beneficial effects that:
(1) The designed hypergraph data structure is used as the input data type. Compared with a traditional graph structure, a hypergraph represents data containing multi-way relations with higher fidelity. A hypergraph convolutional neural network is built on this structure, and the associations among different features within each omics are mined by fully combining the original features with the hypergraph representation.
(2) A multi-omics integration algorithm based on Dirichlet distribution is provided, which effectively exploits the complementary relations among biological features at different levels and further uncovers potentially unknown associations between human omics and phenotypes. It can help people better understand biological dynamic regulation and provide more comprehensive theoretical support for disease detection, subtyping, and risk prediction.
Drawings
Fig. 1 is an overall architecture diagram of the present invention.
Figure 2 is a framework diagram of the present invention for implementing multiomic phenotypic association prediction.
Fig. 3 is an overall flow chart of the present invention.
Detailed Description
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
As shown in fig. 1, a multi-group mathematical correlation phenotype prediction method based on hypergraph characterization and dirichlet distribution according to the present invention can be roughly divided into: four modules of omics data preprocessing, omics data hypergraph representation, feature extraction of a hypergraph neural network and multi-group chemical integration prediction;
(1) the preprocessing module relates to the cleaning of primitive omics data and the pre-screening of characteristics: preprocessing operation is respectively carried out on each type of omics data so as to remove noise, errors and redundant characteristics which possibly influence the associated mining performance, and a better understanding and supporting effect is played for a subsequent model algorithm. First, features with no probe signal or low difference (mean close to 0) were filtered for individual omics data. Because different omics data types have different expression ranges, the expression values are optionally scaled by linear transformation for the model to operate.
(2) The omics data hypergraph characterization module relates to a cosine similarity calculation and KNN clustering process: for each kind of feature data after being preprocessed, a cosine similarity matrix of the feature data among different samples is calculated firstly, then k samples with the largest cosine value of each hypergraph node are screened out according to a KNN algorithm, finally the most similar samples are indexed in a matrix with the index of 1, and the rest indexes are indexed with 0 to complete the construction of the hypergraph correlation matrix.
(3) The hypergraph neural network feature extraction module covers the construction and training of the neural networks: the Laplace matrix of the hypergraph structure is constructed according to the normalized hypergraph Laplacian formula, which converts the abstract node relations in the hypergraph into a matrix that can be used as neural network input. A hypergraph convolutional neural network (HGCN) is then built for each omics; it takes the preprocessed omics feature matrix and the corresponding hypergraph Laplace matrix as input and learns the omics-specific association between that omics and the phenotype. The main advantage of the HGCN is that it exploits the latent correlations between samples in the omics data, which makes feature extraction more effective.
(4) The multi-omics integration prediction module constructs Dirichlet distribution parameters from the output of each HGCN model and uses a loss function that differs from the conventional cross entropy for the final label prediction. The multi-omics integration algorithm (MOIA) first calculates the uncertainty from the Dirichlet distribution parameters of each omics, then mines the latent correlations among different omics through the classical Dempster-Shafer (D-S) combination rule, thereby integrating the features extracted by each omics-specific network.
As shown in Fig. 3, taking the association prediction of breast cancer subtypes on the TCGA BRCA omics data set as an example, the following steps are performed:
(1) First, features are screened for each omics according to the preprocessing step, so that features highly related to the phenotype label are retained. Taking the three omics data types related to the BRCA phenotype obtained from the TCGA database (methylation, mRNA and miRNA data) as an example, features with a sample retention rate of less than 5% are filtered out of the miRNA and mRNA data; for the methylation data, normalized beta values are calculated as the methylation level of each methylation site. Second, features in the training data set with a variance of less than 0.3 are filtered out.
Meanwhile, for each label prediction task, the t test of formula (1) is performed in turn to evaluate whether a sample differs significantly from the other samples with the same label; samples whose difference is too large are deleted, and each type of omics data is scaled to the range [0,1] by a linear transformation.
(2) The preprocessed and screened omics data are characterized as a hypergraph structure. As shown in Fig. 2, this means constructing the incidence matrix and the Laplace matrix of the hypergraph from the feature matrix: the cosine similarity matrix of the single-omics data is calculated according to formula (3), the indexes of the k = 10 largest cosine values in each row of the matrix are selected to construct the incidence matrix of the hypergraph G, and the Laplace matrix of the hypergraph is obtained through formula (8).
(3) The feature matrix and the Laplace matrix of each single omics are input into a hypergraph convolutional neural network (HGCN) for feature extraction. As shown in Fig. 2, one HGCN is built for each omics, and each HGCN learns the hypergraph representation from the node features of that omics and the association relationships between the nodes. In this example, the dimension of the original features is 1000 × 612 and the number of classification labels is 5; the input layer dimension is therefore set to 1000, the hidden layer dimensions to 400, 400 and 200, and the output layer dimension to 5. The network computation follows formulas (9) and (10), and a dropout mechanism with parameter 0.5 is added after the first two convolutional layers to reduce the probability of overfitting.
(4) The output of the HGCN of each omics is input into the multi-omics integration algorithm (MOIA) for the final integrated prediction. The MOIA can reveal latent cross-omics label correlations: Dirichlet distribution parameters are constructed based on formula (12), and the classical D-S evidence theory of formula (13) is introduced to fuse the information of the omics pairwise. After all omics have been fused, the hypergraph convolutional neural networks are trained by back propagation with the loss function of formula (14). The final association prediction is thus based on both omics-specific information and cross-omics correlation learning. The result, shown as the final output in Fig. 2, is an n × 5 tensor (n is the number of samples); the 5 values in each row represent the probability distribution of the sample over the five BRCA subtypes (Normal-like, Basal-like, HER2-enriched, Luminal A and Luminal B), and the subtype with the highest probability is the final prediction.
(5) Several groups of control experiments on the same data set show that the method of the invention outperforms other existing methods. Some of the control experiments are as follows:
I. Compared with the MOGONET method published in Nature Communications in 2021, the single-omics prediction accuracy (ACC) of the invention is higher by 0.06-0.09, and the multi-omics integration prediction accuracy is higher by about 0.04 (MOGONET: 0.8289; the invention: 0.8670). According to the experimental section of the MOGONET paper, the accuracy of the method of the invention is also well above that of other conventional machine learning methods.
II. The single-omics prediction accuracies of the hypergraph neural network are 0.8517 for mRNA, 0.7871 for methylation and 0.8061 for miRNA; the prediction accuracy after MOIA integration is 0.8670, which demonstrates the effectiveness of integration by the MOIA module.
III. In a comparison of hypergraph construction methods, the hypergraph constructed with cosine similarity improves the final prediction accuracy by 0.02-0.04 over the hypergraph constructed with the conventional Euclidean distance, which demonstrates the effectiveness of the cosine similarity method.
Claims (1)
1. A multi-omics association phenotype prediction method based on hypergraph characterization and Dirichlet distribution, characterized by comprising the following steps:
step (1) omics data cleaning and preprocessing
Redundant noise in the raw data is removed from each omics data set, and features are then pre-selected as follows:
firstly, filtering out features whose variance in the data set is smaller than a threshold α;
secondly, for each phenotype label, sequentially performing the t hypothesis test of formula (1) to check whether the omics data of samples with the same label differ significantly, and deleting the samples whose t value is larger than a threshold γ:
$$t = \frac{\bar{x} - \mu}{\sigma(x)/\sqrt{n}} \tag{1}$$
where $\bar{x}$ is the sample mean, μ is the sample expectation, σ(x) is the standard deviation of the sample, and n is the number of samples;
finally, because different omics data types have different expression ranges, the expression values are scaled to [0,1] by a linear transformation, and the result is output as the preprocessed feature matrix X;
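The variance filter and the linear scaling of step (1) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `preprocess` and the parameter `var_threshold` (standing in for α) are hypothetical, the scaling is assumed to be a per-feature min-max transform, and the t-test-based sample removal is left out.

```python
import numpy as np

def preprocess(X, var_threshold):
    """Variance filter followed by min-max scaling to [0, 1].

    X: array of shape (n_samples, n_features).
    var_threshold: hypothetical name for the threshold alpha of step (1).
    Per-feature scaling is an assumption; the text only says "linear transformation".
    """
    X = np.asarray(X, dtype=float)
    keep = X.var(axis=0) >= var_threshold        # drop low-variance features
    X = X[:, keep]
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero
    return (X - lo) / span, keep

X_raw = np.array([[1.0, 5.0, 0.0],
                  [1.0, 7.0, 4.0],
                  [1.0, 9.0, 2.0]])
X_scaled, kept = preprocess(X_raw, var_threshold=0.3)
```

In this toy input the first column is constant, so it is removed, and the two remaining columns are mapped onto [0, 1].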
step (2) constructing the hypergraph structure of the omics data
(2.1) A hypergraph is defined as $G = (V, E, W)$, consisting of the vertex set $V = \{v_1, v_2, \dots, v_m\}$ and the hyperedge set $E = \{e_1, e_2, \dots, e_l\}$; W is the weight matrix of the hyperedges and represents the importance of each hyperedge; in the hypergraph, each vertex corresponds to a sample, and each hyperedge contains an arbitrary subset of V; a cosine similarity operation is performed on the feature matrix X output by step (1) to measure the relationship between the features of different samples within the omics;
different samples are regarded as different vectors, and the cosine similarity matrix is obtained with formula (3), which measures how close two samples are by the angle between their vectors:
$$\cos(x_i, x_j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert} \tag{3}$$
where $x_i$ is the feature vector of the i-th sample in the feature matrix X;
(2.2) KNN clustering is performed on the samples according to the cosine similarity matrix; because the cosine value between vectors decreases as the angle increases, the KNN step returns the indexes of the k largest values in each row of the similarity matrix; these indexes form the hyperedge e of the corresponding hypergraph vertex, the k indexed entries are set to 1 in the matrix, and the remaining entries are set to 0; the matrix H constructed in this way is the incidence matrix of the hypergraph G, defined as:
$$h(v, e) = \begin{cases} 1, & v \in e \\ 0, & v \notin e \end{cases} \tag{4}$$
on this basis, the degree $D_v$ of a vertex v is defined as:
$$D_v = \sum_{e \in E} w(e)\, h(v, e) \tag{5}$$
where w(e) is the weight of the hyperedge e in the weight matrix; the degree $D_e$ of a hyperedge e is defined as:
$$D_e = \sum_{v \in V} h(v, e) \tag{6}$$
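Step (2) can be sketched in NumPy as below. The function names `build_incidence` and `degrees` are hypothetical. One assumption needs stating: rows of H are taken as vertices and columns as hyperedges, with hyperedge j grouping the k samples most similar to sample j (including j itself). Reading the text literally, with the k entries set to 1 within each row, gives the transpose of this matrix.

```python
import numpy as np

def build_incidence(X, k):
    """Incidence matrix H (rows: vertices, columns: hyperedges) from cosine KNN.

    Hyperedge j contains the k samples with the largest cosine similarity
    to sample j; sample j is included because its self-similarity is 1.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    S = Xn @ Xn.T                              # cosine similarity matrix
    top_k = np.argsort(-S, axis=1)[:, :k]      # indexes of the k largest values per row
    H = np.zeros_like(S)
    for j, members in enumerate(top_k):
        H[members, j] = 1.0                    # vertex i belongs to hyperedge j
    return H

def degrees(H, w=None):
    """Vertex degrees (weighted) and hyperedge degrees of an incidence matrix."""
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w              # sum of weights of hyperedges containing each vertex
    d_e = H.sum(axis=0)      # number of vertices in each hyperedge
    return d_v, d_e

X_demo = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
H_demo = build_incidence(X_demo, k=2)
d_v, d_e = degrees(H_demo)
```

With k = 2, the two nearly parallel pairs of samples in the toy input end up in the same hyperedges, so H is block diagonal.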
step (3) constructing a hypergraph convolutional neural network for single-omics feature extraction:
(3.1) firstly, the Laplace matrix of the hypergraph incidence matrix is constructed according to the normalized Laplacian formula, which converts the abstract node relations in the hypergraph into a matrix that can be used as neural network input;
the Laplace matrix $L_h$ of the hypergraph structure formed in step (2) is defined by formula (8), wherein $D_v$ is the vertex degree matrix of the hypergraph obtained from formula (5), $D_e$ is the hyperedge degree matrix obtained from formula (6), and H is the incidence matrix obtained from formula (4); for a data set without a specific weight matrix W, W defaults to the identity matrix I, i.e. the weights of all hyperedges are equal;
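Formula (8) itself is not reproduced in this text. As an illustration only, the sketch below assumes the normalized form commonly used in hypergraph neural networks, $L_h = D_v^{-1/2} H W D_e^{-1} H^{T} D_v^{-1/2}$; the patent's own formula may differ, for example by subtracting this term from the identity. The function name `hypergraph_laplacian` is hypothetical.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Assumed form: L_h = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}.

    H: (n_vertices, n_hyperedges) incidence matrix.
    w: hyperedge weights; defaults to all ones (W = I).
    """
    H = np.asarray(H, dtype=float)
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                  # vertex degrees
    d_e = H.sum(axis=0)          # hyperedge degrees
    # Inverse (square-root) degrees, with zero-degree entries mapped to 0.
    dv_inv_sqrt = np.where(d_v > 0, 1.0 / np.sqrt(np.where(d_v > 0, d_v, 1.0)), 0.0)
    de_inv = np.where(d_e > 0, 1.0 / np.where(d_e > 0, d_e, 1.0), 0.0)
    left = dv_inv_sqrt[:, None] * H        # D_v^{-1/2} H
    middle = w * de_inv                    # diagonal of W D_e^{-1}
    right = H.T * dv_inv_sqrt              # H^T D_v^{-1/2}
    return (left * middle) @ right

H_demo = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]], dtype=float)
L_demo = hypergraph_laplacian(H_demo)
```

The diagonal matrices are applied by broadcasting rather than by building explicit `np.diag` matrices, which keeps the cost at one n × l by l × n product.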
(3.2) the hypergraph Laplace matrix of the single-omics data and the preprocessed feature data are input into a hypergraph convolutional neural network to perform an initial prediction task; the training goal of each hypergraph convolutional neural network is to learn the association between the input data and the corresponding labels; specifically, the model requires two inputs: one is the result of step (1), i.e. the preprocessed feature matrix $X \in \mathbb{R}^{n \times d}$, where n is the number of samples and d is the number of omics features; the other is the description of the hypergraph structure, i.e. the hypergraph Laplace matrix $L_h \in \mathbb{R}^{n \times n}$ obtained from formula (8);
the hypergraph convolutional neural network (HGCN) model is constructed by stacking 3 convolutional layers and 1 fully connected layer; the dimensions of the convolutional layers are set according to the dimension of the feature matrix X, and the output dimension of the fully connected layer is the number of label classes; the convolutional layer is defined as:
$$\mathrm{HGConv}^{(l+1)} = f\left(\mathrm{HGConv}^{(l)}, L_h\right) = \sigma\left(L_h \, \mathrm{HGConv}^{(l)} \, Z^{(l)}\right) \tag{9}$$
in the formula, $\mathrm{HGConv}^{(l)}$ is the output of the l-th layer and $Z^{(l)}$ is the weight matrix of the l-th layer; when l = 0, $\mathrm{HGConv}^{(0)} = X$; σ(·) is the activation function of the hidden layers, set to the LeakyReLU function, where k is the negative slope parameter of the activation function:
$$\sigma(x) = \begin{cases} x, & x > 0 \\ kx, & x \le 0 \end{cases} \tag{10}$$
a dropout mechanism is added after the first two convolutional layers to reduce the probability of overfitting; the fully connected layer after the third convolutional layer performs feature integration; the output $F^o$ of the model is the result of feature extraction, $F^o \in \mathbb{R}^{n \times b}$, where n is the number of samples and b is the number of label classes;
meanwhile, the method supports predicting the corresponding phenotype from single-omics data through the HGCN alone, i.e. a single HGCN is trained by back propagation with a cross entropy loss function:
$$\mathrm{Loss}_{HGCN} = \mathrm{Loss}_{CE}\left(F^o, y\right) \tag{11}$$
wherein $\mathrm{Loss}_{CE}(\cdot)$ is the cross entropy loss function and y is the sample label; the gradient is calculated from the loss value $\mathrm{Loss}_{HGCN}$ and the network weights Z are updated, which completes one back propagation pass; the model saved after several training iterations is used for association prediction between single-omics data and the phenotype;
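The forward pass of step (3) (three hypergraph convolutions with LeakyReLU, then a fully connected layer) and the single-omics cross entropy can be sketched in plain NumPy. This is a simplified illustration: all function names are hypothetical, biases and dropout are omitted, weights are random rather than trained, the cross entropy is assumed to apply a softmax to $F^o$, and the identity matrix stands in for $L_h$.

```python
import numpy as np

def leaky_relu(x, k=0.01):
    """LeakyReLU with negative slope k."""
    return np.where(x > 0, x, k * x)

def init_weights(dims, rng):
    """One random weight matrix per consecutive pair of layer sizes."""
    return [rng.normal(0.0, 0.1, size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]

def hgcn_forward(X, L_h, conv_weights, fc_weight, k=0.01):
    """Stacked hypergraph convolutions followed by a fully connected layer."""
    h = X
    for Z in conv_weights:
        h = leaky_relu(L_h @ h @ Z, k)     # sigma(L_h * HGConv^(l) * Z^(l))
    return h @ fc_weight                   # F_o with shape (n, b)

def cross_entropy(logits, labels):
    """Mean softmax cross entropy; labels are integer class indexes."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
n, d, b = 6, 8, 5                          # toy sizes, not those of the embodiment
X = rng.random((n, d))
L_h = np.eye(n)                            # placeholder for the hypergraph Laplace matrix
conv = init_weights([d, 4, 4, 3], rng)     # three convolutional layers
fc = init_weights([3, b], rng)[0]          # fully connected layer
F_o = hgcn_forward(X, L_h, conv, fc)
loss = cross_entropy(F_o, np.array([0, 1, 2, 3, 4, 0]))
```

For the BRCA embodiment described earlier, the layer sizes would be `[1000, 400, 400, 200]` for the convolutions and `[200, 5]` for the fully connected layer.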
step (4) multi-omics integration algorithm based on the Dirichlet distribution:
a corresponding HGCN is constructed for each omics data type using step (3), and each neural network outputs a feature result matrix $F^o \in \mathbb{R}^{n \times b}$; first, the Dirichlet distribution parameter matrix $\alpha^o$ of $F^o$ is constructed with formula (12), where $\alpha_{ij}^o$ denotes an element of $\alpha^o$; from these parameters, the reliability $p_{ij}^o$ of each element $f_{ij}^o$ of $F^o$ is calculated to form the matrix $P^o$, and the uncertainty $u_i^o$ of the prediction within the omics is calculated to form the vector $U^o$:
$$\alpha^o = F^o + 1 \tag{12}$$
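Only $\alpha^o = F^o + 1$ survives in this text; the expressions for reliability and uncertainty do not. The sketch below therefore assumes the usual subjective-logic definitions from evidential deep learning: with Dirichlet strength $S_i = \sum_j \alpha_{ij}$, reliability $p_{ij} = (\alpha_{ij} - 1)/S_i$ and uncertainty $u_i = b/S_i$. It also assumes $F^o$ is non-negative. The function name `dirichlet_opinion` is hypothetical.

```python
import numpy as np

def dirichlet_opinion(F):
    """Dirichlet parameters, reliability and uncertainty from evidence F.

    F: (n, b) non-negative evidence matrix (assumed; e.g. after a
    non-negative activation on the network output).
    """
    F = np.asarray(F, dtype=float)
    alpha = F + 1.0                          # alpha = F + 1
    S = alpha.sum(axis=1, keepdims=True)     # Dirichlet strength per sample
    P = F / S                                # assumed: p_ij = (alpha_ij - 1) / S_i
    U = F.shape[1] / S[:, 0]                 # assumed: u_i = b / S_i
    return alpha, P, U

F_demo = np.array([[8.0, 1.0, 1.0],
                   [0.0, 0.0, 0.0]])
alpha, P, U = dirichlet_opinion(F_demo)
```

Under these definitions each row satisfies $\sum_j p_{ij} + u_i = 1$, and a sample with no evidence at all (second row) has uncertainty 1.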
the reliability distribution matrix $P^o$ and the uncertainty vector $U^o$ of the single-omics predictions are then used for multi-omics fusion prediction; this process adopts the classical D-S evidence theory, i.e. formula (13), to fuse the information of the omics pairwise;
in formula (13), $p_i$ denotes the i-th row of the matrix P, and m takes integer values not less than 0; specifically, when m = 0, the formula fuses the prediction result $P_0$, $U_0$ of the first omics with the prediction result $P_1$, $U_1$ of the second omics to obtain $P_2$, $U_2$ as the fusion result of the two omics; when m = 1, the formula fuses the fusion result $P_2$, $U_2$ of the first two omics with the prediction result $P_3$, $U_3$ of the third omics to obtain $P_4$, $U_4$ as the fusion result of the three omics; further omics are fused in the same way until all omics have been fused, giving $P_{2m+2}$, $U_{2m+2}$;
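Formula (13) is not reproduced in this text. As a hedged illustration, the sketch below uses the reduced Dempster-Shafer combination rule commonly applied to pairs of (reliability, uncertainty) opinions: with conflict $C = \sum_{i \ne j} p^1_i p^2_j$, the fused values are $p_k = (p^1_k p^2_k + p^1_k u^2 + p^2_k u^1)/(1 - C)$ and $u = u^1 u^2/(1 - C)$. Whether this matches the patent's formula exactly is an assumption; `ds_combine` and `fuse_all` are hypothetical names.

```python
import numpy as np

def ds_combine(P1, U1, P2, U2):
    """Reduced Dempster-Shafer combination of two opinions (rows are samples)."""
    # Conflict C: reliability the two sources assign to different classes.
    agree = (P1 * P2).sum(axis=1)
    C = P1.sum(axis=1) * P2.sum(axis=1) - agree
    scale = 1.0 / (1.0 - C)                 # undefined for total conflict (C = 1)
    P = scale[:, None] * (P1 * P2 + P1 * U2[:, None] + P2 * U1[:, None])
    U = scale * U1 * U2
    return P, U

def fuse_all(opinions):
    """Fuse a list of (P, U) pairs one after another, as described for m = 0, 1, ..."""
    P, U = opinions[0]
    for P_next, U_next in opinions[1:]:
        P, U = ds_combine(P, U, P_next, U_next)
    return P, U

P_a, U_a = np.array([[0.6, 0.2]]), np.array([0.2])
P_b, U_b = np.array([[0.5, 0.1]]), np.array([0.4])
P_ab, U_ab = fuse_all([(P_a, U_a), (P_b, U_b)])
```

In this toy case both sources favour the first class, so the fused reliability of that class rises and the fused uncertainty falls below that of either source.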
after all omics have been fused, the Dirichlet distribution parameter α and the fused prediction result F for the multi-omics case are derived in reverse according to formula (12);
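The reverse derivation can be illustrated as follows, under the same assumed definitions of reliability and uncertainty used in evidential deep learning ($u_i = b/S_i$ and $p_{ij} = f_{ij}/S_i$, neither of which is reproduced in this text). The function name `opinion_to_dirichlet` is hypothetical.

```python
import numpy as np

def opinion_to_dirichlet(P, U):
    """Recover alpha and F from a fused (reliability, uncertainty) pair.

    Assumes u_i = b / S_i and p_ij = f_ij / S_i, so that
    S_i = b / u_i, F = P * S and alpha = F + 1.
    """
    P = np.asarray(P, dtype=float)
    U = np.asarray(U, dtype=float)
    b = P.shape[1]
    S = b / U                      # Dirichlet strength per sample
    F = P * S[:, None]             # fused evidence
    return F + 1.0, F

alpha_f, F_f = opinion_to_dirichlet(np.array([[8 / 13, 1 / 13, 1 / 13]]),
                                    np.array([3 / 13]))
```

The recovered α can then be passed to the fusion loss, and the row-wise maximum of F gives the predicted class.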
finally, the multi-omics fusion prediction is trained, and the fusion loss is calculated with formula (14):
$$\mathrm{Loss}_{MOIA} = \mathrm{Loss}_{right} + \lambda_{epoch}\, \mathrm{Loss}_{wrong}$$
wherein $\mathrm{Loss}_{right}$ is the loss function for the correct label, $\mathrm{Loss}_{wrong}$ is the loss function for the wrong labels, and $\mathrm{Loss}_{MOIA}$ is the total loss function; $\lambda_{epoch}$ is a loss weight that changes dynamically with the current training epoch and takes a value in (0, 1); k is the number of label classes; $y_i$ is the one-hot encoded label set of the i-th sample, and $y_{ij}$ is the element for the j-th label of the i-th sample in the one-hot encoding; $\alpha_i$ is the Dirichlet distribution parameter set of the i-th sample, and $\alpha_{ij}$ is the Dirichlet distribution parameter estimate for the j-th class of the i-th sample; Γ(·) is the gamma function, in which t is the integration parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210544114.7A CN114927162B (en) | 2022-05-19 | 2022-05-19 | Multi-mathematic association phenotype prediction method based on hypergraph characterization and dirichlet allocation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114927162A true CN114927162A (en) | 2022-08-19 |
CN114927162B CN114927162B (en) | 2024-06-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||