CN114496275A - Microorganism-disease association prediction method and system based on conditional random field - Google Patents
- Publication number
- CN114496275A (application number CN202111563953.5A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- similarity
- microorganism
- diseases
- disease
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Pathology (AREA)
- Probability & Statistics with Applications (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a method for predicting microbe-disease associations with a graph convolutional network based on a conditional random field, comprising the following steps: acquiring data on the correspondence between microorganisms and diseases, and constructing a microorganism-disease association matrix; obtaining similarity matrices among microorganisms and among diseases from the association matrix, and integrating them with the association matrix to obtain an adjacency matrix; extracting features from the microorganism and disease similarity matrices respectively and combining them into a feature matrix; generating embedding vectors from the adjacency matrix and the feature matrix with a graph convolutional network; updating the embedding vectors with a conditional random field; and reconstructing the association matrix from the updated embedding vectors. The method fully mines the characteristics of microorganisms and diseases through the graph convolutional network, and the introduced CRF layer ensures that similar microorganisms or diseases obtain similar embeddings in the feature space, which improves the accuracy of association prediction.
Description
Technical Field
The invention belongs to the technical field of medical data processing, and particularly relates to a microorganism-disease association prediction method and system based on a conditional random field.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
A microorganism is a minute organism that may exist as a single cell or as a group of cells. In recent years, as microorganisms have been found to be closely related to the prevention, diagnosis and treatment of many complex human diseases, more and more researchers have worked to reveal the associations between microorganisms and diseases. As an effective complement to traditional experiments, a growing number of computational models based on various algorithms have been proposed for microbe-disease association prediction, to improve efficiency and reduce cost.
However, despite considerable research effort to reveal the role of microorganisms in the pathogenesis of human diseases, how microorganisms affect human health and disease remains poorly understood. It is therefore necessary to investigate the associations between microorganisms and diseases. In recent years, researchers have proposed an increasing number of computational methods that predict microbe-disease associations from known association data sets, such as KATZHMDA (based on the KATZ measure), PBHMDA (path-based), PRWHMDA (based on random walk), LRLSHMDA (based on machine learning) and WMGHMDA (based on metagraphs). However, on the one hand, these methods require repeated parameter tuning to reach their best performance, which makes them inefficient; on the other hand, they lack deep mining of the features among microorganisms and among diseases, which limits prediction accuracy.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a microorganism-disease association prediction method and system based on a graph convolutional network with a conditional random field. The graph convolutional network is combined with a conditional random field so that similar diseases (or microorganisms) also remain similar, i.e., have similar embeddings, in the feature space. The potential associations between microorganisms and diseases can thus be fully mined, improving prediction accuracy.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
A method for predicting microbe-disease associations based on a graph convolutional network with a conditional random field comprises the following steps:
acquiring data on the correspondence between microorganisms and diseases, and constructing a microorganism-disease association matrix;
obtaining similarity matrices among microorganisms and among diseases according to the microorganism-disease association matrix, and integrating the similarity matrices with the microorganism-disease association matrix to obtain an adjacency matrix;
extracting features from the similarity matrices among microorganisms and among diseases respectively, and combining them to obtain a feature matrix;
generating embedding vectors from the adjacency matrix and the feature matrix with a graph convolutional network;
updating the embedding vectors with the conditional random field;
and reconstructing the association matrix from the updated embedding vectors.
Further, each row of the microorganism-disease association matrix represents a microorganism, each column represents a disease, and the elements in the matrix represent whether the corresponding microorganism is related to the disease or not.
Further, the microorganism similarity matrix is calculated as follows:
calculating the Gaussian interaction profile kernel similarity and the cosine similarity of the microorganisms respectively, and obtaining the integrated microorganism similarity from them.
Further, the disease similarity matrix is calculated as follows:
calculating the Gaussian interaction profile kernel similarity and the functional similarity of the diseases respectively, and obtaining the integrated disease similarity from them.
Further, features are extracted from the similarity matrices among microorganisms and among diseases by random walk with restart, yielding probability profile vectors for the microorganisms and the diseases.
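Further, generating embedding vectors from the adjacency matrix and the feature matrix with a graph convolutional network comprises:
$$H^{(l+1)}=\sigma\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right),\qquad \tilde{A}=A_H+I$$
where $H^{(l)}$ denotes the layer-$l$ GCN embedding, $H^{(0)}=X$, $\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$ is the normalized similarity weight matrix with self-loops, $A_H$ is the adjacency matrix of the heterogeneous network, $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{ii}=\sum_j\tilde{A}_{ij}$, $W^{(l)}$ is the weight matrix, $\sigma$ is the activation function, and $I$ is the identity matrix.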
Further, updating the embedding vectors with the conditional random field comprises:
where the initial embedding $H_i^{(0)}$ is set to $H_i$, the preliminary embedding of node $i$ obtained from the GCN convolutional layer; $\lambda_{ij}$ is the attention score between node $i$ and node $j$; $N_i$ is the set of neighbors of node $i$; and $\alpha$ and $\beta$ are weight factors that balance the effects of the first and second terms on prediction performance.
One or more embodiments provide an enhanced microorganism-disease association prediction system based on graph data, comprising:
a known-association acquisition module configured to acquire data on the correspondence between microorganisms and diseases and to construct a microorganism-disease association matrix;
an adjacency matrix calculation module configured to obtain similarity matrices among microorganisms and among diseases from the microorganism-disease association matrix, and to integrate them with the association matrix to obtain an adjacency matrix;
a feature preprocessing module configured to extract features from the similarity matrices among microorganisms and among diseases respectively and to combine them into a feature matrix;
a feature embedding module configured to generate embedding vectors from the adjacency matrix and the feature matrix with a graph convolutional network, and to update the embedding vectors with the conditional random field;
and an association prediction module configured to reconstruct the association matrix from the updated embedding vectors.
One or more embodiments provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method for predicting microorganism-disease associations based on a graph convolutional network with a conditional random field.
One or more embodiments provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for predicting microbe-disease associations based on a graph convolutional network with a conditional random field.
The above one or more technical solutions have the following beneficial effects:
The similarities among microorganisms and among diseases are calculated and a heterogeneous microorganism-disease association network is constructed; embeddings of the microorganism and disease nodes are obtained through a graph convolutional network; a CRF layer is introduced to ensure that similar microorganisms or diseases also remain similar in the feature space, i.e., obtain similar embeddings; and self-attention is used to distinguish the contributions of neighboring nodes to a given node, which improves the accuracy of subsequent association prediction.
Microorganism similarity is analyzed using the Gaussian interaction profile kernel similarity and the cosine similarity of microorganisms, and disease similarity using the functional similarity and the Gaussian interaction profile kernel similarity of diseases. This fully mines the potential associations between microorganisms and diseases, effectively supplements the small number of known associations, and supports the accuracy of subsequent association prediction.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1 is a flow diagram of the method for predicting microorganism-disease associations based on a graph convolutional network with a conditional random field in one or more embodiments of the invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example one
This embodiment discloses a method for predicting microorganism-disease associations with a graph convolutional network based on a conditional random field, which specifically comprises the following steps:
Step 1: acquire data on the correspondence between microorganisms and diseases, and construct a microorganism-disease network.
The microorganism-disease network is a graph whose nodes are diseases and microorganisms and whose edges connect each microorganism to the diseases it is associated with. To facilitate data storage and subsequent calculations, the network is stored as an adjacency matrix A in which each row represents a microorganism, each column represents a disease, and each element indicates whether the corresponding microorganism is associated with the disease: the value is 1 if they are associated and 0 otherwise. The initial data set in this example contains 450 known associations between 292 microorganisms and 39 diseases, as shown in Table 1.
Table 1. Statistics of the microbe-disease association data set.
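Building A from a list of known (microorganism, disease) pairs can be sketched as below. This is a minimal illustration, not the patent's code; the function name `build_association_matrix` and the toy names are hypothetical, and the toy input is far smaller than the 292 x 39 data set above.

```python
import numpy as np

def build_association_matrix(pairs, microbes, diseases):
    """Binary matrix A with A[i, j] = 1 if microbe i is known to be associated with disease j."""
    m_index = {name: i for i, name in enumerate(microbes)}
    d_index = {name: j for j, name in enumerate(diseases)}
    A = np.zeros((len(microbes), len(diseases)), dtype=float)
    for microbe, disease in pairs:
        A[m_index[microbe], d_index[disease]] = 1.0
    return A

# Hypothetical toy data: rows are microbes, columns are diseases
microbes = ["m1", "m2", "m3"]
diseases = ["d1", "d2"]
pairs = [("m1", "d1"), ("m2", "d1"), ("m3", "d2")]
A = build_association_matrix(pairs, microbes, diseases)
```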
Step 2: obtain the similarity networks of microorganisms and diseases from the adjacency matrix, specifically:
Step 2.1: calculate the cosine similarity and the Gaussian interaction profile kernel similarity of the microorganisms respectively.
The cosine similarity of microorganisms is calculated as:
$$CM(m(i),m(j))=\frac{A(i,:)\cdot A(j,:)^{T}}{\lVert A(i,:)\rVert\,\lVert A(j,:)\rVert}$$
where $A(i,:)$ and $A(j,:)$ denote the $i$-th and $j$-th rows of the adjacency matrix $A$.
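The cosine similarity between all pairs of rows of A can be computed at once, as in the sketch below. Treating microbes with an all-zero row as having similarity 0 to everything is an assumption made here to avoid division by zero; the text does not specify this case.

```python
import numpy as np

def cosine_similarity_rows(A):
    """CM[i, j] = A(i,:) . A(j,:) / (||A(i,:)|| ||A(j,:)||); pairs involving a zero row get 0."""
    norms = np.linalg.norm(A, axis=1)
    dots = A @ A.T
    denom = np.outer(norms, norms)
    # Divide only where the denominator is positive; leave 0 elsewhere
    return np.divide(dots, denom, out=np.zeros_like(dots, dtype=float), where=denom > 0)

A = np.array([[1, 0, 1], [1, 0, 0], [0, 0, 0]], dtype=float)
CM = cosine_similarity_rows(A)
```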
The Gaussian interaction profile kernel similarity of microorganisms is calculated as:
$$KM(m(i),m(j))=\exp\left(-\lambda_m\lVert IP(m(i))-IP(m(j))\rVert^{2}\right),\qquad \lambda_m=\frac{\lambda'_m}{\frac{1}{n_m}\sum_{i=1}^{n_m}\lVert IP(m(i))\rVert^{2}}$$
where $IP(m(i))$ is the interaction profile of microorganism $m_i$ (the $i$-th row of $A$), $n_m$ is the number of microorganisms, $\lambda_m$ is the normalized kernel bandwidth, and $\lambda'_m$ is the original bandwidth, which is usually set to 1.
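A vectorized sketch of the Gaussian interaction profile kernel follows. It assumes the usual bandwidth normalization (the original bandwidth divided by the mean squared norm of the profiles) and applies the same function to the columns of A to obtain the disease kernel KD; the function name `gip_kernel` is hypothetical.

```python
import numpy as np

def gip_kernel(profiles, bandwidth=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq = sq_norms.mean()
    lam = bandwidth / mean_sq if mean_sq > 0 else 0.0   # normalized bandwidth
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)                 # guard against rounding below zero
    return np.exp(-lam * sq_dists)

A = np.array([[1, 0, 1], [1, 0, 0], [0, 0, 0]], dtype=float)
KM = gip_kernel(A)     # microbe kernel: interaction profiles are rows of A
KD = gip_kernel(A.T)   # disease kernel: interaction profiles are columns of A
```

Expanding the squared distance avoids building an n x n x d tensor, which matters little for 292 microbes but keeps memory linear in the profile length.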
Step 2.2: calculate the functional similarity and the Gaussian interaction profile kernel similarity of the diseases respectively.
The functional similarity of diseases is calculated as follows.
Based on the hypothesis that similar diseases tend to interact with similar genes, we calculate disease functional similarity from the functional associations between disease-related genes. The recently released HumanNet v2.0 database (https://www.inetbio.org/humannet/download.php) provides efficient access to gene interactions, each of which has an associated log-likelihood score (LLS) that assesses the probability of a functional linkage between the genes. For diseases $d_i$ and $d_j$, we first obtain their related gene sets $G_i=\{g_{i1},g_{i2},\dots,g_{im}\}$ and $G_j=\{g_{j1},g_{j2},\dots,g_{jn}\}$, where $m$ is the number of genes in $G_i$ and $n$ is the number of genes in $G_j$. The functional association between a gene $g$ and a gene set $G=\{g_1,g_2,\dots,g_k\}$ is defined as follows:
where FSS denotes the functional similarity score between genes, defined as follows:
where LLS' is the normalized log-likelihood score between genes, defined as follows:
where $LLS_{max}$ and $LLS_{min}$ are the maximum and minimum LLS in the HumanNet database, respectively.
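The gene-level scores can be sketched as follows. The exact formulas are not reproduced in this text, so the sketch assumes forms commonly used with HumanNet: FSS is 1 for identical genes, the min-max normalized LLS for a linked pair, and 0 otherwise; the gene-to-set association is the best FSS against any gene in the set; and the disease-level score averages both directions over m + n genes. The LLS values and gene names below are hypothetical, not HumanNet data.

```python
def normalized_lls(lls, lls_min, lls_max):
    """LLS' = (LLS - LLS_min) / (LLS_max - LLS_min)."""
    return (lls - lls_min) / (lls_max - lls_min)

def fss(g1, g2, lls_table, lls_min, lls_max):
    """Assumed gene-gene score: 1 if identical, normalized LLS if linked, else 0."""
    if g1 == g2:
        return 1.0
    key = (g1, g2) if (g1, g2) in lls_table else (g2, g1)   # links are undirected
    if key in lls_table:
        return normalized_lls(lls_table[key], lls_min, lls_max)
    return 0.0

def fs_gene_to_set(g, gene_set, lls_table, lls_min, lls_max):
    """Assumed gene-to-set association: best FSS between g and any gene in the set."""
    return max(fss(g, h, lls_table, lls_min, lls_max) for h in gene_set)

def disease_functional_similarity(Gi, Gj, lls_table, lls_min, lls_max):
    """Assumed disease score: sum of both directions divided by m + n."""
    total = sum(fs_gene_to_set(g, Gj, lls_table, lls_min, lls_max) for g in Gi)
    total += sum(fs_gene_to_set(g, Gi, lls_table, lls_min, lls_max) for g in Gj)
    return total / (len(Gi) + len(Gj))

# Hypothetical LLS values; real scores would come from the HumanNet v2.0 download.
lls_table = {("g1", "g3"): 3.0, ("g2", "g4"): 1.0}
lls_min, lls_max = 1.0, 5.0
Gi, Gj = ["g1", "g2"], ["g3", "g2"]
fs = disease_functional_similarity(Gi, Gj, lls_table, lls_min, lls_max)
```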
Finally, we express the disease functional similarity as:
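The Gaussian interaction profile kernel similarity of diseases is:
$$KD(d(i),d(j))=\exp\left(-\lambda_d\lVert IP(d(i))-IP(d(j))\rVert^{2}\right),\qquad \lambda_d=\frac{\lambda'_d}{\frac{1}{n_d}\sum_{i=1}^{n_d}\lVert IP(d(i))\rVert^{2}}$$
where $IP(d(i))$ is the interaction profile of disease $d_i$ (the $i$-th column of $A$), $n_d$ is the number of diseases, $\lambda_d$ is the normalized kernel bandwidth, and $\lambda'_d$ is the original bandwidth, which is usually set to 1.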
Specifically, for microorganisms m(i) and m(j), if a cosine similarity exists between them, the integrated microorganism similarity MS is defined as the average of CM and KM; otherwise, MS is defined as follows:
The integrated disease similarity DS is defined analogously, as follows:
We combine the integrated microorganism similarity network MS, the integrated disease similarity network DS and the known microorganism-disease association network A into a heterogeneous network, whose adjacency matrix is
$$\begin{bmatrix} MS & A \\ A^{T} & DS \end{bmatrix}$$
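The integration step and the heterogeneous adjacency matrix can be sketched together. The text gives only the averaging case explicitly, so this sketch assumes that MS falls back to KM where CM is zero, and that DS is formed the same way from the functional similarity FS and KD; the helper names and all toy values are hypothetical.

```python
import numpy as np

def integrate(primary, gip):
    """Assumed rule: average primary and GIP similarity where primary is nonzero, else GIP alone."""
    return np.where(primary != 0, (primary + gip) / 2.0, gip)

def heterogeneous_adjacency(MS, DS, A):
    """Block matrix [[MS, A], [A^T, DS]] of size (nm + nd) x (nm + nd)."""
    return np.block([[MS, A], [A.T, DS]])

# Hypothetical toy matrices: 3 microbes, 2 diseases
A = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)
CM = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]])
KM = np.full((3, 3), 0.4)
FS = np.array([[1.0, 0.0], [0.0, 1.0]])
KD = np.full((2, 2), 0.2)
MS = integrate(CM, KM)
DS = integrate(FS, KD)
AH = heterogeneous_adjacency(MS, DS, A)
```

Because MS and DS are symmetric and the off-diagonal blocks are A and its transpose, the resulting matrix is symmetric, which the symmetric normalization in the GCN step relies on.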
Step 3: feature processing of microorganisms and diseases.
As described above, the MS and DS matrices represent microorganism and disease similarity, respectively. Each row (or column) gives the similarity distribution of a microorganism (or disease) and can be regarded as its feature vector. However, because of the limitations of the calculation methods, the computed similarities may contain noise, so using the similarity profiles directly as the input features of microorganisms and diseases is not sufficient. We therefore apply a random walk with restart (RWR) based method to extract features from the similarity profiles. RWR is a network-based method that can effectively capture both the local and the global topological characteristics of a network.
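A minimal RWR feature extractor is sketched below: the similarity matrix is column-normalized into a transition matrix, a walk restarting at each node is iterated to convergence, and each resulting vector is normalized to sum to 1. The restart probability is not given in the text, so the default of 0.5 here is a hypothetical choice, as is the function name.

```python
import numpy as np

def rwr_features(S, restart=0.5, tol=1e-8, max_iter=1000):
    """Row i is the stationary visiting distribution of a walk that restarts at node i."""
    n = S.shape[0]
    col_sums = S.sum(axis=0)
    # Column-stochastic transition matrix; all-zero columns stay zero
    W = np.divide(S, col_sums, out=np.zeros_like(S, dtype=float), where=col_sums > 0)
    E = np.eye(n)          # column i is the restart vector of node i
    P = E.copy()
    for _ in range(max_iter):
        P_next = (1 - restart) * W @ P + restart * E
        converged = np.abs(P_next - P).max() < tol
        P = P_next
        if converged:
            break
    P = P.T                # row i now belongs to node i
    return P / P.sum(axis=1, keepdims=True)   # each probability vector sums to 1

MS = np.array([[1.0, 0.6, 0.1], [0.6, 1.0, 0.3], [0.1, 0.3, 1.0]])   # toy similarity
HM = rwr_features(MS)
```

The iteration converges because the update is a contraction with factor (1 - restart); applying the same function to DS would give HD.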
After running RWR on the microorganism similarity network and the disease similarity network, we obtain a probability distribution vector for each microorganism or disease. These vectors form a new microorganism feature matrix HM and a new disease feature matrix HD. To make the features comparable across nodes, we further normalize the probability distribution vectors in HM and HD so that the probabilities in each vector sum to 1. Finally, the normalized probability profile vectors in HM and HD are used as the input features of microorganisms and diseases. The resulting feature matrix is:
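Step 4: obtain node embeddings with a graph convolutional network:
$$H^{(l+1)}=\sigma\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right),\qquad \tilde{A}=A_H+I$$
where $H^{(l)}$ denotes the layer-$l$ GCN embedding, $H^{(0)}=X$, $\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$ is the normalized similarity weight matrix with self-loops, $A_H$ is the adjacency matrix of the heterogeneous network, $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{ii}=\sum_j\tilde{A}_{ij}$, $W^{(l)}$ is the weight matrix, $\sigma$ is the activation function, and $I$ is the identity matrix.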
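The forward pass of the propagation rule above can be sketched in NumPy. ReLU is assumed for the activation, and the weight matrices are random stand-ins for parameters that would normally be learned during training, which is not shown; the helper names are hypothetical.

```python
import numpy as np

def normalized_adjacency(AH):
    """D^{-1/2} (AH + I) D^{-1/2}, with D the degree matrix of AH + I."""
    A_tilde = AH + np.eye(AH.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(AH, X, weights, activation=lambda z: np.maximum(z, 0.0)):
    """Apply H^{(l+1)} = sigma(A_hat H^{(l)} W^{(l)}) for each weight matrix, starting at H^{(0)} = X."""
    A_hat = normalized_adjacency(AH)
    H = X
    for W in weights:
        H = activation(A_hat @ H @ W)
    return H

rng = np.random.default_rng(0)
AH = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # toy heterogeneous graph
X = np.eye(3)                                                   # toy node features
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]    # untrained placeholders
H = gcn_forward(AH, X, weights)
```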
Step 5: update the embeddings through the conditional random field layer
where $H_i^{(k+1)}$ denotes the layer-$(k+1)$ embedding of node $i$; the initial embedding $H_i^{(0)}$ is set to $H_i$, the preliminary embedding of node $i$ obtained from the GCN convolutional layer; $\lambda$ denotes the attention scores between nodes, with $\lambda_{ij}$ measuring the importance of neighbor node $j$ to node $i$; $N_i$ is the set of neighbors of node $i$; and $\alpha$ and $\beta$ are weight factors that balance the effects of the first and second terms on prediction performance.
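The update formula itself is not reproduced in this text. The sketch below assumes the mean-field form commonly used for CRF layers on top of GCNs, in which the first term keeps each node close to its GCN embedding and the second pulls it toward its attention-weighted neighbors:
$H_i^{(k+1)}=\left(\alpha H_i+\beta\sum_{j\in N_i}\lambda_{ij}H_j^{(k)}\right)/\left(\alpha+\beta\sum_{j\in N_i}\lambda_{ij}\right)$.
The function name and the fixed iteration count are hypothetical.

```python
import numpy as np

def crf_update(H, lam, alpha=1.0, beta=1.0, n_iter=3):
    """Assumed mean-field CRF smoothing of GCN embeddings.

    H   : (n, d) preliminary GCN embeddings, also used as H^{(0)}
    lam : (n, n) nonnegative attention scores, lam[i, j] > 0 only for neighbors j of node i
    """
    Hk = H.copy()
    weight_sums = lam.sum(axis=1, keepdims=True)    # sum of lam_ij over j in N_i
    for _ in range(n_iter):
        numerator = alpha * H + beta * (lam @ Hk)   # first term: own embedding; second: neighbors
        Hk = numerator / (alpha + beta * weight_sums)
    return Hk

H = np.array([[0.0], [1.0]])                 # two toy nodes, one-dimensional embeddings
lam = np.array([[0.0, 1.0], [1.0, 0.0]])     # each node is the other's only neighbor
C = crf_update(H, lam, n_iter=3)             # linked nodes move closer: [[0.375], [0.625]]
```

A node with no neighbors keeps its GCN embedding unchanged, since both sums vanish; larger alpha relative to beta weakens the smoothing.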
We use self-attention to distinguish the contributions of neighboring nodes to a given node. Formally, the attention $\lambda_{ij}$ between node $i$ and node $j$ is defined as follows.
$$a_{ij}=\mathrm{att}(W_tC_i,\,W_tC_j)$$
where $C_i$ is the final embedding of node $i$. The conditional random field ensures that similar diseases (or microorganisms) also remain similar in the feature space, i.e., have similar embeddings.
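The text defines the raw score only as att(W_t C_i, W_t C_j). The sketch below assumes att is a dot product and that the scores are turned into the weights lambda_ij by a softmax over each node's neighbors; both choices, and the helper name, are illustrative rather than the patent's specification.

```python
import numpy as np

def attention_scores(C, Wt, adjacency):
    """Assumed: a_ij = (Wt C_i) . (Wt C_j), softmax-normalized over the neighbors of node i."""
    Z = C @ Wt.T                                  # row i is W_t C_i
    a = Z @ Z.T                                   # a[i, j] = att(W_t C_i, W_t C_j), dot product
    mask = adjacency > 0
    a = np.where(mask, a, -np.inf)                # non-neighbors get zero weight
    row_max = np.max(a, axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)   # rows with no neighbors
    e = np.where(mask, np.exp(a - row_max), 0.0)  # numerically stable softmax numerator
    sums = e.sum(axis=1, keepdims=True)
    return np.divide(e, sums, out=np.zeros_like(e), where=sums > 0)

C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # toy embeddings
Wt = np.eye(2)                                       # toy projection
adjacency = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
lam = attention_scores(C, Wt, adjacency)
```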
Step 6: reconstruct the association prediction matrix
The feature (embedding) matrices of microorganisms and diseases learned at the conditional random field layer are denoted $C_m$ and $C_d$, respectively; the final association prediction matrix is then:
$$O=C_mW_m(W_d)^{T}(C_d)^{T}$$
where $W_m$ and $W_d$ are latent factor matrices that project back into the original feature spaces of microorganisms and diseases, respectively.
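The reconstruction is a chain of matrix products; a shape-checked sketch follows. The embedding size, latent dimension and random matrices are hypothetical placeholders for learned values.

```python
import numpy as np

def predict_associations(Cm, Cd, Wm, Wd):
    """O = Cm Wm (Wd)^T (Cd)^T; O[i, j] scores microbe i against disease j."""
    return Cm @ Wm @ Wd.T @ Cd.T

rng = np.random.default_rng(0)
Cm = rng.normal(size=(4, 3))   # 4 microbes, embedding size 3
Cd = rng.normal(size=(2, 3))   # 2 diseases, embedding size 3
Wm = rng.normal(size=(3, 5))   # hypothetical latent dimension 5
Wd = rng.normal(size=(3, 5))
O = predict_associations(Cm, Cd, Wm, Wd)
```

O has one row per microbe and one column per disease, matching the layout of A, so unobserved entries can be ranked by score to propose candidate associations.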
Example two
This embodiment provides a system for predicting microbe-disease associations based on a graph convolutional network with a conditional random field, comprising:
a known-association acquisition module configured to acquire data on the correspondence between microorganisms and diseases and to construct a microorganism-disease association matrix;
an adjacency matrix calculation module configured to obtain similarity matrices among microorganisms and among diseases from the microorganism-disease association matrix, and to integrate them with the association matrix to obtain an adjacency matrix;
a feature preprocessing module configured to extract features from the similarity matrices among microorganisms and among diseases respectively and to combine them into a feature matrix;
a feature embedding module configured to generate embedding vectors from the adjacency matrix and the feature matrix with a graph convolutional network, and to update the embedding vectors with the conditional random field;
and an association prediction module configured to reconstruct the association matrix from the updated embedding vectors.
Example three
This embodiment provides an electronic device.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method for predicting microorganism-disease associations based on a graph convolutional network with a conditional random field according to Example one.
Example four
This embodiment provides a computer-readable storage medium.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for predicting microorganism-disease associations based on a graph convolutional network with a conditional random field according to Example one.
The steps involved in Examples two to four correspond to the method of Example one; for details, see the relevant description of Example one. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.
In one or more embodiments, microorganism similarity is analyzed using the Gaussian interaction profile kernel similarity and the cosine similarity of microorganisms, and disease similarity using the functional similarity and the Gaussian interaction profile kernel similarity of diseases, so that the potential associations between microorganisms and diseases are fully mined; the features of microorganisms and diseases are then preprocessed, and the GCN embeddings are updated by the conditional random field layer, which effectively improves prediction accuracy.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that various modifications or variations that can be made without inventive effort on the basis of the technical solution of the invention still fall within the scope of protection of the invention.
Claims (10)
1. A method for predicting microbe-disease associations based on a graph convolutional network with a conditional random field, characterized by comprising the following steps:
acquiring data on the correspondence between microorganisms and diseases, and constructing a microorganism-disease association matrix;
obtaining similarity matrices among microorganisms and among diseases according to the microorganism-disease association matrix, and integrating the similarity matrices with the microorganism-disease association matrix to obtain an adjacency matrix;
extracting features from the similarity matrices among microorganisms and among diseases respectively, and combining them to obtain a feature matrix;
generating embedding vectors from the adjacency matrix and the feature matrix with a graph convolutional network;
updating the embedding vectors with the conditional random field;
and reconstructing the association matrix from the updated embedding vectors.
2. The method for predicting microbe-disease associations based on a graph convolutional network with a conditional random field as claimed in claim 1, wherein each row of the microbe-disease association matrix represents a microbe, each column represents a disease, and the elements of the matrix indicate whether the corresponding microbe is associated with the disease.
3. The method for predicting microbe-disease association based on the graph volume network of the conditional random field as claimed in claim 1, wherein the microbe similarity matrix calculation method comprises:
and respectively calculating the nuclear similarity and cosine similarity of the Gaussian interaction profile of the microorganism, and obtaining the comprehensive similarity of the microorganism according to the nuclear similarity and cosine similarity of the Gaussian interaction profile.
4. The method for predicting microbe-disease association based on the graph volume network of the conditional random field as claimed in claim 1, wherein the disease similarity matrix calculation method is as follows:
and respectively calculating the nuclear similarity and the functional similarity of the Gaussian interaction profiles of the diseases, and obtaining the comprehensive similarity of the diseases according to the nuclear similarity and the functional similarity of the Gaussian interaction profiles.
5. The method as claimed in claim 1, wherein features are extracted from the similarity matrices among microbes and among diseases by random walk with restart, yielding probability profile vectors of the microbes and the diseases.
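Claim 5 names random walk with restart (RWR) without giving its parameters. The sketch below assumes the common matrix form $P^{(t+1)} = (1 - r) P^{(t)} M + r I$ with $P^{(0)} = I$, where $M$ is the row-normalized similarity matrix and $r$ is the restart probability; row $i$ of the converged $P$ is taken as the probability profile vector of node $i$. The restart value, tolerance, and function names are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(s: Matrix) -> Matrix:
    """Turn a non-negative similarity matrix into a row-stochastic transition matrix."""
    out = []
    for row in s:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else [0.0] * len(row))
    return out


def random_walk_with_restart(s: Matrix, restart: float = 0.5,
                             tol: float = 1e-10, max_iter: int = 1000) -> Matrix:
    """Iterate P <- (1 - r) P M + r I until the largest entry change is below tol."""
    n = len(s)
    m = row_normalize(s)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    p = [row[:] for row in identity]  # each walk starts at its own node
    for _ in range(max_iter):
        walked = [[sum(p[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                  for i in range(n)]
        new_p = [[(1 - restart) * walked[i][j] + restart * identity[i][j]
                  for j in range(n)] for i in range(n)]
        delta = max(abs(new_p[i][j] - p[i][j]) for i in range(n) for j in range(n))
        p = new_p
        if delta < tol:
            break
    return p
```

Because each row of $M$ sums to 1, every row of $P$ remains a probability distribution, and the restart term keeps each profile anchored at its own node.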
6. The method of claim 1, wherein generating the embedding vectors from the adjacency matrix and the feature matrix based on the graph convolutional network comprises:
$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right), \qquad \hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$$
wherein $H^{(l)}$ denotes the embedding at layer $l$ of the GCN, $H^{(0)} = X$; $\hat{A}$ denotes the normalized similarity weight matrix with self-loops; $\tilde{D}$ denotes the diagonal degree matrix of $A + I$; $W^{(l)}$ denotes the weight matrix; $\sigma$ denotes the activation function; and $I$ denotes the identity matrix.
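Claim 6 appears to correspond to the standard GCN propagation rule $H^{(l+1)} = \sigma(\tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2} H^{(l)} W^{(l)})$ with $H^{(0)} = X$, which matches the symbols the claim defines (self-loops via $I$, a diagonal degree matrix, a per-layer weight matrix, and an activation $\sigma$). The sketch below implements one such layer; using ReLU for $\sigma$ is an assumption, and the function names are hypothetical.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain dense matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def normalized_adjacency(a: Matrix) -> Matrix:
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, with D~ the degree matrix of A + I."""
    n = len(a)
    a_tilde = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]  # degrees >= 1
    return [[d_inv_sqrt[i] * a_tilde[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def relu(m: Matrix) -> Matrix:
    """Element-wise ReLU, used here as the assumed activation sigma."""
    return [[max(0.0, v) for v in row] for row in m]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix,
              activation: Callable[[Matrix], Matrix] = relu) -> Matrix:
    """One propagation step: H^{(l+1)} = sigma(A_hat H^{(l)} W^{(l)})."""
    return activation(matmul(matmul(a_hat, h), w))
```

Stacking several calls to `gcn_layer`, starting from `h = X`, yields the embedding matrix that is subsequently refined by the conditional random field.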
7. The method of claim 1, wherein updating the embedding vectors according to the conditional random field comprises:
wherein the initial embedding is set to $H_i$; $H_i$ denotes the preliminary embedding of node $i$ obtained from the GCN convolutional layer; $\lambda_{ij}$ denotes the attention score between node $i$ and node $j$; $N_i$ is the set of neighbors of node $i$; and $\alpha$ and $\beta$ are weight factors that balance the effects of the first term and the second term on prediction performance.
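The update formula itself does not appear in the claim text above. The sketch below assumes the mean-field update used in CRF-enhanced graph convolutional networks, $H_i^{(k+1)} = \frac{\alpha H_i + \beta \sum_{j \in N_i} \lambda_{ij} H_j^{(k)}}{\alpha + \beta \sum_{j \in N_i} \lambda_{ij}}$ with $H_i^{(0)} = H_i$. This is the closed-form minimizer, for one node with its neighbors held fixed, of $\alpha \lVert H_i' - H_i \rVert^2 + \beta \sum_{j \in N_i} \lambda_{ij} \lVert H_i' - H_j' \rVert^2$: the first term keeps the refined embedding close to the GCN output and the second pulls it toward attention-weighted neighbors. The attention scores are taken as given inputs, and the number of iterations and the function name are assumptions.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def crf_update(h: List[Vector], neighbors: Dict[int, List[int]],
               attention: Dict[Tuple[int, int], float], alpha: float = 1.0,
               beta: float = 1.0, iterations: int = 3) -> List[Vector]:
    """Mean-field CRF refinement of GCN embeddings (assumed update rule).

    h: preliminary embeddings H_i from the GCN; neighbors: N_i for each node;
    attention: lambda_ij for each (i, j) edge.
    """
    current = [row[:] for row in h]  # initial embedding H_i^{(0)} = H_i
    for _ in range(iterations):
        updated = []
        for i, h_i in enumerate(h):
            num = [alpha * v for v in h_i]  # unary term: stay close to H_i
            den = alpha
            for j in neighbors.get(i, []):
                lam = attention[(i, j)]
                den += beta * lam
                for d, v in enumerate(current[j]):  # pairwise term: neighbor pull
                    num[d] += beta * lam * v
            updated.append([v / den for v in num])
        current = updated  # synchronous update of all nodes
    return current
```

A node without neighbors keeps its GCN embedding unchanged, since the pairwise term vanishes.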
8. A microorganism-disease association prediction system enhanced with graph data, comprising:
a known-association acquisition module configured to acquire known association data between microorganisms and diseases and to construct a microorganism-disease association matrix;
an adjacency matrix calculation module configured to obtain similarity matrices among microorganisms and among diseases from the microorganism-disease association matrix, and to integrate the similarity matrices with the microorganism-disease association matrix to obtain an adjacency matrix;
a feature preprocessing module configured to extract features from the similarity matrices among microorganisms and among diseases respectively, and to combine them to obtain a feature matrix;
a feature embedding module configured to generate embedding vectors from the adjacency matrix and the feature matrix based on a graph convolutional network, and to update the embedding vectors according to the conditional random field;
and an association prediction module configured to reconstruct the association matrix from the updated embedding vectors.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for predicting microorganism-disease association based on the graph convolutional network with a conditional random field as recited in any one of claims 1-7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the method for predicting microorganism-disease association based on the graph convolutional network with a conditional random field as recited in any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111563953.5A CN114496275A (en) | 2021-12-20 | 2021-12-20 | Microorganism-disease association prediction method and system based on conditional random field |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114496275A | 2022-05-13 |
Family
ID=81494008
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111563953.5A Pending CN114496275A (en) | 2021-12-20 | 2021-12-20 | Microorganism-disease association prediction method and system based on conditional random field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114496275A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109920478A (en) * | 2019-03-07 | 2019-06-21 | 中南大学 | Microorganism-disease relationship prediction method based on similarity and low-rank matrix completion |
CN113178232A (en) * | 2021-05-06 | 2021-07-27 | 中南林业科技大学 | Efficient prediction method for association relation between circRNA and disease |
CN113345523A (en) * | 2021-05-28 | 2021-09-03 | 山东师范大学 | Microorganism-disease association prediction method and system based on graph attention network |
US20220130541A1 (en) * | 2019-02-21 | 2022-04-28 | King Abdullah University Of Science And Technology | Disease-gene prioritization method and system |
- 2021-12-20 CN CN202111563953.5A patent/CN114496275A/en active Pending
Non-Patent Citations (1)
Title |
---|
Yahui Long et al., "Predicting human microbe-drug associations via graph convolutional network with conditional random field", Bioinformatics, vol. 36, no. 19, 8 December 2020 (2020-12-08), pages 3 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117172991A (en) * | 2023-11-02 | 2023-12-05 | 北京建工环境修复股份有限公司 | Microbial remediation scheme recommendation method and system based on site pollution characteristics |
CN117172991B (en) * | 2023-11-02 | 2024-03-08 | 北京建工环境修复股份有限公司 | Microbial remediation scheme recommendation method and system based on site pollution characteristics |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110232971B (en) | | Doctor recommendation method and device |
CN110175168B (en) | | Time sequence data filling method and system based on generative adversarial network |
CN103778349B (en) | | Biomolecular network analysis method based on function module |
Baingana et al. | | Tracking switched dynamic network topologies from information cascades |
CN112052404B (en) | | Group discovery method, system, equipment and medium of multi-source heterogeneous relation network |
WO2021232789A1 (en) | | miRNA-disease association prediction method, system, terminal, and storage medium |
CN113299338B (en) | | Knowledge-graph-based synthetic lethal gene pair prediction method, system, terminal and medium |
Tripoliti et al. | | Modifications of the construction and voting mechanisms of the random forests algorithm |
Liu et al. | | Feedback message passing for inference in Gaussian graphical models |
US9043326B2 (en) | | Methods and systems for biclustering algorithm |
Boguslawski et al. | | Huffman coding for storing non-uniformly distributed messages in networks of neural cliques |
Li et al. | | A novel hybrid gene selection for tumor identification by combining multifilter integration and a recursive flower pollination search algorithm |
CN114496275A (en) | | Microorganism-disease association prediction method and system based on conditional random field |
Ruffieux et al. | | A global-local approach for detecting hotspots in multiple-response regression |
CN114974421B (en) | | Diffusion-denoising-based single-cell transcriptome sequencing data imputation method and system |
CN111309718B (en) | | Distribution network voltage data missing filling method and device |
Pimentel et al. | | Biclustering by sparse canonical correlation analysis |
Valera et al. | | General latent feature models for heterogeneous datasets |
Zhang et al. | | Deep compression of probabilistic graphical networks |
Wilderjans et al. | | Additive biclustering: A comparison of one new and two existing ALS algorithms |
CN112346997B (en) | | Automatic test case generation method and terminal |
Pournara et al. | | FPGA-accelerated Bayesian learning for reconstruction of gene regulatory networks |
Zhao et al. | | A frequency item mining based embedded feature selection algorithm and its application in energy consumption prediction of electric bus |
CN111984695B (en) | | Method and system for determining black clusters based on Spark |
CN111681705B (en) | | miRNA-disease association prediction method, system, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |