CN114496275A - Microorganism-disease association prediction method and system based on conditional random field - Google Patents

Microorganism-disease association prediction method and system based on conditional random field

Info

Publication number
CN114496275A
Authority
CN
China
Prior art keywords
matrix
similarity
microorganism
diseases
disease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111563953.5A
Other languages
Chinese (zh)
Inventor
王红
滑美芳
王正军
杨雪
杨杰
张双永
张子姗
郑子希
李维新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202111563953.5A priority Critical patent/CN114496275A/en
Publication of CN114496275A publication Critical patent/CN114496275A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field, comprising the following steps: acquiring correspondence data between microorganisms and diseases and constructing a microorganism-disease association matrix; obtaining, from the microorganism-disease association matrix, similarity matrices among microorganisms and among diseases, and integrating them with the microorganism-disease association matrix to obtain an adjacency matrix; extracting features from the microorganism and disease similarity matrices respectively and combining them into a feature matrix; generating embedding vectors from the adjacency matrix and the feature matrix with a graph convolutional network; updating the embedding vectors with a conditional random field; and reconstructing the association matrix from the updated embedding vectors. The method mines the features of microorganisms and diseases through the graph convolutional network, and the added CRF layer ensures that similar microorganisms or diseases have similar embeddings in the feature space, which improves the accuracy of association prediction.

Description

Microorganism-disease association prediction method and system based on conditional random field
Technical Field
The invention belongs to the technical field of medical data processing, and particularly relates to a microorganism-disease association prediction method and system based on a conditional random field.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
A microorganism is a minute organism that may exist in the form of a single cell or in a group of cells. In recent years, as microorganisms have been found to be closely related to prevention, diagnosis and treatment of many complex human diseases, more and more researchers have been working on revealing the association of microorganisms with diseases. As an effective complement to traditional experiments, more and more computational models based on various algorithms are proposed for microbe-disease association prediction to improve efficiency and save cost.
However, despite much research effort to reveal the role of microorganisms in the pathogenesis of human diseases, how microorganisms affect human health and disease mechanisms is still poorly understood. It is therefore necessary to investigate the associations between microorganisms and diseases. In recent years, researchers have proposed a growing number of computational methods that predict microbe-disease associations from known microbe-disease association data sets, such as KATZHMDA (based on the KATZ measure), PBHMDA (path-based), PRWHMDA (based on random walk), LRLSHMDA (based on machine learning) and WMGHMDA (based on metagraphs). On the one hand, these methods require repeated parameter tuning to reach their best performance and are therefore inefficient; on the other hand, they do not deeply mine the features among microorganisms and among diseases, which limits prediction accuracy.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a microorganism-disease association prediction method and system using a graph convolutional network based on a conditional random field. A conditional random field layer is added to the graph convolutional network to ensure that similar microorganisms (or diseases) are also similar in the feature space, i.e., have similar embeddings. The potential associations between microorganisms and diseases can therefore be fully mined, and the prediction accuracy is improved.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
a method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field comprises the following steps:
acquiring correspondence data between microorganisms and diseases, and constructing a microorganism-disease association matrix;
obtaining, from the microorganism-disease association matrix, similarity matrices among microorganisms and among diseases, and integrating them with the microorganism-disease association matrix to obtain an adjacency matrix;
extracting features from the microorganism and disease similarity matrices respectively, and combining them to obtain a feature matrix;
generating embedding vectors from the adjacency matrix and the feature matrix based on a graph convolutional network;
updating the embedding vectors according to the conditional random field;
and reconstructing the association matrix from the updated embedding vectors.
Further, each row of the microorganism-disease association matrix represents a microorganism, each column represents a disease, and the elements in the matrix represent whether the corresponding microorganism is related to the disease or not.
Further, the microorganism similarity matrix is calculated as follows:
the Gaussian interaction profile kernel similarity and the cosine similarity of the microorganisms are calculated respectively, and the integrated microorganism similarity is obtained from the two.
Further, the disease similarity matrix is calculated as follows:
the Gaussian interaction profile kernel similarity and the functional similarity of the diseases are calculated respectively, and the integrated disease similarity is obtained from the two.
Further, features are extracted from the microorganism and disease similarity matrices respectively by a random walk with restart method, yielding probability profile vectors of the microorganisms and the diseases.
Further, generating an embedded vector according to the adjacency matrix and the feature matrix based on a graph convolution network comprises:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$
where $H^{(l)}$ denotes the embedding of GCN layer $l$, with $H^{(0)} = X$; $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is the normalized similarity weight matrix with self-loops, $\tilde{A} = A_H + I$, where $A_H$ is the adjacency matrix; $\tilde{D}$ is the diagonal matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $W^{(l)}$ is the weight matrix of layer $l$; $\sigma$ is the activation function; and $I$ is the identity matrix.
Further, updating the embedded vector based on the conditional random field includes:
$$C_i^{(k+1)} = \frac{\alpha H_i + \beta \sum_{j \in N_i} \lambda_{ij}\, C_j^{(k)}}{\alpha + \beta \sum_{j \in N_i} \lambda_{ij}}$$
where the initial embedding $C_i^{(0)}$ is set to $H_i$; $H_i$ denotes the preliminary embedding of node $i$ obtained from the GCN convolutional layer; $\lambda_{ij}$ denotes the attention score between node $i$ and node $j$; $N_i$ is the set of neighbors of node $i$; and $\alpha$ and $\beta$ are weight factors that balance the effect of the first term and the second term on the prediction performance.
One or more embodiments provide a graph data-based enhanced microorganism-disease association prediction system, comprising:
a known association acquisition module configured to acquire correspondence data between microorganisms and diseases, and construct a microorganism-disease association matrix;
an adjacency matrix calculation module configured to obtain similarity matrices among microorganisms and among diseases from the microorganism-disease association matrix, and integrate them with the microorganism-disease association matrix to obtain an adjacency matrix;
a feature preprocessing module configured to extract features from the microorganism and disease similarity matrices respectively, and combine them to obtain a feature matrix;
a feature embedding module configured to generate embedding vectors from the adjacency matrix and the feature matrix based on a graph convolutional network, and update the embedding vectors according to the conditional random field;
and an association prediction module configured to reconstruct the association matrix from the updated embedding vectors.
One or more embodiments provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the above method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field when executing the program.
One or more embodiments provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field.
The above one or more technical solutions have the following beneficial effects:
By calculating the similarities among microorganisms and among diseases, a heterogeneous microorganism-disease association network is constructed; embeddings of the microorganism and disease nodes are obtained through a graph convolutional network; a CRF layer is introduced to ensure that similar microorganisms or diseases are also similar in the feature space, i.e., have similar embeddings; and self-attention is then used to distinguish the contributions of neighboring nodes to a given node, which improves the accuracy of the subsequent association prediction.
Microorganism similarity is analyzed based on the Gaussian interaction profile kernel similarity and cosine similarity of microorganisms, and disease similarity is analyzed based on the functional similarity and Gaussian interaction profile kernel similarity of diseases. The potential associations between microorganisms and diseases are thereby fully mined, which effectively supplements the small number of known associations and supports the accuracy of the subsequent association prediction.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
FIG. 1 is a flow diagram of a method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field in one or more embodiments of the invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Embodiment One
This embodiment discloses a method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field, which specifically comprises the following steps:
Step 1: acquiring correspondence data between microorganisms and diseases, and constructing a microorganism-disease network;
The microorganism-disease network uses a graph data structure: the nodes of the graph are diseases and microorganisms, and each edge connects a microorganism to a disease with which it is associated. To facilitate data storage and subsequent calculation, the microbe-disease network is stored as an adjacency matrix A. Each row of A represents a microorganism, each column represents a disease, and each element indicates whether the corresponding microorganism is associated with the corresponding disease: the element value is 1 if they are associated and 0 otherwise. The initial data set of this embodiment consists of 450 known associations between 292 microorganisms and 39 diseases, as shown in Table 1.
Table 1. Statistics of the microbe-disease association data set.
Microorganisms    Diseases    Known associations
292               39          450
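The 0/1 encoding of Step 1 can be illustrated with a minimal sketch. The function name `build_association_matrix` and the toy microbe and disease names are hypothetical and are not taken from the patent's data set; only the row/column convention (rows are microorganisms, columns are diseases) follows the text.

```python
import numpy as np

def build_association_matrix(pairs, microbes, diseases):
    """Return a |microbes| x |diseases| 0/1 matrix (rows: microbes, columns: diseases)."""
    m_index = {name: i for i, name in enumerate(microbes)}
    d_index = {name: j for j, name in enumerate(diseases)}
    A = np.zeros((len(microbes), len(diseases)), dtype=float)
    for microbe, disease in pairs:
        # A known association sets the corresponding element to 1.
        A[m_index[microbe], d_index[disease]] = 1.0
    return A

# Hypothetical toy data, for illustration only.
microbes = ["m1", "m2", "m3"]
diseases = ["d1", "d2"]
pairs = [("m1", "d1"), ("m2", "d1"), ("m3", "d2")]
A = build_association_matrix(pairs, microbes, diseases)
```

With the real data set the same construction would yield a 292 x 39 matrix containing 450 ones.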
Step 2: acquiring the similarity networks of the microorganisms and the diseases from the adjacency matrix, which specifically comprises the following steps:
Step 2.1: the cosine similarity and the Gaussian interaction profile kernel similarity of the microorganisms are calculated respectively.
The cosine similarity of the microorganisms is calculated as follows:
$$CM(m(i), m(j)) = \frac{A(i,:) \cdot A(j,:)}{\lVert A(i,:) \rVert \, \lVert A(j,:) \rVert}$$
where $A(i,:)$ denotes the i-th row of the adjacency matrix A, and $A(j,:)$ denotes the j-th row of the adjacency matrix A.
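As an illustration, the row-wise cosine similarity can be computed for all microorganism pairs at once. This is a sketch under one assumption that the text does not state: a microorganism whose row in A is all zeros is given similarity 0 to every other microorganism, to avoid division by zero.

```python
import numpy as np

def cosine_similarity_rows(A):
    """Cosine similarity between every pair of rows of A."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    denom = np.outer(norms, norms)
    sim = np.zeros((A.shape[0], A.shape[0]))
    # Assumption: pairs involving an all-zero row keep similarity 0.
    np.divide(A @ A.T, denom, out=sim, where=denom != 0)
    return sim

A_toy = np.array([[1, 0], [1, 0], [0, 1], [0, 0]])
CM = cosine_similarity_rows(A_toy)
```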
The Gaussian interaction profile kernel similarity of the microorganisms is calculated as follows:
$$KM(m(i), m(j)) = \exp\left(-\lambda_m \lVert IP(m(i)) - IP(m(j)) \rVert^2\right)$$
$$\lambda_m = \lambda'_m \Big/ \left(\frac{1}{N_m} \sum_{i=1}^{N_m} \lVert IP(m(i)) \rVert^2\right)$$
where $IP(m(i))$ denotes the interaction profile of microorganism $m(i)$, i.e., the i-th row of A; $N_m$ is the number of microorganisms; $\lambda_m$ denotes the normalized kernel bandwidth; and $\lambda'_m$ denotes the original bandwidth, which is typically set to 1.
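The Gaussian interaction profile kernel can be sketched as below. The sketch assumes the usual bandwidth normalization, in which the original bandwidth is divided by the mean squared norm of the interaction profiles, and it assumes that at least one profile is non-zero. The function name `gip_kernel` is hypothetical.

```python
import numpy as np

def gip_kernel(profiles, raw_bandwidth=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    # Normalized bandwidth: raw bandwidth over the mean squared profile norm.
    bandwidth = raw_bandwidth / sq_norms.mean()
    # Squared Euclidean distance between every pair of profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (profiles @ profiles.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-bandwidth * sq_dist)

A_toy = np.array([[1, 0], [1, 0], [0, 1]])
KM = gip_kernel(A_toy)      # microorganisms: rows of A
KD = gip_kernel(A_toy.T)    # diseases: columns of A
```

Passing the transpose of A reuses the same function for diseases, whose interaction profiles are the columns of A.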
Step 2.2: the functional similarity and the Gaussian interaction profile kernel similarity of the diseases are calculated respectively.
The functional similarity of the diseases is calculated as follows:
Based on the hypothesis that similar diseases tend to interact with similar genes, we calculate disease functional similarity from the functional associations between disease-associated genes. The HumanNet v2.0 database (https://www.inetbio.org/humannet/download.php) can be used to access gene interactions efficiently; each interaction has an associated log-likelihood score (LLS) that assesses the probability of a functional linkage between the two genes. For diseases $d_i$ and $d_j$, we first derive their related gene sets $G_i = \{g_{i1}, g_{i2}, \ldots, g_{im}\}$ and $G_j = \{g_{j1}, g_{j2}, \ldots, g_{jn}\}$, where m is the number of genes in $G_i$ and n is the number of genes in $G_j$. We define the functional association between a gene $g$ and a gene set $G = \{g_1, g_2, \ldots, g_k\}$ as follows:
$$F_G(g) = \max_{g_i \in G} FSS(g, g_i)$$
where FSS denotes the functional similarity score between genes, defined as follows:
$$FSS(g_i, g_j) = \begin{cases} 1, & i = j \\ LLS'(g_i, g_j), & i \neq j \end{cases}$$
where LLS' is the normalized log-likelihood score between genes, defined as follows:
$$LLS'(g_i, g_j) = \frac{LLS(g_i, g_j) - LLS_{min}}{LLS_{max} - LLS_{min}}$$
where $LLS_{max}$ and $LLS_{min}$ denote the maximum and minimum LLS in the HumanNet database, respectively.
Finally, we express the disease functional similarity as:
$$FS(d_i, d_j) = \frac{\sum_{g \in G_i} F_{G_j}(g) + \sum_{g \in G_j} F_{G_i}(g)}{m + n}$$
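The functional similarity computation can be sketched in plain Python. Because the original formula images are not available, the sketch assumes the definitions commonly used with HumanNet: the LLS is min-max normalized, FSS is 1 for identical genes and the normalized LLS otherwise, the gene-to-set association is the maximum FSS, and the disease similarity averages the gene-to-set associations in both directions. Treating gene pairs without a recorded interaction as FSS = 0 is a further assumption, and all gene names and scores below are hypothetical.

```python
def normalize_lls(lls):
    """Min-max normalize log-likelihood scores (assumes they are not all equal)."""
    lo, hi = min(lls.values()), max(lls.values())
    return {pair: (score - lo) / (hi - lo) for pair, score in lls.items()}

def fss(g1, g2, lls_norm):
    """Functional similarity score between two genes."""
    if g1 == g2:
        return 1.0
    # Assumption: no recorded interaction means a score of 0.
    return lls_norm.get(frozenset((g1, g2)), 0.0)

def gene_to_set(g, gene_set, lls_norm):
    """Functional association between gene g and a gene set."""
    return max(fss(g, h, lls_norm) for h in gene_set)

def disease_functional_similarity(Gi, Gj, lls_norm):
    """Average of the gene-to-set associations in both directions."""
    total = sum(gene_to_set(g, Gj, lls_norm) for g in Gi)
    total += sum(gene_to_set(g, Gi, lls_norm) for g in Gj)
    return total / (len(Gi) + len(Gj))

# Hypothetical scores; keys are unordered gene pairs.
lls_raw = {
    frozenset(("g1", "g2")): 1.0,
    frozenset(("g2", "g3")): 3.0,
    frozenset(("g1", "g3")): 2.0,
}
lls_norm = normalize_lls(lls_raw)
Gi, Gj = {"g1", "g2"}, {"g2", "g3"}
fs_ij = disease_functional_similarity(Gi, Gj, lls_norm)
```

In this toy case the four gene-to-set associations are 0.5, 1, 1 and 1, so `fs_ij` evaluates to 3.5 / 4 = 0.875.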
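The Gaussian interaction profile kernel similarity of the diseases is calculated as follows:
$$KD(d(i), d(j)) = \exp\left(-\lambda_d \lVert IP(d(i)) - IP(d(j)) \rVert^2\right)$$
$$\lambda_d = \lambda'_d \Big/ \left(\frac{1}{N_d} \sum_{i=1}^{N_d} \lVert IP(d(i)) \rVert^2\right)$$
where $IP(d(i))$ denotes the interaction profile of disease $d(i)$, i.e., the i-th column of A; $N_d$ is the number of diseases; $\lambda_d$ denotes the normalized kernel bandwidth; and $\lambda'_d$ denotes the original bandwidth, which is typically set to 1.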
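Specifically, for microorganisms m(i) and m(j), if the cosine similarity between them is non-zero, the integrated microorganism similarity is defined as the average of CM and KM; otherwise it is defined as KM. The integrated microorganism similarity MS is therefore:
$$MS(m(i), m(j)) = \begin{cases} \dfrac{CM(m(i), m(j)) + KM(m(i), m(j))}{2}, & CM(m(i), m(j)) \neq 0 \\ KM(m(i), m(j)), & \text{otherwise} \end{cases}$$
Similarly, with FS denoting the disease functional similarity, the integrated disease similarity DS is defined as:
$$DS(d(i), d(j)) = \begin{cases} \dfrac{FS(d(i), d(j)) + KD(d(i), d(j))}{2}, & FS(d(i), d(j)) \neq 0 \\ KD(d(i), d(j)), & \text{otherwise} \end{cases}$$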
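The integration rule can be written as a single element-wise operation. The sketch assumes that the fallback value, used when the first similarity is zero, is the Gaussian interaction profile kernel similarity; the function name `integrate_similarity` is hypothetical.

```python
import numpy as np

def integrate_similarity(primary, gip):
    """Average `primary` and `gip` where `primary` is non-zero, else use `gip`."""
    primary = np.asarray(primary, dtype=float)
    gip = np.asarray(gip, dtype=float)
    return np.where(primary != 0, (primary + gip) / 2.0, gip)

# Toy matrices: `primary` plays the role of CM (or the functional similarity).
primary = np.array([[1.0, 0.0], [0.0, 1.0]])
gip = np.array([[1.0, 0.4], [0.4, 1.0]])
MS = integrate_similarity(primary, gip)
```

The same function would serve for both MS (cosine plus kernel similarity) and DS (functional plus kernel similarity).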
We compose a heterogeneous network from the integrated microorganism similarity network MS, the integrated disease similarity network DS and the known microorganism-disease association network A. The adjacency matrix of the heterogeneous network is
$$A_H = \begin{bmatrix} MS & A \\ A^T & DS \end{bmatrix}$$
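Assembling the heterogeneous adjacency matrix is a block concatenation. The sketch assumes the usual block layout, with the two similarity matrices on the diagonal and the association matrix and its transpose off the diagonal; the toy values are hypothetical.

```python
import numpy as np

def heterogeneous_adjacency(MS, DS, A):
    """Stack MS, DS and A into one (n_m + n_d) x (n_m + n_d) adjacency matrix."""
    return np.block([[MS, A], [A.T, DS]])

# Toy example with 3 microorganisms and 2 diseases.
MS = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]])
DS = np.array([[1.0, 0.3], [0.3, 1.0]])
A = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
A_H = heterogeneous_adjacency(MS, DS, A)
```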
Step 3: feature processing of microorganisms and diseases;
As described above, the MS and DS matrices represent microorganism and disease similarity, respectively. Each row or column represents the similarity profile of a microorganism (or disease), which can be regarded as a feature vector of that microorganism (or disease). However, because of the limitations of the calculation methods, the calculated similarities may contain noise, so directly using the similarity profiles as input features for microorganisms and diseases is not sufficient. We therefore further apply a random walk with restart (RWR) method to extract features from the similarity profiles. RWR is a network-based approach that can effectively capture both the local and the global topological characteristics of a network.
After running RWR on the microorganism similarity network and the disease similarity network, we obtain a probability distribution vector for each microorganism or disease. These probability distribution vectors form a new microorganism feature matrix HM and a new disease feature matrix HD. To make the features comparable between different nodes, we further normalize the probability distribution vectors in the HM and HD matrices so that the probabilities in each vector sum to 1. Finally, the normalized probability profile vectors in HM and HD are used as the input features of microorganisms and diseases. The new feature matrix is:
$$X = \begin{bmatrix} HM & 0 \\ 0 & HD \end{bmatrix}$$
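A minimal sketch of the RWR feature extraction follows. The restart probability (0.5), the convergence tolerance and the column-normalized transition matrix are assumptions, since the text does not specify them; the block-diagonal arrangement of HM and HD in the feature matrix is likewise an assumed layout. The function names are hypothetical.

```python
import numpy as np

def rwr_features(S, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from every node of similarity network S.

    Row i of the result is the stationary visiting distribution of a walk
    that restarts at node i, normalized to sum to 1.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    col_sums = S.sum(axis=0, keepdims=True)
    W = S / np.where(col_sums == 0, 1.0, col_sums)  # column-normalized transitions
    E = np.eye(n)            # one restart vector per start node (as columns)
    P = E.copy()
    for _ in range(max_iter):
        P_next = (1.0 - restart) * (W @ P) + restart * E
        converged = np.allclose(P_next, P, atol=tol, rtol=0.0)
        P = P_next
        if converged:
            break
    H = P.T
    return H / H.sum(axis=1, keepdims=True)

def feature_matrix(HM, HD):
    """Assumed block-diagonal input feature matrix for the heterogeneous graph."""
    n_m, n_d = HM.shape[0], HD.shape[0]
    return np.block([[HM, np.zeros((n_m, n_d))], [np.zeros((n_d, n_m)), HD]])

MS = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]])
DS = np.array([[1.0, 0.3], [0.3, 1.0]])
HM, HD = rwr_features(MS), rwr_features(DS)
X = feature_matrix(HM, HD)
```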
Step 4: obtaining node embeddings with a graph convolutional network;
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$
where $H^{(l)}$ denotes the embedding of GCN layer $l$, with $H^{(0)} = X$; $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is the normalized similarity weight matrix with self-loops, $\tilde{A} = A_H + I$, where $A_H$ is the adjacency matrix of the heterogeneous network; $\tilde{D}$ is the diagonal matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $W^{(l)}$ is the weight matrix of layer $l$; $\sigma$ is the activation function; and $I$ is the identity matrix.
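One graph convolution layer of this standard form can be sketched in NumPy. The sketch assumes symmetric normalization of the adjacency matrix with added self-loops and a ReLU activation; the patent does not name the activation function, so ReLU is only a placeholder, and the weights here are fixed rather than learned.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A after adding self-loops."""
    A = np.asarray(A, dtype=float)
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)          # at least 1 for non-negative weights
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(A_hat, H, W, activation=relu):
    """One propagation step: activation(A_hat @ H @ W)."""
    return activation(A_hat @ H @ W)

# Toy graph with two connected nodes, identity features and identity weights.
A_toy = np.array([[0.0, 1.0], [1.0, 0.0]])
A_hat = normalize_adjacency(A_toy)
H1 = gcn_layer(A_hat, np.eye(2), np.eye(2))
```

In a trained model the weight matrices would be learned by gradient descent; stacking several calls to `gcn_layer` gives a multi-layer network.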
Step 5: updating the embeddings through the conditional random field layer
$$C_i^{(k+1)} = \frac{\alpha H_i + \beta \sum_{j \in N_i} \lambda_{ij}\, C_j^{(k)}}{\alpha + \beta \sum_{j \in N_i} \lambda_{ij}}$$
where $C_i^{(k+1)}$ denotes the embedding of node $i$ at iteration $k+1$; the initial embedding $C_i^{(0)}$ is set to $H_i$, the preliminary embedding of node $i$ obtained from the GCN convolutional layer; $\lambda_{ij}$ denotes the attention score between node $i$ and node $j$ and measures the importance of neighbor node $j$ to node $i$; $N_i$ is the set of neighbors of node $i$; and $\alpha$ and $\beta$ are weight factors that balance the effect of the first term and the second term on the prediction performance.
We use self-attention to differentiate the contributions of neighboring nodes to a given node. Formally, the attention score $\lambda_{ij}$ between node $i$ and node $j$ is defined as follows:
$$a_{ij} = att(W_t C_i, W_t C_j)$$
$$\lambda_{ij} = \frac{\exp(a_{ij})}{\sum_{k \in N_i} \exp(a_{ik})}$$
where $C_i$ denotes the embedding of node $i$. The conditional random field is used to ensure that similar microorganisms (or diseases) are also similar in the feature space, i.e., have similar embeddings.
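The CRF refinement with attention can be sketched as an iterative update. Several points are assumptions because the original formula images are not available: the update is taken to be a weighted average of the GCN embedding and the attention-weighted neighbor embeddings, the unspecified function att is replaced by a dot product of the transformed embeddings, the attention scores are softmax-normalized over each node's neighbors, and the number of iterations is arbitrary. The transform `Wt` is fixed here rather than learned.

```python
import numpy as np

def attention_scores(C, Wt, neighbor_mask):
    """Softmax attention over each node's neighbors (dot-product att is assumed)."""
    Z = C @ Wt
    logits = np.where(neighbor_mask, Z @ Z.T, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(logits - row_max)          # non-neighbors contribute 0
    denom = weights.sum(axis=1, keepdims=True)
    return weights / np.where(denom == 0, 1.0, denom)

def crf_update(H, neighbor_mask, Wt, alpha=1.0, beta=1.0, n_iter=2):
    """Refine GCN embeddings H so that neighboring nodes get similar embeddings."""
    C = H.copy()                                 # initial embedding is H
    for _ in range(n_iter):
        lam = attention_scores(C, Wt, neighbor_mask)
        numer = alpha * H + beta * (lam @ C)     # first term + second term
        denom = alpha + beta * lam.sum(axis=1, keepdims=True)
        C = numer / denom
    return C

# Toy example: 3 fully connected nodes with 2-dimensional embeddings.
H_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mask = ~np.eye(3, dtype=bool)
C_toy = crf_update(H_toy, mask, Wt=np.eye(2), alpha=1.0, beta=1.0)
```

Setting `beta=0` leaves the GCN embeddings unchanged, while a larger `beta` pulls each node's embedding closer to those of its neighbors.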
Step 6: reconstructing the association prediction matrix
The embedding matrices learned by the conditional random field layer for microorganisms and diseases are denoted $C_m$ and $C_d$, respectively. The final association prediction matrix is then:
$$O = C_m W_m (W_d)^T (C_d)^T$$
where $W_m$ and $W_d$ denote the latent factors that project the microorganism and disease embeddings back into the original feature space, respectively.
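The reconstruction is a chain of matrix products and can be sketched directly. The shapes and values below are hypothetical, and the projection matrices, which would be learned during training, are fixed to the identity for illustration.

```python
import numpy as np

def reconstruct_associations(Cm, Cd, Wm, Wd):
    """Score matrix O = Cm Wm Wd^T Cd^T (rows: microbes, columns: diseases)."""
    return Cm @ Wm @ Wd.T @ Cd.T

# Toy embeddings: 3 microorganisms and 2 diseases in a 2-dimensional space.
Cm = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Cd = np.array([[1.0, 0.0], [0.0, 1.0]])
Wm = np.eye(2)
Wd = np.eye(2)
O = reconstruct_associations(Cm, Cd, Wm, Wd)
```

Each entry of `O` is a score for one microorganism-disease pair; ranking the unknown pairs by score gives candidate associations.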
Embodiment Two
This embodiment provides a system for predicting microbe-disease associations using a graph convolutional network based on a conditional random field, which includes:
a known association acquisition module configured to acquire correspondence data between microorganisms and diseases, and construct a microorganism-disease association matrix;
an adjacency matrix calculation module configured to obtain similarity matrices among microorganisms and among diseases from the microorganism-disease association matrix, and integrate them with the microorganism-disease association matrix to obtain an adjacency matrix;
a feature preprocessing module configured to extract features from the microorganism and disease similarity matrices respectively, and combine them to obtain a feature matrix;
a feature embedding module configured to generate embedding vectors from the adjacency matrix and the feature matrix based on a graph convolutional network, and update the embedding vectors according to the conditional random field;
and an association prediction module configured to reconstruct the association matrix from the updated embedding vectors.
Embodiment Three
This embodiment provides an electronic device.
An electronic device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field according to Embodiment One.
Embodiment Four
This embodiment provides a computer-readable storage medium.
A computer-readable storage medium has a computer program stored thereon which, when executed by a processor, implements the method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field according to Embodiment One.
The steps involved in the second to fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.
In one or more of the above embodiments, microorganism similarity is analyzed based on the Gaussian interaction profile kernel similarity and cosine similarity of microorganisms, and disease similarity is analyzed based on the functional similarity and Gaussian interaction profile kernel similarity of diseases, so that the potential associations between microorganisms and diseases are fully mined. The features of microorganisms and diseases are then preprocessed, and the GCN embeddings are updated by the conditional random field layer, which effectively improves prediction accuracy.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive effort on the basis of the technical solution of the present invention.

Claims (10)

1. A method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field, characterized by comprising the following steps:
acquiring correspondence data between microorganisms and diseases, and constructing a microorganism-disease association matrix;
obtaining, from the microorganism-disease association matrix, similarity matrices among microorganisms and among diseases, and integrating them with the microorganism-disease association matrix to obtain an adjacency matrix;
extracting features from the microorganism and disease similarity matrices respectively, and combining them to obtain a feature matrix;
generating embedding vectors from the adjacency matrix and the feature matrix based on a graph convolutional network;
updating the embedding vectors according to the conditional random field;
and reconstructing the association matrix from the updated embedding vectors.
2. The method for predicting microbe-disease association based on the graph convolution network of conditional random fields as recited in claim 1, wherein each row of the microbe-disease association matrix represents a microbe, each column represents a disease, and elements in the matrix represent whether the corresponding microbe is related to the disease.
3. The method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field as claimed in claim 1, wherein the microorganism similarity matrix is calculated as follows:
the Gaussian interaction profile kernel similarity and the cosine similarity of the microorganisms are calculated respectively, and the integrated microorganism similarity is obtained from the two.
4. The method for predicting microbe-disease associations using a graph convolutional network based on a conditional random field as claimed in claim 1, wherein the disease similarity matrix is calculated as follows:
the Gaussian interaction profile kernel similarity and the functional similarity of the diseases are calculated respectively, and the integrated disease similarity is obtained from the two.
5. The method as claimed in claim 1, wherein features are extracted from the microorganism and disease similarity matrices respectively by a random walk with restart method to obtain probability profile vectors of the microorganisms and the diseases.
6. The method according to claim 1, wherein generating embedding vectors from the adjacency matrix and the feature matrix based on the graph convolutional network comprises:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$
where $H^{(l)}$ denotes the embedding at the $l$-th GCN layer, with $H^{(0)} = X$; $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ denotes the normalized similarity weight matrix with self-loops, with $\tilde{A} = A + I$ and $A$ the adjacency matrix; $\tilde{D}$ denotes a diagonal matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $W^{(l)}$ denotes the weight matrix; $\sigma$ denotes the activation function; and $I$ denotes the identity matrix.
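The symbols defined in this claim match the standard graph convolutional propagation rule, in which the self-loop adjacency matrix A + I is symmetrically normalized by its degree matrix before being multiplied by the layer input and the weight matrix. The sketch below implements one such layer in plain Python. ReLU is an assumed choice of activation, and the toy matrices are hypothetical.

```python
import math


def matmul(a, b):
    """Dense matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_with_self_loops(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a non-negative adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Row sums are at least 1 because of the added self-loop.
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[d_inv_sqrt[i] * a_tilde[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def gcn_layer(adj, h, w):
    """One propagation step: relu(A_norm @ H @ W)."""
    return relu(matmul(matmul(normalize_with_self_loops(adj), h), w))


# Toy graph with two connected nodes; X is the input feature matrix H^(0).
adj = [[0.0, 1.0], [1.0, 0.0]]
X = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, -1.0], [0.5, 2.0]]
H1 = gcn_layer(adj, X, W)
```

Because the two nodes are connected and each has a self-loop, the normalized matrix averages their features, so both rows of `H1` are identical.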
7. The method according to claim 1, wherein updating the embedding vectors according to the conditional random field comprises:
[Equation: Figure FDA0003421639740000025]
wherein the initial embedding [Figure FDA0003421639740000026] is set to $H_i$; $H_i$ denotes the preliminary embedding of node $i$ obtained from the GCN convolutional layer; $\lambda_{ij}$ denotes the attention score between node $i$ and node $j$; $N_i$ denotes the neighbors of node $i$; and $\alpha$ and $\beta$ are weight factors that balance the effect of the first term and the second term on the prediction performance.
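The update equation itself is not recoverable from the text of this claim. As an illustration only, the sketch below assumes the mean-field style update used in conditional-random-field-enhanced graph convolutional networks, in which each refined embedding is a weighted average of the node's own GCN embedding (weight α) and its neighbors' current refined embeddings (weights β·λ_ij). The function name, the fixed number of steps and the precomputed attention scores are hypothetical.

```python
def crf_update(h, neighbors, attention, alpha=1.0, beta=1.0, steps=2):
    """Iteratively refine embeddings under the assumed CRF update.

    new_i = (alpha * h_i + beta * sum_j lam_ij * cur_j)
            / (alpha + beta * sum_j lam_ij),  j in neighbors[i]

    Assumes alpha > 0 and non-negative attention scores, so the
    denominator is always positive.
    """
    current = [row[:] for row in h]  # initial refined embedding is h itself
    for _ in range(steps):
        updated = []
        for i, h_i in enumerate(h):
            weight_sum = sum(attention[i][j] for j in neighbors[i])
            denom = alpha + beta * weight_sum
            row = []
            for k in range(len(h_i)):
                neigh = sum(attention[i][j] * current[j][k] for j in neighbors[i])
                row.append((alpha * h_i[k] + beta * neigh) / denom)
            updated.append(row)
        current = updated
    return current


# Two mutually neighboring nodes with unit attention scores.
H = [[1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [1], 1: [0]}
attention = [[0.0, 1.0], [1.0, 0.0]]
H_crf = crf_update(H, neighbors, attention, alpha=1.0, beta=1.0, steps=1)
```

Under this assumed rule, setting `beta` to zero leaves the GCN embeddings unchanged, while a larger `beta` pulls neighboring embeddings closer together.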
8. A graph data-based enhanced microorganism-disease association prediction system, comprising:
a known-association acquisition module configured to acquire correspondence data between microorganisms and diseases and construct a microorganism-disease association matrix;
an adjacency matrix calculation module configured to obtain similarity matrices among microorganisms and among diseases from the microorganism-disease association matrix, and integrate the similarity matrices with the microorganism-disease association matrix to obtain an adjacency matrix;
a feature preprocessing module configured to extract features from the similarity matrices among microorganisms and among diseases respectively, and combine the extracted features to obtain a feature matrix;
a feature embedding module configured to generate embedding vectors from the adjacency matrix and the feature matrix based on a graph convolutional network, and update the embedding vectors according to a conditional random field; and
an association prediction module configured to reconstruct the association matrix from the updated embedding vectors.
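Two of the modules above can be sketched in a few lines. The block layout of the adjacency matrix (microbe similarity and disease similarity on the diagonal, the association matrix and its transpose off the diagonal) and the inner-product decoder with a sigmoid are both assumptions for illustration. The claim only states that the matrices are integrated and that the association matrix is reconstructed from the embeddings.

```python
import math


def build_adjacency(s_m, s_d, assoc):
    """Assumed block layout: [[S_m, A], [A^T, S_d]] as nested lists."""
    n_m, n_d = len(s_m), len(s_d)
    top = [list(s_m[i]) + list(assoc[i]) for i in range(n_m)]
    bottom = [[assoc[i][j] for i in range(n_m)] + list(s_d[j]) for j in range(n_d)]
    return top + bottom


def reconstruct(embeddings, n_microbes):
    """Assumed decoder: sigmoid of the microbe-disease inner product."""
    microbes, diseases = embeddings[:n_microbes], embeddings[n_microbes:]
    return [[1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(m, d))))
             for d in diseases] for m in microbes]


# Hypothetical toy inputs: two microbes, one disease.
S_m = [[1.0, 0.5], [0.5, 1.0]]
S_d = [[1.0]]
assoc = [[1], [0]]
adjacency = build_adjacency(S_m, S_d, assoc)

# Hypothetical final embeddings: two microbe rows followed by one disease row.
emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
scores = reconstruct(emb, n_microbes=2)
```

`scores[i][j]` is the predicted association strength between microbe `i` and disease `j`; ranking these scores for unobserved pairs gives candidate associations.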
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for predicting microorganism-disease associations based on a graph convolutional network with a conditional random field according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for predicting microorganism-disease associations based on a graph convolutional network with a conditional random field according to any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111563953.5A CN114496275A (en) 2021-12-20 2021-12-20 Microorganism-disease association prediction method and system based on conditional random field

Publications (1)

Publication Number Publication Date
CN114496275A (en) 2022-05-13

Family

ID=81494008

Country Status (1)

Country Link
CN (1) CN114496275A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109920478A (en) * 2019-03-07 2019-06-21 中南大学 A kind of microorganism-disease relationship prediction technique filled based on similitude and low-rank matrix
CN113178232A (en) * 2021-05-06 2021-07-27 中南林业科技大学 Efficient prediction method for association relation between circRNA and disease
CN113345523A (en) * 2021-05-28 2021-09-03 山东师范大学 Microorganism-disease association prediction method and system based on graph attention network
US20220130541A1 (en) * 2019-02-21 2022-04-28 King Abdullah University Of Science And Technology Disease-gene prioritization method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAHUI LONG et al.: "Predicting human microbe-drug associations via graph convolutional network with conditional random field", BIOINFORMATICS, vol. 36, no. 19, 8 December 2020 (2020-12-08), pages 3 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172991A (en) * 2023-11-02 2023-12-05 北京建工环境修复股份有限公司 Microbial remediation scheme recommendation method and system based on site pollution characteristics
CN117172991B (en) * 2023-11-02 2024-03-08 北京建工环境修复股份有限公司 Microbial remediation scheme recommendation method and system based on site pollution characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination