CN114242237A

CN114242237A - Graph neural network-based prediction of miRNA-disease association

Info

Publication number: CN114242237A
Application number: CN202111557995.8A
Authority: CN
Inventors: 庞善臣; 庄雨
Original assignee: China University of Petroleum East China
Current assignee: China University of Petroleum East China
Priority date: 2021-12-20
Filing date: 2021-12-20
Publication date: 2022-03-25

Abstract

The invention provides a method for predicting miRNA-disease associations based on a graph neural network. Traditional neural network models cannot handle the irregular, non-Euclidean data that arise in miRNA-disease association prediction, so the GraphSAGE model is selected to extract the features of graph nodes. First, the integrated disease similarity and the integrated miRNA similarity are mapped into the same feature space, and the known association data are used to construct a miRNA-disease bipartite graph as the input to GraphSAGE. The GraphSAGE model aggregates the information of neighbor nodes, which enriches the node feature representations and provides effective data for the downstream prediction task. Finally, the learned latent features of miRNAs and diseases are concatenated with weights and fed to a deep neural network prediction model to obtain the association score. The model parameters are trained by backpropagation using a cross-entropy loss function.

Description

Graph neural network-based prediction of miRNA-disease association

Technical Field

The invention relates to a feature extraction method, in particular to a local feature extraction method based on a graph neural network.

Background

Research shows that miRNAs, as non-coding RNAs, participate in the regulation of life activities at all levels and in most pathological processes. Identifying disease-related miRNAs is of great significance for the diagnosis and treatment of disease, but traditional biological experiments carry considerable uncertainty and are time-consuming and labor-intensive, so advanced computational models are needed to solve the problem. At present, miRNA-disease association prediction is mainly carried out with scoring models, machine learning algorithms and deep learning algorithms.

Traditional neural network models have been very successful at extracting features from Euclidean data, but they struggle with irregular non-Euclidean data. Graph neural networks emerged to address this. Their main idea is to first find the neighbor nodes of a central node and then aggregate the information carried by those neighbors into the central node by some method. Features learned in this way not only carry richer information but also preserve the topological structure of the graph to a certain extent. The invention therefore selects the GraphSAGE model to extract miRNA and disease features.

Disclosure of Invention

In view of this, the invention proposes a method for miRNA-disease association prediction based on a graph neural network. The invention uses the local information of the graph to learn a rich feature representation for each miRNA-disease pair.

The technical scheme adopted by the invention is as follows:

A. Calculating the initial feature representation of each disease based on disease semantic similarity and Gaussian interaction profile kernel similarity, and calculating the initial feature representation of each miRNA based on miRNA functional similarity and Gaussian interaction profile kernel similarity.

B. Constructing the input data of the graph neural network encoder based on the initial feature representations of miRNAs and diseases.

C. Extracting the latent features of miRNAs and diseases based on GraphSAGE.

D. Constructing a score prediction model based on a deep neural network.

E. Training the model parameters by backpropagation based on a cross-entropy loss function.

Step A calculates the initial feature representation of each disease based on disease semantic similarity and Gaussian interaction profile kernel similarity, and the initial feature representation of each miRNA based on miRNA functional similarity and Gaussian interaction profile kernel similarity. The invention downloads known miRNA-disease association data from the HMDD database and disease semantic descriptions from the MeSH database, and constructs a directed acyclic graph. The semantic similarity of diseases and the functional similarity of miRNAs are calculated from the constructed directed acyclic graph, the Gaussian interaction profile kernel similarity of diseases and of miRNAs is calculated from the known association matrix, and finally the two similarities are integrated.
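The kernel similarity and the final integration in step A can be sketched as follows. This is a minimal illustration, not the patented implementation: the bandwidth normalization (`gamma_prime` divided by the mean squared norm of the profiles) is the convention commonly used for this kernel and is assumed here, the fallback rule in `integrate_similarity` is one common integration choice that the text does not specify, and the toy matrices are made up.

```python
import numpy as np

def gip_kernel_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Bandwidth normalized by the mean squared norm of the profiles (assumed convention).
    mean_sq_norm = sq_norms.mean()
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else gamma_prime
    # Squared Euclidean distance between every pair of rows.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

def integrate_similarity(primary, gip):
    """Keep the primary similarity where it is non-zero, else fall back to GIP (assumed rule)."""
    primary = np.asarray(primary, dtype=float)
    return np.where(primary > 0, primary, gip)

# Toy association matrix: 3 miRNAs x 4 diseases (1 = known association).
assoc = np.array([[1, 0, 1, 0],
                  [1, 0, 0, 0],
                  [0, 1, 0, 1]])
mirna_gip = gip_kernel_similarity(assoc)      # rows are miRNA profiles, shape (3, 3)
disease_gip = gip_kernel_similarity(assoc.T)  # rows are disease profiles, shape (4, 4)

# Hypothetical miRNA functional similarity with missing (zero) entries.
mirna_func = np.array([[1.0, 0.6, 0.0],
                       [0.6, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
mirna_integrated = integrate_similarity(mirna_func, mirna_gip)
```

Each row of an integrated similarity matrix can then serve as the initial feature vector of the corresponding node.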

Step B constructs the input data of the graph neural network encoder based on the initial feature representations of miRNAs and diseases. Because the invention uses the DGL framework to build the graph neural network model, the input to the model is the graph together with the feature representations of its nodes, and these feature representations must share the same embedding dimension. The initial features of diseases and miRNAs are therefore transformed and unified into the same dimension.
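One way to unify the two feature sets is a separate linear projection per node type. The sketch below uses plain NumPy rather than DGL to stay framework-free; the linear form of the projection, the sizes, and the random stand-in similarity matrices are all assumptions for illustration, since the text only states that the features are brought to a common dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_projection(in_dim, out_dim, rng):
    """Randomly initialized linear map; in a real model these weights would be learned."""
    scale = np.sqrt(1.0 / in_dim)
    return rng.normal(0.0, scale, size=(in_dim, out_dim))

n_mirna, n_disease, embed_dim = 5, 3, 4  # toy sizes chosen for illustration

# Stand-ins for the integrated similarity matrices (one row = one node's initial feature).
mirna_sim = rng.random((n_mirna, n_mirna))
disease_sim = rng.random((n_disease, n_disease))

W_mirna = make_projection(n_mirna, embed_dim, rng)
W_disease = make_projection(n_disease, embed_dim, rng)

mirna_feat = mirna_sim @ W_mirna        # shape (5, 4)
disease_feat = disease_sim @ W_disease  # shape (3, 4)

# One node-feature matrix for the whole graph: miRNA nodes first, then disease nodes.
node_feat = np.vstack([mirna_feat, disease_feat])  # shape (8, 4)
```

After projection, every node carries a feature vector of length `embed_dim` regardless of its type, which is the property the encoder input requires.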

Step C extracts the latent features of miRNAs and diseases based on GraphSAGE. The invention constructs a three-layer GraphSAGE network. The first step of GraphSAGE is to select the neighbor nodes of a central node, and the second step is to aggregate the neighbor node information into the central node. In the aggregation phase, the invention uses a MEAN aggregator.
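The two GraphSAGE steps can be illustrated with a small NumPy sketch. It is not the DGL model used by the invention: it aggregates over all neighbors instead of a sampled subset, uses a dense adjacency matrix, and the layer widths, weight initialization and the choice to skip ReLU on the last layer are assumptions made for the example.

```python
import numpy as np

def sage_mean_layer(h, adj, w_self, w_neigh, activate=True):
    """One GraphSAGE layer with a mean aggregator on a dense adjacency matrix."""
    degree = adj.sum(axis=1, keepdims=True)
    # Mean of the neighbor features; isolated nodes receive a zero vector.
    neigh_mean = (adj @ h) / np.maximum(degree, 1.0)
    # Combine the node's own features with the aggregated neighbor features.
    out = h @ w_self + neigh_mean @ w_neigh
    return np.maximum(out, 0.0) if activate else out

def sage_encoder(h, adj, weights):
    """Stack of GraphSAGE layers; ReLU follows every layer except the last."""
    for i, (w_self, w_neigh) in enumerate(weights):
        h = sage_mean_layer(h, adj, w_self, w_neigh, activate=i < len(weights) - 1)
    return h

rng = np.random.default_rng(0)

# Toy bipartite graph: nodes 0-2 are miRNAs, nodes 3-4 are diseases.
adj = np.zeros((5, 5))
for m, d in [(0, 3), (1, 3), (1, 4), (2, 4)]:
    adj[m, d] = adj[d, m] = 1.0

dims = [4, 8, 8, 4]  # input dimension followed by three layer widths (illustrative)
weights = [(rng.normal(0, 0.1, (dims[i], dims[i + 1])),
            rng.normal(0, 0.1, (dims[i], dims[i + 1]))) for i in range(3)]

h0 = rng.random((5, 4))                      # initial node features
embeddings = sage_encoder(h0, adj, weights)  # shape (5, 4)
```

With three layers, each node's embedding depends on nodes up to three hops away, so a miRNA embedding draws on its associated diseases, their other miRNAs, and those miRNAs' diseases.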

Step D constructs a score prediction model based on a deep neural network. The invention builds a network of three fully connected layers: two hidden layers, with ReLU activation functions between the layers, followed by an output layer for score prediction whose output is passed through a sigmoid activation function to give the score.
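A forward pass through such a network might look like the sketch below. The hidden widths (16 and 8), the input width, and the weight initialization are illustrative assumptions; the text fixes only the layer count and the activation functions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(sizes, rng):
    """One (weight, bias) pair per fully connected layer."""
    return [(rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out)), np.zeros(fan_out))
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]

def predict_scores(pair_features, params):
    """Two ReLU hidden layers followed by a sigmoid-activated scalar output."""
    h = pair_features
    for w, b in params[:-1]:
        h = relu(h @ w + b)
    w_out, b_out = params[-1]
    return sigmoid(h @ w_out + b_out).ravel()

rng = np.random.default_rng(0)
params = init_mlp([8, 16, 8, 1], rng)  # three fully connected layers in total
pair_features = rng.random((6, 8))     # six hypothetical miRNA-disease pairs
scores = predict_scores(pair_features, params)
```

The sigmoid keeps every score strictly between 0 and 1, so it can be read as the predicted probability that the pair is associated.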

Step E trains the model parameters by backpropagation based on a cross-entropy loss function. The method uses the cross-entropy loss function to measure the difference between the predicted values and the labels, and backpropagates this difference to train the model and obtain the optimal parameters.
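The training signal can be illustrated on the smallest possible case: a single sigmoid output unit trained by gradient descent on binary cross-entropy. The synthetic data, learning rate and step count are assumptions for the example; the full model would backpropagate the same loss through the prediction network and the GraphSAGE encoder.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(scores, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted scores and 0/1 labels."""
    scores = np.clip(scores, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(labels * np.log(scores) + (1 - labels) * np.log(1 - scores)))

rng = np.random.default_rng(0)
features = rng.normal(size=(40, 5))  # synthetic pair features
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
labels = (features @ true_w > 0).astype(float)  # synthetic association labels

w = np.zeros(5)
losses = []
for _ in range(200):
    scores = sigmoid(features @ w)
    losses.append(binary_cross_entropy(scores, labels))
    # Gradient of the mean loss with respect to w for a sigmoid output unit.
    grad = features.T @ (scores - labels) / len(labels)
    w -= 0.5 * grad  # plain gradient descent step
```

Because the weights start at zero, every initial score is 0.5 and the first recorded loss equals ln 2; later entries in `losses` show how the loss falls as the parameters are updated.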

The technical scheme provided by the invention has the beneficial effects that:

The method applies the GraphSAGE model to miRNA-disease association prediction and predicts unknown miRNA-disease associations from a small amount of known association data, which reduces the cost of traditional biological experiments and greatly shortens the time needed for association prediction. The method is of practical significance: potentially related miRNAs can be predicted for a disease, providing a reference for the diagnosis and treatment of diseases and for the research and development of new drugs.

Drawings

FIG. 1 is a schematic flow chart of the miRNA-disease association prediction method based on a graph neural network according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the method of the present invention is described below in further detail with reference to the accompanying drawing.

In the data preprocessing stage, the data are taken from the HMDD database and the MeSH database. The initial feature representations of the disease and miRNA nodes are obtained from these two databases, and a miRNA-disease association bipartite graph is constructed.
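The bipartite graph can be represented in several ways. The sketch below builds a dense symmetric adjacency matrix from a toy association matrix; the node ordering (miRNA nodes first, then disease nodes) and the dense representation are assumptions for illustration, and a graph library such as DGL would normally store the same edges in sparse form.

```python
import numpy as np

def build_bipartite_adjacency(assoc):
    """Symmetric adjacency matrix over miRNA nodes followed by disease nodes."""
    assoc = np.asarray(assoc, dtype=float)
    n_mirna, n_disease = assoc.shape
    n_nodes = n_mirna + n_disease
    adj = np.zeros((n_nodes, n_nodes))
    adj[:n_mirna, n_mirna:] = assoc    # miRNA -> disease edges
    adj[n_mirna:, :n_mirna] = assoc.T  # disease -> miRNA edges
    return adj

# Toy association matrix: 3 miRNAs x 2 diseases (1 = known association).
assoc = np.array([[1, 0],
                  [1, 1],
                  [0, 1]])
adj = build_bipartite_adjacency(assoc)

# Disease 0 is node 3 in the combined indexing; its neighbors are miRNA nodes.
neighbors_of_disease_0 = np.nonzero(adj[3])[0]
```

Edges only ever connect a miRNA node to a disease node, so the two diagonal blocks of `adj` stay zero.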

The constructed bipartite graph is input into the GraphSAGE encoder, which enriches the embedded representation of each central node by aggregating the information of its local neighbors in the graph topology, thereby improving the accuracy of model prediction.

The learned latent features of miRNAs and diseases are concatenated with weights to form the input data of the deep neural network. The network predicts the miRNA-disease association score, the loss is calculated with a cross-entropy loss function, and backpropagation is performed to train the model parameters.
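The text does not say how the weights in the weighted concatenation are chosen. The sketch below assumes a single scalar `alpha` that scales the miRNA embedding, with `1 - alpha` scaling the disease embedding; this parameterization, the function name and the toy embeddings are hypothetical.

```python
import numpy as np

def weighted_concat(mirna_emb, disease_emb, pairs, alpha=0.5):
    """Build one feature vector per (miRNA, disease) pair by weighted concatenation."""
    m_idx, d_idx = np.asarray(pairs).T
    return np.hstack([alpha * mirna_emb[m_idx], (1.0 - alpha) * disease_emb[d_idx]])

mirna_emb = np.arange(6, dtype=float).reshape(3, 2)    # 3 miRNAs, 2-dim embeddings
disease_emb = np.arange(4, dtype=float).reshape(2, 2)  # 2 diseases, 2-dim embeddings
pairs = [(0, 1), (2, 0)]                               # (miRNA index, disease index)

pair_features = weighted_concat(mirna_emb, disease_emb, pairs, alpha=0.5)
# pair_features has shape (2, 4): one row per pair
```

Each row of `pair_features` is then a candidate input to the score prediction network, with the label taken from the known association matrix.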

Claims

1. A method for miRNA-disease association prediction based on a graph neural network, comprising the following parts:

C. Extracting the latent features of miRNAs and diseases based on GraphSAGE.

D. Constructing a score prediction model based on a deep neural network.

2. The method of claim 1, wherein the initial feature representation of each disease is calculated based on disease semantic similarity and Gaussian interaction profile kernel similarity, and the initial feature representation of each miRNA is calculated based on miRNA functional similarity and Gaussian interaction profile kernel similarity. The invention downloads known miRNA-disease association data from the HMDD database and disease semantic descriptions from the MeSH database, and constructs a directed acyclic graph. The semantic similarity of diseases and the functional similarity of miRNAs are calculated from the constructed directed acyclic graph, the Gaussian interaction profile kernel similarity of diseases and of miRNAs is calculated from the known association matrix, and finally the two similarities are integrated.

3. The method of claim 1, wherein the input data of the graph neural network encoder are constructed based on the initial feature representations of miRNAs and diseases. Because the invention uses the DGL framework to build the graph neural network model, the input to the model is the graph together with the feature representations of its nodes, and these feature representations must share the same embedding dimension. The initial features of diseases and miRNAs are therefore transformed and unified into the same dimension.

4. The method of claim 1, wherein the latent features of miRNAs and diseases are extracted based on GraphSAGE. The invention constructs a three-layer GraphSAGE network. The first step of GraphSAGE is to select the neighbor nodes of a central node, and the second step is to aggregate the neighbor node information into the central node. In the aggregation phase, the invention uses a MEAN aggregator.

5. The method of claim 1, wherein the score prediction model is constructed based on a deep neural network. The invention builds a network of three fully connected layers: two hidden layers, with ReLU activation functions between the layers, followed by an output layer for score prediction whose output is passed through a sigmoid activation function to give the score.

6. The method of claim 1, wherein the model parameters are trained by backpropagation based on a cross-entropy loss function. The method uses the cross-entropy loss function to measure the difference between the predicted values and the labels, and backpropagates this difference to train the model and obtain the optimal parameters.