CN114242237A - Graph neural network-based prediction of miRNA-disease association - Google Patents
- Publication number
- CN114242237A (application CN202111557995.8A)
- Authority
- CN
- China
- Prior art keywords
- mirna
- disease
- similarity
- neural network
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Software Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Public Health (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Biotechnology (AREA)
- Bioethics (AREA)
- Pathology (AREA)
- Primary Health Care (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention provides a method for predicting miRNA-disease associations based on a graph neural network. Traditional neural network models cannot process the irregular, non-Euclidean data that arise in miRNA-disease association prediction, so the GraphSAGE model is selected to extract the features of the graph nodes. First, the integrated disease similarity and the integrated miRNA similarity are mapped into the same feature space, and the known association data are used to construct a miRNA-disease bipartite graph that serves as the input to GraphSAGE. The GraphSAGE model aggregates the information of neighbor nodes, which enriches the node feature representations and provides effective data for the downstream prediction task. Finally, the learned latent features of the miRNA and the disease are concatenated with weights and fed to a deep neural network prediction model, which outputs the association score. The model parameters are trained by backpropagation using a cross-entropy loss function.
Description
Technical Field
The invention relates to a feature extraction method, in particular to a local feature extraction method based on a graph neural network.
Background
Research shows that miRNAs, as non-coding RNAs, participate in the regulation of life activities at all levels and in most pathological processes. Identifying disease-related miRNAs is of great significance for the diagnosis and treatment of disease, but traditional biological experiments carry considerable uncertainty and are time-consuming and labor-intensive, so advanced computational models are needed. At present, miRNA-disease association prediction is mainly realized through scoring models, machine learning algorithms, and deep learning algorithms.
Traditional neural network models have been very successful on Euclidean data but handle irregular, non-Euclidean data poorly. Graph neural networks emerged to address this. Their main idea is to first find the neighbor nodes of a central node and then aggregate the information carried by those neighbors into the central node by some method. Features learned in this way carry richer information and also preserve the topological structure of the graph to a certain extent. Therefore, the invention selects the GraphSAGE model to extract miRNA and disease features.
Disclosure of Invention
In view of this, the invention proposes a method for predicting miRNA-disease associations based on a graph neural network. The invention uses the local information of the graph to learn a rich feature representation for each miRNA-disease pair.
The technical scheme adopted by the invention is as follows:
A. Calculating the initial feature representation of each disease based on disease semantic similarity and Gaussian interaction profile kernel similarity, and calculating the initial feature representation of each miRNA based on miRNA functional similarity and Gaussian interaction profile kernel similarity.
B. Constructing the input data of the graph neural network encoder based on the initial feature representations of the miRNAs and diseases.
C. Extracting the latent features of the miRNAs and diseases based on GraphSAGE.
D. Constructing a score prediction model based on a deep neural network.
E. Training the model parameters by backpropagation based on a cross-entropy loss function.
Step A calculates the initial feature representation of each disease from disease semantic similarity and Gaussian interaction profile kernel similarity, and the initial feature representation of each miRNA from miRNA functional similarity and Gaussian interaction profile kernel similarity. The invention downloads the known miRNA-disease association data from the HMDD database, downloads the semantic descriptions of diseases from the MeSH database, and constructs a directed acyclic graph. The disease semantic similarity and the miRNA functional similarity are calculated from the constructed directed acyclic graph, the Gaussian interaction profile kernel similarity of the diseases and of the miRNAs is calculated from the known association matrix, and finally the two similarities are integrated for each node type.
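The similarity computation in step A can be sketched as follows. The text gives no formulas, so this uses the commonly cited definition of the Gaussian interaction profile kernel, exp(-gamma * ||a_i - a_j||^2), with the bandwidth normalised by the mean squared profile norm. The function names, the `gamma_prime` parameter, and the fallback rule in `integrate` (keep the semantic or functional similarity where it is non-zero, otherwise use the kernel value) are assumptions for illustration, not the procedure stated in the patent.

```python
import math


def gip_kernel_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows of `profiles`.

    Each row is the binary association profile of one entity, for example
    one miRNA against all diseases. `gamma_prime` is an assumed default.
    """
    n = len(profiles)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


def integrate(primary, gip):
    """Assumed integration rule: keep `primary` where non-zero, else use `gip`."""
    return [[p if p != 0 else g for p, g in zip(prow, grow)]
            for prow, grow in zip(primary, gip)]


# Toy association matrix: 3 miRNAs (rows) x 2 diseases (columns).
assoc = [[1, 0], [1, 0], [0, 1]]
mirna_gip = gip_kernel_similarity(assoc)
```

The disease-side kernel would be obtained the same way from the transposed association matrix.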
Step B constructs the input data of the graph neural network encoder from the initial feature representations of the miRNAs and diseases. Because the invention uses the DGL framework to build the graph neural network model, the input to the model consists of the graph and the node feature representations, and all node features must have the same embedding dimension. The initial features of the diseases and miRNAs are therefore mapped into a common dimension.
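The text does not say how the two feature sets are brought to the same dimension. One plausible reading, sketched here under that assumption, is a separate linear projection per node type. The weights below are random and untrained, and `make_projection`, `project`, and `embed_dim` are hypothetical names.

```python
import random


def make_projection(in_dim, out_dim, seed):
    """Random (untrained) weight matrix of shape in_dim x out_dim."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)]
            for _ in range(in_dim)]


def project(features, weight):
    """Multiply each feature row by the weight matrix."""
    out_dim = len(weight[0])
    return [[sum(x * weight[k][j] for k, x in enumerate(row))
             for j in range(out_dim)]
            for row in features]


# Rows of the integrated similarity matrices act as initial node features.
mirna_sim = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.2], [0.1, 0.2, 1.0]]  # 3 miRNAs
disease_sim = [[1.0, 0.3], [0.3, 1.0]]                            # 2 diseases

embed_dim = 4  # assumed shared embedding dimension
mirna_feat = project(mirna_sim, make_projection(3, embed_dim, seed=0))
disease_feat = project(disease_sim, make_projection(2, embed_dim, seed=1))
```

After projection both node types carry vectors of length `embed_dim`, which is what allows them to be stored as node features on a single bipartite graph.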
Step C extracts the latent features of the miRNAs and diseases based on GraphSAGE. The invention constructs a three-layer GraphSAGE network. GraphSAGE first selects the neighbor nodes of a central node and then aggregates the neighbor information into the central node. In the aggregation phase, the invention uses a MEAN aggregator.
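The two GraphSAGE steps described above, neighbor selection followed by mean aggregation, can be illustrated with a single dependency-free layer. This is a sketch of one common formulation (a self transform plus a transform of the neighbor mean, followed by ReLU), not the DGL implementation the invention uses. The toy graph, the identity weights, and all function names are assumptions.

```python
def mean_aggregate(features, neighbors):
    """For each node, average the feature vectors of its neighbours."""
    dim = len(next(iter(features.values())))
    aggregated = {}
    for node in features:
        nbrs = neighbors.get(node, [])
        if nbrs:
            aggregated[node] = [sum(features[u][k] for u in nbrs) / len(nbrs)
                                for k in range(dim)]
        else:
            aggregated[node] = [0.0] * dim  # isolated node: nothing to aggregate
    return aggregated


def matvec(vec, weight):
    """Row vector times weight matrix (in_dim x out_dim)."""
    return [sum(v * weight[k][j] for k, v in enumerate(vec))
            for j in range(len(weight[0]))]


def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One layer: ReLU(h_v @ w_self + mean(h_u for u in N(v)) @ w_neigh)."""
    aggregated = mean_aggregate(features, neighbors)
    updated = {}
    for node, feat in features.items():
        combined = [s + n for s, n in zip(matvec(feat, w_self),
                                          matvec(aggregated[node], w_neigh))]
        updated[node] = [max(0.0, c) for c in combined]
    return updated


# Toy bipartite graph: miRNAs m0 and m1 are both associated with disease d0.
features = {"m0": [1.0, 0.0], "m1": [0.0, 1.0], "d0": [1.0, 1.0]}
neighbors = {"m0": ["d0"], "m1": ["d0"], "d0": ["m0", "m1"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = sage_mean_layer(features, neighbors, identity, identity)
```

A three-layer network would apply such a layer three times, each with its own trainable weights, so that every node eventually sees information from up to three hops away.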
Step D constructs a score prediction model based on a deep neural network. The invention builds a network of three fully connected layers: two hidden layers, with ReLU activation functions between the layers, followed by an output layer for score prediction whose output is passed through a sigmoid activation function to give the score.
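A forward pass through such a predictor might look like the sketch below. The layer widths (8, 16, 8, 1), the initialisation range, and the helper names are assumptions, since the text fixes only the number of layers and the activation functions.

```python
import math
import random


def linear(x, weight, bias):
    """Fully connected layer: x @ weight + bias."""
    return [sum(xi * weight[i][j] for i, xi in enumerate(x)) + bias[j]
            for j in range(len(bias))]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(in_dim, out_dim, rng):
    """Small random weights and zero biases (untrained)."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(out_dim)]
              for _ in range(in_dim)]
    return weight, [0.0] * out_dim


def predict_score(pair_features, layers):
    """Two hidden layers with ReLU, then a single sigmoid output unit."""
    h = pair_features
    for weight, bias in layers[:-1]:
        h = relu(linear(h, weight, bias))
    weight, bias = layers[-1]
    return sigmoid(linear(h, weight, bias)[0])


rng = random.Random(0)
layers = [init_layer(8, 16, rng), init_layer(16, 8, rng), init_layer(8, 1, rng)]
score = predict_score([0.1] * 8, layers)
```

The sigmoid keeps the output in the open interval (0, 1), so it can be read as an association probability and paired directly with a cross-entropy loss.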
Step E trains the model parameters by backpropagation based on a cross-entropy loss function. The method uses the cross-entropy loss function to measure the difference between the predicted values and the labels, and backpropagates this loss to train the model and obtain the optimal parameters.
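For a binary label (associated or not) and a sigmoid output, the cross-entropy loss takes the familiar binary form. The sketch below computes the mean loss over a batch. The clipping constant `eps` and the function name are assumptions added for numerical safety and illustration.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Two predictions: one known association (label 1) and one negative sample (label 0).
loss = binary_cross_entropy([0.9, 0.2], [1, 0])
```

Confident, correct predictions drive the loss toward zero, while confident wrong ones are penalised heavily, which is the signal that backpropagation uses to update the encoder and predictor weights.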
The technical scheme provided by the invention has the following beneficial effects:
The method applies the GraphSAGE model to miRNA-disease association prediction and predicts unknown miRNA-disease associations from a small amount of known association data, which reduces the cost of traditional biological experiments and greatly shortens the time needed for association prediction. The method has practical significance: it can predict potentially related miRNAs for a disease and provides a reference for disease diagnosis and treatment and for the development of new drugs.
Drawings
FIG. 1 is a schematic flow chart of the miRNA-disease association prediction method based on a graph neural network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the prediction method of the present invention is described in further detail below with reference to the accompanying drawings.
In the data preprocessing stage, the data are taken from the HMDD database and the MeSH database. The initial feature representations of the disease and miRNA nodes are obtained from these two databases, and a miRNA-disease association bipartite graph is constructed.
The constructed bipartite graph is input into the GraphSAGE encoder, which enriches the embedded representation of each central node by aggregating the information of its local neighbors in the graph topology, thereby improving the accuracy of model prediction.
The learned latent features of the miRNA and the disease are concatenated with weights to form the input data of the deep neural network. The network predicts the miRNA-disease association score, the loss is calculated with the cross-entropy loss function, and backpropagation is performed to train the model parameters.
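The weighting scheme for the concatenation is not specified. One simple interpretation, shown here purely as an assumption, scales each embedding by a scalar weight before joining the two vectors. `weighted_concat` and its default weights are hypothetical.

```python
def weighted_concat(mirna_vec, disease_vec, w_mirna=0.5, w_disease=0.5):
    """Scale each embedding by its weight, then concatenate into one vector."""
    return [w_mirna * x for x in mirna_vec] + [w_disease * x for x in disease_vec]


# Example pair: a 2-dimensional miRNA embedding and a 1-dimensional disease embedding.
pair_input = weighted_concat([1.0, 2.0], [4.0], w_mirna=0.5, w_disease=0.25)
```

The resulting vector has the combined length of the two embeddings and serves as the per-pair input to the score predictor.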
Claims (6)
1. A method for predicting miRNA-disease associations based on a graph neural network, comprising the following steps:
A. Calculating the initial feature representation of each disease based on disease semantic similarity and Gaussian interaction profile kernel similarity, and calculating the initial feature representation of each miRNA based on miRNA functional similarity and Gaussian interaction profile kernel similarity.
B. Constructing the input data of the graph neural network encoder based on the initial feature representations of the miRNAs and diseases.
C. Extracting the latent features of the miRNAs and diseases based on GraphSAGE.
D. Constructing a score prediction model based on a deep neural network.
E. Training the model parameters by backpropagation based on a cross-entropy loss function.
2. The method of claim 1, wherein the initial feature representation of each disease is calculated based on disease semantic similarity and Gaussian interaction profile kernel similarity, and the initial feature representation of each miRNA is calculated based on miRNA functional similarity and Gaussian interaction profile kernel similarity. The known miRNA-disease association data are downloaded from the HMDD database, the semantic descriptions of diseases are downloaded from the MeSH database, and a directed acyclic graph is constructed. The disease semantic similarity and the miRNA functional similarity are calculated from the constructed directed acyclic graph, the Gaussian interaction profile kernel similarity of the diseases and of the miRNAs is calculated from the known association matrix, and finally the two similarities are integrated for each node type.
3. The method of claim 1, wherein the input data of the graph neural network encoder are constructed from the initial feature representations of the miRNAs and diseases. Because the DGL framework is used to build the graph neural network model, the input to the model consists of the graph and the node feature representations, and all node features must have the same embedding dimension. The initial features of the diseases and miRNAs are therefore mapped into a common dimension.
4. The method of claim 1, wherein the latent features of the miRNAs and diseases are extracted based on GraphSAGE. A three-layer GraphSAGE network is constructed, in which GraphSAGE first selects the neighbor nodes of a central node and then aggregates the neighbor information into the central node. In the aggregation phase, a MEAN aggregator is used.
5. The method of claim 1, wherein the score prediction model is constructed based on a deep neural network. The network has three fully connected layers: two hidden layers, with ReLU activation functions between the layers, followed by an output layer for score prediction whose output is passed through a sigmoid activation function to give the score.
6. The method of claim 1, wherein the model parameters are trained by backpropagation based on a cross-entropy loss function. The cross-entropy loss function measures the difference between the predicted values and the labels, and this loss is backpropagated to train the model and obtain the optimal parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111557995.8A CN114242237A (en) | 2021-12-20 | 2021-12-20 | Graph neural network-based prediction of miRNA-disease association |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111557995.8A CN114242237A (en) | 2021-12-20 | 2021-12-20 | Graph neural network-based prediction of miRNA-disease association |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114242237A true CN114242237A (en) | 2022-03-25 |
Family
ID=80758750
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111557995.8A Pending CN114242237A (en) | 2021-12-20 | 2021-12-20 | Graph neural network-based prediction of miRNA-disease association |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114242237A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115798598A (en) * | 2022-11-16 | 2023-03-14 | 大连海事大学 | Hypergraph-based miRNA-disease association prediction model and method |
CN115798598B (en) * | 2022-11-16 | 2023-11-14 | 大连海事大学 | Hypergraph-based miRNA-disease association prediction model and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication |