CN114999635A - circRNA-disease association relation prediction method based on graph convolution neural network and node2vec - Google Patents
- Publication number
- CN114999635A
- Authority
- CN
- China
- Prior art keywords
- circrna
- disease
- similarity
- graph
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Public Health (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Pathology (AREA)
- Computing Systems (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Artificial Intelligence (AREA)
- Physiology (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
A circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec comprises the following steps: acquiring a circRNA-disease association matrix; calculating the circRNA functional similarity, the circRNA Gaussian interaction profile (GIP) kernel similarity, the disease GIP kernel similarity, and the disease semantic similarity, constructing the circRNA integrated similarity and the disease integrated similarity, and generating a circRNA-disease heterogeneous graph; using a sparse autoencoder to perform feature extraction and transformation on the circRNA (disease) integrated similarity, converting it into a 64-dimensional feature vector, and fusing the circRNA feature vector and the disease feature vector into a final circRNA-disease feature vector; extracting local structure information of nodes from the circRNA-disease heterogeneous graph with the graph convolutional neural network; extracting global structure information of nodes from the circRNA-disease heterogeneous graph with the Node2vec method; and feeding the node information obtained in the preceding two steps into a random forest classifier to predict potential circRNA-disease associations. Predicting disease-related circRNAs computationally saves time and helps to clarify disease pathogenesis and to find effective treatment schemes.
Description
Technical Field
The invention relates to the field of association prediction in bioinformatics, and in particular to a circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec.
Background
In 1976, the first circRNA was discovered in a study of RNA viruses. Because of its unusual structure, unknown function, and low abundance, circRNA was long regarded as an artifact or a product of mis-splicing. With the development of sequencing technologies, more and more circRNAs have been identified in thousands of organisms, such as plants, animals, and bacteria. circRNAs have been found to have important molecular functions: they participate in regulating gene expression, act as molecular sponges that absorb microRNAs (miRNAs) and thereby inhibit miRNA activity, regulate the expression of messenger RNA, and so on. Mutation or dysfunction of circRNAs disrupts various vital activities and thereby causes disease. Studying the mechanisms of circRNAs in disease occurrence and their role in disease treatment, and understanding the associations between circRNAs and diseases, is therefore an important topic in bioinformatics research; it benefits disease prognosis, diagnosis, and treatment and opens a new avenue for future research.
Traditional biological experimental verification requires a large amount of manpower and material resources; it is accurate but time-consuming. Mining the biological characteristics of the data with computational methods to predict circRNA-disease associations is convenient and efficient. Current computational methods for predicting circRNA-disease associations fall into two broad categories: network propagation-based methods and machine learning-based methods.
Network propagation-based methods use circRNA and disease association data to construct circRNA (disease) similarity networks and predict potential circRNA-disease associations. Fan et al. developed the computational model KATZHCDA, which applies the KATZ measure to a heterogeneous network built from circRNA expression profiles, disease phenotype similarity, and known circRNA-disease associations. The model successfully predicts circRNA-disease associations on the heterogeneous network with a simple measure, but it is not suitable for new diseases without any known circRNA association or for isolated circRNAs without any known disease association. Li et al. proposed DWNCPCDA, which predicts circRNA-disease associations using DeepWalk and network consistency projection. Its advantage is that the network embedding method DeepWalk learns node embeddings of the known circRNA-disease association network and is combined with a similarity-based method, which gives circRNA-disease association prediction greater flexibility. In the future, more biomedical association data on circRNAs or diseases, such as circRNA-miRNA associations and miRNA-disease associations, are to be integrated to further improve prediction performance.
Machine learning-based methods mine deep features of circRNA and disease data with supervised or unsupervised learning, optimize model parameters iteratively, and design classifiers to identify disease-related circRNAs. Lei et al. proposed the computational method RWRKNN, which applies the random walk with restart algorithm to weighted features carrying global network topology information and then classifies on those features with a K-nearest neighbor algorithm to improve prediction performance. However, RWRKNN is somewhat weak at revealing associations for new circRNAs or new diseases that have no known association. Ding et al. developed RWLR, a computational model based on random walk and logistic regression for predicting circRNA-disease associations. It uses random walk with restart to obtain the global structure information of each circRNA, which works better than relying on similarity alone, and it can make predictions for novel circRNAs that have no known disease association. However, RWLR considers only circRNA similarity and does not include sufficient disease information, which results in poor prediction accuracy. Zhang et al. proposed iGRLCDA, a graph representation learning-based method for identifying potential circRNA-disease associations. The method uses a deep learning model combining a graph convolutional neural network with graph factorization and effectively mines higher-level circRNA and disease information. However, iGRLCDA depends on the properties of the known circRNA-disease associations and is therefore less sensitive to new circRNA-disease associations.
In view of this, studying prediction methods for circRNA-disease associations is of great importance. The invention provides a circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec to predict potential circRNA-disease associations.
Disclosure of Invention
The invention aims to solve problems of existing circRNA-disease association prediction models, such as low prediction accuracy and time-consuming training, by providing a circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec that improves prediction accuracy and reduces training cost.
The technical scheme of the invention specifically comprises the following steps:
Step 1: Obtain the circRNA-disease association matrix.
Experimentally verified circRNA-disease association data are acquired from the circR2Disease database. After redundant data are deleted, only the known associations related to human complex diseases are selected to form the circRNA-disease association matrix.
Step 2: Calculate the disease semantic similarity, the disease Gaussian interaction profile (GIP) kernel similarity, the circRNA functional similarity, and the circRNA GIP kernel similarity; construct the circRNA integrated similarity and the disease integrated similarity; and generate the circRNA-disease heterogeneous graph.
The annotation terms of each disease are acquired from the MeSH database, and the semantic similarity between diseases is calculated using directed acyclic graphs (DAGs) to obtain the disease semantic similarity. The circRNA (disease) GIP kernel similarity is calculated from the circRNA-disease association matrix. The circRNA functional similarity is calculated from the disease semantic similarity and the circRNA-disease association matrix. To overcome the inherent sparsity of these measures, complementary information from multiple data sources and different representation methods is integrated, and an integrated similarity is used to quantify the similarity of each circRNA (disease) pair, yielding the circRNA integrated similarity matrix and the disease integrated similarity matrix.
Step 3: A sparse autoencoder performs feature extraction and transformation on the circRNA (disease) integrated similarity, converts it into a 64-dimensional feature vector, and fuses the circRNA feature vector and the disease feature vector into the final circRNA-disease feature vector.
A sparse autoencoder not only learns features automatically but can also give a better feature description than the original data. Replacing the original data with the features learned by the sparse autoencoder improves the prediction performance of the model to a certain extent. The invention therefore applies a sparse autoencoder separately to the circRNA and disease integrated similarities, minimizes the error between input and output with the backpropagation algorithm, and extracts and transforms features to obtain 64-dimensional circRNA (disease) feature vectors. Finally, the circRNA and disease feature vectors are combined to obtain the final circRNA-disease feature vector.
Step 4: The graph convolutional neural network extracts local structure information of nodes from the circRNA-disease heterogeneous graph.
Local structure information describes the local similarity between nodes in the graph. Specifically, if two nodes are connected by an edge, they are also connected in the embedding space; if no edge connects two nodes, their first-order proximity is 0. The graph convolutional neural network takes the structure of the circRNA-disease heterogeneous graph and the features of the circRNA (disease) nodes as input and outputs pooled node information and graph structure information, from which the local structure information is obtained.
Step 5: The Node2vec method extracts global structure information of nodes from the circRNA-disease heterogeneous graph.
Global structure information describes the relationship between two nodes that are not directly connected. Node2vec is a targeted improvement on DeepWalk: it samples the graph by random walks and maps the adjacency structure of nodes into sequences. A Skip-gram model is then trained on the sampled sequences to capture the connectivity between nodes and obtain global structure information.
Step 6: Feed the node information obtained in the preceding two steps into a random forest classifier and predict potential circRNA-disease associations.
The node information obtained in the preceding two steps is fed into a random forest classifier to predict potential circRNA-disease associations. Five-fold cross-validation is used to obtain the AUC and AUPR values of the invention and the prediction results.
Drawings
FIG. 1 is a schematic flow diagram of the circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec.
FIG. 2 shows the ROC curves of an implementation of the present invention.
FIG. 3 shows the PR curves of an implementation of the present invention.
Detailed Description
The invention relates to a circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec. The present invention is described in further detail below with reference to specific embodiments and simulation experiments. Those skilled in the art should understand that these embodiments only explain the technical principle of the present invention and are not intended to limit its scope of protection.
As shown in FIG. 1, the circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec specifically includes the following steps:
Preferably, the acquisition of the association matrix in Step 1 is specifically as follows:
A total of 739 experimentally verified circRNA-disease associations (involving 661 circRNAs and 100 diseases) were obtained from the circR2Disease database. After redundant data are deleted, only the 650 known associations related to human complex diseases (involving 585 circRNAs and 88 diseases) are selected to form the known association matrix $A \in \mathbb{R}^{n_c \times n_d}$, where $n_c$ and $n_d$ denote the numbers of circRNAs and diseases, respectively. If circRNA $c_i$ and disease $d_j$ have an experimentally verified known association, the matrix element is defined as $A(c_i, d_j) = 1$; if they have no experimentally verified known association, the matrix element is defined as $A(c_i, d_j) = 0$.
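The construction of the association matrix can be sketched as follows. This is a minimal illustration only: the record names are hypothetical toy values rather than circR2Disease entries, and the helper name `build_association_matrix` is an assumption introduced here.

```python
def build_association_matrix(pairs):
    """Build the binary matrix A from (circRNA, disease) pairs.

    Duplicate pairs collapse to a single entry, which mirrors the
    removal of redundant records.
    """
    circrnas = sorted({c for c, _ in pairs})
    diseases = sorted({d for _, d in pairs})
    c_index = {c: i for i, c in enumerate(circrnas)}
    d_index = {d: j for j, d in enumerate(diseases)}
    A = [[0] * len(diseases) for _ in circrnas]
    for c, d in pairs:
        A[c_index[c]][d_index[d]] = 1  # A(c_i, d_j) = 1 for a known association
    return A, circrnas, diseases


# Hypothetical toy records; the last one duplicates the first.
pairs = [
    ("circA", "glioma"),
    ("circB", "glioma"),
    ("circB", "gastric cancer"),
    ("circA", "glioma"),
]
A, circrnas, diseases = build_association_matrix(pairs)
```

Rows follow the sorted circRNA names and columns the sorted disease names, so every entry without a recorded association stays 0.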
Preferably, the calculation of disease semantic similarity in Step 2 is specifically as follows:
A disease semantic similarity matrix $DS \in \mathbb{R}^{n_d \times n_d}$ is built from downloaded disease-related data. The semantic contribution value of any disease $d_t$ to disease $d_i$ is calculated as follows:
[equation not reproduced in the source text]
where $\sigma$ denotes the attenuation coefficient of the semantic contribution.
The matrix element $DS(d_i, d_j)$ denotes the semantic similarity between disease $d_i$ and disease $d_j$ and is calculated as follows:
[equation not reproduced in the source text]
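Since the two equations are not reproduced in this text, the following is only a sketch under an assumption: it uses the widely used DAG-based formulation in which a disease contributes 1 to itself, each ancestor receives the largest contribution of its children multiplied by σ, and the similarity of two diseases is the summed contribution of their shared DAG terms divided by the sum of both semantic values. The toy hierarchy, the function names, and σ = 0.5 are illustrative assumptions, not values taken from the invention.

```python
def semantic_contributions(disease, parents, sigma=0.5):
    """Return {term: contribution to `disease`} for the disease and its ancestors."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        term = frontier.pop()
        for parent in parents.get(term, []):
            value = sigma * contrib[term]
            # Keep the largest contribution over all paths to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(d_i, d_j, parents, sigma=0.5):
    """Shared-term contributions divided by the sum of both semantic values."""
    c_i = semantic_contributions(d_i, parents, sigma)
    c_j = semantic_contributions(d_j, parents, sigma)
    shared = set(c_i) & set(c_j)
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))


# Hypothetical toy hierarchy: child term -> list of parent terms.
parents = {
    "lung cancer": ["cancer"],
    "liver cancer": ["cancer"],
    "cancer": ["disease"],
}
sim = semantic_similarity("lung cancer", "liver cancer", parents)
self_sim = semantic_similarity("lung cancer", "lung cancer", parents)
```

In this toy hierarchy each cancer has the semantic value 1 + 0.5 + 0.25 = 1.75, the shared terms contribute 1.5, and the similarity is 1.5 / 3.5.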
Preferably, the calculation of circRNA functional similarity in Step 2 is specifically as follows:
The functional similarity of circRNAs is measured by their tendency to be associated with phenotypically similar diseases and is represented by the matrix $CS \in \mathbb{R}^{n_c \times n_c}$. The matrix element $CS(c_i, c_j)$ denotes the functional similarity between circRNAs $c_i$ and $c_j$ and is calculated as follows:
[equation not reproduced in the source text]
where the set $D_i$ denotes the diseases associated with circRNA $c_i$; the set $D_j$ denotes the diseases associated with circRNA $c_j$; and $|D_i|$ and $|D_j|$ denote the numbers of diseases in $D_i$ and $D_j$, respectively.
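The functional similarity equation is not reproduced in this text. A common formulation that matches the symbols defined above, assumed here, scores each disease in one set by its best semantic match in the other set, sums these scores in both directions, and divides by $|D_i| + |D_j|$. The helper names and the toy similarity values below are hypothetical.

```python
def best_match(d, disease_set, DS):
    """Highest semantic similarity between disease d and any disease in the set."""
    return max(DS[d][other] for other in disease_set)


def functional_similarity(D_i, D_j, DS):
    """Assumed best-match-average form of CS(c_i, c_j).

    D_i, D_j: lists of disease indices associated with each circRNA.
    DS: disease semantic similarity matrix as a list of rows.
    """
    if not D_i or not D_j:
        return 0.0  # no associated diseases, so no evidence of similarity
    total = sum(best_match(d, D_j, DS) for d in D_i)
    total += sum(best_match(d, D_i, DS) for d in D_j)
    return total / (len(D_i) + len(D_j))


# Hypothetical semantic similarities among three diseases.
DS = [
    [1.0, 0.4, 0.1],
    [0.4, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
cs = functional_similarity([0, 1], [1, 2], DS)
```

Here the best matches are 0.4 and 1.0 in one direction and 1.0 and 0.2 in the other, so the result is 2.6 / 4 = 0.65.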
Preferably, the calculation of the circRNA (disease) Gaussian interaction profile (GIP) kernel similarity in Step 2 is specifically as follows:
The circRNA (disease) GIP kernel similarity is calculated by combining the association matrix and the disease semantic similarity. The matrix $DK \in \mathbb{R}^{n_d \times n_d}$ represents the GIP kernel similarity of diseases, and the matrix element $DK(d_i, d_j)$ denotes the GIP kernel similarity between disease $d_i$ and disease $d_j$, calculated as follows:
$$DK(d_i, d_j) = \exp\left(-\mu_d \lVert A(c_i, d_i) - A(c_i, d_j) \rVert^2\right)$$
where the parameter $\mu_d$ controls the kernel bandwidth of the GIP similarity.
Similarly, the matrix $CK \in \mathbb{R}^{n_c \times n_c}$ represents the GIP kernel similarity of circRNAs, and the matrix element $CK(c_i, c_j)$ denotes the GIP kernel similarity between circRNAs $c_i$ and $c_j$, calculated as follows:
$$CK(c_i, c_j) = \exp\left(-\mu_c \lVert A(c_i, d_j) - A(c_j, d_j) \rVert^2\right)$$
where the parameter $\mu_c$ controls the kernel bandwidth of the GIP similarity.
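The GIP kernel can be sketched as follows, treating each row of the association matrix as the interaction profile of a circRNA and each column as the interaction profile of a disease. The text does not say how the bandwidth parameters are set; dividing a base value `mu_prime` by the mean squared profile norm is a common normalization and is an assumption here, as are the function name and the toy matrix.

```python
import math


def gip_kernel(profiles, mu_prime=1.0):
    """Gaussian interaction profile kernel over a list of binary profiles."""
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    # Assumed bandwidth normalization; falls back to mu_prime for all-zero data.
    mu = mu_prime / mean_sq_norm if mean_sq_norm > 0 else mu_prime
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            K[i][j] = math.exp(-mu * dist_sq)
    return K


# Toy association matrix: 3 circRNAs (rows) by 2 diseases (columns).
A = [[1, 0], [1, 1], [0, 1]]
CK = gip_kernel(A)                                 # circRNA kernel from rows
DK = gip_kernel([list(col) for col in zip(*A)])    # disease kernel from columns
```

Identical profiles give a similarity of 1, and the similarity decays exponentially with the squared distance between profiles.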
Preferably, the calculation of the circRNA (disease) integrated similarity in Step 2 is specifically as follows:
Because disease semantic similarity and circRNA functional similarity are inherently sparse, complementary information from multiple data sources and different representation methods is integrated, and an integrated similarity is used to quantify the similarity of each circRNA (disease) pair and overcome this sparsity. The matrix $X_d \in \mathbb{R}^{n_d \times n_d}$ represents the integrated similarity of diseases, and the matrix element $X_d(d_i, d_j)$ is calculated as follows:
[equation not reproduced in the source text]
The circRNA integrated similarity is represented by the matrix $X_c \in \mathbb{R}^{n_c \times n_c}$, and the matrix element $X_c(c_i, c_j)$ is calculated as follows:
[equation not reproduced in the source text]
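The integration rule itself is not reproduced in this text. One common convention, assumed here purely for illustration, keeps the semantic (or functional) similarity wherever it is nonzero and falls back to the GIP kernel similarity otherwise; other conventions, such as averaging the two, are equally plausible. The function name and toy matrices are hypothetical.

```python
def integrate_similarity(primary, gip):
    """Use the primary similarity where it is positive, else the GIP kernel value.

    primary: semantic similarity (diseases) or functional similarity (circRNAs).
    gip: the matching GIP kernel similarity matrix of the same shape.
    """
    return [
        [p if p > 0 else g for p, g in zip(p_row, g_row)]
        for p_row, g_row in zip(primary, gip)
    ]


# Hypothetical toy matrices for three diseases.
DS = [[1.0, 0.0, 0.6], [0.0, 1.0, 0.0], [0.6, 0.0, 1.0]]
DK = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.5], [0.2, 0.5, 1.0]]
X_d = integrate_similarity(DS, DK)
```

The zero entries of the sparse matrix are filled from the kernel matrix, which is how this convention addresses sparsity.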
Preferably, in Step 3 the sparse autoencoder performs feature extraction and transformation on the circRNA (disease) integrated similarity, converts it into a 64-dimensional feature vector, and fuses the circRNA feature vector and the disease feature vector into the final circRNA-disease feature vector, specifically as follows:
The sparse autoencoder encodes the original input features and reduces their dimensionality in order to find latent associations among the input features and extract expressive high-order features. It consists of an encoder and a decoder and is a three-layer neural network comprising an input layer, a hidden layer, and an output layer, in which the input layer $x$ is mapped to the hidden layer $y$. The encoder is calculated as follows:
$$y = \mathrm{sigmoid}(W_1 x(i) + a_1)$$
where sigmoid denotes the activation function; $W_1$ denotes the connection parameters between the input layer $x$ and the hidden layer $y$; and $a_1$ denotes the bias.
The decoder is calculated as follows:
$$z = \mathrm{sigmoid}(W_2 y + a_2)$$
where $W_2$ denotes the connection parameters between the hidden layer $y$ and the output layer $z$, and $a_2$ denotes the bias.
The circRNA and disease integrated similarities are each fed into a sparse autoencoder, and features are extracted and transformed by minimizing the error between input and output with the backpropagation algorithm, yielding the 64-dimensional feature vectors $Z_c$ and $Z_d$, respectively. The two are combined to obtain the final circRNA-disease feature vector $Z_{cd}$, calculated as follows:
[equation not reproduced in the source text]
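The encoder and decoder equations can be illustrated with a forward pass in plain Python. This sketch is deliberately limited: it evaluates only $y$, $z$, and the reconstruction error for untrained random weights, and it omits the sparsity penalty and the backpropagation update. The input vectors are random stand-ins for rows of the integrated similarity matrices, and fusing $Z_c$ and $Z_d$ by concatenation is an assumption, since the fusion equation is not reproduced in this text.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer(W, bias, x):
    """Compute sigmoid(W x + bias) for a weight matrix given as a list of rows."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W, bias)]


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases (an illustrative initialization)."""
    W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    return W1, [0.0] * n_hidden, W2, [0.0] * n_in


def autoencode(x, params):
    W1, a1, W2, a2 = params
    y = layer(W1, a1, x)  # encoder: y = sigmoid(W1 x + a1)
    z = layer(W2, a2, y)  # decoder: z = sigmoid(W2 y + a2)
    error = sum((zi - xi) ** 2 for zi, xi in zip(z, x)) / len(x)
    return y, z, error


rng = random.Random(0)
n_c, n_d, n_hidden = 585, 88, 64  # matrix sizes and code length from the text
x_c = [rng.random() for _ in range(n_c)]  # stand-in row of the circRNA similarity
x_d = [rng.random() for _ in range(n_d)]  # stand-in row of the disease similarity
Z_c, _, err_c = autoencode(x_c, init_params(n_c, n_hidden, rng))
Z_d, _, err_d = autoencode(x_d, init_params(n_d, n_hidden, rng))
Z_cd = Z_c + Z_d  # assumed fusion: concatenation into a 128-dimensional vector
```

Training would adjust the weights to reduce `err_c` and `err_d`; the hidden activations of the trained encoders would then serve as the 64-dimensional features.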
Preferably, the extraction of local node structure information from the circRNA-disease heterogeneous graph by the graph convolutional neural network in Step 4 is specifically as follows:
The graph convolutional neural network takes the structure of the graph and the features of each node as input and outputs pooled node information and graph (node) structure information, from which the local structure information of the graph is obtained. To this end, the known circRNA-disease association matrix $A$ is first converted into an adjacency matrix. Local structure information is then obtained using a spatial approach to the graph convolutional neural network, calculated as follows:
[equation not reproduced in the source text]
where ReLU(·) denotes the activation function of the two-layer neural network; $\tilde{D}$ denotes the degree matrix of $\tilde{A}$; $W$ denotes the weight matrix; and $\tilde{A}$ denotes the adjacency matrix with added self-loops, obtained by adding the identity matrix $I$ to the adjacency matrix.
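The propagation equation is not reproduced in this text. The symbols defined above (a degree matrix, a weight matrix, and an adjacency matrix with self-loops) match the widely used renormalized propagation rule $\mathrm{ReLU}(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W)$, so the sketch below assumes that rule for a single layer. Building the adjacency matrix as a bipartite graph over circRNA and disease nodes is also an assumption, and the feature matrix and weights are toy values.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def bipartite_adjacency(A):
    """Assumed adjacency: circRNA nodes first, then disease nodes."""
    n_c, n_d = len(A), len(A[0])
    n = n_c + n_d
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n_c):
        for j in range(n_d):
            if A[i][j]:
                adj[i][n_c + j] = adj[n_c + j][i] = 1.0
    return adj


def gcn_layer(adj, X, W):
    """One layer of ReLU(D^-1/2 (adj + I) D^-1/2 X W)."""
    n = len(adj)
    A_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]  # at least 1 because of the self-loop
    A_hat = [[A_tilde[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
             for i in range(n)]
    H = matmul(matmul(A_hat, X), W)
    return [[max(0.0, v) for v in row] for row in H]


A = [[1], [0]]                      # toy data: 2 circRNAs, 1 disease
adj = bipartite_adjacency(A)        # 3 nodes; only node 0 and node 2 are linked
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # one-hot node features
W = [[1.0, -1.0], [2.0, -3.0], [5.0, -1.0]]               # toy weight matrix
H = gcn_layer(adj, X, W)
```

Nodes 0 and 2 share an edge, so each ends up with the average of their two weight rows, while the isolated node 1 keeps only its own row; negative values are clipped by ReLU.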
Preferably, the extraction of global node structure information from the circRNA-disease heterogeneous graph by the Node2vec method in Step 5 is specifically as follows:
Node2vec is a semi-supervised algorithm for scalable feature learning in networks that maximizes the likelihood of preserving the network neighborhoods of nodes in a d-dimensional feature space. First, the graph is sampled by random walks, which map the adjacency structure of nodes into sequences; a Skip-gram model is then trained on the sampled sequences to capture the connectivity between nodes and obtain global structure information.
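The sampling stage can be sketched with a second-order biased random walk. The return parameter `p` and in-out parameter `q` come from the published node2vec algorithm; their values, the toy graph, and the function name are assumptions, because the text gives no settings. Training the Skip-gram model on the resulting walks is not shown.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Sample one biased walk; graph maps each node to a set of neighbors."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducible sampling
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)       # stay within distance 1 of prev
            else:
                weights.append(1.0 / q)   # move farther away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy circRNA-disease graph.
graph = {
    "c1": {"d1", "d2"},
    "c2": {"d1"},
    "d1": {"c1", "c2"},
    "d2": {"c1"},
}
rng = random.Random(42)
walks = [node2vec_walk(graph, node, 6, p=1.0, q=0.5, rng=rng) for node in sorted(graph)]
```

Each walk is a node sequence that plays the role of a sentence for Skip-gram training; choosing `q` below 1 pushes walks outward, which favors the more global view this step is meant to capture.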
Preferably, feeding the information into the random forest classifier in Step 6 is specifically as follows:
The node information obtained in the preceding two steps is fed into a random forest classifier to predict potential circRNA-disease associations and obtain the prediction results.
The technical effects of the invention are further illustrated by the following experimental verification:
1. Experimental conditions and contents:
The experiments were performed on an AMD 1.80 GHz CPU running the Windows 10 operating system.
2. Analysis of experimental results:
The prediction accuracy for circRNA-disease associations is evaluated by five-fold cross-validation, with ROC and PR as the evaluation metrics. ROC is the area under the ROC curve, which plots the false positive rate (FPR) on the abscissa and the true positive rate (TPR) on the ordinate; PR is the area under the precision-recall curve, which plots recall on the abscissa and precision on the ordinate. Larger ROC and PR values indicate higher accuracy.
The ROC and PR curves obtained by five-fold cross-validation are shown in FIGS. 2 and 3.
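The evaluation protocol can be sketched as follows. The area under the ROC curve is computed in its equivalent pairwise form (the probability that a positive sample is scored above a negative one, with ties counted as one half), and the fold helper shows one simple way to split samples for five-fold cross-validation. The function names, labels, and scores are illustrative assumptions and are not results of the invention.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Hypothetical labels and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
folds = k_fold_indices(10, k=5)
```

In each round one fold is held out for testing and the remaining four are used for training; the per-fold metric values are then averaged.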
The above description is only one specific example of the present invention and should not be construed as limiting the invention in any way. It will be apparent to persons skilled in the relevant art that various modifications and changes in form and detail can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (7)
1. A circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec, characterized by comprising the following steps:
Step 1: acquiring a circRNA-disease association matrix;
Step 2: calculating the disease semantic similarity, the disease Gaussian interaction profile (GIP) kernel similarity, the circRNA functional similarity, and the circRNA GIP kernel similarity, constructing the circRNA integrated similarity and the disease integrated similarity, and generating a circRNA-disease heterogeneous graph;
Step 3: performing, by a sparse autoencoder, feature extraction and transformation on the circRNA (disease) integrated similarity, converting it into a 64-dimensional feature vector, and combining the circRNA feature vector and the disease feature vector into a final circRNA-disease feature vector;
Step 4: extracting, by the graph convolutional neural network, local structure information of nodes from the circRNA-disease heterogeneous graph;
Step 5: extracting, by the Node2vec method, global structure information of nodes from the circRNA-disease heterogeneous graph;
Step 6: feeding the node information obtained in the preceding two steps into a random forest classifier and predicting potential circRNA-disease associations.
2. The circRNA-disease association prediction method based on graph-convolution neural network and Node2vec as claimed in claim 1, wherein in step 1, specifically:
acquiring circRNA-Disease associated data verified by experiments from a circR2Disease database, deleting redundant data, and only selecting known associated data related to human complex diseases as a circRNA-Disease associated matrix.
3. The circRNA-disease association prediction method based on graph-convolution neural network and Node2vec as claimed in claim 1, wherein in step 2, specifically:
acquiring related annotation words of each disease from a MESH database, and calculating semantic similarity among the diseases by utilizing a Directed Acyclic Graph (DAG) to obtain the semantic similarity of the diseases; calculating the core similarity of the circRNA (disease) Gaussian interaction spectrum according to the circRNA-disease association matrix; calculating the functional similarity of the circRNA according to the semantic similarity of the diseases and the circRNA-disease association matrix; by integrating complementary information from multiple data sources and different representation methods, integration similarity is adopted to quantify each pair of disease similarity to overcome inherent sparsity, and a circRNA integration similarity matrix and a disease integration similarity matrix are obtained.
4. The circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec as claimed in claim 1, wherein step 3 specifically comprises:
a sparse autoencoder not only learns features automatically but also gives a better feature description than the original data; replacing the original data with the features learned by the sparse autoencoder improves the prediction performance of the model to a certain extent; therefore, the invention applies a sparse autoencoder separately to the circRNA integrated similarity and the disease integrated similarity, minimizing the error between input and output through the back-propagation algorithm, and extracts and transforms features to obtain 64-dimensional circRNA (disease) feature vectors; finally, the circRNA and disease feature vectors are combined to obtain the final circRNA-disease feature vector.
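The claim does not state the loss function. One common formulation of a sparse autoencoder, given here purely as an assumption, combines the reconstruction error with a KL-divergence sparsity penalty on the mean hidden activations. The symbols $W_1, b_1, W_2, b_2$, the sigmoid $\sigma$, the target sparsity $\rho$ and the weight $\beta$ are illustrative and do not come from the patent; only the hidden size of 64 does.

$$
h_i = \sigma(W_1 x_i + b_1) \in \mathbb{R}^{64}, \qquad \hat{x}_i = \sigma(W_2 h_i + b_2)
$$

$$
J = \frac{1}{2n} \sum_{i=1}^{n} \lVert \hat{x}_i - x_i \rVert^2 + \beta \sum_{j=1}^{64} \mathrm{KL}\left(\rho \,\Vert\, \hat{\rho}_j\right), \qquad \hat{\rho}_j = \frac{1}{n} \sum_{i=1}^{n} h_{ij}
$$

$$
\mathrm{KL}\left(\rho \,\Vert\, \hat{\rho}_j\right) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}
$$

Under this reading, $x_i$ is one row of the circRNA (or disease) integrated similarity matrix and $h_i$ is the resulting 64-dimensional feature vector. If "combined" is taken to mean concatenation, which is also an assumption, a circRNA-disease pair would be described by the 128-dimensional vector $[h_c ; h_d]$.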
5. The circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec as claimed in claim 1, wherein step 4 specifically comprises:
the local structure information describes the local similarity between nodes in the graph; specifically, if two nodes are connected by an edge, the two nodes are also related in the embedding space; if two nodes are not connected by an edge, their first-order proximity is 0; the graph convolutional neural network takes as input the structure of the circRNA-disease heterogeneous graph and the features of the circRNA (disease) nodes, and outputs pooled node information and graph structure information, from which the local structure information is obtained.
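How a graph convolution mixes each node's features with those of its direct neighbors can be sketched with a single propagation layer. This sketch assumes the widely used rule ReLU(D^-1/2 (A + I) D^-1/2 H W); the claim does not specify the layer, and the names `gcn_layer`, `adj`, `features` and `weights`, as well as the toy graph, are hypothetical. In practice the weights would be learned; here they are fixed to the identity so the output is easy to check by hand.

```python
import math

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]

# Toy heterogeneous graph: nodes 0, 1 are circRNAs; nodes 2, 3 are diseases.
# Edges (known associations): 0-2, 1-2, 1-3.
adj = [[0, 0, 1, 0],
       [0, 0, 1, 1],
       [1, 1, 0, 0],
       [0, 1, 0, 0]]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # node-type indicator
weights = [[1.0, 0.0], [0.0, 1.0]]                            # identity, for illustration

hidden = gcn_layer(adj, features, weights)
```

After one layer, node 0 carries a share of the features of node 2 (its only neighbor) but nothing from node 3, which matches the first-order, local character described in the claim.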
6. The circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec as claimed in claim 1, wherein step 5 specifically comprises:
the global structure information describes the relationship between two nodes that are not directly connected; the Node2vec method is a targeted improvement on DeepWalk: it samples the graph by random walks, mapping the neighborhood structure of the nodes into sequences; a Skip-gram model is then trained on the sampled sequences to capture the connectivity between nodes and obtain the global structure information.
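The random-walk sampling step can be sketched as a second-order biased walk in the style of node2vec. The return parameter `p`, the in-out parameter `q`, the function name `node2vec_walk` and the toy graph are assumptions for illustration; the claim gives no parameter values. Training the Skip-gram model on the resulting sequences is omitted.

```python
import random

def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one biased walk of `length` nodes from an undirected graph.

    `graph` maps each node to the set of its neighbors.
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is unbiased
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)       # stay near prev (never occurs in a purely bipartite graph)
            else:
                weights.append(1.0 / q)   # move further away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Toy circRNA-disease graph (hypothetical node names).
graph = {
    "c1": {"d1"}, "c2": {"d1", "d2"}, "c3": {"d2"},
    "d1": {"c1", "c2"}, "d2": {"c2", "c3"},
}
rng = random.Random(42)
walk = node2vec_walk(graph, "c1", 8, p=0.5, q=2.0, rng=rng)
```

Each walk is a sequence of node identifiers that plays the role of a sentence for Skip-gram. Nodes that co-occur in walks end up with similar embeddings even when no edge joins them directly, which is the sense in which the claim calls this information global.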
7. The circRNA-disease association prediction method based on a graph convolutional neural network and Node2vec as claimed in claim 1, wherein step 6 specifically comprises:
the node information obtained in the previous two steps is fed into a random forest classifier to predict potential circRNA-disease associations; five-fold cross-validation is used to obtain the AUC value and the AUPR value of the invention, and the prediction result is obtained.
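The evaluation protocol named in this claim can be sketched without the classifier itself. The example below shows only how samples might be split into five folds and how AUC and AUPR can be computed from predicted scores. The random forest is not reproduced; `labels` and `scores` are made-up stand-ins for true associations and classifier outputs, and average precision is used as one common estimator of AUPR, which is an assumption.

```python
import random

def five_fold_indices(n, seed=0):
    """Shuffle sample indices 0..n-1 and deal them into five disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::5] for k in range(5)]

def roc_auc(labels, scores):
    """AUC: probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """AUPR estimate: mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Made-up labels (1 = known association) and stand-in classifier scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]

auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
folds = five_fold_indices(20)  # each fold serves once as the held-out test set
```

In a full run, the classifier would be trained on four folds and scored on the fifth, with the two metrics reported per fold or averaged over the five folds.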
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210702017.6A CN114999635A (en) | 2022-06-20 | 2022-06-20 | circRNA-disease association relation prediction method based on graph convolution neural network and node2vec |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210702017.6A CN114999635A (en) | 2022-06-20 | 2022-06-20 | circRNA-disease association relation prediction method based on graph convolution neural network and node2vec |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114999635A true CN114999635A (en) | 2022-09-02 |
Family
ID=83037287
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210702017.6A Pending CN114999635A (en) | 2022-06-20 | 2022-06-20 | circRNA-disease association relation prediction method based on graph convolution neural network and node2vec |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114999635A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116129992A (en) * | 2023-04-17 | 2023-05-16 | 之江实验室 | Gene regulation network construction method and system based on graph neural network |
CN117012382A (en) * | 2023-05-22 | 2023-11-07 | 东北林业大学 | Disease-related circRNA prediction system based on depth feature fusion |
CN117393143A (en) * | 2023-10-11 | 2024-01-12 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Circular RNA-disease association prediction method based on graph representation learning |
-
2022
- 2022-06-20 CN CN202210702017.6A patent/CN114999635A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114999635A (en) | circRNA-disease association relation prediction method based on graph convolution neural network and node2vec | |
CN113705772A (en) | Model training method, device and equipment and readable storage medium | |
AU2019289227A1 (en) | Filtering genetic networks to discover populations of interest | |
CN114496092B (en) | miRNA and disease association relation prediction method based on graph convolutional network | |
CN108427756B (en) | Personalized query word completion recommendation method and device based on same-class user model | |
CN113241115A (en) | Depth matrix decomposition-based circular RNA disease correlation prediction method | |
CN104992078B (en) | Protein network complex identification method based on semantic density | |
CN111540405B (en) | Disease gene prediction method based on rapid network embedding | |
CN113157957A (en) | Attribute graph document clustering method based on graph convolution neural network | |
Yu et al. | Predicting protein complex in protein interaction network-a supervised learning based method | |
CN112784918A (en) | Node identification method, system and device based on unsupervised graph representation learning | |
CN109919198A (en) | Novel network embedding learning method based on random walk with restart | |
CN109948242A (en) | Network representation learning method based on feature Hash | |
CN113436729A (en) | Synthetic lethal interaction prediction method based on heterogeneous graph convolution neural network | |
CN117349494A (en) | Graph classification method, system, medium and equipment for space graph convolution neural network | |
CN115602243A (en) | Disease associated information prediction method based on multi-similarity fusion | |
CN113539479B (en) | Similarity constraint-based miRNA-disease association prediction method and system | |
Paul et al. | ML-KnockoffGAN: Deep online feature selection for multi-label learning | |
CN114037014A (en) | Reference network clustering method based on graph self-encoder | |
CN106815653B (en) | Distance game-based social network relationship prediction method and system | |
CN117393049A (en) | circRNA-disease associated prediction model based on random disturbance and multi-view graph convolutional network | |
CN116959588A (en) | Biochemical passage crosstalk identification method | |
CN108304546B (en) | Medical image retrieval method based on content similarity and Softmax classifier | |
Tian et al. | MAMLCDA: A Meta-Learning Model for Predicting circRNA-Disease Association Based on MAML Combined With CNN | |
Chen et al. | Community Detection Based on DeepWalk Model in Large‐Scale Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||