CN114999635A - circRNA-disease association relation prediction method based on graph convolution neural network and node2vec - Google Patents

Info

Publication number
CN114999635A
Authority
CN
China
Prior art keywords
circrna
disease
similarity
graph
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210702017.6A
Other languages
Chinese (zh)
Inventor
张奕
王真梅
蔡钢生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Technology
Original Assignee
Guilin University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Technology filed Critical Guilin University of Technology
Priority to CN202210702017.6A priority Critical patent/CN114999635A/en
Publication of CN114999635A publication Critical patent/CN114999635A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Pathology (AREA)
  • Computing Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Physiology (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

A circRNA-disease association relation prediction method based on a graph convolutional neural network and Node2vec comprises the following steps: acquiring a circRNA-disease association matrix; calculating the functional similarity of circRNAs, the Gaussian interaction profile (GIP) kernel similarity of circRNAs, the GIP kernel similarity of diseases and the semantic similarity of diseases, constructing the integrated similarity of circRNAs and the integrated similarity of diseases, and generating a circRNA-disease heterogeneous graph; performing feature extraction and transformation on the circRNA (disease) integrated similarity with a sparse autoencoder, converting it into 64-dimensional feature vectors, and fusing the circRNA feature vector and the disease feature vector into a final circRNA-disease feature vector; extracting local structure information of the nodes from the circRNA-disease heterogeneous graph with the graph convolutional neural network; extracting global structure information of the nodes from the circRNA-disease heterogeneous graph with the Node2vec method; and feeding the node information obtained in the preceding two steps into a random forest classifier to predict potential circRNA-disease associations. Predicting disease-related circRNAs by a computational method saves time and helps to clarify disease pathogenesis and to search for effective treatment schemes.

Description

circRNA-disease association relation prediction method based on graph convolution neural network and node2vec
Technical Field
The invention relates to the field of association prediction in bioinformatics, in particular to a circRNA-disease association relation prediction method based on a graph convolutional neural network and Node2vec.
Background
The first circRNA was discovered in 1976 during research on RNA viruses. Because of its unusual structure, unknown function and low abundance, circRNA was initially regarded as an artifact or a mis-spliced product. With the development of sequencing technologies, more and more circRNAs have been identified in thousands of organisms, including plants, animals and bacteria. circRNAs have been found to perform important molecular functions: they participate in the regulation of gene expression, act as molecular sponges that absorb microRNA (miRNA) and inhibit its activity, regulate the expression of messenger RNA, and so on. Mutation or dysfunction of circRNAs disrupts various vital activities and thereby causes disease. Studying the mechanism of circRNAs in disease occurrence and their role in disease treatment, and understanding the associations between circRNAs and diseases, is therefore an important topic in bioinformatics research; it benefits disease prognosis, diagnosis and treatment and offers a new direction for future research.
Traditional biological experiments require a large amount of manpower and material resources; they are accurate but time-consuming. Using computational methods to mine the biological characteristics of the data and predict circRNA-disease associations is convenient and efficient. Current computational methods for predicting circRNA-disease associations can be divided into two broad categories: network-propagation-based methods and machine-learning-based methods.
Network-propagation-based methods use circRNA and disease association data to construct circRNA (disease) similarity networks and predict potential circRNA-disease associations. Fan et al. developed the computational model KATZHCDA, which applies the KATZ measure to a heterogeneous network built from circRNA expression profiles, disease phenotype similarities and known circRNA-disease associations. The model successfully predicts circRNA-disease associations on the heterogeneous network with a simple measure, but it is not suitable for new diseases without any known circRNA association or for isolated circRNAs without any known disease association. Li et al. proposed DWNCPCDA, a method that predicts circRNA-disease associations using DeepWalk and network consistency projection. Its advantage is that the network embedding method DeepWalk learns node embeddings of the known circRNA-disease association network and is combined with a similarity-based method, which provides greater flexibility for circRNA-disease association prediction. Integrating more biomedical association data on circRNAs or diseases in the future, such as circRNA-miRNA and miRNA-disease associations, is expected to further improve prediction performance.
Machine-learning-based methods mine deep features of circRNA and disease data with supervised or unsupervised methods, optimize model parameters progressively through iterative learning, and design classifiers to identify disease-related circRNAs. Lei et al. proposed the computational method RWRKNN, which applies the random walk with restart algorithm to weighted features carrying global network topology information and uses a K-nearest neighbor algorithm to classify according to these features, improving prediction performance. However, RWRKNN falls short in revealing associations between diseases and new circRNAs without any known association, or between circRNAs and new diseases without any known association. Ding et al. developed RWLR, a computational model based on random walk and logistic regression for predicting circRNA-disease associations. Using random walk with restart to obtain the global structure information of each circRNA performs better than methods based on similarity alone, and RWLR can make predictions for novel circRNAs with no known disease association. However, RWLR only considers circRNA similarity and does not contain sufficient disease information, resulting in poor prediction accuracy. Zhang et al. proposed iGRLCDA, a graph-representation-learning-based method for identifying potential circRNA-disease associations. The method uses a deep learning model that combines a graph convolutional neural network with graph factorization, and effectively mines higher-level circRNA and disease information. However, because it depends on the features of known circRNA-disease associations, iGRLCDA is less sensitive to new circRNA-disease associations.
In view of this, studying prediction methods for circRNA-disease associations is very important. The invention provides a circRNA-disease association relation prediction method based on a graph convolutional neural network and Node2vec to predict potential circRNA-disease associations.
Disclosure of Invention
The invention aims to solve the problems of existing circRNA-disease association prediction models, such as low prediction accuracy and time-consuming training, and provides a circRNA-disease association relation prediction method based on a graph convolutional neural network and node2vec that improves prediction accuracy and reduces training cost.
The technical scheme of the invention specifically comprises the following steps:
Step 1: obtaining a circRNA-disease association matrix.
Experimentally verified circRNA-disease association data are acquired from the circR2Disease database; redundant data are deleted, and only the known association data related to human complex diseases are retained as the circRNA-disease association matrix.
Step 2: calculating the semantic similarity of diseases, the Gaussian interaction profile (GIP) kernel similarity of diseases, the functional similarity of circRNAs and the GIP kernel similarity of circRNAs, constructing the integrated similarity of circRNAs and the integrated similarity of diseases, and generating a circRNA-disease heterogeneous graph.
The annotation terms of each disease are acquired from the MeSH database, and the semantic similarity between diseases is calculated using a directed acyclic graph (DAG). The GIP kernel similarity of circRNAs (diseases) is calculated from the circRNA-disease association matrix. The functional similarity of circRNAs is calculated from the disease semantic similarity and the circRNA-disease association matrix. To overcome the inherent sparsity of these measures, complementary information from multiple data sources and different representation methods is integrated, and an integrated similarity is used to quantify the similarity of each pair of circRNAs (diseases), yielding a circRNA integrated similarity matrix and a disease integrated similarity matrix.
Step 3: a sparse autoencoder performs feature extraction and transformation on the circRNA (disease) integrated similarity and converts it into 64-dimensional feature vectors, and the circRNA feature vector and the disease feature vector are fused into a final circRNA-disease feature vector.
A sparse autoencoder can not only learn features automatically but also give a better feature description than the original data. Replacing the original data with the features learned by the sparse autoencoder improves the prediction performance of the model to a certain extent. For this purpose, the invention applies a sparse autoencoder to the circRNA integrated similarity and to the disease integrated similarity separately, minimizes the error between input and output with the back propagation algorithm, and extracts and transforms features to obtain 64-dimensional circRNA (disease) feature vectors. Finally, the circRNA and disease feature vectors are combined to obtain the final circRNA-disease feature vector.
Step 4: the graph convolutional neural network extracts local structure information of the nodes from the circRNA-disease heterogeneous graph.
The local structure information describes the local similarity between nodes in the graph. Specifically, if there is an edge between two nodes, the two nodes are also connected in the embedding space; if there is no edge between two nodes, their first-order proximity is 0. The graph convolutional neural network takes the structure of the circRNA-disease heterogeneous graph and the features of the circRNA (disease) nodes as input, and outputs pooled node information and graph structure information, from which the local structure information is acquired.
Step 5: the Node2vec method extracts global structure information of the nodes from the circRNA-disease heterogeneous graph.
The global structure information describes the relationship between two nodes that are not directly connected. The Node2vec method is a targeted improvement on DeepWalk: it samples the graph by random walks and maps the neighborhood structure of each node into a sequence structure. A Skip-gram model is then trained on the sampled sequences to capture the connectivity between nodes and obtain the global structure information.
Step 6: the node information obtained in the preceding two steps is fed into a random forest classifier to predict potential circRNA-disease associations.
The node information obtained in the preceding two steps is fed into a random forest classifier to predict potential circRNA-disease associations; the AUC and AUPR values of the invention are obtained by five-fold cross validation, giving the prediction result.
Drawings
FIG. 1 is a schematic flow diagram of the circRNA-disease association relation prediction method based on the graph convolutional neural network and node2vec.
FIG. 2 is a graph of ROC curves for an implementation of the present invention.
FIG. 3 is a PR graph of the implementation method of the present invention.
Detailed Description
The invention relates to a circRNA-disease association relation prediction method based on a graph convolutional neural network and node2vec. The invention is described in further detail below with reference to specific embodiments and simulation experiments. Those skilled in the art should understand that these embodiments only explain the technical principle of the invention and are not intended to limit its scope of protection.
As shown in FIG. 1, the circRNA-disease association relation prediction method based on a graph convolutional neural network and node2vec specifically includes the following steps:
Preferably, the obtaining of the association matrix in step 1 is specifically:
A total of 739 experimentally verified circRNA-disease associations (involving 661 circRNAs and 100 diseases) were obtained from the circR2Disease database. After redundant data were deleted, only the 650 known associations related to human complex diseases (involving 585 circRNAs and 88 diseases) were retained as the known association matrix
$A \in \mathbb{R}^{nc \times nd}$
where nc and nd denote the numbers of circRNAs and diseases, respectively. If there is an experimentally verified known association between circRNA $c_i$ and disease $d_j$, the matrix element is defined as $A(c_i, d_j) = 1$; if there is no experimentally verified known association between circRNA $c_i$ and disease $d_j$, the matrix element is defined as $A(c_i, d_j) = 0$.
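By way of illustration only, the construction of the association matrix from a list of known circRNA-disease pairs can be sketched in Python as follows. The function name `build_association_matrix` and the identifiers in the toy data are hypothetical and are not taken from the circR2Disease database; duplicate pairs are simply collapsed, which is one possible reading of "deleting redundant data".

```python
import numpy as np

def build_association_matrix(pairs, circrnas, diseases):
    """Return the binary matrix A with A[i, j] = 1 for every known (circRNA, disease) pair."""
    c_index = {name: i for i, name in enumerate(circrnas)}
    d_index = {name: j for j, name in enumerate(diseases)}
    A = np.zeros((len(circrnas), len(diseases)), dtype=int)
    for c, d in pairs:
        A[c_index[c], d_index[d]] = 1  # a repeated pair overwrites the same cell
    return A

# Toy data with hypothetical identifiers.
circrnas = ["circ_a", "circ_b", "circ_c"]
diseases = ["disease_x", "disease_y"]
pairs = [("circ_a", "disease_x"), ("circ_b", "disease_y"), ("circ_a", "disease_x")]
A = build_association_matrix(pairs, circrnas, diseases)
```

Here the rows of `A` index circRNAs and the columns index diseases, matching the nc × nd layout described above.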
Preferably, the calculation of the semantic similarity of diseases in step 2 is specifically:
A disease semantic similarity matrix is built from the downloaded disease-related data:
$DS \in \mathbb{R}^{nd \times nd}$
The semantic contribution value of any disease $d_t$ to disease $d_i$ is denoted by
Figure BDA0003704158740000053
and is calculated as follows:
Figure BDA0003704158740000061
where σ denotes the attenuation coefficient of the semantic contribution.
The matrix element $DS(d_i, d_j)$ denotes the semantic similarity between disease $d_i$ and disease $d_j$ and is calculated as follows:
Figure BDA0003704158740000062
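The formulas above are only available as images. As a hedged illustration, the following Python sketch implements a widely used DAG-based formulation of disease semantic similarity, in which a disease contributes 1 to itself, each ancestor contributes the largest contribution of its children attenuated by σ, and the similarity of two diseases is the summed contribution of their shared DAG terms divided by the total contribution of both DAGs. Whether this matches the invention's formulas term by term is an assumption; the value σ = 0.5, the function names and the toy DAG are likewise hypothetical.

```python
def semantic_contributions(disease, parents, sigma=0.5):
    """Contribution of `disease` and each of its DAG ancestors to the semantics of `disease`."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = sigma * contrib[node]
            # Keep the largest contribution over all paths that reach this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib

def semantic_similarity(d_i, d_j, parents, sigma=0.5):
    """Shared contributions of both DAGs divided by the total contributions of both DAGs."""
    c_i = semantic_contributions(d_i, parents, sigma)
    c_j = semantic_contributions(d_j, parents, sigma)
    shared = set(c_i) & set(c_j)
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))

# Toy DAG (hypothetical terms): child -> list of parents.
parents = {
    "lung cancer": ["cancer"],
    "breast cancer": ["cancer"],
    "cancer": ["disease"],
}
sim_same = semantic_similarity("lung cancer", "lung cancer", parents)
sim_diff = semantic_similarity("lung cancer", "breast cancer", parents)
```

In this toy DAG the two cancers share the ancestors "cancer" and "disease", so their similarity is positive but below the self-similarity of 1.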
Preferably, the calculation of the functional similarity of circRNAs in step 2 is specifically:
The functional similarity of circRNAs is measured by their tendency to be associated with phenotypically similar diseases and is represented by the matrix
$CS \in \mathbb{R}^{nc \times nc}$
The matrix element $CS(c_i, c_j)$ denotes the functional similarity between circRNA $c_i$ and circRNA $c_j$ and is calculated as follows:
Figure BDA0003704158740000064
where the set $D_i$ denotes the set of diseases associated with circRNA $c_i$; the set $D_j$ denotes the set of diseases associated with circRNA $c_j$; and $|D_i|$ and $|D_j|$ denote the numbers of diseases in the sets $D_i$ and $D_j$, respectively.
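As a hedged illustration of how a functional similarity of this kind is commonly computed, the sketch below takes, for every disease in one set, its best semantic match in the other set, sums these best matches in both directions and divides by $|D_i| + |D_j|$. This best-match form is an assumption, since the formula itself is only available as an image; the function name and the toy matrices are hypothetical.

```python
import numpy as np

def functional_similarity(A, DS):
    """circRNA functional similarity from association matrix A (nc x nd) and disease similarity DS (nd x nd)."""
    nc = A.shape[0]
    disease_sets = [np.flatnonzero(A[i]) for i in range(nc)]
    CS = np.zeros((nc, nc))
    for i in range(nc):
        for j in range(nc):
            D_i, D_j = disease_sets[i], disease_sets[j]
            if len(D_i) == 0 or len(D_j) == 0:
                continue  # assumption: circRNAs without known diseases keep similarity 0
            block = DS[np.ix_(D_i, D_j)]
            # Best match of each disease in D_i within D_j, and vice versa.
            best = block.max(axis=1).sum() + block.max(axis=0).sum()
            CS[i, j] = best / (len(D_i) + len(D_j))
    return CS

A = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0]])
DS = np.array([[1.0, 0.4, 0.1],
               [0.4, 1.0, 0.2],
               [0.1, 0.2, 1.0]])
CS = functional_similarity(A, DS)
```

The zero entries that remain for circRNAs without known diseases are one source of the sparsity that the integrated similarity is meant to overcome.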
Preferably, the calculation of the circRNA (disease) Gaussian interaction profile (GIP) kernel similarity in step 2 is specifically:
The circRNA (disease) GIP kernel similarity is calculated in combination with the association matrix and the disease semantic similarity. The matrix
$DK \in \mathbb{R}^{nd \times nd}$
represents the GIP kernel similarity of diseases. The matrix element $DK(d_i, d_j)$ denotes the GIP kernel similarity between disease $d_i$ and disease $d_j$ and is calculated as follows:
Figure BDA0003704158740000066
where the parameter $\mu_d$ controls the kernel bandwidth of the GIP similarity.
In the same way, the matrix
$CK \in \mathbb{R}^{nc \times nc}$
represents the GIP kernel similarity of circRNAs. The matrix element $CK(c_i, c_j)$ denotes the GIP kernel similarity between circRNA $c_i$ and circRNA $c_j$ and is calculated as follows:
$CK(c_i, c_j) = \exp\left(-\mu_c \left\| A(c_i, d_j) - A(c_j, d_j) \right\|^2\right)$
where the parameter $\mu_c$ controls the kernel bandwidth of the GIP similarity.
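A minimal Python sketch of the GIP kernel is given below for illustration. It treats each row of the association matrix as the interaction profile of a circRNA and each column as the interaction profile of a disease, which is the usual reading of the formula above. The bandwidth is passed in directly; in practice it is often normalized by the mean squared norm of the profiles, but the source does not state this, so no normalization is applied here. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def gip_kernel(profiles, mu=1.0):
    """Gaussian interaction profile kernel between all pairs of rows of `profiles`."""
    diff = profiles[:, None, :] - profiles[None, :, :]
    squared_distance = (diff ** 2).sum(axis=2)
    return np.exp(-mu * squared_distance)

A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
CK = gip_kernel(A, mu=1.0)    # circRNA profiles are the rows of A
DK = gip_kernel(A.T, mu=1.0)  # disease profiles are the columns of A
```

Two circRNAs with identical profiles obtain a kernel value of 1, and the value decays with the number of diseases on which their profiles differ.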
Preferably, the calculation of the circRNA (disease) integrated similarity in step 2 is specifically:
Considering the inherent sparsity of the disease semantic similarity and the circRNA functional similarity, complementary information from multiple data sources and different representation methods is integrated, and an integrated similarity is used to quantify the similarity of each pair of circRNAs (diseases) so as to overcome this sparsity. The matrix
$X_d \in \mathbb{R}^{nd \times nd}$
represents the integrated similarity of diseases, and the matrix element $X_d(d_i, d_j)$ is calculated as follows:
Figure BDA0003704158740000072
The circRNA integrated similarity is represented by the matrix
$X_c \in \mathbb{R}^{nc \times nc}$
and the matrix element $X_c(c_i, c_j)$ is calculated as follows:
Figure BDA0003704158740000074
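The integration formulas are only available as images. One common rule, shown here purely as an assumption, keeps the semantic (or functional) similarity wherever it is non-zero and falls back to the GIP kernel similarity elsewhere; other variants, such as averaging the two where both are available, are equally plausible. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def integrate_similarity(primary, gip):
    """Use the primary similarity where it is non-zero, otherwise the GIP kernel similarity."""
    return np.where(primary != 0, primary, gip)

DS = np.array([[1.0, 0.0],
               [0.0, 1.0]])
DK = np.array([[1.0, 0.3],
               [0.3, 1.0]])
X_d = integrate_similarity(DS, DK)
```

The same function would be applied to the circRNA functional similarity and the circRNA GIP kernel similarity to obtain the circRNA integrated similarity.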
Preferably, in step 3 the sparse autoencoder performs feature extraction and transformation on the circRNA (disease) integrated similarity and converts it into 64-dimensional feature vectors, and the circRNA feature vector and the disease feature vector are fused into a final circRNA-disease feature vector, specifically:
The sparse autoencoder encodes the original input features and reduces their dimensionality in order to find potential associations among the input features and extract expressive high-order features. It consists of an encoder and a decoder and is a three-layer neural network comprising an input layer, a hidden layer and an output layer, in which the input layer x is mapped to the hidden layer y one by one. The encoder is calculated as follows:
$y = \mathrm{sigmoid}(W_1 x(i) + a_1)$
where sigmoid denotes the activation function; $W_1$ denotes the connection parameters between the input layer x and the hidden layer y; and $a_1$ denotes the bias.
The decoder is calculated as follows:
$z = \mathrm{sigmoid}(W_2 y + a_2)$
where $W_2$ denotes the connection parameters between the hidden layer y and the output layer z, and $a_2$ denotes the bias.
The circRNA integrated similarity and the disease integrated similarity are input into the sparse autoencoder separately. Features are extracted and transformed by minimizing the error between input and output with the back propagation algorithm, yielding the 64-dimensional feature vectors $Z_c$ and $Z_d$, respectively. The two are combined to obtain the final circRNA-disease feature vector $Z_{cd}$, which is calculated as follows:
Figure BDA0003704158740000081
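The encoder, decoder and back-propagation training described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the invention's implementation: the sparsity penalty is taken to be the usual KL divergence between a target activation `rho` and the mean hidden activation, the hyperparameters (`rho`, `beta`, `lr`, `epochs`) are hypothetical, samples are stored as rows (so the code computes `X @ W1` instead of $W_1 x$), and the fusion of $Z_c$ and $Z_d$ is assumed to be a concatenation.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_sparse_autoencoder(X, hidden=64, rho=0.05, beta=1.0, lr=0.2, epochs=300, seed=0):
    """Train a three-layer sparse autoencoder; return hidden features and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, a1 = rng.normal(0.0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, a2 = rng.normal(0.0, 0.1, (hidden, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        Y = sigmoid(X @ W1 + a1)            # encoder
        Z = sigmoid(Y @ W2 + a2)            # decoder
        rho_hat = Y.mean(axis=0)            # mean activation of each hidden unit
        kl = np.sum(rho * np.log(rho / rho_hat)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
        losses.append(0.5 * np.sum((Z - X) ** 2) / n + beta * kl)
        # Back propagation of the reconstruction error and the sparsity penalty.
        dZ = (Z - X) * Z * (1 - Z) / n
        dY = dZ @ W2.T + beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        dH = dY * Y * (1 - Y)
        W2 -= lr * (Y.T @ dZ)
        a2 -= lr * dZ.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        a1 -= lr * dH.sum(axis=0)
    return sigmoid(X @ W1 + a1), losses

def toy_similarity(n, seed):
    """Random symmetric matrix with unit diagonal, standing in for an integrated similarity."""
    m = np.random.default_rng(seed).random((n, n))
    s = (m + m.T) / 2
    np.fill_diagonal(s, 1.0)
    return s

Z_c, loss_c = train_sparse_autoencoder(toy_similarity(10, 1))  # 10 toy circRNAs
Z_d, loss_d = train_sparse_autoencoder(toy_similarity(6, 2))   # 6 toy diseases
pair_feature = np.concatenate([Z_c[0], Z_d[0]])  # assumed fusion for the pair (c_0, d_0)
```

Each row of `Z_c` and `Z_d` is the 64-dimensional feature vector of one circRNA or disease, and `pair_feature` illustrates one possible fused vector for a single circRNA-disease pair.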
Preferably, the extraction of local structure information of the nodes from the circRNA-disease heterogeneous graph by the graph convolutional neural network in step 4 is specifically:
The graph convolutional neural network takes the structure of the graph and the features of each node as input and can output pooled node information and graph (node) structure information, from which the local structure information of the graph is obtained. For this purpose, the known circRNA-disease association matrix A is converted by calculation into an adjacency matrix
Figure BDA0003704158740000082
Local structure information is obtained using the spatial-domain approach to graph convolutional neural networks, which is calculated as follows:
Figure BDA0003704158740000083
where ReLU(·) denotes the activation function of the two-layer neural network;
Figure BDA0003704158740000084
denotes the degree matrix of
Figure BDA0003704158740000085
W denotes a weight matrix; and
Figure BDA0003704158740000086
denotes the adjacency matrix with added self-loops, which is calculated as
Figure BDA0003704158740000087
where
Figure BDA0003704158740000088
denotes the identity matrix.
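For illustration, the sketch below implements one graph convolution layer in the widely used symmetric-normalized form: self-loops are added to the adjacency matrix, the result is normalized by the inverse square root of the degree matrix on both sides, and ReLU is applied to the propagated features. Two assumptions are made because the formulas are only available as images: the adjacency matrix is built as the bipartite block matrix of A and its transpose, and the normalization is the symmetric one. All names are hypothetical.

```python
import numpy as np

def bipartite_adjacency(A):
    """Adjacency of the circRNA-disease graph: circRNA nodes first, then disease nodes."""
    nc, nd = A.shape
    return np.block([[np.zeros((nc, nc)), A],
                     [A.T, np.zeros((nd, nd))]])

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(D^-1/2 (adj + I) D^-1/2 @ features @ weights)."""
    a_tilde = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))  # degrees are >= 1 thanks to self-loops
    a_norm = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)

A = np.array([[1.0, 0.0],
              [1.0, 1.0]])
adj = bipartite_adjacency(A)

rng = np.random.default_rng(0)
X = rng.random((4, 8))                     # toy node features
W0, W1 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))
H = gcn_layer(adj, gcn_layer(adj, X, W0), W1)  # two stacked layers

# With identity features and weights the layer returns the normalized adjacency itself.
a_norm = gcn_layer(adj, np.eye(4), np.eye(4))
```

In a trained model the weight matrices would be learned; here they are random and only demonstrate the propagation rule.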
Preferably, the extraction of global structure information of the nodes from the circRNA-disease heterogeneous graph by the Node2vec method in step 5 is specifically:
node2vec is a semi-supervised algorithm for scalable feature learning in networks; it learns a d-dimensional feature representation that maximizes the likelihood of preserving the network neighborhoods of nodes. First, the graph is sampled by random walks, which map the neighborhood structure of each node into a sequence structure; a Skip-gram model is then trained on the sampled sequences to capture the connectivity between nodes and obtain the global structure information.
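The sampling stage can be sketched in plain Python as follows. The sketch shows only the second-order biased random walk of node2vec, controlled by the return parameter `p` and the in-out parameter `q`; the Skip-gram training on the resulting sequences is omitted. The graph, the parameter values and the function name are hypothetical.

```python
import random

def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One biased random walk of at most `length` nodes over `graph` (node -> set of neighbors)."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is unbiased
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)   # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)   # move further away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Toy circRNA-disease graph with hypothetical node names.
graph = {
    "c0": {"d0"},
    "c1": {"d0", "d1"},
    "d0": {"c0", "c1"},
    "d1": {"c1"},
}
rng = random.Random(0)
walks = [node2vec_walk(graph, node, 6, p=1.0, q=2.0, rng=rng) for node in sorted(graph)]
```

Each walk is a node sequence that plays the role of a sentence when the Skip-gram model is trained.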
Preferably, the feeding of the information into the random forest classifier in step 6 is specifically:
The node information obtained in the preceding two steps is fed into a random forest classifier, which predicts potential circRNA-disease associations and yields the prediction result.
The technical effects of the invention are further illustrated by experimental verification as follows:
1. Experimental conditions and contents:
The experiments of the invention were performed on a 1.80 GHz AMD CPU running the Windows 10 operating system.
2. Analysis of experimental results:
The prediction accuracy of the circRNA-disease association relation is evaluated by five-fold cross validation, with ROC and PR as the evaluation indexes. Here ROC denotes the area under the ROC curve, which has the false positive rate (FPR) as abscissa and the true positive rate (TPR) as ordinate, and PR denotes the area under the precision-recall curve, which has recall as abscissa and precision as ordinate. Larger ROC and PR values indicate higher prediction accuracy.
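For illustration, the two evaluation indexes and the five-fold split can be sketched in NumPy as follows. The area under the ROC curve is computed through its rank interpretation (the fraction of positive-negative pairs that are ordered correctly, with ties counted as one half), and the area under the PR curve is approximated by average precision, which is a step-wise approximation and not necessarily the integration rule used in the invention. The labels and scores are toy values and all function names are hypothetical.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    labels, scores = np.asarray(labels), np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive sample."""
    labels, scores = np.asarray(labels), np.asarray(scores, dtype=float)
    hits = labels[np.argsort(-scores, kind="stable")]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return (precision_at_k * hits).sum() / hits.sum()

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

labels = np.array([1, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.1])
auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
folds = k_fold_indices(10, k=5)
```

In a five-fold evaluation each fold serves once as the test set while the classifier is trained on the remaining four, and the metrics are averaged over the folds.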
The ROC curve and the PR curve obtained by five-fold cross validation of the invention are shown in FIG. 2 and FIG. 3.
The above description is only one specific example of the present invention and should not be construed as limiting the invention in any way. It will be apparent to persons skilled in the relevant art that various modifications and changes in form and detail can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A circRNA-disease association relation prediction method based on a graph convolutional neural network and Node2vec, characterized by comprising the following steps:
Step 1: acquiring a circRNA-disease association matrix;
Step 2: calculating the semantic similarity of diseases, the Gaussian interaction profile (GIP) kernel similarity of diseases, the functional similarity of circRNAs and the GIP kernel similarity of circRNAs, constructing the integrated similarity of circRNAs and the integrated similarity of diseases, and generating a circRNA-disease heterogeneous graph;
Step 3: performing feature extraction and transformation on the circRNA (disease) integrated similarity with a sparse autoencoder, converting it into 64-dimensional feature vectors, and combining the circRNA feature vector and the disease feature vector into a final circRNA-disease feature vector;
Step 4: extracting local structure information of the nodes from the circRNA-disease heterogeneous graph with the graph convolutional neural network;
Step 5: extracting global structure information of the nodes from the circRNA-disease heterogeneous graph with the Node2vec method;
Step 6: feeding the node information obtained in the preceding two steps into a random forest classifier, and predicting potential circRNA-disease associations.
2. The circRNA-disease association prediction method based on graph-convolution neural network and Node2vec as claimed in claim 1, wherein in step 1, specifically:
acquiring circRNA-Disease associated data verified by experiments from a circR2Disease database, deleting redundant data, and only selecting known associated data related to human complex diseases as a circRNA-Disease associated matrix.
3. The circRNA-disease association prediction method based on graph-convolution neural network and Node2vec as claimed in claim 1, wherein in step 2, specifically:
acquiring related annotation words of each disease from a MESH database, and calculating semantic similarity among the diseases by utilizing a Directed Acyclic Graph (DAG) to obtain the semantic similarity of the diseases; calculating the core similarity of the circRNA (disease) Gaussian interaction spectrum according to the circRNA-disease association matrix; calculating the functional similarity of the circRNA according to the semantic similarity of the diseases and the circRNA-disease association matrix; by integrating complementary information from multiple data sources and different representation methods, integration similarity is adopted to quantify each pair of disease similarity to overcome inherent sparsity, and a circRNA integration similarity matrix and a disease integration similarity matrix are obtained.
4. The circRNA-disease association prediction method based on the atlas neural network and Node2vec as claimed in claim 1, wherein in step 3, specifically:
the sparse automatic encoder can not only automatically learn characteristics, but also give better characteristic description than original data; original data is replaced by the learned characteristics of the sparse automatic encoder, and the model prediction performance is improved to a certain extent; therefore, the invention uses a sparse automatic encoder to respectively minimize the error between input and output through a back propagation algorithm for the integration similarity of the circRNA (diseases), extracts and transforms characteristics to obtain 64-dimensional circRNA (disease) characteristic vectors; finally, the circRNA (disease) feature vectors are combined to obtain the final circRNA-disease feature vector.
5. The circRNA-disease association prediction method based on the atlas neural network and Node2vec as claimed in claim 1, wherein in step 4, specifically:
the local structure information describes local similarity between nodes in the graph; specifically, if there is an edge connection between two nodes, the two nodes will have a connection in the embedding space; if no edge connection exists between two nodes, their first order proximity is 0; the graph convolution neural network inputs the structure of the circRNA-disease heteromorphic graph and the characteristics of circRNA (disease) nodes, and outputs pooling information of the nodes and graph structure information to acquire local structure information.
6. The circRNA-disease association prediction method based on the atlas neural network and Node2vec as claimed in claim 1, wherein in step 5, specifically:
the global structure information describes the relationship between two nodes which are not directly connected; the Node2vec method is a targeted improvement on Deepwalk, and is to sample a graph based on random walk and map a Node adjacent structure into a sequence structure; and then training a Skip-gram model by using the sequence obtained by sampling, capturing the connectivity between nodes, and obtaining global structure information.
7. The circRNA-disease association relation prediction method based on graph convolution neural network and node2vec as claimed in claim 1, wherein step 6 specifically comprises:
the node information obtained in the preceding two steps is fed into a random forest classifier to predict potential circRNA-disease associations; five-fold cross-validation is used to obtain the AUC value and the AUPR value of the invention, and the prediction result is obtained.
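The evaluation protocol can be sketched as follows. To keep the example self-contained, the random forest is replaced by a placeholder scorer, `centroid_scorer`, which is not the classifier of the invention; any function with the same signature (for example one wrapping a random forest implementation) could be passed instead. The toy data, the function names, and the stratified fold assignment are assumptions, and only AUC is computed here, not AUPR.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, keeping the class ratio per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def auc(labels, scores):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(features, labels, fit_and_score, k=5):
    """Mean AUC over k folds; fit_and_score(train_x, train_y, test_x) -> scores."""
    aucs = []
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        scores = fit_and_score([features[i] for i in train_idx],
                               [labels[i] for i in train_idx],
                               [features[i] for i in test_idx])
        aucs.append(auc([labels[i] for i in test_idx], scores))
    return sum(aucs) / k

def centroid_scorer(train_x, train_y, test_x):
    """Placeholder for the random forest: score by distance to class centroids."""
    def centroid(cls):
        rows = [x for x, y in zip(train_x, train_y) if y == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    c_pos, c_neg = centroid(1), centroid(0)
    # Higher score means closer to the positive centroid.
    return [sq_dist(x, c_neg) - sq_dist(x, c_pos) for x in test_x]

# Toy circRNA-disease pair features: 10 associated pairs, 10 unassociated.
features = [[0.9 + 0.01 * i] for i in range(10)] + \
           [[0.1 + 0.01 * i] for i in range(10)]
labels = [1] * 10 + [0] * 10
mean_auc = cross_validate(features, labels, centroid_scorer, k=5)
```

Each sample is held out exactly once, so the reported value averages five independent test folds; the toy classes are perfectly separable, so the score here says nothing about the performance of the claimed method.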
CN202210702017.6A 2022-06-20 2022-06-20 circRNA-disease association relation prediction method based on graph convolution neural network and node2vec Pending CN114999635A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210702017.6A CN114999635A (en) 2022-06-20 2022-06-20 circRNA-disease association relation prediction method based on graph convolution neural network and node2vec

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210702017.6A CN114999635A (en) 2022-06-20 2022-06-20 circRNA-disease association relation prediction method based on graph convolution neural network and node2vec

Publications (1)

Publication Number Publication Date
CN114999635A true CN114999635A (en) 2022-09-02

Family

ID=83037287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210702017.6A Pending CN114999635A (en) 2022-06-20 2022-06-20 circRNA-disease association relation prediction method based on graph convolution neural network and node2vec

Country Status (1)

Country Link
CN (1) CN114999635A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129992A (en) * 2023-04-17 2023-05-16 之江实验室 Gene regulation network construction method and system based on graph neural network
CN117012382A (en) * 2023-05-22 2023-11-07 东北林业大学 Disease-related circRNA prediction system based on depth feature fusion
CN117393143A (en) * 2023-10-11 2024-01-12 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Circular RNA-disease association prediction method based on graph representation learning

Similar Documents

Publication Publication Date Title
CN114999635A (en) circRNA-disease association relation prediction method based on graph convolution neural network and node2vec
CN113705772A (en) Model training method, device and equipment and readable storage medium
AU2019289227A1 (en) Filtering genetic networks to discover populations of interest
CN114496092B (en) MiRNA and disease association relation prediction method based on graph convolution network
CN108427756B (en) Personalized query word completion recommendation method and device based on same-class user model
CN113241115A (en) Depth matrix decomposition-based circular RNA disease correlation prediction method
CN104992078B (en) A kind of protein network complex recognizing method based on semantic density
CN111540405B (en) Disease gene prediction method based on rapid network embedding
CN113157957A (en) Attribute graph document clustering method based on graph convolution neural network
Yu et al. Predicting protein complex in protein interaction network-a supervised learning based method
CN112784918A (en) Node identification method, system and device based on unsupervised graph representation learning
CN109919198A (en) A new network embedding learning method based on random walk with restart
CN109948242A (en) Network representation learning method based on feature Hash
CN113436729A (en) Synthetic lethal interaction prediction method based on heterogeneous graph convolution neural network
CN117349494A (en) Graph classification method, system, medium and equipment for space graph convolution neural network
CN115602243A (en) Disease associated information prediction method based on multi-similarity fusion
CN113539479B (en) Similarity constraint-based miRNA-disease association prediction method and system
Paul et al. ML-KnockoffGAN: Deep online feature selection for multi-label learning
CN114037014A (en) Reference network clustering method based on graph self-encoder
CN106815653B (en) Distance game-based social network relationship prediction method and system
CN117393049A (en) circRNA-disease associated prediction model based on random disturbance and multi-view graph convolutional network
CN116959588A (en) Biochemical passage crosstalk identification method
CN108304546B (en) Medical image retrieval method based on content similarity and Softmax classifier
Tian et al. MAMLCDA: A Meta-Learning Model for Predicting circRNA-Disease Association Based on MAML Combined With CNN
Chen et al. Community Detection Based on DeepWalk Model in Large‐Scale Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination