CN115640529A - Novel circular RNA-disease association prediction method - Google Patents

Info

Publication number
CN115640529A
Authority
CN
China
Prior art keywords
disease
circular rna
nodes
attention
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211120279.8A
Other languages
Chinese (zh)
Inventor
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zaozhuang University
Original Assignee
Zaozhuang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zaozhuang University
Priority to CN202211120279.8A
Publication of CN115640529A
Legal status: Pending


Abstract

The invention discloses a novel circular RNA-disease association prediction method comprising the steps of selecting and establishing a data set, constructing disease attribute information, constructing circular RNA attribute information, extracting features with an attention mechanism, and constructing a randomized deep learning classification model. The invention effectively extracts the attribute information of circular RNAs and diseases, so that this information fully represents the complex relations underlying circular RNA-disease associations. By using a graph attention neural network, the invention fully mines circular RNA-disease association information and extracts its deep features. By building the model with a randomized deep learning classifier, the invention greatly improves prediction precision and thus obtains a better prediction effect. The method has low computational cost and low power consumption, effectively predicts circular RNA-disease relations, and reaches a prediction accuracy of more than 93%.

Description

Novel circular RNA-disease association prediction method
Technical Field
The invention relates to the field of machine learning and bioinformatics, in particular to a novel circular RNA-disease association prediction method.
Background
Circular RNA is a single-stranded, circular, endogenous non-coding RNA that is widely expressed in living organisms, is between 200 and 2000 nucleotides in length, and lacks a free 5' cap and 3' poly(A) tail. Because most circular RNAs are expressed at low levels, they were once considered rare by-products generated by splicing errors during mRNA formation. However, with the development of genome-wide analysis and deep RNA sequencing technologies, circular RNAs were found to be widely present and abundantly expressed in eukaryotic cells, and to exhibit tissue and cell specificity. They play an important regulatory role in cell development and in the pathophysiology of disease and are therefore of increasing interest.
Recently, increasing evidence has shown that circular RNA is a desirable disease-associated biomarker owing to its stability, conservation, abundance, tissue and stage specificity, and resistance to degradation by RNases. For example, Wang et al. discovered that cZRANB1 is involved in glaucoma-induced retinal neurodegeneration by modulating the activation of retinal reactive glial cells and affecting the activity of retinal ganglion cells (RGCs). Research by Huang et al. shows that circABCC4 is significantly up-regulated in prostate cancer tissues and cell lines and regulates the expression of FOXP4 in prostate cancer cells by sponging miR-1182. Silencing circABCC4 can significantly inhibit the proliferation, cell cycle progression, migration and invasion of prostate cancer cells, thereby delaying tumor growth.
With the progress of research on circular RNA, disease data associated with circular RNA have gradually accumulated. These data provide sufficient support for predicting circular RNA-disease associations with computational methods. For example, Wang et al. designed the semi-supervised method SGANRDA to predict circular RNA-disease associations. The method combines the similarity features of diseases with the natural language features of circular RNA sequences, and uses all samples to pre-train a generative adversarial network and fine-tune its parameters so as to achieve optimal model performance. Zhao et al. proposed IBNPKATZ, a circular RNA-disease association prediction model based on the KATZ measure and a bipartite network projection algorithm. The model uses known circular RNA and disease data, describes similarity with the Gaussian interaction profile kernel, and effectively determines whether a circRNA is associated with a disease. Although the above methods work well in predicting circular RNA-disease associations, most of them have the following shortcomings: (1) when faced with large amounts of circular RNA and disease data, they do not specifically attend to the important information; (2) they do not allocate limited information-processing resources to the important information; (3) their iterative algorithms require a significant amount of computation time.
Disclosure of Invention
The present invention has been made to solve the above-mentioned problems in the prior art. The invention relates to a novel circular RNA-disease association prediction method comprising the steps of selecting and establishing a data set, constructing disease attribute information, constructing circular RNA attribute information, extracting features with an attention mechanism, and constructing a randomized deep learning classification model. The invention effectively extracts the attribute information of circular RNAs and diseases, so that this information fully represents the complex relations underlying circular RNA-disease associations. By using a graph attention neural network, the invention fully mines circular RNA-disease association information and extracts its deep features. By building the model with a randomized deep learning classifier, the invention greatly improves prediction precision and thus obtains a better prediction effect. The method has low computational cost and low power consumption, effectively predicts circular RNA-disease relations, and reaches a prediction accuracy of more than 93%.
The invention specifically adopts the following technical scheme:
A novel circular RNA-disease association prediction method, the method comprising:
constructing a data set for predicting circular RNA-disease association based on circular RNA and disease-related data;
characterizing the similarity attributes of circular RNAs and diseases with a Gaussian interaction profile kernel, and characterizing the associations between different diseases according to disease classification annotations;
based on the data set for predicting circular RNA-disease association, calculating in parallel the attention between network nodes and their neighboring nodes with a graph attention network, and extracting attention mechanism features by processing nodes of different dimensions and applying them to an inductive learning problem;
using the ensemble deep RVFL network model, a randomization-based deep learning classifier, and training it on the attention mechanism features in a closed-form, non-iterative manner to obtain a model for circular RNA-disease association prediction.
Further, after constructing the data set for predicting circular RNA-disease association based on the circular RNA and disease-related data, the method further comprises: dividing the data set for predicting circular RNA-disease association according to the following formula (1):
$$\mathbb{D} = \mathbb{D}^{+} \cup \mathbb{D}^{-} \tag{1}$$
wherein $\mathbb{D}$ is the data set for predicting circular RNA-disease association, $\mathbb{D}^{+}$ is the set of positive samples, and $\mathbb{D}^{-}$ is the set of unlabeled samples;
a preset number of negative samples are randomly selected from the unlabeled samples by a down-sampling method.
Further, circular RNA and Disease-related data collected from the CircR2Disease database were used to construct a data set predicting circular RNA-Disease association.
Further, the characterization of the similarity attributes of circular RNAs and diseases by the Gaussian interaction profile (GIP) kernel comprises:
using a binary vector to represent the interaction profile of a circular RNA, in which a position is assigned 1 when the circular RNA is associated with the corresponding disease and 0 otherwise;
calculating the GIP information $D_{GIP}(d(i), d(j))$ by the following formulas:
$$D_{GIP}(d(i), d(j)) = \exp\left(-\sigma_d \left\| V(d(i)) - V(d(j)) \right\|^{2}\right)$$
$$\sigma_d = 1 \Big/ \left( \frac{1}{m} \sum_{i=1}^{m} \left\| V(d(i)) \right\|^{2} \right)$$
wherein $\sigma_d$ is the width variable, $m$ is the number of circular RNAs in the data set, $\exp(\cdot)$ is the exponential function, $d(i)$ is the i-th disease, $V(d(i))$ is the vector in the adjacency matrix for the i-th disease, $d(j)$ is the j-th disease, and $V(d(j))$ is the vector in the adjacency matrix for the j-th disease.
Further, characterizing the associations between different diseases according to the disease classification annotations comprises:
reflecting the associations between different diseases with a directed acyclic graph (DAG) built from the disease classification annotations, wherein nodes in the DAG represent diseases and edges represent the relationships between diseases;
calculating the contribution $C_e(d)$ of a disease $d$ in $DAG_e$, whose disease set is $N_e$, to disease $e$ as follows:
$$C_e(d) = \begin{cases} 1, & d = e \\ \max\left\{ \theta \cdot C_e(d') \mid d' \in \text{children of } d \right\}, & d \neq e \end{cases}$$
wherein $\theta$ is the semantic contribution factor, $\cdot$ denotes multiplication, $d'$ is a child disease of $d$, $C_e(d')$ is the contribution of disease $d'$, children of $d$ is the set of child nodes of disease $d$, and $e$ is the disease whose DAG is considered.
The semantics of the disease are obtained by the following formula:
$$SC(e) = \sum_{d \in N_e} C_e(d)$$
where $SC(e)$ is the semantic value of disease $e$ and $N_e$ is the set of diseases in $DAG_e$.
Further, calculating in parallel, based on the data set for predicting circular RNA-disease association, the attention between network nodes and their neighboring nodes with a graph attention network, and extracting attention mechanism features by processing nodes of different dimensions and applying them to an inductive learning problem, comprises the following steps:
A graph attention network (GAT) is introduced to extract features with the attention mechanism. The GAT can calculate the attention between each node and its neighboring nodes in parallel, can process nodes of different dimensions, and can be applied directly to inductive learning problems, thereby implementing an effective attention mechanism.
Suppose the input of the graph attention network GAT is
$$h = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}, \quad \vec{h}_i \in \mathbb{R}^{F}$$
and the output is
$$h' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}, \quad \vec{h}'_i \in \mathbb{R}^{F'}$$
where $N$ is the number of nodes, $F$ and $F'$ are the numbers of attributes of the input and output nodes, respectively, $\vec{h}_i$ is the input feature vector of node $i$, and $\vec{h}'_i$ is the output feature vector of node $i$. A weight matrix $W \in \mathbb{R}^{F' \times F}$ shared by all nodes is trained to obtain the corresponding input-output transformation, and a self-attention mechanism $a$ is applied to each node:
$$a: \mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}$$
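The attention coefficient $e_{i,j}$ of the self-attention mechanism expresses the importance of node $j$ to node $i$ and is given by:
$$e_{i,j} = a(W\vec{h}_i, W\vec{h}_j)$$
The coefficients over all neighbors of a node are normalized with a softmax function, where $\mathcal{N}_i$ is the set of neighbors of node $i$:
$$\alpha_{i,j} = \mathrm{softmax}_j(e_{i,j}) = \frac{\exp(e_{i,j})}{\sum_{k \in \mathcal{N}_i} \exp(e_{i,k})}$$
In GAT, the attention mechanism $a$ is a single-layer feed-forward neural network parameterized by the weight vector $\vec{a} \in \mathbb{R}^{2F'}$, to which the LeakyReLU nonlinear activation is added, as shown below, where $\|$ denotes concatenation:
$$\alpha_{i,j} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[W\vec{h}_i \,\|\, W\vec{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[W\vec{h}_i \,\|\, W\vec{h}_k]\right)\right)}$$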
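After the normalized attention coefficients between different nodes are obtained, the output feature $\vec{h}'_i$ of each node is calculated by the following formula (9) or (10):
$$\vec{h}'_i = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{i,j} W \vec{h}_j\Big) \tag{9}$$
$$\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{i,j}^{k} W^{k} \vec{h}_j\Big) \tag{10}$$
where $\sigma(\cdot)$ is an activation function, $\mathcal{N}_i$ is the set of neighbors of node $i$, $\Vert$ denotes concatenation, $k$ is the index of an attention head, $K$ is the number of attention heads, $\alpha_{i,j}^{k}$ is the attention coefficient of the k-th attention mechanism, and $W^{k}$ is the corresponding weight matrix.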
Further, the input of each hidden layer in the deep RVFL network model comes from the nonlinearly transformed features of the previous layer and the original input features, and is described as:
$$H^{(l)} = \begin{cases} g\left(X W^{(1)}\right), & l = 1 \\ g\left([H^{(l-1)}\ X]\, W^{(l)}\right), & l > 1 \end{cases}$$
where $g(\cdot)$ is a nonlinear activation function, $X$ is the original input feature matrix, $H^{(l-1)}$ is the feature matrix of layer $l-1$, $W^{(l)}$ is the weight matrix of layer $l$, and $l$ is the layer index; when $l = 1$, $W^{(1)}$ represents the weight matrix between the input and the first hidden layer; when $l > 1$, $W^{(l)}$ represents the weight matrix between the inner hidden layers;
the input of the output layer in the deep RVFL network model consists of the stacked nonlinear features of the hidden layers and the original features, and is represented as:
$$D = [H^{(1)}\ H^{(2)}\ \cdots\ H^{(l-1)}\ H^{(l)}\ X] \tag{12}$$
the output of the deep RVFL network model may be defined as $Y = D\beta_d$; when regularized least squares is used, its closed-form solution is described as:
$$\beta_d = \left(D^{T} D + \lambda I\right)^{-1} D^{T} Y$$
wherein $\beta_d$ is the closed-form solution, $\lambda$ is the regularization weight, $I$ is the identity matrix, and $D^{T}$ is the transpose of the output-layer input $D$.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a novel circular RNA-disease association prediction method in an embodiment of the present invention;
FIG. 2 shows the ROC curves of five-fold cross-validation (5FCV) obtained by the present invention on the reference data set;
FIG. 3 is a bar graph comparing the results of different classifier models.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments, which are not intended to limit the invention. Where the steps described herein do not depend on one another, the order in which they are described by way of example should not be construed as a limitation; those skilled in the art will understand that the order may be adjusted as long as the logical relationship between the steps is preserved and the overall process remains practicable.
The embodiment of the invention provides a novel circular RNA-disease association prediction method which, as shown in FIG. 1, is carried out according to the following steps:
a. Selecting and establishing the data set: constructing a data set for predicting circular RNA-disease association from circular RNA and disease-related data collected from the CircR2Disease database;
b. Constructing circular RNA and disease attribute information: characterizing the similarity attributes of circular RNAs and diseases with the Gaussian interaction profile kernel, and characterizing the associations between diseases according to the authoritative disease classification annotations provided by MeSH;
c. Extracting features with the attention mechanism: calculating in parallel the attention between network nodes and their neighboring nodes with a graph attention network algorithm, and achieving effective attention mechanism feature extraction by processing nodes of different dimensions and applying them to an inductive learning problem;
d. Constructing a randomized deep learning classification model: training the model with the edRVFL algorithm in a closed-form, non-iterative manner, and using a high-level representation of the data for fast and accurate classification.
The selection and establishment of the data set in step a are as follows: the invention performs experiments on the reference data set CircR2Disease. This data set currently contains 739 experimentally validated circular RNA-disease associations, involving 100 diseases and 661 circular RNAs. According to the association relations, the invention divides the data set $\mathbb{D}$ into positive samples $\mathbb{D}^{+}$ and unlabeled samples $\mathbb{D}^{-}$:
$$\mathbb{D} = \mathbb{D}^{+} \cup \mathbb{D}^{-} \tag{1}$$
To construct a balanced data set and make the evaluation parameters more accurate, we randomly selected 739 negative samples from the unlabeled samples using a down-sampling method. Although this method does not guarantee that every selected negative sample is a true negative, the selected samples make up only 739 ÷ (661 × 100 − 739) ≈ 1.13% of the unlabeled samples. The probability that the selected negative samples contain truly associated circular RNA-disease pairs is much smaller than this value and can be ignored from a probabilistic point of view.
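The down-sampling step and the 1.13% ratio above can be sketched as follows. This is only an illustrative sketch: the function name `sample_negatives`, the index-pair encoding of samples, and the randomly generated stand-in positives are assumptions, not part of the original method or of the CircR2Disease data.

```python
import random

def sample_negatives(n_circrna, n_disease, positives, n_neg, seed=0):
    """Randomly pick n_neg unlabeled (circRNA, disease) index pairs as negatives."""
    positive_set = set(positives)
    # Every pair that is not a known association counts as unlabeled.
    unlabeled = [(c, d)
                 for c in range(n_circrna)
                 for d in range(n_disease)
                 if (c, d) not in positive_set]
    return random.Random(seed).sample(unlabeled, n_neg)

# Sizes taken from the text; the positives are random stand-ins (assumption).
n_circrna, n_disease, n_pos = 661, 100, 739
all_pairs = [(c, d) for c in range(n_circrna) for d in range(n_disease)]
positives = random.Random(1).sample(all_pairs, n_pos)

negatives = sample_negatives(n_circrna, n_disease, positives, n_neg=n_pos)
ratio = n_pos / (n_circrna * n_disease - n_pos)
print(len(negatives), f"{ratio:.2%}")  # 739 1.13%
```

Sampling without replacement from the unlabeled pairs keeps the negatives distinct from each other and from the positives, which gives the balanced 739 + 739 data set described above.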
Constructing circular RNA and disease attribute information in step b: the disease information used in the present invention comes from the MeSH database, an authoritative thesaurus compiled by the U.S. National Library of Medicine (NLM) and used as the basis for biomedical indexing. In MeSH, the scope of the subject headings and subheadings of each category is strictly defined, and various references and annotations are attached. From the information provided by MeSH, we use directed acyclic graphs (DAGs) to reflect the associations between different diseases. Nodes in the DAG represent diseases and edges represent the relationships between diseases. Thus, the contribution $C_e(d)$ of disease $d$ in $DAG_e$ to disease $e$ can be calculated as follows, where $\theta$ is the semantic contribution factor:
$$C_e(d) = \begin{cases} 1, & d = e \\ \max\left\{ \theta \cdot C_e(d') \mid d' \in \text{children of } d \right\}, & d \neq e \end{cases}$$
According to the above definition, we can sum the contributions of all diseases in the disease set $N_e$ to obtain the semantics of disease $e$:
$$SC(e) = \sum_{d \in N_e} C_e(d)$$
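The recursive contribution and the semantic value described above can be illustrated with a small sketch. The `parents` mapping, the toy four-node hierarchy, and the value `theta=0.5` are assumptions chosen for illustration; the source does not state the value of the semantic contribution factor, and a real run would read the hierarchy from MeSH.

```python
def semantic_contributions(parents, e, theta=0.5):
    """Return {d: C_e(d)} for every disease d in DAG_e (e plus all its ancestors)."""
    # Collect the nodes of DAG_e by walking upward from e.
    nodes, stack = {e}, [e]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in nodes:
                nodes.add(p)
                stack.append(p)
    # Children of each node, restricted to DAG_e.
    children = {d: [] for d in nodes}
    for d in nodes:
        for p in parents.get(d, []):
            children[p].append(d)
    memo = {e: 1.0}  # C_e(e) = 1

    def contribution(d):
        if d not in memo:
            memo[d] = max(theta * contribution(c) for c in children[d])
        return memo[d]

    return {d: contribution(d) for d in nodes}

def semantic_value(parents, e, theta=0.5):
    """SC(e): sum of the contributions of all diseases in DAG_e."""
    return sum(semantic_contributions(parents, e, theta).values())

# Hypothetical hierarchy: e has parents b and c, which share the parent a.
parents = {"e": ["b", "c"], "b": ["a"], "c": ["a"], "a": []}
contrib = semantic_contributions(parents, "e")
print(contrib["b"], contrib["a"], semantic_value(parents, "e"))  # 0.5 0.25 2.25
```

Each step away from the disease `e` multiplies the contribution by `theta`, so distant ancestors contribute less to the semantic value than direct parents.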
Gaussian interaction profile (GIP) kernel similarity is used to describe circular RNA information, based on the assumption that similar circular RNAs are likely to be associated with functionally similar diseases. We use a binary vector to represent the interaction profile of a circular RNA, in which a position is assigned 1 when the circular RNA is associated with the corresponding disease and 0 otherwise. Therefore, we can calculate the GIP information $D_{GIP}(d(i), d(j))$ using the following formulas, where $\sigma_d$ is the width variable, $m$ is the number of circular RNAs in the data set, and $V(d(i))$ is the vector in the adjacency matrix for the i-th disease:
$$D_{GIP}(d(i), d(j)) = \exp\left(-\sigma_d \left\| V(d(i)) - V(d(j)) \right\|^{2}\right)$$
$$\sigma_d = 1 \Big/ \left( \frac{1}{m} \sum_{i=1}^{m} \left\| V(d(i)) \right\|^{2} \right)$$
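A minimal NumPy sketch of the GIP kernel is given below. It assumes the commonly used normalization in which the width is the inverse of the mean squared norm of the interaction profiles; the function name `gip_kernel` and the toy adjacency matrix `A` are illustrative assumptions.

```python
import numpy as np

def gip_kernel(profiles):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Width variable: inverse of the mean squared norm (assumed normalization).
    sigma = 1.0 / sq_norms.mean()
    # Pairwise squared Euclidean distances between interaction profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-sigma * np.maximum(sq_dists, 0.0))

# Toy circRNA x disease adjacency matrix: 1 marks a known association.
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])

circrna_sim = gip_kernel(A)    # rows of A are circRNA profiles
disease_sim = gip_kernel(A.T)  # columns of A are disease profiles
print(circrna_sim.shape, disease_sim.shape)  # (4, 4) (3, 3)
```

The same function serves both entity types: passing the adjacency matrix gives circular RNA similarities, and passing its transpose gives disease similarities.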
Extracting features with the attention mechanism in step c: the present invention uses an attention mechanism to extract features from circular RNA-disease data. The attention mechanism originated from the study of human vision. In cognitive science, humans selectively concentrate on a particular portion of the available information while ignoring the rest, thereby making reasonable use of limited visual resources. In the present invention we introduce the graph attention network (GAT) to extract features with the attention mechanism. The GAT can calculate the attention between each node and its neighboring nodes in parallel, can process nodes of different dimensions, and can be applied directly to inductive learning problems, thereby implementing an effective attention mechanism.
Assume the input of GAT is
$$h = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}, \quad \vec{h}_i \in \mathbb{R}^{F}$$
and the output is
$$h' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}, \quad \vec{h}'_i \in \mathbb{R}^{F'}$$
where $N$ is the number of nodes, and $F$ and $F'$ are the numbers of attributes of the input and output nodes, respectively. To obtain the corresponding input-output transformation, we need to train a weight matrix $W \in \mathbb{R}^{F' \times F}$ shared by all nodes. Then, we apply a self-attention mechanism $a$ to each node:
$$a: \mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}$$
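Without considering the graph structure information, the attention coefficient $e_{i,j}$ represents the importance of node $j$ to node $i$, as follows:
$$e_{i,j} = a(W\vec{h}_i, W\vec{h}_j)$$
To facilitate the comparison and calculation of the attention coefficients, a softmax function is introduced to normalize them over all neighbors of a node, where $\mathcal{N}_i$ is the set of neighbors of node $i$:
$$\alpha_{i,j} = \mathrm{softmax}_j(e_{i,j}) = \frac{\exp(e_{i,j})}{\sum_{k \in \mathcal{N}_i} \exp(e_{i,k})}$$
In GAT, the attention mechanism $a$ is a single-layer feed-forward neural network parameterized by the weight vector $\vec{a} \in \mathbb{R}^{2F'}$, to which the LeakyReLU nonlinear activation is added. It is expressed as follows, where $\|$ denotes concatenation:
$$\alpha_{i,j} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[W\vec{h}_i \,\|\, W\vec{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[W\vec{h}_i \,\|\, W\vec{h}_k]\right)\right)}$$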
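After the normalized attention coefficients between different nodes are obtained through the above operations, the output feature of each node may be calculated as follows, where $\sigma(\cdot)$ is an activation function and $\mathcal{N}_i$ is the set of neighbors of node $i$:
$$\vec{h}'_i = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{i,j} W \vec{h}_j\Big) \tag{9}$$
To speed up the process of self-attention learning, we use a multi-head attention mechanism, in which $K$ attention mechanisms work independently and their results are concatenated after being computed separately:
$$\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{i,j}^{k} W^{k} \vec{h}_j\Big) \tag{10}$$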
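The attention computation described above can be sketched as a forward pass in NumPy. This is an untrained illustration under stated assumptions: the weights are random rather than learned, `tanh` stands in for the unspecified activation function, the LeakyReLU slope of 0.2 and the toy graph are arbitrary choices, and the names `gat_head` and `gat_layer` are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_head(h, adj, W, a):
    """One attention head. h: (N, F), adj: (N, N) with self-loops,
    W: (F, F') applied to row vectors (transpose of the column convention),
    a: (2F',) attention weight vector."""
    z = h @ W                                  # transformed node features, (N, F')
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a term for node i plus a term for node j.
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    e = np.where(adj > 0, e, -np.inf)          # keep only neighbouring nodes
    e = e - e.max(axis=1, keepdims=True)       # numerically stable softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)  # normalise over each neighbourhood
    return alpha, alpha @ z                    # coefficients and aggregated features

def gat_layer(h, adj, heads, activation=np.tanh):
    """Multi-head layer: run each head independently and concatenate the results."""
    outs = [activation(gat_head(h, adj, W, a)[1]) for W, a in heads]
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
N, F, F_out, K = 5, 4, 3, 2
h = rng.normal(size=(N, F))
adj = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)  # toy chain graph with self-loops
heads = [(rng.normal(size=(F, F_out)), rng.normal(size=2 * F_out)) for _ in range(K)]

alpha, _ = gat_head(h, adj, *heads[0])
features = gat_layer(h, adj, heads)
print(features.shape)  # (5, 6)
```

Masking non-neighbours with negative infinity before the softmax gives them exactly zero weight, so each node aggregates only over its own neighbourhood, and concatenating `K` heads yields `K * F'` output attributes per node.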
Constructing the randomized deep learning classification model in step d: in the present invention, we use the ensemble deep RVFL network (edRVFL), a randomization-based deep learning classifier, to classify the features and determine whether a circular RNA and a disease are associated. The input of each hidden layer of edRVFL consists of the nonlinearly transformed features of the previous layer and the original input features, and can be described as:
$$H^{(l)} = \begin{cases} g\left(X W^{(1)}\right), & l = 1 \\ g\left([H^{(l-1)}\ X]\, W^{(l)}\right), & l > 1 \end{cases}$$
where $g(\cdot)$ is a nonlinear activation function and $X$ is the original input feature matrix; when $l = 1$, $W^{(1)}$ represents the weight matrix between the input and the first hidden layer; when $l > 1$, $W^{(l)}$ represents the weight matrix between the inner hidden layers. The input of the output layer consists of the stacked nonlinear features of the hidden layers and the original features, and can be expressed as:
$$D = [H^{(1)}\ H^{(2)}\ \cdots\ H^{(l-1)}\ H^{(l)}\ X] \tag{12}$$
Thus, the output of edRVFL may be defined as $Y = D\beta_d$. When regularized least squares is used, its closed-form solution can be described as:
$$\beta_d = \left(D^{T} D + \lambda I\right)^{-1} D^{T} Y$$
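The closed-form, non-iterative training described above can be sketched in NumPy as follows. It implements the stacked-feature formulation with a single ridge-regression output layer; the function names, the `tanh` activation, the uniform weight range, the hyperparameter values, and the synthetic data are all assumptions for illustration, not the settings used in the invention.

```python
import numpy as np

def fit_deep_rvfl(X, Y, n_layers=3, n_hidden=16, lam=1e-2, seed=0):
    """Fit the output weights of a deep RVFL network in closed form."""
    rng = np.random.default_rng(seed)
    weights, feats, inp = [], [], X
    for _ in range(n_layers):
        # Hidden weights are drawn at random and never trained.
        W = rng.uniform(-1, 1, size=(inp.shape[1], n_hidden))
        H = np.tanh(inp @ W)
        weights.append(W)
        feats.append(H)
        inp = np.hstack([H, X])  # next layer sees previous features plus raw input
    D = np.hstack(feats + [X])   # stacked hidden features and original features
    # Regularized least squares: beta = (D^T D + lam I)^-1 D^T Y.
    beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)
    return weights, beta

def predict_deep_rvfl(X, weights, beta):
    feats, inp = [], X
    for W in weights:
        H = np.tanh(inp @ W)
        feats.append(H)
        inp = np.hstack([H, X])
    return np.hstack(feats + [X]) @ beta

# Synthetic two-class data with labels in {-1, +1} (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

weights, beta = fit_deep_rvfl(X, y)
train_acc = np.mean(np.sign(predict_deep_rvfl(X, weights, beta)) == y)
print(beta.shape, train_acc)
```

Only `beta` is solved for, in a single linear solve, which is why no iterative gradient-based training is needed.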
To make full use of all data and obtain reliable results, we used five-fold cross-validation (5FCV) to calculate accuracy (Acc), sensitivity (Sen), precision (Pre), F1 score (F1), and Matthews correlation coefficient (MCC) in the experiment. Specifically, we first randomly split the original data into 5 non-overlapping subsets, and then selected one of them at a time as the test set and the remaining 4 as the training set for model training. This step was repeated 5 times until each subset had been used as the test set once and only once. Finally, the average of the 5 test results was calculated and summarized in Table 1 as the evaluation result of the model. As can be seen from the table, the accuracy of the present invention on the reference data set reached 93.10%, with a standard deviation of 1.91%. Across the five folds, the highest accuracy was 96.27% and the lowest was 91.19%. On F1 and MCC, which reflect the overall performance of the model, the average values reached 93.44% and 86.55%, respectively. On sensitivity and precision, the invention reached 97.56% and 89.68%, respectively. The ROC curves in FIG. 2 show that the five curves generated by the present invention tend toward the upper left corner of the graph, with an AUC of 0.9235.
TABLE 1 results of 5FCV experiments obtained on the reference data set according to the invention
Figure RE-GDA0004016432100000111
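The evaluation indexes and the five-fold split used above can be sketched with the standard definitions. The helper names and the toy labels are hypothetical, and the sketch does not reproduce the values in Table 1.

```python
import math
import random

def binary_metrics(y_true, y_pred):
    """Acc, Sen, Pre, F1 and MCC from 0/1 labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn) if tp + fn else 0.0
    pre = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * pre * sen / (pre + sen) if pre + sen else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sen": sen, "Pre": pre, "F1": f1, "MCC": mcc}

def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and split them into k non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy check: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
m = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
print(m["Acc"], m["MCC"])  # 0.75 0.5

# 739 positive plus 739 negative samples, split into five folds.
folds = k_fold_indices(739 * 2)
print([len(f) for f in folds])  # [296, 296, 296, 295, 295]
```

In each round one fold serves as the test set and the other four as the training set; the five metric dictionaries are then averaged to give the reported values.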
Comparison of different classifier models: to verify whether the edRVFL classifier contributes to the performance of the present invention, we compared it with other classifier models. In particular, we input the extracted features into different classifier models, including random vector functional link (RVFL), extreme learning machine (ELM), rotation forest (ROF), random forest (RAF), K-nearest neighbor (KNN), and support vector machine (SVM), to perform 5FCV experiments, and then compared their results with those of the present invention. Table 2 summarizes the results of these classifier models, with the highest values shown in bold. We also plot these values as a bar graph in FIG. 3. According to the comparison, the present invention achieved the best results in terms of accuracy, precision, F1 score, MCC, and AUC, which were 4.87%, 6.03%, 4.41%, 9.23%, and 0.0434 higher than the means of the other classifier models, respectively. In terms of sensitivity, the present invention achieved the second highest result, only 0.14% lower than the best one. In summary, the present invention achieves the best overall performance in this comparison. This result indicates that the randomization-based deep learning algorithm can greatly improve model performance and help to accurately predict circular RNA-disease associations.
TABLE 2 Experimental results obtained on the reference data set for different classifier models
Figure RE-GDA0004016432100000121
Comparison with other existing methods: in recent years, as circular RNA research has progressed, a number of computational models have been designed to rapidly predict circular RNA-disease associations with satisfactory results. To fully validate the capabilities of the present invention, we compared it with these models, including the method of Wang et al., PWCDA, GCNCDA, NCPCDA, DWNN-RLS, SIMCCDA, and MRLDC. For fairness, we chose models that were evaluated on the reference data set with the 5FCV method. Since AUC reflects the overall performance of a model, we summarize the AUC scores generated by these models in Table 3. As can be seen from the table, the AUC score achieved by the present invention is higher than those of these models. The results show that the model based on the randomized deep learning algorithm combined with the attention mechanism has the best performance.
TABLE 3 5FCV AUC scores obtained by different models
Figure RE-GDA0004016432100000131
Moreover, although exemplary embodiments have been described herein, the scope of the invention includes any and all embodiments based on the present invention that have equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations or alterations. The elements of the claims are to be interpreted broadly based on the language employed in the claims and are not limited to the examples described in the present specification or during the prosecution of the application, which examples are to be construed as non-exclusive. It is intended, therefore, that the specification and examples be considered as exemplary only, with the true scope and spirit being indicated by the following claims and their full scope of equivalents.
The above description is intended to be illustrative and not restrictive. For example, the above-described examples (or one or more versions thereof) may be used in combination with each other. For example, other embodiments may be used by those of ordinary skill in the art upon reading the above description. In addition, in the above-described embodiments, various features may be grouped together to streamline the disclosure. This should not be interpreted as an intention that features of an unclaimed invention be essential to any of the claims. Rather, inventive subject matter may lie in less than all features of a particular inventive embodiment. Thus, the following claims are hereby incorporated into the detailed description as examples or embodiments, with each claim standing on its own as a separate embodiment, and it is contemplated that the embodiments can be combined with each other in various combinations or permutations. The scope of the invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (7)

1. A novel circular RNA-disease association prediction method, comprising:
constructing a data set for predicting circular RNA-disease association based on circular RNA and disease-related data;
characterizing the similarity attributes of circular RNAs and diseases with a Gaussian interaction profile kernel, and characterizing the associations between different diseases according to disease classification annotations;
based on the data set for predicting circular RNA-disease association, calculating in parallel the attention between network nodes and their neighboring nodes with a graph attention network, and extracting attention mechanism features by processing nodes of different dimensions and applying them to an inductive learning problem;
and using the ensemble deep RVFL network model, a randomization-based deep learning classifier, and training it on the attention mechanism features in a closed-form, non-iterative manner to obtain a model for circular RNA-disease association prediction.
2. The novel circular RNA-disease association prediction method according to claim 1, wherein after constructing the data set for predicting circular RNA-disease association based on the circular RNA and disease-related data, the method further comprises: dividing the data set for predicting circular RNA-disease association according to the following formula (1):
$$\mathbb{D} = \mathbb{D}^{+} \cup \mathbb{D}^{-} \tag{1}$$
wherein $\mathbb{D}$ is the data set for predicting circular RNA-disease association, $\mathbb{D}^{+}$ is the set of positive samples, and $\mathbb{D}^{-}$ is the set of unlabeled samples;
a preset number of negative samples are randomly selected from the unlabeled samples by a down-sampling method.
3. The novel circular RNA-disease association prediction method according to claim 1, wherein circular RNA and disease-related data collected from the CircR2Disease database are used to construct the data set for predicting circular RNA-disease association.
4. The method for predicting circular RNA-disease association as claimed in claim 1, wherein said characterization of similarity between circular RNA and disease by Gaussian interaction spectrum kernel comprises:
representing the interaction profile of each circular RNA as a binary vector whose entry for a disease is set to 1 when the circular RNA is associated with that disease, and to 0 otherwise;
calculating the GIP information $D_{GIP}(d(i), d(j))$ from these interaction profiles by the following formulas:
$$D_{GIP}(d(i), d(j)) = \exp\left(-\sigma_d \left\lVert V(d(i)) - V(d(j)) \right\rVert^{2}\right)$$
$$\sigma_d = \frac{1}{\frac{1}{m}\sum_{i=1}^{m} \left\lVert V(d(i)) \right\rVert^{2}}$$
wherein $\sigma_d$ is the width parameter, $m$ is the number of circular RNAs in the data set, $\exp(\cdot)$ is the exponential function, $d(i)$ is the $i$-th disease, $V(d(i))$ is the vector in the adjacency matrix for the $i$-th disease, $d(j)$ is the $j$-th disease, and $V(d(j))$ is the vector in the adjacency matrix for the $j$-th disease.
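A Gaussian interaction profile kernel of this kind can be sketched in a few lines of NumPy. The sketch follows the commonly used form of the kernel, in which the width parameter is the inverse of the mean squared norm of the profiles being compared; the function name `gip_kernel` and the toy matrix are assumptions for illustration only.

```python
import numpy as np

def gip_kernel(profiles):
    """Gaussian interaction profile similarity between the rows of `profiles`.

    profiles: (n_items, n_features) binary matrix, one interaction profile per row.
    Returns an (n_items, n_items) similarity matrix.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Width parameter: inverse of the mean squared profile norm
    # (assumption: averaged over the rows being compared).
    sigma = 1.0 / sq_norms.mean()
    # Squared Euclidean distance between every pair of rows
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sigma * sq_dist)

# Toy association matrix: 4 circular RNAs (rows) x 3 diseases (columns)
assoc = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 0],
                  [1, 0, 0]])
disease_sim = gip_kernel(assoc.T)   # disease profiles are the columns
circrna_sim = gip_kernel(assoc)     # circular RNA profiles are the rows
```

The same function serves both similarity matrices because only the orientation of the association matrix changes. An all-zero association matrix would make the width parameter undefined, so real data should be checked before calling it.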
5. The novel circular RNA-disease association prediction method of claim 1 or 4, wherein the characterization of the association between different diseases according to the disease classification annotation comprises:
reflecting the association between different diseases by using a directed acyclic graph (DAG) according to the disease classification annotation, wherein nodes in the DAG represent diseases and edges represent the relationships between diseases;
the contribution $C_e(d)$ of a disease $d$ in the disease set $N_e$ of the DAG to disease $e$ is calculated as:
$$C_e(d) = \begin{cases} 1, & d = e \\ \max\left\{\theta \cdot C_e(d') \mid d' \in \text{children of } d\right\}, & d \neq e \end{cases}$$
where $\theta$ is the semantic contribution factor, $\cdot$ denotes multiplication, $d'$ is a child disease of $d$, $C_e(d')$ is the contribution of disease $d'$, "children of $d$" is the set of child diseases of $d$, and $e$ is the disease under consideration;
the semantics of the disease are obtained by the following formula:
$$SC(e) = \sum_{d \in N_e} C_e(d)$$
where $SC(e)$ is the semantic value of disease $e$, and $N_e$ is the set of diseases in the DAG of disease $e$.
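The recursive contribution and the semantic value can be sketched as follows. The dictionary layout, the function name `semantic_value`, the disease names, and the value 0.5 for the semantic contribution factor are all assumptions made for this illustration; the claims do not fix a value for the factor.

```python
def semantic_value(children, e, theta=0.5):
    """Semantic value of disease `e` and the contribution of every disease in its DAG.

    children: dict mapping a disease to the list of its child diseases
              (edges point from the more general to the more specific disease).
    theta:    semantic contribution factor (0.5 is an assumed value).
    """
    contrib = {e: 1.0}  # the disease itself contributes 1

    def c(d):
        if d not in contrib:
            # A disease with no path down to `e` gets contribution 0.
            contrib[d] = max((theta * c(ch) for ch in children.get(d, ())), default=0.0)
        return contrib[d]

    for d in set(children) | {e}:
        c(d)
    return sum(contrib.values()), contrib

# Hypothetical DAG for the disease "liver neoplasm"
children = {
    "digestive system disease": ["liver disease"],
    "liver disease": ["liver neoplasm"],
    "neoplasm": ["liver neoplasm"],
}
sc, contrib = semantic_value(children, "liver neoplasm")
```

Memoising the contributions keeps the recursion linear in the number of edges, which matters when a disease has many ancestors that share sub-paths.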
6. The novel circular RNA-disease association prediction method of claim 1, wherein calculating the attention of the network nodes and their adjacent nodes in parallel by using a graph attention network based on the data set for predicting circular RNA-disease association, and extracting attention mechanism features by processing nodes of different dimensions and applying them to an inductive learning problem, comprises:
introducing a graph attention network (GAT) to extract the attention mechanism features, wherein the GAT calculates the attention between each node and its neighboring nodes in parallel, can process nodes of different dimensions, and can be applied directly to inductive learning problems, thereby implementing an effective attention mechanism.
Suppose the input of the graph attention network GAT is
$$h = \{h_1, h_2, \ldots, h_N\}, \quad h_i \in \mathbb{R}^{F}$$
and the output is
$$h' = \{h'_1, h'_2, \ldots, h'_N\}, \quad h'_i \in \mathbb{R}^{F'}$$
where $N$ is the number of nodes, $F$ and $F'$ are the numbers of attributes of the input and output nodes, respectively, $h$ is the set of input vectors, $h_i$ is the $i$-th input vector, $h'$ is the set of output vectors, and $h'_i$ is the $i$-th output vector; a weight matrix $W \in \mathbb{R}^{F' \times F}$ is trained for all nodes to obtain the corresponding input-output transformation, and a self-attention mechanism $a$ is applied to each node:
$$a: \mathbb{R}^{F'} \times \mathbb{R}^{F'} \rightarrow \mathbb{R}$$
the attention coefficient $e_{i,j}$ of the self-attention mechanism, which expresses the importance of node $j$ to node $i$, is:
$$e_{i,j} = a(W h_i, W h_j)$$
the coefficients are normalized over all neighbors of node $i$ with a softmax function:
$$\alpha_{i,j} = \mathrm{softmax}_j(e_{i,j}) = \frac{\exp(e_{i,j})}{\sum_{k \in \mathcal{N}_i} \exp(e_{i,k})}$$
where $\mathcal{N}_i$ is the set of neighbors of node $i$; the attention mechanism in GAT is a single-layer feedforward neural network determined by the weight vector $\vec{a} \in \mathbb{R}^{2F'}$, to which the LeakyReLU nonlinear activation is added, as follows:
$$\alpha_{i,j} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}\left[W h_i \,\Vert\, W h_j\right]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}\left[W h_i \,\Vert\, W h_k\right]\right)\right)}$$
where $\Vert$ denotes concatenation; after the normalized attention coefficients between different nodes are obtained, the output feature $h'_i$ of each node is calculated by the following formula (9) or (10):
$$h'_i = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{i,j} W h_j\right) \qquad (9)$$
$$h'_i = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{i,j}^{k} W^{k} h_j\right) \qquad (10)$$
where $\sigma(\cdot)$ is an activation function, $k$ is the index of an attention head, $K$ is the number of attention heads, $\alpha_{i,j}^{k}$ is the attention coefficient of the $k$-th attention mechanism, and $W^{k}$ is the corresponding weight matrix.
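The attention computation of claim 6 can be sketched as a single dense NumPy layer. This is an illustrative sketch rather than the patented implementation: it uses the row-vector convention (so the weight matrix is stored transposed relative to the claim), assumes the adjacency matrix contains self-loops, and picks `tanh` as the output activation and 0.2 as the LeakyReLU slope. All names are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, adj, W, a, activation=np.tanh):
    """One attention head.

    h:   (N, F) input node features
    adj: (N, N) 0/1 adjacency matrix, assumed to include self-loops
    W:   (F, F_out) weight matrix (transposed relative to the column-vector notation)
    a:   (2 * F_out,) attention weight vector
    """
    z = h @ W                                   # row i holds the transformed features of node i
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a target term
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    e = np.where(adj > 0, e, -np.inf)           # only neighbours take part in the softmax
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return activation(alpha @ z), alpha         # single-head output as in formula (9)

rng = np.random.default_rng(0)
N, F, F_out, K = 4, 5, 3, 2
h = rng.normal(size=(N, F))
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])
# Multi-head output as in formula (10): concatenate K independently weighted heads
heads = [gat_layer(h, adj, rng.normal(size=(F, F_out)), rng.normal(size=2 * F_out))
         for _ in range(K)]
h_out = np.concatenate([out for out, _ in heads], axis=1)
alpha = heads[0][1]
```

In practice the weight matrices and attention vectors are learned by gradient descent; the random values here only demonstrate the shapes and the masking of non-neighbours.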
7. The novel circular RNA-disease association prediction method as claimed in claim 1, wherein the input of each hidden layer in the deep RVFL network model comes from the nonlinearly transformed features of the previous layer and the original input features, and is described as follows:
$$H^{(l)} = \begin{cases} g\left(X W^{(1)}\right), & l = 1 \\ g\left(\left[H^{(l-1)}\; X\right] W^{(l)}\right), & l > 1 \end{cases}$$
wherein $g(\cdot)$ is a nonlinear activation function, $X$ is the original input features, $H^{(l-1)}$ is the feature matrix of layer $l-1$ that is input to layer $l$, and $l$ is the layer index; when $l = 1$, $W^{(1)}$ represents the weight matrix between the input and the first hidden layer; when $l > 1$, $W^{(l)}$ represents the weight matrix between the inner hidden layers;
the input of the output layer in the deep RVFL network model consists of the stacked nonlinear features of the hidden layers and the original features, and is represented as follows:
$$D = \left[H^{(1)}\; H^{(2)}\; \ldots\; H^{(l-1)}\; H^{(l)}\; X\right] \qquad (12)$$
the output of the deep RVFL network model may be defined as $Y = D\beta_d$; when regularized least squares is used, its closed-form solution is described as:
$$\beta_d = \left(D^{T} D + \lambda I\right)^{-1} D^{T} Y$$
wherein $\beta_d$ is the closed-form solution, $\lambda$ is the regularization parameter, $I$ is the identity matrix, $D$ is the input to the output layer, and $D^{T}$ is its transpose.
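A deep RVFL classifier of the kind described in claim 7 can be sketched as below. The hidden weights are drawn at random and never trained; only the output weights are obtained from the regularized least-squares solution. The layer count, hidden width, `tanh` activation, uniform weight range, omission of bias terms, and the toy data are assumptions for illustration and are not taken from the claims.

```python
import numpy as np

def train_deep_rvfl(X, Y, n_layers=3, n_hidden=32, lam=1e-2, seed=0):
    """Fit output weights in closed form; hidden weights stay random."""
    rng = np.random.default_rng(seed)
    weights, hidden, inp = [], [], X
    for _ in range(n_layers):
        W = rng.uniform(-1, 1, size=(inp.shape[1], n_hidden))  # random, fixed
        H = np.tanh(inp @ W)                # nonlinear features of this layer
        weights.append(W)
        hidden.append(H)
        inp = np.hstack([H, X])             # next layer sees previous features plus raw input
    D = np.hstack(hidden + [X])             # stacked hidden features and original features
    # Ridge solution: (D^T D + lam I) beta = D^T Y
    beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)
    return weights, beta

def predict_deep_rvfl(X, weights, beta):
    hidden, inp = [], X
    for W in weights:
        H = np.tanh(inp @ W)
        hidden.append(H)
        inp = np.hstack([H, X])
    return np.hstack(hidden + [X]) @ beta

# Toy binary problem (synthetic data, for shape checking only)
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
Y = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]
weights, beta = train_deep_rvfl(X, Y)
pred = predict_deep_rvfl(X, weights, beta)
train_acc = ((pred > 0.5) == (Y > 0.5)).mean()
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit matrix inverse. When there are far fewer samples than features, the equivalent dual form of the ridge solution is usually cheaper.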
CN202211120279.8A 2022-09-15 2022-09-15 Novel circular RNA-disease association prediction method Pending CN115640529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211120279.8A CN115640529A (en) 2022-09-15 2022-09-15 Novel circular RNA-disease association prediction method

Publications (1)

Publication Number Publication Date
CN115640529A true CN115640529A (en) 2023-01-24

Family

ID=84941364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211120279.8A Pending CN115640529A (en) 2022-09-15 2022-09-15 Novel circular RNA-disease association prediction method

Country Status (1)

Country Link
CN (1) CN115640529A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116360366A (en) * 2023-03-24 2023-06-30 淮阴工学院 Sewage treatment process optimization control method
CN116360366B (en) * 2023-03-24 2023-12-01 淮阴工学院 Sewage treatment process optimization control method
CN117012382A (en) * 2023-05-22 2023-11-07 东北林业大学 Disease-related circRNA prediction system based on depth feature fusion

Similar Documents

Publication Publication Date Title
CN115640529A (en) Novel circular RNA-disease association prediction method
Fan et al. lncRNA-MFDL: identification of human long non-coding RNAs by fusing multiple features and using deep learning
Liu et al. SMALF: miRNA-disease associations prediction based on stacked autoencoder and XGBoost
CN114496092B (en) MiRNA and disease association relation prediction method based on graph rolling network
CN110556184B (en) Non-coding RNA and disease relation prediction method based on Hessian regular nonnegative matrix decomposition
CN113724790B (en) PiRNA-disease association relation prediction method based on convolution denoising self-encoder
CN113871021A (en) Graph and attention machine mechanism-based circRNA and disease association relation prediction method
CN112270958A (en) Prediction method based on hierarchical deep learning miRNA-lncRNA interaction relation
CN108427865B (en) Method for predicting correlation between LncRNA and environmental factors
Zhu et al. Pre-train and learn: Preserving global information for graph neural networks
CN113869424A (en) Semi-supervised node classification method based on two-channel graph convolutional network
CN115995293A (en) Circular RNA and disease association prediction method
Li et al. First-order sensitivity analysis for hidden neuron selection in layer-wise training of networks
Qian et al. CMIVGSD: circRNA-miRNA interaction prediction based on Variational graph auto-encoder and singular value decomposition
Ji et al. Predicting miRNA-disease associations based on heterogeneous graph attention networks
CN113539372A (en) Efficient prediction method for LncRNA and disease association relation
Jing et al. Prediction of the transcription factor binding sites with meta-learning
CN116543832A (en) disease-miRNA relationship prediction method, model and application based on multi-scale hypergraph convolution
CN114141306B (en) Distant metastasis identification method based on gene interaction mode optimization graph representation
Wang et al. MAGCDA: a multi-hop attention graph neural networks method for CircRNA-disease association prediction
CN113223622B (en) miRNA-disease association prediction method based on meta-path
CN115410643A (en) Prediction method, system, equipment and storage medium of enhancer based on stacking ensemble learning
CN111984762B (en) Text classification method sensitive to attack resistance
Majeed et al. A comparison between the performance of features selection techniques: survey study
Wali et al. m-CALP–Yet another way of generating handwritten data through evolution for pattern recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination