CN116343927A - miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm - Google Patents
- Publication number
- CN116343927A (application CN202310121363.XA)
- Authority
- CN
- China
- Prior art keywords
- mirna
- disease
- similarity
- matrix
- association
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention provides a miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding algorithm, comprising the following steps. S1: obtain the miRNA-disease adjacency matrix A, which describes the association relationships between miRNAs and diseases. S2: from the adjacency matrix A, compute the miRNA Gaussian similarity MG, the miRNA cosine similarity MC, the disease Gaussian similarity DG and the disease cosine similarity DC; MG and MC serve as the miRNA features, and DG and DC as the disease features. S3: based on the Gaussian similarities, use weighted K-nearest neighbors to obtain the hidden association information of the graph, and thereby obtain an enhanced hypergraph association matrix. S4: map the miRNA features and the disease features into the same space through a fully connected layer, and train with hypergraph convolution to obtain embedded features. S5: decode the embedded features with a bilinear decoder and compute the association scores.
Description
Technical Field
The invention belongs to the intersection of deep learning and bioinformatics, and in particular relates to a miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding (autoencoder) algorithm.
Background
miRNAs are a class of single-stranded non-coding RNAs approximately 22 nucleotides in length. Numerous studies have shown that they exert biological functions through post-transcriptional regulation of genes, for example in cell development and immune response, and that their dysregulation can lead to cellular disorders. In recent years, researchers have found that miRNA dysregulation is closely related to a variety of complex human diseases, such as lung cancer, breast cancer and liver cancer. Identifying disease-associated miRNAs is therefore of great significance for discovering disease biomarkers and for diagnosing complex human diseases. In past studies, experimenters predicted new associations mainly from personal experimental experience and the identification of biological targeting characteristics; this process is time-consuming and labor-intensive, and depends heavily on the skill of the individual experimenter. A method that rapidly predicts new associations is therefore highly desirable: computational methods take the known associations as input and infer potential miRNA-disease associations, which fills this gap to a certain extent. In previous studies, computational models have played an effective auxiliary role in miRNA-disease association prediction.
Identifying disease-related miRNAs facilitates pathological studies of disease and the detection of disease biomarkers. Discovering potential miRNA-disease associations is a great aid to understanding disease pathogenesis and to developing therapies for human diseases. For example, previous findings indicate that miR-204 can act as a tumor suppressor in non-small cell lung cancer (NSCLC) by targeting JAK2, and that miR-204 can be used clinically as a biomarker for diagnosing and treating NSCLC. The prognosis of hepatocellular carcinoma (HCC) patients is poor; vasodilator-stimulated phosphoprotein (VASP) acts as a modulator of the actin cytoskeleton and of cell migration, and its overexpression in HCC is associated with malignant characteristics and poor prognosis. Hypoxia-induced down-regulation of miR-204 regulates VASP overexpression at the post-transcriptional level, indirectly promotes the up-regulation of VASP at the transcriptional and post-transcriptional levels, and can to a certain extent serve as a prognostic biomarker.
To further understand the relationship between miRNAs and diseases, researchers have proposed several hypotheses, one of which is that phenotypically similar diseases tend to be associated with functionally similar miRNAs. Based on this key hypothesis, many similarity-based and machine-learning-based methods have been proposed. Although these methods can learn and capture important relationships in the data and have achieved significant results, most of them ignore the global network structure. Moreover, they cannot mine the deep, complex connections of the network, and the nonlinear relationships in the network are extremely difficult for them to capture. Deep learning methods can capture nonlinear relationships in the network by constructing complex connections, and have been remarkably successful in supervised learning tasks in fields such as computer vision, natural language processing and speech processing.
Deep learning is also widely used in bioinformatics because of its ability to process unstructured data, and researchers have proposed a series of methods based on graph neural networks (GNNs). GNNs derive from convolutional neural networks (CNNs) and graph embedding, and perform very well at aggregating graph structure information. At present, GNNs achieve state-of-the-art performance in node classification tasks and are widely applied in the life sciences and physical sciences. For example, Han et al. captured nonlinear interactions through GCN and matrix factorization and proposed GCN-MF, a framework for the disease-gene association task. Tang et al. developed a multi-view multi-channel attention graph convolutional network (MMGCN) to predict potential miRNA-disease associations (MDAs). Li et al. proposed NIMCGCN, which combines GNNs with a neural inductive matrix completion model. Recently, attention mechanisms have also been introduced into GNNs; they enable the network to focus on the task-relevant parts of the input in order to identify hidden miRNA-disease relationships. For example, Ro et al. created a hierarchical attention mechanism that learns node representations at the node level and adjusts the contributions of different input graphs using graph-level attention.
However, many of these methods are trained on the original association graph. Because true negative samples are difficult to obtain, a large number of unknown samples are assumed to be negative samples for training. Using unknown samples as negatives may hinder the flow of information among the samples, which reduces the prediction accuracy of the method.
Disclosure of Invention
The invention provides a miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding algorithm, which can predict miRNA-disease associations more accurately.
In order to solve the technical problems, the technical scheme of the invention is as follows:
A miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding algorithm comprises the following steps:
S1: obtain the miRNA-disease adjacency matrix A, which describes the association relationships between miRNAs and diseases;
S2: from the adjacency matrix A, compute the miRNA Gaussian similarity MG, the miRNA cosine similarity MC, the disease Gaussian similarity DG and the disease cosine similarity DC; MG and MC serve as the miRNA features, and DG and DC as the disease features;
S3: based on the Gaussian similarities, use weighted K-nearest-neighbor profiles (WKNNP) to obtain the hidden association information of the graph, and thereby obtain an enhanced hypergraph association matrix;
S4: map the miRNA features and the disease features into the same space through a fully connected layer, and train with hypergraph convolution to obtain embedded features;
S5: decode the embedded features with a bilinear decoder and compute the association scores.
Preferably, the miRNA-disease adjacency matrix A mentioned in step S1 is generated as follows:
The miRNA-disease association data set is obtained from the Human MicroRNA Disease Database (HMDD). HMDD v3.2 contains 1206 miRNAs and 893 diseases, with 35547 associations in total. After deleting non-human data, 913 miRNAs and 554 diseases are retained, and the miRNA-disease adjacency matrix can be expressed as $A \in \mathbb{R}^{n_m \times n_d}$, where $n_m$ and $n_d$ are the number of miRNAs and the number of diseases, respectively. The miRNA-disease adjacency matrix A is defined as follows:
$$A(i,j)=\begin{cases}1, & \text{if miRNA } m_i \text{ is associated with disease } d_j\\ 0, & \text{otherwise}\end{cases}$$
Preferably, the Gaussian similarity mentioned in step S2 is computed as follows, taking the miRNA Gaussian similarity MG as the example:
Compute the bandwidth of the Gaussian interaction profile kernel similarity:
where $n_m$ is the number of miRNAs, and the vector $A(i,:)$ is row i of the miRNA-disease adjacency matrix A; a 1 in a row indicates that the miRNA is associated with the corresponding disease;
Compute the association similarity between each pair of miRNAs:
$$MG(i,j)=\exp\left(-\gamma_i\,\lVert A(i,:)-A(j,:)\rVert^{2}\right)$$
MG(i,j) is the similarity between the i-th miRNA and the j-th miRNA; exp denotes the exponential function with base e; the vectors $A(i,:)$ and $A(j,:)$ are rows i and j of the miRNA-disease adjacency matrix A.
Preferably, the disease Gaussian similarity DG is computed in the same way as the miRNA Gaussian similarity MG, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
Preferably, the cosine similarity mentioned in step S2 is computed as follows, taking the miRNA cosine similarity MC as the example:
The similarity of two vectors is measured by the cosine of the angle between them: the smaller the angle, the more similar the vectors. For miRNA-miRNA pairs, the similarity is computed as:
$$MC(i,j)=\frac{A(i,:)\cdot A(j,:)}{\lVert A(i,:)\rVert\,\lVert A(j,:)\rVert}$$
MC(i,j) is the cosine similarity between the i-th miRNA and the j-th miRNA; the vectors $A(i,:)$ and $A(j,:)$ are rows i and j of the miRNA-disease adjacency matrix A, and the symbol $\cdot$ denotes the inner product of two vectors.
Preferably, the disease cosine similarity DC is computed in the same way as the miRNA cosine similarity MC, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
Preferably, in step S3, weighted K-nearest-neighbor profiles (WKNNP) are used to obtain the hidden association information of the graph and thereby the enhanced hypergraph association matrix. The computation uses the miRNA Gaussian similarity MG and the disease Gaussian similarity DG, and the WKNNP method for constructing the hidden links of the graph is defined as follows:
To improve message passing, the original adjacency matrix A is preprocessed to obtain more hidden relations. The vector $A(i,:)$ is the i-th row of the miRNA-disease adjacency matrix A and represents the associations of miRNA i; the vector $A(:,j)$ is the j-th column of A and represents the associations of disease j;
For each miRNA, the K most similar miRNAs are obtained from the Gaussian similarity. Let $m_j$ be the index of the j-th most similar miRNA; the new association vector is computed as:
$$A_m(i,:)=\frac{1}{Q_m}\sum_{j=1}^{K} w_j\,A(m_j,:)$$
where $A_m(i,:)$ is the new association profile of the i-th row computed from the miRNA side; $Q_m=\sum_{1\le j\le K} MG(i,m_j)$ is a normalization term; $w_j=\alpha^{j-1}\,MG(i,m_j)$, where $\alpha \in [0,1]$ is a decay term;
Similarly, a new association vector is computed for each disease:
$$A_d(:,i)=\frac{1}{Q_d}\sum_{j=1}^{K} w_j\,A(:,d_j)$$
where $A_d(:,i)$ is the new association profile of the i-th column; $d_j$ is the index of the j-th most similar disease; $Q_d=\sum_{1\le j\le K} DG(i,d_j)$ is a normalization term; $w_j=\alpha^{j-1}\,DG(i,d_j)$, where $\alpha \in [0,1]$ is a decay term;
Finally, the hidden association information of the graph is combined into the enhanced hypergraph association matrix by the following formula:
Preferably, in step S4, the miRNA features and the disease features are mapped into the same space as follows:
The miRNA feature matrix can be expressed as $[MG, Z_d, MC, Z_d]$, and likewise the disease feature matrix can be expressed as $[Z_m, DG, Z_m, DC]$, where $Z_d$ and $Z_m$ are zero matrices whose numbers of columns equal the number of diseases and the number of miRNAs, respectively. The miRNA features and the disease features are each passed through a linear layer for linear dimensionality reduction and then through a ReLU activation function, which yields the miRNA embedding $E_m$ and the disease embedding $E_d$.
Preferably, the hypergraph convolution used in step S4 is defined as follows:
The miRNA hyperedge relation is constructed from the miRNA-disease association matrix A. Assume that the initial miRNA node features are:
The feature of a miRNA hyperedge depends on all nodes within that hyperedge. The miRNA hyperedge features are obtained by:
Here the formula uses the node features of the miRNA hypergraph at layer $l$, and $B^{-1}$ is the normalization matrix of $A^{T}$;
Following the graph convolution formula, the hypergraph convolution is defined as:
Expanding the formula yields:
Node features thus propagate globally over the hypergraph, and a trainable transfer matrix W is learned.
Preferably, in step S5, the embedded features are decoded with a bilinear decoder to compute the association scores. The bilinear decoder is implemented as follows:
The encoded miRNA and disease embeddings are the input of the decoder, and the bilinear decoder treats different interaction types as different classes. The association score of miRNA $m_i$ and disease $d_j$ is computed as:
where $W_r$ is a trainable parameter matrix whose dimension equals the embedding dimension of the input, and $r \in R = [0, 1]$.
Preferably, to optimize the result, cross-entropy is used as the loss function to measure the loss between the predicted result and the true value. The global network parameters are optimized with Adam as the gradient optimizer.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The method enhances the original graph by preprocessing the original association matrix with the weighted K-nearest-neighbor computation. This reduces the influence of random negative sampling, enriches the information of the original graph, captures its hidden association information, and finally yields the enhanced hypergraph association matrix. A new hypergraph network is then constructed that incorporates the enhanced hypergraph association matrix: graph node features flow over the hypergraph, global information is aggregated, the graph is sampled at a higher level, and the nonlinear information of the graph nodes is extracted. Finally, a bilinear decoder decodes the node embeddings to compute the miRNA-disease association scores, which gives more accurate miRNA-disease association prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 shows the AUCs achieved by SHGAEMDA, GAEMDA, DFELMDA and TDRC under five-fold cross-validation in this embodiment.
Detailed Description
This embodiment provides a miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding algorithm which, as shown in FIG. 1, comprises the following steps:
S1: obtain the miRNA-disease adjacency matrix A, which describes the association relationships between miRNAs and diseases;
S2: from the adjacency matrix A, compute the miRNA Gaussian similarity MG, the miRNA cosine similarity MC, the disease Gaussian similarity DG and the disease cosine similarity DC; MG and MC serve as the miRNA features, and DG and DC as the disease features;
S3: based on the Gaussian similarities, use weighted K-nearest-neighbor profiles (WKNNP) to obtain the hidden association information of the graph, and thereby obtain an enhanced hypergraph association matrix;
S4: map the miRNA features and the disease features into the same space through a fully connected layer, and train with hypergraph convolution to obtain embedded features;
S5: decode the embedded features with a bilinear decoder and compute the association scores.
The miRNA-disease adjacency matrix A mentioned in step S1 is generated as follows:
The miRNA-disease association data set is obtained from the Human MicroRNA Disease Database (HMDD). HMDD v3.2 contains 1206 miRNAs and 893 diseases, with 35547 associations in total. After deleting non-human data, 913 miRNAs and 554 diseases are retained, and the miRNA-disease adjacency matrix can be expressed as $A \in \mathbb{R}^{n_m \times n_d}$, where $n_m$ and $n_d$ are the number of miRNAs and the number of diseases, respectively. The miRNA-disease adjacency matrix A is defined as follows:
$$A(i,j)=\begin{cases}1, & \text{if miRNA } m_i \text{ is associated with disease } d_j\\ 0, & \text{otherwise}\end{cases}$$
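As an illustration of step S1, the following is a minimal sketch of building the binary adjacency matrix from a list of known (miRNA, disease) pairs. The function name `build_adjacency` and the toy names and pairs are hypothetical and are not taken from HMDD; loading and filtering of the actual database is not shown.

```python
import numpy as np

def build_adjacency(pairs, mirnas, diseases):
    """Return the binary miRNA-disease matrix A of shape (n_m, n_d)."""
    m_index = {name: i for i, name in enumerate(mirnas)}
    d_index = {name: j for j, name in enumerate(diseases)}
    A = np.zeros((len(mirnas), len(diseases)), dtype=float)
    for mirna, disease in pairs:
        # A(i, j) = 1 when miRNA i is known to be associated with disease j
        A[m_index[mirna], d_index[disease]] = 1.0
    return A

# Toy data (hypothetical, for illustration only)
mirnas = ["mir-a", "mir-b", "mir-c"]
diseases = ["disease-x", "disease-y"]
pairs = [("mir-a", "disease-x"), ("mir-a", "disease-y"), ("mir-c", "disease-x")]
A = build_adjacency(pairs, mirnas, diseases)
```

Rows index miRNAs and columns index diseases, so `A.T` gives the disease-side view used for DG and DC.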
The Gaussian similarity mentioned in step S2 is computed as follows, taking the miRNA Gaussian similarity MG as the example:
Compute the bandwidth of the Gaussian interaction profile kernel similarity:
where $n_m$ is the number of miRNAs, and the vector $A(i,:)$ is row i of the miRNA-disease adjacency matrix A; a 1 in a row indicates that the miRNA is associated with the corresponding disease;
Compute the association similarity between each pair of miRNAs:
$$MG(i,j)=\exp\left(-\gamma_i\,\lVert A(i,:)-A(j,:)\rVert^{2}\right)$$
MG(i,j) is the similarity between the i-th miRNA and the j-th miRNA; exp denotes the exponential function with base e; the vectors $A(i,:)$ and $A(j,:)$ are rows i and j of the miRNA-disease adjacency matrix A.
The disease Gaussian similarity DG is computed in the same way as the miRNA Gaussian similarity MG, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
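The Gaussian similarity computation can be sketched as below. The bandwidth formula is not legible in this text, so the sketch assumes the form commonly used for Gaussian interaction profile kernels: a single shared bandwidth equal to the inverse of the mean squared row norm of A. That choice, and the function name `gaussian_similarity`, are assumptions rather than the patent's definition.

```python
import numpy as np

def gaussian_similarity(A):
    """Gaussian interaction profile kernel similarity between the rows of A."""
    sq_norms = np.sum(A ** 2, axis=1)
    mean_sq = sq_norms.mean()
    # Assumed bandwidth: inverse of the mean squared row norm (guard for all-zero A)
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    # Pairwise squared Euclidean distances ||A(i,:) - A(j,:)||^2
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negative rounding errors
    return np.exp(-gamma * sq_dist)

A = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]], dtype=float)
MG = gaussian_similarity(A)    # miRNA side: rows of A
DG = gaussian_similarity(A.T)  # disease side: rows of the transpose
```

Passing `A.T` reuses the same routine for diseases, which mirrors the statement that DG is computed like MG on the transposed matrix.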
The cosine similarity mentioned in step S2 is computed as follows, taking the miRNA cosine similarity MC as the example:
The similarity of two vectors is measured by the cosine of the angle between them: the smaller the angle, the more similar the vectors. For miRNA-miRNA pairs, the similarity is computed as:
$$MC(i,j)=\frac{A(i,:)\cdot A(j,:)}{\lVert A(i,:)\rVert\,\lVert A(j,:)\rVert}$$
MC(i,j) is the cosine similarity between the i-th miRNA and the j-th miRNA; the vectors $A(i,:)$ and $A(j,:)$ are rows i and j of the miRNA-disease adjacency matrix A, and the symbol $\cdot$ denotes the inner product of two vectors.
The disease cosine similarity DC is computed in the same way as the miRNA cosine similarity MC, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
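A short sketch of the cosine similarity on association profiles follows. How rows with no known association are handled is not stated in the text; the sketch assumes their similarity to every row is 0. The function name `cosine_similarity` is illustrative.

```python
import numpy as np

def cosine_similarity(A):
    """Cosine similarity between the rows of A."""
    norms = np.linalg.norm(A, axis=1)
    # Assumption: all-zero rows keep similarity 0 instead of dividing by zero
    safe = np.where(norms > 0, norms, 1.0)
    unit = A / safe[:, None]
    return unit @ unit.T

A = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)
MC = cosine_similarity(A)    # miRNA cosine similarity
DC = cosine_similarity(A.T)  # disease cosine similarity
```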
In step S3, weighted K-nearest-neighbor profiles (WKNNP) are used to obtain the hidden association information of the graph and thereby the enhanced hypergraph association matrix. The computation uses the miRNA Gaussian similarity MG and the disease Gaussian similarity DG, and the WKNNP method for constructing the hidden links of the graph is defined as follows:
To improve message passing, the original adjacency matrix A is preprocessed to obtain more hidden relations. The vector $A(i,:)$ is the i-th row of the miRNA-disease adjacency matrix A and represents the associations of miRNA i; the vector $A(:,j)$ is the j-th column of A and represents the associations of disease j;
For each miRNA, the K most similar miRNAs are obtained from the Gaussian similarity. Let $m_j$ be the index of the j-th most similar miRNA; the new association vector is computed as:
$$A_m(i,:)=\frac{1}{Q_m}\sum_{j=1}^{K} w_j\,A(m_j,:)$$
where $A_m(i,:)$ is the new association profile of the i-th row computed from the miRNA side; $Q_m=\sum_{1\le j\le K} MG(i,m_j)$ is a normalization term; $w_j=\alpha^{j-1}\,MG(i,m_j)$, where $\alpha \in [0,1]$ is a decay term;
Similarly, a new association vector is computed for each disease:
$$A_d(:,i)=\frac{1}{Q_d}\sum_{j=1}^{K} w_j\,A(:,d_j)$$
where $A_d(:,i)$ is the new association profile of the i-th column; $d_j$ is the index of the j-th most similar disease; $Q_d=\sum_{1\le j\le K} DG(i,d_j)$ is a normalization term; $w_j=\alpha^{j-1}\,DG(i,d_j)$, where $\alpha \in [0,1]$ is a decay term;
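The row-wise and column-wise profile computations can be sketched with one helper, using the weights $w_j$ and normalization terms $Q_m$, $Q_d$ described above. Two points are assumptions of this sketch: an item is excluded from its own neighbor list, and the values of K and alpha are arbitrary. The helper name `wknn_profiles` is hypothetical, and the step that merges the two profile matrices into the enhanced matrix is deliberately not shown.

```python
import numpy as np

def wknn_profiles(A, S, K=3, alpha=0.5):
    """Weighted K-nearest-neighbour profiles for the rows of A under similarity S."""
    n = A.shape[0]
    K = min(K, n - 1)                     # cannot use more neighbours than exist
    A_new = np.zeros(A.shape, dtype=float)
    decay = alpha ** np.arange(K)         # alpha^(j-1) for j = 1..K
    for i in range(n):
        sims = np.asarray(S[i], dtype=float).copy()
        sims[i] = -np.inf                 # assumption: skip the item itself
        order = np.argsort(-sims, kind="stable")[:K]  # indices m_1..m_K
        neigh = S[i, order]
        Q = neigh.sum()                   # normalization term
        if Q > 0:
            A_new[i] = (decay * neigh) @ A[order] / Q
    return A_new

A = np.array([[1, 0], [0, 1], [0, 0]], dtype=float)   # 3 miRNAs x 2 diseases
MG = np.array([[1.0, 0.2, 0.8], [0.2, 1.0, 0.6], [0.8, 0.6, 1.0]])
DG = np.array([[1.0, 0.5], [0.5, 1.0]])
A_m = wknn_profiles(A, MG, K=2, alpha=0.5)      # miRNA side (rows)
A_d = wknn_profiles(A.T, DG, K=1, alpha=0.5).T  # disease side (columns)
```

In this toy case the third miRNA has no known association, yet `A_m[2]` is non-zero because it inherits weighted profiles from its two nearest neighbors.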
Finally, the hidden association information of the graph is combined into the enhanced hypergraph association matrix by the following formula:
In step S4, the miRNA features and the disease features are mapped into the same space as follows:
The miRNA feature matrix can be expressed as $[MG, Z_d, MC, Z_d]$, and likewise the disease feature matrix can be expressed as $[Z_m, DG, Z_m, DC]$, where $Z_d$ and $Z_m$ are zero matrices whose numbers of columns equal the number of diseases and the number of miRNAs, respectively. The miRNA features and the disease features are each passed through a linear layer for linear dimensionality reduction and then through a ReLU activation function, which yields the miRNA embedding $E_m$ and the disease embedding $E_d$.
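The zero-padding makes both feature matrices $2(n_m+n_d)$ columns wide, which is what allows a linear layer to project them to a common dimension. The sketch below shows this with NumPy; the embedding size, the random (untrained) weights, and the use of separate weights for miRNAs and diseases are assumptions for illustration only.

```python
import numpy as np

def build_features(MG, MC, DG, DC):
    """Assemble [MG, Z_d, MC, Z_d] and [Z_m, DG, Z_m, DC]."""
    n_m, n_d = MG.shape[0], DG.shape[0]
    Z_d = np.zeros((n_m, n_d))  # zero block with n_d columns
    Z_m = np.zeros((n_d, n_m))  # zero block with n_m columns
    F_m = np.hstack([MG, Z_d, MC, Z_d])
    F_d = np.hstack([Z_m, DG, Z_m, DC])
    return F_m, F_d

def linear_relu(X, W, b):
    """Linear layer followed by ReLU."""
    return np.maximum(X @ W + b, 0.0)

rng = np.random.default_rng(0)
n_m, n_d, dim = 4, 3, 8              # toy sizes; dim is an assumed embedding size
MG = MC = np.eye(n_m)                # placeholder similarities
DG = DC = np.eye(n_d)
F_m, F_d = build_features(MG, MC, DG, DC)
in_dim = F_m.shape[1]                # 2 * (n_m + n_d)
W_m, b_m = rng.normal(scale=0.1, size=(in_dim, dim)), np.zeros(dim)
W_d, b_d = rng.normal(scale=0.1, size=(in_dim, dim)), np.zeros(dim)
E_m = linear_relu(F_m, W_m, b_m)     # miRNA embedding
E_d = linear_relu(F_d, W_d, b_d)     # disease embedding
```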
The hypergraph convolution used in step S4 is defined as follows:
The miRNA hyperedge relation is constructed from the miRNA-disease association matrix A. Assume that the initial miRNA node features are:
The feature of a miRNA hyperedge depends on all nodes within that hyperedge. The miRNA hyperedge features are obtained by:
Here the formula uses the node features of the miRNA hypergraph at layer $l$, and $B^{-1}$ is the normalization matrix of $A^{T}$;
Following the graph convolution formula, the hypergraph convolution is defined as:
Expanding the formula yields:
Node features thus propagate globally over the hypergraph, and a trainable transfer matrix W is learned.
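Because the convolution formulas are not legible in this text, the following is only a generic two-stage hypergraph convolution sketch, not the patent's exact formula. It assumes miRNAs are nodes and each disease defines a hyperedge (incidence matrix H = A), averages node features into hyperedges using the inverse hyperedge degree, averages hyperedge features back to nodes, and then applies a weight matrix W with ReLU. The function names are hypothetical.

```python
import numpy as np

def inv_degree(M):
    """Diagonal matrix of inverse row sums (0 where the row sum is 0)."""
    deg = M.sum(axis=1).astype(float)
    inv = np.zeros_like(deg)
    np.divide(1.0, deg, out=inv, where=deg > 0)
    return np.diag(inv)

def hypergraph_conv(H, X, W):
    """One node -> hyperedge -> node propagation step with ReLU."""
    B_inv = inv_degree(H.T)            # inverse hyperedge degrees
    D_inv = inv_degree(H)              # inverse node degrees
    edge_feat = B_inv @ H.T @ X        # average member nodes into each hyperedge
    node_feat = D_inv @ H @ edge_feat  # average incident hyperedges back to nodes
    return np.maximum(node_feat @ W, 0.0)

H = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=float)    # 3 miRNA nodes, 2 disease hyperedges
X = np.eye(3)                          # toy initial node features
W = np.eye(3)                          # identity weights, for inspection only
out = hypergraph_conv(H, X, W)
```

With identity features and weights, each output row shows how much of every node's feature reaches a given node after one pass, which makes the global flow of information through shared hyperedges visible.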
In step S5, the embedded features are decoded with a bilinear decoder to compute the association scores. The bilinear decoder is implemented as follows:
The encoded miRNA and disease embeddings are the input of the decoder, and the bilinear decoder treats different interaction types as different classes. The association score of miRNA $m_i$ and disease $d_j$ is computed as:
where $W_r$ is a trainable parameter matrix whose dimension equals the embedding dimension of the input, and $r \in R = [0, 1]$.
To optimize the result, cross-entropy is used as the loss function to measure the loss between the predicted result and the true value. The global network parameters are optimized with Adam as the gradient optimizer.
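For a two-class association label, the cross-entropy loss reduces to binary cross-entropy, sketched below. Clipping the predictions is a numerical-stability assumption of the sketch; the Adam update itself is not shown and would normally come from a deep learning framework.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    losses = y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)
    return float(-np.mean(losses))

y = [1, 0, 1, 0]
loss_good = binary_cross_entropy(y, [0.9, 0.1, 0.8, 0.2])  # confident and correct
loss_bad = binary_cross_entropy(y, [0.1, 0.9, 0.2, 0.8])   # confident and wrong
loss_half = binary_cross_entropy([1, 0], [0.5, 0.5])       # uninformative
```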
To evaluate the prediction accuracy of the method of this embodiment (SHGAEMDA), experiments were performed with five-fold cross-validation. First, as many unknown pairs as there are validated miRNA-disease associations are sampled as negative samples. The sampled set is divided into 5 equal parts, and each part in turn serves as the test set. The association matrix is reconstructed from the remaining 4 parts, the Gaussian and cosine similarities are computed from it, and the miRNA-disease association scores are computed by the method of this embodiment. The average area under the curve (AUC) is 0.9367 ± 0.0011. Under the five-fold cross-validation framework, the method clearly outperforms the comparison methods (GAEMDA 0.9295 ± 0.0028, DFELMDA 0.9139 ± 0.0018, TDRC 0.8884 ± 0.0038) and is therefore better suited to miRNA-disease association prediction. The comparison of the results is shown in FIG. 2.
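The sampling and splitting protocol described above can be sketched as follows. The generator name `five_fold_splits`, the fixed seed and the toy matrix are assumptions; the sketch requires at least as many unknown pairs as known ones, and it leaves out model training and the AUC computation.

```python
import numpy as np

def five_fold_splits(A, n_folds=5, seed=0):
    """Yield (A_train, test_pairs, test_labels) for each fold."""
    rng = np.random.default_rng(seed)
    pos = np.argwhere(A == 1)
    unknown = np.argwhere(A == 0)
    # Sample as many unknown pairs as there are known associations
    neg = unknown[rng.choice(len(unknown), size=len(pos), replace=False)]
    samples = np.vstack([pos, neg])
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    perm = rng.permutation(len(samples))
    for test_idx in np.array_split(perm, n_folds):
        train_idx = np.setdiff1d(perm, test_idx)
        # Rebuild the association matrix from the training positives only
        A_train = np.zeros_like(A)
        train_pos = samples[train_idx][labels[train_idx] == 1]
        A_train[train_pos[:, 0], train_pos[:, 1]] = 1
        yield A_train, samples[test_idx], labels[test_idx]

A = np.zeros((6, 5))
A[np.arange(6), np.arange(6) % 5] = 1           # 6 toy associations
A[[0, 1, 2, 3], [1, 2, 3, 4]] = 1               # 4 more, 10 in total
folds = list(five_fold_splits(A))
```

Rebuilding `A_train` per fold matters because the similarities MG, MC, DG and DC are derived from A; computing them from the full matrix would leak held-out associations into the features.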
To examine the applicability of the method, SHGAEMDA was also used to predict unknown associations from the known miRNA-disease associations. When predicting new miRNA-disease associations, the known association information is used as the training data set for SHGAEMDA, and the prediction score of each unknown miRNA-disease pair is then computed and ranked. Lung cancer, breast cancer and colon cancer were selected as case studies, and the top 20 miRNAs predicted for each cancer were validated against a third-party database (dbDEMC). The results are shown in Table 1, Table 2 and Table 3: 100%, 85% and 95% of the predicted miRNAs, respectively, were confirmed to be associated with the corresponding cancer.
In addition, SHGAEMDA predicts some miRNA-disease associations that have not yet been confirmed, including hsa-mir-133 with breast cancer and hsa-mir-29 with colon cancer. These predicted associations have not been reported in the current literature, but are considered likely to exist.
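The ranking step of the case study can be sketched as follows: for one disease, known associations are masked out and the remaining miRNAs are sorted by predicted score. The function name `top_k_candidates` and the toy scores and names are hypothetical; validation against dbDEMC is a separate lookup that is not shown.

```python
import numpy as np

def top_k_candidates(scores, A, disease_idx, mirna_names, k=20):
    """Return the k highest-scoring miRNAs with no known link to the disease."""
    col = scores[:, disease_idx].astype(float).copy()
    col[A[:, disease_idx] == 1] = -np.inf       # mask known associations
    order = np.argsort(-col, kind="stable")
    order = [i for i in order if np.isfinite(col[i])][:k]
    return [(mirna_names[i], float(col[i])) for i in order]

scores = np.array([[0.9], [0.8], [0.3], [0.7]])  # toy predicted scores
A = np.array([[1], [0], [0], [0]])               # first miRNA is already known
names = ["mir-a", "mir-b", "mir-c", "mir-d"]
top2 = top_k_candidates(scores, A, disease_idx=0, mirna_names=names, k=2)
```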
Table 1. Top 20 potential lung cancer-associated miRNAs predicted by SHGAEMDA
Table 2. Top 20 potential breast cancer-associated miRNAs predicted by SHGAEMDA
Table 3. Top 20 potential colon cancer-associated miRNAs predicted by SHGAEMDA
The terms describing positional relationships in the drawings are for illustration only and are not to be construed as limiting this patent.
The above embodiments are merely examples given to illustrate the invention clearly and do not limit its implementations. Other variations or modifications based on the above description will be apparent to those of ordinary skill in the art. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the invention.
Claims (11)
1. A miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding algorithm, comprising the following steps:
S1: obtain the miRNA-disease adjacency matrix A, which describes the association relationships between miRNAs and diseases;
S2: from the adjacency matrix A, compute the miRNA Gaussian similarity MG, the miRNA cosine similarity MC, the disease Gaussian similarity DG and the disease cosine similarity DC; MG and MC serve as the miRNA features, and DG and DC as the disease features;
S3: based on the Gaussian similarities, use weighted K-nearest neighbors to obtain the hidden association information of the graph, and thereby obtain an enhanced hypergraph association matrix;
S4: map the miRNA features and the disease features into the same space through a fully connected layer, and train with hypergraph convolution to obtain embedded features;
S5: decode the embedded features with a bilinear decoder and compute the association scores.
2. The miRNA-disease association prediction method according to claim 1, wherein the miRNA-disease adjacency matrix A in step S1 is generated as follows:
obtaining a miRNA-disease association data set from the human microRNA disease database, which comprises 1206 miRNAs, 893 diseases and 35547 associations; after deletion of non-human data, 913 miRNAs and 554 diseases are retained, and the miRNA-disease adjacency matrix can be expressed as $A \in \mathbb{R}^{n_m \times n_d}$, where $n_m$ and $n_d$ denote the number of miRNAs and the number of diseases, respectively; the miRNA-disease adjacency matrix A is defined as follows:
$$A(i,j)=\begin{cases}1, & \text{if miRNA } i \text{ is associated with disease } j\\ 0, & \text{otherwise}\end{cases}$$
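As a non-limiting illustration, the construction of the adjacency matrix from a list of known association pairs may be sketched in Python with NumPy as follows. The function name `build_adjacency` and the toy miRNA and disease names are hypothetical and are not taken from the database.

```python
import numpy as np

def build_adjacency(pairs, mirnas, diseases):
    """Return the n_m x n_d binary matrix A with A[i, j] = 1 for each known pair."""
    m_index = {name: i for i, name in enumerate(mirnas)}
    d_index = {name: j for j, name in enumerate(diseases)}
    A = np.zeros((len(mirnas), len(diseases)), dtype=np.float64)
    for mirna, disease in pairs:
        A[m_index[mirna], d_index[disease]] = 1.0
    return A

# Toy data with hypothetical names, used only to show the shape of the result.
mirnas = ["miR-a", "miR-b", "miR-c"]
diseases = ["disease-x", "disease-y"]
pairs = [("miR-a", "disease-x"), ("miR-b", "disease-x"), ("miR-b", "disease-y")]
A = build_adjacency(pairs, mirnas, diseases)
```

Rows index miRNAs and columns index diseases, so a miRNA with no known association yields an all-zero row.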
3. The miRNA-disease association prediction method according to claim 1, wherein the miRNA Gaussian similarity MG in step S2 is calculated as follows:
calculating the bandwidth of the Gaussian interaction profile kernel similarity:
$n_m$ denotes the number of miRNAs; the vector $A(i,:)$ denotes the i-th row of the miRNA-disease adjacency matrix A, in which each entry equal to 1 indicates that the miRNA is associated with the corresponding disease;
calculating the association similarity between each pair of miRNAs:
$$MG(i,j)=\exp\left(-\gamma_i\,\lVert A(i,:)-A(j,:)\rVert^{2}\right)$$
where MG(i, j) denotes the similarity between the i-th miRNA and the j-th miRNA; exp denotes the exponential function with base e; and the vectors $A(i,:)$ and $A(j,:)$ denote the i-th and j-th rows of the miRNA-disease adjacency matrix A.
4. The miRNA-disease association prediction method according to claim 3, wherein the disease Gaussian similarity DG is calculated in the same way as the miRNA Gaussian similarity MG, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
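As a non-limiting illustration, the Gaussian similarities MG and DG may be sketched in Python with NumPy as follows. The bandwidth expression is not reproduced in this text, so the sketch assumes a commonly used choice, namely the reciprocal of the mean squared row norm of A, applied as a single bandwidth to all rows; the function name `gaussian_similarity` and the toy matrix are hypothetical.

```python
import numpy as np

def gaussian_similarity(A):
    """Gaussian interaction profile kernel similarity between the rows of A."""
    A = np.asarray(A, dtype=np.float64)
    sq_norms = np.sum(A * A, axis=1)
    # Assumed bandwidth: reciprocal of the mean squared row norm.
    mean_sq = sq_norms.mean()
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    # Pairwise squared Euclidean distances ||A(i,:) - A(j,:)||^2.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)

A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 1]], dtype=float)
MG = gaussian_similarity(A)    # miRNA similarity from the rows of A
DG = gaussian_similarity(A.T)  # disease similarity from the transpose of A
```

Passing the transpose reuses the same routine for diseases, which mirrors the relationship between claims 3 and 4.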
5. The miRNA-disease association prediction method according to claim 1, wherein the miRNA cosine similarity MC in step S2 is calculated as follows:
the similarity of two vectors is measured by the cosine of the angle between them, and the smaller the angle, the more similar the vectors; for miRNA-miRNA associations, the similarity is calculated as:
$$MC(i,j)=\frac{A(i,:)\cdot A(j,:)}{\lVert A(i,:)\rVert\,\lVert A(j,:)\rVert}$$
where MC(i, j) denotes the cosine similarity between the i-th miRNA and the j-th miRNA; the vectors $A(i,:)$ and $A(j,:)$ denote the i-th and j-th rows of the miRNA-disease adjacency matrix A; and the symbol $\cdot$ denotes the inner product of two vectors.
6. The miRNA-disease association prediction method according to claim 5, wherein the disease cosine similarity DC is calculated in the same way as the miRNA cosine similarity MC, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
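As a non-limiting illustration, the cosine similarities MC and DC may be sketched in Python with NumPy as follows. The text does not state how a row without any known association is treated; the sketch assumes its similarity to every row is set to 0. The function name `cosine_similarity` and the toy matrix are hypothetical.

```python
import numpy as np

def cosine_similarity(A):
    """Cosine similarity between every pair of rows of A."""
    A = np.asarray(A, dtype=np.float64)
    norms = np.linalg.norm(A, axis=1)
    denom = np.outer(norms, norms)
    dots = A @ A.T
    # Assumption: pairs involving an all-zero row get similarity 0.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)

A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 0]], dtype=float)
MC = cosine_similarity(A)    # miRNA cosine similarity
DC = cosine_similarity(A.T)  # disease cosine similarity
```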
7. The miRNA-disease association prediction method according to claim 1, wherein in step S3 the hidden association information of the graph is acquired by weighted K-nearest neighbors, using the miRNA Gaussian similarity MG and the disease Gaussian similarity DG, so as to obtain the enhanced hypergraph association matrix, specifically:
preprocessing the original adjacency matrix A to obtain hidden relationships; the vector $A(i,:)$ denotes the i-th row of the miRNA-disease adjacency matrix A and represents the associations of a miRNA, and the vector $A(:,j)$ denotes the j-th column of the miRNA-disease adjacency matrix A and represents the associations of a disease;
for each miRNA, the K most similar miRNAs are obtained from the Gaussian similarity; letting $m_j$ denote the index of the j-th most similar miRNA, the new association vector is calculated as follows:
where $A_m(i,:)$ denotes the new association vector of the i-th row calculated from the miRNAs; $Q_m=\sum_{1\le j\le K} MG(i,m_j)$ is a normalization term; and $w_j=\alpha^{j-1}\,MG(i,m_j)$, where $\alpha\in[0,1]$ is a decay term;
similarly, a new association vector can be calculated for each disease as follows:
where $A_d(:,i)$ denotes the new association vector of the i-th column; $d_j$ denotes the index of the j-th most similar disease; $Q_d=\sum_{1\le j\le K} DG(i,d_j)$ is a normalization term; and $w_j=\alpha^{j-1}\,DG(i,d_j)$, where $\alpha\in[0,1]$ is a decay term;
finally, the hidden association information of the graph is obtained, and the enhanced hypergraph association matrix is obtained through the following formula:
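As a non-limiting illustration, the weighted K-nearest-neighbor preprocessing of claim 7 may be sketched in Python with NumPy as follows. The formulas for the new association vectors and for the final combination are not reproduced in this text, so the sketch makes three assumptions: each new vector is the weighted sum of the neighbor profiles divided by the normalization term, an entity is excluded from its own neighbor list, and the two sides are averaged and merged with A by an element-wise maximum. The function names `wknn_profiles` and `enhance` and the toy matrices are hypothetical.

```python
import numpy as np

def wknn_profiles(A, S, k, alpha):
    """Rebuild each row of A from its k most similar rows under similarity S."""
    A = np.asarray(A, dtype=np.float64)
    S = np.asarray(S, dtype=np.float64)
    n = A.shape[0]
    k_eff = min(k, n - 1)
    out = np.zeros_like(A)
    for i in range(n):
        sims = S[i].copy()
        sims[i] = -np.inf                       # assumption: skip the entity itself
        nbrs = np.argsort(-sims, kind="stable")[:k_eff]  # most similar first
        w = alpha ** np.arange(len(nbrs)) * S[i, nbrs]   # w_j = alpha^(j-1) * S(i, m_j)
        q = S[i, nbrs].sum()                    # normalization term Q
        if q > 0:
            out[i] = (w @ A[nbrs]) / q          # assumed weighted-sum form
    return out

def enhance(A, MG, DG, k=2, alpha=0.5):
    """Combine miRNA-side and disease-side estimates with the known associations."""
    A = np.asarray(A, dtype=np.float64)
    A_m = wknn_profiles(A, MG, k, alpha)        # row-wise, miRNA neighbors
    A_d = wknn_profiles(A.T, DG, k, alpha).T    # column-wise, disease neighbors
    # Assumed combination: average both estimates and keep known associations.
    return np.maximum(A, (A_m + A_d) / 2.0)

A = np.array([[1, 0], [0, 0], [1, 1]], dtype=float)
MG = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.5], [0.2, 0.5, 1.0]])
DG = np.array([[1.0, 0.6], [0.6, 1.0]])
A_enh = enhance(A, MG, DG, k=2, alpha=0.5)
```

Under these assumptions the known entries of A are never lowered, and previously empty entries receive values in [0, 1] derived from similar miRNAs and diseases.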
8. The miRNA-disease association prediction method according to claim 1, wherein in step S4 the features of the miRNAs and the features of the diseases are mapped to the same domain space as follows:
the feature matrix of the miRNAs can be expressed as $[MG, Z_d, MC, Z_d]$ and, similarly, the feature matrix of the diseases can be expressed as $[Z_m, DG, Z_m, DC]$, where $Z_d$ and $Z_m$ are zero matrices whose numbers of columns equal the number of diseases and the number of miRNAs, respectively; the miRNA features and the disease features are each passed through a linear layer for linear dimension reduction and then through a ReLU activation function, yielding the miRNA embedding $E_m$ and the disease embedding $E_d$.
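As a non-limiting illustration, the feature construction and the linear mapping of claim 8 may be sketched in Python with NumPy as follows. The sketch assumes separate, randomly initialized linear layers for miRNAs and diseases; the function names, the embedding dimension of 8 and the identity matrices standing in for the similarities are hypothetical.

```python
import numpy as np

def build_features(MG, MC, DG, DC):
    """Assemble [MG, Z_d, MC, Z_d] for miRNAs and [Z_m, DG, Z_m, DC] for diseases."""
    n_m, n_d = MG.shape[0], DG.shape[0]
    Z_d = np.zeros((n_m, n_d))  # zero block with one column per disease
    Z_m = np.zeros((n_d, n_m))  # zero block with one column per miRNA
    X_m = np.hstack([MG, Z_d, MC, Z_d])
    X_d = np.hstack([Z_m, DG, Z_m, DC])
    return X_m, X_d

def linear_relu(X, W, b):
    """Linear layer followed by ReLU."""
    return np.maximum(X @ W + b, 0.0)

rng = np.random.default_rng(0)
n_m, n_d, dim = 4, 3, 8
MG = MC = np.eye(n_m)  # placeholder similarities for the sketch
DG = DC = np.eye(n_d)
X_m, X_d = build_features(MG, MC, DG, DC)

in_dim = X_m.shape[1]  # both feature matrices share this width
W_m, b_m = rng.normal(scale=0.1, size=(in_dim, dim)), np.zeros(dim)
W_d, b_d = rng.normal(scale=0.1, size=(in_dim, dim)), np.zeros(dim)
E_m = linear_relu(X_m, W_m, b_m)
E_d = linear_relu(X_d, W_d, b_d)
```

The zero blocks give both feature matrices the same number of columns, which is what allows the two embeddings to live in one space.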
9. The miRNA-disease association prediction method according to claim 8, wherein the hypergraph convolution in step S4 is defined as follows:
constructing the miRNA hyperedge relationships according to the miRNA-disease association relationship A, and assuming that the initial miRNA node features are:
the hyperedge features of the miRNAs are associated with all nodes within each hyperedge, and are obtained as follows:
where the feature matrix in the formula denotes the node features of the miRNA hypergraph at the l-th layer, and $B^{-1}$ is the normalization matrix of $A^{T}$;
the following hypergraph convolution formula is defined by analogy with the graph convolution formula:
expanding this formula yields the following formula:
the node features thus propagate globally across the hypergraph, so that a data transfer matrix W can be trained.
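As a non-limiting illustration, one hypergraph convolution step may be sketched in Python with NumPy as follows. The convolution formulas are not reproduced in this text, so the sketch is only one plausible reading: node features are averaged into hyperedge features with $B^{-1}A^{T}$, averaged back to the nodes with an assumed node-degree normalization, multiplied by the trainable matrix W and passed through an assumed ReLU. The function name `hypergraph_conv` and the toy matrices are hypothetical.

```python
import numpy as np

def hypergraph_conv(H, X, W):
    """One node -> hyperedge -> node step on incidence matrix H (nodes x hyperedges)."""
    H = np.asarray(H, dtype=np.float64)
    X = np.asarray(X, dtype=np.float64)
    edge_deg = H.sum(axis=0)  # B: total node weight in each hyperedge
    node_deg = H.sum(axis=1)  # assumed D: total hyperedge weight at each node
    inv_B = np.divide(1.0, edge_deg, out=np.zeros_like(edge_deg), where=edge_deg > 0)
    inv_D = np.divide(1.0, node_deg, out=np.zeros_like(node_deg), where=node_deg > 0)
    edge_feat = inv_B[:, None] * (H.T @ X)        # B^{-1} A^T X
    node_feat = inv_D[:, None] * (H @ edge_feat)  # assumed aggregation back to nodes
    return np.maximum(node_feat @ W, 0.0)         # trainable W, assumed ReLU

# Three miRNA nodes, two hyperedges (one per disease), two-dimensional features.
H = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)
X = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
W = np.eye(2)
X_next = hypergraph_conv(H, X, W)
```

Because every node in a hyperedge contributes to that hyperedge's feature, information from one miRNA reaches all miRNAs sharing a disease with it in a single step.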
10. The miRNA-disease association prediction method according to claim 1, wherein in step S5 the embedded features are decoded with a bilinear decoder and the association score is calculated, the bilinear decoder being implemented as follows:
the encoded miRNA and disease embeddings are taken as the input of the decoder, and the bilinear decoder treats different interaction types as different categories; the association score of miRNA $m_i$ and disease $d_j$ is calculated as:
where $W_r$ is a trainable parameter matrix whose dimensions equal the embedding dimension of the input, and $r \in R = [0,1]$.
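As a non-limiting illustration, a bilinear decoder may be sketched in Python with NumPy as follows. The score formula is not reproduced in this text; the sketch assumes the bilinear form of the two embeddings and $W_r$ followed by a sigmoid, which keeps every score between 0 and 1. The function name `bilinear_decode` and the toy embeddings are hypothetical.

```python
import numpy as np

def bilinear_decode(E_m, E_d, W_r):
    """Score every miRNA-disease pair as sigmoid(E_m W_r E_d^T)."""
    logits = E_m @ W_r @ E_d.T
    return 1.0 / (1.0 + np.exp(-logits))  # assumed sigmoid squashing

E_m = np.array([[1.0, 0.0], [0.0, 1.0]])  # two miRNA embeddings
E_d = np.array([[1.0, 1.0]])              # one disease embedding
scores = bilinear_decode(E_m, E_d, np.eye(2))
neutral = bilinear_decode(E_m, E_d, np.zeros((2, 2)))
```

Entry (i, j) of the result is the predicted association score of miRNA i and disease j; an all-zero $W_r$ gives the uninformative score 0.5 everywhere.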
11. The miRNA-disease association prediction method according to claim 10, wherein, to optimize the result, cross-entropy is adopted as the loss function to calculate the loss between the predicted result and the true value, and the global network parameters are optimized with Adam as the gradient optimizer.
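As a non-limiting illustration, training with a cross-entropy loss and Adam may be sketched in Python with NumPy as follows. For brevity the sketch updates only a bilinear decoder matrix on random toy data, assumes a binary cross-entropy over sigmoid scores, and uses a minimal hand-written Adam update rather than any library optimizer; the class `Adam`, the function `bce_loss`, the learning rate and the step count are hypothetical.

```python
import numpy as np

def bce_loss(P, Y, eps=1e-12):
    """Mean binary cross-entropy between predicted scores P and labels Y."""
    P = np.clip(P, eps, 1.0 - eps)
    return float(-np.mean(Y * np.log(P) + (1.0 - Y) * np.log(1.0 - P)))

class Adam:
    """Minimal Adam update for a single parameter array (illustrative only)."""
    def __init__(self, shape, lr=0.02, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(shape)
        self.v = np.zeros(shape)
        self.t = 0

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grad * grad
        m_hat = self.m / (1.0 - self.beta1 ** self.t)  # bias-corrected first moment
        v_hat = self.v / (1.0 - self.beta2 ** self.t)  # bias-corrected second moment
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

rng = np.random.default_rng(0)
E_m = rng.normal(size=(5, 4))                 # toy miRNA embeddings
E_d = rng.normal(size=(3, 4))                 # toy disease embeddings
Y = (rng.random((5, 3)) > 0.5).astype(float)  # toy association labels
W_r = np.zeros((4, 4))
opt = Adam(W_r.shape)

losses = []
for _ in range(100):
    P = 1.0 / (1.0 + np.exp(-(E_m @ W_r @ E_d.T)))
    losses.append(bce_loss(P, Y))
    grad = E_m.T @ (P - Y) @ E_d / Y.size     # gradient of the mean loss w.r.t. W_r
    W_r = opt.step(W_r, grad)
```

In a full model the same loss would be back-propagated through the hypergraph convolution and linear layers as well, typically with an automatic differentiation framework.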
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310121363.XA CN116343927A (en) | 2023-02-14 | 2023-02-14 | miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310121363.XA CN116343927A (en) | 2023-02-14 | 2023-02-14 | miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116343927A (en) | 2023-06-27 |
Family
ID=86890635
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310121363.XA Pending CN116343927A (en) | 2023-02-14 | 2023-02-14 | miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116343927A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116844645A (en) * | 2023-08-31 | 2023-10-03 | 云南师范大学 | Gene regulation network inference method based on multi-view layered hypergraph |
CN116844645B (en) * | 2023-08-31 | 2023-11-17 | 云南师范大学 | Gene regulation network inference method based on multi-view layered hypergraph |
CN118506884A (en) * | 2024-07-19 | 2024-08-16 | 山东大学 | MiRNA-disease association relation prediction method, system, equipment and medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Wang et al. | LDGRNMF: LncRNA-disease associations prediction based on graph regularized non-negative matrix factorization | |
Lan et al. | GANLDA: graph attention network for lncRNA-disease associations prediction | |
Liu et al. | SMALF: miRNA-disease associations prediction based on stacked autoencoder and XGBoost | |
Yu et al. | MCLPMDA: A novel method for miRNA‐disease association prediction based on matrix completion and label propagation | |
Lei et al. | A comprehensive survey on computational methods of non-coding RNA and disease association prediction | |
CN116343927A (en) | miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm | |
Wu et al. | Inferring LncRNA-disease associations based on graph autoencoder matrix completion | |
Chen et al. | Supervised machine learning model for high dimensional gene data in colon cancer detection | |
CN110556184B (en) | Non-coding RNA and disease relation prediction method based on Hessian regular nonnegative matrix decomposition | |
Zhou et al. | Predicting miRNA–Disease Associations Through Deep Autoencoder With Multiple Kernel Learning | |
CN108427865B (en) | Method for predicting correlation between LncRNA and environmental factors | |
CN114334012A (en) | Method for identifying cancer subtypes based on multigroup data | |
CN116230077A (en) | Antiviral drug screening method based on restarting hypergraph double random walk | |
Khan et al. | DeepGene transformer: Transformer for the gene expression-based classification of cancer subtypes | |
CN113539479B (en) | Similarity constraint-based miRNA-disease association prediction method and system | |
Xi et al. | Ldcmfc: predicting long non-coding rna and disease association using collaborative matrix factorization based on correntropy | |
Huang et al. | Predicting Disease-Associated N7-Methylguanosine (m7G) Sites via Random Walk on Heterogeneous Network | |
CN113421614A (en) | Tensor decomposition-based lncRNA-disease association prediction method | |
Ma et al. | CRBP-HFEF: prediction of RBP-Binding sites on circRNAs based on hierarchical feature expansion and fusion | |
Zhang et al. | miTDS: Uncovering miRNA-mRNA interactions with deep learning for functional target prediction | |
CN116092581A (en) | Annular RNA marker prediction method based on natural semantic enhancement | |
Huang et al. | Sequential reinforcement active feature learning for gene signature identification in renal cell carcinoma | |
Casalino et al. | Evaluation of cognitive impairment in pediatric multiple sclerosis with machine learning: an exploratory study of miRNA expressions | |
CN115295156A (en) | Method for predicting miRNA-disease based on relation graph convolution network fusion multi-source information | |
Lu et al. | HCGCCDA: Prediction of circRNA-disease associations based on the combination of hypergraph convolution and graph convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||