CN116343927A - miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm


Info

Publication number
CN116343927A
Authority
CN
China
Prior art keywords
mirna
disease
similarity
matrix
association
Legal status
Pending
Application number
CN202310121363.XA
Other languages
Chinese (zh)
Inventor
谢国波 (Xie Guobo)
余俊锐 (Yu Junrui)
林志毅 (Lin Zhiyi)
顾国生 (Gu Guosheng)
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Application filed by Guangdong University of Technology
Priority to CN202310121363.XA
Publication of CN116343927A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding algorithm, comprising the following steps: S1: acquire the miRNA-disease adjacency matrix A, which describes the associations between miRNAs and diseases; S2: from A, compute the miRNA Gaussian similarity MG, miRNA cosine similarity MC, disease Gaussian similarity DG and disease cosine similarity DC, using MG and MC as the miRNA features and DG and DC as the disease features; S3: obtain hidden association information of the graph with a weighted K-nearest-neighbor profile based on the Gaussian similarities, yielding an enhanced hypergraph association matrix; S4: map the miRNA features and disease features into the same space through a fully connected layer, and train with hypergraph convolution to obtain embedded features; S5: decode the embedded features with a bilinear decoder and compute the association scores.

Description

miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm
Technical Field
The invention belongs to the intersection of deep learning and bioinformatics, and in particular relates to a miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding algorithm.
Background
miRNAs are a class of single-stranded non-coding RNAs approximately 22 nucleotides long. Numerous studies have shown that they exert biological functions, for example in cell development and immune response, through post-transcriptional regulation of genes, and that their dysregulation can lead to cellular disorders. In recent years, researchers have found that miRNA dysregulation is closely related to many complex human diseases, such as lung cancer, breast cancer and liver cancer. Identifying disease-associated miRNAs is therefore of great significance for discovering disease biomarkers and diagnosing complex human diseases. In earlier studies, experimenters predicted new associations largely from personal experimental experience and recognition of biological targeting characteristics; this process is time-consuming and labor-intensive, and its outcome depends heavily on the individual experimenter's skill. A method that can rapidly predict new associations is therefore highly desirable: computational methods can infer potential miRNA-disease associations from the known ones and partly fill this gap, and in previous studies computational models have proven an effective aid to miRNA-disease association prediction.
Identifying disease-related miRNAs can facilitate pathological studies of disease and the detection of disease biomarkers. Discovering potential miRNA-disease associations greatly aids research into disease pathogenesis and the development of therapies for human disease. For example, previous findings indicate that miR-204 can act as a tumor suppressor in non-small cell lung cancer (NSCLC) by targeting JAK2, and that miR-204 can be used clinically as a biomarker for diagnosing and treating NSCLC. Patients with hepatocellular carcinoma (HCC) have a poor prognosis. Vasodilator-stimulated phosphoprotein (VASP) regulates the actin cytoskeleton and cell migration, and its overexpression in HCC is associated with malignant features and poor prognosis. Hypoxia-induced down-regulation of miR-204 contributes to VASP overexpression at the post-transcriptional level and indirectly promotes VASP up-regulation at the transcriptional and post-transcriptional levels, so miR-204 can, to some extent, serve as a prognostic biomarker.
To better understand the relationship between miRNAs and diseases, researchers have proposed several hypotheses, a key one being that phenotypically similar diseases tend to be associated with functionally similar miRNAs. Building on this hypothesis, many similarity-based and machine-learning-based methods have been proposed. Although these methods can learn important relationships in the data and have achieved notable results, most of them ignore the global network structure. Moreover, they cannot mine deep, complex connections in the network, so nonlinear relationships in the network are very difficult for them to capture. Deep learning methods can capture nonlinear relationships by building complex connections, and have been remarkably successful in supervised learning tasks in many fields, such as computer vision, natural language processing and speech processing.
Because it can process unstructured data, deep learning is also widely used in bioinformatics, and researchers have proposed a series of methods based on graph neural networks (GNNs). GNNs derive from convolutional neural networks (CNNs) and graph embedding, and perform very well at aggregating graph-structure information. GNNs currently achieve state-of-the-art performance in node classification tasks and are widely applied in the life and physical sciences. For example, Han et al. captured nonlinear interactions through a GCN combined with matrix factorization and proposed GCN-MF, a new framework for disease-gene association prediction. Tang et al. developed a multi-view multi-channel attention graph convolutional network (MMGCN) to predict potential miRNA-disease associations (MDA). Peri et al. proposed NIMCGCN, a new approach that combines GNNs with a neural inductive matrix completion model. Recently, attention mechanisms have also been introduced into GNNs, enabling networks to focus on the task-relevant parts of the input and to identify hidden miRNA-disease relationships. For example, Ro et al. built a hierarchical attention mechanism that learns node representations with node-level attention and weighs the contributions of different input graphs with graph-level attention.
However, many methods are trained on the original association graph. Because verified negative samples are hard to obtain, a large number of unknown pairs are assumed to be negatives during training. Treating unknown samples as negatives may reduce the flow of information between negative samples, which degrades the prediction accuracy of the method.
Disclosure of Invention
The invention provides a miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding (autoencoder) algorithm, which predicts miRNA-disease associations more accurately.
In order to solve the technical problems, the technical scheme of the invention is as follows:
A miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding algorithm comprises the following steps:
S1: acquire the miRNA-disease adjacency matrix A, which describes the associations between miRNAs and diseases;
S2: from A, compute the miRNA Gaussian similarity MG, miRNA cosine similarity MC, disease Gaussian similarity DG and disease cosine similarity DC, using MG and MC as the miRNA features and DG and DC as the disease features;
S3: obtain hidden association information of the graph with the weighted K-nearest-neighbor profile (WKNNP) based on the Gaussian similarities, yielding an enhanced hypergraph association matrix;
S4: map the miRNA features and disease features into the same space through a fully connected layer, and train with hypergraph convolution to obtain embedded features;
S5: decode the embedded features with a bilinear decoder and compute their association scores.
Preferably, the miRNA-disease adjacency matrix A in step S1 is generated as follows:
the miRNA-disease association dataset is obtained from the Human MicroRNA Disease Database (HMDD). HMDD v3.2 contains 1206 miRNAs and 893 diseases, with 35547 associations in total. After deleting non-human data, 913 miRNAs and 554 diseases are retained, and the miRNA-disease adjacency matrix can be expressed as
$A \in \mathbb{R}^{n_m \times n_d}$
where n_m and n_d are the numbers of miRNAs and diseases, respectively. The miRNA-disease adjacency matrix A is defined as follows:
$A(i,j) = \begin{cases} 1, & \text{if miRNA } m_i \text{ is associated with disease } d_j \\ 0, & \text{otherwise} \end{cases}$
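A minimal sketch of building A from a list of known (miRNA, disease) pairs, assuming NumPy and assuming the pairs have already been filtered to the retained miRNAs and diseases. The function name `build_adjacency` is hypothetical and does not reflect HMDD's actual file format.

```python
import numpy as np


def build_adjacency(pairs, mirnas, diseases):
    """Binary miRNA-disease adjacency matrix A of shape (n_m, n_d).

    pairs    -- iterable of (mirna, disease) known associations
    mirnas   -- ordered list fixing the row index of each miRNA
    diseases -- ordered list fixing the column index of each disease
    """
    row = {m: i for i, m in enumerate(mirnas)}
    col = {d: j for j, d in enumerate(diseases)}
    A = np.zeros((len(mirnas), len(diseases)))
    for m, d in pairs:
        # Skip pairs whose miRNA or disease is not in the retained sets;
        # a repeated pair simply sets the same entry to 1 again.
        if m in row and d in col:
            A[row[m], col[d]] = 1.0
    return A
```

Fixing the row and column order up front keeps A, the similarity matrices and the embeddings indexed consistently in the later steps.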
Preferably, the Gaussian similarity in step S2 is used to compute the miRNA Gaussian similarity MG as follows:
first, compute the bandwidth of the Gaussian interaction profile kernel similarity:
$\gamma_m = \left( \frac{1}{n_m} \sum_{i=1}^{n_m} \|A(i,:)\|^2 \right)^{-1}$
where n_m is the number of miRNAs and the vector A(i,:) is row i of the miRNA-disease adjacency matrix A; each entry of the row indicates whether the miRNA is associated with the corresponding disease;
then compute the association similarity between each pair of miRNAs:
$MG(i,j) = \exp\left(-\gamma_m \|A(i,:) - A(j,:)\|^2\right)$
where MG(i,j) is the similarity between the i-th and j-th miRNAs, exp is the exponential function with base e, and the vectors A(i,:) and A(j,:) are rows i and j of the miRNA-disease adjacency matrix A.
Preferably, the disease Gaussian similarity DG is computed in the same way as the miRNA Gaussian similarity MG, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
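The Gaussian interaction profile kernel described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: `gaussian_similarity` is a hypothetical name, the bandwidth numerator is taken as 1, and A is assumed to contain at least one association (otherwise the mean squared norm is zero). Passing A gives MG; passing A.T gives DG.

```python
import numpy as np


def gaussian_similarity(A):
    """Gaussian interaction profile kernel similarity between the rows of A.

    gamma  = 1 / mean_i ||A(i,:)||^2   (bandwidth; numerator assumed to be 1)
    S(i,j) = exp(-gamma * ||A(i,:) - A(j,:)||^2)
    """
    A = np.asarray(A, dtype=float)
    sq_norms = np.sum(A * A, axis=1)             # ||A(i,:)||^2 for every row
    gamma = 1.0 / np.mean(sq_norms)              # assumes A is not all zeros
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    sq_dist = np.maximum(sq_dist, 0.0)           # clip tiny negative round-off
    return np.exp(-gamma * sq_dist)


# MG = gaussian_similarity(A)      # rows of A are miRNAs
# DG = gaussian_similarity(A.T)    # rows of A.T are diseases
```

Computing all distances with one matrix product avoids an explicit double loop over the 913 or 554 profiles.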
Preferably, the cosine similarity in step S2 is used to compute the miRNA cosine similarity MC as follows:
cosine similarity measures how similar two vectors are by the angle between them: the smaller the angle, the more similar the vectors. For a pair of miRNAs, the similarity is:
$MC(i,j) = \frac{A(i,:) \cdot A(j,:)}{\|A(i,:)\|\,\|A(j,:)\|}$
where MC(i,j) is the cosine similarity between the i-th and j-th miRNAs, the vectors A(i,:) and A(j,:) are rows i and j of the miRNA-disease adjacency matrix A, and · denotes the inner product of two vectors.
Preferably, the disease cosine similarity DC is computed in the same way as the miRNA cosine similarity MC, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
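A NumPy sketch of the cosine similarity, assuming that rows with no known association (zero vectors) should receive similarity 0, a case the text does not address. `cosine_similarity` is a hypothetical name. As with the Gaussian kernel, A yields MC and A.T yields DC.

```python
import numpy as np


def cosine_similarity(A):
    """Cosine similarity between the rows of A.

    Rows with no known association (zero vectors) get similarity 0 with every
    row, including themselves; this zero-norm handling is an assumption.
    """
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    safe = np.where(norms > 0, norms, 1.0)      # avoid division by zero
    unit = A / safe[:, None]                    # nonzero rows scaled to length 1
    return unit @ unit.T                        # inner products of unit vectors


# MC = cosine_similarity(A);  DC = cosine_similarity(A.T)
```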
Preferably, in step S3, the weighted K-nearest-neighbor profile (WKNNP) is used to obtain hidden association information of the graph and thus the enhanced hypergraph association matrix. WKNNP uses the miRNA Gaussian similarity MG and the disease Gaussian similarity DG, and constructs the hidden links of the graph as follows:
to improve message passing, the original adjacency matrix A is preprocessed to uncover more hidden relations. Let the vector A(i,:) denote the i-th row of A (the association profile of miRNA i) and the vector A(:,j) the j-th column of A (the association profile of disease j);
for each miRNA i, the K miRNAs most similar to it are selected according to the Gaussian similarity; letting m_j denote the index of the j-th most similar miRNA, the new association vector is computed as:
$A_m(i,:) = \frac{1}{Q_m} \sum_{j=1}^{K} w_j \, A(m_j,:)$
where A_m(i,:) is the new association profile of row i computed from the miRNAs, $Q_m = \sum_{1 \le j \le K} MG(i,m_j)$ is a normalization term, and $w_j = \alpha^{j-1} MG(i,m_j)$, where α ∈ [0,1] is a decay term;
similarly, a new association vector is computed for each disease:
$A_d(:,i) = \frac{1}{Q_d} \sum_{j=1}^{K} w_j \, A(:,d_j)$
where A_d(:,i) is the new association profile of column i, d_j is the index of the j-th disease most similar to disease i, $Q_d = \sum_{1 \le j \le K} DG(i,d_j)$ is a normalization term, and $w_j = \alpha^{j-1} DG(i,d_j)$, where α ∈ [0,1] is a decay term;
Finally, with the hidden association information of the graph obtained, the enhanced hypergraph association matrix is computed by the following formula:
Figure BDA0004080062480000061
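The sketch below follows the WKNNP definitions above: top-K neighbours by Gaussian similarity, weights α^(j-1) times the similarity, and normalisation by the summed similarity. Two points are assumptions. A row is not counted as its own neighbour, and, because the final combining formula is only an image in the source, the hypothetical `enhance` function uses the rule common in weighted-K-nearest-known-neighbour preprocessing: average A_m and A_d, and keep known links at 1. Function names and the default K and α are hypothetical.

```python
import numpy as np


def wknn_profile(A, S, K, alpha):
    """Weighted K-nearest-neighbour profile for every row of A.

    For row i, the K rows m_1..m_K most similar to i under S (most similar
    first, row i itself excluded) are combined as
        (1 / Q) * sum_j w_j * A(m_j, :),  w_j = alpha**(j-1) * S(i, m_j),
        Q = sum_j S(i, m_j).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k = min(K, n - 1)                          # at most n-1 other rows exist
    out = np.zeros_like(A)
    for i in range(n):
        sims = np.array(S[i], dtype=float)     # copy of row i of S
        sims[i] = -np.inf                      # exclude the row itself
        nbrs = np.argsort(-sims)[:k]           # indices m_1 ... m_k
        s = np.asarray(S[i], dtype=float)[nbrs]
        Q = s.sum()
        if Q <= 0:
            continue                           # no informative neighbours
        w = alpha ** np.arange(k) * s          # decay alpha**(j-1), j = 1..k
        out[i] = (w @ A[nbrs]) / Q
    return out


def enhance(A, MG, DG, K=5, alpha=0.5):
    """Enhanced association matrix from miRNA and disease WKNN profiles."""
    A = np.asarray(A, dtype=float)
    A_m = wknn_profile(A, MG, K, alpha)        # row profiles (miRNAs)
    A_d = wknn_profile(A.T, DG, K, alpha).T    # column profiles (diseases)
    # ASSUMPTION: the source's combining formula is an image; this uses the
    # usual WKNKN rule of averaging both profiles and keeping known links at 1.
    return np.maximum(A, (A_m + A_d) / 2.0)
```

Because each w_j is at most MG(i, m_j), every profile entry stays in [0, 1], so the enhanced matrix can be read as soft association strengths.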
Preferably, in step S4, the miRNA and disease features are mapped into the same space as follows:
the miRNA feature matrix is [MG, Z_d, MC, Z_d] and, correspondingly, the disease feature matrix is [Z_m, DG, Z_m, DC], where Z_d and Z_m are all-zero matrices with n_d and n_m columns, respectively, so both feature matrices have 2(n_m + n_d) columns. The miRNA and disease features are each passed through a linear layer for dimensionality reduction, followed by a ReLU activation, giving the miRNA embedding E_m and the disease embedding E_d.
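A sketch of the feature layout and the fully connected projection, assuming NumPy. The zero blocks keep the miRNA and disease features column-aligned, so a linear layer can map both into the same space. Whether miRNAs and diseases share one linear layer is not stated; this sketch assumes shared weights `W` and bias `b`, and all names are hypothetical.

```python
import numpy as np


def build_features(MG, MC, DG, DC):
    """Column-aligned feature matrices [MG, Z_d, MC, Z_d] and [Z_m, DG, Z_m, DC]."""
    n_m, n_d = MG.shape[0], DG.shape[0]
    Z_d = np.zeros((n_m, n_d))                 # zero block with n_d columns
    Z_m = np.zeros((n_d, n_m))                 # zero block with n_m columns
    X_m = np.hstack([MG, Z_d, MC, Z_d])        # n_m x 2(n_m + n_d)
    X_d = np.hstack([Z_m, DG, Z_m, DC])        # n_d x 2(n_m + n_d)
    return X_m, X_d


def project(X, W, b):
    """Fully connected layer followed by ReLU: max(X W + b, 0)."""
    return np.maximum(X @ W + b, 0.0)


# E_m = project(X_m, W, b);  E_d = project(X_d, W, b)   (shared W, b assumed)
```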
Preferably, the hypergraph convolution in step S4 is defined as follows:
based on the miRNA-disease associations A, a miRNA hyperedge relation is constructed, and the initial node features of the miRNAs are:
Figure BDA0004080062480000062
each miRNA hyperedge feature aggregates all nodes within the hyperedge, and the miRNA hyperedge features are obtained by:
Figure BDA0004080062480000063
where
Figure BDA0004080062480000064
denotes the node features of the miRNA hypergraph at layer l, and B^{-1} is the normalization matrix of A^T;
following the graph convolution formula, the hypergraph convolution is defined as:
Figure BDA0004080062480000065
expanding this formula gives:
Figure BDA0004080062480000066
node features therefore propagate globally over the hypergraph, and a transformation matrix W is trained in the process.
Similarly, the node features of the diseases are obtained:
Figure BDA0004080062480000071
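The exact convolution formulas survive only as images, so the following is a hedged sketch of a common HGNN-style layer that matches the prose: hyperedge features are the B^{-1}-normalised average of member nodes (B^{-1} H^T X), node features are the average of their incident hyperedges, and a trainable W and ReLU are then applied. For the miRNA hypergraph, each disease column of A is assumed to be one hyperedge (H = A); for diseases, H = A.T. The node-side degree normalisation, the ReLU and the name `hypergraph_conv` are assumptions.

```python
import numpy as np


def hypergraph_conv(H, X, W):
    """One hypergraph convolution layer with mean aggregation (HGNN-style sketch).

    H -- incidence matrix, n_nodes x n_edges (A for miRNAs, A.T for diseases)
    X -- node features at layer l, n_nodes x f
    W -- trainable weights, f x f_out
    """
    H = np.asarray(H, dtype=float)
    edge_deg = H.sum(axis=0)                   # diagonal of B: nodes per hyperedge
    node_deg = H.sum(axis=1)                   # hyperedges per node
    inv_B = np.divide(1.0, edge_deg, out=np.zeros_like(edge_deg), where=edge_deg > 0)
    inv_D = np.divide(1.0, node_deg, out=np.zeros_like(node_deg), where=node_deg > 0)
    edge_feat = inv_B[:, None] * (H.T @ X)     # B^{-1} H^T X: mean of member nodes
    node_feat = inv_D[:, None] * (H @ edge_feat)  # mean of incident hyperedges
    return np.maximum(node_feat @ W, 0.0)      # ReLU(... W)
```

One layer already lets a miRNA see every other miRNA that shares a disease with it, which is the "global flow" of node features the text describes; stacking layers widens this further.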
Preferably, in step S5, the embedded features are decoded with a bilinear decoder to compute the association scores. The bilinear decoder is implemented as follows:
Figure BDA0004080062480000072
the encoded miRNA and disease embeddings are the decoder inputs, and the bilinear decoder treats the different interaction types as different classes. The association score of miRNA m_i and disease d_j is computed as:
Figure BDA0004080062480000073
where W_r is a trainable parameter matrix whose dimension equals the input embedding dimension, and r ∈ R = {0, 1}.
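A sketch of a bilinear decoder with one trainable matrix per class r ∈ {0, 1} and a softmax over classes, in the style of graph autoencoders for link prediction. The softmax, the use of P(r = 1) as the association score and the name `bilinear_scores` are assumptions, since the scoring formula is an image in the source.

```python
import numpy as np


def bilinear_scores(E_m, E_d, W_r):
    """Association scores from a bilinear decoder with softmax over classes.

    E_m -- miRNA embeddings, n_m x f;  E_d -- disease embeddings, n_d x f
    W_r -- one trainable f x f matrix per class r, stacked as shape (R, f, f)
    Returns the n_m x n_d matrix of P(r = 1 | m_i, d_j).
    """
    # logits[r, i, j] = E_m[i] @ W_r[r] @ E_d[j]
    logits = np.einsum("if,rfg,jg->rij", E_m, W_r, E_d)
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)              # softmax over r
    return probs[1]                                        # class r = 1: "associated"
```

With only two classes, this reduces to a sigmoid of the difference of the two bilinear forms, so setting W_0 to zero recovers a plain sigmoid bilinear score.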
Preferably, to optimize the result, cross-entropy is used as the loss function to measure the loss between the predictions and the true values, and the global network parameters are optimized with Adam as the gradient optimizer.
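A NumPy sketch of the binary cross-entropy loss and the Adam update rule, using the standard Adam hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8); these defaults and the learning rate are assumptions, as the patent does not give them. In practice a deep learning framework's built-in loss and optimizer would normally be used instead.

```python
import numpy as np


def cross_entropy(p, y, eps=1e-12):
    """Mean binary cross-entropy between predicted scores p and labels y."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)   # avoid log(0)
    y = np.asarray(y, dtype=float)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


class Adam:
    """Adam optimizer state for a single parameter array."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, param, grad):
        if self.m is None:
            self.m, self.v = np.zeros_like(param), np.zeros_like(param)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad        # 1st moment
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2   # 2nd moment
        m_hat = self.m / (1 - self.beta1 ** self.t)                   # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

Because of the bias correction, the first Adam step moves each parameter by roughly the learning rate regardless of the gradient's scale.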
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
By computing weighted K-nearest-neighbor profiles and preprocessing the original association matrix, the method enhances the original graph: it reduces the influence of randomly sampled negative samples, enriches the information in the original graph, recovers hidden association information, and finally yields the enhanced hypergraph association matrix. A new hypergraph network is then built on the enhanced hypergraph association matrix, so that node features flow over the hypergraph, global information is collected, high-order structure of the graph is sampled, and nonlinear information about the nodes is extracted. Finally, a bilinear decoder decodes the node embeddings to compute the miRNA-disease association scores, achieving more accurate miRNA-disease association prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 shows the AUCs of SHGAEMDA, GAEMDA, DFELMDA and TDRC under five-fold cross-validation in this embodiment.
Detailed Description
This embodiment provides a miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding algorithm which, as shown in FIG. 1, comprises the following steps:
S1: acquire the miRNA-disease adjacency matrix A, which describes the associations between miRNAs and diseases;
S2: from A, compute the miRNA Gaussian similarity MG, miRNA cosine similarity MC, disease Gaussian similarity DG and disease cosine similarity DC, using MG and MC as the miRNA features and DG and DC as the disease features;
S3: obtain hidden association information of the graph with the weighted K-nearest-neighbor profile (WKNNP) based on the Gaussian similarities, yielding an enhanced hypergraph association matrix;
S4: map the miRNA features and disease features into the same space through a fully connected layer, and train with hypergraph convolution to obtain embedded features;
S5: decode the embedded features with a bilinear decoder and compute their association scores.
The miRNA-disease adjacency matrix A in step S1 is generated as follows:
the miRNA-disease association dataset is obtained from the Human MicroRNA Disease Database (HMDD). HMDD v3.2 contains 1206 miRNAs and 893 diseases, with 35547 associations in total. After deleting non-human data, 913 miRNAs and 554 diseases are retained, and the miRNA-disease adjacency matrix can be expressed as
$A \in \mathbb{R}^{n_m \times n_d}$
where n_m and n_d are the numbers of miRNAs and diseases, respectively. The miRNA-disease adjacency matrix A is defined as follows:
$A(i,j) = \begin{cases} 1, & \text{if miRNA } m_i \text{ is associated with disease } d_j \\ 0, & \text{otherwise} \end{cases}$
The Gaussian similarity in step S2 is used to compute the miRNA Gaussian similarity MG as follows:
first, compute the bandwidth of the Gaussian interaction profile kernel similarity:
$\gamma_m = \left( \frac{1}{n_m} \sum_{i=1}^{n_m} \|A(i,:)\|^2 \right)^{-1}$
where n_m is the number of miRNAs and the vector A(i,:) is row i of the miRNA-disease adjacency matrix A; each entry of the row indicates whether the miRNA is associated with the corresponding disease;
then compute the association similarity between each pair of miRNAs:
$MG(i,j) = \exp\left(-\gamma_m \|A(i,:) - A(j,:)\|^2\right)$
where MG(i,j) is the similarity between the i-th and j-th miRNAs, exp is the exponential function with base e, and the vectors A(i,:) and A(j,:) are rows i and j of the miRNA-disease adjacency matrix A.
The disease Gaussian similarity DG is computed in the same way as the miRNA Gaussian similarity MG, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
The cosine similarity in step S2 is used to compute the miRNA cosine similarity MC as follows:
cosine similarity measures how similar two vectors are by the angle between them: the smaller the angle, the more similar the vectors. For a pair of miRNAs, the similarity is:
$MC(i,j) = \frac{A(i,:) \cdot A(j,:)}{\|A(i,:)\|\,\|A(j,:)\|}$
where MC(i,j) is the cosine similarity between the i-th and j-th miRNAs, the vectors A(i,:) and A(j,:) are rows i and j of the miRNA-disease adjacency matrix A, and · denotes the inner product of two vectors.
The disease cosine similarity DC is computed in the same way as the miRNA cosine similarity MC, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
In step S3, the weighted K-nearest-neighbor profile (WKNNP) is used to obtain hidden association information of the graph and thus the enhanced hypergraph association matrix. WKNNP uses the miRNA Gaussian similarity MG and the disease Gaussian similarity DG, and constructs the hidden links of the graph as follows:
to improve message passing, the original adjacency matrix A is preprocessed to uncover more hidden relations. Let the vector A(i,:) denote the i-th row of A (the association profile of miRNA i) and the vector A(:,j) the j-th column of A (the association profile of disease j);
for each miRNA i, the K miRNAs most similar to it are selected according to the Gaussian similarity; letting m_j denote the index of the j-th most similar miRNA, the new association vector is computed as:
$A_m(i,:) = \frac{1}{Q_m} \sum_{j=1}^{K} w_j \, A(m_j,:)$
where A_m(i,:) is the new association profile of row i computed from the miRNAs, $Q_m = \sum_{1 \le j \le K} MG(i,m_j)$ is a normalization term, and $w_j = \alpha^{j-1} MG(i,m_j)$, where α ∈ [0,1] is a decay term;
similarly, a new association vector is computed for each disease:
$A_d(:,i) = \frac{1}{Q_d} \sum_{j=1}^{K} w_j \, A(:,d_j)$
where A_d(:,i) is the new association profile of column i, d_j is the index of the j-th disease most similar to disease i, $Q_d = \sum_{1 \le j \le K} DG(i,d_j)$ is a normalization term, and $w_j = \alpha^{j-1} DG(i,d_j)$, where α ∈ [0,1] is a decay term;
Finally, with the hidden association information of the graph obtained, the enhanced hypergraph association matrix is computed by the following formula:
Figure BDA0004080062480000111
In step S4, the miRNA and disease features are mapped into the same space as follows:
the miRNA feature matrix is [MG, Z_d, MC, Z_d] and, correspondingly, the disease feature matrix is [Z_m, DG, Z_m, DC], where Z_d and Z_m are all-zero matrices with n_d and n_m columns, respectively, so both feature matrices have 2(n_m + n_d) columns. The miRNA and disease features are each passed through a linear layer for dimensionality reduction, followed by a ReLU activation, giving the miRNA embedding E_m and the disease embedding E_d.
The hypergraph convolution in step S4 is defined as follows:
based on the miRNA-disease associations A, a miRNA hyperedge relation can be constructed, and the initial node features of the miRNAs are:
Figure BDA0004080062480000112
each miRNA hyperedge feature aggregates all nodes within the hyperedge, and the miRNA hyperedge features are obtained by:
Figure BDA0004080062480000113
where
Figure BDA0004080062480000114
denotes the node features of the miRNA hypergraph at layer l, and B^{-1} is the normalization matrix of A^T;
following the graph convolution formula, the hypergraph convolution can be defined as:
Figure BDA0004080062480000115
expanding this formula gives:
Figure BDA0004080062480000116
node features therefore propagate globally over the hypergraph, and a transformation matrix W is trained in the process.
Similarly, the node features of the diseases are obtained:
Figure BDA0004080062480000117
In step S5, the embedded features are decoded with a bilinear decoder to compute the association scores. The bilinear decoder is implemented as follows:
Figure BDA0004080062480000121
the encoded miRNA and disease embeddings are the decoder inputs, and the bilinear decoder treats the different interaction types as different classes. The association score of miRNA m_i and disease d_j can be computed as:
Figure BDA0004080062480000122
where W_r is a trainable parameter matrix whose dimension equals the input embedding dimension, and r ∈ R = {0, 1}.
To optimize the result, cross-entropy is used as the loss function to measure the loss between the predictions and the true values, and the global network parameters are optimized with Adam as the gradient optimizer.
To check the prediction accuracy of the method of this embodiment (SHGAEMDA), five-fold cross-validation was performed. First, as many unverified pairs as there are verified miRNA-disease associations were sampled as negative samples, and all samples were divided into 5 equal parts, each of which was used once as the test set. The association matrix was rebuilt from the remaining 4 parts, the Gaussian and cosine similarities were computed from it, and the miRNA-disease association scores were calculated with the method of this embodiment. The average AUC (area under the ROC curve) was 0.9367 ± 0.0011. Under the five-fold cross-validation framework, the method outperforms the compared algorithms (GAEMDA 0.9295 ± 0.0028, DFELMDA 0.9139 ± 0.0018, TDRC 0.8884 ± 0.0038) and is therefore better suited to miRNA-disease association prediction. The comparison is shown in FIG. 2.
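The evaluation protocol can be sketched as follows, assuming NumPy: negatives are drawn uniformly from unknown pairs in equal number to the positives, the pooled samples are split into five folds, held-out positives are hidden from the training matrix, and AUC is computed with the Mann-Whitney rank statistic. The random seed and function names are hypothetical, and the sketch requires at least as many unknown pairs as known ones.

```python
import numpy as np


def auc_score(scores, labels):
    """ROC AUC via the Mann-Whitney rank statistic; tied scores share ranks."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):                # average the ranks of ties
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def balanced_folds(A, n_folds=5, seed=0):
    """Yield (train_A, test_pairs, test_labels) for balanced k-fold CV."""
    rng = np.random.default_rng(seed)
    pos = np.argwhere(A == 1)                  # verified associations
    unk = np.argwhere(A == 0)                  # unknown pairs
    neg = unk[rng.choice(len(unk), size=len(pos), replace=False)]
    pairs = np.vstack([pos, neg])
    labels = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    for test_idx in np.array_split(rng.permutation(len(pairs)), n_folds):
        train_A = A.copy()
        held_out = pairs[test_idx][labels[test_idx] == 1]
        train_A[held_out[:, 0], held_out[:, 1]] = 0   # hide test positives
        yield train_A, pairs[test_idx], labels[test_idx]
```

For each fold, the similarities and the model would be recomputed from `train_A` before scoring the held-out pairs, and the five fold AUCs averaged.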
Meanwhile, to examine the applicability of the method, SHGAEMDA was used to predict unknown associations from the known miRNA-disease associations. When predicting new associations, the known miRNA-disease associations are used as the training dataset for SHGAEMDA, and the prediction score of every unknown miRNA-disease pair is then computed and ranked. Lung cancer, breast cancer and colon cancer were selected as case studies, and the top 20 predicted miRNAs for each cancer were validated against a third-party database (dbDEMC). As shown in Tables 1, 2 and 3, 100%, 85% and 95% of the predicted miRNAs, respectively, are associated with the corresponding cancer.
Furthermore, SHGAEMDA predicts some miRNA-disease associations that have not yet been verified, such as hsa-mir-133 with breast cancer and hsa-mir-29 with colon cancer. These predicted associations have not been reported in the current literature, but they are highly likely to exist.
TABLE 1 Top 20 potential lung-cancer-associated miRNAs predicted by SHGAEMDA
Figure BDA0004080062480000141
TABLE 2 Top 20 potential breast-cancer-associated miRNAs predicted by SHGAEMDA
TABLE 3 Top 20 potential colon-cancer-associated miRNAs predicted by SHGAEMDA
Terms describing positional relationships in the drawings are for illustration only and shall not be construed as limiting this patent.
It should be apparent that the above examples of the present invention are given merely to illustrate the invention clearly and are not intended to limit its embodiments. Those of ordinary skill in the art can make other variations or modifications on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within the scope of protection of the following claims.

Claims (11)

1. A miRNA-disease association prediction method based on an enhanced hypergraph convolution self-coding (autoencoder) algorithm, characterized by comprising the following steps:
S1: acquiring a miRNA-disease adjacency matrix A, which describes the associations between miRNAs and diseases;
S2: calculating, from the adjacency matrix A, the miRNA Gaussian similarity MG, the miRNA cosine similarity MC, the disease Gaussian similarity DG and the disease cosine similarity DC, and using MG and MC as the miRNA features and DG and DC as the disease features;
S3: obtaining hidden association information of the graph by weighted K-nearest neighbours based on the Gaussian similarities, thereby obtaining an enhanced hypergraph association matrix;
S4: mapping the miRNA features and the disease features into the same feature space through a fully connected layer, and training by hypergraph convolution to obtain embedded features;
S5: decoding the embedded features with a bilinear decoder and calculating the association scores.
2. The miRNA-disease association prediction method according to claim 1, wherein the miRNA-disease adjacency matrix A in step S1 is generated as follows:
a miRNA-disease association dataset is obtained from the Human MicroRNA Disease Database (HMDD), which contains 1206 miRNAs, 893 diseases and 35547 associations. After non-human data are deleted, 913 miRNAs and 554 diseases are retained, and the miRNA-disease adjacency matrix can be expressed as
$$A \in \mathbb{R}^{n_m \times n_d}$$
where $n_m$ and $n_d$ denote the number of miRNAs and the number of diseases, respectively. The miRNA-disease adjacency matrix A is defined as follows:
$$A(i,j)=\begin{cases}1, & \text{if miRNA } m_i \text{ is associated with disease } d_j\\ 0, & \text{otherwise}\end{cases}$$
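Building A from a list of (miRNA, disease) association pairs can be sketched as below. The sketch assumes that the retained miRNA and disease name lists are already known. The function name and the toy names used in testing are hypothetical.

```python
import numpy as np


def build_adjacency(pairs, mirnas, diseases):
    """Binary miRNA-disease adjacency matrix A of shape (n_m, n_d)."""
    m_index = {m: i for i, m in enumerate(mirnas)}
    d_index = {d: j for j, d in enumerate(diseases)}
    A = np.zeros((len(mirnas), len(diseases)), dtype=np.int8)
    for m, d in pairs:
        # Pairs whose miRNA or disease was not retained (e.g. non-human) are skipped.
        if m in m_index and d in d_index:
            A[m_index[m], d_index[d]] = 1
    return A
```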
3. The miRNA-disease association prediction method according to claim 1, wherein the Gaussian similarity in step S2 is calculated as follows, taking the miRNA Gaussian similarity MG as an example:
the bandwidth of the Gaussian interaction profile kernel similarity is calculated as
$$\gamma_m = 1 \Big/ \left(\frac{1}{n_m}\sum_{i=1}^{n_m}\lVert A(i,:)\rVert^2\right)$$
where $n_m$ denotes the number of miRNAs and the vector $A(i,:)$ denotes row $i$ of the miRNA-disease adjacency matrix A. Each 1 in a row indicates that the miRNA is associated with the corresponding disease.
The association similarity between each pair of miRNAs is calculated as
$$MG(i,j)=\exp\left(-\gamma_m\lVert A(i,:)-A(j,:)\rVert^2\right)$$
where $MG(i,j)$ denotes the similarity between the i-th and the j-th miRNA, $\exp$ denotes the exponential function with base e, and the vectors $A(i,:)$ and $A(j,:)$ denote rows i and j of the miRNA-disease adjacency matrix A, respectively.
4. The miRNA-disease association prediction method according to claim 3, wherein the disease Gaussian similarity DG is calculated in the same way as the miRNA Gaussian similarity MG, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
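Taken together, claims 3 and 4 amount to a Gaussian interaction profile kernel over the rows of A (miRNAs) or of its transpose (diseases). The sketch below assumes the bandwidth form $\gamma = 1/\left(\frac{1}{n}\sum_i \lVert A(i,:)\rVert^2\right)$. The helper name is hypothetical. The sketch would divide by zero if A contained no associations at all.

```python
import numpy as np


def gaussian_similarity(A):
    """Gaussian interaction profile kernel between the rows of A."""
    A = np.asarray(A, dtype=float)
    sq_norms = np.sum(A ** 2, axis=1)
    gamma = 1.0 / np.mean(sq_norms)  # assumed bandwidth normalisation
    # ||a_i - a_j||^2 = ||a_i||^2 + ||a_j||^2 - 2 <a_i, a_j>
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * A @ A.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negative rounding errors
    return np.exp(-gamma * d2)
```

`MG = gaussian_similarity(A)` gives the miRNA matrix, and `DG = gaussian_similarity(A.T)` gives the disease matrix.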
5. The miRNA-disease association prediction method according to claim 1, wherein the cosine similarity in step S2 is calculated as follows, taking the miRNA cosine similarity MC as an example:
the similarity of two vectors is measured by the cosine of the angle between them, and the smaller the angle, the more similar the vectors. For miRNA-miRNA associations, the similarity is calculated as:
$$MC(i,j)=\frac{A(i,:)\cdot A(j,:)}{\lVert A(i,:)\rVert\,\lVert A(j,:)\rVert}$$
where $MC(i,j)$ denotes the cosine similarity between the i-th and the j-th miRNA, the vectors $A(i,:)$ and $A(j,:)$ denote rows i and j of the miRNA-disease adjacency matrix A, respectively, and the symbol $\cdot$ denotes the inner product of two vectors.
6. The miRNA-disease association prediction method according to claim 5, wherein the disease cosine similarity DC is calculated in the same way as the miRNA cosine similarity MC, except that the matrix used is the transpose of the miRNA-disease adjacency matrix A.
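The cosine similarity of claims 5 and 6 can be sketched in NumPy as below. The helper name is hypothetical. Rows with no known association have zero norm, and the formula is undefined for them. This sketch assumes that those similarities are set to 0.

```python
import numpy as np


def cosine_similarity(A):
    """Cosine similarity between the rows of A (use A.T for diseases)."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    dot = A @ A.T
    denom = np.outer(norms, norms)
    # Assumption: pairs involving an all-zero row get similarity 0.
    return np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)
```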
7. The miRNA-disease association prediction method according to claim 1, wherein in step S3 the hidden association information of the graph is obtained by weighted K-nearest neighbours, using the miRNA Gaussian similarity MG and the disease Gaussian similarity DG, to obtain the enhanced hypergraph association matrix, specifically:
the original adjacency matrix A is preprocessed to obtain hidden relationships. The vector $A(i,:)$ denotes the i-th row of the miRNA-disease adjacency matrix A, i.e. the associations of a miRNA. The vector $A(:,j)$ denotes the j-th column of A, i.e. the associations of a disease.
For each miRNA, the K most similar miRNAs are obtained from the Gaussian similarity. Let $m_j$ denote the index of the j-th most similar miRNA. The new association vector is then calculated as:
$$A_m(i,:)=\frac{1}{Q_m}\sum_{j=1}^{K} w_j\,A(m_j,:)$$
where $A_m(i,:)$ denotes the new associations of row i computed from the miRNAs and $Q_m=\sum_{1\le j\le K}MG(i,m_j)$ is a normalization term. The weight is $w_j=\alpha^{j-1}MG(i,m_j)$, where $\alpha \in [0,1]$ is a decay term.
Similarly, a new association vector is calculated for each disease:
$$A_d(:,i)=\frac{1}{Q_d}\sum_{j=1}^{K} w_j\,A(:,d_j)$$
where $A_d(:,i)$ denotes the new associations of column i, $d_j$ denotes the index of the j-th most similar disease, and $Q_d=\sum_{1\le j\le K}DG(i,d_j)$ is a normalization term. The weight is $w_j=\alpha^{j-1}DG(i,d_j)$, where $\alpha \in [0,1]$ is a decay term.
Finally, the hidden association information of the graph is obtained, and the enhanced hypergraph association matrix is calculated by the following formula:
Figure FDA0004080062450000033
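The weighted K-nearest-neighbour step of claim 7 can be sketched as below. Two points are assumptions, because the final formula image is not reproduced here. First, an entity is excluded from its own neighbour list. Second, the two profiles are combined as $\max\left(A, (A_m + A_d)/2\right)$. That combination rule comes from the WKNKN preprocessing used in drug-target interaction prediction; this document does not confirm it. The function names and the defaults `k=5`, `alpha=0.5` are hypothetical placeholders.

```python
import numpy as np


def wknn_profiles(A, S, k, alpha):
    """Weighted K-nearest-neighbour profile for every row of A.

    Row i becomes (1/Q) * sum_j w_j * A[m_j], where m_1..m_K are the K rows
    most similar to i under S, w_j = alpha**(j-1) * S[i, m_j] and
    Q = sum_j S[i, m_j].
    """
    A = np.asarray(A, dtype=float)
    S = np.asarray(S, dtype=float)
    n = A.shape[0]
    k = min(k, n - 1)
    out = np.zeros_like(A)
    for i in range(n):
        sims = S[i].copy()
        sims[i] = -np.inf  # assumption: an entity is not its own neighbour
        nbrs = np.argsort(-sims, kind="stable")[:k]  # m_1, ..., m_K
        s = S[i, nbrs]
        w = alpha ** np.arange(k) * s  # w_j = alpha^(j-1) * S(i, m_j)
        q = s.sum()                    # normalisation term Q
        if q > 0:
            out[i] = w @ A[nbrs] / q
    return out


def enhanced_matrix(A, MG, DG, k=5, alpha=0.5):
    """Assumed WKNKN-style combination of miRNA- and disease-side profiles."""
    A = np.asarray(A, dtype=float)
    A_m = wknn_profiles(A, MG, k, alpha)        # new rows (miRNA side)
    A_d = wknn_profiles(A.T, DG, k, alpha).T    # new columns (disease side)
    return np.maximum(A, (A_m + A_d) / 2.0)
```

Since $w_j \le MG(i,m_j)$ whenever $\alpha \le 1$, every profile entry stays in [0, 1], and known associations keep the value 1.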
8. The miRNA-disease association prediction method according to claim 1, wherein mapping the miRNA and disease features into the same feature space in step S4 specifically comprises:
the miRNA feature matrix can be expressed as $[MG, Z_d, MC, Z_d]$, and the disease feature matrix can likewise be expressed as $[Z_m, DG, Z_m, DC]$. Here $Z_d$ and $Z_m$ are zero matrices whose numbers of columns equal the number of diseases and the number of miRNAs, respectively. The miRNA features and the disease features are each passed through a linear layer for linear dimensionality reduction and then through a ReLU activation function. This yields the miRNA embedding $E_m$ and the disease embedding $E_d$.
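The block layout of claim 8 can be sketched as below. The claim does not say whether the linear layer is shared between miRNAs and diseases, so using separate weight matrices for each is an assumption. The function names are hypothetical.

```python
import numpy as np


def build_features(MG, MC, DG, DC):
    """Block feature matrices sharing a common column layout."""
    n_m, n_d = MG.shape[0], DG.shape[0]
    Z_d = np.zeros((n_m, n_d))  # zero block, disease-length columns
    Z_m = np.zeros((n_d, n_m))  # zero block, miRNA-length columns
    X_m = np.hstack([MG, Z_d, MC, Z_d])  # n_m x (2 n_m + 2 n_d)
    X_d = np.hstack([Z_m, DG, Z_m, DC])  # n_d x (2 n_m + 2 n_d)
    return X_m, X_d


def linear_relu(X, W, b):
    """Linear dimensionality reduction followed by ReLU."""
    return np.maximum(X @ W + b, 0.0)
```

Both matrices have $2n_m + 2n_d$ columns. MG and MC occupy column blocks that are zero in the disease rows, and DG and DC occupy blocks that are zero in the miRNA rows. As a result, one input width serves both entity types.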
9. The miRNA-disease association prediction method according to claim 8, wherein the hypergraph convolution in step S4 is specifically defined as follows:
miRNA hyperedges are constructed from the miRNA-disease association matrix A, and the initial miRNA node features are set to:
Figure FDA0004080062450000041
the feature of each miRNA hyperedge aggregates all nodes within that hyperedge, so the miRNA hyperedge features are obtained as:
Figure FDA0004080062450000042
where
Figure FDA0004080062450000043
denotes the node features of the miRNA hypergraph at layer l, and $B^{-1}$ is the normalization matrix of $A^T$;
following the graph convolution formula, the hypergraph convolution is defined as:
Figure FDA0004080062450000044
expanding this formula gives:
Figure FDA0004080062450000045
in this way, node features propagate across the whole graph, and a data transfer matrix W can be trained;
similarly, the node features of the diseases are obtained as:
Figure FDA0004080062450000046
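The formula images of claim 9 are not reproduced, so the layer below is only a sketch of a common two-stage hypergraph convolution. It is consistent with the text but not confirmed by it. Each column of A (a disease) is treated as a hyperedge that joins its associated miRNAs. Hyperedge features are computed as $B^{-1}A^T X^{(l)}$, with B the diagonal hyperedge-degree matrix. The node update is assumed to be $X^{(l+1)} = \mathrm{ReLU}(D^{-1} A\, B^{-1} A^T X^{(l)} W)$, with D the diagonal node-degree matrix. The function name is hypothetical.

```python
import numpy as np


def hypergraph_conv(A, X, W):
    """One assumed hypergraph-convolution layer for the row entities of A.

    Hyperedges are the columns of A. Disease nodes can be handled by
    calling hypergraph_conv(A.T, X_d, W_d).
    """
    A = np.asarray(A, dtype=float)
    edge_deg = A.sum(axis=0)  # diagonal of B (hyperedge degrees)
    node_deg = A.sum(axis=1)  # diagonal of D (node degrees)
    inv_b = np.divide(1.0, edge_deg, out=np.zeros_like(edge_deg), where=edge_deg > 0)
    inv_d = np.divide(1.0, node_deg, out=np.zeros_like(node_deg), where=node_deg > 0)
    E = inv_b[:, None] * (A.T @ X)         # hyperedge features B^{-1} A^T X
    H = inv_d[:, None] * (A @ E)           # average back onto nodes, D^{-1} A E
    return np.maximum(H @ W, 0.0)          # trainable transfer matrix W, then ReLU
```

Because both stages are degree-normalised averages, a node with no associations receives an all-zero output.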
10. The miRNA-disease association prediction method according to claim 1, wherein in step S5 the embedded features are decoded with a bilinear decoder and the association scores are calculated, the bilinear decoder being implemented as follows:
Figure FDA0004080062450000051
the encoded miRNA and disease embeddings are used as the input of the decoder, and the bilinear decoder treats different interaction types as different classes. The association score of miRNA $m_i$ and disease $d_j$ is calculated as:
Figure FDA0004080062450000052
where $W_r$ is a trainable parameter matrix whose dimension equals the embedding dimension of the input, and $r \in R = \{0, 1\}$.
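A bilinear decoder of the kind described in claim 10 can be sketched as below. The decoding formula image is not reproduced, so two points are assumptions. First, one matrix $W_r$ is learned per class $r \in \{0, 1\}$. Second, the class probabilities are a softmax over $e_{m_i}^T W_r e_{d_j}$. The function name is hypothetical.

```python
import numpy as np


def bilinear_decoder(E_m, E_d, W_r):
    """Class probabilities for every miRNA-disease pair.

    E_m: (n_m, h) miRNA embeddings; E_d: (n_d, h) disease embeddings;
    W_r: (2, h, h), one trainable matrix per class r in {0, 1}.
    Returns P of shape (n_m, n_d, 2); P[..., 1] is the association score.
    """
    logits = np.einsum("ih,rhk,jk->ijr", E_m, W_r, E_d)  # e_i^T W_r e_j
    logits -= logits.max(axis=-1, keepdims=True)         # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)             # softmax over r
```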
11. The miRNA-disease association prediction method according to claim 10, wherein, to optimize the result, cross-entropy is used as the loss function to calculate the loss between the predicted results and the true values, and the global network parameters are optimized with Adam as the gradient optimizer.
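The loss and optimizer of claim 11 can be sketched as below. With two classes, softmax cross-entropy equals binary cross-entropy on $P(r=1)$. The Adam update follows the standard Kingma and Ba rule with its usual default hyperparameters. This is a hand-written NumPy illustration, not a specific framework's API. It is exercised in testing on a hypothetical one-parameter logistic model rather than on the full network.

```python
import numpy as np


def cross_entropy(p1, y, eps=1e-12):
    """Binary cross-entropy between predicted P(r=1) and labels y in {0, 1}."""
    p1 = np.clip(p1, eps, 1 - eps)
    return float(-np.mean(y * np.log(p1) + (1 - y) * np.log(1 - p1)))


class Adam:
    """Minimal Adam optimizer over a dict of NumPy parameter arrays."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = {}, {}, 0

    def step(self, params, grads):
        self.t += 1
        for name, g in grads.items():
            # Exponential moving averages of the gradient and its square.
            m = self.b1 * self.m.get(name, np.zeros_like(g)) + (1 - self.b1) * g
            v = self.b2 * self.v.get(name, np.zeros_like(g)) + (1 - self.b2) * g * g
            self.m[name], self.v[name] = m, v
            m_hat = m / (1 - self.b1 ** self.t)  # bias correction
            v_hat = v / (1 - self.b2 ** self.t)
            params[name] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```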
CN202310121363.XA 2023-02-14 2023-02-14 miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm Pending CN116343927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310121363.XA CN116343927A (en) 2023-02-14 2023-02-14 miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310121363.XA CN116343927A (en) 2023-02-14 2023-02-14 miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm

Publications (1)

Publication Number Publication Date
CN116343927A true CN116343927A (en) 2023-06-27

Family

ID=86890635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310121363.XA Pending CN116343927A (en) 2023-02-14 2023-02-14 miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm

Country Status (1)

Country Link
CN (1) CN116343927A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116844645A (en) * 2023-08-31 2023-10-03 云南师范大学 Gene regulation network inference method based on multi-view layered hypergraph
CN116844645B (en) * 2023-08-31 2023-11-17 云南师范大学 Gene regulation network inference method based on multi-view layered hypergraph
CN118506884A (en) * 2024-07-19 2024-08-16 山东大学 MiRNA-disease association relation prediction method, system, equipment and medium

Similar Documents

Publication Publication Date Title
Wang et al. LDGRNMF: LncRNA-disease associations prediction based on graph regularized non-negative matrix factorization
Lan et al. GANLDA: graph attention network for lncRNA-disease associations prediction
Liu et al. SMALF: miRNA-disease associations prediction based on stacked autoencoder and XGBoost
Yu et al. MCLPMDA: A novel method for mi RNA‐disease association prediction based on matrix completion and label propagation
Lei et al. A comprehensive survey on computational methods of non-coding RNA and disease association prediction
CN116343927A (en) miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm
Wu et al. Inferring LncRNA-disease associations based on graph autoencoder matrix completion
Chen et al. Supervised machine learning model for high dimensional gene data in colon cancer detection
CN110556184B (en) Non-coding RNA and disease relation prediction method based on Hessian regular nonnegative matrix decomposition
Zhou et al. Predicting miRNA–Disease Associations Through Deep Autoencoder With Multiple Kernel Learning
CN108427865B (en) Method for predicting correlation between LncRNA and environmental factors
CN114334012A (en) Method for identifying cancer subtypes based on multigroup data
CN116230077A (en) Antiviral drug screening method based on restarting hypergraph double random walk
Khan et al. DeepGene transformer: Transformer for the gene expression-based classification of cancer subtypes
CN113539479B (en) Similarity constraint-based miRNA-disease association prediction method and system
Xi et al. Ldcmfc: predicting long non-coding rna and disease association using collaborative matrix factorization based on correntropy
Huang et al. Predicting Disease-Associated N7–Methylguanosine (m 7 G) Sites via Random Walk on Heterogeneous Network
CN113421614A (en) Tensor decomposition-based lncRNA-disease association prediction method
Ma et al. CRBP-HFEF: prediction of RBP-Binding sites on circRNAs based on hierarchical feature expansion and fusion
Zhang et al. miTDS: Uncovering miRNA-mRNA interactions with deep learning for functional target prediction
CN116092581A (en) Annular RNA marker prediction method based on natural semantic enhancement
Huang et al. Sequential reinforcement active feature learning for gene signature identification in renal cell carcinoma
Casalino et al. Evaluation of cognitive impairment in pediatric multiple sclerosis with machine learning: an exploratory study of miRNA expressions
CN115295156A (en) Method for predicting miRNA-disease based on relation graph convolution network fusion multi-source information
Lu et al. HCGCCDA: Prediction of circRNA-disease associations based on the combination of hypergraph convolution and graph convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination