CN113539479A - Similarity constraint-based miRNA-disease association prediction method and system - Google Patents

Info

Publication number
CN113539479A
Authority
CN
China
Prior art keywords
mirna
disease
matrix
similarity
similarity matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110730370.0A
Other languages
Chinese (zh)
Other versions
CN113539479B (en)
Inventor
王红
余盛朋
梁成
王正军
杨杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN202110730370.0A
Publication of CN113539479A
Application granted
Publication of CN113539479B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Mathematical Optimization (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a similarity constraint-based miRNA-disease association prediction method and system. The method comprises the following steps: acquiring a miRNA-disease association matrix (an adjacency matrix), a miRNA functional similarity matrix and a disease semantic similarity matrix; and, based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data and performing adaptive learning to obtain a new miRNA-disease association matrix. The invention applies similarity constraint learning in the miRNA space and the disease space at the same time to reveal the associations between miRNAs and diseases, and achieves good prediction performance and robustness.

Description

Similarity constraint-based miRNA-disease association prediction method and system
Technical Field
The invention belongs to the technical field of computer-aided disease diagnosis, and particularly relates to a miRNA-disease association prediction method and system based on similarity constraint.
Background
miRNAs are small RNA molecules in organisms, about 20-24 nucleotides in length. They regulate the life processes of an organism by promoting the degradation of messenger RNA (mRNA) or inhibiting its translation. In recent years, a large number of studies have shown that miRNAs play important roles in immune response, transcription, cell proliferation, cell differentiation, signal transduction, embryonic development, and the like. Mutation and dysfunction of miRNAs can cause various diseases, and miRNAs play a particularly significant role in the diagnosis, treatment and prognosis of cancer. Identifying miRNA-disease associations refers to semi-supervised learning on existing biological data (including but not limited to miRNA functional similarity data, disease semantic similarity data and known human miRNA-disease association data): through training and iteration, a prediction model is obtained that can predict and mine new associations in biological data.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
Existing mining methods based on biological experiments suffer from high experimental cost and long experimental periods, and consume a large amount of manpower and resources. Meanwhile, with the rapid development of information technology and sequencing technology, all kinds of biological data are growing explosively. Traditional experimental methods cannot quickly mine general patterns and useful information from such massive biological data, which greatly hinders the mining of miRNA-disease associations in bioinformatics.
Many excellent computational methods have been proposed, including graph topological similarity methods based on semi-supervised learning, machine learning-based methods and graph neural network-based methods. They achieve good prediction performance and can be applied to large-scale biological databases for relationship mining. However, they also share some common problems and challenges. First, experimental data are noisy, and too little effective data can be extracted, so that models are insufficiently trained to identify new associations. Second, the massive data are biological data of different dimensions and characteristics, and how to make use of such cross-source heterogeneous data remains a great challenge.
Disclosure of Invention
In order to solve the problems, the invention discloses a miRNA-disease association relationship mining method and system based on similarity information constraint.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
A miRNA-disease association prediction method based on similarity constraint comprises the following steps:
acquiring a miRNA-disease association matrix, a miRNA functional similarity matrix and a disease semantic similarity matrix;
and, based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and performing adaptive learning to obtain a new miRNA-disease association matrix.
Further, obtaining the miRNA-disease association matrix includes: acquiring relation data of miRNAs and diseases, and constructing an adjacency matrix.
Further, obtaining the disease semantic similarity matrix comprises:
acquiring disease semantic data, and constructing a directed acyclic graph, wherein nodes represent diseases, and directed edges between the nodes represent hierarchical relations between the diseases;
and using the accumulated sum of the contribution values of the ancestor nodes to a node as the semantic value of that node, and calculating the semantic similarity between diseases to obtain a disease semantic similarity matrix.
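Further, the similarity-constrained objective function is:
$$\min_{SM,\,SD,\,F}\ \|SM - AM\|_F^2 + \alpha \sum_{i,j} \|F_i - F_j\|_2^2\, SM_{ij} + \|SD - AD\|_F^2 + \beta \sum_{i,j} \|F^i - F^j\|_2^2\, SD_{ij} + \gamma \|F - Y\|_F^2$$
$$\text{s.t.}\quad \sum_j SM_{ij} = 1,\ SM_{ij} \ge 0,\quad \sum_j SD_{ij} = 1,\ SD_{ij} \ge 0$$
wherein SM represents a new miRNA functional similarity matrix, SD represents a new disease semantic similarity matrix, F represents a new miRNA-disease association matrix, AM represents the miRNA functional similarity matrix, AD represents the disease semantic similarity matrix, Y represents the known miRNA-disease adjacency matrix, $F_i$ and $F_j$ respectively represent the association vectors of the i-th miRNA and the j-th miRNA with all diseases, $F^i$ and $F^j$ respectively represent the association vectors of the i-th disease and the j-th disease with all miRNAs, and $\alpha$, $\beta$ and $\gamma$ are weighting parameters.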
One or more embodiments provide a miRNA-disease association prediction system based on similarity constraints, comprising:
a true association data acquisition module configured to: acquire a miRNA-disease association matrix;
a similarity matrix acquisition module configured to: acquire a miRNA functional similarity matrix and a disease semantic similarity matrix;
an adaptive learning prediction module configured to: based on a similarity-constrained objective function, take the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and perform adaptive learning to obtain a new miRNA-disease association matrix.
Further, obtaining the miRNA-disease association matrix includes: acquiring relation data of miRNAs and diseases, and constructing an adjacency matrix.
Further, obtaining the disease semantic similarity matrix comprises:
acquiring disease semantic data, and constructing a directed acyclic graph, wherein nodes represent diseases, and directed edges between the nodes represent hierarchical relations between the diseases;
and using the accumulated sum of the contribution values of the ancestor nodes to a node as the semantic value of that node, and calculating the semantic similarity between diseases to obtain a disease semantic similarity matrix.
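Further, the similarity-constrained objective function is:
$$\min_{SM,\,SD,\,F}\ \|SM - AM\|_F^2 + \alpha \sum_{i,j} \|F_i - F_j\|_2^2\, SM_{ij} + \|SD - AD\|_F^2 + \beta \sum_{i,j} \|F^i - F^j\|_2^2\, SD_{ij} + \gamma \|F - Y\|_F^2$$
$$\text{s.t.}\quad \sum_j SM_{ij} = 1,\ SM_{ij} \ge 0,\quad \sum_j SD_{ij} = 1,\ SD_{ij} \ge 0$$
wherein SM represents a new miRNA functional similarity matrix, SD represents a new disease semantic similarity matrix, F represents a new miRNA-disease association matrix, AM represents the miRNA functional similarity matrix, AD represents the disease semantic similarity matrix, Y represents the known miRNA-disease adjacency matrix, $F_i$ and $F_j$ respectively represent the association vectors of the i-th miRNA and the j-th miRNA with all diseases, $F^i$ and $F^j$ respectively represent the association vectors of the i-th disease and the j-th disease with all miRNAs, and $\alpha$, $\beta$ and $\gamma$ are weighting parameters.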
One or more embodiments provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the miRNA-disease association prediction method when executing the program.
One or more embodiments provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the miRNA-disease association prediction method.
One or more technical schemes have the following technical effects:
Considering the sparsity and incompleteness of the disease semantic similarity matrix and the miRNA functional similarity matrix, the technical solution learns new miRNA and disease similarity networks from the known similarity information, which ensures that the associations between miRNAs and diseases can subsequently be mined thoroughly.
Compared with traditional prediction methods, the framework provides more stable and better prediction performance, and can be used to make predictions for new diseases.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
Fig. 1 is a framework diagram of a miRNA-disease association prediction method based on similarity constraint in an embodiment of the present invention;
Fig. 2 is a graph of the distribution of all diseases in an embodiment of the present invention;
Fig. 3 is a complex disease relationship network constructed in an embodiment of the present invention;
Fig. 4 is a diagram of the DAG computation model in an embodiment of the present invention;
Fig. 5 is a heterogeneous graph model in an embodiment of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example One
This embodiment discloses a miRNA-disease association prediction method based on similarity constraint, which predicts the associations between diseases and miRNAs by learning a miRNA similarity network and a disease similarity network with a similarity constraint learning method. As shown in Fig. 1, the method specifically comprises the following steps:
Step 1: constructing a miRNA-disease association network according to the relation data of miRNAs and diseases to obtain an adjacency matrix; constructing a miRNA functional similarity network according to the functional similarity between miRNAs to obtain a miRNA functional similarity matrix; constructing a disease semantic similarity network according to the semantic similarity between diseases to obtain a disease semantic similarity matrix;
(1) construction of miRNA-disease association network
In this example, the HMDD v2.0 database was used to construct the heterogeneous network of miRNAs and diseases. The miRNA-disease relationships can be obtained directly from the HMDD v2.0 homepage. The data set contains 495 miRNAs and 383 diseases, with 5340 associations in total. We use $M = \{m_1, m_2, \ldots, m_{nm}\}$ and $D = \{d_1, d_2, \ldots, d_{nd}\}$ to denote the miRNA set and the disease set, respectively. For a more precise representation, we use the matrix
$$Y \in \mathbb{R}^{nd \times nm}$$
to denote the adjacency matrix formed by the miRNAs and diseases. Specifically, if disease $d_i$ is associated with miRNA $m_j$, the entry $Y_{ij}$ of the adjacency matrix is set to 1; if there is no known association, it is set to 0. Thus, the i-th row of the adjacency matrix Y is the feature vector describing the associations of disease $d_i$ with all miRNAs, and the j-th column of Y is the feature vector describing the associations of miRNA $m_j$ with all diseases. As shown in Fig. 2, the HMDD data set covers 15 types of diseases in total, including more than 100 cancer-related diseases, which provides a solid data basis for studying the associations between miRNAs and diseases.
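The construction of the adjacency matrix can be illustrated with a minimal sketch. It assumes the known associations are available as (disease, miRNA) name pairs; the function name `build_adjacency` and the toy names below are hypothetical and are not part of the HMDD distribution.

```python
def build_adjacency(diseases, mirnas, pairs):
    """Return Y as a list of rows, with Y[i][j] = 1 if disease i is linked to miRNA j."""
    d_index = {name: i for i, name in enumerate(diseases)}
    m_index = {name: j for j, name in enumerate(mirnas)}
    # Start from an all-zero matrix: 0 means "no known association".
    Y = [[0] * len(mirnas) for _ in diseases]
    for disease, mirna in pairs:
        Y[d_index[disease]][m_index[mirna]] = 1
    return Y


# Toy data (hypothetical names, for illustration only).
diseases = ["disease_A", "disease_B"]
mirnas = ["mirna_1", "mirna_2", "mirna_3"]
pairs = [("disease_A", "mirna_1"), ("disease_A", "mirna_2"), ("disease_B", "mirna_1")]

Y = build_adjacency(diseases, mirnas, pairs)
for row in Y:
    print(row)
```

Rows correspond to diseases and columns to miRNAs, following the row and column description in the paragraph above.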
(2) Construction of functional similarity network of miRNA
If multiple miRNAs are functionally similar, they may cause the same disease. Conversely, if multiple diseases occur together, they are likely to be caused by abnormal expression of functionally similar miRNAs. Based on these assumptions, the similarity information of the miRNA similarity network can be calculated; the data can be downloaded directly from the Internet (http://www.cuilab.cn/files/images/cuilab/misim.zip). We use the adjacency matrix AM to represent the miRNA functional similarity matrix, where the entry $AM_{ij}$ represents the functional similarity between miRNA $m_i$ and miRNA $m_j$.
(3) Building semantic similarity networks for diseases
The MeSH database is a disease classification database widely used by researchers and can be used to mine potential relations among diseases. In recent years, it has also been used to construct heterogeneous networks of miRNAs and diseases, with good results. To build the disease semantic similarity network, the data set can be downloaded from the Internet (http://www.ncbi.nlm.nih.gov/). According to the disease definitions in the MeSH database, each disease can be represented as a directed acyclic graph (DAG), in which nodes represent disease terms and directed edges between the nodes describe the hierarchical relationships or semantic associations between diseases. For any disease D, its semantic relationship with other diseases is expressed as DAG(D) = (D, T(D), E(D)), where T(D) is the node set containing D and all of its ancestor nodes, and E(D) is the edge set containing all edges that connect these nodes. Thus, the more terms two diseases share in their DAGs, the more semantically similar they are. According to the above definition, the contribution of a disease t in T(D) to the semantic value of disease D can be calculated as follows:
$$D_D(t) = \begin{cases} 1, & t = D \\ \max\{\Delta \cdot D_D(t') \mid t' \in \text{children of } t\}, & t \neq D \end{cases} \qquad (1)$$
Here $\Delta$ is the semantic contribution parameter. After extensive experiments, we found that setting $\Delta = 0.5$ is most suitable. The semantic value consists of two parts: the contribution from the disease itself and the contributions from other diseases. For disease D, the contribution from itself is set to 1, and the contribution of another disease $D_j$ decreases as the distance between D and $D_j$ increases. Thus, the semantic value of disease D is calculated according to the following formula:
$$DV(D) = \sum_{t \in T(D)} D_D(t) \qquad (2)$$
Then, the semantic similarity between disease $d_i$ and disease $d_j$ is defined as follows:
$$\mathrm{Sim}(d_i, d_j) = \frac{\sum_{t \in T(d_i) \cap T(d_j)} \left( D_{d_i}(t) + D_{d_j}(t) \right)}{DV(d_i) + DV(d_j)} \qquad (3)$$
According to equation (3), we can compute the disease semantic similarity matrix AD from the disease semantic similarity network, where $AD_{ij}$ denotes the semantic similarity between disease $d_i$ and disease $d_j$. As shown in Fig. 3, all diseases constitute a disease similarity network in which diseases of the same color form a disease cluster. The disease similarity network also indicates that two diseases in the same cluster are semantically similar. The process of calculating the semantic similarity of diseases based on the DAG model is shown in Fig. 4.
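As an illustration of the semantic value and semantic similarity computation, the following sketch works on a toy hierarchy. It assumes the commonly used DAG-based definition in which a node's contribution is the maximum of Δ times the contributions of its children, with the disease itself contributing 1; the `parents` dictionary and the node names are hypothetical and do not come from MeSH.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Map every node t in T(disease) (the disease and its ancestors) to its contribution."""
    contrib = {disease: 1.0}  # a disease contributes 1 to its own semantic value
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            # Keep the largest contribution reaching this ancestor over all paths.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(a, b, parents, delta=0.5):
    """Shared contributions of a and b divided by the sum of their semantic values."""
    ca = semantic_contributions(a, parents, delta)
    cb = semantic_contributions(b, parents, delta)
    shared = set(ca) & set(cb)
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))


# Toy hierarchy: child -> list of parents (hypothetical terms).
parents = {"A": ["root"], "B": ["root"], "A1": ["A"]}

print(semantic_similarity("A", "B", parents))   # siblings share only the root
print(semantic_similarity("A1", "A", parents))  # parent and child share more terms
```

In this toy hierarchy the child "A1" comes out more similar to its parent "A" than to the unrelated sibling "B", which matches the statement that diseases sharing more DAG terms are more similar.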
Step 2: based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and performing adaptive learning to obtain a new miRNA-disease association matrix.
In order to effectively identify the associations between miRNAs and diseases, a novel prediction method based on robust similarity constraint learning, RSCMDA, is provided. Our goal is to obtain a reliable indicator matrix $F$ that reflects the association probability between miRNAs and diseases. Further, the objective function needs to satisfy the following two conditions: (1) given an initial similarity matrix, a new, informative similarity matrix is learned adaptively; (2) both the miRNA space and the disease space are considered in the learning process. For the first requirement, to avoid the case where some rows of the learned matrix are all zero, we add a constraint so that each row of the learned matrix sums to one. Therefore, we first define the optimization function in the miRNA space using miRNA similarity constraint learning as follows:
$$\min_{SM}\ \|SM - AM\|_F^2 + \alpha \sum_{i,j} \|F_i - F_j\|_2^2\, SM_{ij} \quad \text{s.t.}\ \sum_j SM_{ij} = 1,\ SM_{ij} \ge 0 \qquad (4)$$
As can be seen from equation (4), the first term
$$\|SM - AM\|_F^2$$
means that the learned new miRNA similarity matrix SM should be close to the miRNA functional similarity matrix AM. The second term
$$\alpha \sum_{i,j} \|F_i - F_j\|_2^2\, SM_{ij}$$
indicates that the greater the similarity between miRNA i and miRNA j, the smaller the difference between their feature vectors $F_i$ and $F_j$.
Likewise, the objective function in the disease space can be defined as follows:
$$\min_{SD}\ \|SD - AD\|_F^2 + \beta \sum_{i,j} \|F^i - F^j\|_2^2\, SD_{ij} \quad \text{s.t.}\ \sum_j SD_{ij} = 1,\ SD_{ij} \ge 0 \qquad (5)$$
As can be seen from equation (5), the first term
$$\|SD - AD\|_F^2$$
means that the learned new disease similarity matrix SD should be close to the disease semantic similarity matrix AD. The second term
$$\beta \sum_{i,j} \|F^i - F^j\|_2^2\, SD_{ij}$$
indicates that the greater the similarity between disease i and disease j, the smaller the difference between their feature vectors $F^i$ and $F^j$, i.e. their association vectors with all miRNAs. Using equations (4) and (5), two optimal similarity matrices, one for miRNAs and one for diseases, are learned separately by similarity constraint learning. Then, according to the second requirement, the two optimization functions are unified in a single similarity constraint framework. The overall optimization formula is as follows:
$$\min_{SM,\,SD,\,F}\ \|SM - AM\|_F^2 + \alpha \sum_{i,j} \|F_i - F_j\|_2^2\, SM_{ij} + \|SD - AD\|_F^2 + \beta \sum_{i,j} \|F^i - F^j\|_2^2\, SD_{ij} + \gamma \|F - Y\|_F^2 \qquad (6)$$
$$\text{s.t.}\quad \sum_j SM_{ij} = 1,\ SM_{ij} \ge 0,\quad \sum_j SD_{ij} = 1,\ SD_{ij} \ge 0$$
Here $F_i$ denotes the association vector of the i-th miRNA with all diseases and $F^i$ denotes the association vector of the i-th disease with all miRNAs. As can be seen from equation (6), the first two terms adaptively learn a new similarity matrix in the miRNA space, the third and fourth terms learn a new similarity matrix in the disease space, and the last term constrains the predicted miRNA-disease associations to be consistent with the ground truth. The problem solved by the model is shown in Fig. 5: unknown information is mined and predicted based on a heterogeneous graph model.
The objective function in equation (6) cannot be minimized over the three variables SD, SM and F simultaneously. An efficient alternating optimization algorithm is therefore designed to solve the problem iteratively. Specifically, we optimize one variable at a time while fixing the other variables.
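(1) Update SM with SD and F fixed. The iterative formula for SM is obtained from equation (4), which can be written in the following form:
$$\min_{SM}\ \sum_{i,j} (SM_{ij} - AM_{ij})^2 + \alpha \sum_{i,j} \|F_i - F_j\|_2^2\, SM_{ij} \quad \text{s.t.}\ \sum_j SM_{ij} = 1,\ SM_{ij} \ge 0 \qquad (7)$$
Note that the optimization of each row of SM is independent for different values of i; thus, we can update each row separately as follows:
$$\min_{SM_i}\ \sum_{j} (SM_{ij} - AM_{ij})^2 + \alpha \sum_{j} d_{ij}\, SM_{ij} \quad \text{s.t.}\ \sum_j SM_{ij} = 1,\ SM_{ij} \ge 0 \qquad (8)$$
where
$$d_{ij} = \|F_i - F_j\|_2^2$$
denotes the j-th element of the vector $d_i$. Equation (8) can be written in vector form as follows:
$$\min_{SM_i}\ \left\| SM_i - \left( AM_i - \frac{\alpha}{2} d_i \right) \right\|_2^2 \quad \text{s.t.}\ SM_i^T \mathbf{1} = 1,\ SM_i \ge 0 \qquad (9)$$
where $SM_i$ and $AM_i$ denote the i-th rows of SM and AM, respectively. This problem can be solved by an efficient iterative algorithm.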
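The row-wise update can be sketched as a Euclidean projection onto the probability simplex. This is an illustration under stated assumptions, not the solver used in the patent: it assumes that each row must be non-negative and sum to one, that the target vector is the similarity row minus (α/2) times the squared-distance vector, and it swaps in the well-known sort-based projection in place of the unspecified iterative algorithm. The helper names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        # The last k for which the shifted entry stays positive fixes the threshold.
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def update_row(am_row, F, i, alpha):
    """Update row i of the learned similarity matrix with F held fixed."""
    d_i = [squared_distance(F[i], F_j) for F_j in F]
    target = [a - 0.5 * alpha * d for a, d in zip(am_row, d_i)]
    return project_to_simplex(target)


# Toy data: three miRNAs described by their association vectors (rows of F).
F = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
AM_row = [1.0, 0.6, 0.6]  # initial similarities of miRNA 0 to miRNAs 0, 1, 2
new_row = update_row(AM_row, F, 0, alpha=0.2)
print(new_row)
```

In the toy data, miRNA 1 has the same association vector as miRNA 0 while miRNA 2 does not, so the updated row keeps more weight on miRNA 1 than on miRNA 2 even though their initial similarities were equal.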
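(2) Update SD with SM and F fixed. The iterative formula for SD is obtained from equation (5); the optimization procedure for SD is the same as that for SM, and the objective function for optimizing SD is as follows:
$$\min_{SD}\ \|SD - AD\|_F^2 + \beta \sum_{i,j} \|F^i - F^j\|_2^2\, SD_{ij} \quad \text{s.t.}\ \sum_j SD_{ij} = 1,\ SD_{ij} \ge 0 \qquad (10)$$
where $F^i$ denotes the association vector of the i-th disease with all miRNAs.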
(3) Update F with SM and SD fixed. By introducing Laplacian matrices, equation (6) is converted to:
$$\min_{F}\ \alpha\, \mathrm{tr}\!\left(F^T L_{SM} F\right) + \beta\, \mathrm{tr}\!\left(F L_{SD} F^T\right) + \gamma \|F - Y\|_F^2 \qquad (11)$$
where $L_{SM} = D_{SM} - (SM^T + SM)/2$ is the corresponding Laplacian matrix,
$$D_{SM}$$
is the diagonal degree matrix whose i-th diagonal element is $\sum_j (SM_{ij} + SM_{ji})/2$, and $L_{SD}$ is defined in the same way as $L_{SM}$ (constant factors arising from the conversion are absorbed into $\alpha$ and $\beta$).
Differentiating equation (11) with respect to F and setting the derivative to zero, we obtain:
$$(\alpha L_{SM} + \gamma I) F + \beta F L_{SD} - \gamma Y = 0 \qquad (12)$$
Equation (12) is a Sylvester equation, which is easily solved. Algorithm 1 summarizes the overall process of the method.
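The F update can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: F and Y are stored with miRNAs as rows and diseases as columns so that the products in equation (12) conform (transpose Y if it is stored the other way round), and the Sylvester equation is solved through eigendecompositions of the two symmetric coefficient matrices. The function names and toy matrices are hypothetical.

```python
import numpy as np


def laplacian(S):
    """L = D - (S^T + S)/2, where D_ii = sum_j (S_ij + S_ji)/2."""
    W = (S + S.T) / 2.0
    return np.diag(W.sum(axis=1)) - W


def update_F(SM, SD, Y, alpha, beta, gamma):
    """Solve (alpha*L_SM + gamma*I) F + beta * F L_SD = gamma * Y for F."""
    A = alpha * laplacian(SM) + gamma * np.eye(SM.shape[0])
    B = beta * laplacian(SD)
    # A and B are symmetric, so A = U diag(lam) U^T and B = V diag(mu) V^T.
    lam, U = np.linalg.eigh(A)
    mu, V = np.linalg.eigh(B)
    # In the eigenbases the equation decouples: G_ij * (lam_i + mu_j) = Q_ij.
    Q = U.T @ (gamma * Y) @ V
    G = Q / (lam[:, None] + mu[None, :])
    return U @ G @ V.T


# Toy example: 3 miRNAs, 2 diseases (values are illustrative only).
SM = np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
SD = np.array([[0.7, 0.3], [0.3, 0.7]])
Y = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
alpha, beta, gamma = 0.5, 0.5, 1.0

F = update_F(SM, SD, Y, alpha, beta, gamma)
print(np.round(F, 4))
```

In the toy example miRNA 1 has no known association, yet it receives a higher score for disease 0 than for disease 1 because it is similar to miRNA 0. For larger problems a library routine such as `scipy.linalg.solve_sylvester(a, b, q)`, which solves AX + XB = Q, could be used instead.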
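Algorithm 1 RSCMDA solving process
Input: adjacency matrix Y, miRNA functional similarity matrix AM, disease semantic similarity matrix AD, parameters α, β and γ
Output: new miRNA-disease association matrix F
Repeat until convergence:
(1) update SM with SD and F fixed;
(2) update SD with SM and F fixed;
(3) update F with SM and SD fixed by solving equation (12).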
Lung cancer, also known as bronchogenic carcinoma, is a common primary malignant tumor of the lung. In recent years, the incidence of cancer worldwide has increased year by year. Therefore, early detection of prognostic and predictive biomarkers associated with lung tumors is of great significance for their treatment. The role of miRNAs in lung tumor cell progression and drug resistance has been studied extensively. For example, miRNAs such as hsa-mir-155, hsa-mir-17-3p, hsa-let-7a-2, hsa-mir-145 and hsa-mir-21 were found to be differentially expressed between LN tissue and the corresponding non-cancerous lung tissue, and are used for further diagnosis and clinical treatment. We used the miRNA-disease association information provided in HMDD v2.0 as training data and then used the RSCMDA model to predict the top 50 miRNAs most related to lung tumors [45]. The predictions were then validated against four additional disease-related miRNA databases. As shown in Table 1, all of the top 50 predicted miRNAs were confirmed to be associated with lung tumors by at least one of dbDEMC, miR2Disease, miRwayDB and PhenomiR. The results therefore show that RSCMDA can accurately predict the associations between miRNAs and diseases.
Table 1 Top 50 lung tumor-associated miRNAs predicted from the known HMDD associations
[Table 1: image not reproduced. Columns: miRNA (ranks 1-25), evidence, miRNA (ranks 26-50), evidence]
Here, I, II, III and IV denote dbDEMC, miR2Disease, miRwayDB and PhenomiR, respectively. The first and third columns list the miRNAs ranked 1-25 and 26-50, respectively.
Example Two
This embodiment provides a miRNA-disease association prediction system based on similarity constraint, including:
a true association data acquisition module configured to: acquire a miRNA-disease association matrix;
a similarity matrix acquisition module configured to: acquire a miRNA functional similarity matrix and a disease semantic similarity matrix;
an adaptive learning prediction module configured to: based on a similarity-constrained objective function, take the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and perform adaptive learning to obtain a new miRNA-disease association matrix.
Example Three
This embodiment aims to provide an electronic device.
An electronic device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program:
acquiring a miRNA-disease association matrix, a miRNA functional similarity matrix and a disease semantic similarity matrix;
and, based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and performing adaptive learning to obtain a new miRNA-disease association matrix.
Example Four
This embodiment aims to provide a computer-readable storage medium.
A computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the following steps:
acquiring a miRNA-disease association matrix, a miRNA functional similarity matrix and a disease semantic similarity matrix;
and, based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and performing adaptive learning to obtain a new miRNA-disease association matrix.
Experimental verification
To verify the predictive power of RSCMDA, we measure its prediction performance from multiple angles using different evaluation criteria.
1. Experimental setup
After repeated tuning experiments, the RSCMDA model adopts the following parameter settings in the experiments.
Table 2 Parameter configuration
Parameter    Value
α            1e-4
β            1e-4
γ            1
2. Results of the experiment
We used leave-one-out cross-validation (LOOCV) to evaluate the performance of RSCMDA. Specifically, LOOCV can be divided into two categories: global LOOCV and local LOOCV. What they have in common is that, each time, one known miRNA-disease association is held out as the test sample, the samples without known associations are treated as unknown samples, and RSCMDA is then used for prediction. After the prediction result is obtained, the score of each test sample is compared with the scores of the unknown samples, and the scores are ranked from high to low. To describe the experimental results more intuitively, the performance of RSCMDA was compared with that of other advanced methods using the receiver operating characteristic (ROC) curve. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR). Taking miRNA-disease association prediction as an example, for each threshold K (0 < K < 100), the TPR is the proportion of the held-out known associations that rank within the top K% of the prediction results, and the FPR is the proportion of the unknown associations used for testing that rank within the top K%. To compare models more directly, the area under the ROC curve (AUC) was used as the criterion for measuring prediction performance.
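The ROC and AUC computation can be illustrated with a small sketch. It assumes that the scores of the held-out known associations (label 1) and of the unknown samples (label 0) have already been collected into two parallel lists, and it ignores tied scores for simplicity; the function names and toy numbers are hypothetical and do not reproduce any reported result.

```python
def roc_points(scores, labels):
    """Sweep a threshold down the ranked list and collect (FPR, TPR) points."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores: label 1 = held-out known association, label 0 = unknown sample.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
area = auc(roc_points(scores, labels))
print(area)
```

The resulting value equals the fraction of (known, unknown) pairs in which the known association receives the higher score, which is the usual interpretation of AUC.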
We compared RSCMDA with seven other prediction frameworks: EGBMMDA, MCMDA, HGIMDA, PBMDA, WBSMDA, HDMP and RLSMDA. The experimental results show that, in global LOOCV, the AUCs of EGBMMDA, MCMDA, HGIMDA, PBMDA, WBSMDA, HDMP and RLSMDA were 0.9123, 0.8749, 0.8781, 0.9169, 0.8030, 0.8366 and 0.8426, respectively (Fig. 4). In local LOOCV (Fig. 5), their AUCs were 0.8221, 0.7718, 0.8077, 0.8341, 0.8031, 0.7702 and 0.6953, respectively.
Next, leave-one-disease-out cross-validation (LODOCV) was used to verify the performance of RSCMDA in inferring miRNAs for new diseases that have no known associated miRNAs in the heterogeneous network. Specifically, for any candidate disease, we remove all known information about its associated miRNAs and then use the miRNA association information of the other diseases to prioritize all candidate miRNAs. Because no association information is available for the disease under investigation, LODOCV is more rigorous than the cross-validation frameworks described above and better able to assess the risk of overfitting. Under the LODOCV framework, we also use AUC values to assess all methods. Among all methods, ours achieved the highest AUC value, 0.815. We do not report the performance of the other compared methods because all of them yielded AUC values below 0.5.
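The masking step of LODOCV can be sketched as follows. The sketch assumes the adjacency matrix is stored as a list of rows with one row per disease; the function name `mask_disease` is hypothetical, and the prediction step itself is left out.

```python
def mask_disease(Y, disease_index):
    """Hide every known association of one disease; return the masked copy and the held-out miRNAs."""
    held_out = [j for j, value in enumerate(Y[disease_index]) if value == 1]
    masked = [row[:] for row in Y]  # copy so the original matrix is left untouched
    masked[disease_index] = [0] * len(Y[disease_index])
    return masked, held_out


# Toy adjacency matrix: 2 diseases (rows) x 3 miRNAs (columns).
Y = [[1, 0, 1],
     [0, 1, 0]]

masked, held_out = mask_disease(Y, 0)
print(masked)    # disease 0 now looks like a new disease
print(held_out)  # miRNA indices to be recovered by the predictor
```

The masked matrix is what a model would be trained on, and the held-out indices are the positives against which the ranked candidate miRNAs of that disease are scored.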
Finally, to further demonstrate the predictive and generalization capabilities of RSCMDA on real datasets, we applied RSCMDA to the older version of HMDD (v1.0). We then used the later version of HMDD (v2.0) to validate the predicted associations between miRNAs and diseases. Specifically, for each compared method applied to the screened HMDD (v1.0) data, we selected the top N predicted miRNAs, with N ranging from 2000 to 10000 in steps of 2000. We then counted the confirmed true candidates recorded in HMDD (v2.0). RSCMDA recognized more disease-associated miRNAs than the other five computational methods. In conclusion, the verification results show that RSCMDA can accurately mine disease-related miRNAs.
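The counting procedure above can be sketched in Python as shown below. The function name `confirmed_in_top_n` and the data layout (a dictionary mapping (miRNA, disease) pairs to scores, and sets of known pairs for the older and later database versions) are assumptions made for illustration only.

```python
def confirmed_in_top_n(predictions, known_old, known_new, cutoffs):
    """Count, for each cutoff N, how many of the top-N novel predictions
    are confirmed by the later database version.

    predictions: dict mapping (mirna, disease) -> predicted score
    known_old:   set of pairs already known in the older version
                 (used for training, so excluded from the ranking)
    known_new:   set of pairs recorded in the later version
    cutoffs:     iterable of N values, e.g. range(2000, 10001, 2000)"""
    candidates = [(pair, score) for pair, score in predictions.items()
                  if pair not in known_old]
    ranked = [pair for pair, _ in sorted(candidates, key=lambda t: -t[1])]
    return {n: sum(1 for pair in ranked[:n] if pair in known_new)
            for n in cutoffs}
```

Excluding the pairs already present in the older version keeps the count restricted to predictions that were genuinely unknown at training time.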
In one or more embodiments, based on the assumption that miRNAs with similar functions are often associated with the same diseases, the proposed similarity-constrained objective function is solved using the miRNA-disease association network, the miRNA functional similarity network, and the disease semantic similarity network, so that high-precision prediction of miRNA-disease relationships is achieved. Specifically, during optimization our method adaptively learns a new affinity network from the known affinity information. Furthermore, we propose a unified constraint framework that updates the prediction results from the miRNA space and the disease space simultaneously, rather than learning the results from the two spaces separately, which provides more robust performance.
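The assumption that similar miRNAs share diseases is commonly encoded as a graph-smoothness penalty. The Python sketch below shows that generic penalty and its equivalent Laplacian form; it is not the patent's objective function, which is available here only as figure images. The names `pairwise_smoothness` and `laplacian_form` are hypothetical, F is a list of association vectors (one row per miRNA), and S is assumed to be a symmetric similarity matrix.

```python
def pairwise_smoothness(F, S):
    """Sum over i, j of S[i][j] * ||F[i] - F[j]||^2. The value is small
    when similar miRNAs have similar association vectors."""
    total = 0.0
    for i in range(len(F)):
        for j in range(len(F)):
            dist2 = sum((a - b) ** 2 for a, b in zip(F[i], F[j]))
            total += S[i][j] * dist2
    return total


def laplacian_form(F, S):
    """Same quantity written as 2 * trace(F^T L F), where L = D - S and
    D is the diagonal matrix of row sums of S (S must be symmetric)."""
    total = 0.0
    for i in range(len(F)):
        degree = sum(S[i])
        for j in range(len(F)):
            l_ij = (degree if i == j else 0.0) - S[i][j]
            total += l_ij * sum(a * b for a, b in zip(F[i], F[j]))
    return 2.0 * total
```

The same penalty can be written for the disease space by applying it to the columns of F with a disease similarity matrix, which is one way of constraining both spaces at once.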
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A miRNA-disease association prediction method based on similarity constraint is characterized by comprising the following steps:
acquiring a miRNA-disease association matrix, a miRNA functional similarity matrix and a disease semantic similarity matrix;
and based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and performing adaptive learning to obtain a new miRNA-disease association matrix.
2. The miRNA-disease association prediction method of claim 1, wherein obtaining the miRNA-disease association matrix comprises: acquiring relation data between miRNAs and diseases, and constructing an adjacency matrix.
3. The miRNA-disease association prediction method of claim 1, wherein obtaining a disease semantic similarity matrix comprises:
acquiring disease semantic data, and constructing a directed acyclic graph, wherein nodes represent diseases, and directed edges among the nodes represent hierarchical relations among the diseases;
and calculating the semantic similarity between diseases by using the accumulated sum of the contribution values of the ancestor nodes to the node as the semantic value of the node to obtain a disease semantic similarity matrix.
4. The miRNA-disease association prediction method of claim 1, wherein the similarity-constrained objective function is:
Figure FDA0003139106580000011
Figure FDA0003139106580000012
wherein SM represents a new miRNA functional similarity matrix, SD represents a new disease semantic similarity matrix, F represents a new miRNA-disease association matrix, AM represents a miRNA functional similarity matrix, AD represents a disease semantic similarity matrix, and F_i and F_j respectively represent the association vectors of the i-th miRNA and the j-th miRNA with all diseases.
5. A miRNA-disease association prediction system based on similarity constraints, comprising:
a true association data acquisition module configured to: acquire a miRNA-disease association matrix;
a similarity matrix acquisition module configured to: acquiring a miRNA functional similarity matrix and a disease semantic similarity matrix;
an adaptive learning prediction module configured to: based on a similarity-constrained objective function, take the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and perform adaptive learning to obtain a new miRNA-disease association matrix.
6. The miRNA-disease association prediction system of claim 5, wherein obtaining the miRNA-disease association matrix comprises: acquiring relation data between miRNAs and diseases, and constructing an adjacency matrix.
7. The miRNA-disease association prediction system of claim 5, wherein obtaining a disease semantic similarity matrix comprises:
acquiring disease semantic data, and constructing a directed acyclic graph, wherein nodes represent diseases, and directed edges among the nodes represent hierarchical relations among the diseases;
and calculating the semantic similarity between diseases by using the accumulated sum of the contribution values of the ancestor nodes to the node as the semantic value of the node to obtain a disease semantic similarity matrix.
8. The miRNA-disease association prediction system of claim 5, wherein the similarity-constrained objective function is:
Figure FDA0003139106580000021
Figure FDA0003139106580000022
wherein SM represents a new miRNA functional similarity matrix, SD represents a new disease semantic similarity matrix, F represents a new miRNA-disease association matrix, AM represents a miRNA functional similarity matrix, AD represents a disease semantic similarity matrix, and F_i and F_j respectively represent the association vectors of the i-th miRNA and the j-th miRNA with all diseases.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the miRNA-disease association prediction method of any one of claims 1-4.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the miRNA-disease association prediction method of any one of claims 1-4.
CN202110730370.0A 2021-06-29 2021-06-29 Similarity constraint-based miRNA-disease association prediction method and system Active CN113539479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110730370.0A CN113539479B (en) 2021-06-29 2021-06-29 Similarity constraint-based miRNA-disease association prediction method and system

Publications (2)

Publication Number Publication Date
CN113539479A (en) 2021-10-22
CN113539479B (en) 2024-05-07

Family

ID=78097207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110730370.0A Active CN113539479B (en) 2021-06-29 2021-06-29 Similarity constraint-based miRNA-disease association prediction method and system

Country Status (1)

Country Link
CN (1) CN113539479B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506608A (en) * 2017-09-29 2017-12-22 杭州电子科技大学 A kind of improved miRNA disease association Forecasting Methodologies based on collaborative filtering
CN109935332A (en) * 2019-03-01 2019-06-25 桂林电子科技大学 A kind of miRNA- disease association prediction technique based on double random walk models
CN110400600A (en) * 2019-08-01 2019-11-01 枣庄学院 A kind of disease associated prediction technique of miRNA- based on rotation forest algorithm
CN110782948A (en) * 2019-10-18 2020-02-11 湖南大学 Method for predicting potential association of miRNA and disease based on constraint probability matrix decomposition method
CN112183837A (en) * 2020-09-22 2021-01-05 曲阜师范大学 miRNA and disease association relation prediction method based on self-coding model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
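Li Jianwei; Du Yanbo; Zhang Shan; Song Zijian; Zheng Xiqiang: "Research progress on methods for computing the functional similarity of microRNAs", Progress in Physiological Sciences (生理科学进展), no. 03 *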

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114093527A (en) * 2021-12-01 2022-02-25 中国科学院新疆理化技术研究所 Drug relocation method and system based on spatial similarity constraint and non-negative matrix factorization
CN115966252A (en) * 2023-02-12 2023-04-14 汤永 Antiviral drug screening method based on L1norm diagram
CN115966252B (en) * 2023-02-12 2024-01-19 中国人民解放军总医院 Antiviral drug screening method based on L1norm diagram

Also Published As

Publication number Publication date
CN113539479B (en) 2024-05-07

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant