CN113539479A - Similarity constraint-based miRNA-disease association prediction method and system - Google Patents
- Publication number
- CN113539479A, CN202110730370.0A
- Authority
- CN
- China
- Prior art keywords
- mirna
- disease
- matrix
- similarity
- similarity matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Abstract
The invention provides a similarity constraint-based miRNA-disease association prediction method and system. The method comprises the following steps: acquiring a miRNA-disease association matrix, a miRNA functional similarity matrix and a disease semantic similarity matrix; and, based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data and performing adaptive learning to obtain a new miRNA-disease association matrix. The invention uses similarity constraint learning to reveal the associations between miRNAs and diseases, and has good prediction performance and robustness.
Description
Technical Field
The invention belongs to the technical field of computer-aided disease diagnosis, and particularly relates to a miRNA-disease association prediction method and system based on similarity constraint.
Background
miRNAs are small molecules in organisms, about 20-24 nucleotides in length. They regulate the life processes of an organism by inducing degradation or translational inhibition of messenger RNA (mRNA). In recent years, a large number of studies have shown that miRNAs play important roles in immune response, transcription, cell proliferation, cell differentiation, signal transduction, embryonic development, and the like. Mutation and dysfunction of miRNAs can cause various diseases, and miRNAs play a particularly significant role in the diagnosis, treatment and prognosis of cancer. Identifying miRNA-disease associations refers to semi-supervised learning on existing biological data (including but not limited to miRNA functional similarity data, disease semantic similarity data and known human miRNA-disease association data): a prediction model is trained iteratively on these data and then used to predict and mine new biological associations.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
Existing mining methods based on biological experiments suffer from high experimental cost and long experimental cycles, and consume a large amount of manpower and resources. Meanwhile, with the rapid development of information technology and sequencing technology, various types of biological data are growing explosively. Traditional experimental methods cannot rapidly mine general patterns and useful information from such massive biological data, which greatly hinders the mining of miRNA-disease associations in bioinformatics.
Many excellent methods based on computational models achieve very good prediction performance and can be applied to large-scale biological databases for relationship mining, including graph topological similarity methods based on semi-supervised learning, machine learning-based methods and graph neural network-based methods. These methods nevertheless share some general problems and challenges. First, experimental data are noisy, and too little effective data can be extracted, so models are insufficiently trained to be applied to identifying new associations. Second, the massive data are biological data with different dimensions and different characteristics, and how to utilize such cross-source heterogeneous data is also a great challenge.
Disclosure of Invention
In order to solve the problems, the invention discloses a miRNA-disease association relationship mining method and system based on similarity information constraint.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
a miRNA-disease association prediction method based on similarity constraint comprises the following steps:
acquiring a miRNA-disease association matrix, a miRNA functional similarity matrix and a disease semantic similarity matrix;
and, based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and performing adaptive learning to obtain a new miRNA-disease association matrix.
Further, obtaining the miRNA-disease association matrix comprises: acquiring relationship data between miRNAs and diseases, and constructing an adjacency matrix.
Further, obtaining the disease semantic similarity matrix comprises:
acquiring disease semantic data, and constructing a directed acyclic graph, wherein nodes represent diseases, and directed edges among the nodes represent hierarchical relations among the diseases;
and calculating the semantic similarity between diseases, using the accumulated sum of the contribution values of a node's ancestor nodes to that node as the semantic value of the node, to obtain a disease semantic similarity matrix.
Further, the objective function of the similarity constraint is:
wherein SM represents the new miRNA functional similarity matrix, SD represents the new disease semantic similarity matrix, F represents the new miRNA-disease association matrix, AM represents the miRNA functional similarity matrix, AD represents the disease semantic similarity matrix, and $F_i$ and $F_j$ respectively represent the association vectors of the i-th miRNA and the j-th miRNA with all diseases.
One or more embodiments provide a miRNA-disease association prediction system based on similarity constraints, comprising:
a true association data acquisition module configured to: acquire a miRNA-disease association matrix;
a similarity matrix acquisition module configured to: acquire a miRNA functional similarity matrix and a disease semantic similarity matrix;
an adaptive learning prediction module configured to: based on a similarity-constrained objective function, take the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and perform adaptive learning to obtain a new miRNA-disease association matrix.
Further, obtaining the miRNA-disease association matrix comprises: acquiring relationship data between miRNAs and diseases, and constructing an adjacency matrix.
Further, obtaining the disease semantic similarity matrix comprises:
acquiring disease semantic data, and constructing a directed acyclic graph, wherein nodes represent diseases, and directed edges among the nodes represent hierarchical relations among the diseases;
and calculating the semantic similarity between diseases, using the accumulated sum of the contribution values of a node's ancestor nodes to that node as the semantic value of the node, to obtain a disease semantic similarity matrix.
Further, the objective function of the similarity constraint is:
wherein SM represents the new miRNA functional similarity matrix, SD represents the new disease semantic similarity matrix, F represents the new miRNA-disease association matrix, AM represents the miRNA functional similarity matrix, AD represents the disease semantic similarity matrix, and $F_i$ and $F_j$ respectively represent the association vectors of the i-th miRNA and the j-th miRNA with all diseases.
One or more embodiments provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the miRNA-disease association prediction method when executing the program.
One or more embodiments provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the miRNA-disease association prediction method.
One or more technical schemes have the following technical effects:
Considering the sparsity and incompleteness of the disease semantic similarity matrix and the miRNA functional similarity matrix, the technical solution learns the miRNA and disease similarity networks from the known similarity information, which ensures that the associations between miRNAs and diseases are subsequently mined in full.
Compared with traditional prediction methods, the framework provides more stable and better prediction performance, and can be used for predicting new diseases.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
FIG. 1 is a framework diagram of the miRNA-disease association prediction method based on similarity constraint in an embodiment of the present invention;
FIG. 2 is a graph of the distribution of all diseases in an embodiment of the present invention;
FIG. 3 is a complex disease relationship network constructed in an embodiment of the present invention;
FIG. 4 is a diagram of the DAG computation model in an embodiment of the present invention;
FIG. 5 is a heterogeneous graph model in an embodiment of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example one
This embodiment discloses a miRNA-disease association prediction method based on similarity constraint, which predicts the associations between diseases and miRNAs by learning a miRNA similarity network and a disease similarity network with a similarity constraint learning method. As shown in FIG. 1, the method specifically comprises the following steps:
Step 1: constructing a miRNA-disease association network according to the relationship data between miRNAs and diseases to obtain an adjacency matrix; constructing a miRNA functional similarity network according to the functional similarity between miRNAs to obtain a miRNA functional similarity matrix; constructing a disease semantic similarity network according to the semantic similarity between diseases to obtain a disease semantic similarity matrix;
(1) construction of miRNA-disease association network
In this example, the HMDD v2.0 database is used to construct the heterogeneous network of miRNAs and diseases. The miRNA-disease relationships can be obtained directly from the HMDD v2.0 homepage. The data set contains 495 miRNAs and 383 diseases, with 5340 interactions. We use $M = \{m_1, m_2, \ldots, m_{nm}\}$ and $D = \{d_1, d_2, \ldots, d_{nd}\}$ to denote the miRNA set and the disease set, respectively. For a more precise representation, we use the matrix Y to denote the adjacency matrix formed by miRNAs and diseases. Specifically, if disease $d_i$ is related to miRNA $m_j$, then $Y_{ij}$ is set to 1 in the adjacency matrix; if there is no relation, it is set to 0. Thus, the i-th row of Y is the feature vector of disease $d_i$ over all miRNAs, and the j-th column of Y is the feature vector of miRNA $m_j$ over all diseases. As shown in FIG. 2, the HMDD data set contains 15 types of diseases in total, including over 100 cancer-related diseases, which provides a solid data basis for studying the associations between miRNAs and diseases.
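The construction of Y described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function name `build_adjacency` and the toy miRNA and disease names are hypothetical and are not taken from HMDD v2.0.

```python
def build_adjacency(pairs, diseases, mirnas):
    """Return Y as a list of rows, with Y[i][j] = 1 if disease i is linked to miRNA j."""
    d_index = {d: i for i, d in enumerate(diseases)}
    m_index = {m: j for j, m in enumerate(mirnas)}
    Y = [[0] * len(mirnas) for _ in diseases]
    for mirna, disease in pairs:
        Y[d_index[disease]][m_index[mirna]] = 1
    return Y


# Toy data (illustrative names only, not real HMDD records).
mirnas = ["mir-a", "mir-b", "mir-c"]
diseases = ["disease-x", "disease-y"]
pairs = [("mir-a", "disease-x"), ("mir-c", "disease-x"), ("mir-b", "disease-y")]

Y = build_adjacency(pairs, diseases, mirnas)
row_for_disease_x = Y[0]                       # feature vector of disease-x over all miRNAs
column_for_mir_b = [row[1] for row in Y]       # feature vector of mir-b over all diseases
```

Rows index diseases and columns index miRNAs, matching the definition of $Y_{ij}$ in the text.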
(2) Construction of functional similarity network of miRNA
If multiple miRNAs are functionally similar, they may cause the same disease. Conversely, if multiple diseases occur simultaneously, they are likely to be caused by abnormal expression of functionally similar miRNAs. Based on these assumptions, the similarity information of the miRNA similarity network can be calculated, and the data can be downloaded directly from the Internet (http://www.cuilab.cn/files/images/cuilab/misim.zip). We use the matrix AM to represent the miRNA functional similarity matrix, where the entry $AM_{ij}$ represents the functional similarity between miRNA $m_i$ and miRNA $m_j$.
(3) Building semantic similarity networks for diseases
The MeSH database is a disease classification database widely used by researchers, and can be used for mining potential relations among diseases. In recent years, the database has also been used to construct heterogeneous networks of miRNAs and diseases, with good results. To build the semantic similarity network of diseases, the data set can be downloaded from the Internet (http://www.ncbi.nlm.nih.gov/). According to the definition of diseases in the MeSH database, each disease can be represented as a directed acyclic graph (DAG), where nodes represent disease keywords. Hierarchical relationships or semantic associations between diseases are described by the directed edges between nodes. For any disease D, its semantic relationship with other diseases is expressed as $DAG(D) = (D, T(D), E(D))$, where $T(D)$ is the node set containing all ancestor nodes of D (together with D itself), and $E(D)$ is the edge set containing all edges that connect these nodes. Thus, the more items two diseases share in the DAG model, the more semantically similar they are. According to the above definition, the contribution of a disease t in $T(D)$ to the semantic value of disease D can be calculated as follows:
Here we define Δ as the semantic contribution parameter. After extensive experiments, we found that setting Δ = 0.5 is most suitable. The semantic contribution includes two parts: the contribution from the disease itself and the contribution from other diseases. For disease D, the contribution from itself is set to 1, and the contribution of another disease $D_j$ decreases as the distance between D and $D_j$ increases. Thus, the semantic value of disease D is calculated according to the following formula:
$DV(D) = \sum_{t \in T(D)} D_D(t)$ (2)
Then, we define the semantic similarity between disease $d_i$ and disease $d_j$ as follows:
According to equation (3), we can compute the disease semantic similarity matrix AD based on the semantic similarity network of diseases, where $AD_{ij}$ denotes the semantic similarity between disease $d_i$ and disease $d_j$. As shown in FIG. 3, all diseases constitute a disease similarity network in which diseases of the same color form a disease cluster. In addition, the disease similarity network indicates that two diseases in the same disease cluster are semantically similar. The process of calculating the semantic similarity of diseases based on the DAG model is shown in FIG. 4.
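The DAG-based calculation can be illustrated with a small Python sketch. Equation (2) is taken from the text. The forms used here for the contribution and for the similarity are assumptions based on the widely used DAG scheme: an ancestor's contribution is Δ times the largest contribution among its children, and the similarity of two diseases is the sum of their contributions over shared nodes divided by the sum of their semantic values. The `parents` dictionary is a hypothetical toy hierarchy, not MeSH data.

```python
DELTA = 0.5  # semantic contribution parameter, set to 0.5 in the text


def contributions(disease, parents):
    """Map every node in T(disease) to its contribution to the disease.

    Assumed form: the disease contributes 1 to itself, and an ancestor
    contributes DELTA times the largest contribution among its children.
    """
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = DELTA * contrib[node]
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)  # re-propagate the improved value upward
    return contrib


def semantic_value(disease, parents):
    """Equation (2): DV(D) is the sum of contributions over T(D)."""
    return sum(contributions(disease, parents).values())


def semantic_similarity(a, b, parents):
    """Assumed form of equation (3): shared contributions over DV(a) + DV(b)."""
    ca, cb = contributions(a, parents), contributions(b, parents)
    shared = set(ca) & set(cb)
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))


# Hypothetical hierarchy: R is the root, A and B are its children, C has parents A and B.
parents = {"A": ["R"], "B": ["R"], "C": ["A", "B"]}
diseases = ["A", "B", "C"]
AD = [[semantic_similarity(x, y, parents) for y in diseases] for x in diseases]
```

In this toy hierarchy the siblings A and B share only the root, so their similarity is lower than that between C and its parent A.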
Step 2: based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and performing adaptive learning to obtain a new miRNA-disease association matrix.
To effectively identify the associations between miRNAs and diseases, a novel prediction method based on robust similarity constraint learning, RSCMDA, is provided. Our goal is to obtain a reliable indicator matrix F that reflects the association probability between miRNAs and diseases. The objective function needs to satisfy the following two conditions: (1) given an initial similarity matrix, a new, more informative similarity matrix is learned adaptively; (2) both the miRNA space and the disease space are considered in the learning process. For the first requirement, to avoid the case where some rows of the learned matrix are all zero, we add a constraint that each row of the learned matrix sums to one. We therefore first define the optimization function in the miRNA space, using miRNA similarity constraint learning, as follows:
As can be seen from equation (4), the first term means that the learned miRNA similarity matrix SM should stay close to the miRNA functional similarity matrix AM. The second term indicates that the greater the similarity between miRNA i and miRNA j, the smaller the difference between their feature vectors. Similarly, we can define the objective function in the disease space as follows:
As can be seen from equation (5), the first term means that the learned disease similarity matrix SD should stay close to the disease semantic similarity matrix AD, and the second term indicates that the greater the similarity between disease i and disease j, the smaller the difference between their feature vectors. With equations (4) and (5), we learn an optimal similarity matrix for miRNAs and an optimal similarity matrix for diseases, each by similarity constraint learning. Then, according to the second requirement, the two optimization functions are unified in one similarity constraint framework. The overall optimization formula is as follows:
As can be seen from equation (6), the first two terms adaptively learn a new similarity matrix in the miRNA space, the third and fourth terms learn a new similarity matrix in the disease space, and the last term constrains the predicted miRNA-disease associations to be consistent with the ground truth. The problem solved by the model is shown in FIG. 5: unknown information is mined and predicted on the basis of a heterogeneous graph model.
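For orientation, one form of the overall objective that matches the five terms described above can be written in LaTeX as follows. This is a hedged reconstruction, not the patent's typeset formula: the unit weights on the two fitting terms, the factors of 1/2, the non-negativity constraints and the row/column notation $F_{i\cdot}$, $F_{\cdot j}$ are assumptions, chosen so that the stationarity condition in F agrees with equation (12) when F is taken with one row per miRNA and one column per disease.

```latex
\begin{aligned}
\min_{SM,\,SD,\,F}\quad
  & \|SM - AM\|_F^2
    + \frac{\alpha}{2}\sum_{i,j} \|F_{i\cdot} - F_{j\cdot}\|_2^2 \, SM_{ij} \\
  & + \|SD - AD\|_F^2
    + \frac{\beta}{2}\sum_{i,j} \|F_{\cdot i} - F_{\cdot j}\|_2^2 \, SD_{ij}
    + \gamma \,\|F - Y\|_F^2 \\
\text{s.t.}\quad
  & \sum_j SM_{ij} = 1,\; SM_{ij} \ge 0,\qquad
    \sum_j SD_{ij} = 1,\; SD_{ij} \ge 0 .
\end{aligned}
```

With the Laplacian $L_{SM} = D_{SM} - (SM^T + SM)/2$, the identity $\frac{1}{2}\sum_{i,j}\|F_{i\cdot} - F_{j\cdot}\|_2^2\, SM_{ij} = \operatorname{tr}(F^T L_{SM} F)$ holds, and likewise $\frac{1}{2}\sum_{i,j}\|F_{\cdot i} - F_{\cdot j}\|_2^2\, SD_{ij} = \operatorname{tr}(F L_{SD} F^T)$. Under these assumptions, setting the gradient with respect to F to zero gives $(\alpha L_{SM} + \gamma I)F + \beta F L_{SD} - \gamma Y = 0$.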
The objective function in equation (6) is not jointly convex in the three variables SD, SM and F. An efficient alternating optimization algorithm is therefore designed to solve the problem iteratively. Specifically, we optimize one variable at a time while keeping the other variables fixed.
(1) Update SM with SD and F fixed. An iterative formula for SM can be obtained from equation (4), which can be written in the form:
Note that the optimization of each column of SM is independent for different values of i; thus, we can update each column separately as follows:
where $d_{ij}$ denotes the j-th element of the vector $d_i$. Equation (8) can be written in vector form as follows:
where $SM_i$ and $AM_i$ denote the i-th columns of SM and AM, respectively. This problem can be solved by an efficient iterative algorithm.
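One common way to solve a per-column problem of this kind is Euclidean projection onto the probability simplex. The sketch below assumes the subproblem has the form: minimize $\|s - AM_i\|^2 + w\, d_i^T s$ subject to the entries of s summing to one and being non-negative, where $d_{ij} = \|F_i - F_j\|^2$. The non-negativity constraint, the weight `weight` (w) and the function names are assumptions for illustration, and the sort-based projection is a standard technique swapped in here, not necessarily the iterative algorithm the text refers to.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {s : sum(s) = 1, s >= 0} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0:
            theta = candidate  # last k that satisfies the condition wins
    return [max(x - theta, 0.0) for x in v]


def update_similarity_column(am_col, F, i, weight):
    """Update one similarity vector for item i under the assumed subproblem.

    Completing the square turns ||s - a||^2 + weight * d.s into
    ||s - (a - weight/2 * d)||^2 + const, so the solution is a simplex projection.
    """
    d = [sum((a - b) ** 2 for a, b in zip(F[i], F[j])) for j in range(len(F))]
    target = [a - 0.5 * weight * dj for a, dj in zip(am_col, d)]
    return project_to_simplex(target)


# Toy example: items 0 and 1 have identical association vectors, item 2 differs.
F = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
am_col = [0.2, 0.5, 0.5]
new_col = update_similarity_column(am_col, F, i=0, weight=1.0)
```

In the toy example the learned similarity to item 2 drops to zero, because its association vector differs from that of item 0, while the remaining mass is shared between items 0 and 1.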
(2) Update SD with SM and F fixed. An iterative formula for SD can be obtained from equation (5). The optimization process of SD is the same as that of SM, and the objective function for optimizing SD is as follows:
(3) Update F with SM and SD fixed. By introducing Laplacian matrices, equation (6) is converted to:
where $L_{SM} = D_{SM} - (SM^T + SM)/2$ is the corresponding Laplacian matrix, $D_{SM}$ is the degree matrix, defined as the diagonal matrix whose i-th diagonal element is $\sum_j (SM_{ij} + SM_{ji})/2$, and $L_{SD}$ is defined in the same way as $L_{SM}$.
By differentiating equation (11) with respect to F and setting the derivative to zero, we obtain:
$(\alpha L_{SM} + \gamma I)F + \beta F L_{SD} - \gamma Y = 0$ (12)
Equation (12) is a Sylvester equation, which is easily solved. Algorithm 1 summarizes the overall process of the method.
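A minimal NumPy sketch of the F update is given below. It builds the Laplacians as defined above and solves equation (12) by eigendecomposition, which works because both coefficient matrices are symmetric. The function names are hypothetical, the toy matrices are random, and Y is assumed here to have one row per miRNA and one column per disease so that the shapes in equation (12) match; the parameter values follow Table 2.

```python
import numpy as np


def laplacian(S):
    """L = D - (S^T + S)/2, with D_ii = sum_j (S_ij + S_ji)/2."""
    W = (S + S.T) / 2.0
    return np.diag(W.sum(axis=1)) - W


def solve_association(SM, SD, Y, alpha, beta, gamma):
    """Solve (alpha*L_SM + gamma*I) F + beta * F L_SD = gamma * Y for F."""
    A = alpha * laplacian(SM) + gamma * np.eye(SM.shape[0])
    B = beta * laplacian(SD)
    lam, U = np.linalg.eigh(A)  # A = U diag(lam) U^T
    mu, V = np.linalg.eigh(B)   # B = V diag(mu) V^T
    C = U.T @ (gamma * Y) @ V
    # In the eigenbases the equation decouples: (lam_i + mu_j) * X_ij = C_ij.
    return U @ (C / (lam[:, None] + mu[None, :])) @ V.T


# Toy data: 4 miRNAs, 3 diseases (random values, for illustration only).
rng = np.random.default_rng(0)
SM = rng.random((4, 4))
SD = rng.random((3, 3))
Y = (rng.random((4, 3)) > 0.5).astype(float)

F = solve_association(SM, SD, Y, alpha=1e-4, beta=1e-4, gamma=1.0)
residual = (1e-4 * laplacian(SM) + np.eye(4)) @ F + 1e-4 * F @ laplacian(SD) - Y
```

If SciPy is available, `scipy.linalg.solve_sylvester(A, B, Q)`, which solves AX + XB = Q, could be used in place of the eigendecomposition.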
Lung cancer, also known as bronchogenic carcinoma, is a common primary malignancy of the respiratory tract in the lung. In recent years, the incidence of cancer worldwide has increased year by year. Therefore, early detection of prognostic and predictive biomarkers associated with lung tumors is of profound significance for the treatment of lung tumors. The role of miRNAs in lung tumor cell progression and drug resistance has been extensively studied. For example, various miRNAs such as hsa-mir-155, hsa-mir-17-3p, hsa-let-7a-2, hsa-mir-145 and hsa-mir-21 were found to be differentially expressed in lung neoplasm (LN) tissue and corresponding non-cancerous lung tissue, and have been used for further diagnosis and clinical treatment. We used the miRNA-disease association information provided in HMDD v2.0 as training data, and then predicted the top 50 miRNAs most correlated with lung tumors using the RSCMDA model [45]. The predicted results were then validated against four additional disease-related miRNA databases. As shown in Table 1, every one of the top 50 predicted miRNAs was confirmed to be associated with lung tumors by at least one of dbDEMC, miR2Disease, miRwayDB and PhenomiR. The results therefore show that RSCMDA can accurately predict the associations between miRNAs and diseases.
Table 1. Top 50 lung tumor-associated miRNAs predicted from known HMDD associations
Here I, II, III and IV denote dbDEMC, miR2Disease, miRwayDB and PhenomiR, respectively. The first and third columns list the miRNAs ranked 1-25 and 26-50, respectively.
Example two
The embodiment provides a miRNA-disease association prediction system based on similarity constraint, including:
a true association data acquisition module configured to: acquire a miRNA-disease association matrix;
a similarity matrix acquisition module configured to: acquire a miRNA functional similarity matrix and a disease semantic similarity matrix;
an adaptive learning prediction module configured to: based on a similarity-constrained objective function, take the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and perform adaptive learning to obtain a new miRNA-disease association matrix.
Example three
The embodiment aims at providing an electronic device.
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program, comprising:
acquiring a miRNA-disease association matrix, a miRNA functional similarity matrix and a disease semantic similarity matrix;
and, based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and performing adaptive learning to obtain a new miRNA-disease association matrix.
Example four
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, performs the steps of:
acquiring a miRNA-disease association matrix, a miRNA functional similarity matrix and a disease semantic similarity matrix;
and, based on a similarity-constrained objective function, taking the adjacency matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, and performing adaptive learning to obtain a new miRNA-disease association matrix.
Experimental verification
To verify the predictive power of RSCMDA, we measured its prediction performance from multiple angles using different evaluation criteria.
1. Experimental setup
After repeated experimental tuning, the final RSCMDA model uses the following parameter settings.
TABLE 2 Parameter configuration
Parameter | Value |
---|---|
α | 1e-4 |
β | 1e-4 |
 | 1 |
2. Results of the experiment
We used leave-one-out cross-validation (LOOCV) to evaluate the performance of RSCMDA. LOOCV can be divided into two categories: global LOOCV and local LOOCV. What they have in common is that one known miRNA-disease association is held out at a time as the test sample, the other samples are assumed to be unknown samples, and RSCMDA is then used for prediction. After the prediction result is obtained, the score of each test sample is compared with the scores of the unknown samples, and the scores are ranked from high to low. To describe the experimental results more intuitively, the performance of RSCMDA was compared with that of other advanced methods using the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR). Taking the miRNA-disease association prediction considered here as an example, for each threshold K (0 < K < 100), the TPR is the proportion of held-out known associations that fall within the top K% of the prediction results, and the FPR is the proportion of unknown associations that fall within the top K% of the prediction results. The area under the ROC curve (AUC) was used as the criterion for measuring predictive performance.
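The ROC and AUC computation described above can be sketched as follows. This is a simplified illustration, not the evaluation code used in the experiments: it assumes the held-out test associations (label 1) and the unknown pairs (label 0) have already been scored and pooled, and it sweeps the threshold over distinct score values rather than over percentages K. The function names and toy scores are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) obtained by moving a threshold
    down the scores ranked from high to low.
    labels: 1 = held-out known association, 0 = unknown pair.
    Assumes at least one sample of each label."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy scores: two held-out associations among four unknown pairs.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0, 0]
toy_auc = auc(roc_points(scores, labels))
print(toy_auc)
```

In this toy case, seven of the eight (test, unknown) pairs are ordered correctly, which is what the resulting AUC reflects.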
We compared RSCMDA with seven other prediction methods: EGBMMDA, MCMDA, HGIMDA, PBMDA, WBSMDA, HDMP and RLSMDA. Experimental results showed that, in global LOOCV (Fig. 4), EGBMMDA, MCMDA, HGIMDA, PBMDA, WBSMDA, HDMP and RLSMDA achieved AUCs of 0.9123, 0.8749, 0.8781, 0.9169, 0.8030, 0.8366 and 0.8426, respectively. In local LOOCV (Fig. 5), they achieved AUCs of 0.8221, 0.7718, 0.8077, 0.8341, 0.8031, 0.7702 and 0.6953, respectively.
Next, leave-one-disease-out cross-validation (LODOCV) was used to verify the performance of RSCMDA in inferring miRNAs for new diseases that have no known associated miRNAs in the heterogeneous network. Specifically, for each candidate disease, we remove all known associations between that disease and miRNAs, and then use the association information of the other diseases to prioritize all candidate miRNAs. Because no association information is available for the disease under investigation, LODOCV is more rigorous than the cross-validation frameworks described above and better able to assess the risk of overfitting. Under the LODOCV framework, we again use AUC to evaluate all methods. Our method achieved the highest AUC, 0.815. We do not show the performance of the other compared methods because all of them yielded AUC values below 0.5.
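The masking step of LODOCV can be sketched as below. This is a minimal illustration under the assumption that the association matrix is stored as a list of rows (miRNAs) by columns (diseases); the helper names `mask_disease` and `lodocv_folds` are hypothetical, and the predictor itself is outside the scope of the sketch.

```python
def mask_disease(association, disease_idx):
    """Return a copy of the association matrix in which every known
    association of one disease is removed, together with the indices
    of the miRNAs whose associations were hidden."""
    masked = [row[:] for row in association]
    hidden = []
    for i, row in enumerate(masked):
        if row[disease_idx] == 1:
            hidden.append(i)
            row[disease_idx] = 0
    return masked, hidden


def lodocv_folds(association):
    """Yield one (disease index, masked matrix, hidden miRNAs) fold
    per disease that has at least one known association."""
    n_diseases = len(association[0])
    for j in range(n_diseases):
        masked, hidden = mask_disease(association, j)
        if hidden:  # nothing to evaluate if the disease has no known miRNA
            yield j, masked, hidden


# Hypothetical toy matrix: 3 miRNAs x 2 diseases.
A = [[1, 1], [0, 1], [0, 0]]
folds = list(lodocv_folds(A))
for disease_idx, masked, hidden in folds:
    print(disease_idx, masked, hidden)
```

In each fold, a predictor would be trained on `masked` and the scores in the held-out column would be ranked, with the miRNAs in `hidden` serving as the positives.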
Finally, to further demonstrate the predictive and generalization capabilities of RSCMDA on real datasets, we applied RSCMDA to the older version of HMDD (v1.0) and then used the later version, HMDD (v2.0), to validate the predicted miRNA-disease associations. Specifically, working from the screened HMDD (v1.0) data, for each compared method we selected the top N predicted miRNAs, with N ranging from 2000 to 10000 in steps of 2000, and then counted the candidates confirmed as true by records in HMDD (v2.0). RSCMDA recognized more disease-associated miRNAs than the other five computational methods. In conclusion, the validation results show that RSCMDA can accurately mine disease-related miRNAs.
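The top-N validation described above can be sketched as follows. The sketch assumes that predictions are available as a score per (miRNA, disease) pair and that the two database versions are represented as sets of pairs; it also interprets the "top N predictions" as the N highest-scoring pairs not already present in the older version. The function name and toy data are hypothetical.

```python
def confirmed_in_top_n(scores, known_old, known_new, n_values):
    """Count, for each N, how many of the top-N predicted pairs are
    confirmed by the newer database.
    scores: dict mapping (mirna, disease) -> predicted score
    known_old: pairs already known in the older version (training data)
    known_new: pairs recorded in the newer version (validation data)"""
    candidates = [(pair, s) for pair, s in scores.items() if pair not in known_old]
    candidates.sort(key=lambda item: item[1], reverse=True)
    ranked_pairs = [pair for pair, _ in candidates]
    return {
        n: sum(1 for pair in ranked_pairs[:n] if pair in known_new)
        for n in n_values
    }


# Hypothetical toy data; in the experiment above, the N values would
# correspond to range(2000, 10001, 2000).
scores = {
    ("m1", "d1"): 0.95,
    ("m2", "d1"): 0.80,
    ("m3", "d1"): 0.60,
    ("m1", "d2"): 0.40,
    ("m2", "d2"): 0.10,
}
known_old = {("m1", "d1")}
known_new = {("m1", "d1"), ("m2", "d1"), ("m1", "d2")}

counts = confirmed_in_top_n(scores, known_old, known_new, [1, 2, 3])
print(counts)
```

Comparing these counts across methods at each N gives the kind of comparison reported in the paragraph above.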
In one or more embodiments, based on the assumption that miRNAs with similar functions tend to be associated with the same diseases, the proposed similarity-constrained objective function is solved over the miRNA-disease association network, the miRNA functional similarity network and the disease semantic similarity network, thereby achieving high-precision prediction of miRNA-disease associations. Specifically, during optimization the method adaptively learns a new affinity network from the known affinity information. Furthermore, a unified constraint framework is proposed that updates the prediction results from the miRNA space and the disease space simultaneously, rather than learning them separately in each space, which can provide more robust performance.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented by a general-purpose computing device; alternatively, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they can be fabricated as separate integrated circuit modules, or several of the modules or steps can be fabricated as a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A miRNA-disease association prediction method based on similarity constraint is characterized by comprising the following steps:
acquiring a miRNA-disease association matrix, a miRNA functional similarity matrix and a disease semantic similarity matrix;
and performing adaptive learning based on a similarity-constrained objective function, using the miRNA-disease association (adjacency) matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, to obtain a new miRNA-disease association matrix.
2. The miRNA-disease association prediction method of claim 1, wherein obtaining the miRNA-disease association matrix comprises: acquiring miRNA-disease relationship data and constructing an adjacency matrix.
3. The miRNA-disease association prediction method of claim 1, wherein obtaining a disease semantic similarity matrix comprises:
acquiring disease semantic data, and constructing a directed acyclic graph, wherein nodes represent diseases, and directed edges among the nodes represent hierarchical relations among the diseases;
and calculating the semantic similarity between diseases by using the accumulated sum of the contribution values of the ancestor nodes to the node as the semantic value of the node to obtain a disease semantic similarity matrix.
4. The miRNA-disease association prediction method of claim 1, wherein the similarity-constrained objective function is:
wherein SM represents a new miRNA functional similarity matrix, SD represents a new disease semantic similarity matrix, F represents a new miRNA-disease association matrix, AM represents a miRNA functional similarity matrix, AD represents a disease semantic similarity matrix, and F_i and F_j respectively represent the association vectors of the i-th miRNA and the j-th miRNA with all diseases.
5. A miRNA-disease association prediction system based on similarity constraints, comprising:
a true association data acquisition module configured to acquire a miRNA-disease association matrix;
a similarity matrix acquisition module configured to acquire a miRNA functional similarity matrix and a disease semantic similarity matrix;
an adaptive learning prediction module configured to perform adaptive learning based on a similarity-constrained objective function, using the miRNA-disease association (adjacency) matrix, the miRNA functional similarity matrix and the disease semantic similarity matrix as training data, to obtain a new miRNA-disease association matrix.
6. The miRNA-disease association prediction system of claim 5, wherein obtaining the miRNA-disease association matrix comprises: acquiring miRNA-disease relationship data and constructing an adjacency matrix.
7. The miRNA-disease association prediction system of claim 5, wherein obtaining a disease semantic similarity matrix comprises:
acquiring disease semantic data, and constructing a directed acyclic graph, wherein nodes represent diseases, and directed edges among the nodes represent hierarchical relations among the diseases;
and calculating the semantic similarity between diseases by using the accumulated sum of the contribution values of the ancestor nodes to the node as the semantic value of the node to obtain a disease semantic similarity matrix.
8. The miRNA-disease association prediction system of claim 5, wherein the similarity-constrained objective function is:
wherein SM represents a new miRNA functional similarity matrix, SD represents a new disease semantic similarity matrix, F represents a new miRNA-disease association matrix, AM represents a miRNA functional similarity matrix, AD represents a disease semantic similarity matrix, and F_i and F_j respectively represent the association vectors of the i-th miRNA and the j-th miRNA with all diseases.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the miRNA-disease association prediction method of any one of claims 1-4.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the miRNA-disease association prediction method according to any one of claims 1-4.
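The disease semantic similarity construction recited in claims 3 and 7 can be illustrated with the following minimal sketch. The claims state only that the semantic value of a disease is the accumulated sum of the contributions of its ancestor nodes in the directed acyclic graph; the decay factor of 0.5, the rule that each ancestor takes its largest contribution over all paths, and the shared-ancestor similarity formula are assumptions borrowed from a commonly used formulation and are not confirmed by the text above. The disease hierarchy and function names are hypothetical.

```python
def contributions(disease, parents, decay=0.5):
    """Contribution of `disease` itself and of each of its ancestors
    to the semantics of `disease`.
    parents: dict mapping a disease to the list of its direct parents.
    decay: assumed per-edge attenuation factor (not given in the claims)."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, []):
            value = decay * contrib[node]
            # Keep the largest contribution over all paths to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(a, b, parents, decay=0.5):
    """Similarity of two diseases from the contributions of their
    shared ancestors, normalised by the two semantic values
    (assumed formula)."""
    ca = contributions(a, parents, decay)
    cb = contributions(b, parents, decay)
    shared = set(ca) & set(cb)
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))


# Hypothetical toy hierarchy: child -> direct parents.
parents = {
    "lung neoplasms": ["respiratory tract neoplasms"],
    "respiratory tract neoplasms": ["neoplasms"],
    "breast neoplasms": ["neoplasms"],
}

sim = semantic_similarity("lung neoplasms", "breast neoplasms", parents)
print(sim)
```

Evaluating `semantic_similarity` for every pair of diseases would fill the disease semantic similarity matrix; a disease compared with itself scores 1 under this formula.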
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110730370.0A CN113539479B (en) | 2021-06-29 | 2021-06-29 | Similarity constraint-based miRNA-disease association prediction method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113539479A (en) | 2021-10-22 |
CN113539479B (en) | 2024-05-07 |
Family
ID=78097207
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110730370.0A Active CN113539479B (en) | 2021-06-29 | 2021-06-29 | Similarity constraint-based miRNA-disease association prediction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113539479B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107506608A (en) * | 2017-09-29 | 2017-12-22 | 杭州电子科技大学 | A kind of improved miRNA disease association Forecasting Methodologies based on collaborative filtering |
CN109935332A (en) * | 2019-03-01 | 2019-06-25 | 桂林电子科技大学 | A kind of miRNA- disease association prediction technique based on double random walk models |
CN110400600A (en) * | 2019-08-01 | 2019-11-01 | 枣庄学院 | A kind of disease associated prediction technique of miRNA- based on rotation forest algorithm |
CN110782948A (en) * | 2019-10-18 | 2020-02-11 | 湖南大学 | Method for predicting potential association of miRNA and disease based on constraint probability matrix decomposition method |
CN112183837A (en) * | 2020-09-22 | 2021-01-05 | 曲阜师范大学 | miRNA and disease association relation prediction method based on self-coding model |
Non-Patent Citations (1)
Title |
---|
李建伟;杜燕波;张山;宋子健;郑希强: "Research progress on methods for calculating the functional similarity of microRNAs", Progress in Physiological Sciences (生理科学进展), no. 03 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114093527A (en) * | 2021-12-01 | 2022-02-25 | 中国科学院新疆理化技术研究所 | Drug relocation method and system based on spatial similarity constraint and non-negative matrix factorization |
CN115966252A (en) * | 2023-02-12 | 2023-04-14 | 汤永 | Antiviral drug screening method based on L1norm diagram |
CN115966252B (en) * | 2023-02-12 | 2024-01-19 | 中国人民解放军总医院 | Antiviral drug screening method based on L1norm diagram |
Also Published As
Publication number | Publication date |
---|---|
CN113539479B (en) | 2024-05-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||