CN112992347A - lncRNA-disease associated prediction method and system based on Laplace regularization least square and network projection - Google Patents
- Publication number
- CN112992347A (application CN202110428827.2A)
- Authority
- CN
- China
- Prior art keywords
- lncrna
- disease
- matrix
- similarity
- comprehensive
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Abstract
The invention relates to an lncRNA-disease association prediction method and system based on Laplacian regularized least squares and network projection. Compared with existing prediction methods, the method predicts the associations between all diseases and lncRNAs simultaneously, can be applied to isolated diseases and new lncRNAs, requires no negative samples, has only one parameter, and achieves higher prediction accuracy.
Description
Technical Field
The invention relates to the technical field of bioinformatics, in particular to an lncRNA-disease association prediction method and system based on Laplacian regularized least squares and network projection.
Background
Long non-coding RNA (lncRNA) is non-coding RNA with a length of more than 200 nucleotides. In recent years, a large body of evidence has shown that many lncRNAs are closely related to human diseases, and that mutation and dysregulation of lncRNAs can cause various diseases, including cervical cancer and ovarian cancer. Identifying and predicting the relationship between lncRNAs and diseases therefore helps to explore the pathogenesis of diseases, and has become an important subject in biological research. However, determining the association between an lncRNA and a disease through biological experiments is very time-consuming and labor-intensive. Using computational methods to predict potential disease-associated lncRNAs can greatly reduce this workload, thereby saving cost and time.
Disclosure of Invention
The invention aims to provide an lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection that is simple to implement and yields accurate results.
To achieve this purpose, the lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection adopts the following steps:
step one, on the basis of disease semantic similarity, combining the disease Gaussian nuclear spectrum similarity (that is, the Gaussian kernel similarity of the diseases' interaction profiles) to obtain a comprehensive disease similarity matrix; on the basis of lncRNA functional similarity, combining the lncRNA Gaussian nuclear spectrum similarity to obtain a comprehensive lncRNA similarity matrix;
step two, applying the Laplacian regularized least squares method to the comprehensive disease similarity matrix to obtain a disease estimated score matrix, applying the Laplacian regularized least squares method to the comprehensive lncRNA similarity matrix to obtain an lncRNA estimated score matrix, and integrating the disease estimated score matrix and the lncRNA estimated score matrix to obtain an lncRNA-disease association composite estimated score matrix;
step three, projecting the comprehensive disease similarity matrix onto the lncRNA-disease association composite estimated score matrix to obtain one projection score matrix, projecting the comprehensive lncRNA similarity matrix onto the lncRNA-disease association composite estimated score matrix to obtain another projection score matrix, then combining the transpose of the projection score matrix based on the comprehensive disease similarity matrix with the projection score matrix based on the comprehensive lncRNA similarity matrix and taking the average of the two to obtain the final lncRNA-disease association prediction score, and thereby the disease-associated lncRNA prediction result.
In step one, the disease Gaussian nuclear spectrum similarity is expressed as:
where the left-hand side is the Gaussian nuclear spectrum similarity between the i-th disease and the j-th disease; the two vectors being compared are the i-th and j-th disease columns of the known lncRNA-disease association matrix; and the bandwidth parameter, which controls the width of the kernel, is calculated by the following formula:
Further, in step one, the lncRNA Gaussian nuclear spectrum similarity is expressed as:
where the left-hand side is the Gaussian nuclear spectrum similarity between the i-th lncRNA and the j-th lncRNA; the two vectors being compared are the i-th and j-th lncRNA profiles taken from the association matrix; and the bandwidth parameter, which controls the width of the kernel, is calculated by the following formula:
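The formulas referred to in this passage are not reproduced in this text. As a hedged sketch only, the surrounding definitions match the Gaussian interaction profile kernel that is widely used in association prediction. The symbols $KD$, $KL$, $IP(\cdot)$, $\gamma_d$, $\gamma_l$, $\gamma'_d$, $\gamma'_l$, $n_d$ and $n_l$ are assumed names chosen for illustration, not notation taken from the patent:

```latex
\begin{aligned}
KD(d_i, d_j) &= \exp\!\left(-\gamma_d \,\lVert IP(d_i) - IP(d_j) \rVert^{2}\right), &
\gamma_d &= \frac{\gamma'_d}{\frac{1}{n_d}\sum_{k=1}^{n_d} \lVert IP(d_k) \rVert^{2}},\\[4pt]
KL(l_i, l_j) &= \exp\!\left(-\gamma_l \,\lVert IP(l_i) - IP(l_j) \rVert^{2}\right), &
\gamma_l &= \frac{\gamma'_l}{\frac{1}{n_l}\sum_{k=1}^{n_l} \lVert IP(l_k) \rVert^{2}}.
\end{aligned}
```

Here $IP(d_i)$ and $IP(l_i)$ stand for the association profiles of disease $d_i$ and lncRNA $l_i$ in the known association matrix, and $n_d$ and $n_l$ are the numbers of diseases and lncRNAs. Under this assumed form, dividing by the mean squared profile norm makes the bandwidth independent of how many associations are known.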
Furthermore, in step one, the comprehensive disease similarity matrix, obtained by combining the disease Gaussian nuclear spectrum similarity with the disease semantic similarity, is:
and the comprehensive lncRNA similarity matrix, obtained by combining the lncRNA Gaussian nuclear spectrum similarity with the lncRNA functional similarity, is:
In addition, in step two, the Laplacian regularized least squares method is applied to the comprehensive disease similarity matrix to obtain the disease estimated score matrix:
The disease estimated score matrix is obtained by solving the optimization problem of the following formula:
where the degree matrix is a diagonal matrix whose i-th diagonal entry is the sum of all elements in row i of the comprehensive disease similarity matrix; the balance parameter weights the regularization term; and the norm is the Frobenius norm.
Further, in step two, the Laplacian regularized least squares method is applied to the comprehensive lncRNA similarity matrix to obtain the lncRNA estimated score matrix:
The lncRNA estimated score matrix is obtained by solving the optimization problem of the following formula:
where the degree matrix is a diagonal matrix whose i-th diagonal entry is the sum of all elements in row i of the comprehensive lncRNA similarity matrix; the balance parameter weights the regularization term; and the norm is the Frobenius norm.
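The optimization problems themselves are not reproduced in this text. A hedged sketch of the form that fits the definitions above (diagonal degree matrix, balance parameter, Frobenius norm) is the Laplacian regularized least squares formulation used in earlier association predictors. The symbols $S_D$, $S_L$, $D_D$, $D_L$, $L_D$, $L_L$, $\eta_D$, $\eta_L$, $F_D$, $F_L$ and $A$ (rows for lncRNAs, columns for diseases) are assumptions made for illustration:

```latex
\begin{aligned}
L_D &= D_D^{-1/2}\,(D_D - S_D)\,D_D^{-1/2}, \qquad D_D(i,i) = \sum_{j} S_D(i,j),\\
F_D^{*} &= \arg\min_{F_D}\; \lVert A^{T} - F_D \rVert_F^{2} + \eta_D\,\lVert F_D^{T} L_D F_D \rVert_F^{2},
\qquad F_D^{*} = S_D\,(S_D + \eta_D L_D S_D)^{-1} A^{T},\\[4pt]
L_L &= D_L^{-1/2}\,(D_L - S_L)\,D_L^{-1/2}, \qquad D_L(i,i) = \sum_{j} S_L(i,j),\\
F_L^{*} &= \arg\min_{F_L}\; \lVert A - F_L \rVert_F^{2} + \eta_L\,\lVert F_L^{T} L_L F_L \rVert_F^{2},
\qquad F_L^{*} = S_L\,(S_L + \eta_L L_L S_L)^{-1} A.
\end{aligned}
```

The closed-form expressions are the ones usually quoted for this type of objective in that earlier work; whether the patent uses exactly these expressions cannot be confirmed from this text.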
Further, in step two, the disease estimated score matrix and the lncRNA estimated score matrix are integrated in the following way to obtain the lncRNA-disease association composite estimated score matrix:
In addition, in step three, the comprehensive disease similarity matrix is projected onto the lncRNA-disease association composite estimated score matrix, and the resulting projection score matrix based on the comprehensive disease similarity matrix is:
The comprehensive lncRNA similarity matrix is projected onto the lncRNA-disease association composite estimated score matrix, and the resulting projection score matrix based on the comprehensive lncRNA similarity matrix is:
Further, in step three, the transpose of the projection score matrix based on the comprehensive disease similarity matrix is combined with the projection score matrix based on the comprehensive lncRNA similarity matrix, and the average of the two is taken as the final lncRNA-disease association prediction score:
Finally, the invention also relates to an lncRNA-disease associated prediction system based on laplacian regularized least squares and network projections, comprising:
the data preparation unit is used for constructing a comprehensive disease similarity matrix according to the disease semantic similarity and the disease Gaussian nuclear spectrum similarity; constructing a comprehensive lncRNA similarity matrix according to the lncRNA functional similarity and lncRNA Gaussian nuclear spectrum similarity;
the lncRNA and disease association score estimation unit is used for implementing a Laplace regularization least square method in the comprehensive disease similarity matrix and the comprehensive lncRNA similarity matrix constructed by the data preparation unit to construct an lncRNA and disease association composite estimation score matrix;
the lncRNA and disease association score refining unit is used for projecting the comprehensive disease similarity matrix and the comprehensive lncRNA similarity matrix constructed by the data preparation unit on the lncRNA and disease association composite estimation score matrix constructed by the lncRNA and disease association score estimation unit respectively, and fusing two projection scores to obtain a disease association lncRNA prediction result;
the lncRNA-disease association prediction system based on the Laplace regularization least square and the network projection predicts the association between lncRNA and diseases according to the prediction method.
Most existing prediction methods measure the similarity between diseases and between lncRNAs using only disease semantic similarity and lncRNA functional similarity. Because of missing data, many of these similarity values are zero, which reduces the accuracy of the prediction result. In addition, these methods need a large number of negative association samples to ensure accuracy, and selecting negative samples is very difficult.
Unlike existing disease-associated lncRNA prediction methods, the invention first constructs a comprehensive disease similarity matrix from the disease Gaussian nuclear spectrum similarity and the disease semantic similarity, and a comprehensive lncRNA similarity matrix from the lncRNA Gaussian nuclear spectrum similarity and the lncRNA functional similarity. Introducing the two Gaussian nuclear spectrum similarities compensates for the inaccuracy of disease semantic similarity and lncRNA functional similarity, so that the similarity among diseases and among lncRNAs is described more accurately. The Laplacian regularized least squares method is then applied to the comprehensive disease similarity matrix and to the comprehensive lncRNA similarity matrix, which alleviates the sparsity of the known lncRNA-disease association data. The two resulting estimated score matrices are integrated into an lncRNA-disease association composite estimated score matrix. Finally, using network projection, the comprehensive disease similarity matrix and the comprehensive lncRNA similarity matrix are each projected onto the composite estimated score matrix to obtain the lncRNA-disease association prediction result. Compared with existing prediction methods, the lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection is a global prediction method: it predicts the associations between all diseases and lncRNAs at the same time, can be applied to isolated diseases and new lncRNAs, requires no negative samples, has only one parameter, achieves higher accuracy in predicting unknown lncRNA-disease interactions, and generalizes better.
Description of the drawings:
Fig. 1 shows the results of the leave-one-out cross-validation (LOOCV) experiments performed on data set 1 and data set 2 using the lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection described in the embodiment.
Fig. 2 compares the ROC curves and AUC values on data set 1 of the lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection and two other existing methods considered in the embodiment.
Fig. 3 compares the ROC curves and AUC values on data set 2 of the lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection and two other existing methods considered in the embodiment.
Fig. 4 shows the ROC curves and AUC values for isolated disease prediction and new lncRNA prediction on data set 1 and data set 2 using the lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection described in the embodiment.
Detailed Description
For the understanding of those skilled in the art, the present invention will be further described with reference to the following examples and drawings, which are not intended to limit the present invention.
In this embodiment, the lncRNA-disease associated prediction method based on laplacian regularized least squares and network projection mainly includes the following steps:
Firstly, data preparation: on the basis of disease semantic similarity, the disease Gaussian nuclear spectrum similarity is combined to obtain a comprehensive disease similarity matrix; on the basis of lncRNA functional similarity, the lncRNA Gaussian nuclear spectrum similarity is combined to obtain a comprehensive lncRNA similarity matrix.
1.1 lncRNA-disease association: the 2013 and 2015 versions of the LncRNADisease database, which records associations between lncRNAs and human diseases, were obtained and processed. From the 2013 version, 156 lncRNAs, 190 diseases and 352 known, experimentally verified lncRNA-disease associations were extracted as data set 1; from the 2015 version, 285 lncRNAs, 226 diseases and 621 known, experimentally verified lncRNA-disease associations were extracted as data set 2. In both data sets, the lncRNAs and the diseases are each represented as an indexed collection, and a Boolean matrix represents the lncRNA-disease associations: if an lncRNA node and a disease node have an experimentally verified association, the corresponding entry is set to 1; otherwise it is set to 0.
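As a minimal sketch of the Boolean association matrix described above, the following builds a 0/1 matrix from a list of known pairs. The orientation (one row per lncRNA, one column per disease), the function name and the placeholder identifiers are assumptions made for illustration; plain Python lists are used so that the sketch has no dependencies.

```python
def build_association_matrix(lncrnas, diseases, known_pairs):
    """Return a 0/1 matrix with one row per lncRNA and one column per disease."""
    row_of = {name: i for i, name in enumerate(lncrnas)}
    col_of = {name: j for j, name in enumerate(diseases)}
    matrix = [[0] * len(diseases) for _ in lncrnas]
    for lncrna, disease in known_pairs:
        # Mark an experimentally verified association with 1.
        matrix[row_of[lncrna]][col_of[disease]] = 1
    return matrix


# Placeholder identifiers, not real database entries.
lncrnas = ["lnc1", "lnc2", "lnc3"]
diseases = ["dis1", "dis2"]
pairs = [("lnc1", "dis1"), ("lnc2", "dis2"), ("lnc3", "dis1")]
A = build_association_matrix(lncrnas, diseases, pairs)
```

For data set 1 as described above, such a matrix would have 156 rows, 190 columns and 352 entries equal to 1.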
1.2 Disease semantic similarity: in the prior art, each disease corresponds to a directed acyclic graph (DAG) in MeSH (Medical Subject Headings), and the semantic similarity between two diseases can be measured from their DAGs: the more disease terms the two DAGs share, the greater the similarity between the two diseases. Since the method for calculating semantic similarity belongs to the prior art, it is not described in detail here.
1.3 lncRNA functional similarity: in the prior art, lncRNA functional similarity is usually calculated from the disease semantic similarity and the known lncRNA-disease associations, in the following steps:
1) select any two lncRNAs and denote the sets of diseases associated with each of them;
2) for a given disease and a given disease set, the association score between the disease and the set is calculated as follows:
This embodiment uses the above method to calculate the functional similarity between lncRNAs and stores the result in a functional similarity matrix. Since this method of calculating lncRNA functional similarity belongs to the prior art, it is not described in detail here.
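The scoring formula for step 2) is not reproduced in this text. A common prior-art choice, assumed here, scores a disease against a disease set by its best semantic match in the set, and then averages these scores over both disease sets. The function names and the nested-dictionary layout of `sem_sim` are hypothetical.

```python
def disease_to_set_score(disease, disease_set, sem_sim):
    """Assumed rule: best semantic similarity between a disease and any set member."""
    return max(sem_sim[disease][other] for other in disease_set)


def functional_similarity(set_a, set_b, sem_sim):
    """Functional similarity of two lncRNAs from their associated disease sets."""
    if not set_a or not set_b:
        return 0.0  # an lncRNA without known diseases carries no information
    total = sum(disease_to_set_score(d, set_b, sem_sim) for d in set_a)
    total += sum(disease_to_set_score(d, set_a, sem_sim) for d in set_b)
    return total / (len(set_a) + len(set_b))


# Toy semantic similarities between three placeholder diseases.
sem_sim = {
    "d1": {"d1": 1.0, "d2": 0.5, "d3": 0.0},
    "d2": {"d1": 0.5, "d2": 1.0, "d3": 0.2},
    "d3": {"d1": 0.0, "d2": 0.2, "d3": 1.0},
}
fs = functional_similarity(["d1"], ["d2", "d3"], sem_sim)
```

In the toy call, `d1` scores 0.5 against the second set, while `d2` and `d3` score 0.5 and 0.0 against the first, so the result is 1.0 / 3.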
1.4 Disease Gaussian nuclear spectrum similarity and lncRNA Gaussian nuclear spectrum similarity: when disease semantic similarity alone is used to measure the similarity between diseases, missing data leaves the semantic similarity between many diseases at zero, which affects the accuracy of the prediction result. This embodiment therefore introduces the disease Gaussian nuclear spectrum similarity to offset this problem:
where the left-hand side is the Gaussian nuclear spectrum similarity between the i-th disease and the j-th disease; the two vectors being compared are the i-th and j-th disease columns of the known lncRNA-disease association matrix; and the bandwidth parameter, which controls the width of the kernel, is calculated by the following formula:
Similarly, the lncRNA Gaussian nuclear spectrum similarity is calculated as follows:
where the left-hand side is the Gaussian nuclear spectrum similarity between the i-th lncRNA and the j-th lncRNA; the two vectors being compared are the i-th and j-th lncRNA profiles taken from the association matrix; and the bandwidth parameter, which controls the width of the kernel, is calculated by the following formula:
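A minimal sketch of this similarity, assuming it takes the usual Gaussian interaction profile kernel form: the similarity of two profiles is exp(-gamma * squared distance), with gamma equal to a base value divided by the mean squared norm of all profiles. The function name and the `gamma_prime` parameter (default 1.0) are assumptions, and the association matrix is assumed to have lncRNAs as rows and diseases as columns.

```python
import math


def gaussian_profile_similarity(profiles, gamma_prime=1.0):
    """Pairwise Gaussian kernel similarity between 0/1 association profiles."""
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all association profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # normalized kernel bandwidth
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


A = [[1, 0], [0, 1], [1, 0]]  # toy association matrix: 3 lncRNAs x 2 diseases
lnc_kernel = gaussian_profile_similarity(A)  # lncRNA profiles are the rows
dis_kernel = gaussian_profile_similarity([list(c) for c in zip(*A)])  # disease profiles are the columns
```

Two entities with identical profiles get similarity 1, and the value decays toward 0 as their profiles diverge, so the kernel is non-zero even where semantic or functional similarity is missing.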
1.5 Constructing the comprehensive disease similarity matrix and the comprehensive lncRNA similarity matrix: the disease semantic similarity and the disease Gaussian nuclear spectrum similarity are integrated to obtain the comprehensive disease similarity matrix, and the lncRNA functional similarity and the lncRNA Gaussian nuclear spectrum similarity are integrated to obtain the comprehensive lncRNA similarity matrix:
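The integration formula is not reproduced in this text, so the exact rule is unknown. The sketch below illustrates two rules that are common in related work, both of which are assumptions: an element-wise average, and a fallback that uses the kernel value only where the primary (semantic or functional) similarity is zero. The function name and the `mode` parameter are hypothetical.

```python
def integrate_similarity(primary, kernel, mode="average"):
    """Combine a primary similarity matrix with a Gaussian kernel matrix."""
    n = len(primary)
    combined = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if mode == "average":
                combined[i][j] = (primary[i][j] + kernel[i][j]) / 2
            elif mode == "fallback":
                # Keep the primary value unless it is missing (zero).
                combined[i][j] = primary[i][j] if primary[i][j] != 0 else kernel[i][j]
            else:
                raise ValueError("mode must be 'average' or 'fallback'")
    return combined


primary = [[1.0, 0.0], [0.0, 1.0]]  # toy semantic similarity with a missing pair
kernel = [[1.0, 0.4], [0.4, 1.0]]   # toy Gaussian kernel similarity
averaged = integrate_similarity(primary, kernel, mode="average")
filled = integrate_similarity(primary, kernel, mode="fallback")
```

Either rule gives the previously zero pair a non-zero similarity, which is the stated purpose of introducing the Gaussian similarity.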
Secondly, estimating the lncRNA-disease association scores: to address the sparsity of the known lncRNA-disease association network, the Laplacian regularized least squares method is applied to the comprehensive disease similarity matrix to obtain a disease estimated score matrix and to the comprehensive lncRNA similarity matrix to obtain an lncRNA estimated score matrix; the two estimated score matrices are then integrated to obtain the lncRNA-disease association composite estimated score matrix.
2.1 Applying the Laplacian regularized least squares method to the comprehensive lncRNA similarity matrix: the lncRNA-disease interactions are prioritized using the Laplacian regularized least squares method on the comprehensive lncRNA similarity matrix to obtain the lncRNA estimated score matrix:
The lncRNA estimated score matrix is obtained by solving the optimization problem of the following formula:
where the degree matrix is a diagonal matrix whose i-th diagonal entry is the sum of all elements in row i of the comprehensive lncRNA similarity matrix; the balance parameter weights the regularization term and takes a fixed value in this embodiment; and the norm is the Frobenius norm.
2.2 Applying the Laplacian regularized least squares method to the comprehensive disease similarity matrix: as in 2.1, the lncRNA-disease interactions are prioritized using the Laplacian regularized least squares method on the comprehensive disease similarity matrix to obtain the disease estimated score matrix:
The disease estimated score matrix is obtained by solving the optimization problem of the following formula:
where the degree matrix is a diagonal matrix whose i-th diagonal entry is the sum of all elements in row i of the comprehensive disease similarity matrix; the balance parameter weights the regularization term and, in this embodiment, takes the same value as the balance parameter used in 2.1; and the norm is the Frobenius norm.
2.3 Integrating the two estimated score matrices: the disease estimated score matrix and the lncRNA estimated score matrix are integrated to obtain the lncRNA-disease association composite estimated score matrix:
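The following is a hedged sketch of steps 2.1 to 2.3 and not the patent's own implementation. It assumes the closed form F = S (S + eta * L * S)^(-1) Y quoted in earlier LapRLS association predictors, with the normalized Laplacian L = D^(-1/2) (D - S) D^(-1/2), and it assumes the two score matrices are integrated by a plain average. All function names are hypothetical, and plain lists with a small Gauss-Jordan solver are used only to keep the sketch dependency-free; a numerical library would be the practical choice.

```python
def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def solve(m, rhs):
    """Solve m @ X = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in m[i]] + [float(v) for v in rhs[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def normalized_laplacian(sim):
    """L = D^(-1/2) (D - S) D^(-1/2); assumes every row sum of S is positive."""
    degree = [sum(row) for row in sim]
    n = len(sim)
    return [[((degree[i] if i == j else 0.0) - sim[i][j]) / (degree[i] * degree[j]) ** 0.5
             for j in range(n)] for i in range(n)]


def laprls_scores(sim, targets, eta):
    """Assumed closed form: F = S (S + eta * L * S)^(-1) Y."""
    lap_sim = matmul(normalized_laplacian(sim), sim)
    system = [[s + eta * v for s, v in zip(s_row, l_row)]
              for s_row, l_row in zip(sim, lap_sim)]
    return matmul(sim, solve(system, targets))


def composite_scores(sim_lnc, sim_dis, assoc, eta):
    """Average the lncRNA-side scores and the transposed disease-side scores."""
    f_lnc = laprls_scores(sim_lnc, assoc, eta)                         # lncRNAs x diseases
    f_dis_t = transpose(laprls_scores(sim_dis, transpose(assoc), eta))  # transposed to match
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(f_lnc, f_dis_t)]
```

With `eta = 0` the regularization term vanishes and the scores reduce to the known associations, which is a convenient sanity check; larger values smooth the scores across similar lncRNAs or diseases.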
Thirdly, refining the lncRNA-disease association scores: the comprehensive disease similarity matrix is projected onto the lncRNA-disease association composite estimated score matrix to obtain one projection score matrix, and the comprehensive lncRNA similarity matrix is projected onto the composite estimated score matrix to obtain another. The transpose of the projection score matrix based on the comprehensive disease similarity matrix is then combined with the projection score matrix based on the comprehensive lncRNA similarity matrix, and the average of the two is taken as the final lncRNA-disease association prediction score, which gives the disease-associated lncRNA prediction result.
3.1 Network projection: after the lncRNA-disease association estimated scores have been obtained using the Laplacian regularized least squares method, projection scores are obtained by network projection.
First, the comprehensive lncRNA similarity matrix is projected onto the lncRNA-disease association composite estimated score matrix to obtain the projection score matrix based on the comprehensive lncRNA similarity matrix:
Then the comprehensive disease similarity matrix is projected onto the lncRNA-disease association composite estimated score matrix to obtain the projection score matrix based on the comprehensive disease similarity matrix:
3.2 Fusing the projection scores: finally, the transpose of the projection score matrix based on the comprehensive disease similarity matrix is combined with the projection score matrix based on the comprehensive lncRNA similarity matrix, and the average of the two is taken as the final lncRNA-disease association prediction score, which gives the prediction result:
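The projection formulas are not reproduced in this text. The sketch below assumes a form used in other network-projection predictors: each projection is a similarity-weighted sum of composite scores, divided by the Euclidean norm of the score vector it is projected onto. That normalization and the function names are assumptions; the final averaging of the lncRNA-side projection with the transposed disease-side projection follows the description above.

```python
import math


def project_lncrna(sim_lnc, scores):
    """lncRNA-side projection (lncRNAs x diseases), normalized per disease column."""
    n_l, n_d = len(scores), len(scores[0])
    col_norm = [math.sqrt(sum(scores[k][j] ** 2 for k in range(n_l))) for j in range(n_d)]
    return [[sum(sim_lnc[i][k] * scores[k][j] for k in range(n_l)) / col_norm[j]
             if col_norm[j] else 0.0
             for j in range(n_d)] for i in range(n_l)]


def project_disease(sim_dis, scores):
    """Disease-side projection (diseases x lncRNAs), normalized per lncRNA row."""
    n_l, n_d = len(scores), len(scores[0])
    row_norm = [math.sqrt(sum(v * v for v in row)) for row in scores]
    return [[sum(sim_dis[j][k] * scores[i][k] for k in range(n_d)) / row_norm[i]
             if row_norm[i] else 0.0
             for i in range(n_l)] for j in range(n_d)]


def fuse(proj_lnc, proj_dis):
    """Average the lncRNA-side projection with the transposed disease-side projection."""
    proj_dis_t = [list(col) for col in zip(*proj_dis)]
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(proj_lnc, proj_dis_t)]


scores = [[3.0, 0.0], [4.0, 0.0]]      # toy composite scores: 2 lncRNAs x 2 diseases
identity = [[1.0, 0.0], [0.0, 1.0]]    # toy similarity matrices
final = fuse(project_lncrna(identity, scores), project_disease(identity, scores))
```

An all-zero score vector is mapped to zero rather than dividing by zero; this guard is a design choice of the sketch.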
Fourthly, evaluation test: the performance of the prediction method described above (hereinafter referred to as "LRLSNP") was evaluated using leave-one-out cross-validation (LOOCV). Specifically, each known lncRNA-disease association was used in turn as the test sample while the remaining associations were used as training samples, until every known association had been tested once. The evaluation metrics are the ROC curve and the AUC value. The ROC curve (receiver operating characteristic curve) is a comprehensive indicator reflecting sensitivity and specificity. The AUC is the area under the ROC curve; the more the ROC curve bulges toward the upper left corner, the larger the AUC value and the better the prediction performance.
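A minimal sketch of the two evaluation building blocks, under stated assumptions: the AUC is computed through its pairwise-ranking interpretation (the fraction of positive/negative pairs in which the positive scores higher, with ties counted as half), and the LOOCV folds are produced by hiding one known association at a time. The function names are hypothetical, and how the held-out score is ranked against the candidate pairs is left to the caller.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def leave_one_out_folds(assoc):
    """Yield (row, column, training matrix) with one known association hidden."""
    for i, row in enumerate(assoc):
        for j, value in enumerate(row):
            if value == 1:
                train = [r[:] for r in assoc]
                train[i][j] = 0  # hide the test association
                yield i, j, train


example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
folds = list(leave_one_out_folds([[1, 0], [0, 1]]))
```

In the toy call, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.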
It should be noted that this embodiment performs leave-one-out cross-validation (LOOCV) experiments on data set 1 and data set 2 separately. The prediction method contains two balance parameters, one for the lncRNA side and one for the disease side; for simplicity, the two are set to the same value. To find the optimal setting, the balance parameters are increased step by step over a range of values and the AUC value is calculated at each step. The results of the LOOCV experiments on the two data sets are shown in FIG. 1, where the dataset1 curve shows the change in AUC on data set 1 and the dataset2 curve shows the change in AUC on data set 2; the trends on the two data sets are nearly identical. As can be seen from FIG. 1, over the first interval of the sweep the AUC values remain almost unchanged; over the second interval they decrease slightly; over the third interval they decrease significantly; and over the last interval they change only slightly. The balance parameters for the two data sets can therefore both be set to the same value.
4.1 Performance comparison with other methods: two prior-art methods, IIRWR and LDAI-ISPS, were selected for comparison with LRLSNP. LOOCV was run on both data sets to evaluate the prediction performance of IIRWR, LDAI-ISPS and LRLSNP, each configured with its optimal parameters. Figures 2 and 3 show the ROC curves and AUC values of the three methods in the LOOCV experiments on data set 1 and data set 2, respectively. On data set 1, the AUC of LRLSNP was 0.9446, while the AUCs of IIRWR and LDAI-ISPS were 0.7883 and 0.9154, respectively; on data set 2, the AUC of LRLSNP was 0.9386, while the AUCs of IIRWR and LDAI-ISPS were 0.8230 and 0.8341, respectively. LRLSNP therefore showed the best prediction performance of the three.
4.2 Isolated disease and new lncRNA prediction: an isolated disease is a disease for which no association with any lncRNA is known. To simulate an isolated disease, all known associations between the queried disease and the lncRNAs were removed. In the cross-validation on data set 1 and data set 2, one disease at a time was treated as an isolated disease, and LRLSNP was then run on the remaining known information, so that each disease was predicted once as a test sample. The predictions were evaluated using ROC curves and AUC values; the results are shown in FIG. 4, where the AUC values on data set 1 and data set 2 are 0.8688 and 0.8865, respectively.
In recent years, more and more new lncRNAs have been discovered whose relationship to diseases is mostly unknown, which poses a great challenge to prediction algorithms, and many existing prediction methods do not handle this case well. To verify the effectiveness of LRLSNP in predicting associations between new lncRNAs and diseases, the associations between the lncRNA to be predicted and all diseases were likewise removed in data set 1 and data set 2, and LRLSNP was then run to make the prediction. The results are shown in FIG. 4: for new lncRNA prediction, the AUC values on data set 1 and data set 2 reach 0.8335 and 0.8078, respectively. This shows that LRLSNP generalizes to lncRNAs without any known association, and therefore also performs well in predicting associations between new lncRNAs and diseases.
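The two masking operations described above can be sketched as follows, assuming the association matrix has one row per lncRNA and one column per disease. The function names are hypothetical; the masked matrix would be passed to the prediction pipeline in place of the original.

```python
def mask_disease(assoc, disease_index):
    """Simulate an isolated disease: remove all its known lncRNA associations."""
    return [[0 if j == disease_index else v for j, v in enumerate(row)] for row in assoc]


def mask_lncrna(assoc, lncrna_index):
    """Simulate a new lncRNA: remove all its known disease associations."""
    return [[0] * len(row) if i == lncrna_index else row[:] for i, row in enumerate(assoc)]


A = [[1, 0], [0, 1], [1, 1]]       # toy association matrix
isolated = mask_disease(A, 0)      # the first disease becomes isolated
new_lnc = mask_lncrna(A, 2)        # the third lncRNA becomes new
```

Both functions return a copy, so the original matrix stays available for scoring the hidden associations afterwards.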
4.3 Case analysis: to further evaluate how well LRLSNP predicts potential lncRNA-disease associations, two diseases, ovarian cancer and cervical cancer, were selected for case analysis, using data set 2 as the experimental sample.
Using the known data, an experiment was performed for ovarian cancer with LRLSNP. The top 5 ovarian cancer-associated lncRNAs predicted by LRLSNP are shown in Table 1 below. Supporting evidence for 4 of them can be found in the LncRNADisease database; only HOST2 is not confirmed by this database, but supporting evidence appears in the literature: "Sinomenine hydrochloride exerts antitumor outcome in ovarian cancer cells by inhibition of long non-coding RNA HOST2 expression" (Artificial Cells, Nanomedicine, and Biotechnology, 2019, 47: 4131-4138). For cervical cancer, the top 5 cervical cancer-associated lncRNAs predicted by LRLSNP are also shown in Table 1 below, and supporting evidence for all 5 can be found in the LncRNADisease database.
To evaluate the prediction performance of LRLSNP for isolated diseases, all known lncRNA associations of the queried disease were deleted, which ensures that the prediction uses only the lncRNA associations of other diseases and the similarity between the queried disease and those diseases. For ovarian cancer, the top 5 candidate lncRNAs predicted by LRLSNP after deleting all known associations between ovarian cancer and lncRNAs are shown in Table 2 below, and supporting evidence for all 5 can be found in the LncRNADisease database. For cervical cancer, the top 5 candidate lncRNAs predicted by LRLSNP after deleting all known associations between cervical cancer and lncRNAs are also shown in Table 2 below, and supporting evidence for all 5 can be found in the LncRNADisease database.
TABLE 1
Disease | lncRNA name | Evidence | Rank
ovarian cancer | HOTAIR | LncRNADisease | 1
ovarian cancer | MALAT1 | LncRNADisease | 2
ovarian cancer | MEG3 | LncRNADisease | 3
ovarian cancer | HOST2 | Literature | 4
ovarian cancer | CDKN2B- | LncRNADisease | 5
cervical cancer | MEG3 | LncRNADisease | 1
cervical cancer | PVT1 | LncRNADisease | 2
cervical cancer | CDKN2B-AS1 | LncRNADisease | 3
cervical cancer | LSINCT5 | LncRNADisease | 4
cervical cancer | GAS5 | LncRNADisease | 5
TABLE 2
Disease | lncRNA name | Evidence | Rank
ovarian cancer | H19 | LncRNADisease | 1
ovarian cancer | DNM3OS | LncRNADisease | 2
ovarian cancer | CDKN2B-AS1 | LncRNADisease | 3
ovarian cancer | MALAT1 | LncRNADisease | 4
ovarian cancer | HOTAIR | LncRNADisease | 5
cervical cancer | H19 | LncRNADisease | 1
cervical cancer | TUSC8 | LncRNADisease | 2
cervical cancer | CDKN2B-AS1 | LncRNADisease | 3
cervical cancer | MALAT1 | LncRNADisease | 4
cervical cancer | HOTAIR | LncRNADisease | 5
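Ranked candidate lists like those in the two tables can be read off a final score matrix with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, and whether associations already present in the training data are excluded from the ranking is not stated in the text, so `skip_known` is an assumption exposed as an option.

```python
def top_candidates(scores, assoc, lncrna_names, disease_index, k=5, skip_known=True):
    """Return the k highest-scoring lncRNAs for one disease as (name, score) pairs."""
    candidates = [
        i for i in range(len(lncrna_names))
        if not (skip_known and assoc[i][disease_index] == 1)
    ]
    candidates.sort(key=lambda i: scores[i][disease_index], reverse=True)
    return [(lncrna_names[i], scores[i][disease_index]) for i in candidates[:k]]


# Placeholder names and toy scores for a single disease column.
names = ["lnc1", "lnc2", "lnc3"]
scores = [[0.9], [0.2], [0.7]]
known = [[1], [0], [0]]
novel_only = top_candidates(scores, known, names, 0, k=2)
everything = top_candidates(scores, known, names, 0, k=2, skip_known=False)
```

In the isolated-disease setting, every association of the queried disease has been removed beforehand, so the two options give the same ranking.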
In conclusion, LRLSNP not only performs well in predicting unknown lncRNA-disease interactions, but can also effectively make predictions for isolated diseases and new lncRNAs. In the comparison with two relatively advanced prior-art prediction methods (IIRWR and LDAI-ISPS), the AUC values of LRLSNP, IIRWR and LDAI-ISPS on data set 1 are 0.9446, 0.7883 and 0.9154, respectively; on data set 2, they are 0.9386, 0.8230 and 0.8341, respectively. LRLSNP outperforms the other methods on both data sets. In addition, to evaluate LRLSNP on isolated diseases and new lncRNAs, cross-validation was performed with each disease (or lncRNA) in turn treated as an isolated disease (or new lncRNA). The resulting AUC values are 0.8688 and 0.8335 on data set 1 and 0.8865 and 0.8078 on data set 2, which shows that LRLSNP predicts well for isolated diseases and new lncRNAs and generalizes well. Overall, LRLSNP is simple to implement, can be used for isolated diseases and new lncRNAs, is highly interpretable, has only one parameter, needs few resources to make predictions, and can serve as a powerful auxiliary tool for biological experiments.
Based on the LRLSNP prediction method, the last embodiment provides an lncRNA-disease association prediction system based on Laplacian regularized least squares and network projection. The prediction system predicts the association between lncRNA and disease according to the LRLSNP prediction method; specifically, the prediction system includes at least:
the data preparation unit is used for constructing a comprehensive disease similarity matrix according to the disease semantic similarity and the disease Gaussian nuclear spectrum similarity; constructing a comprehensive lncRNA similarity matrix according to the lncRNA functional similarity and lncRNA Gaussian nuclear spectrum similarity;
the lncRNA and disease association score estimation unit is used for implementing a Laplace regularization least square method in the comprehensive disease similarity matrix and the comprehensive lncRNA similarity matrix constructed by the data preparation unit to construct an lncRNA and disease association composite estimation score matrix;
and the lncRNA and disease association score refining unit is used for projecting the comprehensive disease similarity matrix and the comprehensive lncRNA similarity matrix constructed by the data preparation unit on the lncRNA and disease association composite estimation score matrix constructed by the lncRNA and disease association score estimation unit respectively, and fusing the two projection scores to obtain a disease association lncRNA prediction result.
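The three units above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the text here does not reproduce the formulas, so the similarity fusion rule (`integrate`), the closed form used in `laprls`, the equal-weight composite, and the norm-based scaling in `project` are all assumptions based on commonly used forms of these techniques. All function names, the default `eta`, and the orientation of `A` (lncRNAs as rows, diseases as columns) are hypothetical.

```python
import numpy as np

def gaussian_profile_kernel(profiles, gamma_prime=1.0):
    """Gaussian kernel over association profiles (one row per entity);
    this plays the role of the Gaussian nuclear spectrum similarity.
    The bandwidth is normalized by the mean squared profile norm."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = gamma_prime / mean_sq if mean_sq > 0 else gamma_prime
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def integrate(primary, kernel):
    """ASSUMED fusion rule: average where the primary (semantic or
    functional) similarity is nonzero, otherwise fall back to the kernel."""
    primary = np.asarray(primary, dtype=float)
    return np.where(primary != 0, (primary + kernel) / 2.0, kernel)

def laprls(S, Y, eta):
    """ASSUMED Laplacian regularized least squares closed form:
    F = S (S + eta * L * S)^-1 Y, with L the normalized Laplacian of S."""
    deg = S.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(S)) - inv_sqrt[:, None] * S * inv_sqrt[None, :]
    # lstsq instead of an explicit inverse, in case the system is singular.
    X = np.linalg.lstsq(S + eta * (L @ S), Y, rcond=None)[0]
    return S @ X

def project(S, F):
    """ASSUMED network projection: similarity-weighted scores scaled by
    the column norms of the score matrix."""
    norms = np.linalg.norm(F, axis=0)
    norms[norms == 0] = 1.0
    return (S @ F) / norms

def predict(A, disease_sem, lnc_func, eta=0.5):
    A = np.asarray(A, dtype=float)
    # Data preparation unit: comprehensive similarity matrices.
    SD = integrate(disease_sem, gaussian_profile_kernel(A.T))
    SL = integrate(lnc_func, gaussian_profile_kernel(A))
    # Score estimation unit: composite estimate (equal weights assumed).
    F = (laprls(SL, A, eta) + laprls(SD, A.T, eta).T) / 2.0
    # Score refining unit: project both similarities and average.
    P_l = project(SL, F)      # lncRNAs x diseases
    P_d = project(SD, F.T)    # diseases x lncRNAs
    return (P_l + P_d.T) / 2.0
```

Calling `predict(A, disease_sem, lnc_func)` returns a score matrix with the same shape as `A`; ranking the unknown entries of one disease column by score gives candidate lncRNAs for that disease. Each stage is a separate function so that any of the assumed formulas can be swapped for the exact one without touching the rest of the pipeline.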
It should be noted that the disease-associated lncRNA prediction system can be packaged on a portable storage medium and run from it, or deployed in the cloud and run online; the prediction of disease-associated lncRNA may be executed by a computer capable of running the prediction system or by a server located in the cloud.
The above embodiments are preferred implementations of the present invention, and the present invention can be implemented in other ways without departing from the spirit of the present invention.
Finally, it should be emphasized that some of the descriptions of the present invention have been simplified to facilitate the understanding of the improvements of the present invention over the prior art by those of ordinary skill in the art, and that other elements have been omitted from this document for the sake of clarity, and those of ordinary skill in the art will recognize that such omitted elements may also constitute the subject matter of the present invention.
Claims (10)
1. An lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection, characterized by comprising the following steps:
step one, combining the disease Gaussian nuclear spectrum similarity with the disease semantic similarity to obtain a comprehensive disease similarity matrix; combining the lncRNA Gaussian nuclear spectrum similarity with the lncRNA functional similarity to obtain a comprehensive lncRNA similarity matrix;
step two, implementing the Laplacian regularized least squares method on the comprehensive disease similarity matrix to obtain a disease estimation score matrix, implementing the Laplacian regularized least squares method on the comprehensive lncRNA similarity matrix to obtain an lncRNA estimation score matrix, and integrating the disease estimation score matrix and the lncRNA estimation score matrix to obtain an lncRNA and disease association composite estimation score matrix;
and step three, projecting the comprehensive disease similarity matrix on the lncRNA and disease association composite estimation score matrix to obtain a projection score matrix, projecting the comprehensive lncRNA similarity matrix on the lncRNA and disease association composite estimation score matrix to obtain another projection score matrix, combining the transpose of the projection score matrix based on the comprehensive disease similarity matrix with the projection score matrix based on the comprehensive lncRNA similarity matrix, and calculating their average to obtain the final lncRNA and disease association prediction score, thereby obtaining a disease-associated lncRNA prediction result.
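2. The lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection as claimed in claim 1, wherein in step one, the disease Gaussian nuclear spectrum similarity is expressed as:
$$GD(d_i, d_j) = \exp\left(-\gamma_d \left\| A(:,i) - A(:,j) \right\|^2\right)$$
wherein $GD(d_i, d_j)$ is the Gaussian nuclear spectrum similarity between disease $d_i$ and disease $d_j$; $A(:,i)$ is the i-th column, that of disease $d_i$, in the known lncRNA-disease association matrix $A$, and $A(:,j)$ is the j-th column, that of disease $d_j$, in matrix $A$; the parameter $\gamma_d$ controls the bandwidth of the kernel and is calculated by the following formula:
$$\gamma_d = \gamma'_d \Big/ \left( \frac{1}{n_d} \sum_{i=1}^{n_d} \left\| A(:,i) \right\|^2 \right)$$
wherein $n_d$ is the number of diseases and $\gamma'_d$ is the unnormalized bandwidth coefficient.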
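3. The lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection as claimed in claim 2, wherein in step one, the lncRNA Gaussian nuclear spectrum similarity is expressed as:
$$GL(l_i, l_j) = \exp\left(-\gamma_l \left\| A^{T}(:,i) - A^{T}(:,j) \right\|^2\right)$$
wherein $GL(l_i, l_j)$ is the Gaussian nuclear spectrum similarity between lncRNA $l_i$ and lncRNA $l_j$; $A^{T}(:,i)$ is the i-th column, that of lncRNA $l_i$, in the matrix $A^{T}$ (the transpose of the known lncRNA-disease association matrix $A$), and $A^{T}(:,j)$ is the j-th column, that of lncRNA $l_j$, in matrix $A^{T}$; the parameter $\gamma_l$ controls the bandwidth of the kernel and is calculated by the following formula:
$$\gamma_l = \gamma'_l \Big/ \left( \frac{1}{n_l} \sum_{i=1}^{n_l} \left\| A^{T}(:,i) \right\|^2 \right)$$
wherein $n_l$ is the number of lncRNAs and $\gamma'_l$ is the unnormalized bandwidth coefficient.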
4. The lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection as claimed in claim 3, wherein, in step one,
the disease Gaussian nuclear spectrum similarity is combined with the disease semantic similarity to obtain the comprehensive disease similarity matrix as follows:
the lncRNA Gaussian nuclear spectrum similarity is combined with the lncRNA functional similarity to obtain the comprehensive lncRNA similarity matrix as follows:
5. The lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection as claimed in claim 4, wherein in step two, the Laplacian regularized least squares method is implemented on the comprehensive disease similarity matrix to obtain the disease estimation score matrix:
wherein the disease estimation score matrix is obtained by solving the optimization problem of the following formula:
6. The lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection as claimed in claim 5, wherein in step two, the Laplacian regularized least squares method is implemented on the comprehensive lncRNA similarity matrix to obtain the lncRNA estimation score matrix:
wherein the lncRNA estimation score matrix is obtained by solving the optimization problem of the following formula:
7. The lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection as claimed in claim 6, wherein in step two, the disease estimation score matrix and the lncRNA estimation score matrix are integrated as follows to obtain the lncRNA and disease association composite estimation score matrix:
8. The lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection as claimed in claim 7, characterized in that:
the comprehensive disease similarity matrix is projected on the lncRNA and disease association composite estimation score matrix to obtain the projection score matrix based on the comprehensive disease similarity matrix as follows:
the comprehensive lncRNA similarity matrix is projected on the lncRNA and disease association composite estimation score matrix to obtain the projection score matrix based on the comprehensive lncRNA similarity matrix as follows:
9. The lncRNA-disease association prediction method based on Laplacian regularized least squares and network projection as claimed in claim 8, wherein in step three, the transpose of the projection score matrix based on the comprehensive disease similarity matrix is combined with the projection score matrix based on the comprehensive lncRNA similarity matrix, and their average is calculated to obtain the final lncRNA and disease association prediction score as follows:
10. An lncRNA-disease association prediction system based on Laplacian regularized least squares and network projection, characterized by comprising:
the data preparation unit is used for constructing a comprehensive disease similarity matrix according to the disease semantic similarity and the disease Gaussian nuclear spectrum similarity; constructing a comprehensive lncRNA similarity matrix according to the lncRNA functional similarity and lncRNA Gaussian nuclear spectrum similarity;
the lncRNA and disease association score estimation unit is used for implementing a Laplace regularization least square method in the comprehensive disease similarity matrix and the comprehensive lncRNA similarity matrix constructed by the data preparation unit to construct an lncRNA and disease association composite estimation score matrix;
the lncRNA and disease association score refining unit is used for projecting the comprehensive disease similarity matrix and the comprehensive lncRNA similarity matrix constructed by the data preparation unit on the lncRNA and disease association composite estimation score matrix constructed by the lncRNA and disease association score estimation unit respectively, and fusing two projection scores to obtain a disease association lncRNA prediction result;
wherein the lncRNA-disease association prediction system based on Laplacian regularized least squares and network projection predicts the association between lncRNA and disease according to the prediction method of any one of claims 2 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110428827.2A | 2021-04-21 | 2021-04-21 | lncRNA-disease associated prediction method and system based on Laplace regularization least square and network projection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112992347A | 2021-06-18 |
Family
ID=76341492
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112992347A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113539372A (en) * | 2021-06-27 | 2021-10-22 | 中南林业科技大学 | Efficient prediction method for LncRNA and disease association relation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||