CN116052873B - Disease-metabolite association prediction system based on weight k-nearest neighbor - Google Patents

Info

Publication number
CN116052873B
Authority
CN
China
Prior art keywords
matrix
disease
similarity
metabolite
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310059889.XA
Other languages
Chinese (zh)
Other versions
CN116052873A (en
Inventor
王波
王鑫炜
刘明
杜晓昕
李敬有
廉佐政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qiqihar University
Original Assignee
Qiqihar University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qiqihar University
Priority to CN202310059889.XA
Publication of CN116052873A
Application granted
Publication of CN116052873B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A disease-metabolite association prediction system based on weight k-nearest neighbor, in the technical field of bioinformatics. The invention addresses the low prediction efficiency of existing methods for determining the relationship between metabolites and diseases. The system operates as follows: it obtains Jaccard similarity matrices, adaptive spectral clustering similarity matrices and Cosine similarity matrices between diseases and between metabolites; it fuses the Jaccard similarity matrices with the adaptive spectral clustering similarity matrices to obtain a first disease similarity fusion matrix and a first metabolite similarity fusion matrix; it fuses the Cosine similarity matrices with the first fusion matrices to obtain a second disease similarity fusion matrix and a second metabolite similarity fusion matrix; it constructs a disease-metabolite first network; it constructs a final prediction score matrix; and it obtains from that matrix the prediction score of each disease-metabolite pair whose relationship is to be predicted. The invention is used to predict associations between diseases and metabolites.

Description

Disease-metabolite association prediction system based on weight k-nearest neighbor
Technical Field
The invention relates to the technical field of bioinformatics, in particular to a disease-metabolite association prediction system based on a weight k-nearest neighbor.
Background
Over the course of long-term evolution, organisms exchange matter and energy with their surroundings, absorbing some substances and excreting others; this process is known as metabolism. Metabolism is an essential life activity and plays a key role in the conversion of matter and energy. A growing number of biological and medical experiments show that the concentrations of certain metabolites in patients differ from those in healthy individuals. Deoxycholic acid, for example, is a secondary bile acid produced by the liver that recirculates through the liver, bile duct, small intestine and portal vein, forming the enterohepatic circuit. At physiological pH, bile acids exist as anions and are strongly toxic in that form, so a carrier is required for transport across intestinal and hepatic tissue membranes. When its level is sufficiently high, deoxycholic acid can act as a hepatotoxin, a metabolic toxin and a tumor metabolite. A hepatotoxin damages the liver or hepatocytes; a tumor metabolite, when present at high levels over long periods, promotes tumor growth and survival. Besides liver disease, long-term high levels of deoxycholic acid are associated with a variety of cancers, such as colon cancer, breast cancer and many other cancers of the gastrointestinal tract. The pathogenesis of cardiovascular and cerebrovascular diseases and of some immune diseases has also been shown to be related to metabolites. Metabolite-based assessment is therefore an important part of medical diagnosis.
Existing methods for determining the relationship between metabolites and diseases rely mainly on biological experiments. Such experiments consume substantial human resources and a great deal of time, so existing methods are inefficient at predicting metabolite-disease relationships.
Disclosure of Invention
The invention aims to solve the problem that the existing method for acquiring the relation between the metabolite and the disease is low in prediction efficiency, and provides a disease-metabolite association prediction system based on weight k-nearest neighbor.
A disease-metabolite association prediction system based on weight k-nearest neighbor comprises: a disease-metabolite correlation adjacency matrix acquisition module, a Jaccard similarity acquisition module, an adaptive spectral clustering similarity acquisition module, a first similarity fusion matrix acquisition module, a Cosine similarity acquisition module, a second similarity fusion matrix acquisition module, a disease-metabolite first network construction module, a final prediction score matrix construction module and a correlation acquisition module;
the disease-metabolite correlation adjacency matrix acquisition module is used for constructing an original disease-metabolite association bipartite network from the known disease-metabolite associations and establishing a correlation adjacency matrix $Y_{DM}$ from that network;
the Jaccard similarity acquisition module is used for obtaining, from the correlation adjacency matrix $Y_{DM}$, a Jaccard similarity matrix DJ between diseases and a Jaccard similarity matrix MJ between metabolites;
the adaptive spectral clustering similarity acquisition module is used for obtaining, from the correlation adjacency matrix $Y_{DM}$, an adaptive spectral clustering similarity matrix DS between diseases and an adaptive spectral clustering similarity matrix MS between metabolites;
the first similarity fusion matrix acquisition module is used for fusing the Jaccard similarity matrix DJ between diseases with the adaptive spectral clustering similarity matrix DS between diseases to obtain a first disease similarity fusion matrix DJS, and fusing the Jaccard similarity matrix MJ between metabolites with the adaptive spectral clustering similarity matrix MS between metabolites to obtain a first metabolite similarity fusion matrix MJS;
the Cosine similarity acquisition module is used for obtaining, from the correlation adjacency matrix $Y_{DM}$, a Cosine similarity matrix DC between diseases and a Cosine similarity matrix MC between metabolites;
the second similarity fusion matrix acquisition module is used for fusing the Cosine similarity matrix DC between diseases with the first disease similarity fusion matrix DJS to obtain a second disease similarity fusion matrix DJSC, and fusing the Cosine similarity matrix MC between metabolites with the first metabolite similarity fusion matrix MJS to obtain a second metabolite similarity fusion matrix MJSC;
the disease-metabolite first network construction module is used for constructing a disease-metabolite first network $Y_{new}$ with a weighted k-nearest neighbor algorithm, using the original disease-metabolite association bipartite network, the second disease similarity fusion matrix DJSC and the second metabolite similarity fusion matrix MJSC;
the final prediction score matrix construction module is used for constructing a final prediction score matrix SNWKCP from the disease-metabolite first network, the second disease similarity fusion matrix DJSC and the second metabolite similarity fusion matrix MJSC;
the correlation acquisition module is used for looking up, in the final prediction score matrix SNWKCP, the prediction score of the disease and metabolite whose relationship is to be predicted, wherein a higher score indicates a stronger association between the disease and the metabolite;
the prediction score is in the range of 0 to 1.
A weight k-nearest neighbor based disease-metabolite association prediction storage medium for storing at least one instruction for implementing a weight k-nearest neighbor based disease-metabolite association prediction system.
The beneficial effects of the invention are as follows:
The invention uses the disease-metabolite adjacency matrix to compute Jaccard similarity, adaptive spectral clustering similarity and Cosine similarity for diseases and for metabolites, yielding disease-disease and metabolite-metabolite Jaccard, adaptive spectral clustering and Cosine similarity matrices. These similarity matrices are integrated into a second disease similarity fusion matrix DJSC and a second metabolite similarity fusion matrix MJSC. A weighted k-nearest neighbor algorithm then produces the disease-metabolite first network, and vector projection produces the final disease-metabolite association prediction score matrix, which reveals hidden associations between diseases and metabolites. Because the relationship between a metabolite and a disease is read from the final score matrix instead of being established by experiment, the invention avoids wasting human resources and time and improves prediction efficiency.
Drawings
FIG. 1 is a general flow chart for constructing a disease-metabolite association relationship;
FIG. 2 is a detailed process diagram of the construction of disease-metabolite associations;
FIG. 3 is a diagram of a matrix construction according to disease-metabolite associations;
FIG. 4 is a disease similarity matrix construction diagram calculated from a disease-metabolite correlation matrix;
FIG. 5 is a diagram of metabolite similarity matrix construction calculated from a disease-metabolite correlation matrix;
FIG. 6 is a ROC diagram of SNWKCP-DMA model under the 5-fold cross validation framework.
Detailed Description
The first embodiment is as follows: as shown in figs. 1-2, the disease-metabolite association prediction system based on weight k-nearest neighbor of the present embodiment comprises: a disease-metabolite correlation adjacency matrix acquisition module, a Jaccard similarity acquisition module, an adaptive spectral clustering similarity acquisition module, a first similarity fusion matrix acquisition module, a Cosine similarity acquisition module, a second similarity fusion matrix acquisition module, a disease-metabolite first network construction module, a final prediction score matrix construction module and a correlation acquisition module;
the disease-metabolite correlation adjacency matrix acquisition module is used for constructing an original disease-metabolite association bipartite network from the known disease-metabolite associations and establishing a correlation adjacency matrix $Y_{DM}$ from that network;
the Jaccard similarity acquisition module is used for obtaining, from the correlation adjacency matrix $Y_{DM}$, a Jaccard similarity matrix between diseases and a Jaccard similarity matrix between metabolites;
the adaptive spectral clustering similarity acquisition module is used for obtaining, from the correlation adjacency matrix $Y_{DM}$, an adaptive spectral clustering similarity matrix between diseases and an adaptive spectral clustering similarity matrix between metabolites;
the first similarity fusion matrix acquisition module is used for fusing the Jaccard similarity matrix between diseases with the adaptive spectral clustering similarity matrix between diseases to obtain a first disease similarity fusion matrix, and fusing the Jaccard similarity matrix between metabolites with the adaptive spectral clustering similarity matrix between metabolites to obtain a first metabolite similarity fusion matrix;
the Cosine similarity acquisition module is used for obtaining, from the correlation adjacency matrix $Y_{DM}$, a Cosine similarity matrix between diseases and a Cosine similarity matrix between metabolites;
the second similarity fusion matrix acquisition module is used for fusing the Cosine similarity matrix between diseases with the first disease similarity fusion matrix to obtain a second disease similarity fusion matrix, and fusing the Cosine similarity matrix between metabolites with the first metabolite similarity fusion matrix to obtain a second metabolite similarity fusion matrix;
the disease-metabolite first network construction module is used for constructing a disease-metabolite first network with a weighted k-nearest neighbor algorithm, using the original disease-metabolite association bipartite network, the second disease similarity fusion matrix and the second metabolite similarity fusion matrix;
the final prediction score matrix construction module is used for constructing a final prediction score matrix from the disease-metabolite first network, the second disease similarity fusion matrix and the second metabolite similarity fusion matrix;
the correlation acquisition module is used for looking up, in the final prediction score matrix SNWKCP, the prediction score of the disease and metabolite whose relationship is to be predicted, wherein a higher score indicates a stronger association between the disease and the metabolite;
the prediction score is in the range of 0 to 1.
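The score lookup described for the last module can be illustrated with a minimal Python sketch. The function name `top_candidates`, the toy score values and the choice to skip pairs that are already known associations are assumptions made for illustration; the source states only that a higher score means a stronger association.

```python
def top_candidates(scores, known, disease_index, metabolite_names, top_n=15):
    """Rank metabolites for one disease by predicted score, highest first.

    Assumption: pairs already marked 1 in `known` are skipped so that only
    new candidate associations are returned.
    """
    row = scores[disease_index]
    candidates = [
        (metabolite_names[j], row[j])
        for j in range(len(row))
        if known[disease_index][j] == 0
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy values for illustration only (not results from the patent).
scores = [[0.9, 0.2, 0.7, 0.4]]
known = [[1, 0, 0, 0]]
names = ["m1", "m2", "m3", "m4"]
ranked = top_candidates(scores, known, 0, names, top_n=2)
print(ranked)
```

With `top_n=15` the same call would produce a ranked list of the kind reported for individual diseases in the examples.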
The second embodiment is as follows: the disease-metabolite correlation adjacency matrix acquisition module constructs a bipartite network from the known disease-metabolite associations and uses it to establish the correlation adjacency matrix $Y_{DM}$ by the following formula:
$$Y_{DM} = \{Y(i,j)\}_{r \times n}$$
where $r$ is the number of diseases, $n$ is the number of metabolites, and $Y(i,j)$ is the entry of the original disease-metabolite association bipartite network for disease $i$ and metabolite $j$; see fig. 3.
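As a minimal sketch of this construction, the following Python code builds an r x n matrix from a list of known pairs. It assumes that an entry is 1 for a known association and 0 otherwise; the function name `build_adjacency` and the identifiers `d1`..`d3` and `m1`..`m4` are placeholders, not data from the patent.

```python
def build_adjacency(diseases, metabolites, known_pairs):
    """Return Y_DM as a list of r rows, each with n entries (0 or 1)."""
    d_index = {d: i for i, d in enumerate(diseases)}
    m_index = {m: j for j, m in enumerate(metabolites)}
    y = [[0] * len(metabolites) for _ in diseases]
    for disease, metabolite in known_pairs:
        # Assumption: 1 marks a known association, 0 an unknown one.
        y[d_index[disease]][m_index[metabolite]] = 1
    return y


# Placeholder identifiers, for illustration only.
diseases = ["d1", "d2", "d3"]
metabolites = ["m1", "m2", "m3", "m4"]
known_pairs = [("d1", "m1"), ("d1", "m2"), ("d2", "m1"),
               ("d2", "m3"), ("d3", "m3"), ("d3", "m4")]
Y_DM = build_adjacency(diseases, metabolites, known_pairs)
for row in Y_DM:
    print(row)
```

Each row of the result is the association profile of one disease and each column is the profile of one metabolite.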
The third embodiment is as follows: the Jaccard similarity acquisition module uses the correlation adjacency matrix $Y_{DM}$ to obtain the Jaccard similarity between diseases and the Jaccard similarity between metabolites, as follows:
as shown in FIG. 4, the Jaccard similarity between two diseases is computed from the two corresponding row vectors $Y(d_i)$ and $Y(d_{i'})$ of $Y_{DM}$. The number of metabolites associated with both diseases and the number of metabolites associated with at least one of them are counted, and their ratio gives the similarity between disease $d_i$ and disease $d_{i'}$;
the element $DJ(d_i, d_{i'})$ of the disease-disease Jaccard similarity matrix DJ is given by:
$$DJ(d_i, d_{i'}) = \frac{|Y(d_i) \cap Y(d_{i'})|}{|Y(d_i) \cup Y(d_{i'})|}$$
where $d_i$ and $d_{i'}$ are two different diseases, and $Y(d_i)$ and $Y(d_{i'})$ denote the sets of metabolites associated with disease $d_i$ and disease $d_{i'}$ respectively.
As shown in FIG. 5, the Jaccard similarity between two metabolites is computed from the two corresponding column vectors $Y(m_{j'})$ and $Y(m_j)$ of $Y_{DM}$. The number of diseases associated with both metabolites and the number of diseases associated with at least one of them are counted, and their ratio gives the similarity between metabolite $m_{j'}$ and metabolite $m_j$;
the element $MJ(m_j, m_{j'})$ of the metabolite-metabolite Jaccard similarity matrix MJ is given by:
$$MJ(m_j, m_{j'}) = \frac{|Y(m_j) \cap Y(m_{j'})|}{|Y(m_j) \cup Y(m_{j'})|}$$
where $m_{j'}$ and $m_j$ are two different metabolites, and $Y(m_{j'})$ and $Y(m_j)$ denote the sets of diseases associated with metabolite $m_{j'}$ and metabolite $m_j$ respectively.
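The Jaccard computation on rows (diseases) and columns (metabolites) can be sketched in Python as below. The helper names `jaccard`, `pairwise` and `transpose` are illustrative, the toy matrix is not patent data, and returning 0 when both profiles are empty is an assumption for the undefined 0/0 case.

```python
def jaccard(u, v):
    """Jaccard similarity of two binary association profiles."""
    both = sum(1 for a, b in zip(u, v) if a and b)      # size of intersection
    either = sum(1 for a, b in zip(u, v) if a or b)     # size of union
    # Assumption: two empty profiles get similarity 0 (0/0 is undefined).
    return both / either if either else 0.0


def pairwise(vectors, sim):
    """Apply a similarity function to every pair of vectors."""
    return [[sim(a, b) for b in vectors] for a in vectors]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


# Toy adjacency matrix: 3 diseases (rows) x 4 metabolites (columns).
Y_DM = [[1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 1, 1]]
DJ = pairwise(Y_DM, jaccard)             # disease-disease, 3 x 3
MJ = pairwise(transpose(Y_DM), jaccard)  # metabolite-metabolite, 4 x 4
print(DJ[0][1], MJ[0][1])
```

In the toy matrix the first two diseases share one metabolite out of three distinct ones, so their similarity is 1/3, while the first and third diseases share none and get 0.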
The fourth embodiment is as follows: the adaptive spectral clustering similarity acquisition module uses the correlation adjacency matrix $Y_{DM}$ to obtain the adaptive spectral clustering similarity between diseases and the adaptive spectral clustering similarity between metabolites, as follows:
as shown in fig. 4, the adaptive spectral clustering similarity between two diseases is computed from the two corresponding row vectors $Y(d_i)$ and $Y(d_{i'})$ of $Y_{DM}$. First the Euclidean distances between all pairs of vectors (the fully connected graph) are calculated; then, for each point, the scale parameter is taken from its K-th nearest neighbor in Euclidean distance, found with KNN; finally the similarity matrix is constructed;
the element $DS(d_i, d_{i'})$ of the disease-disease adaptive spectral clustering similarity matrix DS is computed from the distance between $Y(d_i)$ and $Y(d_{i'})$ and the scale parameters $\delta_i$ and $\delta_{i'}$, where
$$\delta_x = \|Y(d_x) - Y(d_{xK})\|$$
where $Y(d_{xK})$ is the K-th nearest neighbor of the sample point $Y(d_x)$, $Y(d_i)$ and $Y(d_{i'})$ are the $i$-th and $i'$-th row vectors of $Y_{DM}$, $x$ is $i$ or $i'$, $K$ is a constant greater than 0, $\delta_x$ is an intermediate variable, and $i$ and $i'$ are row indices of $Y_{DM}$.
As shown in fig. 5, the adaptive spectral clustering similarity between two metabolites is computed in the same way from the two corresponding column vectors $Y(m_{j'})$ and $Y(m_j)$ of $Y_{DM}$: the Euclidean distances between all pairs of vectors are calculated, the scale parameter of each point is taken from its K-th nearest neighbor found with KNN, and the similarity matrix is constructed;
the element $MS(m_j, m_{j'})$ of the metabolite-metabolite adaptive spectral clustering similarity matrix MS is computed from the distance between $Y(m_j)$ and $Y(m_{j'})$ and the scale parameters $\delta_j$ and $\delta_{j'}$, where
$$\delta_{x'} = \|Y(m_{x'}) - Y(m_{x'K})\|$$
where $Y(m_{x'K})$ is the K-th nearest neighbor of the sample point $Y(m_{x'})$, $Y(m_j)$ and $Y(m_{j'})$ are the $j$-th and $j'$-th column vectors of $Y_{DM}$, $x'$ is $j$ or $j'$, and $j$ and $j'$ are column indices of $Y_{DM}$.
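The following Python sketch shows one way such a locally scaled similarity could be computed. The kernel exp(-distance^2 / (delta_x * delta_y)) is the self-tuning form commonly used in adaptive spectral clustering and is an assumption here, since the similarity formula itself does not appear in this text; only the K-th-neighbor distance delta comes from the source. The function names and the toy matrix are illustrative.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def self_tuning_similarity(vectors, k=1):
    """Similarity with a per-point scale taken from the k-th nearest neighbor.

    Assumption: similarity = exp(-dist^2 / (delta_x * delta_y)), k >= 1.
    """
    n = len(vectors)
    dist = [[euclidean(a, b) for b in vectors] for a in vectors]
    delta = []
    for x in range(n):
        others = sorted(dist[x][y] for y in range(n) if y != x)
        # Distance to the k-th nearest neighbor (clamped if fewer points exist).
        delta.append(others[min(k, len(others)) - 1] if others else 0.0)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scale = delta[i] * delta[j]
            if scale > 0:
                sim[i][j] = math.exp(-dist[i][j] ** 2 / scale)
            else:
                # Degenerate scale: only identical vectors count as similar.
                sim[i][j] = 1.0 if dist[i][j] == 0 else 0.0
    return sim


# Toy adjacency matrix: 3 diseases (rows) x 4 metabolites (columns).
Y_DM = [[1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 1, 1]]
DS = self_tuning_similarity(Y_DM, k=1)
MS = self_tuning_similarity([list(col) for col in zip(*Y_DM)], k=1)
print(DS[0][1], DS[0][2])
```

Unlike the Jaccard similarity, this measure stays positive for profiles that share no associations, which is what lets it fill in zero entries during fusion.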
The fifth embodiment is as follows: the first similarity fusion matrix acquisition module fuses the Jaccard similarity matrix between diseases with the adaptive spectral clustering similarity matrix between diseases to obtain the first disease similarity fusion matrix, and fuses the Jaccard similarity matrix between metabolites with the adaptive spectral clustering similarity matrix between metabolites to obtain the first metabolite similarity fusion matrix, as follows:
if the Jaccard similarity $DJ(d_i, d_{i'})$ obtained from the disease-metabolite association matrix $Y_{DM}$ is 0, the entry is filled directly with the adaptive spectral clustering similarity $DS(d_i, d_{i'})$ obtained from $Y_{DM}$; otherwise the two values are averaged to give the new value.
The element $DJS(d_i, d_{i'})$ of the first disease similarity fusion matrix DJS is given by:
$$DJS(d_i, d_{i'}) = \begin{cases} DS(d_i, d_{i'}), & DJ(d_i, d_{i'}) = 0 \\ \dfrac{DJ(d_i, d_{i'}) + DS(d_i, d_{i'})}{2}, & \text{otherwise} \end{cases}$$
where $DJ(d_i, d_{i'})$ is the Jaccard similarity obtained from the disease-metabolite association matrix $Y_{DM}$ and $DS(d_i, d_{i'})$ is the adaptive spectral clustering similarity obtained from $Y_{DM}$.
If the Jaccard similarity $MJ(m_j, m_{j'})$ obtained from the disease-metabolite association matrix $Y_{DM}$ is 0, the entry is filled directly with the adaptive spectral clustering similarity $MS(m_j, m_{j'})$ obtained from $Y_{DM}$; otherwise the two values are averaged to give the new value.
The element $MJS(m_j, m_{j'})$ of the first metabolite similarity fusion matrix MJS is given by:
$$MJS(m_j, m_{j'}) = \begin{cases} MS(m_j, m_{j'}), & MJ(m_j, m_{j'}) = 0 \\ \dfrac{MJ(m_j, m_{j'}) + MS(m_j, m_{j'})}{2}, & \text{otherwise} \end{cases}$$
where $MJ(m_j, m_{j'})$ is the Jaccard similarity obtained from the disease-metabolite association matrix $Y_{DM}$ and $MS(m_j, m_{j'})$ is the adaptive spectral clustering similarity obtained from $Y_{DM}$.
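The fill-or-average rule described in this embodiment can be sketched element by element in Python. The function name `fuse` and the toy similarity values are illustrative and are not taken from the patent.

```python
def fuse(primary, fallback):
    """Fuse two similarity matrices of equal shape.

    Where the primary similarity is 0, take the fallback value;
    otherwise take the average of the two values.
    """
    return [
        [f if p == 0 else (p + f) / 2 for p, f in zip(p_row, f_row)]
        for p_row, f_row in zip(primary, fallback)
    ]


# Toy similarity values for three diseases, for illustration only.
DJ = [[1.0, 0.5, 0.0],
      [0.5, 1.0, 0.25],
      [0.0, 0.25, 1.0]]
DS = [[1.0, 0.3, 0.2],
      [0.3, 1.0, 0.75],
      [0.2, 0.75, 1.0]]
DJS = fuse(DJ, DS)
for row in DJS:
    print(row)
```

The pair with Jaccard similarity 0 inherits the spectral clustering value 0.2, so the fused matrix has no zero entries left for these diseases. The same function applies unchanged to the metabolite matrices MJ and MS.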
The sixth embodiment is as follows: the Cosine similarity acquisition module uses the correlation adjacency matrix $Y_{DM}$ to obtain the Cosine similarity matrix between diseases and the Cosine similarity matrix between metabolites, as follows:
as shown in FIG. 4, the Cosine similarity between two diseases is computed from the two corresponding row vectors $Y(d_i)$ and $Y(d_{i'})$ of $Y_{DM}$. The cosine of the angle between the two vectors is used to represent their similarity: the smaller the angle, the closer the cosine is to 1, the more closely the directions of the two vectors agree, and the more similar they are.
The element $DC(d_i, d_{i'})$ of the disease-disease Cosine similarity matrix DC is given by:
$$DC(d_i, d_{i'}) = \frac{Y(d_i) \cdot Y(d_{i'})}{\|Y(d_i)\| \, \|Y(d_{i'})\|}$$
As shown in FIG. 5, the Cosine similarity between two metabolites is computed in the same way from the two corresponding column vectors $Y(m_{j'})$ and $Y(m_j)$ of $Y_{DM}$: the cosine of the angle between the two vectors represents their similarity, and the smaller the angle, the closer the cosine is to 1 and the more similar the two vectors are.
The element $MC(m_j, m_{j'})$ of the metabolite-metabolite Cosine similarity matrix MC is given by:
$$MC(m_j, m_{j'}) = \frac{Y(m_j) \cdot Y(m_{j'})}{\|Y(m_j)\| \, \|Y(m_{j'})\|}$$
where $m_{j'}$ and $m_j$ are two different metabolites, and $Y(m_{j'})$ and $Y(m_j)$ are the corresponding column vectors of $Y_{DM}$.
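A minimal Python sketch of the cosine computation on rows and columns follows. The helper names and the toy matrix are illustrative, and returning 0 for an all-zero vector is an assumption, since the cosine is undefined in that case.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two association profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Assumption: an all-zero profile has similarity 0 to everything.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pairwise(vectors, sim):
    return [[sim(a, b) for b in vectors] for a in vectors]


# Toy adjacency matrix: 3 diseases (rows) x 4 metabolites (columns).
Y_DM = [[1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 0, 1, 1]]
DC = pairwise(Y_DM, cosine)                               # disease-disease
MC = pairwise([list(col) for col in zip(*Y_DM)], cosine)  # metabolite-metabolite
print(round(DC[0][1], 4), round(MC[0][1], 4))
```

For binary profiles the cosine is 0 exactly when the Jaccard similarity is 0, because both require an empty intersection, but the two measures weight partial overlap differently.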
The seventh embodiment is as follows: the second similarity fusion matrix acquisition module fuses the Cosine similarity matrix between diseases with the first disease similarity fusion matrix to obtain the second disease similarity fusion matrix, and fuses the Cosine similarity matrix between metabolites with the first metabolite similarity fusion matrix to obtain the second metabolite similarity fusion matrix, as follows:
if the Cosine similarity $DC(d_i, d_{i'})$ obtained from the disease-metabolite association matrix $Y_{DM}$ is 0, the entry is filled directly with the previous fused similarity $DJS(d_i, d_{i'})$. Otherwise, the new similarity value is obtained from the Cosine similarity $DC(d_i, d_{i'})$ together with the previous fused similarity $DJS(d_i, d_{i'})$.
This rule defines the element $DJSC(d_i, d_{i'})$ of the second disease similarity fusion matrix DJSC.
If the Cosine similarity $MC(m_j, m_{j'})$ obtained from the disease-metabolite association matrix $Y_{DM}$ is 0, the entry is filled directly with the previous fused similarity $MJS(m_j, m_{j'})$. Otherwise, the new similarity value is obtained from the Cosine similarity $MC(m_j, m_{j'})$ together with the previous fused similarity $MJS(m_j, m_{j'})$.
This rule defines the element $MJSC(m_j, m_{j'})$ of the second metabolite similarity fusion matrix MJSC.
Seventh embodiment: the disease-metabolite first network construction module constructs the disease-metabolite first network $Y_{new}$ with a weighted k-nearest neighbor algorithm, using the original disease-metabolite association bipartite network, the second disease similarity fusion matrix and the second metabolite similarity fusion matrix, as follows:
$$Y_{new} = \max(Y_{DM}, Y'_{new})$$
where $Y'_{new}$, $\xi_d$ and $\xi_m$ are intermediate variables whose defining formulas are not reproduced in this text.
Those formulas use the following quantities: the $i$-th row of the matrix $Y_{DM}$; the $j$-th column of the matrix $Y_{DM}$; $N(d_i)$, the nearest neighbors of disease $d_i$; and $N(m_j)$, the nearest neighbors of metabolite $m_j$.
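Because the formulas for the intermediate variables are not available in this text, the Python sketch below substitutes the widely published weighted K nearest known neighbors (WKNKN) form as a stand-in. Everything except the final `max` step is therefore an assumption: the decay factor `eta`, the normalization by the sum of neighbor similarities, and the averaging of the disease-side and metabolite-side estimates. All names and toy values are illustrative.

```python
def wknn_complete(y, sim_d, sim_m, k=2, eta=0.9):
    """Fill an association matrix from similar rows and columns (WKNKN-style)."""
    r, n = len(y), len(y[0])

    def neighbor_estimate(profiles, sim):
        estimates = []
        for q in range(len(profiles)):
            # k most similar other items, most similar first.
            neighbors = sorted(
                (x for x in range(len(profiles)) if x != q),
                key=lambda x: sim[q][x], reverse=True)[:k]
            norm = sum(sim[q][x] for x in neighbors)   # normalization term
            row = [0.0] * len(profiles[0])
            if norm > 0:
                for rank, x in enumerate(neighbors):
                    weight = (eta ** rank) * sim[q][x]  # decays with rank
                    for c, value in enumerate(profiles[x]):
                        row[c] += weight * value / norm
            estimates.append(row)
        return estimates

    y_d = neighbor_estimate(y, sim_d)                       # r x n
    columns = [list(col) for col in zip(*y)]
    y_m_t = neighbor_estimate(columns, sim_m)               # n x r
    # Keep known associations; elsewhere use the averaged estimate.
    return [[max(y[i][j], (y_d[i][j] + y_m_t[j][i]) / 2) for j in range(n)]
            for i in range(r)]


# Toy inputs, for illustration only.
Y_DM = [[1, 0],
        [0, 0]]
DJSC = [[1.0, 0.5], [0.5, 1.0]]
MJSC = [[1.0, 0.5], [0.5, 1.0]]
Y_new = wknn_complete(Y_DM, DJSC, MJSC, k=1)
print(Y_new)
```

The element-wise maximum guarantees that known associations are never weakened, while unknown pairs receive a value inferred from similar diseases and metabolites.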
Eighth embodiment: the final prediction score matrix construction module constructs the final prediction score matrix from the disease-metabolite first network, the second disease similarity fusion matrix and the second metabolite similarity fusion matrix, as follows:
the final prediction score matrix SNWKCP is calculated from both DJSC and MJSC, and each value $SNWKCP(d_i, m_{j'})$ lies in the range 0 to 1. $SNWKCP(d_i, m_{j'})$ is obtained from two component scores, $DSNWKCP(d_i, m_{j'})$ and $MSNWKCP(d_i, m_{j'})$; the formulas for these three quantities are not reproduced in this text.
Those formulas use the following quantities: the $d_i$-th row of DJSC; the $m_{j'}$-th column of $Y_{new}$; the $m_{j'}$-th column of MJSC; the $d_i$-th row of $Y_{new}$; and the lengths of the corresponding vectors. DSNWKCP is the score matrix calculated from the similarity between diseases, and MSNWKCP is the score matrix calculated from the similarity between metabolites.
The matrix SNWKCP is the final vector projection score matrix over the disease space and the metabolite space; each value in the matrix is the final score of one disease-metabolite pair. The final score is used to predict the disease-metabolite association: the higher the score, the stronger the association.
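One plausible reading of this vector projection step is sketched below in Python. It is an assumption, not the patent's formula: the disease-side score projects a row of DJSC onto a column of the completed network, the metabolite-side score projects a column of MJSC onto a row of that network, and the sum is divided by the two similarity-vector lengths, which keeps every score between 0 and 1 by the Cauchy-Schwarz inequality. All names and toy values are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def length(v):
    return math.sqrt(dot(v, v))


def projection_scores(y_new, djsc, mjsc):
    """Combine disease-side and metabolite-side projections (assumed form)."""
    r, n = len(y_new), len(y_new[0])
    y_cols = [list(col) for col in zip(*y_new)]
    m_cols = [list(col) for col in zip(*mjsc)]
    scores = [[0.0] * n for _ in range(r)]
    for i in range(r):
        for j in range(n):
            col_len = length(y_cols[j])
            row_len = length(y_new[i])
            # Projection of the disease similarity row onto the network column.
            d_part = dot(djsc[i], y_cols[j]) / col_len if col_len else 0.0
            # Projection of the metabolite similarity column onto the network row.
            m_part = dot(m_cols[j], y_new[i]) / row_len if row_len else 0.0
            denom = length(djsc[i]) + length(m_cols[j])
            scores[i][j] = (d_part + m_part) / denom if denom else 0.0
    return scores


# Toy inputs, for illustration only.
Y_new = [[1, 0.5], [0.5, 0.0]]
DJSC = [[1.0, 0.5], [0.5, 1.0]]
MJSC = [[1.0, 0.5], [0.5, 1.0]]
S = projection_scores(Y_new, DJSC, MJSC)
for row in S:
    print([round(v, 3) for v in row])
```

In this toy case the known pair scores close to 1 and the pair with no supporting evidence in its own row or column scores lowest, which matches the stated interpretation that a higher score indicates a stronger association.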
Detailed description nine: a weight k-nearest neighbor based disease-metabolite associated prediction storage medium for storing at least one instruction for implementing a weight k-nearest neighbor based disease-metabolite associated prediction system.
Examples: to verify the beneficial effects of the invention, the following tests were performed:
the performance of the invention was evaluated with 5-fold cross-validation (5-fold CV). The resulting ROC curve is shown in FIG. 6, and the AUC of the invention is compared with that of other models under 5-fold CV in Table 1.
Among the prediction results, the top 15 predicted metabolites for each of three diseases (obesity, colorectal cancer and lung cancer) were checked against other known datasets; the validation results are shown in Tables 2, 3 and 4.
On the same dataset, the AUC values obtained by the SNWKCP-DMA model and the other models under the 5-fold CV framework are shown in Table 1:
TABLE 1
Method        AUC
MCF           0.6156
WMAN          0.6181
PROFANCY      0.9027
MN-LMF        0.9659
SNWKCP-DMA    0.9819
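For readers unfamiliar with the AUC metric reported in Table 1, the following Python sketch computes it from labels and scores using the pairwise (rank-based) definition. It illustrates the metric only and is not the evaluation code behind the table; the function name `auc` and the toy labels and scores are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive pair is scored
    above a randomly chosen negative pair; ties count as one half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy held-out pairs: 1 = known association, 0 = no known association.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
result = auc(labels, scores)
print(result)
```

In a 5-fold setting, the known associations would be split into five parts, each part hidden in turn, and the scores of the hidden pairs compared against unknown pairs in this way.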
The top 15 metabolites associated with obesity are shown in Table 2:
TABLE 2
The top 15 metabolites associated with colorectal cancer are shown in Table 3:
TABLE 3
The top 15 metabolites associated with lung cancer are shown in Table 4:
TABLE 4
The invention uses the known disease-metabolite adjacency matrix to compute several similarities for diseases and for metabolites, namely Jaccard similarity, adaptive spectral clustering similarity and Cosine similarity, yielding disease-disease and metabolite-metabolite Jaccard, adaptive spectral clustering and Cosine similarity matrices. The similarity matrices are integrated into a new disease-disease similarity matrix DJSC and a new metabolite-metabolite similarity matrix MJSC. A weighted k-nearest neighbor algorithm then produces a new disease-metabolite association network, and vector projection produces the final prediction score matrix SNWKCP, which reveals unknown associations hidden in the data by drawing on several kinds of data relationships. Fusing multiple similarities and applying the weighted k-nearest neighbor algorithm enriches the data, and combining the two vector projections improves the results further. The experiments show that the method outperforms traditional methods for constructing association relationships, and the prediction results indicate that the method is reliable.

Claims (2)

1. A disease-metabolite association prediction system based on weight k-nearest neighbor, characterized in that the system comprises: a disease-metabolite correlation adjacency matrix acquisition module, a Jaccard similarity acquisition module, an adaptive spectral clustering similarity acquisition module, a first similarity fusion matrix acquisition module, a Cosine similarity acquisition module, a second similarity fusion matrix acquisition module, a disease-metabolite first network construction module, a final prediction score matrix construction module and a correlation acquisition module;
the disease-metabolite correlation adjacency matrix acquisition module: for constructing an original disease-metabolite association bipartite network according to known disease-metabolite association relationships, and establishing a correlation adjacency matrix Y_DM using the original disease-metabolite association bipartite network;
the correlation adjacency matrix Y_DM is established from the original disease-metabolite association bipartite network by the following formula:
Y_DM = {Y(i,j)}_{r×n}
wherein r represents the number of kinds of diseases, and n represents the number of kinds of metabolites;
the original disease-metabolite association bipartite network is given by the following formula:
wherein Y(i,j) is the entry of the original disease-metabolite association bipartite network for disease i and metabolite j;
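The formula for Y(i,j) is not reproduced in this text. As a minimal sketch, assuming the usual binary encoding (1 for a known disease-metabolite association, 0 otherwise), the adjacency matrix could be built as follows. The function name `build_adjacency` and the toy identifiers are hypothetical and not taken from the patent.

```python
import numpy as np

def build_adjacency(known_pairs, diseases, metabolites):
    """Return an r x n matrix Y_DM (rows: diseases, columns: metabolites)."""
    row = {d: i for i, d in enumerate(diseases)}
    col = {m: j for j, m in enumerate(metabolites)}
    Y = np.zeros((len(diseases), len(metabolites)))
    for d, m in known_pairs:
        Y[row[d], col[m]] = 1.0  # assumption: 1 marks a known association
    return Y

# Illustrative toy data only.
Y_DM = build_adjacency(
    [("d1", "m1"), ("d1", "m2"), ("d2", "m1")],
    diseases=["d1", "d2", "d3"],
    metabolites=["m1", "m2"],
)
```

Here r = 3 and n = 2, so `Y_DM` has one row per disease and one column per metabolite, matching the r×n shape stated in the claim.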
the Jaccard similarity acquisition module: for acquiring, according to the correlation adjacency matrix Y_DM, a Jaccard similarity matrix DJ between diseases and a Jaccard similarity matrix MJ between metabolites;
the element DJ(d_i, d_i') of the inter-disease Jaccard similarity matrix DJ and the element MJ(m_j, m_j') of the inter-metabolite Jaccard similarity matrix MJ are obtained by the following formula:
wherein d_i and d_i' represent two different diseases, m_j and m_j' represent two different metabolites, Y(d_i) and Y(d_i') are row vectors of Y_DM, and Y(m_j) and Y(m_j') are column vectors of Y_DM;
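The Jaccard formula itself is not reproduced in this text. The sketch below assumes the standard set-based definition applied to binary association profiles: the number of shared associations divided by the number of associations held by either item. The name `jaccard_similarity` and the convention of returning 0 when both profiles are empty are assumptions.

```python
import numpy as np

def jaccard_similarity(profiles):
    """Pairwise Jaccard similarity between the rows of a binary matrix."""
    B = (np.asarray(profiles) > 0).astype(float)
    inter = B @ B.T                      # size of each pairwise intersection
    counts = B.sum(axis=1)               # associations per row
    union = counts[:, None] + counts[None, :] - inter
    sim = np.zeros_like(inter)
    # Assumption: two empty profiles get similarity 0 instead of 0/0.
    np.divide(inter, union, out=sim, where=union > 0)
    return sim
```

Under these assumptions, `jaccard_similarity(Y_DM)` would play the role of DJ (rows are diseases) and `jaccard_similarity(Y_DM.T)` the role of MJ (rows of the transpose are metabolites).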
the adaptive spectral clustering similarity acquisition module: for acquiring, according to the correlation adjacency matrix Y_DM, an adaptive spectral clustering similarity matrix DS between diseases and an adaptive spectral clustering similarity matrix MS between metabolites;
the element DS(d_i, d_i') of the inter-disease adaptive spectral clustering similarity matrix DS and the element MS(m_j, m_j') of the inter-metabolite adaptive spectral clustering similarity matrix MS are obtained by the following formula:
δ_x = ||Y(d_x) − Y(d_xK)||
δ_x' = ||Y(m_x') − Y(m_x'K)||
wherein Y(d_xK) is the K-th neighbor point of Y(d_x), Y(m_x'K) is the K-th neighbor point of Y(m_x'), x takes i or i', x' takes j or j', δ_x and δ_x' are intermediate variables, i and i' are row vector indices of Y_DM, j and j' are column vector indices of Y_DM, and K is a constant greater than 0;
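Only the local scale terms δ_x and δ_x' survive in this text; the expressions for DS and MS themselves are not reproduced. The sketch below assumes the common self-tuning kernel, S(a, b) = exp(−||a − b||² / (δ_a δ_b)), which uses exactly this K-th-neighbor scale. The kernel form, the name `self_tuning_similarity`, and the handling of zero scales are assumptions, not the claimed formula.

```python
import numpy as np

def self_tuning_similarity(profiles, K=3):
    """Locally scaled Gaussian similarity between the rows of `profiles`."""
    X = np.asarray(profiles, dtype=float)
    # Pairwise Euclidean distances between association profiles.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # delta_x: distance from profile x to its K-th nearest neighbor.
    # Column 0 of each sorted row is the zero distance to itself.
    k = min(K, X.shape[0] - 1)
    delta = np.sort(dist, axis=1)[:, k]
    scale = np.outer(delta, delta)
    exponent = np.full_like(dist, np.inf)
    np.divide(dist ** 2, scale, out=exponent, where=scale > 0)
    sim = np.exp(-exponent)   # exp(-inf) = 0 where the local scale is zero
    sim[dist == 0] = 1.0      # identical profiles are maximally similar
    return sim
```

Applied to `Y_DM` this would give a candidate DS, and applied to `Y_DM.T` a candidate MS. The full pairwise difference array uses memory proportional to (number of items)² × (profile length), so a real implementation would likely compute distances in blocks.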
the first similarity fusion matrix acquisition module: for fusing the Jaccard similarity matrix DJ between diseases with the adaptive spectral clustering similarity matrix DS between diseases to obtain a first disease similarity fusion matrix DJS, and fusing the Jaccard similarity matrix MJ between metabolites with the adaptive spectral clustering similarity matrix MS between metabolites to obtain a first metabolite similarity fusion matrix MJS;
the element DJS(d_i, d_i') of the first disease similarity fusion matrix DJS and the element MJS(m_j, m_j') of the first metabolite similarity fusion matrix MJS are given by the following formula:
wherein d_i and d_i' represent two different diseases, m_j and m_j' represent two different metabolites, DJ(d_i, d_i') is an element of the Jaccard similarity matrix DJ between diseases, MJ(m_j, m_j') is an element of the Jaccard similarity matrix MJ between metabolites, DS(d_i, d_i') is an element of the adaptive spectral clustering similarity matrix DS between diseases, and MS(m_j, m_j') is an element of the adaptive spectral clustering similarity matrix MS between metabolites;
the Cosine similarity acquisition module: for acquiring, according to the correlation adjacency matrix Y_DM, a Cosine similarity matrix DC between diseases and a Cosine similarity matrix MC between metabolites;
the element DC(d_i, d_i') of the inter-disease Cosine similarity matrix DC and the element MC(m_j, m_j') of the inter-metabolite Cosine similarity matrix MC are given by the following formula:
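The Cosine formula is not reproduced in this text. A minimal sketch, assuming the standard definition (dot product of two association profiles divided by the product of their Euclidean lengths), is shown below. The name `cosine_similarity` and the convention of returning 0 for an all-zero profile are assumptions.

```python
import numpy as np

def cosine_similarity(profiles):
    """Pairwise cosine similarity between the rows of `profiles`."""
    X = np.asarray(profiles, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    denom = np.outer(norms, norms)
    sim = np.zeros_like(denom)
    # Assumption: a profile with no associations has similarity 0 to all others.
    np.divide(X @ X.T, denom, out=sim, where=denom > 0)
    return sim
```

As with the other similarities, the rows of `Y_DM` would give DC and the rows of `Y_DM.T` would give MC.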
the second similarity fusion matrix acquisition module: for fusing the Cosine similarity matrix DC between diseases with the first disease similarity fusion matrix DJS to obtain a second disease similarity fusion matrix DJSC, and fusing the Cosine similarity matrix MC between metabolites with the first metabolite similarity fusion matrix MJS to obtain a second metabolite similarity fusion matrix MJSC;
the element DJSC(d_i, d_i') of the second disease similarity fusion matrix DJSC and the element MJSC(m_j, m_j') of the second metabolite similarity fusion matrix MJSC are given by the following formula:
wherein DC(d_i, d_i') is an element of the Cosine similarity matrix DC between diseases, and MC(m_j, m_j') is an element of the Cosine similarity matrix MC between metabolites;
the disease-metabolite first network construction module: for constructing a disease-metabolite first network Y_new by a weighted k-nearest neighbor algorithm, using the original disease-metabolite association bipartite network, the second disease similarity fusion matrix DJSC and the second metabolite similarity fusion matrix MJSC, according to the following formula:
Y_new = max(Y_DM, Y'_new)
wherein the formula uses the i-th row and the j-th column of the matrix Y_DM, N(d_i) denotes the neighbors of disease d_i, N(m_j) denotes the neighbors of metabolite m_j, and Y'_new, ξ_d and ξ_m are intermediate variables;
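Only the final step, Y_new = max(Y_DM, Y'_new), survives in this text; the expressions for Y'_new, ξ_d and ξ_m are not reproduced. The sketch below follows the widely used weighted k-nearest-known-neighbors heuristic as a stand-in: each row (and each column) is re-estimated as a similarity-weighted average of its k most similar neighbors, ξ acts as the normalizer, the two estimates are averaged, and the element-wise maximum with the original matrix is taken. The decay factor, the averaging of the two sides, and all names are assumptions.

```python
import numpy as np

def wknn_complete(Y_DM, DJSC, MJSC, k=2, decay=0.9):
    """Fill in a sparse association matrix from neighbor profiles."""
    Y = np.asarray(Y_DM, dtype=float)

    def neighbor_estimate(profiles, S):
        out = np.zeros_like(profiles)
        for i in range(profiles.shape[0]):
            sims = np.array(S[i], dtype=float)
            sims[i] = -np.inf                    # never pick the item itself
            nn = np.argsort(-sims)[:k]           # k most similar items
            nn = nn[np.isfinite(sims[nn])]
            # Assumed weighting: similarity damped by rank (decay ** rank).
            w = decay ** np.arange(len(nn)) * sims[nn]
            xi = sims[nn].sum()                  # normalizer (xi_d or xi_m)
            if xi > 0:
                out[i] = w @ profiles[nn] / xi
        return out

    Y_d = neighbor_estimate(Y, np.asarray(DJSC))        # disease side
    Y_m = neighbor_estimate(Y.T, np.asarray(MJSC)).T    # metabolite side
    Y_prime = (Y_d + Y_m) / 2.0                         # assumed combination
    return np.maximum(Y, Y_prime)                       # Y_new = max(Y_DM, Y'_new)
```

Because of the final maximum, known associations are never lowered; only zero entries can receive a nonzero likelihood borrowed from similar diseases and metabolites.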
the final prediction score matrix construction module: for constructing a final prediction score matrix SNWKCP using the disease-metabolite first network, the second disease similarity fusion matrix DJSC and the second metabolite similarity fusion matrix MJSC;
the element SNWKCP(d_i, m_j') of the final prediction score matrix SNWKCP is given by the following formula:
wherein the formula uses the d_i-th row of DJSC, the m_j'-th column of Y_new and the length of that column vector, the m_j'-th column of MJSC, the d_i-th row of Y_new and the length of that row vector, and DSNWKCP and MSNWKCP are intermediate matrices;
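The projection formula is not reproduced in this text, but the surviving definitions (a row of DJSC projected onto a column of Y_new, and a row of Y_new projected onto a column of MJSC, each divided by a vector length) match the usual network-consistency projection. The sketch below makes that assumption and further assumes that the two intermediate matrices are summed and normalized by the lengths of the similarity vectors, which keeps scores in [0, 1] for non-negative inputs. The function name and the normalization are assumptions.

```python
import numpy as np

def projection_scores(Y_new, DJSC, MJSC):
    """Combine disease-side and metabolite-side vector projections."""
    Y = np.asarray(Y_new, dtype=float)
    SD = np.asarray(DJSC, dtype=float)
    SM = np.asarray(MJSC, dtype=float)
    col_len = np.linalg.norm(Y, axis=0)[None, :]   # length of each column of Y_new
    row_len = np.linalg.norm(Y, axis=1)[:, None]   # length of each row of Y_new
    # Disease side (DSNWKCP): row of DJSC projected onto column of Y_new.
    d_side = np.zeros_like(Y)
    np.divide(SD @ Y, col_len, out=d_side, where=col_len > 0)
    # Metabolite side (MSNWKCP): row of Y_new projected onto column of MJSC.
    m_side = np.zeros_like(Y)
    np.divide(Y @ SM, row_len, out=m_side, where=row_len > 0)
    # Assumed normalization: lengths of the two similarity vectors.
    denom = np.linalg.norm(SD, axis=1)[:, None] + np.linalg.norm(SM, axis=0)[None, :]
    scores = np.zeros_like(Y)
    np.divide(d_side + m_side, denom, out=scores, where=denom > 0)
    return scores
```

Ranking the entries of the returned matrix for a given disease row then gives its candidate metabolites in descending order of predicted association.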
the correlation acquisition module: for looking up, in the final prediction score matrix SNWKCP, the prediction score of the disease and metabolite whose relationship is to be predicted, wherein a higher score indicates a stronger correlation between the disease and the metabolite;
the predictive score is in the range of 0 to 1.
2. A disease-metabolite association prediction storage medium based on weight k-nearest neighbor, characterized in that: the storage medium is for storing at least one instruction for implementing the disease-metabolite association prediction system based on weight k-nearest neighbor of claim 1.
CN202310059889.XA 2023-01-18 2023-01-18 Disease-metabolite association prediction system based on weight k-nearest neighbor Active CN116052873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310059889.XA CN116052873B (en) 2023-01-18 2023-01-18 Disease-metabolite association prediction system based on weight k-nearest neighbor

Publications (2)

Publication Number Publication Date
CN116052873A (en) 2023-05-02
CN116052873B (en) 2024-01-26

Family

ID=86132918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310059889.XA Active CN116052873B (en) 2023-01-18 2023-01-18 Disease-metabolite association prediction system based on weight k-nearest neighbor

Country Status (1)

Country Link
CN (1) CN116052873B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203471A (en) * 2016-06-22 2016-12-07 Nanjing University of Aeronautics and Astronautics Spectral clustering method based on a fused Kendall Tau distance metric
CN107887023A (en) * 2017-12-08 2018-04-06 Central South University Microbe-disease relationship prediction method based on similarity and double random walks
KR20190000166A (en) * 2017-06-22 2019-01-02 Korea Advanced Institute of Science and Technology Method and system for predicting drug repositioning candidate based on similarity between drug and metabolite
CN109935332A (en) * 2019-03-01 2019-06-25 Guilin University of Electronic Technology miRNA-disease association prediction method based on a double random walk model
CN110610763A (en) * 2019-09-10 2019-12-24 Shaanxi Normal University KATZ model-based metabolite and disease association relation prediction method
CN112289373A (en) * 2020-10-27 2021-01-29 Qiqihar University lncRNA-miRNA-disease association method fusing similarity
CN115602243A (en) * 2022-11-02 2023-01-13 Qufu Normal University (CN) Disease association information prediction method based on multi-similarity fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11037684B2 (en) * 2014-11-14 2021-06-15 International Business Machines Corporation Generating drug repositioning hypotheses based on integrating multiple aspects of drug similarity and disease similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant