CN116052873B

CN116052873B - Disease-metabolite association prediction system based on weight k-nearest neighbor

Info

Publication number: CN116052873B
Application number: CN202310059889.XA
Authority: CN
Inventors: 王波; 王鑫炜; 刘明; 杜晓昕; 李敬有; 廉佐政
Original assignee: Qiqihar University
Current assignee: Qiqihar University
Priority date: 2023-01-18
Filing date: 2023-01-18
Publication date: 2024-01-26
Anticipated expiration: 2043-01-18
Also published as: CN116052873A

Abstract

A disease-metabolite association prediction system based on weight k-nearest neighbor, relating to the technical field of bioinformatics. The invention aims to solve the low prediction efficiency of existing methods for obtaining relationships between metabolites and diseases. The system operates as follows: Jaccard similarity matrices, adaptive spectral clustering similarity matrices and Cosine similarity matrices are acquired between diseases and between metabolites; a first disease similarity fusion matrix and a first metabolite similarity fusion matrix are obtained by fusing the Jaccard similarity matrices with the adaptive spectral clustering similarity matrices; a second disease similarity fusion matrix and a second metabolite similarity fusion matrix are obtained by fusing the Cosine similarity matrices with the first disease and first metabolite similarity fusion matrices; a disease-metabolite first network is constructed; a final prediction score matrix is constructed; and the prediction score of the disease and metabolite whose relationship is to be predicted is obtained. The invention is used for predicting associations between diseases and metabolites.

Description

Disease-metabolite association prediction system based on weight k-nearest neighbor

Technical Field

The invention relates to the technical field of bioinformatics, in particular to a disease-metabolite association prediction system based on a weight k-nearest neighbor.

Background

Over the course of long-term evolution, organisms exchange substances and energy with their surrounding environment; this process of absorption and excretion is known as metabolism. Metabolism is an essential life activity and plays a vital role in the transformation of substances and energy. A growing number of biological and medical experiments have shown that the concentrations of certain metabolites in some patients differ from those in healthy individuals. Deoxycholic acid, for example, is a secondary bile acid produced by the liver and is recirculated through the liver, bile duct, small intestine and portal vein to form the enterohepatic circuit. At physiological pH, bile acids in anionic form are strongly toxic, so a carrier is required for their transport across intestinal and hepatic tissue membranes. When the deoxycholate content is sufficiently high, it can act as a hepatotoxin, a metabolic toxin and a tumor metabolite. Hepatotoxins can damage the liver or hepatocytes. When present at high levels for long periods, deoxycholic acid can promote tumor growth and survival. In addition to liver disease, long-term high levels of deoxycholic acid are associated with a variety of cancers, such as colon cancer, breast cancer and many other cancers of the gastrointestinal tract. Furthermore, the pathogenesis of cardiovascular and cerebrovascular diseases and of some immune diseases has also been shown to be related to metabolites. Metabolite-based diagnosis is therefore an important basis for medical judgment.

Existing methods for obtaining relationships between metabolites and diseases rely mainly on biological experiments. Such experiments consume substantial human resources and a great deal of time, so existing methods suffer from low prediction efficiency.

Disclosure of Invention

The invention aims to solve the problem that the existing method for acquiring the relation between the metabolite and the disease is low in prediction efficiency, and provides a disease-metabolite association prediction system based on weight k-nearest neighbor.

A disease-metabolite association prediction system based on weight k-nearest neighbor comprises: a disease-metabolite correlation adjacency matrix acquisition module, a Jaccard similarity acquisition module, an adaptive spectral clustering similarity acquisition module, a first similarity fusion matrix acquisition module, a Cosine similarity acquisition module, a second similarity fusion matrix acquisition module, a disease-metabolite first network construction module, a final prediction score matrix construction module and a correlation acquisition module;

the disease-metabolite correlation adjacency matrix acquisition module is used for constructing an original disease-metabolite association bipartite network according to the known disease-metabolite associations and establishing a correlation adjacency matrix $Y_{DM}$ from the original disease-metabolite association bipartite network;

the Jaccard similarity acquisition module is used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, a Jaccard similarity matrix DJ between diseases and a Jaccard similarity matrix MJ between metabolites;

the adaptive spectral clustering similarity acquisition module is used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, an adaptive spectral clustering similarity matrix DS between diseases and an adaptive spectral clustering similarity matrix MS between metabolites;

the first similarity fusion matrix acquisition module is used for fusing the Jaccard similarity matrix DJ between diseases with the adaptive spectral clustering similarity matrix DS between diseases to obtain a first disease similarity fusion matrix DJS, and fusing the Jaccard similarity matrix MJ between metabolites with the adaptive spectral clustering similarity matrix MS between metabolites to obtain a first metabolite similarity fusion matrix MJS;

the Cosine similarity acquisition module is used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, a Cosine similarity matrix DC between diseases and a Cosine similarity matrix MC between metabolites;

the second similarity fusion matrix acquisition module is used for fusing the Cosine similarity matrix DC between diseases with the first disease similarity fusion matrix DJS to obtain a second disease similarity fusion matrix DJSC, and fusing the Cosine similarity matrix MC between metabolites with the first metabolite similarity fusion matrix MJS to obtain a second metabolite similarity fusion matrix MJSC;

the disease-metabolite first network construction module is used for constructing a disease-metabolite first network $Y_{new}$ by adopting a weighted k-nearest neighbor algorithm and using the original disease-metabolite association bipartite network, the second disease similarity fusion matrix DJSC and the second metabolite similarity fusion matrix MJSC;

the final prediction score matrix construction module is used for constructing a final prediction score matrix SNWKCP by using the disease-metabolite first network, the second disease similarity fusion matrix DJSC and the second metabolite similarity fusion matrix MJSC;

the correlation acquisition module is used for looking up, in the final prediction score matrix SNWKCP, the prediction score of the disease and metabolite whose relationship is to be predicted, wherein the higher the score, the stronger the correlation between the disease and the metabolite;

the predictive score is in the range of 0 to 1.

A weight k-nearest neighbor based disease-metabolite association prediction storage medium for storing at least one instruction for implementing a weight k-nearest neighbor based disease-metabolite association prediction system.

The beneficial effects of the invention are as follows:

The invention uses the disease-metabolite adjacency matrix to carry out Jaccard similarity calculation, adaptive spectral clustering similarity calculation and Cosine similarity calculation on the diseases and the metabolites respectively, thereby obtaining disease-disease and metabolite-metabolite Jaccard similarity matrices, adaptive spectral clustering similarity matrices and Cosine similarity matrices. These similarity matrices are integrated to obtain a second disease similarity fusion matrix DJSC and a second metabolite similarity fusion matrix MJSC. A disease-metabolite first network is then obtained by a weighted k-nearest neighbor algorithm, and a final disease-metabolite association prediction score matrix is calculated by vector projection. In this way the invention reveals hidden associations between diseases and metabolites. Because the relationship between a metabolite and a disease is read from the final score matrix, the waste of human resources and time is avoided and the prediction efficiency is improved.

Drawings

FIG. 1 is a general flow chart for constructing a disease-metabolite association relationship;

FIG. 2 is a detailed process diagram of the construction of disease-metabolite associations;

FIG. 3 is a diagram of a matrix construction according to disease-metabolite associations;

FIG. 4 is a disease similarity matrix construction diagram calculated from a disease-metabolite correlation matrix;

FIG. 5 is a diagram of metabolite similarity matrix construction calculated from a disease-metabolite correlation matrix;

FIG. 6 is a ROC diagram of SNWKCP-DMA model under the 5-fold cross validation framework.

Detailed Description

Embodiment one: as shown in FIGS. 1 and 2, the disease-metabolite association prediction system based on weight k-nearest neighbor of this embodiment comprises: a disease-metabolite correlation adjacency matrix acquisition module, a Jaccard similarity acquisition module, an adaptive spectral clustering similarity acquisition module, a first similarity fusion matrix acquisition module, a Cosine similarity acquisition module, a second similarity fusion matrix acquisition module, a disease-metabolite first network construction module, a final prediction score matrix construction module and a correlation acquisition module;

the Jaccard similarity acquisition module is used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, a Jaccard similarity matrix between diseases and a Jaccard similarity matrix between metabolites;

the adaptive spectral clustering similarity acquisition module is used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, an adaptive spectral clustering similarity matrix between diseases and an adaptive spectral clustering similarity matrix between metabolites;

the first similarity fusion matrix acquisition module is used for fusing the Jaccard similarity matrix between diseases with the adaptive spectral clustering similarity matrix between diseases to obtain a first disease similarity fusion matrix, and fusing the Jaccard similarity matrix between metabolites with the adaptive spectral clustering similarity matrix between metabolites to obtain a first metabolite similarity fusion matrix;

the Cosine similarity acquisition module is used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, a Cosine similarity matrix between diseases and a Cosine similarity matrix between metabolites;

the second similarity fusion matrix acquisition module is used for fusing the Cosine similarity matrix between diseases with the first disease similarity fusion matrix to obtain a second disease similarity fusion matrix, and fusing the Cosine similarity matrix between metabolites with the first metabolite similarity fusion matrix to obtain a second metabolite similarity fusion matrix;

the disease-metabolite first network construction module is used for constructing a disease-metabolite first network by adopting a weighted k-nearest neighbor algorithm and using the original disease-metabolite association bipartite network, the second disease similarity fusion matrix and the second metabolite similarity fusion matrix;

the final prediction score matrix construction module is used for constructing a final prediction score matrix by using the disease-metabolite first network, the second disease similarity fusion matrix and the second metabolite similarity fusion matrix;

the predictive score is in the range of 0 to 1.

Embodiment two: the disease-metabolite correlation adjacency matrix acquisition module is used for constructing a bipartite network from the known disease-metabolite associations and establishing the correlation adjacency matrix $Y_{DM}$ from the bipartite network by the following formula:

$$Y_{DM} = \{Y(i,j)\}_{r \times n}$$

where r represents the number of disease species, n represents the number of metabolite species, and Y(i, j) is the original disease-metabolite association bipartite network; see FIG. 3.
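By way of non-limiting illustration, the construction of $Y_{DM}$ might be sketched in Python as follows. The binary encoding (1 for a known association, 0 otherwise), the function name `build_adjacency` and the toy disease and metabolite labels are assumptions made for this example and are not taken from the description.

```python
def build_adjacency(diseases, metabolites, known_pairs):
    """Return Y_DM as a list of r rows, each holding n binary entries."""
    d_index = {d: i for i, d in enumerate(diseases)}
    m_index = {m: j for j, m in enumerate(metabolites)}
    y_dm = [[0] * len(metabolites) for _ in diseases]
    for disease, metabolite in known_pairs:
        # Assumed encoding: 1 marks a known disease-metabolite association.
        y_dm[d_index[disease]][m_index[metabolite]] = 1
    return y_dm


# Hypothetical toy data: r = 3 diseases, n = 4 metabolites.
diseases = ["d1", "d2", "d3"]
metabolites = ["m1", "m2", "m3", "m4"]
known_pairs = [("d1", "m1"), ("d1", "m2"), ("d2", "m2"), ("d3", "m4")]
Y_DM = build_adjacency(diseases, metabolites, known_pairs)
```

Each row of the result corresponds to one disease and each column to one metabolite, matching the r × n shape given above.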

Embodiment three: the Jaccard similarity acquisition module is used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, the Jaccard similarity between diseases and the Jaccard similarity between metabolites, specifically as follows:

As shown in FIG. 4, the Jaccard similarity between diseases is calculated as follows. First, two row vectors $Y(d_i)$ and $Y(d_{i'})$ of $Y_{DM}$ are taken. Then, the number of metabolites associated with both diseases and the number of metabolites associated with at least one of the two diseases are calculated, from which the similarity between disease $d_i$ and disease $d_{i'}$ is obtained;

the element $DJ(d_i, d_{i'})$ of the disease-disease Jaccard similarity matrix DJ is given by:

$$DJ(d_i, d_{i'}) = \frac{|Y(d_i) \cap Y(d_{i'})|}{|Y(d_i) \cup Y(d_{i'})|}$$

where $d_i$ and $d_{i'}$ represent two different diseases, and $Y(d_i)$ and $Y(d_{i'})$ represent the sets of metabolites associated with disease $d_i$ and disease $d_{i'}$, respectively.

As shown in FIG. 5, the Jaccard similarity between metabolites is calculated as follows. First, two column vectors $Y(m_{j'})$ and $Y(m_j)$ of $Y_{DM}$ are taken. Then, the number of diseases associated with both metabolites and the number of diseases associated with at least one of the two metabolites are calculated, from which the similarity between metabolite $m_{j'}$ and metabolite $m_j$ is obtained;

the element $MJ(m_j, m_{j'})$ of the metabolite-metabolite Jaccard similarity matrix MJ is given by:

$$MJ(m_j, m_{j'}) = \frac{|Y(m_j) \cap Y(m_{j'})|}{|Y(m_j) \cup Y(m_{j'})|}$$

where $m_{j'}$ and $m_j$ represent two different metabolites, and $Y(m_{j'})$ and $Y(m_j)$ represent the sets of diseases associated with metabolite $m_{j'}$ and metabolite $m_j$, respectively.
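The Jaccard calculation of this embodiment can be illustrated with a minimal Python sketch. It assumes a binary $Y_{DM}$ stored as a list of rows; returning 0 when both vectors are empty is an assumption, since the description does not address that case, and the function names are hypothetical.

```python
def jaccard(u, v):
    """Jaccard similarity of two binary vectors viewed as sets of indices."""
    both = sum(1 for a, b in zip(u, v) if a and b)     # size of intersection
    either = sum(1 for a, b in zip(u, v) if a or b)    # size of union
    return both / either if either else 0.0            # assumed: empty union -> 0


def jaccard_matrices(y_dm):
    """Return (DJ, MJ): disease-disease and metabolite-metabolite similarities."""
    rows = y_dm
    cols = [list(c) for c in zip(*y_dm)]
    dj = [[jaccard(a, b) for b in rows] for a in rows]
    mj = [[jaccard(a, b) for b in cols] for a in cols]
    return dj, mj


Y_DM = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]  # toy example
DJ, MJ = jaccard_matrices(Y_DM)
```

In this toy matrix the first two diseases share one metabolite out of two associated with either, so `DJ[0][1]` is 0.5.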

Embodiment four: the adaptive spectral clustering similarity acquisition module is used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, the adaptive spectral clustering similarity between diseases and the adaptive spectral clustering similarity between metabolites, specifically as follows:

As shown in FIG. 4, the adaptive spectral clustering similarity between diseases is calculated as follows. First, two row vectors $Y(d_i)$ and $Y(d_{i'})$ are taken. Then the fully connected Euclidean distances are calculated, the scale parameter σ of each point is determined by KNN from its K-th nearest point in Euclidean distance, and finally the similarity matrix is constructed;

the element $DS(d_i, d_{i'})$ of the disease-disease adaptive spectral clustering similarity matrix DS is computed from the Euclidean distance between $Y(d_i)$ and $Y(d_{i'})$ and from the scale parameters $\delta_i$ and $\delta_{i'}$, where:

$$\delta_x = \|Y(d_x) - Y(d_{xK})\|$$

where $Y(d_{xK})$ is the K-th nearest neighbor of the sample point $Y(d_x)$, the i-th and i'-th row vectors of the matrix $Y_{DM}$ are denoted $Y(d_i)$ and $Y(d_{i'})$, x is i or i', K is a constant greater than 0, $\delta_x$ is an intermediate variable, and i and i' are row vector labels of $Y_{DM}$.

As shown in FIG. 5, the adaptive spectral clustering similarity between metabolites is calculated as follows. First, two column vectors $Y(m_{j'})$ and $Y(m_j)$ are taken. Then the fully connected Euclidean distances are calculated, the scale parameter σ of each point is determined by KNN from its K-th nearest point in Euclidean distance, and finally the similarity matrix is constructed, specifically:

the element $MS(m_j, m_{j'})$ of the metabolite-metabolite adaptive spectral clustering similarity matrix MS is computed from the Euclidean distance between $Y(m_j)$ and $Y(m_{j'})$ and from the scale parameters $\delta_j$ and $\delta_{j'}$, where:

$$\delta_{x'} = \|Y(m_{x'}) - Y(m_{x'K})\|$$

where $Y(m_{x'K})$ is the K-th nearest neighbor of the sample point $Y(m_{x'})$, the j-th and j'-th column vectors of the matrix $Y_{DM}$ are denoted $Y(m_j)$ and $Y(m_{j'})$, x' is j or j', and j and j' are column vector labels of $Y_{DM}$.
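A hedged Python sketch of one possible realization follows. It assumes the locally scaled Gaussian kernel commonly used in self-tuning spectral clustering, $\exp(-\|u - v\|^2 / (\delta_u \delta_v))$; this kernel form is an assumption and is not confirmed by the present text, which defines only the scale parameters. The handling of a zero scale product and the function names are likewise illustrative.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def local_scales(vectors, k):
    """delta_x: distance from each vector to its K-th nearest other vector.

    Requires 1 <= k <= len(vectors) - 1.
    """
    scales = []
    for x, vx in enumerate(vectors):
        dists = sorted(euclidean(vx, vy) for y, vy in enumerate(vectors) if y != x)
        scales.append(dists[k - 1])
    return scales


def adaptive_similarity(vectors, k):
    """Similarity matrix using the ASSUMED kernel exp(-dist^2 / (delta_i * delta_j))."""
    delta = local_scales(vectors, k)
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = euclidean(vectors[i], vectors[j])
            scale = delta[i] * delta[j]
            if scale > 0:
                sim[i][j] = math.exp(-dist ** 2 / scale)
            else:
                # Assumed convention when a scale is zero (duplicate vectors).
                sim[i][j] = 1.0 if dist == 0 else 0.0
    return sim


Y_DM = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]  # toy example
DS = adaptive_similarity(Y_DM, k=1)                           # between diseases (rows)
MS = adaptive_similarity([list(c) for c in zip(*Y_DM)], k=1)  # between metabolites (columns)
```

Because each scale is taken from a point's own neighborhood, pairs in dense regions and pairs in sparse regions are compared on a locally adjusted distance scale rather than a single global one.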

Embodiment five: the first similarity fusion matrix acquisition module is used for fusing the Jaccard similarity matrix between diseases with the adaptive spectral clustering similarity matrix between diseases to obtain a first disease similarity fusion matrix, and fusing the Jaccard similarity matrix between metabolites with the adaptive spectral clustering similarity matrix between metabolites to obtain a first metabolite similarity fusion matrix, specifically as follows:

If the value $DJ(d_i, d_{i'})$ of the Jaccard similarity matrix obtained from the disease-metabolite association matrix $Y_{DM}$ is 0, it is filled directly with the value $DS(d_i, d_{i'})$ of the adaptive spectral clustering similarity matrix obtained from $Y_{DM}$; otherwise, the two values are added and averaged to give the new value.

The element $DJS(d_i, d_{i'})$ of the first disease similarity fusion matrix DJS is given by:

$$DJS(d_i, d_{i'}) = \begin{cases} DS(d_i, d_{i'}), & DJ(d_i, d_{i'}) = 0 \\ \dfrac{DJ(d_i, d_{i'}) + DS(d_i, d_{i'})}{2}, & \text{otherwise} \end{cases}$$

where $DJ(d_i, d_{i'})$ is an element of the Jaccard similarity matrix obtained from the disease-metabolite association matrix $Y_{DM}$, and $DS(d_i, d_{i'})$ is an element of the adaptive spectral clustering similarity matrix obtained from $Y_{DM}$.

If the value $MJ(m_j, m_{j'})$ of the Jaccard similarity matrix obtained from the disease-metabolite association matrix $Y_{DM}$ is 0, it is filled directly with the value $MS(m_j, m_{j'})$ of the adaptive spectral clustering similarity matrix obtained from $Y_{DM}$; otherwise, the two values are added and averaged to give the new value.

The element $MJS(m_j, m_{j'})$ of the first metabolite similarity fusion matrix MJS is given by:

$$MJS(m_j, m_{j'}) = \begin{cases} MS(m_j, m_{j'}), & MJ(m_j, m_{j'}) = 0 \\ \dfrac{MJ(m_j, m_{j'}) + MS(m_j, m_{j'})}{2}, & \text{otherwise} \end{cases}$$

where $MJ(m_j, m_{j'})$ is an element of the Jaccard similarity matrix obtained from the disease-metabolite association matrix $Y_{DM}$, and $MS(m_j, m_{j'})$ is an element of the adaptive spectral clustering similarity matrix obtained from $Y_{DM}$.
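The fill-or-average rule of this embodiment can be sketched in a few lines of Python. The function name `fuse` and the toy matrices are hypothetical; the rule itself (use the adaptive spectral clustering value where the Jaccard value is 0, otherwise average) follows the text above.

```python
def fuse(primary, fallback):
    """Where primary is 0 take fallback; otherwise average the two values."""
    return [
        [f if p == 0 else (p + f) / 2 for p, f in zip(p_row, f_row)]
        for p_row, f_row in zip(primary, fallback)
    ]


# Hypothetical toy similarity matrices for three diseases.
DJ = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
DS = [[1.0, 0.4, 0.2], [0.4, 1.0, 0.3], [0.2, 0.3, 1.0]]
DJS = fuse(DJ, DS)
```

The same function applies unchanged to MJ and MS to obtain MJS.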

Embodiment six: the Cosine similarity acquisition module is used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, a Cosine similarity matrix between diseases and a Cosine similarity matrix between metabolites, specifically as follows:

As shown in FIG. 4, the Cosine similarity between diseases is calculated as follows. First, two row vectors $Y(d_i)$ and $Y(d_{i'})$ are taken. Then the angle between them is obtained, and the cosine of that angle is used to represent the similarity of the two vectors. The smaller the angle, the closer the cosine value is to 1, the more closely the directions of the two vectors coincide, and the more similar they are.

The element $DC(d_i, d_{i'})$ of the disease-disease Cosine similarity matrix DC is given by:

$$DC(d_i, d_{i'}) = \frac{Y(d_i) \cdot Y(d_{i'})}{\|Y(d_i)\| \, \|Y(d_{i'})\|}$$

As shown in FIG. 5, the Cosine similarity between metabolites is calculated as follows. First, two column vectors $Y(m_{j'})$ and $Y(m_j)$ are taken. Then the angle between them is obtained, and the cosine of that angle is used to represent the similarity of the two vectors. The smaller the angle, the closer the cosine value is to 1, the more closely the directions of the two vectors coincide, and the more similar they are.

The element $MC(m_j, m_{j'})$ of the metabolite-metabolite Cosine similarity matrix MC is given by:

$$MC(m_j, m_{j'}) = \frac{Y(m_j) \cdot Y(m_{j'})}{\|Y(m_j)\| \, \|Y(m_{j'})\|}$$
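As an illustrative sketch, the Cosine similarity matrices could be computed in Python as shown below. Returning 0 for an all-zero vector (a disease or metabolite with no known association) is an assumption, since the cosine is undefined in that case and the description does not specify a convention; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # assumed: zero vector -> similarity 0


def cosine_matrices(y_dm):
    """Return (DC, MC) from the rows and columns of Y_DM."""
    rows = y_dm
    cols = [list(c) for c in zip(*y_dm)]
    dc = [[cosine(a, b) for b in rows] for a in rows]
    mc = [[cosine(a, b) for b in cols] for a in cols]
    return dc, mc


Y_DM = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]  # toy example
DC, MC = cosine_matrices(Y_DM)
```

For binary vectors the value lies between 0 and 1, and it is 0 exactly when the two vectors share no associated partner.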

Embodiment seven: the second similarity fusion matrix acquisition module is used for fusing the Cosine similarity matrix between diseases with the first disease similarity fusion matrix to obtain a second disease similarity fusion matrix, and fusing the Cosine similarity matrix between metabolites with the first metabolite similarity fusion matrix to obtain a second metabolite similarity fusion matrix, specifically:

If the value $DC(d_i, d_{i'})$ of the Cosine similarity matrix obtained from the disease-metabolite association matrix $Y_{DM}$ is 0, it is filled directly with the value $DJS(d_i, d_{i'})$ of the previous fusion similarity matrix. Otherwise, the new similarity value is obtained from $DC(d_i, d_{i'})$ and the value $DJS(d_i, d_{i'})$ of the previous fusion similarity matrix.

The element $DJSC(d_i, d_{i'})$ of the second disease similarity fusion matrix DJSC is obtained accordingly.

If the value $MC(m_j, m_{j'})$ of the Cosine similarity matrix obtained from the disease-metabolite association matrix $Y_{DM}$ is 0, it is filled directly with the value $MJS(m_j, m_{j'})$ of the previous fusion similarity matrix. Otherwise, the new similarity value is obtained from $MC(m_j, m_{j'})$ and the value $MJS(m_j, m_{j'})$ of the previous fusion similarity matrix.

The element $MJSC(m_j, m_{j'})$ of the second metabolite similarity fusion matrix MJSC is obtained accordingly.
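A possible reading of this second fusion step is sketched below in Python. The zero-fill branch follows the text; the non-zero branch is assumed here to be a simple average, by analogy with the first fusion step, and that averaging rule is an assumption rather than something stated in this embodiment. All names and toy values are hypothetical.

```python
def fuse_value(leading, previous):
    """Second-stage fusion of one matrix element."""
    if leading == 0:
        return previous                 # stated rule: fill from the earlier fusion
    return (leading + previous) / 2     # ASSUMED rule: average, as in the first fusion


def fuse_matrix(leading, previous):
    return [
        [fuse_value(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(leading, previous)
    ]


# Hypothetical toy matrices for three diseases.
DC = [[1.0, 0.7, 0.0], [0.7, 1.0, 0.0], [0.0, 0.0, 1.0]]
DJS = [[1.0, 0.45, 0.2], [0.45, 1.0, 0.3], [0.2, 0.3, 1.0]]
DJSC = fuse_matrix(DC, DJS)
```

Under this reading, an entry of DJSC is zero only if all three underlying similarities are zero for that pair.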

Seventh embodiment: the disease-metabolite first network construction module is used for constructing the disease-metabolite first network $Y_{new}$ by adopting a weighted k-nearest neighbor algorithm and using the original disease-metabolite association bipartite network, the second disease similarity fusion matrix and the second metabolite similarity fusion matrix, specifically as follows:

$$Y_{new} = \max(Y_{DM}, Y'_{new})$$

where $Y'_{new}$, $\xi_d$ and $\xi_m$ are intermediate variables calculated from the i-th row and the j-th column of the matrix $Y_{DM}$; $N(d_i)$ denotes the N neighbors of disease $d_i$; and $N(m_j)$ denotes the N' neighbors of metabolite $m_j$.
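For orientation only, the following Python sketch shows one common form of weighted k-nearest-neighbor profile completion that is consistent with the symbols named above. The rank-decayed weights, the `decay` parameter, the averaging of the disease-side and metabolite-side estimates into $Y'_{new}$, and every function name are assumptions; the description confirms only the final element-wise maximum with $Y_{DM}$.

```python
def transpose(m):
    return [list(c) for c in zip(*m)]


def knn_profile(sim, profiles, n_neighbors, decay):
    """Estimate each item's profile from its most similar neighbors (assumed form)."""
    out = []
    for i, row in enumerate(sim):
        order = sorted((j for j in range(len(row)) if j != i),
                       key=lambda j: row[j], reverse=True)[:n_neighbors]
        xi = sum(row[j] for j in order)          # normalization term (xi_d or xi_m)
        width = len(profiles[i])
        if xi == 0:
            out.append([0.0] * width)            # no informative neighbor
            continue
        acc = [0.0] * width
        for rank, j in enumerate(order):
            w = (decay ** rank) * row[j]         # assumed rank-decayed weight
            for c in range(width):
                acc[c] += w * profiles[j][c]
        out.append([a / xi for a in acc])
    return out


def wknn_network(y_dm, djsc, mjsc, n_d, n_m, decay=0.5):
    y_d = knn_profile(djsc, y_dm, n_d, decay)                         # disease side
    y_m = transpose(knn_profile(mjsc, transpose(y_dm), n_m, decay))   # metabolite side
    y_prime = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(y_d, y_m)]
    # Stated step: element-wise maximum of Y_DM and Y'_new.
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(y_dm, y_prime)]


Y_DM = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
DJSC = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.2], [0.1, 0.2, 1.0]]
MJSC = [[1.0, 0.5, 0.0, 0.1], [0.5, 1.0, 0.0, 0.2],
        [0.0, 0.0, 1.0, 0.0], [0.1, 0.2, 0.0, 1.0]]
Y_new = wknn_network(Y_DM, DJSC, MJSC, n_d=2, n_m=2)
```

Taking the maximum keeps every known association at its original value while giving unknown pairs a non-zero likelihood derived from similar diseases and metabolites.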

Eighth embodiment: the final prediction score matrix construction module is used for constructing the final prediction score matrix by using the disease-metabolite first network, the second disease similarity fusion matrix and the second metabolite similarity fusion matrix, specifically as follows:

The final prediction score matrix SNWKCP is calculated by combining DJSC and MJSC, and the value of $SNWKCP(d_i, m_{j'})$ lies in the range 0 to 1. $SNWKCP(d_i, m_{j'})$ is obtained from the intermediate scores $DSNWKCP(d_i, m_{j'})$ and $MSNWKCP(d_i, m_{j'})$, which are defined in terms of the following quantities: the $d_i$-th row of DJSC; the $m_{j'}$-th column of $Y_{new}$ and the length of that vector; the $m_{j'}$-th column of MJSC; and the $d_i$-th row of $Y_{new}$ and the length of that vector. DSNWKCP is the scoring matrix calculated based on the similarity between diseases, and MSNWKCP is the scoring matrix calculated based on the similarity between metabolites.

The matrix SNWKCP is the final vector projection scoring matrix of the disease space and the metabolite space, and each value in the matrix is the final score of one disease-metabolite pair. The final score is used to predict the disease-metabolite correlation: the higher the score, the stronger the correlation.
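The projection step may be pictured with the following Python sketch. It assumes a projection-style score in which each similarity row or column is projected onto the corresponding column or row of $Y_{new}$ and the two projections are normalized by the sum of the similarity vector lengths; this combined formula is an assumption chosen because it uses exactly the quantities listed above and yields scores between 0 and 1 for non-negative inputs. The function names and toy values are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def projection_scores(y_new, djsc, mjsc):
    """ASSUMED form: (disease projection + metabolite projection) / (|DJSC row| + |MJSC col|)."""
    r, n = len(y_new), len(y_new[0])
    y_cols = [list(c) for c in zip(*y_new)]
    m_cols = [list(c) for c in zip(*mjsc)]
    scores = [[0.0] * n for _ in range(r)]
    for i in range(r):
        row_len = norm(y_new[i])
        for j in range(n):
            col_len = norm(y_cols[j])
            # Disease-side projection onto column j of Y_new.
            d_part = dot(djsc[i], y_cols[j]) / col_len if col_len else 0.0
            # Metabolite-side projection onto row i of Y_new.
            m_part = dot(y_new[i], m_cols[j]) / row_len if row_len else 0.0
            denom = norm(djsc[i]) + norm(m_cols[j])
            scores[i][j] = (d_part + m_part) / denom if denom else 0.0
    return scores


Y_new = [[1, 1, 0, 0.1], [0.8, 1, 0, 0.2], [0, 0.1, 0, 1]]
DJSC = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.2], [0.1, 0.2, 1.0]]
MJSC = [[1.0, 0.5, 0.0, 0.1], [0.5, 1.0, 0.0, 0.2],
        [0.0, 0.0, 1.0, 0.0], [0.1, 0.2, 0.0, 1.0]]
SNWKCP = projection_scores(Y_new, DJSC, MJSC)
```

By the Cauchy-Schwarz inequality each projection is bounded by the length of the corresponding similarity vector, which is why this assumed form stays within the stated range of 0 to 1.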

Embodiment nine: a disease-metabolite association prediction storage medium based on weight k-nearest neighbor, the storage medium being used for storing at least one instruction for implementing the disease-metabolite association prediction system based on weight k-nearest neighbor.

Examples: in order to verify the beneficial effects of the invention, the following tests were performed:

The performance of the invention was evaluated using 5-fold cross-validation (5-fold CV). The ROC curve obtained under 5-fold CV is shown in FIG. 6, and a comparison of the AUC under 5-fold CV with that of other models is shown in Table 1.

For three diseases (obesity, colorectal cancer and lung cancer), the top 15 predicted associated metabolites were verified against other known datasets; the verification results are shown in Tables 2, 3 and 4.

On the same dataset, the AUC values obtained by the SNWKCP-DMA model and by other models under the 5-fold CV framework are shown in Table 1:

TABLE 1

Method	AUC
MCF	0.6156
WMAN	0.6181
PROFANCY	0.9027
MN-LMF	0.9659
SNWKCP-DMA	0.9819
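As a rough sketch of how such an evaluation can be organized, the Python fragment below splits the known associations into five folds and computes an AUC from labels and scores by pairwise comparison. The function names, the fixed random seed and the toy inputs are assumptions; the sketch does not reproduce the experiment or the values in Table 1.

```python
import random


def five_fold(pairs, seed=0):
    """Shuffle the known associations and deal them into 5 folds."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k::5] for k in range(5)]


def auc(labels, scores):
    """Probability that a positive sample outscores a negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical usage: each fold in turn is hidden from Y_DM, the model is
# rebuilt, and the hidden pairs are scored against the unknown pairs.
pairs = [(i, j) for i in range(4) for j in range(3)]
folds = five_fold(pairs)
example_auc = auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
```

In each round the held-out fold plays the role of the positive test samples, so the AUC reflects how highly the model ranks associations it has not seen.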

The top 15 metabolites associated with obesity are shown in Table 2:

TABLE 2

The top 15 metabolites associated with colorectal cancer are shown in Table 3:

TABLE 3

The top 15 metabolites associated with lung cancer are shown in Table 4:

TABLE 4

The invention uses the known disease-metabolite adjacency matrix to carry out several similarity calculations on the diseases and the metabolites, namely Jaccard similarity, adaptive spectral clustering similarity and Cosine similarity, thereby obtaining disease-disease and metabolite-metabolite Jaccard similarity matrices, adaptive spectral clustering similarity matrices and Cosine similarity matrices. The similarity matrices are integrated to obtain a new disease-disease similarity matrix DJSC and a new metabolite-metabolite similarity matrix MJSC. A new disease-metabolite association network is then obtained by the weighted k-nearest neighbor algorithm, and the final prediction score matrix SNWKCP is obtained by vector projection. In this way, unknown associations hidden in the data are revealed through relationships among several kinds of data. The fusion of multiple similarities and the weighted k-nearest neighbor algorithm enrich the dimensions of the data, and combining the two vector projections gives better results. The experiments show that the method outperforms the compared association prediction methods in AUC, and the prediction results show that the association method is reliable.

Claims

1. A disease-metabolite association prediction system based on weight k-nearest neighbor, characterized in that the system comprises: a disease-metabolite correlation adjacency matrix acquisition module, a Jaccard similarity acquisition module, an adaptive spectral clustering similarity acquisition module, a first similarity fusion matrix acquisition module, a Cosine similarity acquisition module, a second similarity fusion matrix acquisition module, a disease-metabolite first network construction module, a final prediction score matrix construction module and a correlation acquisition module;

the disease-metabolite correlation adjacency matrix acquisition module: used for constructing an original disease-metabolite association bipartite network according to known disease-metabolite associations, and establishing a correlation adjacency matrix $Y_{DM}$ using the original disease-metabolite association bipartite network;

the correlation adjacency matrix $Y_{DM}$ is established using the original disease-metabolite association bipartite network by the following formula:

$$Y_{DM} = \{Y(i,j)\}_{r \times n}$$

wherein r represents the number of kinds of diseases, and n represents the number of kinds of metabolites;

the original disease-metabolite association bipartite network has the formula:

$$Y(i,j) = \begin{cases} 1, & \text{disease } i \text{ is known to be associated with metabolite } j \\ 0, & \text{otherwise} \end{cases}$$

wherein Y(i, j) is the original disease-metabolite association bipartite network;

the Jaccard similarity acquisition module: used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, a Jaccard similarity matrix DJ between diseases and a Jaccard similarity matrix MJ between metabolites;

the element $DJ(d_i, d_{i'})$ of the Jaccard similarity matrix DJ between diseases and the element $MJ(m_j, m_{j'})$ of the Jaccard similarity matrix MJ between metabolites are obtained by the following formulas:

$$DJ(d_i, d_{i'}) = \frac{|Y(d_i) \cap Y(d_{i'})|}{|Y(d_i) \cup Y(d_{i'})|}$$

$$MJ(m_j, m_{j'}) = \frac{|Y(m_j) \cap Y(m_{j'})|}{|Y(m_j) \cup Y(m_{j'})|}$$

wherein $d_i$ and $d_{i'}$ represent two different diseases, $m_{j'}$ and $m_j$ are two different metabolites, $Y(d_i)$ and $Y(d_{i'})$ are row vectors of $Y_{DM}$, and $Y(m_{j'})$ and $Y(m_j)$ are column vectors of $Y_{DM}$;

the adaptive spectral clustering similarity acquisition module: used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, an adaptive spectral clustering similarity matrix DS between diseases and an adaptive spectral clustering similarity matrix MS between metabolites;

the element $DS(d_i, d_{i'})$ of the adaptive spectral clustering similarity matrix DS between diseases and the element $MS(m_j, m_{j'})$ of the adaptive spectral clustering similarity matrix MS between metabolites are obtained by the following formulas:

$$\delta_x = \|Y(d_x) - Y(d_{xK})\|$$

$$\delta_{x'} = \|Y(m_{x'}) - Y(m_{x'K})\|$$

wherein $Y(d_{xK})$ is the K-th neighbor point of $Y(d_x)$, $Y(m_{x'K})$ is the K-th neighbor point of $Y(m_{x'})$, x takes i or i', x' takes j or j', $\delta_x$ and $\delta_{x'}$ are intermediate variables, i and i' are row vector labels of $Y_{DM}$, j and j' are column vector labels of $Y_{DM}$, and K is a constant greater than 0;

the first similarity fusion matrix acquisition module: used for fusing the Jaccard similarity matrix DJ between diseases with the adaptive spectral clustering similarity matrix DS between diseases to obtain a first disease similarity fusion matrix DJS, and fusing the Jaccard similarity matrix MJ between metabolites with the adaptive spectral clustering similarity matrix MS between metabolites to obtain a first metabolite similarity fusion matrix MJS;

the element $DJS(d_i, d_{i'})$ of the first disease similarity fusion matrix DJS and the element $MJS(m_j, m_{j'})$ of the first metabolite similarity fusion matrix MJS are given by the following formulas:

$$DJS(d_i, d_{i'}) = \begin{cases} DS(d_i, d_{i'}), & DJ(d_i, d_{i'}) = 0 \\ \dfrac{DJ(d_i, d_{i'}) + DS(d_i, d_{i'})}{2}, & \text{otherwise} \end{cases}$$

$$MJS(m_j, m_{j'}) = \begin{cases} MS(m_j, m_{j'}), & MJ(m_j, m_{j'}) = 0 \\ \dfrac{MJ(m_j, m_{j'}) + MS(m_j, m_{j'})}{2}, & \text{otherwise} \end{cases}$$

wherein $d_i$ and $d_{i'}$ represent two different diseases, $m_{j'}$ and $m_j$ are two different metabolites, $DJ(d_i, d_{i'})$ is an element of the Jaccard similarity matrix DJ between diseases, $MJ(m_j, m_{j'})$ is an element of the Jaccard similarity matrix MJ between metabolites, $DS(d_i, d_{i'})$ is an element of the adaptive spectral clustering similarity matrix DS between diseases, and $MS(m_j, m_{j'})$ is an element of the adaptive spectral clustering similarity matrix MS between metabolites;

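the Cosine similarity acquisition module: used for acquiring, according to the correlation adjacency matrix $Y_{DM}$, a Cosine similarity matrix DC between diseases and a Cosine similarity matrix MC between metabolites;

the element $DC(d_i, d_{i'})$ of the Cosine similarity matrix DC between diseases and the element $MC(m_j, m_{j'})$ of the Cosine similarity matrix MC between metabolites are given by the following formulas:

$$DC(d_i, d_{i'}) = \frac{Y(d_i) \cdot Y(d_{i'})}{\|Y(d_i)\| \, \|Y(d_{i'})\|}$$

$$MC(m_j, m_{j'}) = \frac{Y(m_j) \cdot Y(m_{j'})}{\|Y(m_j)\| \, \|Y(m_{j'})\|}$$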

the second similarity fusion matrix acquisition module: used for fusing the Cosine similarity matrix DC between diseases with the first disease similarity fusion matrix DJS to obtain a second disease similarity fusion matrix DJSC, and fusing the Cosine similarity matrix MC between metabolites with the first metabolite similarity fusion matrix MJS to obtain a second metabolite similarity fusion matrix MJSC;

the element $DJSC(d_i, d_{i'})$ of the second disease similarity fusion matrix DJSC and the element $MJSC(m_j, m_{j'})$ of the second metabolite similarity fusion matrix MJSC are given by the following formula:

wherein $DC(d_i, d_{i'})$ is an element of the Cosine similarity matrix DC between diseases, and $MC(m_j, m_{j'})$ is an element of the Cosine similarity matrix MC between metabolites;

the disease-metabolite first network construction module: used for constructing a disease-metabolite first network $Y_{new}$ by adopting a weighted k-nearest neighbor algorithm and using the original disease-metabolite association bipartite network, the second disease similarity fusion matrix DJSC and the second metabolite similarity fusion matrix MJSC, by the following formula:

$$Y_{new} = \max(Y_{DM}, Y'_{new})$$

wherein $Y'_{new}$, $\xi_d$ and $\xi_m$ are intermediate variables calculated from the i-th row and the j-th column of the matrix $Y_{DM}$, $N(d_i)$ is the N neighbors of disease $d_i$, and $N(m_j)$ is the N' neighbors of metabolite $m_j$;

the final prediction score matrix construction module: used for constructing a final prediction score matrix SNWKCP by using the disease-metabolite first network, the second disease similarity fusion matrix DJSC and the second metabolite similarity fusion matrix MJSC;

the element $SNWKCP(d_i, m_{j'})$ of the final prediction score matrix SNWKCP is defined in terms of the following quantities: the $d_i$-th row of DJSC; the $m_{j'}$-th column of $Y_{new}$ and the length of that vector; the $m_{j'}$-th column of MJSC; and the $d_i$-th row of $Y_{new}$ and the length of that vector; DSNWKCP and MSNWKCP are intermediate matrices;

the correlation acquisition module: used for looking up, in the final prediction score matrix SNWKCP, the prediction score of the disease and metabolite whose relationship is to be predicted, wherein the higher the score, the stronger the correlation between the disease and the metabolite;

the predictive score is in the range of 0 to 1.

2. A disease-metabolite association prediction storage medium based on weight k-nearest neighbor, characterized by: the storage medium is for storing at least one instruction for implementing a weight k-nearest neighbor based disease-metabolite association prediction system of claim 1.