CN114038509A

CN114038509A - Disturbed pathway analysis method based on metabolite association network

Info

Publication number: CN114038509A
Application number: CN202111301560.7A
Authority: CN
Inventors: 董继扬; 吴昕玥; 邓伶莉
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2021-11-04
Filing date: 2021-11-04
Publication date: 2022-02-11

Abstract

A disturbed pathway analysis method based on a metabolite association network, belonging to the technical field of analysis. The method comprises the following steps: 1) analyzing the metabolome of a biological sample and preprocessing the data to obtain a metabolomics data matrix; 2) constructing a metabolite-metabolic pathway bipartite graph; 3) performing metabolite association network modeling on the metabolomics data matrix to obtain a difference network between an experimental group and a control group; 4) calculating the enrichment degree of the difference network on the bipartite graph; 5) ranking the pathways by importance, based on the differential metabolite network between the two groups of samples combined with information from a pathway database, and identifying potentially perturbed pathways; 6) performing a significance test on the potentially perturbed pathways to obtain the significantly perturbed metabolic pathways. The method uses differential network theory to analyze the differences between physiological states of an organism; the perturbed metabolic pathways screened for the experimental group are more reliable, and the method offers a new research approach for understanding the underlying mechanisms.

Description

Disturbed pathway analysis method based on metabolite association network

Technical Field

The invention belongs to the technical field of analysis, and particularly relates to a disturbed pathway analysis method based on a metabolite association network.

Background

During the development of a particular disease, many important metabolic pathways are perturbed. Identifying the perturbed metabolic pathways associated with a disease is of great significance for studying how diseases such as cancer develop, and provides important clues for exploring the pathogenesis of the disease and its drug targets (Y. Drier, M. Sheffer, E. Domany, Pathway-based personalized analysis of cancer, Proceedings of the National Academy of Sciences 110(16) (2013) 6388-6393).

The traditional pathway analysis method mainly involves the following steps: first, data preprocessing operations such as denoising, baseline correction and spectral peak alignment are performed on the acquired metabolic spectra of the biological samples; then, univariate or multivariate statistical analysis is performed on the resulting data matrix to screen differential metabolites; finally, pathway enrichment analysis is performed on the differential metabolites in combination with metabolic pathway databases (such as KEGG and HMDB) or literature data to obtain the perturbed metabolic pathways (F.M. Al-Akwaa, B. Yunits, S. Huang, H. Alhajaji, L.X. Garmire, Lilikoi: an R package for personalized pathway-based classification modeling using metabolomics data, GigaScience 7(12) (2018)). In recent years, many intelligent metabolic pathway analysis tools have emerged, such as Pathview (Luo W, Brouwer C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-).

Traditional univariate or multivariate analysis methods can only screen out the set of metabolites that differ between groups. However, metabolites are not independent of one another; they are interrelated and together form a biological metabolic network. In recent years, network analysis methods have been applied to metabolomics data. Constructing a metabolite association network, with metabolites as nodes and the associations between metabolites as edges, not only visualizes the structure and relationships in metabolomics data, but also allows the mechanisms latent in the network structure to be mined further by analyzing that structure with graph-theoretic methods (Toubiana D, Fernie AR, Nikoloski Z, et al. Network analysis: tackling complex data to study plant metabolism. Trends in Biotechnology. 2013 January 1; 31(1): 29-36; Belleggia R, Omranian N, Holtz Y, et al. Comparative analysis based on transcriptomics and metabolomics data reveal differences between emmer and durum wheat in response to nitrogen starvation. bioRxiv 2020.02.03.931717). Whether and how the network structure changes between disease states is generally of more interest than any particular network structure. Differential metabolite association network analysis can be used to analyze the differences in metabolic associations between physiological and pathological states; to date, however, there is no reported example of integrating a differential metabolite association network with a pathway database to analyze perturbed metabolic pathways.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a disturbed pathway analysis method based on a metabolite association network. The method uses differential network theory to analyze the differences between physiological states of an organism, and screens out the perturbed metabolic pathways associated with the experimental group more reliably.

The invention comprises the following steps:

1) analyzing the metabolome of a biological sample, and preprocessing the data to obtain a metabolomics data matrix;

2) constructing a metabolite-metabolic pathway bipartite graph;

3) performing metabolite association network modeling on the metabolomics data matrix to obtain a difference network between an experimental group and a control group;

4) calculating the enrichment degree of the difference network on the bipartite graph;

5) identifying potentially perturbed pathways;

6) performing a significance test on the potentially perturbed pathways to obtain the significantly perturbed metabolic pathways.

In step 1), the metabolome of the biological sample is analyzed by detecting the metabolites in the sample with an analytical instrument such as GC-MS, LC-MS or NMR to obtain metabolic spectrum data. After preprocessing operations such as denoising, baseline correction, spectral peak alignment and identification are performed on the metabolic spectra, metabolomics data matrices of the experimental group and the control group, suitable for subsequent statistical analysis, are obtained and denoted $X \in \mathbb{R}^{N_1 \times M}$ and $Y \in \mathbb{R}^{N_2 \times M}$ respectively, where $N_1$ and $N_2$ are the numbers of samples in the experimental group and the control group, and $M$ is the number of metabolites measured in the samples.

In step 2), the metabolite-metabolic pathway bipartite graph is constructed according to the membership of metabolites in metabolic pathways in the KEGG pathway database. The specific steps may be as follows: the M detected metabolites are mapped to the KEGG (http://www.kegg.jp) metabolic pathway database to obtain the membership relations between the M metabolites and P metabolic pathways, and a metabolite-metabolic pathway bipartite graph is constructed. The node set of this graph comprises the metabolite set and the pathway set, and its edge set is the collection of membership relations between metabolites and pathways.
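The construction in step 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the membership table `pathway_members` and the metabolite names are made-up toy data standing in for a KEGG lookup, and `build_bipartite` is a hypothetical helper name.

```python
from collections import defaultdict

def build_bipartite(pathway_members, detected):
    """Return an adjacency dict for the metabolite-pathway bipartite graph.

    Nodes are tagged tuples ("m", name) for metabolites and ("p", name)
    for pathways, so the two node sets can never collide.
    """
    adj = defaultdict(set)
    for pathway, members in pathway_members.items():
        for metabolite in members:
            if metabolite in detected:  # keep only detected metabolites
                adj[("m", metabolite)].add(("p", pathway))
                adj[("p", pathway)].add(("m", metabolite))
    return dict(adj)

# Toy membership table (hypothetical, for illustration only).
pathway_members = {
    "Purine metabolism": {"adenine", "guanine", "xanthine"},
    "Pyrimidine metabolism": {"uracil", "cytosine"},
}
detected = {"adenine", "xanthine", "uracil"}
graph = build_bipartite(pathway_members, detected)
print(sorted(graph))
```

Every edge joins one metabolite node to one pathway node, so any path between two metabolites alternates between the two node sets.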

In step 3), the difference network between the experimental group and the control group is represented as a weighted undirected graph. The weight matrix $\Theta$ corresponding to its edge set is the difference network, and it is obtained by minimizing the cost function of formula (1), in which $\Sigma_X$ and $\Sigma_Y$ are the covariance matrices of X and Y respectively, $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, $(\cdot)^T$ denotes the matrix transpose, $\|\cdot\|_1$ denotes the $L_1$ norm of a matrix, and $\lambda > 0$ is the sparsity constraint parameter.
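The cost function itself does not survive in this text. For orientation only, one well-known cost function built from exactly these ingredients (a trace term in $\Sigma_X$, $\Sigma_Y$ and $\Theta^T$, plus an $L_1$ penalty weighted by $\lambda$) is the lasso-penalized D-trace loss for differential networks. It is shown here as an assumption about the general form, not as the patent's formula (1):

```latex
\hat{\Theta} = \arg\min_{\Theta}\;
\tfrac{1}{2}\operatorname{tr}\!\left(\Theta^{T}\Sigma_X\,\Theta\,\Sigma_Y\right)
- \operatorname{tr}\!\left(\Theta^{T}(\Sigma_X - \Sigma_Y)\right)
+ \lambda\,\lVert\Theta\rVert_1
```

Under this assumed form and without the penalty, setting the gradient $\Sigma_X\Theta\Sigma_Y - (\Sigma_X - \Sigma_Y)$ to zero gives $\Theta = \Sigma_Y^{-1} - \Sigma_X^{-1}$, the difference of the two precision matrices. The $L_1$ term then shrinks most entries to zero, so that only the strongest between-group differences remain as edges.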

In step 4), calculating the enrichment degree of the difference network on the bipartite graph means calculating, for each pair of metabolites joined by an edge of the difference network, the shortest distance between the two metabolites on the metabolite-metabolic pathway bipartite graph, and defining an enrichment function of the difference network on that bipartite graph. The specific method may be as follows: the shortest distance between any two nodes of the bipartite graph is calculated; the shortest distances of the node pairs corresponding to the edges of the difference network are then obtained; and the enrichment degree of the difference network on the bipartite graph is defined by formula (2), in which the parameter $\gamma$ satisfies $0 < \gamma < 1$.
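The shortest-distance part of step 4) can be sketched with a breadth-first search. The enrichment function below is only a placeholder: the text does not give the form of formula (2), so a distance-discounted sum $\sum |\theta_{ij}|\,\gamma^{d_{ij}}$ is assumed here purely so that the example runs. All names and the toy graph are hypothetical.

```python
from collections import deque

def shortest_distance(adj, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def enrichment(adj, diff_edges, gamma=0.5):
    """ASSUMED stand-in for formula (2): sum of |weight| * gamma**distance.

    Edges whose endpoints are disconnected on the bipartite graph add 0.
    """
    total = 0.0
    for (a, b), weight in diff_edges.items():
        d = shortest_distance(adj, a, b)
        if d is not None:
            total += abs(weight) * gamma ** d
    return total

# Toy bipartite graph: pathway P1 = {m1, m2}, pathway P2 = {m2, m3}.
adj = {
    "m1": {"P1"}, "m2": {"P1", "P2"}, "m3": {"P2"},
    "P1": {"m1", "m2"}, "P2": {"m2", "m3"},
}
diff_edges = {("m1", "m2"): 1.0, ("m1", "m3"): -2.0}
print(shortest_distance(adj, "m1", "m2"))  # same pathway: 2 hops
print(shortest_distance(adj, "m1", "m3"))  # via m2: 4 hops
print(enrichment(adj, diff_edges, gamma=0.5))
```

On a bipartite graph, two metabolites that share a pathway are always exactly 2 hops apart, so with $0 < \gamma < 1$ any discount of this kind rewards difference-network edges that fall inside a single pathway.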

In step 5), the potentially perturbed pathways are identified by deleting pathway nodes from the metabolite-metabolic pathway bipartite graph one at a time, each time choosing the deletion that maximizes the enrichment function, until only one pathway node is left. The deletion order of the pathway nodes is recorded; the later a pathway is deleted, the more important it is, which yields the potentially perturbed pathways.

The specific method may be as follows. Consider the new bipartite graph obtained by deleting one pathway node k from the bipartite graph. The pathway whose deletion has the least influence on the enrichment degree of the difference network on the bipartite graph is computed by formula (3), giving the first least-influential pathway $k_1$ and the corresponding reduced bipartite graph. In the same way, the pathway with the least influence on the enrichment degree of the difference network on this reduced bipartite graph is computed, giving the second least-influential pathway $k_2$ and the corresponding bipartite graph. These steps are iterated to obtain the least-influential pathways $k_3, k_4, \ldots$ until only one pathway node remains on the bipartite graph; this last pathway node is denoted $k_P$. The later a pathway is deleted, the more important it is, so the R most important potentially perturbed pathways are the last R pathways in the deletion sequence, $k_P, k_{P-1}, \ldots, k_{P-R+1}$.
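The iterative deletion can be sketched as a greedy backward elimination. The function below is generic in the scoring function; `toy_score` is a deliberately simplified, hypothetical stand-in for the enrichment function of step 4), and the pathway and metabolite names are toy data.

```python
def backward_elimination(pathways, score):
    """Delete one pathway at a time, always the one whose removal keeps
    `score` highest (i.e. the least influential one).

    Returns the deletion order k_1, ..., k_P; later entries matter more.
    """
    remaining = list(pathways)
    order = []
    while len(remaining) > 1:
        # Candidate whose removal leaves the largest enrichment score.
        victim = max(
            remaining,
            key=lambda k: score([p for p in remaining if p != k]),
        )
        remaining.remove(victim)
        order.append(victim)
    order.append(remaining[0])  # the last survivor is k_P
    return order

# Toy data (hypothetical): pathway -> member metabolites.
members = {
    "A": {"m1", "m2"},
    "B": {"m2", "m3"},
    "C": {"m4", "m5"},
}
diff_edges = {("m1", "m2"): 1.0, ("m2", "m3"): 0.5}

def toy_score(kept):
    """Simplified stand-in for the enrichment function: total |weight| of
    difference edges whose endpoints share at least one kept pathway."""
    return sum(
        abs(w)
        for (a, b), w in diff_edges.items()
        if any(a in members[p] and b in members[p] for p in kept)
    )

order = backward_elimination(["A", "B", "C"], toy_score)
print(order)
```

In the toy data, pathway C contains no difference-network edge and is deleted first, while pathway A, which holds the heaviest edge, survives longest. Each round evaluates every remaining pathway, so P pathways need on the order of P² score evaluations.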

In step 6), the significance of the potentially perturbed pathways may be tested with a permutation test. Let pathway $k_i$ be a potentially perturbed pathway with ranking value r. The experimental group data X is split by columns (metabolites) into two parts: the sub-matrix of the metabolites belonging to pathway $k_i$, and the sub-matrix of all other metabolites. The control group data Y is split into two parts in the same way. Given a significance level α and a number of permutations N, pathway $k_i$ is tested as follows:

(1) randomly permute the samples (rows) of the pathway-$k_i$ sub-matrices of X and Y to obtain new data matrices X' and Y';

(2) compute the importance ranking of pathway $k_i$ on X' and Y';

(3) repeat the permutation experiment of steps (1) and (2) N times to obtain N ranking values of pathway $k_i$;

(4) take these N ranking values as the null distribution of the importance of pathway $k_i$, and compute the quantile $p_r$ of the true ranking value r on the null distribution by formula (4).

If $p_r \le \alpha$, pathway $k_i$ is called a significantly perturbed pathway. The R potentially perturbed pathways are tested one by one according to the above steps, and the non-significant pathways are deleted, giving the final set of significantly perturbed pathways.
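The permutation procedure can be sketched as below. Several points are assumptions, because the text does not spell them out: rows of the pathway sub-matrix are pooled over both groups before shuffling, the quantile $p_r$ is taken as the fraction of permuted ranks that are at least as small as the true rank, and `rank_fn` is a hypothetical callback that would rerun steps 3) to 5) on the permuted data.

```python
import random

def permute_pathway_rows(X, Y, cols, rng):
    """Shuffle the pathway sub-matrix rows across both groups.

    X, Y: lists of rows (samples); cols: column indices of the pathway's
    metabolites. Columns outside `cols` are left untouched.
    ASSUMPTION: rows are pooled over both groups before shuffling.
    """
    pooled = [[row[c] for c in cols] for row in X + Y]
    rng.shuffle(pooled)
    new_rows = []
    for row, sub in zip(X + Y, pooled):
        row = list(row)  # copy, so the input matrices are not modified
        for c, value in zip(cols, sub):
            row[c] = value
        new_rows.append(row)
    return new_rows[:len(X)], new_rows[len(X):]

def empirical_p(true_rank, null_ranks):
    """Fraction of permuted ranks at least as good (small) as the true rank."""
    return sum(r <= true_rank for r in null_ranks) / len(null_ranks)

def permutation_test(X, Y, cols, true_rank, rank_fn, n_perm=1000, seed=0):
    """rank_fn(X', Y') must return the pathway's importance rank (1 = best)."""
    rng = random.Random(seed)
    null_ranks = []
    for _ in range(n_perm):
        Xp, Yp = permute_pathway_rows(X, Y, cols, rng)
        null_ranks.append(rank_fn(Xp, Yp))
    return empirical_p(true_rank, null_ranks)

print(empirical_p(1, [1, 3, 2, 5, 4, 1, 2, 3, 4, 5]))  # 2 of 10 ranks <= 1
```

A pathway is then kept as significantly perturbed when the returned value does not exceed the chosen significance level α.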

Compared with the prior art, the invention has the following outstanding advantages:

Most traditional pathway enrichment methods screen differential metabolites with univariate or multivariate analysis and then, based on the screened differential metabolites, perform pathway enrichment analysis on a pathway analysis platform to screen perturbed metabolic pathways. No study, however, has screened perturbed metabolic pathways on the basis of a differential metabolite association network. The invention ranks pathways by importance based on the differential metabolite network between the two groups of samples, combined with information from a pathway database. Compared with the traditional methods, the invention uses differential network theory to analyze the differences between physiological states of an organism; the perturbed metabolic pathways screened for the experimental group are more reliable, and the method can offer a new research approach for understanding the underlying mechanisms.

Drawings

FIG. 1 is a system block diagram of an embodiment of the invention.

Fig. 2 shows the identification results for the significantly perturbed pathways. In the figure, one marker denotes $p_r < 0.05$ and the other denotes $p_r \le 0.2$. PBAB: Primary bile acid biosynthesis; APM: Arginine and proline metabolism; GDM: Glyoxylate and dicarboxylate metabolism; PYM: Pyrimidine metabolism; PUM: Purine metabolism.

Detailed Description

The following examples will further illustrate the present invention with reference to the accompanying drawings.

Referring to Fig. 1, the disturbed pathway analysis method based on the metabolite association network according to the invention includes the following steps: analyzing the metabolome of a biological sample with an analytical instrument such as GC-MS, LC-MS or NMR, and preprocessing the data to obtain the metabolomics data matrices of an experimental group and a control group; constructing a metabolite-metabolic pathway bipartite graph according to the membership of metabolites in metabolic pathways in the KEGG pathway database; performing metabolite association network modeling on the metabolomics data matrices to obtain a difference network between the experimental group and the control group; calculating, for the metabolite pairs corresponding to the edges of the difference network, their shortest distances on the metabolite-metabolic pathway bipartite graph, and defining an enrichment function of the difference network on the bipartite graph; deleting pathway nodes from the bipartite graph one at a time so as to maximize the enrichment function until only one pathway node is left, and recording the deletion order of the pathway nodes, where the later a pathway is deleted the more important it is, to obtain the potentially perturbed pathways; and performing a significance test on the potentially perturbed pathways with a permutation test to obtain the significantly perturbed metabolic pathways.

Specific examples are given below.

The embodiment of the invention comprises the following steps:

1. Obtaining the metabolomics data matrices

This example uses a published colorectal cancer (CRC) data set for analysis. The data set contains 158 human serum samples: 66 colorectal cancer samples and 92 healthy control samples. All CRC patients in the study were newly diagnosed, and blood samples were collected before surgery, chemotherapy or radiotherapy. The metabolites in the samples were detected with an LC-MS instrument, giving metabolic profile data containing 113 metabolites. The metabolic profiles were preprocessed and quantified with MultiQuant 2.1 software (AB Sciex, Toronto, ON, Canada). To correct for minor instrument drift, the metabolite concentrations were corrected using QC samples. Finally, the metabolomics data matrices of the colorectal cancer group and the healthy control group are denoted $X_{66 \times 113}$ and $Y_{92 \times 113}$ respectively.

2. Constructing the metabolite-metabolic pathway bipartite graph

The 113 detected metabolites were mapped to the KEGG (http://www.kegg.jp) metabolic pathway database, and a total of 85 metabolites were matched. The KEGG metabolic pathway database used contains 1549 endogenous metabolites (excluding most lipids) and 82 metabolic pathways. To ensure the reliability of the results, this example retains only metabolic pathways containing 3 or more of the metabolites. The result is a bipartite graph containing 76 metabolites and 30 metabolic pathways. Its node set comprises the metabolite set and the pathway set, and its edge set is the collection of membership relations between metabolites and pathways.
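The size filter described above can be sketched in a few lines. The data are toy values with hypothetical names, not the CRC data set, and `filter_pathways` is an illustrative helper rather than part of the described method.

```python
def filter_pathways(pathway_members, detected, min_size=3):
    """Keep pathways with at least `min_size` detected metabolites."""
    kept = {}
    for pathway, members in pathway_members.items():
        hits = members & detected
        if len(hits) >= min_size:
            kept[pathway] = hits
    return kept

# Toy data (hypothetical names, not the CRC data set).
pathway_members = {
    "P1": {"a", "b", "c", "d"},
    "P2": {"a", "e"},
    "P3": {"c", "d", "f", "g"},
}
detected = {"a", "b", "c", "d", "e"}
kept = filter_pathways(pathway_members, detected)
metabolites_in_graph = set().union(*kept.values()) if kept else set()
print(sorted(kept), sorted(metabolites_in_graph))
```

In the toy data, metabolite `e` is detected but belongs only to a discarded pathway, so it drops out of the graph. The same effect presumably explains why the 85 matched metabolites shrink to 76 once small pathways are removed.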

3. Metabolite association network modeling

Formula (1) is solved with the Alternating Direction Method of Multipliers (ADMM) to obtain the difference network between the colorectal cancer group and the healthy control group. In this embodiment, λ = 0.5, and the resulting network contains 25 edges.
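To illustrate how a sparse difference network and its edge count arise from two covariance matrices, the sketch below minimizes an $L_1$-penalized D-trace-type loss on toy data. Two substitutions are made and should be read as assumptions: the loss is the symmetrized D-trace loss, since formula (1) does not appear in this text, and the solver is plain proximal gradient descent (ISTA) rather than ADMM, chosen only because it is shorter. All function names are hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def soft_threshold(value, t):
    """Proximal operator of t * |value|."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0

def difference_network(Sx, Sy, lam, n_iter=500):
    """ISTA on the symmetrised D-trace loss with an L1 penalty.

    Smooth part (T' = transpose of T):
      0.25*tr(T' Sx T Sy) + 0.25*tr(T' Sy T Sx) - tr(T' (Sx - Sy)).
    ASSUMED loss; the patent's formula (1) is not reproduced in the text.
    """
    m = len(Sx)
    # trace(Sx)*trace(Sy) bounds the gradient's Lipschitz constant.
    step = 1.0 / (sum(Sx[i][i] for i in range(m)) *
                  sum(Sy[i][i] for i in range(m)))
    theta = [[0.0] * m for _ in range(m)]
    for _ in range(n_iter):
        a = matmul(matmul(Sx, theta), Sy)
        b = matmul(matmul(Sy, theta), Sx)
        theta = [
            [soft_threshold(
                theta[i][j] - step * (0.5 * (a[i][j] + b[i][j])
                                      - (Sx[i][j] - Sy[i][j])),
                step * lam)
             for j in range(m)]
            for i in range(m)
        ]
    return theta

def count_edges(theta, tol=1e-8):
    """Number of non-zero off-diagonal entries in the upper triangle."""
    m = len(theta)
    return sum(abs(theta[i][j]) > tol
               for i in range(m) for j in range(i + 1, m))

# Toy covariance matrices (hypothetical): only the (0, 1) association differs.
Sx = [[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]]
Sy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta = difference_network(Sx, Sy, lam=0.1)
print(count_edges(theta))  # number of edges kept at this lambda
```

Raising `lam` removes edges, and a large enough value empties the network entirely, which is the sense in which λ acts as a sparsity constraint.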

4. Calculating the enrichment degree of the difference network on the bipartite graph

The shortest distances of the node pairs corresponding to the edges of the difference network are computed on the bipartite graph with a shortest-path algorithm, and the enrichment degree of the difference network on the bipartite graph is then calculated according to formula (2). In this example, γ = 0.5.

5. Identifying potentially perturbed pathways

The first least-influential pathway $k_1$ is calculated according to formula (3), giving $k_1$ and the corresponding reduced bipartite graph. In the same way, the pathway with the least influence on the enrichment degree of the difference network on this reduced bipartite graph is computed, giving the second least-influential pathway $k_2$ and the corresponding bipartite graph. These steps are repeated to obtain the least-influential pathways $k_3, k_4, \ldots$ until only one pathway node remains on the bipartite graph; this last pathway node is denoted $k_{30}$. The later a pathway is deleted, the more important it is; this gives the importance ranking of all pathways, shown in Table 1.

Table 1. Pathway importance ranking (the smaller the ranking value r, the more important the pathway)

In this embodiment, the number of potentially perturbed pathways is R = 5, so the 5 most important potentially perturbed pathways are $k_{30}, k_{29}, k_{28}, k_{27}, k_{26}$.
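The relation between deletion order, ranking value r and the top-R selection can be made concrete as follows. This is a small sketch with hypothetical helper names; the labels `k1` to `k30` simply mirror the notation of this embodiment.

```python
def importance_ranks(deletion_order):
    """Map each pathway to its rank r; the last-deleted pathway gets r = 1."""
    total = len(deletion_order)
    return {k: total - i for i, k in enumerate(deletion_order)}

def top_pathways(deletion_order, R):
    """The R most important pathways, most important first."""
    return deletion_order[::-1][:R]

# Labels k1..k30 mirror the embodiment's notation (P = 30 pathways).
order = [f"k{i}" for i in range(1, 31)]
ranks = importance_ranks(order)
print(top_pathways(order, 5))
```

With this convention the pathway deleted last has r = 1, consistent with a smaller r meaning a more important pathway.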

6. Significance testing of perturbation pathways

Let pathway $k_i$ be a potentially perturbed pathway with ranking value r. The colorectal cancer group data $X_{66 \times 113}$ is split by columns (metabolites) into two parts: the sub-matrix of the metabolites belonging to pathway $k_i$, and the sub-matrix of the remaining metabolites. The healthy control group data $Y_{92 \times 113}$ is split into two parts in the same way.

In this example, the significance level is α = 0.2 and the number of permutations is N = 10000. Pathway $k_i$ is tested as follows:

(1) randomly permute the samples (rows) of the pathway-$k_i$ sub-matrices of X and Y to obtain new data matrices $X'_{66 \times 113}$ and $Y'_{92 \times 113}$;

(2) compute the importance ranking of pathway $k_i$ on $X'_{66 \times 113}$ and $Y'_{92 \times 113}$;

(3) repeat steps (1) and (2) 10000 times to obtain 10000 ranking values of pathway $k_i$;

(4) take these 10000 ranking values as the null distribution of the importance of pathway $k_i$, and compute the quantile $p_r$ of the true ranking value r on the null distribution according to formula (4). If $p_r \le 0.2$, pathway $k_i$ is called a significantly perturbed pathway.

The 5 potentially perturbed pathways are tested one by one according to the above steps, and the non-significant pathways are deleted, giving the final set of significantly perturbed pathways. The corresponding pathways are Primary bile acid biosynthesis ($k_{30}$), Arginine and proline metabolism ($k_{29}$) and Purine metabolism ($k_{26}$); the results are shown in Fig. 2. All 3 screened pathways have been reported in the literature to be closely related to colorectal cancer. This indicates that the method of this embodiment is effective in analyzing the perturbed metabolic pathways associated with diseases such as colorectal cancer, and can provide important clues and a basis for exploring drug targets.

The colorectal cancer study is only one preferred example chosen for the invention; the invention can equally be applied to pathway analysis of data from other diseases. The above description is only a preferred embodiment of the present invention and should not be taken as limiting its scope, which is defined by the appended claims and their equivalents.

Claims

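1. A disturbed pathway analysis method based on a metabolite association network, characterized by comprising the following steps:

1) analyzing the metabolome of a biological sample, and preprocessing the data to obtain a metabolomics data matrix;

2) constructing a metabolite-metabolic pathway bipartite graph;

3) performing metabolite association network modeling on the metabolomics data matrix to obtain a difference network between an experimental group and a control group;

4) calculating the enrichment degree of the difference network on the bipartite graph;

5) identifying potentially perturbed pathways;

6) performing a significance test on the potentially perturbed pathways to obtain the significantly perturbed metabolic pathways.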

2. The method for analyzing disturbed pathway based on metabolite association network as claimed in claim 1, wherein in step 1), the analysis of the metabolome in the biological sample is performed by detecting metabolites in the biological sample by using an analysis instrument such as GC-MS, LC-MS, NMR, etc. to obtain metabolic spectrum data.

3. The method for analyzing disturbed pathway based on metabolite association network as claimed in claim 1, wherein in step 1), the data preprocessing includes but is not limited to denoising, baseline correction, spectral peak alignment and identification; after the obtained metabolic spectra are preprocessed, metabolomics data matrices of the experimental group and the control group, suitable for subsequent statistical analysis, are obtained and denoted $X \in \mathbb{R}^{N_1 \times M}$ and $Y \in \mathbb{R}^{N_2 \times M}$ respectively, where $N_1$ and $N_2$ are the numbers of samples in the experimental group and the control group, and $M$ is the number of metabolites measured in the samples.

4. The method according to claim 1, wherein in step 2), the metabolite-metabolic pathway bipartite graph is constructed according to the membership of metabolites in metabolic pathways in the KEGG pathway database, the specific steps being as follows: the M detected metabolites are mapped to the KEGG (http://www.kegg.jp) metabolic pathway database to obtain the membership relations between the M metabolites and P metabolic pathways, and a metabolite-metabolic pathway bipartite graph is constructed, the node set of which comprises the metabolite set and the pathway set, and the edge set of which is the collection of membership relations between metabolites and pathways.

5. The method according to claim 1, wherein in step 3), the difference network between the experimental group and the control group is represented as a weighted undirected graph; the weight matrix $\Theta$ corresponding to its edge set is the difference network and is obtained by minimizing the cost function of formula (1), in which $\Sigma_X$ and $\Sigma_Y$ are the covariance matrices of X and Y respectively, $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, $(\cdot)^T$ denotes the matrix transpose, $\|\cdot\|_1$ denotes the $L_1$ norm of a matrix, and $\lambda > 0$ is the sparsity constraint parameter.

6. The method according to claim 1, wherein in step 4), the step of calculating the enrichment degree of the difference network on the bipartite graph is to calculate the shortest distance between the pairs of metabolites on the bipartite graph of the metabolite-metabolic pathway according to the metabolite pairs corresponding to the connecting edges of the difference network, and define an enrichment function of the difference network on the bipartite graph of the metabolite-metabolic pathway.

7. The method for analyzing disturbed pathway based on the metabolite association network according to claim 1, wherein in step 4), the specific method for calculating the enrichment degree of the difference network on the bipartite graph is as follows: the shortest distance between any two nodes i and j of the bipartite graph is calculated; the shortest distances of the node pairs corresponding to the edges of the difference network are then obtained; and the enrichment degree of the difference network on the bipartite graph is defined by formula (2), in which the parameter $\gamma$ satisfies $0 < \gamma < 1$.

8. The method according to claim 1, wherein in step 5), the potentially perturbed pathways are identified by deleting pathway nodes from the metabolite-metabolic pathway bipartite graph one at a time so as to maximize the enrichment function until only one pathway node is left, and recording the deletion order of the pathway nodes; the later a pathway is deleted, the more important it is, which yields the potentially perturbed pathways.

9. The method for analyzing disturbed pathway based on the metabolite association network as claimed in claim 1, wherein in step 5), the specific method for identifying the potentially perturbed pathways is as follows: consider the new bipartite graph obtained by deleting one pathway node k from the bipartite graph; the pathway whose deletion has the least influence on the enrichment degree of the difference network on the bipartite graph is computed by formula (3), giving the first least-influential pathway $k_1$ and the corresponding reduced bipartite graph; in the same way, the pathway with the least influence on the enrichment degree of the difference network on this reduced bipartite graph is computed, giving the second least-influential pathway $k_2$ and the corresponding bipartite graph; these steps are iterated to obtain the least-influential pathways $k_3, k_4, \ldots$ until only one pathway node remains on the bipartite graph, this last pathway node being denoted $k_P$; the later a pathway is deleted, the more important it is, so the R most important potentially perturbed pathways are the last R pathways in the deletion sequence, $k_P, k_{P-1}, \ldots, k_{P-R+1}$.

10. The method for analyzing disturbed pathway based on metabolite association network as claimed in claim 1, wherein in step 6), the significance test of the potentially perturbed pathways uses a permutation test: let pathway $k_i$ be a potentially perturbed pathway with ranking value r; the experimental group data X is divided into two parts according to the metabolites, namely the columns of the data matrix X