CN114038509A - Disturbed pathway analysis method based on metabolite association network - Google Patents


Publication number
CN114038509A
Authority
CN
China
Prior art keywords
pathway
metabolite
path
network
disturbed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111301560.7A
Other languages
Chinese (zh)
Inventor
董继扬
吴昕玥
邓伶莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202111301560.7A
Publication of CN114038509A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00: ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30: Data warehousing; Computing architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

A disturbed pathway analysis method based on a metabolite association network, belonging to the technical field of analysis. The method comprises the following steps: 1) analyzing the metabolome in a biological sample and preprocessing the data to obtain a metabolomics data matrix; 2) constructing a metabolite-metabolic pathway bipartite graph; 3) performing metabolite association network modeling on the metabolomics data matrix to obtain a difference network between an experimental group and a control group; 4) calculating the enrichment degree of the difference network on the bipartite graph; 5) ranking the pathways by importance, based on the differential metabolite network between the two groups of samples combined with information from a pathway database, and identifying potentially perturbed pathways; 6) performing a significance test on the potentially perturbed pathways to obtain the significantly perturbed metabolic pathways. The method uses differential network theory to analyze the differences between physiological states of an organism; the perturbed metabolic pathways it screens out for the experimental group are more reliable, and it offers a new line of research for understanding the underlying mechanisms in the organism.

Description

Disturbed pathway analysis method based on metabolite association network
Technical Field
The invention belongs to the technical field of analysis, and particularly relates to a disturbed pathway analysis method based on a metabolite association network.
Background
During the development of a particular disease, many important metabolic pathways are perturbed. Identifying the perturbed metabolic pathways associated with a specific disease is of great significance for studying the development of cancer, and provides important clues for exploring the pathogenesis and drug targets of a disease (Y. Drier, M. Sheffer, E. Domany, Pathway-based personalized analysis of cancer, Proceedings of the National Academy of Sciences 110(16) (2013) 6388-6393).
The traditional pathway analysis method mainly involves the following steps. First, the acquired metabolic spectra of the biological samples are preprocessed (denoising, baseline correction, spectral peak alignment and the like). Then, univariate or multivariate statistical analysis is applied to the resulting data matrix to screen differential metabolites. Finally, pathway enrichment analysis is performed on the differential metabolites using metabolic pathway databases (such as KEGG and HMDB) or literature data to obtain the perturbed metabolic pathways (F.M. Al-Akwaa, B. Yunits, S. Huang, H. Alhajaji, L.X. Garmire, Lilikoi: an R package for personalized pathway-based classification modeling using metabolomics data, GigaScience 7(12) (2018)). In recent years, many intelligent metabolic pathway analysis tools have emerged, such as Pathview (Luo W, Brouwer C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-).
Traditional univariate or multivariate analysis methods can only screen out sets of metabolites that differ between groups. Metabolites, however, are not independent of one another: they are interrelated and together form a biological metabolic network. In recent years, network analysis methods have been applied to metabolomics data. Constructing a metabolite association network, with metabolites as nodes and the associations between metabolites as edges, not only visualizes the structure and relationships in metabolomics data, but also allows the mechanisms latent in the network structure to be mined further with graph-theoretic methods (Toubiana D, Fernie AR, Nikoloski Z, et al. Network analysis: tackling complex data to study plant metabolism. Trends in Biotechnology. 2013; 31(1): 29-36; Belleggia R, Omranian N, Holtz Y, et al. bioRxiv 2020.02.03.931717). Whether and how the network structure changes between disease states is generally of more interest than any particular network structure. Differential metabolite association network analysis can be used to analyze differences in metabolic association under different physiological and pathological states, but to date there are no reported examples that integrate a differential metabolite association network with a pathway database to analyze perturbed metabolic pathways.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing a disturbed pathway analysis method based on a metabolite association network. The method uses differential network theory to analyze the differences between physiological states of an organism, and screens out the perturbed metabolic pathways related to the experimental group more reliably.
The invention comprises the following steps:
1) analyzing metabolome in a biological sample, and preprocessing data to obtain a metabonomic data matrix;
2) constructing a bipartite metabolite-metabolic pathway map;
3) carrying out metabolite association network modeling on the metabonomics data matrix to obtain a difference network between an experimental group and a control group;
4) calculating the enrichment degree of the difference network on the bipartite graph;
5) identifying potential disturbance pathways;
6) performing a significance test on the potentially perturbed pathways to obtain the significantly perturbed metabolic pathways.
In step 1), the metabolome in the biological sample is analyzed by detecting the metabolites in the sample with analytical instruments such as GC-MS, LC-MS or NMR to obtain metabolic spectrum data. After preprocessing operations such as denoising, baseline correction, spectral peak alignment and peak identification are applied to the metabolic spectra, metabolomics data matrices of the experimental group and the control group suitable for subsequent statistical analysis are obtained and denoted $X \in \mathbb{R}^{N_1 \times M}$ and $Y \in \mathbb{R}^{N_2 \times M}$, respectively, where $N_1$ and $N_2$ are the numbers of samples in the experimental group and the control group, and $M$ is the number of metabolites measured in each sample.
In step 2), the metabolite-metabolic pathway bipartite graph is constructed from the membership relations between metabolites and metabolic pathways in the KEGG pathway database. The specific steps can be as follows: the $M$ detected metabolites are mapped to the KEGG (http://www.kegg.jp) metabolic pathway database to obtain the membership relations between the $M$ metabolites and $P$ metabolic pathways, and the metabolite-metabolic pathway bipartite graph $G = (V, E)$ is constructed, where $V = V_m \cup V_p$ is the node set, consisting of the metabolite set $V_m$ and the pathway set $V_p$, and $E$ is the set of membership relations between metabolites and pathways.
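The construction of the bipartite graph can be sketched in a few lines of Python. This is a minimal illustration only: the function name `build_bipartite_graph`, the `min_size` parameter and the toy membership table are hypothetical and are not taken from the patent or from KEGG. The membership table is assumed to have been downloaded beforehand as a mapping from pathway to metabolite list.

```python
from collections import defaultdict


def build_bipartite_graph(pathway_members, detected_metabolites, min_size=1):
    """Return an adjacency dict for a metabolite-pathway bipartite graph.

    Nodes are tagged tuples ("m", name) or ("p", name) so that a metabolite
    and a pathway can never collide. An edge means "metabolite belongs to
    pathway". Pathways with fewer than `min_size` detected metabolites are
    dropped (hypothetical filter parameter).
    """
    detected = set(detected_metabolites)
    adjacency = defaultdict(set)
    for pathway, members in pathway_members.items():
        hits = detected & set(members)
        if len(hits) < min_size:
            continue  # pathway covers too few detected metabolites
        for metabolite in hits:
            adjacency[("m", metabolite)].add(("p", pathway))
            adjacency[("p", pathway)].add(("m", metabolite))
    return dict(adjacency)


# Hypothetical toy membership table (not real KEGG content).
toy_pathways = {
    "P1": ["a", "b", "c"],
    "P2": ["c", "d"],
    "P3": ["e"],
}
graph = build_bipartite_graph(toy_pathways, ["a", "b", "c", "d"], min_size=2)
pathway_nodes = sorted(name for kind, name in graph if kind == "p")
```

In this toy case pathway `P3` is dropped because none of its metabolites were detected, and metabolite `c` links the two remaining pathways.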
In step 3), the difference network between the experimental group and the control group is represented as a weighted undirected graph $G_\Theta = (V_m, E_\Theta)$ whose nodes are the metabolites. The weight matrix $\Theta$ corresponding to the edge set $E_\Theta$ of the difference network is obtained by minimizing the cost function of formula (1):
[Formula (1): cost function, image not recovered]
where $\Sigma_X$ and $\Sigma_Y$ are the covariance matrices of $X$ and $Y$, respectively, $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $(\cdot)^{T}$ denotes the matrix transpose, $\|\cdot\|_1$ denotes the $\ell_1$ norm of a matrix, and $\lambda > 0$ is the sparsity constraint parameter.
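The image of the cost function is not reproduced in this text, so its exact form cannot be confirmed here. One published objective built from the same ingredients (two covariance matrices, a trace, a transpose and an $\ell_1$ penalty) is the lasso-penalized D-trace loss for differential networks. The sketch below evaluates that loss purely as an assumption about the general shape of formula (1); the function names are hypothetical and the code uses plain Python lists instead of a numerical library.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def covariance(data):
    """Sample covariance of a samples-by-metabolites matrix (rows = samples)."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def dtrace_objective(theta, sigma_x, sigma_y, lam):
    """ASSUMED objective, not confirmed by the source:
    1/2 tr(T' Sx T Sy) - tr(T' (Sx - Sy)) + lam * ||T||_1
    """
    theta_t = transpose(theta)
    quad = 0.5 * trace(matmul(matmul(matmul(theta_t, sigma_x), theta), sigma_y))
    diff = [[sx - sy for sx, sy in zip(rx, ry)]
            for rx, ry in zip(sigma_x, sigma_y)]
    lin = trace(matmul(theta_t, diff))
    l1 = sum(abs(v) for row in theta for v in row)
    return quad - lin + lam * l1


cov = covariance([[1.0, 2.0], [3.0, 6.0]])
value = dtrace_objective([[0.5]], [[2.0]], [[1.0]], 0.1)
```

Under this assumed sign convention, the unpenalized part is minimized at $\Theta = \Sigma_Y^{-1} - \Sigma_X^{-1}$, i.e. the difference of the two precision matrices, and the $\ell_1$ term pushes small entries to zero so that only a few edges remain.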
In step 4), the enrichment degree of the difference network on the bipartite graph is calculated as follows: for the metabolite pair corresponding to each edge of the difference network, the shortest distance between the two metabolites on the metabolite-metabolic pathway bipartite graph is computed, and an enrichment function of the difference network on the metabolite-metabolic pathway bipartite graph is defined. The specific method may be:
Let $d_{ij}$ denote the shortest distance between two nodes $i$ and $j$ on the bipartite graph $G$. Compute the shortest distance $d_{ij}$ for the node pair $(i, j)$ of every edge of the difference network $G_\Theta$, and define the enrichment degree of the difference network $G_\Theta$ on the bipartite graph $G$ according to formula (2):
[Formula (2): enrichment function, image not recovered]
where $0 < \gamma < 1$.
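The distance computation can be sketched with a breadth-first search, which gives shortest distances on an unweighted graph. The enrichment formula itself is not reproduced in this text, so the score below is an assumption chosen only because it uses a decay parameter between 0 and 1: each difference-network edge contributes `gamma ** distance`, and unreachable pairs contribute nothing. All names and the toy graph are hypothetical.

```python
import math
from collections import deque


def shortest_distance(adjacency, source, target):
    """Breadth-first search; returns math.inf when no path exists."""
    if source not in adjacency or target not in adjacency:
        return math.inf
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return math.inf


def enrichment(adjacency, diff_edges, gamma=0.5):
    """ASSUMED score: sum of gamma ** distance over difference-network edges."""
    total = 0.0
    for i, j in diff_edges:
        d = shortest_distance(adjacency, i, j)
        if d != math.inf:
            total += gamma ** d
    return total


# Toy bipartite graph: metabolites a-d, pathways P1 and P2 (hypothetical).
adjacency = {
    "a": {"P1"}, "b": {"P1"}, "c": {"P1", "P2"}, "d": {"P2"},
    "P1": {"a", "b", "c"}, "P2": {"c", "d"},
}
d_ab = shortest_distance(adjacency, "a", "b")
d_ad = shortest_distance(adjacency, "a", "d")
score = enrichment(adjacency, [("a", "b"), ("a", "d")], gamma=0.5)
```

Two metabolites in the same pathway are at distance 2 on the bipartite graph (metabolite, pathway, metabolite), so under this assumed score edges whose endpoints share a pathway contribute the most.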
In step 5), the potentially perturbed pathways are identified by successively deleting one pathway node at a time from the metabolite-metabolic pathway bipartite graph so as to maximize the enrichment function, until only the last pathway node is left, and recording the order in which the pathway nodes are deleted; the later a pathway is deleted, the more important it is, which yields the potentially perturbed pathways.
The specific method can be as follows: denote by $G_{-k}$ the new bipartite graph obtained by deleting one pathway node $k$ from the bipartite graph $G$. The pathway $k_1$ with the minimum influence of the difference network $G_\Theta$ on the bipartite graph $G$ is computed according to formula (3):
[Formula (3): selection of the minimum-influence pathway, image not recovered]
This gives the 1st minimum-influence pathway $k_1$ and the corresponding bipartite graph $G_{-k_1}$. In the same way, the minimum-influence pathway of the difference network $G_\Theta$ on the bipartite graph $G_{-k_1}$ is computed, giving the 2nd minimum-influence pathway $k_2$ and the corresponding bipartite graph $G_{-k_1,-k_2}$. Iterating these steps gives the minimum-influence pathways $k_3, k_4, \dots$ until only one pathway node remains on the bipartite graph; this last pathway node is denoted $k_P$. The later a pathway is deleted, the more important it is, so the $R$ most important potentially perturbed pathways are $\{k_P, k_{P-1}, \dots, k_{P-R+1}\}$.
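The successive deletion described above is a greedy backward elimination, and can be sketched independently of the exact enrichment formula. In the sketch below, `score` is any callable that returns the enrichment of the difference network on the bipartite graph restricted to the pathways that are kept. The toy score used here (the number of difference-network edges whose two metabolites share a kept pathway) is a hypothetical stand-in, not the patent's enrichment function.

```python
def rank_pathways(pathways, score):
    """Greedy backward elimination of pathway nodes.

    In each round, remove the pathway whose deletion leaves the score
    highest (the "minimum influence" pathway). Returns the pathways in
    deletion order k_1 ... k_P; the last entries are the most important.
    """
    remaining = list(pathways)
    deletion_order = []
    while len(remaining) > 1:
        least_influential = max(
            remaining,
            key=lambda k: score([p for p in remaining if p != k]),
        )
        remaining.remove(least_influential)
        deletion_order.append(least_influential)
    deletion_order.extend(remaining)  # the final surviving pathway is k_P
    return deletion_order


def top_pathways(deletion_order, r):
    """The r most important pathways: k_P, k_(P-1), ..., k_(P-r+1)."""
    return deletion_order[::-1][:r]


# Hypothetical toy data: pathway membership and difference-network edges.
members = {"P1": {"a", "b", "c"}, "P2": {"c", "d"}, "P3": {"e", "f"}}
diff_edges = [("a", "b"), ("b", "c"), ("c", "d")]


def toy_score(kept):
    # Stand-in enrichment: edges whose endpoints share a kept pathway.
    return sum(1 for i, j in diff_edges
               if any(i in members[p] and j in members[p] for p in kept))


order = rank_pathways(["P1", "P2", "P3"], toy_score)
best_two = top_pathways(order, 2)
```

Here `P3` is deleted first because removing it costs nothing, and `P1`, which explains the most difference-network edges, survives to the end and receives rank value 1. Ties are broken by list order in this sketch; the source does not say how ties are handled.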
In step 6), the significance of the potentially perturbed pathways can be tested with a permutation test. Let pathway $k_i$ be a potentially perturbed pathway with rank value $r$. The experimental group data $X$ is split by columns (metabolites) into two parts, $X = [X_{k_i}, X_{\bar{k}_i}]$, where $X_{k_i}$ is the sub-matrix corresponding to pathway $k_i$ and $X_{\bar{k}_i}$ is the sub-matrix of the metabolites outside pathway $k_i$. The control group data is likewise written in two parts, $Y = [Y_{k_i}, Y_{\bar{k}_i}]$. Given a significance level $\alpha$ and a number of permutations $N$, the significance of pathway $k_i$ is tested with the following procedure:
(1) randomly permute the samples (rows) of $X_{k_i}$ and $Y_{k_i}$ to obtain new data matrices $X'$ and $Y'$;
(2) compute the importance rank of pathway $k_i$ on $X'$ and $Y'$, denoted $r'$;
(3) repeat the permutation experiment of steps (1) and (2) $N$ times to obtain $N$ rank values of pathway $k_i$, $\{r'_1, r'_2, \dots, r'_N\}$;
(4) take $\{r'_1, r'_2, \dots, r'_N\}$ as the null distribution of the importance of pathway $k_i$, and compute the quantile $p_r$ of the true rank value $r$ on this null distribution according to formula (4):
[Formula (4): quantile $p_r$, image not recovered]
If $p_r \le \alpha$, pathway $k_i$ is called a significantly perturbed pathway. Following these steps, the metabolic pathways in $\{k_P, k_{P-1}, \dots, k_{P-R+1}\}$ are tested one by one and the non-significant pathways are deleted, giving the final set of significantly perturbed pathways.
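The bookkeeping of the permutation test can be sketched as below. Two points are assumptions rather than statements of the source: the rows of the pathway sub-matrix are shuffled independently of the remaining columns (one possible reading of step (1)), and the quantile is taken as the share of null rank values that are less than or equal to the observed rank (smaller rank meaning more important). The function names are hypothetical, and the recomputation of the rank on permuted data is left out because it depends on the full pipeline.

```python
import random


def permute_rows(block, rng):
    """Return a copy of a samples-by-metabolites block with rows shuffled."""
    shuffled = [list(row) for row in block]
    rng.shuffle(shuffled)
    return shuffled


def recombine(pathway_block, other_block):
    """Column-wise concatenation [pathway columns | remaining columns]."""
    return [p + o for p, o in zip(pathway_block, other_block)]


def empirical_p_value(observed_rank, null_ranks):
    """ASSUMED quantile: share of null ranks that are <= the observed rank."""
    return sum(1 for r in null_ranks if r <= observed_rank) / len(null_ranks)


def is_significant(observed_rank, null_ranks, alpha=0.2):
    return empirical_p_value(observed_rank, null_ranks) <= alpha


rng = random.Random(0)  # fixed seed so the sketch is reproducible
x_pathway = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # toy X_{k_i}
x_other = [[10.0], [20.0], [30.0]]                  # toy remaining columns
x_perm = recombine(permute_rows(x_pathway, rng), x_other)

null_ranks = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]    # toy null distribution
p_good = empirical_p_value(3, null_ranks)
p_poor = empirical_p_value(9, null_ranks)
```

Shuffling only the pathway columns breaks the association between the pathway's metabolites and the group-specific structure of the rest of the data while keeping each column's marginal distribution, which is what makes the recomputed ranks usable as a null reference under this reading.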
Compared with the prior art, the invention has the following outstanding advantages:
Most traditional pathway enrichment methods screen differential metabolites with univariate or multivariate analysis and then run pathway enrichment analysis on those metabolites with a pathway analysis platform in order to screen perturbed metabolic pathways. No study, however, has screened perturbed metabolic pathways on the basis of a differential metabolite association network. The invention ranks the pathways by importance based on the differential metabolite network between two groups of samples, combined with information from a pathway database. Compared with the traditional methods, the method of the invention uses differential network theory to analyze the differences between physiological states of an organism; the perturbed metabolic pathways it screens out for the experimental group are more reliable, and it can offer a new line of research for understanding the underlying mechanisms in the organism.
Drawings
FIG. 1 is a system block diagram of an embodiment of the invention.
FIG. 2 shows the identification results for the significantly perturbed pathways. The markers in the figure denote $p_r < 0.05$ and $p_r \le 0.2$, respectively. PBAB: Primary bile acid biosynthesis; APM: Arginine and proline metabolism; GDM: Glyoxylate and dicarboxylate metabolism; PYM: Pyrimidine metabolism; PUM: Purine metabolism.
Detailed Description
The following examples will further illustrate the present invention with reference to the accompanying drawings.
Referring to FIG. 1, the disturbed pathway analysis method based on the metabolite association network according to the invention includes the following steps: analyzing the metabolome in a biological sample with analytical instruments such as GC-MS, LC-MS or NMR, and preprocessing the data to obtain the metabolomics data matrices of an experimental group and a control group; constructing a metabolite-metabolic pathway bipartite graph from the membership relations between metabolites and metabolic pathways in the KEGG pathway database; performing metabolite association network modeling on the metabolomics data matrices to obtain a difference network between the experimental group and the control group; computing, for the metabolite pair corresponding to each edge of the difference network, the shortest distance between the two metabolites on the metabolite-metabolic pathway bipartite graph, and defining an enrichment function of the difference network on the bipartite graph; successively deleting one pathway node at a time from the bipartite graph so as to maximize the enrichment function, until only the last pathway node is left, and recording the deletion order of the pathway nodes, the later a pathway is deleted the more important it being, to obtain the potentially perturbed pathways; and performing a significance test on the potentially perturbed pathways with a permutation test to obtain the significantly perturbed metabolic pathways.
Specific examples are given below.
The embodiment of the invention comprises the following steps:
1. obtaining metabolomics data matrices
This embodiment analyzes a published colorectal cancer (CRC) data set. The data set contains 158 human serum samples: 66 colorectal cancer samples and 92 healthy control samples. All CRC patients in the experiment were newly diagnosed, and blood samples were collected before surgery, chemotherapy or radiotherapy. Metabolites in the biological samples were detected with an LC-MS analyzer, giving metabolic spectrum data containing 113 metabolites. The metabolic profiles were preprocessed and quantified with MultiQuant 2.1 software (AB Sciex, Toronto, ON, Canada). To correct for minor instrument drift, the metabolite concentrations were corrected using QC samples. Finally, the metabolomics data matrices of the colorectal cancer group and the healthy control group are denoted $X_{66 \times 113}$ and $Y_{92 \times 113}$, respectively.
2. Construction of a bipartite metabolite-metabolic pathway
The 113 detected metabolites were mapped to the KEGG (http://www.kegg.jp) metabolic pathway database, and a total of 85 metabolites were matched. The KEGG metabolic pathway database used contains 1549 endogenous metabolites (excluding most lipids) and 82 metabolic pathways. To ensure the reliability of the results, this embodiment retains only metabolic pathways containing 3 or more metabolites. The resulting bipartite graph contains 76 metabolites and 30 metabolic pathways and is denoted $G = (V, E)$, where $V = V_m \cup V_p$ is the node set, consisting of the metabolite set $V_m$ and the pathway set $V_p$, and $E$ is the set of membership relations between metabolites and pathways.
3. Metabolite association network modeling
Formula (1) is solved with the Alternating Direction Method of Multipliers (ADMM) to obtain the difference network $G_\Theta$ between the colorectal cancer group and the healthy control group. In this embodiment, $\lambda = 0.5$, and the resulting network contains 25 edges.
[Formula (1): cost function, image not recovered]
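When ADMM is applied to an $\ell_1$-penalized objective, the update that produces sparsity is elementwise soft-thresholding, the proximal operator of the $\ell_1$ norm. The sketch below shows only that operator and how the edges of a resulting network could be counted; it is not the full solver, the threshold is typically $\lambda$ divided by the ADMM penalty parameter, and the function names and the toy matrix are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |.| applied to a scalar."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def soft_threshold_matrix(matrix, threshold):
    """Apply soft-thresholding to every entry of a list-of-lists matrix."""
    return [[soft_threshold(v, threshold) for v in row] for row in matrix]


def count_edges(theta, tol=1e-12):
    """Number of nonzero off-diagonal entries in the upper triangle."""
    m = len(theta)
    return sum(1 for i in range(m) for j in range(i + 1, m)
               if abs(theta[i][j]) > tol)


# Toy symmetric matrix standing in for an intermediate ADMM iterate.
dense = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.6],
    [-0.2, 0.6, 1.0],
]
sparse = soft_threshold_matrix(dense, 0.5)
n_edges = count_edges(sparse)
```

Entries with magnitude below the threshold are set exactly to zero, which is why a larger $\lambda$ gives a difference network with fewer edges.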
4. Calculating the enrichment degree of the difference network on the bipartite graph
A shortest-path algorithm is used to compute the shortest distance $d_{ij}$ on the bipartite graph $G$ for the node pair $(i, j)$ of every edge of the difference network $G_\Theta$. The enrichment degree of the difference network $G_\Theta$ on the bipartite graph $G$ is then calculated according to formula (2). In this embodiment, $\gamma = 0.5$.
[Formula (2): enrichment function, image not recovered]
5. Identifying potentially perturbed pathways
The first minimum-influence pathway $k_1$ is calculated according to formula (3), giving $k_1$ and the corresponding bipartite graph $G_{-k_1}$. In the same way, the minimum-influence pathway of the difference network $G_\Theta$ on the bipartite graph $G_{-k_1}$ is calculated, giving the 2nd minimum-influence pathway $k_2$ and the corresponding bipartite graph $G_{-k_1,-k_2}$. These steps are repeated to obtain the minimum-influence pathways $k_3, k_4, \dots$ until only one pathway node remains on the bipartite graph; this last pathway node is denoted $k_{30}$. The later a pathway is deleted, the more important it is, which gives the importance ranking of all pathways; the results are shown in Table 1.
Table 1. Pathway importance ranking
[Table image not recovered]
Note: the smaller the rank value r, the more important the pathway.
In this embodiment, the number of potentially perturbed pathways is $R = 5$, and the 5 most important potentially perturbed pathways are $\{k_{30}, k_{29}, k_{28}, k_{27}, k_{26}\}$.
6. Significance testing of perturbation pathways
Let pathway $k_i$ be a potentially perturbed pathway with rank value $r$. The colorectal cancer group data $X_{66 \times 113}$ is split by columns (metabolites) into two parts, $X = [X_{k_i}, X_{\bar{k}_i}]$, where $X_{k_i}$ is the sub-matrix corresponding to pathway $k_i$ and $X_{\bar{k}_i}$ is the sub-matrix of the metabolites outside pathway $k_i$. The healthy control group data is likewise written in two parts, $Y = [Y_{k_i}, Y_{\bar{k}_i}]$. In this embodiment, the significance level is $\alpha = 0.2$ and the number of permutations is $N = 10000$. The significance of pathway $k_i$ is tested with the following procedure:
(1) randomly permute the samples (rows) of $X_{k_i}$ and $Y_{k_i}$ to obtain new data matrices $X'_{66 \times 113}$ and $Y'_{92 \times 113}$;
(2) compute the importance rank of pathway $k_i$ on $X'_{66 \times 113}$ and $Y'_{92 \times 113}$, denoted $r'$;
(3) repeat steps (1) and (2) 10000 times to obtain 10000 rank values of pathway $k_i$, $\{r'_1, r'_2, \dots, r'_{10000}\}$;
(4) take $\{r'_1, r'_2, \dots, r'_{10000}\}$ as the null distribution of the importance of pathway $k_i$, and compute the quantile $p_r$ of the true rank value $r$ on this null distribution according to formula (4). If $p_r \le 0.2$, pathway $k_i$ is called a significantly perturbed pathway.
[Formula (4): quantile $p_r$, image not recovered]
Following the above steps, the metabolic pathways in $\{k_{30}, k_{29}, k_{28}, k_{27}, k_{26}\}$ are tested one by one and the non-significant pathways are deleted, giving the final set of significantly perturbed pathways $\{k_{30}, k_{29}, k_{26}\}$. The corresponding pathways are Primary bile acid biosynthesis ($k_{30}$), Arginine and proline metabolism ($k_{29}$) and Purine metabolism ($k_{26}$); the results are shown in FIG. 2. All 3 significantly perturbed pathways screened out have been reported in the literature to be closely related to colorectal cancer, which indicates that the method of this embodiment is effective for analyzing perturbed pathways in metabolism-related diseases such as colorectal cancer, and can provide important clues and a basis for exploring drug targets.
The colorectal cancer study is only one preferred example chosen for the invention; the invention can also be applied to pathway analysis of data from other diseases. The above description is only a preferred embodiment of the invention and should not be taken as limiting its scope, which is defined by the appended claims and their equivalents.

Claims (10)

1. A disturbed pathway analysis method based on a metabolite association network is characterized by comprising the following steps:
1) analyzing metabolome in a biological sample, and preprocessing data to obtain a metabonomic data matrix;
2) constructing a bipartite metabolite-metabolic pathway map;
3) carrying out metabolite association network modeling on the metabonomics data matrix to obtain a difference network between an experimental group and a control group;
4) calculating the enrichment degree of the difference network on the bipartite graph;
5) identifying potential disturbance pathways;
6) performing a significance test on the potentially perturbed pathways to obtain the significantly perturbed metabolic pathways.
2. The method for analyzing disturbed pathway based on metabolite association network as claimed in claim 1, wherein in step 1), the analysis of the metabolome in the biological sample is performed by detecting metabolites in the biological sample by using an analysis instrument such as GC-MS, LC-MS, NMR, etc. to obtain metabolic spectrum data.
3. The method for analyzing disturbed pathway based on metabolite association network as claimed in claim 1, wherein in step 1), the data preprocessing includes but is not limited to denoising, baseline correction, spectral peak alignment and identification; after the obtained metabolic spectra are preprocessed, metabolomics data matrices of an experimental group and a control group suitable for subsequent statistical analysis are obtained and denoted $X \in \mathbb{R}^{N_1 \times M}$ and $Y \in \mathbb{R}^{N_2 \times M}$, respectively, where $N_1$ and $N_2$ are the numbers of samples in the experimental group and the control group, and $M$ is the number of metabolites measured in each sample.
4. The method according to claim 1, wherein in step 2), the metabolite-metabolic pathway bipartite graph is constructed from the membership relations between metabolites and metabolic pathways in the KEGG pathway database, the specific steps being as follows: mapping the $M$ detected metabolites to the KEGG (http://www.kegg.jp) metabolic pathway database to obtain the membership relations between the $M$ metabolites and $P$ metabolic pathways, and constructing the metabolite-metabolic pathway bipartite graph $G = (V, E)$, where $V = V_m \cup V_p$ is the node set, consisting of the metabolite set $V_m$ and the pathway set $V_p$, and $E$ is the set of membership relations between metabolites and pathways.
5. The method according to claim 1, wherein in step 3), the difference network between the experimental group and the control group is represented as a weighted undirected graph $G_\Theta = (V_m, E_\Theta)$, and the weight matrix $\Theta$ corresponding to the edge set $E_\Theta$ of the difference network is obtained by minimizing the cost function of formula (1):
[Formula (1): cost function, image not recovered]
where $\Sigma_X$ and $\Sigma_Y$ are the covariance matrices of $X$ and $Y$, respectively, $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $(\cdot)^{T}$ denotes the matrix transpose, $\|\cdot\|_1$ denotes the $\ell_1$ norm of a matrix, and $\lambda > 0$ is the sparsity constraint parameter.
6. The method according to claim 1, wherein in step 4), the step of calculating the enrichment degree of the difference network on the bipartite graph is to calculate the shortest distance between the pairs of metabolites on the bipartite graph of the metabolite-metabolic pathway according to the metabolite pairs corresponding to the connecting edges of the difference network, and define an enrichment function of the difference network on the bipartite graph of the metabolite-metabolic pathway.
7. The method for analyzing disturbed pathway based on the metabolite association network according to claim 1, wherein in step 4), the specific method for calculating the enrichment degree of the difference network on the bipartite graph is as follows:
let $d_{ij}$ denote the shortest distance between two nodes $i$ and $j$ on the bipartite graph $G$; compute the shortest distance $d_{ij}$ for the node pair $(i, j)$ of every edge of the difference network $G_\Theta$, and define the enrichment degree of the difference network $G_\Theta$ on the bipartite graph $G$ according to formula (2):
[Formula (2): enrichment function, image not recovered]
where $0 < \gamma < 1$.
8. The method according to claim 1, wherein in step 5), the potentially perturbed pathways are identified by successively deleting one pathway node at a time from the metabolite-metabolic pathway bipartite graph so as to maximize the enrichment function, until only the last pathway node is left, and recording the deletion order of the pathway nodes; the later a pathway is deleted, the more important it is, thereby obtaining the potentially perturbed pathways.
9. The method for analyzing disturbed pathways based on the metabolite association network as claimed in claim 1, wherein in step 5), the potential disturbed pathways are identified as follows:
the new bipartite graph obtained by deleting one pathway node [Figure FDA00033386002600000213] from the bipartite graph [Figure FDA00033386002600000212] is denoted [Figure FDA00033386002600000214];
the minimum-influence pathway k_1 of the difference network [Figure FDA00033386002600000215] with respect to the bipartite graph [Figure FDA00033386002600000216] is computed using the following equation: [Figure FDA00033386002600000217];
this gives the 1st minimum-influence pathway k_1 and the corresponding bipartite graph [Figure FDA00033386002600000218];
the same calculation is applied to the difference network [Figure FDA00033386002600000219] and the bipartite graph [Figure FDA00033386002600000220] to obtain the 2nd minimum-influence pathway k_2 and the corresponding bipartite graph [Figure FDA00033386002600000221];
these steps are iterated to obtain the minimum-influence pathways k_3, k_4, … until only one pathway node remains on the bipartite graph, and this last pathway node is denoted k_P. The later a pathway is deleted, the more important it is; the R most important potential disturbed pathways are [Figure FDA0003338600260000031].
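The backward elimination described in claims 8 and 9 can be sketched as follows. This is an illustration only: the selection formula is an image in the source, so the sketch assumes that the "minimum-influence" pathway is the one whose removal leaves the enrichment score highest, as claim 8 suggests. The function name `rank_pathways` and the pluggable `score` callable are hypothetical, and ties are broken alphabetically as an arbitrary choice.

```python
def rank_pathways(pathways, score):
    """Rank pathways by greedy backward elimination.

    pathways: {pathway_name: set_of_metabolites}
    score:    callable taking such a dict and returning an enrichment value
              (any enrichment function can be plugged in).
    Returns pathway names ordered from most to least important, where a
    pathway deleted later is considered more important.
    """
    remaining = dict(pathways)
    deletion_order = []
    while len(remaining) > 1:
        # Delete the pathway whose removal keeps the score highest,
        # i.e. the one with the least influence on enrichment.
        least_influential = max(
            sorted(remaining),
            key=lambda k: score(
                {p: m for p, m in remaining.items() if p != k}
            ),
        )
        deletion_order.append(least_influential)
        del remaining[least_influential]
    deletion_order.extend(remaining)  # the last surviving pathway, k_P
    return deletion_order[::-1]


def top_r(pathways, score, r):
    """The R most important potential disturbed pathways."""
    return rank_pathways(pathways, score)[:r]
```

Each round evaluates the score once per remaining pathway, so ranking P pathways needs on the order of P² score evaluations.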
10. The method for analyzing disturbed pathways based on the metabolite association network as claimed in claim 1, wherein in step 6), the significance of the potential disturbed pathways is tested with a permutation test. Let pathway [Figure FDA0003338600260000032] be a potential disturbed pathway with rank value r. The experimental group data X is divided into two parts by metabolite, i.e. by the columns of the data matrix X: [Figure FDA0003338600260000033], where [Figure FDA0003338600260000034] denotes the sub-matrix corresponding to pathway k_i and [Figure FDA00033386002600000316] denotes the sub-matrix of everything outside pathway k_i; the control group data is likewise divided into two parts: [Figure FDA0003338600260000036]. Given a significance level α and a number of permutations N, the significance of pathway k_i is tested as follows:
(1) randomly permute the samples of [Figure FDA0003338600260000037] and [Figure FDA0003338600260000038], i.e. the rows of the data matrices [Figure FDA0003338600260000039] and [Figure FDA00033386002600000310], to obtain new data matrices X′ and Y′;
(2) compute the importance rank of pathway k_i on X′ and Y′, denoted [Figure FDA00033386002600000311];
(3) repeat the permutation experiment of steps (1) and (2) N times to obtain N rank values of pathway k_i: [Figure FDA00033386002600000312];
(4) take [Figure FDA00033386002600000313] as the null distribution of the importance of pathway k_i, and compute the quantile p_r of the true rank value r on this null distribution: [Figure FDA00033386002600000314].
If p_r ≤ α, pathway k_i is called a significantly disturbed pathway. Following the above steps, the significance of each metabolic pathway in [Figure FDA00033386002600000315] is tested one by one, and the non-significant pathways are deleted to obtain the final set of significantly disturbed pathways.
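The two mechanical pieces of the permutation test in claim 10 can be sketched as follows. The exact permutation scheme and the expression for p_r are images in the source, so both helpers rest on assumptions: `permute_block_rows` shuffles the rows of one column block within a single matrix, and `permutation_p_value` takes p_r to be the fraction of permuted ranks at least as good as the true rank, with rank 1 meaning most important. All names are hypothetical.

```python
import random


def permute_block_rows(matrix, cols, rng):
    """Shuffle the rows of the column block `cols` across samples.

    matrix: list of rows (samples), each a list of metabolite values.
    cols:   column indices belonging to the pathway under test.
    Columns outside `cols` keep their original sample order, which breaks
    the association between the pathway's metabolites and the rest.
    """
    order = list(range(len(matrix)))
    rng.shuffle(order)
    out = [row[:] for row in matrix]  # copy, so the input is not modified
    for new_i, old_i in enumerate(order):
        for c in cols:
            out[new_i][c] = matrix[old_i][c]
    return out


def permutation_p_value(true_rank, permuted_ranks):
    """ASSUMED quantile: share of permuted ranks <= the true rank."""
    n = len(permuted_ranks)
    if n == 0:
        raise ValueError("at least one permutation is required")
    return sum(1 for r in permuted_ranks if r <= true_rank) / n


def is_significant(true_rank, permuted_ranks, alpha=0.05):
    """Decision rule from the claim: significant when p_r <= alpha."""
    return permutation_p_value(true_rank, permuted_ranks) <= alpha
```

In a full run, each of the N permutations would rebuild the difference network from the permuted matrices and re-rank the pathways before the permuted rank of k_i is recorded.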
CN202111301560.7A 2021-11-04 2021-11-04 Disturbed pathway analysis method based on metabolite association network Pending CN114038509A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111301560.7A CN114038509A (en) 2021-11-04 2021-11-04 Disturbed pathway analysis method based on metabolite association network

Publications (1)

Publication Number Publication Date
CN114038509A true CN114038509A (en) 2022-02-11

Family

ID=80136346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111301560.7A Pending CN114038509A (en) 2021-11-04 2021-11-04 Disturbed pathway analysis method based on metabolite association network

Country Status (1)

Country Link
CN (1) CN114038509A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160213328A1 (en) * 2013-09-13 2016-07-28 Julia Morris HOENG Systems and methods for evaluating perturbation of xenobiotic metabolism
CN110322930A (en) * 2019-06-06 2019-10-11 大连理工大学 Metabolism group operator logo object recognition methods based on horizontal relationship
CN111210876A (en) * 2020-01-06 2020-05-29 厦门大学 Disturbed metabolic pathway determination method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
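Li Guang; Li Yihang; Lü Yana; Li Xuelan; Chen Xi; Zhang Ning: "Exploring the mechanism of the 'Ya Jie' (雅解) action of the characteristic Dai medicine Shen Cha (肾茶, kidney tea) based on metabolomics technology", 中国科学:生命科学 (Scientia Sinica Vitae), no. 04, 20 April 2018 (2018-04-20), pages 111 - 124 *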

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination