CN114038509A - Disturbed pathway analysis method based on metabolite association network
- Publication number: CN114038509A
- Application number: CN202111301560.7A
- Authority: CN (China)
- Prior art keywords: pathway, metabolite, path, network, disturbed
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
Abstract
A disturbed pathway analysis method based on a metabolite association network, belonging to the technical field of analysis. The method comprises the following steps: 1) analyzing the metabolome in a biological sample and preprocessing the data to obtain a metabonomics data matrix; 2) constructing a metabolite-metabolic pathway bipartite graph; 3) carrying out metabolite association network modeling on the metabonomics data matrix to obtain a difference network between an experimental group and a control group; 4) calculating the enrichment degree of the difference network on the bipartite graph; 5) ranking the pathways by importance, based on the differential metabolite network between the two groups of samples combined with information from a pathway database, and identifying potential disturbed pathways; 6) carrying out a significance test on the potential disturbed pathways to obtain the significantly disturbed metabolic pathways. Because the method uses the theory of difference networks to analyze the differences between physiological states of an organism, the disturbed metabolic pathways it screens out for the experimental group are more reliable, and it provides a new research idea for understanding the underlying mechanisms in the organism.
Description
Technical Field
The invention belongs to the technical field of analysis, and particularly relates to a disturbed pathway analysis method based on a metabolite association network.
Background
During the development of a particular disease, many important metabolic pathways are perturbed. Identifying the perturbed metabolic pathways associated with a particular disease is of great significance for studying the development of cancer, and provides important clues for exploring the pathogenesis and drug action targets of a disease (Y. Drier, M. Sheffer, E. Domany, Pathway-based personalized analysis of cancer, Proceedings of the National Academy of Sciences 110(16) (2013) 6393).
The traditional pathway analysis method mainly involves the following steps. First, data preprocessing operations such as denoising, baseline correction and spectral peak alignment are applied to the acquired metabolic spectra of the biological samples. Then, univariate or multivariate statistical analysis is carried out on the resulting data matrix to screen differential metabolites. Finally, pathway enrichment analysis is performed on the differential metabolites by combining metabolic pathway databases (such as KEGG and HMDB) or literature data to obtain the perturbed metabolic pathways (F.M. Al-Akwaa, B. Yunits, S. Huang, H. Alhajaji, L.X. Garmire, Lilikoi: an R package for personalized pathway-based classification modeling using metabolomics data, Gigascience 7(12) (2018)). In recent years, many intelligent metabolic pathway analysis tools have emerged, such as Pathview (Luo W, Brouwer C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-).
Traditional univariate or multivariate analysis methods can only screen out sets of metabolites that differ between groups. Metabolites, however, are not independent of one another; they are interrelated and form a biological metabolic network. In recent years, network analysis methods have been applied to metabolomics data. Constructing a metabolite association network, with metabolites as nodes and the associations between metabolites as edges, not only allows the structure and relationships of metabonomics data to be visualized, but also allows the potential mechanisms of action hidden in the network structure to be mined further by analyzing the network with graph-theoretic methods (Toubiana D, Fernie AR, Nikoloski Z, et al. Network analysis: tackling complex data to study plant metabolism. Trends in Biotechnology. 2013; 31(1): 29-36; Belleggia R, Omranian N, Holtz Y, et al. 2020.02.03.931717). Whether and how the network structure changes between disease states is generally of more interest than any particular network structure. Differential metabolite association network analysis can be used to analyze differences in metabolic association under different physiological and pathological states, but to date there is no reported example that integrates a differential metabolite association network with a pathway database to analyze disturbed metabolic pathways.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provide a disturbed pathway analysis method based on a metabolite association network, which utilizes the correlation theory of a difference network to analyze the difference between different physiological states of a biological organism and screens out disturbed metabolic pathways related to an experimental group more reliably.
The invention comprises the following steps:
1) analyzing metabolome in a biological sample, and preprocessing data to obtain a metabonomic data matrix;
2) constructing a metabolite-metabolic pathway bipartite graph;
3) carrying out metabolite association network modeling on the metabonomics data matrix to obtain a difference network between an experimental group and a control group;
4) calculating the enrichment degree of the difference network on the bipartite graph;
5) identifying potential disturbance pathways;
6) carrying out a significance test on the potential disturbed pathways to obtain the significantly disturbed metabolic pathways.
In step 1), the metabolome in the biological sample is analyzed by detecting the metabolites in the sample with an analytical instrument such as GC-MS, LC-MS or NMR to obtain metabolic spectrum data. After preprocessing operations such as denoising, baseline correction, and spectral peak alignment and identification are applied to the metabolic spectra, metabonomics data matrices of the experimental group and the control group, usable for subsequent statistical analysis, are obtained and denoted $X \in \mathbb{R}^{N_1 \times M}$ and $Y \in \mathbb{R}^{N_2 \times M}$ respectively, where $N_1$ and $N_2$ are the numbers of samples in the experimental group and the control group, and $M$ is the number of metabolites measured in the samples.
In step 2), the metabolite-metabolic pathway bipartite graph is constructed according to the membership of metabolites in metabolic pathways in the KEGG pathway database. The specific steps can be as follows: map the detected $M$ metabolites to the KEGG (http://www.kegg.jp) metabolic pathway database to obtain the membership between the $M$ metabolites and $P$ metabolic pathways, and construct the metabolite-metabolic pathway bipartite graph $G = (V, E)$, where $V = V_M \cup V_P$ is the set of nodes, comprising the metabolite set $V_M$ and the pathway set $V_P$, and $E$ is the set of memberships of metabolites in pathways.
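The construction in step 2) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `build_bipartite_graph`, the tagged-tuple node encoding and the toy membership table are assumptions made for the example, and real memberships would come from the KEGG pathway database.

```python
from collections import defaultdict


def build_bipartite_graph(membership):
    """Build an undirected metabolite-pathway bipartite graph.

    membership maps a pathway name to the set of detected metabolites it
    contains. Nodes are tagged tuples so that a metabolite and a pathway
    with the same name cannot collide. Returns an adjacency dict.
    """
    adjacency = defaultdict(set)
    for pathway, metabolites in membership.items():
        p_node = ("pathway", pathway)
        adjacency[p_node]  # touch the key so empty pathways still become nodes
        for metabolite in metabolites:
            m_node = ("metabolite", metabolite)
            adjacency[p_node].add(m_node)
            adjacency[m_node].add(p_node)
    return dict(adjacency)


# Hypothetical toy membership table (not taken from KEGG).
toy_membership = {
    "pathway_A": {"m1", "m2", "m3"},
    "pathway_B": {"m3", "m4"},
}
graph = build_bipartite_graph(toy_membership)
```

Every edge joins one metabolite node to one pathway node, so two metabolites that share a pathway are exactly two hops apart on this graph.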
In step 3), the difference network between the experimental group and the control group is represented as a weighted undirected graph $G_D = (V_D, E_D)$. The weight matrix $\Theta$ corresponding to the edge set $E_D$ of the difference network $G_D$ is obtained by solving the following cost function:
[Formula (1)]
where $\Sigma_X$ and $\Sigma_Y$ are the covariance matrices of $X$ and $Y$ respectively, $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, $(\cdot)^T$ denotes the matrix transpose, $\|\cdot\|_1$ denotes the $\ell_1$ norm of a matrix, and $\lambda > 0$ is the sparsity constraint parameter.
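The text names the ingredients of the cost function (two covariance matrices, traces, transposes and an $\ell_1$ penalty weighted by $\lambda$) without the formula itself being legible here. For orientation only, one well-known objective built from exactly these ingredients is the lasso-penalized D-trace loss for differential network estimation. It is shown below as an assumption; it is not confirmed to be the patent's formula (1).

```latex
\hat{\Theta} \;=\; \arg\min_{\Theta}\;
\frac{1}{4}\Big[
  \operatorname{tr}\!\big((\Sigma_X \Theta)^{T}\,\Theta\,\Sigma_Y\big)
  + \operatorname{tr}\!\big((\Sigma_Y \Theta)^{T}\,\Theta\,\Sigma_X\big)
\Big]
\;-\; \operatorname{tr}\!\big(\Theta^{T}(\Sigma_X - \Sigma_Y)\big)
\;+\; \lambda\,\lVert \Theta \rVert_{1}
```

Setting the gradient of the unpenalized part to zero gives $\tfrac{1}{2}(\Sigma_X \Theta \Sigma_Y + \Sigma_Y \Theta \Sigma_X) = \Sigma_X - \Sigma_Y$, which is satisfied by $\Theta = \Sigma_Y^{-1} - \Sigma_X^{-1}$ when both covariance matrices are invertible. Under this assumed form, a nonzero entry of $\hat{\Theta}$ marks a metabolite pair whose partial association differs between the two groups, and a larger $\lambda$ yields a sparser difference network.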
In step 4), calculating the enrichment degree of the difference network on the bipartite graph means calculating, for the metabolite pair corresponding to each connecting edge of the difference network, the shortest distance between the pair on the metabolite-metabolic pathway bipartite graph, and defining an enrichment function of the difference network on the bipartite graph. The specific method may be:
Let $d_G(i, j)$ denote the shortest distance between two nodes $i, j$ on the bipartite graph $G$. Compute the shortest distance $d_G(i, j)$ for the node pair of every edge $(i, j) \in E_D$ of the difference network $G_D$, and define the enrichment degree $S(G_D, G)$ of the difference network $G_D$ on the bipartite graph $G$:
[Formula (2)]
where $0 < \gamma < 1$.
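As a rough illustration of step 4), the sketch below computes shortest distances on a bipartite graph by breadth-first search and then scores a set of difference-network edges. The scoring rule used here, a sum of `gamma ** d(i, j)` over the edges, is an assumption chosen only because it decays with distance for 0 < gamma < 1; the exact enrichment function is not legible in the text. The names `shortest_distances` and `enrichment` and the toy graph are likewise hypothetical.

```python
from collections import deque


def shortest_distances(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def enrichment(diff_edges, adjacency, gamma=0.5):
    """Assumed enrichment form: sum of gamma**d(i, j) over difference edges.

    Pairs that are not connected on the bipartite graph contribute 0.
    """
    total = 0.0
    for i, j in diff_edges:
        d = shortest_distances(adjacency, i).get(j)
        if d is not None:
            total += gamma ** d
    return total


# Toy bipartite graph: pathway P1 = {a, b}, pathway P2 = {b, c}.
adjacency = {
    "P1": {"a", "b"}, "P2": {"b", "c"},
    "a": {"P1"}, "b": {"P1", "P2"}, "c": {"P2"},
}
score = enrichment([("a", "b"), ("a", "c")], adjacency, gamma=0.5)
```

In the toy graph, `a` and `b` share pathway P1 (distance 2), while `a` and `c` are linked only through `b` (distance 4), so the first edge contributes more to the score than the second.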
In step 5), the potential disturbed pathways are identified by successively deleting one pathway node at a time from the metabolite-metabolic pathway bipartite graph, on the basis of maximizing the enrichment function, until only one pathway node is left. The deletion order of each pathway node is recorded; the later a pathway is deleted, the more important it is, which yields the potential disturbed pathways.
The specific method can be as follows: denote by $G^{-k}$ the new bipartite graph obtained by deleting a pathway node $k \in V_P$ from the bipartite graph $G$, and compute the pathway $k_1$ through which the difference network $G_D$ has the least influence on the bipartite graph $G$ using the following formula:
[Formula (3)]
This gives the 1st least-influence pathway $k_1$ and the corresponding bipartite graph $G^{-k_1}$. In the same way, compute the influence of the difference network $G_D$ on the bipartite graph $G^{-k_1}$ to obtain the 2nd least-influence pathway $k_2$ and the corresponding bipartite graph $G^{-\{k_1, k_2\}}$. Iterate these steps to obtain the least-influence pathways $k_3, k_4, \ldots$ until only one pathway node remains on the bipartite graph; this last pathway node is denoted $k_P$. The later a pathway is deleted, the more important it is; the $R$ most important potential disturbed pathways are $\{k_P, k_{P-1}, \ldots, k_{P-R+1}\}$.
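The iterative deletion in step 5) can be sketched as a greedy loop. Two assumptions are made here and should be read as such: the enrichment is taken to be a sum of `gamma ** d(i, j)` over the difference-network edges, and the "least-influence" pathway is taken to be the one whose deletion leaves the enrichment highest. All function names are hypothetical.

```python
from collections import deque


def _distance(adjacency, removed, source, target):
    """BFS hop count between two nodes, skipping nodes listed in removed."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for nbr in adjacency.get(node, ()):
            if nbr not in removed and nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return None  # target not reachable


def _enrichment(diff_edges, adjacency, removed, gamma):
    """Assumed enrichment: sum of gamma**distance over connected edge pairs."""
    total = 0.0
    for i, j in diff_edges:
        d = _distance(adjacency, removed, i, j)
        if d is not None:
            total += gamma ** d
    return total


def rank_pathways(diff_edges, adjacency, pathways, gamma=0.5):
    """Return pathways in deletion order k_1..k_P (last = most important)."""
    remaining = sorted(pathways)  # sorted for deterministic tie-breaking
    removed = set()
    order = []
    while len(remaining) > 1:
        # Delete the pathway whose removal leaves the largest enrichment.
        best = max(
            remaining,
            key=lambda k: _enrichment(diff_edges, adjacency, removed | {k}, gamma),
        )
        remaining.remove(best)
        removed.add(best)
        order.append(best)
    order.extend(remaining)  # the single surviving pathway is k_P
    return order


# Toy graph: P1 = {a, b}, P2 = {b, c}, P3 = {c, d}; one difference edge (a, b).
adjacency = {
    "P1": {"a", "b"}, "P2": {"b", "c"}, "P3": {"c", "d"},
    "a": {"P1"}, "b": {"P1", "P2"}, "c": {"P2", "P3"}, "d": {"P3"},
}
order = rank_pathways([("a", "b")], adjacency, ["P1", "P2", "P3"])
```

In this toy case only P1 connects `a` and `b`, so deleting P1 always destroys the enrichment and P1 survives until the end, which makes it the top-ranked pathway.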
In step 6), the significance test of the potential disturbed pathways can use a permutation test. Let pathway $k_i$ be a potential disturbed pathway with rank value $r$. The experimental group data $X$ is split by column (metabolite) into two parts, $X = [X_{k_i}, X_{\bar{k}_i}]$, where $X_{k_i}$ is the sub-matrix corresponding to pathway $k_i$ and $X_{\bar{k}_i}$ is the sub-matrix of the columns outside pathway $k_i$; the control group data is likewise written in two parts, $Y = [Y_{k_i}, Y_{\bar{k}_i}]$. Given a significance level $\alpha$ and a number of permutations $N$, the significance of pathway $k_i$ is tested by the following procedure:
(3) Repeat the permutation experiment of steps (1) and (2) $N$ times to obtain $N$ rank values of pathway $k_i$, $\{r_1, r_2, \ldots, r_N\}$.
(4) Take $\{r_1, r_2, \ldots, r_N\}$ as the null distribution of the importance of pathway $k_i$, and calculate the quantile $p_r$ of the real rank value $r$ on the null distribution:
[Formula (4)]
If $p_r \le \alpha$, pathway $k_i$ is called a significantly disturbed pathway. Following the above steps, the significance of each metabolic pathway in $\{k_P, k_{P-1}, \ldots, k_{P-R+1}\}$ is checked one by one and the non-significant pathways are deleted, giving the final set of significantly disturbed pathways.
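The final decision in step 6) reduces to locating the real rank value within the permuted rank values. The sketch below assumes that a smaller rank value means a more important pathway (as stated under Table 1) and that the quantile is the fraction of permuted ranks no larger than the real one; both the function names and this exact definition of the quantile are assumptions, since the formula itself is not legible in the text.

```python
def permutation_p_value(real_rank, null_ranks):
    """Fraction of permuted rank values that are <= the real rank value.

    Assumes a smaller rank value means a more important pathway.
    """
    if not null_ranks:
        raise ValueError("null_ranks must not be empty")
    hits = sum(1 for r in null_ranks if r <= real_rank)
    return hits / len(null_ranks)


def is_significant(real_rank, null_ranks, alpha=0.2):
    """Call the pathway significantly disturbed when p_r <= alpha."""
    return permutation_p_value(real_rank, null_ranks) <= alpha


# Hypothetical null distribution from 10 permutations.
null = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
p = permutation_p_value(2, null)
```

With these toy numbers only one of the ten permuted ranks is at least as good as the real rank of 2, so the pathway passes at a significance level of 0.2 but not at 0.05.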
Compared with the prior art, the invention has the following outstanding advantages:
Most traditional pathway enrichment methods screen differential metabolites with univariate or multivariate analysis and then run pathway enrichment analysis on those metabolites with a pathway analysis platform to screen disturbed metabolic pathways. No study has yet screened disturbed metabolic pathways on the basis of a differential metabolite association network. The invention ranks the pathways by importance based on the differential metabolite network between two groups of samples, combined with information from a pathway database. Compared with the traditional method, the method provided by the invention uses the theory of difference networks to analyze the differences between physiological states of an organism, so the disturbed metabolic pathways screened out for the experimental group are more reliable, and a new research idea is provided for understanding the underlying mechanisms in the organism.
Drawings
FIG. 1 is a system block diagram of an embodiment of the invention.
FIG. 2 shows the identification result of the significantly disturbed pathways. The significance markers in the figure denote $p_r < 0.05$ and $p_r \le 0.2$. PBAB: Primary bile acid biosynthesis; APM: Arginine and proline metabolism; GDM: Glyoxylate and dicarboxylate metabolism; PYM: Pyrimidine metabolism; PUM: Purine metabolism.
Detailed Description
The following examples will further illustrate the present invention with reference to the accompanying drawings.
Referring to FIG. 1, the disturbed pathway analysis method based on the metabolite association network according to the present invention includes the following steps. The metabolome in a biological sample is analyzed with an analytical instrument such as GC-MS, LC-MS or NMR, and the data are preprocessed to obtain the metabonomics data matrices of an experimental group and a control group. A metabolite-metabolic pathway bipartite graph is constructed according to the membership of metabolites in metabolic pathways in the KEGG pathway database. Metabolite association network modeling is carried out on the metabonomics data matrices to obtain a difference network between the experimental group and the control group. For the metabolite pair corresponding to each connecting edge of the difference network, the shortest distance between the pair on the metabolite-metabolic pathway bipartite graph is calculated, and an enrichment function of the difference network on the bipartite graph is defined. On the basis of maximizing the enrichment function, pathway nodes are deleted from the bipartite graph one at a time until only one pathway node is left, and the deletion order of each pathway node is recorded; the later a pathway is deleted, the more important it is, which yields the potential disturbed pathways. Finally, a permutation test is used to test the significance of the potential disturbed pathways, giving the significantly disturbed metabolic pathways.
Specific examples are given below.
The embodiment of the invention comprises the following steps:
1. Obtaining the metabolomics data matrices
This example uses a published colorectal cancer (CRC) data set for analysis. The data set contains 158 human serum samples, comprising 66 colorectal cancer samples and 92 healthy control samples. All CRC patients in the experiment were newly diagnosed, and blood samples were collected before surgery, chemotherapy or radiotherapy. Metabolites in the biological samples were detected with an LC-MS analyzer, giving metabolic spectrum data containing 113 metabolites. The metabolic profiles were preprocessed and quantified with MultiQuant 2.1 software (AB Sciex, Toronto, ON, Canada). To correct for minor instrument drift, the metabolite concentrations were corrected using QC samples. Finally, the metabonomics data matrices of the colorectal cancer group and the healthy control group are denoted $X_{66 \times 113}$ and $Y_{92 \times 113}$ respectively.
2. Construction of the metabolite-metabolic pathway bipartite graph
The 113 detected metabolites were mapped to the KEGG (http://www.kegg.jp) metabolic pathway database, and a total of 85 metabolites were matched. The KEGG metabolic pathway database used contains 1549 identified endogenous metabolites (excluding most lipids) and 82 metabolic pathways. To ensure the reliability of the results, this example retains only metabolic pathways containing 3 or more of the metabolites. Finally, a bipartite graph containing 76 metabolites and 30 metabolic pathways is obtained, denoted $G = (V, E)$, where $V = V_M \cup V_P$ is the set of nodes, comprising the metabolite set $V_M$ and the pathway set $V_P$, and $E$ is the set of memberships of metabolites in pathways.
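The pathway filter described above (keep only pathways with at least 3 detected metabolites) can be sketched as follows. The function name `filter_pathways` and the toy data are hypothetical; the threshold of 3 is the one stated in this example.

```python
def filter_pathways(membership, detected, min_size=3):
    """Keep pathways that contain at least min_size detected metabolites.

    membership maps pathway name -> all metabolites annotated to it;
    detected is the collection of metabolites actually measured.
    Returns pathway name -> set of detected member metabolites.
    """
    detected = set(detected)
    kept = {}
    for pathway, members in membership.items():
        hits = set(members) & detected
        if len(hits) >= min_size:
            kept[pathway] = hits
    return kept


# Hypothetical toy data (not taken from KEGG).
toy_membership = {
    "pathway_A": {"m1", "m2", "m3", "m9"},
    "pathway_B": {"m3", "m4"},
}
detected = {"m1", "m2", "m3", "m4"}
kept = filter_pathways(toy_membership, detected)
```

Metabolites that belong only to discarded pathways drop out of the bipartite graph as well, which would be consistent with the node count falling from 85 matched metabolites to 76 in this example.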
3. Metabolite association network modeling
Formula (1) is solved with the Alternating Direction Method of Multipliers (ADMM) to obtain the difference network $G_D$ between the colorectal cancer group and the healthy control group. In this embodiment, $\lambda = 0.5$, and the resulting network contains 25 connecting edges.
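The ADMM updates themselves are not given in the text. One building block that ADMM solvers for $\ell_1$-penalized problems generally rely on is element-wise soft-thresholding, the proximal operator of the $\ell_1$ norm, which is what drives small entries of the estimated matrix to exactly zero. The sketch below shows only that operator; the function names are hypothetical and the rest of the solver is deliberately omitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within it become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def soft_threshold_matrix(matrix, threshold):
    """Apply soft-thresholding to every entry of a list-of-lists matrix."""
    return [[soft_threshold(v, threshold) for v in row] for row in matrix]


# Toy symmetric matrix; entries with magnitude <= 0.5 are zeroed.
shrunk = soft_threshold_matrix([[1.2, -0.3], [-0.3, -2.0]], 0.5)
```

In an actual solver the threshold applied in this step is typically the sparsity parameter divided by the ADMM penalty parameter, so it need not equal $\lambda$ itself.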
4. Calculating the enrichment degree of the difference network on the bipartite graph
Using a shortest path algorithm, the shortest distance $d_G(i, j)$ is computed for the node pair corresponding to each edge $(i, j)$ of the difference network $G_D$, and the enrichment degree $S(G_D, G)$ of the difference network $G_D$ on the bipartite graph $G$ is calculated according to equation (2). In this example, $\gamma = 0.5$.
5. Identifying potentially perturbed pathways
The first least-influence pathway $k_1$ is calculated according to equation (3), giving $k_1$ and the corresponding bipartite graph $G^{-k_1}$. In the same way, the influence of the difference network $G_D$ on the bipartite graph $G^{-k_1}$ is calculated to obtain the 2nd least-influence pathway $k_2$ and the corresponding bipartite graph $G^{-\{k_1, k_2\}}$. These steps are repeated to obtain the least-influence pathways $k_3, k_4, \ldots$ until only one pathway node remains on the bipartite graph; this last pathway node is denoted $k_{30}$. The later a pathway is deleted, the more important it is, so an importance ranking of all pathways is obtained; the results are shown in Table 1.
TABLE 1 Pathway importance ranking
(The smaller the rank value r, the more important the pathway.)
In this embodiment, the number of potential disturbed pathways is $R = 5$, and the 5 most important potential disturbed pathways are $\{k_{30}, k_{29}, k_{28}, k_{27}, k_{26}\}$.
6. Significance testing of perturbation pathways
Let pathway $k_i$ be a potential disturbed pathway with rank value $r$. The colorectal cancer group data $X_{66 \times 113}$ is split by column (metabolite) into two parts, $X = [X_{k_i}, X_{\bar{k}_i}]$, where $X_{k_i}$ is the sub-matrix corresponding to pathway $k_i$ and $X_{\bar{k}_i}$ is the sub-matrix of the columns outside pathway $k_i$. The healthy control group data is likewise written in two parts, $Y = [Y_{k_i}, Y_{\bar{k}_i}]$. In this example, the significance level is $\alpha = 0.2$ and the number of permutations is $N = 10000$, and the significance of pathway $k_i$ is tested by the following procedure:
(4) Take $\{r_1, r_2, \ldots, r_N\}$ as the null distribution of the importance of pathway $k_i$, and calculate the quantile $p_r$ of the real rank value $r$ on the null distribution according to formula (4). If $p_r \le 0.2$, pathway $k_i$ is called a significantly disturbed pathway.
Following the above steps, the significance of each metabolic pathway in $\{k_{30}, k_{29}, k_{28}, k_{27}, k_{26}\}$ is checked one by one and the non-significant pathways are deleted, giving the final set of significantly disturbed pathways $\{k_{30}, k_{29}, k_{26}\}$. The corresponding pathways are Primary bile acid biosynthesis ($k_{30}$), Arginine and proline metabolism ($k_{29}$) and Purine metabolism ($k_{26}$). The results are shown in FIG. 2. All 3 significantly disturbed pathways screened out have been reported in the literature to be closely related to colorectal cancer. This indicates that the method provided by the embodiment of the invention is effective for analyzing disturbed pathways related to metabolism in diseases such as colorectal cancer, and can provide important clues and a basis for exploring drug action targets.
The colorectal cancer study is only a preferred example selected for the invention, and the invention can equally be applied to pathway analysis of data from other diseases. The above description is only a preferred embodiment of the present invention and should not be taken as limiting the scope of the invention, which is defined by the appended claims and their equivalents.
Claims (10)
1. A disturbed pathway analysis method based on a metabolite association network is characterized by comprising the following steps:
1) analyzing metabolome in a biological sample, and preprocessing data to obtain a metabonomic data matrix;
2) constructing a metabolite-metabolic pathway bipartite graph;
3) carrying out metabolite association network modeling on the metabonomics data matrix to obtain a difference network between an experimental group and a control group;
4) calculating the enrichment degree of the difference network on the bipartite graph;
5) identifying potential disturbance pathways;
6) carrying out a significance test on the potential disturbed pathways to obtain the significantly disturbed metabolic pathways.
2. The method for analyzing disturbed pathway based on metabolite association network as claimed in claim 1, wherein in step 1), the analysis of the metabolome in the biological sample is performed by detecting metabolites in the biological sample by using an analysis instrument such as GC-MS, LC-MS, NMR, etc. to obtain metabolic spectrum data.
3. The method for analyzing disturbed pathway based on metabolite association network as claimed in claim 1, wherein in step 1), the data preprocessing includes but is not limited to denoising, baseline correction, and spectral peak alignment and identification; after the obtained metabolic spectra are preprocessed, metabonomics data matrices of an experimental group and a control group, usable for subsequent statistical analysis, are obtained and denoted $X \in \mathbb{R}^{N_1 \times M}$ and $Y \in \mathbb{R}^{N_2 \times M}$ respectively, where $N_1$ and $N_2$ are the numbers of samples in the experimental group and the control group, and $M$ is the number of metabolites measured in the samples.
4. The method according to claim 1, wherein in step 2), the metabolite-metabolic pathway bipartite graph is constructed according to the membership of metabolites in metabolic pathways in the KEGG pathway database, and the specific steps are as follows: mapping the detected $M$ metabolites to the KEGG (http://www.kegg.jp) metabolic pathway database to obtain the membership between the $M$ metabolites and $P$ metabolic pathways, and constructing the metabolite-metabolic pathway bipartite graph $G = (V, E)$, where $V = V_M \cup V_P$ is the set of nodes, comprising the metabolite set $V_M$ and the pathway set $V_P$, and $E$ is the set of memberships of metabolites in pathways.
5. The method according to claim 1, wherein in step 3), the difference network between the experimental group and the control group is represented as a weighted undirected graph $G_D = (V_D, E_D)$, and the weight matrix $\Theta$ corresponding to the edge set $E_D$ of the difference network $G_D$ is obtained by solving the following cost function:
[Formula (1)]
where $\Sigma_X$ and $\Sigma_Y$ are the covariance matrices of $X$ and $Y$ respectively, $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, $(\cdot)^T$ denotes the matrix transpose, $\|\cdot\|_1$ denotes the $\ell_1$ norm of a matrix, and $\lambda > 0$ is the sparsity constraint parameter.
6. The method according to claim 1, wherein in step 4), the step of calculating the enrichment degree of the difference network on the bipartite graph is to calculate the shortest distance between the pairs of metabolites on the bipartite graph of the metabolite-metabolic pathway according to the metabolite pairs corresponding to the connecting edges of the difference network, and define an enrichment function of the difference network on the bipartite graph of the metabolite-metabolic pathway.
7. The method for analyzing disturbed pathway based on the metabolite association network according to claim 1, wherein in step 4), the specific method for calculating the enrichment degree of the difference network on the bipartite graph is as follows:
the shortest distance between two nodes i and j on the bipartite graph is determined; the shortest distance is computed for the node pair corresponding to each edge of the difference network; and the degree of enrichment of the difference network on the bipartite graph is defined as
where 0 < γ < 1.
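The shortest-distance computation in claims 6 and 7 can be sketched with a breadth-first search on the bipartite graph. The enrichment formula itself is not reproduced in the text, so the function `enrichment` below assumes a hypothetical form in which each difference-network edge contributes its absolute weight multiplied by γ raised to the shortest distance, so that distant metabolite pairs count less because 0 < γ < 1. All names and the toy data are illustrative.

```python
from collections import deque


def shortest_distance(adjacency, source, target):
    """Number of edges on the shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def enrichment(adjacency, diff_edges, gamma):
    """Assumed enrichment: sum of |weight| * gamma ** distance over the edges.

    diff_edges maps a metabolite pair (i, j) to its difference-network weight.
    Pairs that are disconnected on the bipartite graph contribute nothing.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    total = 0.0
    for (i, j), weight in diff_edges.items():
        dist = shortest_distance(adjacency, i, j)
        if dist is not None:
            total += abs(weight) * gamma ** dist
    return total


# Toy bipartite graph: m1 and m2 share p1, m2 and m3 share p2, m4 is unmapped.
adjacency = {
    "m1": {"p1"}, "m2": {"p1", "p2"}, "m3": {"p2"},
    "p1": {"m1", "m2"}, "p2": {"m2", "m3"},
}
diff_edges = {("m1", "m2"): 0.8, ("m1", "m3"): -0.5, ("m1", "m4"): 1.0}
score = enrichment(adjacency, diff_edges, gamma=0.5)
```

Two metabolites in the same pathway are at distance 2 on the bipartite graph, and each additional pathway hop adds 2, so distances between connected metabolites are always even.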
8. The method according to claim 1, wherein in step 5), the potential disturbed pathways are identified by successively deleting one pathway node at a time from the metabolite-metabolic pathway bipartite graph, each time choosing the deletion that maximizes the enrichment function, until only one pathway node remains; the deletion order of the pathway nodes is recorded, and the potential disturbed pathways are obtained on the principle that the later a pathway is deleted, the more important it is.
9. The method for analyzing disturbed pathway based on the metabolite association network as claimed in claim 1, wherein in step 5), the specific method for identifying the potential disturbed pathways is: a new bipartite graph is obtained by deleting one pathway node from the bipartite graph, and the pathway k_1 whose deletion has the minimum influence on the enrichment of the difference network on the bipartite graph is computed using the following equation:
This yields the first minimum-influence pathway k_1 and the corresponding bipartite graph. The minimum-influence pathway of the difference network on that bipartite graph is computed in the same way, yielding the second minimum-influence pathway k_2 and its corresponding bipartite graph. These steps are iterated to obtain the minimum-influence pathways k_3, k_4, … until only one pathway node remains on the bipartite graph; the last pathway node is denoted k_P. The later a pathway is deleted, the more important it is, so the R most important potential disturbed pathways are those deleted last, namely k_P, k_(P-1), …, k_(P-R+1).
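The iterative deletion in claims 8 and 9 amounts to a greedy backward elimination, which can be sketched as follows. The enrichment function is passed in as a callable so that the sketch does not depend on its exact form; the name `rank_pathways_by_deletion`, the toy scoring function and the pathway labels are hypothetical.

```python
def rank_pathways_by_deletion(pathways, enrichment_of):
    """Return pathways in deletion order k_1, ..., k_P.

    enrichment_of: callable taking a frozenset of the pathways still present
    and returning the enrichment of the difference network on the reduced
    bipartite graph. At each step the pathway whose removal leaves the
    highest enrichment (the minimum-influence pathway) is deleted.
    """
    remaining = set(pathways)
    order = []
    while len(remaining) > 1:
        # sorted() makes tie-breaking deterministic.
        least_influential = max(
            sorted(remaining),
            key=lambda k: enrichment_of(frozenset(remaining - {k})),
        )
        order.append(least_influential)
        remaining.remove(least_influential)
    order.extend(remaining)  # the last surviving pathway is k_P
    return order


def top_pathways(order, r):
    """The r most important pathways are the r deleted last."""
    return order[::-1][:r]


# Toy enrichment: each pathway contributes a fixed amount while present.
toy_contribution = {"p1": 3.0, "p2": 1.0, "p3": 2.0}


def toy_enrichment(present):
    return sum(toy_contribution[k] for k in present)


deletion_order = rank_pathways_by_deletion(toy_contribution, toy_enrichment)
candidates = top_pathways(deletion_order, 2)
```

In the toy example the pathway contributing least is removed first, so the deletion order runs from least to most important. Each step evaluates the enrichment once per remaining pathway, giving on the order of P² evaluations in total.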
10. The method for analyzing disturbed pathway based on metabolite association network as claimed in claim 1, wherein in step 6), the significance test of the potential disturbed pathways is performed by a permutation test. Let pathway k_i be a potential disturbed pathway with rank value r. The experimental group data X are split by metabolite, i.e. by the columns of the data matrix X, into two parts: the sub-matrix corresponding to pathway k_i and the sub-matrix of the metabolites outside pathway k_i. The control group data Y are likewise split into two parts. Given a significance level α and a number of permutations N, the significance of pathway k_i is tested by the following procedure:
(1) randomly permute the samples, i.e. the rows of the corresponding data matrices, to obtain new data matrices X′ and Y′;
(3) repeat the permutation experiment of steps (1) and (2) N times to obtain N rank values of pathway k_i;
(4) take these N rank values as the null distribution of the importance of pathway k_i, and calculate the quantile p_r of the true rank value r on this null distribution:
If p_r ≤ α, pathway k_i is called a significantly disturbed pathway. Following the above steps, the significance of each candidate metabolic pathway is tested one by one, and the non-significant pathways are removed to obtain the final set of significantly disturbed pathways.
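The final decision step of claim 10 can be sketched as an empirical quantile computed on the permutation null distribution. The formula for p_r is not reproduced in the text, so the sketch assumes that a smaller rank value means a more important pathway and takes p_r as the fraction of permuted rank values that are at least as good as the true one. The function names and the toy null distribution are hypothetical.

```python
def empirical_quantile(true_rank, null_ranks):
    """Fraction of permuted rank values that are <= the true rank value.

    Assumes rank 1 is the most important pathway, so a small fraction means
    the observed importance is rarely matched under permutation.
    """
    if not null_ranks:
        raise ValueError("at least one permutation is required")
    as_good = sum(1 for rank in null_ranks if rank <= true_rank)
    return as_good / len(null_ranks)


def is_significantly_disturbed(true_rank, null_ranks, alpha):
    """Apply the rule p_r <= alpha."""
    return empirical_quantile(true_rank, null_ranks) <= alpha


# Toy null distribution from N = 10 hypothetical permutations.
null_ranks = [1, 3, 5, 2, 4, 5, 5, 4, 3, 5]
p_r = empirical_quantile(1, null_ranks)
significant = is_significantly_disturbed(1, null_ranks, alpha=0.1)
```

With only N permutations the smallest attainable non-zero value of p_r is 1/N, so N has to be chosen large enough for the intended significance level α to be reachable.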
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111301560.7A CN114038509A (en) | 2021-11-04 | 2021-11-04 | Disturbed pathway analysis method based on metabolite association network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114038509A (en) | 2022-02-11 |
Family
ID=80136346
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111301560.7A (Pending), published as CN114038509A (en) | 2021-11-04 | 2021-11-04 | Disturbed pathway analysis method based on metabolite association network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114038509A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160213328A1 (en) * | 2013-09-13 | 2016-07-28 | Julia Morris HOENG | Systems and methods for evaluating perturbation of xenobiotic metabolism |
CN110322930A (en) * | 2019-06-06 | 2019-10-11 | 大连理工大学 | Metabolism group operator logo object recognition methods based on horizontal relationship |
CN111210876A (en) * | 2020-01-06 | 2020-05-29 | 厦门大学 | Disturbed metabolic pathway determination method and system |
Non-Patent Citations (1)
Title |
---|
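Li Guang; Li Yihang; Lü Yana; Li Xuelan; Chen Xi; Zhang Ning: "Exploring the mechanism of the 'Yajie' action of the characteristic Dai medicine Shencha (kidney tea) based on metabolomics technology", Scientia Sinica Vitae (中国科学:生命科学), no. 04, 20 April 2018 (2018-04-20), pages 111 - 124 * |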
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||