CN106909805A

CN106909805A - Method for reconstructing a species phylogenetic tree based on comparison of multiple metabolic pathways

Info

Publication number: CN106909805A
Application number: CN201710116712.3A
Authority: CN
Inventors: 黄毅然; 钟诚; 林海翔
Original assignee: Guangxi University
Current assignee: Guangxi University
Priority date: 2017-03-01
Filing date: 2017-03-01
Publication date: 2017-06-30
Anticipated expiration: 2037-03-01
Also published as: CN106909805B

Abstract

The invention discloses a method for reconstructing a species phylogenetic tree based on comparison of multiple metabolic pathways. A merged graph of the metabolic pathways is built by global alignment of the pathways; clustering the nodes of the merged graph then establishes the mapping between the functional modules of each metabolic pathway; and the mapping between functional modules is used to further analyze the relationships between metabolic pathways and to build the phylogenetic tree of the species. The beneficial effects of the invention are that the method simplifies the comparison of metabolic pathways, and a researcher needs only simple operations to produce a phylogenetic tree of the species quickly and accurately.

Description

Method for reconstructing a species phylogenetic tree based on comparison of multiple metabolic pathways

Technical field

This method relates to a method for generating a species phylogenetic tree, and specifically to a method for generating a species phylogenetic tree based on comparison of multiple metabolic pathways.

Background art

Phylogenetic analysis is an important area of systems biology research. Current methods that build phylogenetic trees from metabolic data mainly analyze the relationships between metabolic pathways through the mapping between pathway nodes, and use these relationships for phylogenetic analysis of species. However, the information carried by node mappings is limited, and node mapping information alone makes it difficult to explore the correlations between metabolic pathways in depth.

Summary of the invention

The object of the invention is to provide a method for building a phylogenetic tree of species based on comparison of multiple metabolic pathways.

The technical solution by which the invention solves the above technical problem is:

A method for reconstructing a species phylogenetic tree based on comparison of multiple metabolic pathways, comprising the following steps:

1) Building the merged graph of multiple metabolic pathways:

1.1) Computing node similarity:

For a metabolic pathway P, let G_p = (V_p, E_p) denote P, where G_p is a directed graph, V_p is the vertex set of G_p, and E_p is the directed edge set of G_p. Vertices u_i and u_j of G_p represent reactions r_i and r_j in P. If an output compound of r_i is an input compound of r_j, there is a directed edge between u_i and u_j pointing from r_i to r_j. If r_i and r_j are both reversible, there is also a directed edge from r_j to r_i.
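The edge rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_reaction_graph`, the input layout (a dict mapping a reaction id to its input compounds, output compounds, and a reversibility flag), and the decision to skip self-loops are assumptions, not part of the patent text.

```python
def build_reaction_graph(reactions):
    """Return the directed edge set E_p of the reaction graph G_p.

    reactions: dict mapping reaction id -> (inputs, outputs, reversible),
    a hypothetical layout chosen for this sketch.
    """
    edges = set()
    for r_i, (_, outputs_i, reversible_i) in reactions.items():
        for r_j, (inputs_j, _, reversible_j) in reactions.items():
            if r_i == r_j:
                continue  # assumption: no self-loops
            # An output compound of r_i is an input compound of r_j.
            if set(outputs_i) & set(inputs_j):
                edges.add((r_i, r_j))
                # Both reactions reversible: add the opposite edge as well.
                if reversible_i and reversible_j:
                    edges.add((r_j, r_i))
    return edges


reactions = {
    "r1": (["A"], ["B"], True),
    "r2": (["B"], ["C"], True),
    "r3": (["C"], ["D"], False),
}
edges = build_reaction_graph(reactions)
```

In this toy pathway, r1 feeds r2 and both are reversible, so the edge appears in both directions, while r2 to r3 is one-way because r3 is not reversible.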

Let k be a positive integer. For any node u of G_p, the k-neighborhood N_k(u) of u is a subset of V_p that does not contain u and consists of the nodes x whose shortest distance from u is k, where the shortest distance is the number of edges on a shortest path from u to x. The k-neighborhood N_k(v) of any node v of G_p' is defined in the same way.

For nodes u ∈ V_p and v ∈ V_p', the k-neighbor subgraph of u in G_p, denoted G_u^k, is the subgraph of G_p induced by N_k(u) ∪ {u}, and the k-neighbor subgraph of v in G_p', denoted G_v^k, is the subgraph of G_p' induced by N_k(v) ∪ {v}. Let d(u) and d(v) be the degrees of u in G_p and of v in G_p'. Let d(x_1), d(x_2), …, d(x_{|N_k(u)|}), with x_i ∈ N_k(u), be the degree sequence of the k-neighbors of u arranged in non-increasing order, and let d(y_1), d(y_2), …, d(y_{|N_k(v)|}), with y_i ∈ N_k(v), be the degree sequence of the k-neighbors of v arranged in non-increasing order. The topological similarity T(u, v) of nodes u and v is defined as:

$$T(u, v) = \frac{\min\{|V(G_u^k)|, |V(G_v^k)|\} + \left[\sum_{k=0}^{K_{\max}} \min\left\{\sum_{i=1,\, x_i \in N_k(u)}^{|N_k(u)|} d(x_i),\ \sum_{i=1,\, y_i \in N_k(v)}^{|N_k(v)|} d(y_i)\right\}\right] / 2}{\max\{|V(G_u^k)|, |V(G_v^k)|\} + \max\{|E(G_u^k)|, |E(G_v^k)|\}} \tag{1}$$
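The two ingredients of the topological similarity that the text defines unambiguously, the k-neighborhood N_k(u) and the non-increasing degree sequence of its members, can be sketched as follows. The graph layout (a dict of successor sets), the function names, and the choice to count degree as in-degree plus out-degree are assumptions made for this sketch; the patent text does not say how degree is counted in a directed graph.

```python
from collections import deque


def k_neighborhoods(succ, u, k_max):
    """Map each k in 1..k_max to N_k(u), the nodes at shortest distance k from u.

    succ: dict mapping node -> set of successor nodes (hypothetical layout).
    Distance is the number of edges on a shortest directed path from u.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k_max:
            continue  # do not expand beyond k_max hops
        for y in succ.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    hoods = {k: set() for k in range(1, k_max + 1)}
    for node, d in dist.items():
        if d >= 1:  # u itself is excluded from every N_k(u)
            hoods[d].add(node)
    return hoods


def degree(succ, x):
    """Degree of x, taken here as in-degree plus out-degree (an assumption)."""
    out_degree = len(succ.get(x, ()))
    in_degree = sum(1 for targets in succ.values() if x in targets)
    return in_degree + out_degree


def degree_sequence(succ, nodes):
    """Degrees of the given nodes in non-increasing order."""
    return sorted((degree(succ, x) for x in nodes), reverse=True)


succ = {"a": {"b", "d"}, "b": {"c"}}
hoods = k_neighborhoods(succ, "a", 2)
seq = degree_sequence(succ, hoods[1])
```

For the toy graph, N_1(a) contains b and d and N_2(a) contains c. These sequences are the inputs to the degree sums in formula (1) of claim 1.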

The biochemical similarity between nodes u and v is defined as Bsim(u, v) = α × ESim(u_e, v_e) + β × Csim(u_i, v_i) + γ × Csim(u_o, v_o), where u_e and v_e are the enzymes catalyzing reactions u and v, and ESim(u_e, v_e) is the similarity between enzymes u_e and v_e, computed as the overlap ratio of their EC numbers. Csim(u_i, v_i) is the average similarity of the input compounds of u and v, and Csim(u_o, v_o) is the average similarity of their output compounds. α, β, and γ are weighting coefficients that set the contribution of each term to Bsim(u, v). Combining the topological similarity and the biochemical similarity of the nodes gives the node similarity S(u, v) between u and v:

S(u, v) = σ × T(u, v) + (1 - σ) × Bsim(u, v)    (2)

where σ is a weighting coefficient that balances the two terms of S(u, v).
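The weighted combinations in Bsim and formula (2) can be illustrated with a short sketch. The text only says that enzyme similarity is an overlap ratio of EC numbers, so `ec_similarity` below takes one possible reading (the share of the four EC levels that agree as a common prefix); that reading, all function names, and the example weights are assumptions. Compound similarities are passed in as plain numbers because the text does not specify how Csim is computed.

```python
def ec_similarity(ec_a, ec_b):
    """Assumed ESim: fraction of the four EC levels shared as a common prefix."""
    shared = 0
    for level_a, level_b in zip(ec_a.split("."), ec_b.split(".")):
        if level_a != level_b:
            break
        shared += 1
    return shared / 4


def biochemical_similarity(esim, csim_in, csim_out, alpha, beta, gamma):
    """Bsim = alpha * ESim + beta * Csim(inputs) + gamma * Csim(outputs)."""
    return alpha * esim + beta * csim_in + gamma * csim_out


def node_similarity(t, bsim, sigma):
    """Formula (2): S = sigma * T + (1 - sigma) * Bsim."""
    return sigma * t + (1 - sigma) * bsim


esim = ec_similarity("1.1.1.1", "1.1.1.2")
bsim = biochemical_similarity(esim, 0.5, 0.5, alpha=0.5, beta=0.25, gamma=0.25)
s = node_similarity(0.5, bsim, sigma=0.5)
```

With these hypothetical inputs, two enzymes that differ only in the last EC level score 0.75, and σ = 0.5 gives the topological and biochemical terms equal weight.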

1.2) Finding the mapping between nodes according to node similarity:

Take the node set of G_p as one part of a weighted bipartite graph G_b and the node set of G_p' as the other part, and use the homologous similarity between a node of G_p and a node of G_p' as the weight of the edge connecting the two nodes. Maximum-weight bipartite matching then finds, for any node u of G_p, its unique mapped node v in G_p', giving a one-to-one mapping (u, v) from u to v, with u ∈ V(G_p) and v ∈ V(G_p').
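A maximum-weight bipartite matching of this kind can be computed with the Hungarian algorithm. The sketch below is a standard pure-Python version run on negated weights; the patent does not say which matching algorithm the inventors used, so this choice, the function name, and the requirement that G_p has no more nodes than G_p' are assumptions of the sketch.

```python
def max_weight_matching(weights):
    """Return match, where match[i] is the column assigned to row i.

    weights[i][j]: similarity between node i of G_p and node j of G_p'.
    Assumes len(weights) <= len(weights[0]); the total weight is maximized.
    """
    n, m = len(weights), len(weights[0])
    inf = float("inf")
    # Hungarian algorithm on cost = -weight, 1-indexed with a dummy index 0.
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)   # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -weights[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to the dummy column.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


similarity = [[0.9, 0.8],
              [0.85, 0.1]]
mapping = max_weight_matching(similarity)
```

In the example, a greedy choice would pair node 0 with column 0 (weight 0.9) and leave node 1 with 0.1; the matching instead pairs 0 with 1 and 1 with 0, for a total of 1.65.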

1.3) Building the merged graph of two metabolic pathways:

Each one-to-one mapping (u, v) from u to v obtained in step 1.2) is defined as a merged node V_m = {(u, v) | u ∈ V(G_p), v ∈ V(G_p')}, and the graph formed by these merged nodes is defined as the merged graph G_M.

Let the vertex set of the merged graph G_M of G_p and G_p' be V(G_M) = {V_m1, V_m2, …, V_mi, …, V_mn}, i ∈ {1, 2, …, n}, n = max{|V(G_p)|, |V(G_p')|}. V(G_M) is also called the merged node set of G_p and G_p'. The homologous similarity between merged nodes is computed as:

S(u, v) = α × ESim(u_e, v_e) + β × Csim(u_ic, v_ic) + γ × Csim(u_oc, v_oc)    (3)

Computing the homologous similarity between every pair of merged nodes in G_M by formula (3) yields the merged-node homologous similarity matrix M of G_M. M is a |V(G_p)| × |V(G_p')| matrix, and each element M[V_mi, V_mj] ∈ [0, 1] is the homologous similarity between merged node V_mi ∈ V(G_M) and merged node V_mj ∈ V(G_M).

1.4) Building the merged graph of multiple metabolic pathways and the corresponding homologous similarity matrix:

Let the common metabolic pathways of t species be G₁(V₁,E₁), G₂(V₂,E₂), …, G_t(V_t,E_t). These metabolic pathways form the set G = {G₁(V₁,E₁), G₂(V₂,E₂), …, G_t(V_t,E_t)}.

The merged graph of the common metabolic pathways of these species is built as follows:

1.4.1) First select from G the metabolic pathway G_max with the most nodes, |V(G_max)| = n. Then build a merged graph G_Mi between G_max and each metabolic pathway G_i ∈ G. The vertex set of G_Mi is V(G_Mi) = {V_m1i, V_m2i, …, V_mni}, i ∈ {1, …, t}. Each merged graph G_Mi yields a merged-node homologous similarity matrix M_i.

1.4.2) Merge the merged graphs obtained in step 1.4.1) to obtain the merged graph G_MK of the common metabolic pathways of the t species, where the vertex set of G_MK is […] and the merged-node homologous similarity matrix of G_MK is […].

2) Building conserved functional modules:

Treat each merged node of the merged graph obtained in step 1.4) as a data point, use the merged-node homologous similarity matrix as the similarity matrix between data points, and cluster the merged nodes. Each cluster is a set of merged nodes of the merged graph that are grouped into one class; such a set is denoted U_M. For each metabolic pathway, after the clustering in each comparison, the set of all nodes of that pathway that belong to the same U_M forms one conserved functional module of the pathway.
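The step from a similarity matrix to per-pathway modules can be sketched as below. The patent does not name a clustering algorithm, so this sketch substitutes a simple stand-in: merged nodes whose similarity reaches a threshold are linked and connected groups become clusters (single-linkage with a cutoff). The threshold, the function names, and the representation of a merged node as a tuple of pathway nodes are all assumptions.

```python
def cluster_merged_nodes(sim, threshold):
    """Group merged-node indices into clusters U_M with a union-find structure.

    sim: square similarity matrix between merged nodes.
    Two merged nodes are linked when sim[i][j] >= threshold (assumed rule).
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def modules_for_pathway(merged_nodes, clusters, side):
    """Project each cluster onto one pathway: the nodes at position `side`."""
    return [{merged_nodes[i][side] for i in cluster} for cluster in clusters]


merged_nodes = [("r1", "s1"), ("r2", "s2"), ("r3", "s3"), ("r4", "s4")]
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
clusters = cluster_merged_nodes(sim, threshold=0.5)
modules = modules_for_pathway(merged_nodes, clusters, side=0)
```

Here the four merged nodes fall into two clusters, and projecting them onto the first pathway gives two candidate conserved functional modules, {r1, r2} and {r3, r4}.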

3) Computing species similarity:

Let the common metabolic pathways of the t species be G₁(V₁,E₁), G₂(V₂,E₂), …, G_t(V_t,E_t). Let the conserved functional modules found in these t metabolic pathways in step 2) be M = {M₁, M₂, …, M_r}, and let M_max be the conserved functional module with the most nodes. For any two metabolic pathways G_i(V_i,E_i) and G_j(V_j,E_j), let M_imax and M_jmax be their conserved functional modules with the most nodes, with vertex sets V_imax and V_jmax and edge sets E_imax and E_jmax. Let M_iLCCS, with vertex set V_iLCCS and edge set E_iLCCS, be the LCCS of M_imax and M_jmax within M_imax, and let M_jLCCS, with vertex set V_jLCCS and edge set E_jLCCS, be the LCCS of M_imax and M_jmax within M_jmax. The similarity score between metabolic pathways G_i(V_i,E_i) and G_j(V_j,E_j) is then:

$$\mathrm{SimScore}(G_i, G_j) = \frac{\min\{|E_{iLCCS}|, |E_{jLCCS}|\}}{\max\{|E_i|, |E_j|\}} \tag{4}$$

Let the t species be O₁, O₂, …, O_t. Let the p common metabolic pathways of O₁ be G₁₁, G₁₂, …, G_1p, those of O₂ be G₂₁, G₂₂, …, G_2p, …, and those of O_t be G_t1, G_t2, …, G_tp. The similarity between any two species O_i and O_j is then:

$$\mathrm{SimScore}(O_i, O_j) = \frac{\sum_{s=1}^{p} \mathrm{SimScore}(G_{is}, G_{js})}{p} \tag{5}$$
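Formulas (4) and (5), as stated in claim 1, reduce to a ratio of edge counts and an average over the p shared pathways. A minimal sketch follows; the function names and example numbers are hypothetical, and the edge counts of the LCCS subgraphs are taken as given inputs because the text does not describe how the LCCS is found.

```python
def pathway_sim_score(n_edges_i_lccs, n_edges_j_lccs, n_edges_i, n_edges_j):
    """Formula (4): min(|E_iLCCS|, |E_jLCCS|) / max(|E_i|, |E_j|)."""
    return min(n_edges_i_lccs, n_edges_j_lccs) / max(n_edges_i, n_edges_j)


def species_sim_score(pathway_scores):
    """Formula (5): mean of the pathway scores over the p common pathways."""
    return sum(pathway_scores) / len(pathway_scores)


# Hypothetical pair of pathways: LCCS parts with 2 and 3 edges,
# full pathways with 4 and 5 edges.
score = pathway_sim_score(2, 3, 4, 5)

# Hypothetical scores for p = 3 common pathways of two species.
species_score = species_sim_score([0.4, 0.6, 0.5])
```

Because the numerator counts only edges inside the shared part and the denominator counts all edges of the larger pathway, both scores stay within [0, 1].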

4) Building the species phylogenetic tree:

The steps are as follows:

4.1) Compute the similarity between every pair of the t species by formula (5) to obtain a t × t similarity matrix BSim. BSim is a symmetric matrix whose diagonal entries are 1, and BSim[i, j] ∈ [0, 1] is the similarity between species i and species j.

4.2) Let D be the distance matrix of the t species, where D[i, j] ∈ [0, 1] is the distance between species i and species j and D[i, j] = 1 - BSim[i, j]. Then build a phylogenetic tree from the distance matrix D with the PHYLIP software.

4.3) Display the phylogenetic tree with the TreeView software.
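Steps 4.1) and 4.2) can be sketched as a conversion from BSim to D followed by writing D in the square distance-matrix layout that PHYLIP reads (a first line with the number of species, then one row per species with the name padded to 10 characters). The function names, species labels, and the four-decimal formatting are assumptions of this sketch, and the text does not say which PHYLIP program the inventors ran.

```python
def distance_matrix(bsim):
    """D[i][j] = 1 - BSim[i][j] for every pair of species."""
    return [[1.0 - s for s in row] for row in bsim]


def to_phylip(names, dist):
    """Format a square distance matrix as PHYLIP-style text."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, dist):
        label = name[:10].ljust(10)  # species names occupy 10 columns
        lines.append(label + " " + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines) + "\n"


# Hypothetical similarity matrix for two species.
bsim = [[1.0, 0.75],
        [0.75, 1.0]]
dist = distance_matrix(bsim)
text = to_phylip(["ecoli", "yeast"], dist)
```

The resulting text could be saved as the input file of a distance-based PHYLIP program, and the tree file that program writes could then be opened in TreeView.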

The beneficial effects of the invention are as follows. With this method, a researcher needs only simple operations to produce a phylogenetic tree of the species quickly and accurately. The method converts the global alignment of multiple metabolic pathways from several species into the construction of a merged graph of those pathways, which simplifies the comparison of metabolic pathways. By clustering the nodes of the merged graph, the method identifies the functional modules of each metabolic pathway and establishes the mapping between functional modules; the discovered modules and the mappings between them help reveal more of the biochemical characteristics that metabolic pathways share. The method uses these mappings between functional modules to build a species distance matrix and builds the phylogenetic tree from that matrix, so that the evolutionary relationships among species can be analyzed with the phylogenetic tree.

Detailed description of the embodiments

1) Building the merged graph of multiple metabolic pathways:

1.1) Computing node similarity:

S(u, v) = σ × T(u, v) + (1 - σ) × Bsim(u, v)    (2)

1.2) Finding the mapping between nodes according to node similarity:

1.3) Building the merged graph of two metabolic pathways:

S(u, v) = α × ESim(u_e, v_e) + β × Csim(u_ic, v_ic) + γ × Csim(u_oc, v_oc)    (3)

2) Building conserved functional modules:

3) Computing species similarity:

4) Building the species phylogenetic tree:

The steps are as follows:

4.3) Display the phylogenetic tree with the TreeView software.

Claims

1. A method for reconstructing a species phylogenetic tree based on comparison of multiple metabolic pathways, comprising the following steps:

1) Building the merged graph:

1.1) Computing node similarity:

For a metabolic pathway P, let G_p = (V_p, E_p) denote P, where G_p is a directed graph, V_p is the vertex set of G_p, and E_p is the directed edge set of G_p; vertices u_i and u_j of G_p represent reactions r_i and r_j in P; if an output compound of r_i is an input compound of r_j, there is a directed edge between u_i and u_j pointing from r_i to r_j; if r_i and r_j are both reversible, there is also a directed edge from r_j to r_i;

Let k be a positive integer; for any node u of G_p, the k-neighborhood N_k(u) of u is a subset of V_p that does not contain u and consists of the nodes x whose shortest distance from u is k, where the shortest distance is the number of edges on a shortest path from u to x; the k-neighborhood N_k(v) of any node v of G_p' is defined in the same way;

For nodes u ∈ V_p and v ∈ V_p', the k-neighbor subgraph of u in G_p, denoted G_u^k, is the subgraph of G_p induced by N_k(u) ∪ {u}, and the k-neighbor subgraph of v in G_p', denoted G_v^k, is the subgraph of G_p' induced by N_k(v) ∪ {v}; let d(u) and d(v) be the degrees of u in G_p and of v in G_p'; let d(x_1), d(x_2), …, d(x_{|N_k(u)|}), with x_i ∈ N_k(u), be the degree sequence of the k-neighbors of u arranged in non-increasing order, and let d(y_1), d(y_2), …, d(y_{|N_k(v)|}), with y_i ∈ N_k(v), be the degree sequence of the k-neighbors of v arranged in non-increasing order; the topological similarity T(u, v) of nodes u and v is defined as:

$$T(u, v) = \frac{\min\{|V(G_u^k)|, |V(G_v^k)|\} + \left[\sum_{k=0}^{K_{\max}} \min\left\{\sum_{i=1,\, x_i \in N_k(u)}^{|N_k(u)|} d(x_i),\ \sum_{i=1,\, y_i \in N_k(v)}^{|N_k(v)|} d(y_i)\right\}\right] / 2}{\max\{|V(G_u^k)|, |V(G_v^k)|\} + \max\{|E(G_u^k)|, |E(G_v^k)|\}} \tag{1}$$

The biochemical similarity between nodes u and v is defined as Bsim(u, v) = α × ESim(u_e, v_e) + β × Csim(u_i, v_i) + γ × Csim(u_o, v_o), where u_e and v_e are the enzymes catalyzing reactions u and v, ESim(u_e, v_e) is the similarity between enzymes u_e and v_e, computed as the overlap ratio of their EC numbers, Csim(u_i, v_i) is the average similarity of the input compounds of u and v, Csim(u_o, v_o) is the average similarity of their output compounds, and α, β, and γ are weighting coefficients that set the contribution of each term to Bsim(u, v); combining the topological similarity and the biochemical similarity of the nodes gives the node similarity S(u, v) between u and v:

S(u, v) = σ × T(u, v) + (1 - σ) × Bsim(u, v)    (2)

where σ is a weighting coefficient that balances the two terms of S(u, v);

1.2) Finding the mapping between nodes according to node similarity:

Take the node set of G_p as one part of a weighted bipartite graph G_b and the node set of G_p' as the other part, and use the homologous similarity between a node of G_p and a node of G_p' as the weight of the edge connecting the two nodes; maximum-weight bipartite matching then finds, for any node u of G_p, its unique mapped node v in G_p', giving a one-to-one mapping (u, v) from u to v, with u ∈ V(G_p) and v ∈ V(G_p');

1.3) Building the merged graph of two metabolic pathways:

Each one-to-one mapping (u, v) from u to v obtained in step 1.2) is defined as a merged node V_m = {(u, v) | u ∈ V(G_p), v ∈ V(G_p')}, and the graph formed by these merged nodes is defined as the merged graph G_M;

Let the vertex set of the merged graph G_M of G_p and G_p' be V(G_M) = {V_m1, V_m2, …, V_mi, …, V_mn}, i ∈ {1, 2, …, n}, n = max{|V(G_p)|, |V(G_p')|}; V(G_M) is also called the merged node set of G_p and G_p'; the homologous similarity between merged nodes is computed as:

S(u, v) = α × ESim(u_e, v_e) + β × Csim(u_ic, v_ic) + γ × Csim(u_oc, v_oc)    (3)

Computing the homologous similarity between every pair of merged nodes in G_M by formula (3) yields the merged-node homologous similarity matrix M of G_M; M is a |V(G_p)| × |V(G_p')| matrix, and each element M[V_mi, V_mj] ∈ [0, 1] is the homologous similarity between merged node V_mi ∈ V(G_M) and merged node V_mj ∈ V(G_M);

Let the common metabolic pathways of t species be G₁(V₁,E₁), G₂(V₂,E₂), …, G_t(V_t,E_t); these metabolic pathways form the set G = {G₁(V₁,E₁), G₂(V₂,E₂), …, G_t(V_t,E_t)};

1.4.1) First select from G the metabolic pathway G_max with the most nodes, |V(G_max)| = n; then build a merged graph G_Mi between G_max and each metabolic pathway G_i ∈ G, where the vertex set of G_Mi is V(G_Mi) = {V_m1i, V_m2i, …, V_mni}, i ∈ {1, …, t}; each merged graph G_Mi yields a merged-node homologous similarity matrix M_i;

1.4.2) Merge the merged graphs obtained in step 1.4.1) to obtain the merged graph G_MK of the common metabolic pathways of the t species, where the vertex set of G_MK is […] and the merged-node homologous similarity matrix of G_MK is […];

2) Building conserved functional modules:

Treat each merged node of the merged graph obtained in step 1.4) as a data point, use the merged-node homologous similarity matrix as the similarity matrix between data points, and cluster the merged nodes; each cluster is a set of merged nodes of the merged graph that are grouped into one class, and such a set is denoted U_M; for each metabolic pathway, after the clustering in each comparison, the set of all nodes of that pathway that belong to the same U_M forms one conserved functional module of the pathway;

3) Computing species similarity:

Let the common metabolic pathways of the t species be G₁(V₁,E₁), G₂(V₂,E₂), …, G_t(V_t,E_t); let the conserved functional modules found in these t metabolic pathways in step 2) be M = {M₁, M₂, …, M_r}, and let M_max be the conserved functional module with the most nodes; for any two metabolic pathways G_i(V_i,E_i) and G_j(V_j,E_j), let M_imax and M_jmax be their conserved functional modules with the most nodes, with vertex sets V_imax and V_jmax and edge sets E_imax and E_jmax; let M_iLCCS, with vertex set V_iLCCS and edge set E_iLCCS, be the LCCS of M_imax and M_jmax within M_imax, and let M_jLCCS, with vertex set V_jLCCS and edge set E_jLCCS, be the LCCS of M_imax and M_jmax within M_jmax; the similarity score between metabolic pathways G_i(V_i,E_i) and G_j(V_j,E_j) is then:

$$\mathrm{SimScore}(G_i, G_j) = \frac{\min\{|E_{iLCCS}|, |E_{jLCCS}|\}}{\max\{|E_i|, |E_j|\}} \tag{4}$$

Let the t species be O₁, O₂, …, O_t; let the p common metabolic pathways of O₁ be G₁₁, G₁₂, …, G_1p, those of O₂ be G₂₁, G₂₂, …, G_2p, …, and those of O_t be G_t1, G_t2, …, G_tp; the similarity between any two species O_i and O_j is then:

$$\mathrm{SimScore}(O_i, O_j) = \frac{\sum_{s=1}^{p} \mathrm{SimScore}(G_{is}, G_{js})}{p} \tag{5}$$

4) Building the species phylogenetic tree:

The steps are as follows:

4.1) Compute the similarity between every pair of the t species by formula (5) to obtain a t × t similarity matrix BSim; BSim is a symmetric matrix whose diagonal entries are 1, and BSim[i, j] ∈ [0, 1] is the similarity between species i and species j;

4.2) Let D be the distance matrix of the t species, where D[i, j] ∈ [0, 1] is the distance between species i and species j and D[i, j] = 1 - BSim[i, j]; then build a phylogenetic tree from the distance matrix D with the PHYLIP software;

4.3) Display the phylogenetic tree with the TreeView software.