WO2007093960A2 - Method for analyzing the fine structure of networks - Google Patents


Info

Publication number
WO2007093960A2
Authority
WO
WIPO (PCT)
Prior art keywords
link
elements
links
network
chev
Prior art date
Application number
PCT/IB2007/050471
Other languages
French (fr)
Inventor
István Kovács
Péter CSERMELY
Tamás KORCSMÁROS
Máte SZALAY
Original Assignee
Kovacs Istvan
Csermely Peter
Korcsmaros Tamas
Szalay Mate
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kovacs Istvan, Csermely Peter, Korcsmaros Tamas, Szalay Mate
Publication of WO2007093960A2


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/20 Probabilistic models

Definitions

  • the invention disclosed herein relates to methods for analyzing the fine structure of networks comprising a plurality of elements of identical nature and directed or undirected links between said elements having commensurable strengths.
  • the invention further concerns methods for high resolution modularization of networks, as well as methods for identifying distinguished links or elements, or groups of links or elements, supposedly playing special roles in said networks (such as links or elements situated in module overlaps, and VIP-links or VIP-elements situated in the overlaps of a plurality of modules).
  • the network analysis methods of the invention are based on assigning a novel index, called the 'community landscape height value', to all elements and/or links of the network; this index quantitatively characterizes the centrality of each element and/or link from the viewpoint of the network as a whole and is used to explore the so far hidden fine structure of the network.
  • the invention also concerns pre-programmed non-programmable processing devices capable of carrying out the methods of the invention as well as computer readable data carriers comprising computer readable algorithms suitable for carrying out such methods.
  • by a 'community heap' of a given element and/or link, a special group of elements and/or links of a network is meant which can interact with said element and/or link to any extent.
  • the whole analyzed network may belong to the community heap of any element and/or link of said network; nevertheless, elements and/or links that practically do not interact with the given element are considered, in practice, not to belong to its community heap.
  • the extent to which a given element or link belongs to the community heap of another element or link is characterized by a 'community heap value' (CHV) said extent being higher when the interaction with said another element or link is more intense.
  • CHV 'community heap value'
  • a CHV of a given element is called a 'community heap element value' (CHEV) and a CHV of a given link is called a 'community heap link value' (CHLV).
  • the CHEVs and CHLVs of all elements and/or links are calculated in step-by-step processes, during which the 'actual' CHEVs and/or CHLVs can be altered in every step; the actual CHEV and/or CHLV in the final calculation step is called the 'final' CHEV and/or CHLV and, consequently, equals the CHEV and/or CHLV as defined above.
  • the group of elements and/or links the actual CHVs of which differ from zero is considered as the 'actual community heap' of the element and/or link from which said calculation process has started. Accordingly, an element and/or link is said to 'belong to a community heap' if its CHV relating to said community heap differs from zero.
  • the 'centrality' of a given element and/or link of a network can be characterized by a set of CHEVs and/or CHLVs.
  • said set of CHEVs and/or CHLVs is usually represented by a single number, called the 'community landscape height value', obtained by adding up the individual values of said set according to predetermined integration rules.
  • Said community landscape height value is, therefore, a single number for each element and/or link quantitatively characterizing the 'centrality' of said element and/or link from the viewpoint of the whole network.
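As a minimal sketch, assuming the 'predetermined integration rule' is a plain sum (the text does not fix the rule), the community landscape height value of each element or link can be computed from the final CHVs of all community heaps. The nested-dictionary layout `chv[start][item]` and the function name `landscape_heights` are hypothetical.

```python
from typing import Dict, Hashable


def landscape_heights(
    chv: Dict[Hashable, Dict[Hashable, float]]
) -> Dict[Hashable, float]:
    """Add up, for every element or link, its final CHVs over all heaps.

    chv[start][item] is the final CHV of `item` in the community heap
    started from `start`; items missing from a heap count as zero.
    Plain summation is an assumed integration rule.
    """
    heights: Dict[Hashable, float] = {}
    for heap in chv.values():
        for item, value in heap.items():
            heights[item] = heights.get(item, 0.0) + value
    return heights


# Two heaps, started from elements "A" and "B" (illustrative values only).
print(landscape_heights({"A": {"A": 1.0, "B": 0.5},
                         "B": {"B": 1.0, "A": 0.25}}))
# {'A': 1.25, 'B': 1.5}
```

Items that never appear in any heap are simply absent from the result; a caller that needs them can default their height to zero.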
  • in accordance with the methods of the invention, the centrality of a given element and/or link is consequently characterized by said newly introduced community landscape height value, even though the centrality of elements and/or links of known networks has already been characterized in the art by other indices calculated by different methods.
  • the neighboring value of an element is called 'neighboring element value' (NEV) while the neighboring value of a link is called 'neighboring link value' (NLV).
  • NEV 'neighboring element value'
  • NLV 'neighboring link value'
  • by the 'module assignment vector' or 'module assignment values vector' of a link or element, a vector of single values is meant whose elements are the so-called 'module core assignment values', each value indicating the intensity with which said link or element belongs to the corresponding module core.
  • at most one undirected (weighted or non-weighted) link and at most two directed (weighted or non-weighted) links are considered between any two selected elements of the analyzed network.
  • where more than one link of the same direction, or more than one undirected link, can be construed between two selected elements, in the methods of the invention we replace these links, for calculation purposes, with one single link having the same direction (or being undirected) and the cumulated strength of said links.
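The link-merging rule above is easy to express in code. In this sketch, directed links are keyed by the ordered pair (u, v), so at most one link per direction survives, while undirected links are keyed by the unordered pair; the function name `merge_parallel_links` and the `(u, v, strength)` input format are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple, Union

Key = Union[Tuple[Hashable, Hashable], FrozenSet[Hashable]]


def merge_parallel_links(
    links: Iterable[Tuple[Hashable, Hashable, float]], directed: bool = True
) -> Dict[Key, float]:
    """Replace parallel links by one link carrying their cumulated strength.

    Directed links are keyed by the ordered pair (u, v); undirected links
    are keyed by frozenset({u, v}), so (u, v) and (v, u) are merged.
    """
    merged: Dict[Key, float] = defaultdict(float)
    for u, v, strength in links:
        key: Key = (u, v) if directed else frozenset((u, v))
        merged[key] += strength
    return dict(merged)


print(merge_parallel_links([("A", "B", 1.0), ("A", "B", 2.0), ("B", "A", 0.5)]))
# {('A', 'B'): 3.0, ('B', 'A'): 0.5}
```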
  • the methods used for module determination can basically be classified in two ways. The first classifies the methods as either 'clustering' or 'divisive' according to the starting conditions of the elements. In clustering methods, the elements are initially regarded as independent ones without links (contacts), and the similarities/links among them create clusters.
  • the 'divisive' methods start from the entire network, which is split at the proper locations and thus eventually divided into modules.
  • the basis of the other classification is what sort of information is required for network module determination. The required information may be local (where knowing the direct neighborhood of the given element is enough for the purpose of module determination), or global (where knowing the entire network is required for the module determination).
  • Clustering methods mostly require local information, whereas divisive ones need global information.
  • Clustering methods. 1. Clustering and renormalization.
  • clustering methods are so called mostly because their starting condition is that all connections between the elements are deleted, after which the network clusters are constructed on the basis of interconnection density.
  • in one of the clustering methods, the network structure is 'renormalized': the clusters constructed in the starting step are treated as elements, and clusters of such 'cluster-elements' are searched for in the rest of the process. These steps are iterated as long as they can be continued.
  • the network is ultimately transformed into a structure having tree-like branches, i.e. a dendrogram (Wasserman and Faust, 1994).
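The clustering-and-renormalization idea can be illustrated with a small agglomerative sketch. The merging criterion used here, the total link strength between two clusters divided by the product of their sizes, is an assumed stand-in for 'interconnection density', not a rule taken from the text; the function name `agglomerate` is hypothetical. Each merge turns two clusters into one 'cluster-element', and the recorded merge sequence is the dendrogram.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[str]


def agglomerate(strength: Dict[Tuple[str, str], float]) -> List[Tuple[Cluster, Cluster]]:
    """Greedily merge the most densely linked pair of clusters.

    `strength` maps undirected links (u, v) to their strengths. Returns the
    merge sequence, i.e. the dendrogram read from the leaves upwards.
    """
    nodes = sorted({n for link in strength for n in link})
    clusters: List[Cluster] = [frozenset([n]) for n in nodes]
    merges: List[Tuple[Cluster, Cluster]] = []

    def density(a: Cluster, b: Cluster) -> float:
        # Total strength of links running between a and b, per element pair.
        total = sum(w for (u, v), w in strength.items()
                    if (u in a and v in b) or (u in b and v in a))
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: density(*pair))
        if density(a, b) == 0.0:
            break  # the remaining clusters are not linked at all
        # Renormalization: the merged cluster acts as one element from now on.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b))
    return merges


# Two triangles joined by one weak link C-D.
links = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 1.0,
         ("D", "E"): 1.0, ("E", "F"): 1.0, ("D", "F"): 1.0, ("C", "D"): 0.1}
print(agglomerate(links)[-1])  # the last merge joins {A, B, C} with {D, E, F}
```

Recomputing every pairwise density at each level keeps the sketch short but makes it quadratic per step, which is only suitable for small examples.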
  • the fractal structure of the networks can be derived from this dendrogram-like hierarchic module structure
  • Clustering may be performed by randomly assigning the network elements to clusters.
  • the clusters formed are characterized by a 'module parameter' appropriate for one of the module definitions presented hereinabove.
  • this process optimizes the module parameter by randomly reassigning elements among the random clusters (Newman, 2004).
  • the Potts model. A special type of clustering uses the Potts model, which was first developed to describe paramagnetic clusters. In this model, the clusters are identified by assigning a spin to each element. The spins of neighboring elements are correlated with each other and form clusters in which the spins are identical (Spirin and Mirny, 2003).
  • Divisive methods. 5. Community detection method of Girvan and Newman. The most widespread divisive method of module determination is based on the detection of bridge-like links. If there are fewer links between the modules than within them, then many of the 'shortest paths' connecting elements of two different modules pass through the links between the modules. In other words, the betweenness centrality of the links between the modules is very high.
  • the method of module determination based on betweenness centrality first searches for the link with the highest betweenness centrality value in the network and removes it. It then repeatedly searches for and removes the link with the highest betweenness centrality until the network splits up.
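The link-removal loop described above can be sketched in plain Python for an unweighted, undirected network. Link betweenness is computed with Brandes' accumulation scheme and recomputed after every removal, as in the usual Girvan-Newman procedure. The dictionary-of-sets graph representation and all function names are illustrative choices, and self-loops are assumed to be absent.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def graph_from_edges(edges: Iterable[Tuple[str, str]]) -> Graph:
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def edge_betweenness(g: Graph) -> Dict[FrozenSet[str], float]:
    """Brandes-style link betweenness for an unweighted, undirected graph."""
    bc = {frozenset((u, v)): 0.0 for u in g for v in g[u]}
    for s in g:
        # Breadth-first search from s: distances, path counts, predecessors.
        dist = {s: 0}
        sigma = dict.fromkeys(g, 0)
        sigma[s] = 1
        preds: Dict[str, List[str]] = {v: [] for v in g}
        order: List[str] = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest elements backwards.
        delta = dict.fromkeys(g, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bc[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair of elements was counted once from each end.
    return {e: value / 2 for e, value in bc.items()}


def components(g: Graph) -> List[Set[str]]:
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for s in g:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = {s}, [s]
        while stack:
            for w in g[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(g: Graph) -> List[Set[str]]:
    """Remove highest-betweenness links until the network falls apart."""
    g = {v: set(nbrs) for v, nbrs in g.items()}  # work on a copy
    n_start = len(components(g))
    while len(components(g)) == n_start:
        bc = edge_betweenness(g)
        if not bc:
            break  # no links left to remove
        u, v = max(bc, key=bc.get)
        g[u].discard(v)
        g[v].discard(u)
    return components(g)


# Two triangles joined by the bridge C-D, which carries all 9 cross paths.
g = graph_from_edges([("A", "B"), ("B", "C"), ("A", "C"),
                      ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")])
print(sorted(sorted(c) for c in girvan_newman_split(g)))
# [['A', 'B', 'C'], ['D', 'E', 'F']]
```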
  • The clustering coefficient gives the probability that two neighbors of a given element are also neighbors of each other.
  • the value of the clustering coefficient varies between zero and one. If the clustering coefficients are averaged over the entire network, a single general index of clustering characteristic of the network is obtained (Barabasi and Oltvai, 2004). … into the modules generated this way, and the modules are cut randomly in the next step and the iteration process is started again (Duch and Arenas, 2005).
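The clustering coefficient defined above can be computed directly for an unweighted, undirected network: for an element with k neighbors, it is the number of linked neighbor pairs divided by k(k-1)/2. This sketch reports 0 for elements with fewer than two neighbors, a convention chosen here for convenience; the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def clustering_coefficient(g: Graph, v: str) -> float:
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = g[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; 0 by convention here
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
    return 2.0 * linked / (k * (k - 1))


def average_clustering(g: Graph) -> float:
    """Network-wide index: the mean of the local coefficients."""
    return sum(clustering_coefficient(g, v) for v in g) / len(g)


# A triangle A-B-C with a pendant element D attached to C.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(clustering_coefficient(g, "C"))  # 1 of 3 neighbor pairs is linked
print(average_clustering(g))           # (1 + 1 + 1/3 + 0) / 4
```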
  • Guimera and Amaral (2005) recently developed another method akin to the aforementioned one, applying simulated annealing to detect the maximum number of modules found within the network.
  • Fuzzy divisive methods. Fuzziness can be introduced into the 'traditional' Girvan-Newman (2002) method by selecting the link to be removed randomly from among the links with the highest betweenness centrality values. This method is based on the observation that elements located in overlapping module regions are placed in different modules when the removal order of the links with high betweenness centrality values is altered (Wilkinson and Huberman, 2004).
  • Topological overlap methods exploit the property of networks that elements within the same module show much larger overlaps in their second and third neighbors than elements placed in different modules (Ravasz et al, 2002; Yip and Horvath, 2005).
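The topological overlap idea can be made concrete with a first-neighbor overlap of the form O(i, j) = J_n(i, j) / min(k_i, k_j), a simple variant of the measure associated with Ravasz et al. (2002), where J_n(i, j) counts the elements linked to both i and j (plus one if i and j are directly linked) and k denotes degree. This is a simplified sketch: the methods described above also compare second and third neighbors, which this function does not consider. The function name is hypothetical.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def topological_overlap(g: Graph, i: str, j: str) -> float:
    """First-neighbor topological overlap J_n(i, j) / min(k_i, k_j).

    J_n(i, j) counts the elements linked to both i and j, plus one if i and
    j are directly linked; k_i and k_j are the degrees of i and j.
    """
    shared = len(g[i] & g[j])
    direct = 1 if j in g[i] else 0
    denom = min(len(g[i]), len(g[j]))
    return (shared + direct) / denom if denom else 0.0


# Two triangles, A-B-C and D-E-F, joined by the single link C-D.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
     "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
print(topological_overlap(g, "A", "B"))  # same module: 1.0
print(topological_overlap(g, "C", "D"))  # bridge between modules: 1/3
```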
  • the k-clique network-walk method. Overlapping modules can be identified by finding the k-cliques of the network and then determining, by a 'network walk', the network formed by these interconnected k-cliques. This method starts with the removal of a certain part of the weak links of the network. These weak links frequently link modules, so the overlaps generated this way are probably 'the most secure' and the smallest existing overlaps between the modules (Palla et al., 2005).
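A brute-force sketch of clique percolation in the spirit of Palla et al. (2005): all k-cliques are listed, two k-cliques are considered adjacent when they share k - 1 elements, and each connected set of adjacent cliques forms one module. The weak-link removal step is assumed to have been done beforehand, for instance by dropping links below a strength threshold. Enumerating every k-subset is only practical for small illustrative networks, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Graph = Dict[str, Set[str]]


def k_clique_communities(g: Graph, k: int = 3) -> List[Set[str]]:
    """Union of k-cliques reachable through cliques sharing k - 1 elements."""
    # Brute-force clique search over all k-subsets (small networks only).
    cliques: List[FrozenSet[str]] = [
        frozenset(c) for c in combinations(sorted(g), k)
        if all(b in g[a] for a, b in combinations(c, 2))
    ]
    parent = list(range(len(cliques)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # 'Network walk': adjacent k-cliques share k - 1 elements.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities: Dict[int, Set[str]] = {}
    for idx, clique in enumerate(cliques):
        communities.setdefault(find(idx), set()).update(clique)
    return list(communities.values())


# Two complete four-element groups that overlap in element D.
g: Graph = {v: set() for v in "ABCDEFG"}
for group in ("ABCD", "DEFG"):
    for a, b in combinations(group, 2):
        g[a].add(b)
        g[b].add(a)
print(sorted(sorted(c) for c in k_clique_communities(g)))
# [['A', 'B', 'C', 'D'], ['D', 'E', 'F', 'G']]
```

Element D ends up in both modules, which is exactly the kind of overlap these methods are designed to reveal.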
  • Module fringe areas. The distinguished role of elements within the overlaps of network modules has so far been presumed only in neuronal networks. Agnati et al. (2004) proposed that the overlaps (fringe areas) of neuronal networks could be important in the regulation of networks. Csermely (2005; 2006) generalized this notion, but drew no concrete conclusion regarding the specific application and properties of the elements within the overlaps.
  • the brokers usually create transient links, whereas the bridges facilitate more durable ones, yet the distinction between their roles is not clear in the literature.
  • the role of the brokers in the distribution method of Rogers (2003) and in marketing (Rosen, 2000) has been revealed, yet no method has been developed to localize and identify them.
  • the VIP-elements constitute de facto the elite of the network and have the potential to alert the entire network promptly and efficiently.
  • the positions of the VIP-elements within the entire network were unknown, and no algorithm capable of identifying the VIP-elements was available.
  • the methods used in the art start from either the local or the global properties of the networks in the course of module determination. Consequently, none of the methods known to date is 'mesoscopic', i.e. none lets the local network properties continuously 'blend into' the global ones. In other words, the local properties are of greater importance and the more remote ones of less importance, and the relative contribution of the two effects is selected by the expert seeking modules on the basis of particular criteria (biased process). No process is known that inherently contains this contribution value and sets it by itself depending on the complex structure of the network (unbiased process).
  • Methods 9-11 used the properties of the given elements in their own direct neighborhood only (method 11), or combined them with the ones typical to the whole networks (methods 9 and 10).
  • the known methods either cannot take into account the various weights and directedness of the network edges or, if they can be modified to do so, their versions that consider edge weights and directedness slow down so much that they become nearly useless for the large networks occurring in practice.
  • the methods known to date are not able to arrange all elements and edges into modules while providing objective indices of how far a given element or edge belongs to its own module, or to the several modules that actually contain it. Therefore, the methods of the invention fill a gap, provide new results in mapping the overlaps linking the network modules, and are applicable even to the largest networks, where the currently known methods have failed.
  • an object of the present invention is to provide methods for high resolution analysis of such complex networks enabling the identification of elements probably playing important roles in some important functions of said complex networks.
  • An additional object of the invention is to explore the topology of complex networks in unprecedented detail and provide a hierarchical description of their community (modular) structure. This - among others - helps the identification of network elements, which play a key role in the integration of the network and, therefore, require special protection.
  • the present inventors have recognized that network analyzing methods available in the art are not suitable for the high resolution analysis and identification of all possibly important elements of complex network systems, because the measures and indices so far applied to characterize complex networks consider only either the local or the global characteristics of said networks, and mostly fail to provide a high resolution analysis of the intermediate topological levels between said local and global measures.
  • such a complex representation and analysis may become very important in a number of special and important circumstances, e.g. when the network to be analyzed is exposed to unusual stimuli. Therefore, the present inventors have developed high resolution network analytical methods based on an entirely novel concept.
  • the network analytical methods of the invention are based on examining the structure and features of the whole network from the viewpoint of each element and/or link comprised in the network and then, by cumulating the information obtained thereby, characterize the centrality of all elements and/or links of the network from the viewpoint of the whole network, rendering a continuously changing 'community landscape height value' to all said elements and/or links.
  • the analysis of said herein defined community landscape height values of said elements and/or links enables the present inventors to modularize complex networks with an unprecedented high resolution, identifying and analyzing thereby the structure of so far hidden module overlaps and identifying so far hidden key elements of complex networks which play important roles in the interactions among apparently very distant parts of the analyzed complex networks.
  • the present invention concerns a method for analyzing the fine structure of a network comprising a plurality of elements of identical nature and links between said elements wherein each link being a directed or undirected connection between two elements and having a strength representing the intensity of the connection between said connected elements, the strengths of each of said links being commensurable with each other and each of said elements being connected to at least one other element; said elements representing predetermined physical, chemical or biological entities; said method comprising the following steps: (i) taking each element or link of said network, one by one, as a starting point, calculating a 'community heap element value' (CHEV) for each element and/or a 'community heap link value' (CHLV) for each link, in a step-by-step process, characterizing the extent to which each element and/or each link belongs to the community heap of said starting element or starting link, by gradually exploring said network starting from said starting element or starting link; said step-by-step process comprising the following steps:
  • step (c) repeating step (b) until the maximum NV defined in actual step (b) becomes zero or becomes lower than a threshold value calculated in a predetermined manner and considering the actual CHLV and CHEV of each link and element, respectively, as the final CHLV and CHEV characterizing the extent to which each link and element belongs to said community heap of said starting element;
  • all said links of the analyzed network are directed links and said CHEV and CHLV for each element and link is defined in one single step-by-step process, wherein both said actual CHEV and CHLV values being calculated in the above-defined step (b) as follows: defining a 'neighboring link value' (NLV) for each link, said value being set to zero for all links the actual CHLV of which differs from zero or the actual CHEV of the starting point element of which is zero, wherein said NLV for each link being set to be higher when the strength and/or the actual CHEV of the starting point element of said link is higher; then defining a 'neighboring element value' (NEV) for each element the actual CHEV of which being zero and being the end point element of one of said links having a NLV differing from zero, wherein said NEV of said element being set to be higher when the cumulated NLV of all links ending at said element is higher while said NEV of all other elements being set to
  • CHLV of each link is being set to zero, and when a zero CHEV or CHLV is increased in any step, said increased actual CHEV or CHLV is set to remain zero until said actual CHEV or CHLV reaches a threshold value, calculated in a predetermined manner, and becomes one when exceeding said threshold value, and when a CHEV or CHLV being one is increased in any step, said increased CHEV or CHLV remains one, whereby all set of CHEVs and CHLVs of said elements and links of said network will exclusively comprise zero and one values.
  • said increasing of the actual CHLV or CHEV from zero is carried out only once for each link and each element and any further increasing of said actual CHLV or CHEV in any further step is omitted.
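The binary variant for directed networks can be sketched as follows. Because parts of steps (b) and (c) are truncated above, the concrete rules used here are assumptions: the NLV of a link is simply its strength (the binary CHEV of its start element being one), the NEV of an element is the cumulated NLV of the links ending in it, and a single fixed `threshold` plays the role of both thresholds mentioned in the text. The function name `binary_community_heap` is hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Link = Tuple[Hashable, Hashable]


def binary_community_heap(
    strength: Dict[Link, float], start: Hashable, threshold: float
) -> Tuple[Set[Hashable], Set[Link]]:
    """Elements and directed links whose final CHEV / CHLV is one.

    Assumed rules (one possible reading of steps (b) and (c)):
    - NLV of a link = its strength, if its CHLV is still zero and its start
      element already has CHEV one; otherwise zero.
    - NEV of an element with zero CHEV = cumulated NLV of links ending in it.
    - a value reaching `threshold` switches the CHV from zero to one, once.
    """
    chev: Set[Hashable] = {start}  # elements with CHEV = 1
    chlv: Set[Link] = set()        # links with CHLV = 1
    while True:
        nlv = {link: w for link, w in strength.items()
               if link not in chlv and link[0] in chev}
        nev: Dict[Hashable, float] = {}
        for (_, end), value in nlv.items():
            if end not in chev:
                nev[end] = nev.get(end, 0.0) + value
        max_nv = max([*nlv.values(), *nev.values()], default=0.0)
        # Step (c): stop when the largest neighboring value is zero or too small.
        if max_nv == 0.0 or max_nv < threshold:
            return chev, chlv
        chlv |= {link for link, value in nlv.items() if value >= threshold}
        chev |= {e for e, value in nev.items() if value >= threshold}


links = {("A", "B"): 1.0, ("B", "C"): 1.0, ("B", "D"): 0.2, ("C", "D"): 0.2}
elements, heap_links = binary_community_heap(links, "A", threshold=0.3)
print(sorted(elements), sorted(heap_links))
# ['A', 'B', 'C', 'D'] [('A', 'B'), ('B', 'C')]
```

In this toy network, element D enters the heap because its two weak incoming links together reach the threshold, while neither link does so alone, so both keep a CHLV of zero. Each round either adds at least one element or link or stops, so the loop always terminates.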
  • the fine structure of an undirected network is analyzed, wherein each connection between the elements of the analyzed network is an undirected connection, and wherein said actual CHEV or CHLV values being calculated in step (b) as follows: defining a NLV for each link, said NLV being set to zero for all links the actual CHEVs of the end point elements of which are both zero or non-zero, respectively, wherein said NLV for each further link being set to be higher when the strength and/or the actual CHEV of the end point element of said link having a non-zero CHEV is higher; then defining a NEV for each element the actual CHEV of which being zero and being connected to one of said links having a NLV differing from zero, wherein said NEV of said element being set to be higher when
  • each element or link belongs to any particular module core element or link of said network by gradually exploring the surrounding of each module core element or link, and rendering a set of 'module core assignment values' to each element or link, each of said values characterizing the extent of the assignment of said element or link to a respective module core element or link, wherein each said module core assignment value for each element or link is determined as a function of the module core assignment values of all neighboring elements or links, and the sum of said module core assignment values for each element or link correlating with the community landscape height value of said element or link in accordance with predetermined calculation rules; and
  • the invention further concerns a method for identifying elements or links of a network of interest being typically situated in module overlaps, comprising the following steps:
  • the invention also concerns a method for identifying elements or links of a network as defined above supposedly playing special roles in said network (VIP-elements or VIP-links), comprising the following steps:
  • the elements of the networks analyzed by the methods of the invention are typically selected from the group consisting of atoms of a macromolecule, such as a protein, a DNA-molecule, an RNA-molecule or a polysaccharide; proteins, such as proteins of a cell's signaling network, a cell's cytoskeletal network or a cell's gene expression regulatory network, proteins present in a particular cell membrane or cell organelle, proteins having special enzymatic or regulatory functions; coenzymes; cells of an organism, such as nerve cells or immune cells; microorganisms, groups of microorganisms; technical devices, such as computers, computer or microchip controlled devices, robots, transportation or communication devices, telephones, mobile telephones, radios, televisions, elements of a pipeline, communication or transportation network elements, power grid elements, digital organisms and elements of a technical device.
  • the links of the networks analyzed by the methods of the invention are typically selected from the group consisting of covalent or non-covalent bonds between atoms of a macromolecule, such as a protein, a DNA-molecule, an RNA-molecule or a polysaccharide; protein-protein interactions; enzyme-coenzyme interactions; intracellular or intercellular interactions; specific interactions between microorganisms or groups of microorganisms and specific interactions between technical devices or parts of technical devices, advantageously communicative interactions.
  • the invention further provides a method for improving at least one important characteristic of a network of interest, comprising the following steps:
  • the methods of the invention are advantageously carried out by using a computing device, advantageously a computer, a microprocessor or a chip. Therefore, the invention also provides non-programmable processing devices, advantageously microprocessors or chips, capable of carrying out the methods of the invention.
  • the invention also concerns the use of any computing device, advantageously a computer, a microprocessor or a chip for carrying out the methods of the invention.
  • the invention further relates to computer readable data carriers comprising computer readable algorithms suitable for carrying out the methods of the invention.
  • Figure 1.A Graphical representation of the model signaling network analyzed in Example 1. Designation of elements and directions, strengths and topology of links connecting them are shown.
  • Figure 1.B Graphical representation of the model signaling network showing the end results of step 1 of the spreading process used in Example 1.
  • the CHLVs of links and the CHEVs of elements are shown.
  • the links with 0 CHLV are dashed, and the elements with 0 CHEV are empty.
  • Figure 1.C Graphical representation of the model signaling network showing the NLV calculated in the second step of the spreading process used in Example 1.
  • the NLVs of the links are shown.
  • the dashed links and empty elements show the initial state of the second step.
  • Figure 1.D Graphical representation of the model signaling network showing the end results of step 2 of the spreading process used in Example 1.
  • the CHLVs of the links and the CHEVs of the elements are shown.
  • the links with 0 CHLV are dashed, and the elements with 0 CHEV are empty.
  • Figure 1.E Graphical representation of the model signaling network showing the final CHLVs and CHEVs of the heap starting from element K in Example 1.
  • Figure 1.F Graphical representation of the model signaling network showing the community landscape height values of all links in the network.
  • Figure 1.G Graphical representation of the model signaling network showing an intermediate step of link module assignment in Example 1. The dashed links are not yet assigned to any module.
  • Figure 1.H Graphical representation of the model signaling network showing the assignment of the links to the five modules of the network.
  • Figure 1.I Graphical representation of the model signaling network showing module overlapness of all proteins in the signaling network.
  • Figure 1.J Hierarchical representation of the signaling network and the projections of the hierarchical levels, wherein the hierarchical levels of the model network are shown on the left side of the figure and their back projection to the original network on the right side, and the dashed ellipses enclose elements belonging to the same module based on their maximal projected module assignment values.
  • Figure 2 A diagram showing results of the analysis of the yeast protein interaction network as performed in Example 2. Individual hub molecules are represented by triangles (party hubs) and by + signs (date hubs); axis X represents the n(M) measure (which can be interpreted as the extent of 'overlapness') and axis Y the $n(\Pi)$ measure (characterizing the 'intramodular bridge property').
  • Figure 2 shows that party hubs belong to relatively few modules of the network, while, among the elements that belong to the same number of modules, a party hub is the most likely to be an intramodular bridge.
  • Figure 3 A diagram showing the inverse geodesic length as a function of the number of removed links in the case of the power grid analyzed in Example 4. The figure demonstrates that the network collapses earliest when the links are removed in decreasing order of their community landscape height value.
  • the n(M) measure represents the number of modules, to which a given element or link significantly belongs.
  • the n(M) measure can be interpreted as the extent of 'overlapness' of the given element or link. In the following we give an example of the determination of the 'overlapness' of an element, in accordance with Examples 1 and 2.
  • n(M) is the extended number of a discrete distribution, where the discrete distribution is the module assignment vector (A[]) of the given link or element.
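The text does not spell out how the 'extended number' of the module assignment vector is computed. As a hedged sketch, the snippet below assumes it behaves like an effective number of categories and uses one common choice, the exponential of the Shannon entropy of the normalized vector: a vector concentrated on a single module gives 1, and one spread evenly over m modules gives m. The function name `effective_number` is hypothetical.

```python
import math
from typing import Sequence


def effective_number(assignment: Sequence[float]) -> float:
    """exp(Shannon entropy) of a normalized module assignment vector A[]."""
    total = sum(assignment)
    if total <= 0:
        return 0.0  # no module assignment at all
    probs = [a / total for a in assignment if a > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


print(effective_number([5.0, 0.0, 0.0]))  # belongs to one module only: 1.0
print(effective_number([1.0, 1.0, 0.0]))  # split evenly between two modules: about 2
```

Other effective-number definitions (for example the inverse participation ratio) rank uneven vectors slightly differently, so the choice should follow whatever rule the analysis actually specifies.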
  • the $\Pi_e^{KL}$ measure defines the extent of the 'bridgeness' function of the given element or link between modules K and L. This measure can be formalized in many ways based upon the strength, the community landscape height value and the module assignment vector of the elements and links of the network. We will now show an example of the $\Pi_e^{KL}$ measure of the elements, in accordance with Examples 1 and 2. Let $O_e$ denote the sum of the weights of those outbound links of element 'e' which have non-zero community landscape
  • if the module assignment vectors of the links of the network are not normalized to the initial strength of the links, then let us renormalize the module assignment vector of each link of the network to the initial strength of the given link. (In examples 1 and 2 of the patent this normalization is not required, since the links are already normalized so as to fulfill this condition.)
  • let $N_e$ denote the outbound assignment value vector of element 'e' and, similarly, let $M_e$ denote the inbound assignment value vector of element 'e'. Then the bridgeness of element 'e' between modules K and L can be calculated as $\Pi_e^{KL} = M_e[K]\,N_e[L]$.
  • the $\Pi_e^{KL}$ measure in the given example is defined the same way for undirected networks.
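A minimal sketch of the bridgeness calculation, reading the formula for element 'e' as Π_e^{KL} = M_e[K] · N_e[L]. Two points are assumptions not stated in the text: the inbound vector M_e and the outbound vector N_e are taken to be the component-wise sums of the (renormalized) module assignment vectors of the links entering and leaving 'e', and renormalization scales a link's vector so that its components add up to the link's strength. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def renormalize(vector: Sequence[float], strength: float) -> Vector:
    """Scale a link's module assignment vector so it sums to the link strength."""
    total = sum(vector)
    if total == 0:
        return [0.0] * len(vector)
    return [x * strength / total for x in vector]


def element_vector(links: Sequence[Tuple[Sequence[float], float]], n_modules: int) -> Vector:
    """Component-wise sum of renormalized vectors of (vector, strength) links."""
    result = [0.0] * n_modules
    for vector, strength in links:
        for k, x in enumerate(renormalize(vector, strength)):
            result[k] += x
    return result


def bridgeness(m_e: Sequence[float], n_e: Sequence[float]) -> List[Vector]:
    """Pi[K][L] = M_e[K] * N_e[L] (assumed reading of the formula)."""
    return [[m * n for n in n_e] for m in m_e]


# Element 'e' with one inbound link (module 0) and two outbound links.
m_e = element_vector([([2.0, 0.0], 1.0)], n_modules=2)
n_e = element_vector([([0.0, 1.0], 1.0), ([1.0, 1.0], 2.0)], n_modules=2)
print(bridgeness(m_e, n_e))  # [[1.0, 2.0], [0.0, 0.0]]
```

Here the largest entry, Π[0][1], marks 'e' as relaying input assigned to module 0 towards module 1 under these assumptions.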
  • the $n(\Pi)$ measure represents the 'VIP-ness' of a given element or link situated in module overlaps. As can be seen from the definition of the $\Pi$ measure, the $\Pi$ measure characterizes the absolute bridge function
  • a technologically relevant example of directed networks is the signaling networks of human cells.
  • elements are various proteins (occasionally RNAs and RNA or DNA segments) and the directed links between them are the effects of one element on another when a signaling step occurs.
  • Such an effect might be phosphorylation, methylation, modification of binding affinity, induction of a conformational change, activation or inhibition, increase or decrease in the synthesis rate of the downstream element induced by the upstream element of the network (Balazsi and Oltvai, 2005; Papin et al., 2005).
  • Finding the key elements of signaling networks can be used as a discovery tool helping the identification of drug targets, e.g. for targeting central elements of signaling pathways in cancer cells with anticancer drugs (Adjei and Hidalgo, 2005; Chen et al., 2005).
  • the identification of overlaps and bridges between signaling modules gives unbiased information on the cross-talk between signaling pathways, which may be crucial to design therapeutic interventions, which de-couple a signaling pathway from another, and, therefore, allow a selective inhibition of a certain pathway.
  • the identification of signaling modules gives crucial information on the possible back-up mechanisms, which may give an alternative pathway once a certain pathway has been blocked (Hornberg et al., 2006).
  • Models of such altered networks can be used to predict the behavior of the signaling network after the action of a proposed drug candidate. All the above information opens the way to the heretofore unsolved problem of finding target sets for multi-target drugs (Borisy et al., 2003; Csermely et al., 2005; Huang, 2002).
  • Target sets may contain elements of the core of a signaling module, as well as major bridges of the same module to adjacent modules. Larger, mostly linear segments of signaling networks are often called pathways. However, multifunctional signaling molecules provide a large number of cross-talks between pathways. Similarly, adaptor and scaffold proteins function as collection platforms of various signaling steps. These provide a rationale for organizing smaller sets of signaling elements into larger cohorts, introducing a hierarchical organization of signaling (Yu and Gerstein, 2006).
  • Element E might be considered as a receptor, with elements D and F as its distinct primary targets.
  • Element K represents a negative feedback mechanism by which the dominant 'pathway-D' down-regulates receptor E, and it provides a cross-talk to 'pathway-F'.
  • Elements B and G serve as adaptor molecules (switchboards) of pathway-D and pathway-F, respectively.
  • Elements L and C provide a reverse cross-talk from pathway-F to pathway-D, which is regulated by elements A and B via element C.
  • Element A is the final effector of pathway-D, while elements J, H and I are the final effectors of pathway-F.
  • the final effectors J, H and I form a direct signaling cascade, which coordinates their action.
  • The final effector I is under triple control: it receives a direct input from the primary target F, a secondary amplification loop from the adaptor G, and a coordinating regulation from the competing final effector H.
  • the community landscape value is a global measure of centrality, which means that it represents how much the said element or link is affected by a change anywhere in the network, or how much the change of said element or link affects the whole network.
  • The method for calculating the community landscape height values used in the present example will be referred to herein as the 'SpreadLand method'. Step (i): Calculation of the community heap values.
  • the global topological information represented by the community landscape height values is explored by a calculation process involving many elements of local topological information.
  • We define a community heap of each (starting) element of the network.
  • a community heap value is calculated for each element and link of the network, which is the local information representing the extent of the effect the starting element has on all the other elements and links of the network.
  • The SpreadLand method, in accordance with an advantageous embodiment of the present invention, is used to calculate the community heap values for each element and link of the illustrative network described above.
  • The community heap value calculation used herein in one specific method of the invention represents a simplified model of information spreading.
  • Starting from a minor effect (information or perturbation) at the starting element, the effect spreads from element to element according to the topology of the network and the strength and direction of its links.
  • This effect-spreading is simulated in a step by step process.
  • The community heap value of an element or link of the network will represent how much effect has reached said element or link, that is, how much said element or link belongs to the community heap of the starting element K.
  • A similar information flow might result if protein K became oxidatively damaged, conformationally changed, phosphorylated, etc., affecting all its links to neighboring proteins.
  • the starting effect will spread from one or more elements (reached by the starting effect in one of the previous steps) to other one or more elements along the outbound links of the source elements.
  • element K has a non-zero CHEV value, so naturally it is the initial source of spreading.
  • Element K has two outbound links (note that the direction of links is important) with the same strength of one.
  • Next, the 'neighboring link value' (from now on: NLV) of each link is calculated.
  • The NLV of a link represents how much effect would spread through the given link if the effect spread along it.
  • The NLV of a given directed link X→Y is given by the formula
  • the CHEV and CHLV is determined for each element and link of the network taking each element as a starting point of a spreading process in the same way as we did in case of element K.
  • Only elements and links reachable from the starting element by a directed path will have a non-zero CHEV or CHLV. For example, the community heap of element H contains only the elements H and I with non-zero CHEV, and the link H→I with non-zero CHLV.
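Since spreading follows outbound links only, the set of elements and links that can receive a non-zero CHEV or CHLV from a starting element is its directed reachability set. The following breadth-first search is a sketch that computes only which values can be non-zero, not the values themselves (the NLV formula is not reproduced here). As a hypothetical, partial fragment of the example network it uses only links named in the text (F→G, F→I, G→H, G→I, H→I).

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def heap_support(out_links: Dict[str, List[str]], start: str) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Elements and links reachable from `start` along directed links."""
    reached = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in out_links.get(node, []):
            if target not in reached:
                reached.add(target)
                queue.append(target)
    # every outbound link of a reached element leads to a reached element
    links = {(u, v) for u in reached for v in out_links.get(u, [])}
    return reached, links


# Partial fragment of the example network (only links named in the text).
fragment = {"F": ["G", "I"], "G": ["H", "I"], "H": ["I"]}
elements, links = heap_support(fragment, "H")
print(sorted(elements), sorted(links))  # -> ['H', 'I'] [('H', 'I')]
```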
  • the centers (or cores) of modules are defined as the hilltops of the community landscape.
  • A hilltop is either a link whose outbound links are of smaller height (by the outbound links of a link i→j we mean the outbound links of element j), or a strongly connected component consisting of equally high links which have the previous property. In this way, in this method (as opposed to most previously known methods) there is no need to set a predetermined number of modules to be found.
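The first case of the hilltop criterion can be checked link by link. The Python sketch below handles only that case (a single link all of whose outbound links, i.e. the outbound links of its end element, are strictly lower); the second case, a strongly connected component of equally high links, is deliberately left out. A link whose end element has no outbound links counts as a hilltop here. The height values in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def simple_hilltops(height: Dict[Link, float]) -> List[Link]:
    """Links i->j whose outbound links (the outbound links of j) are all strictly lower.

    Simplification: strongly connected components of equally high links
    (the second case of the definition) are not handled.
    """
    outbound_of: Dict[str, List[Link]] = {}
    for link in height:
        outbound_of.setdefault(link[0], []).append(link)
    hilltops = []
    for (i, j), h in height.items():
        successors = outbound_of.get(j, [])  # empty if j has no outbound links
        if all(height[s] < h for s in successors):
            hilltops.append((i, j))
    return hilltops


# Hypothetical landscape heights for links named in the text.
heights = {("F", "G"): 2.0, ("F", "I"): 3.0, ("G", "H"): 4.0, ("G", "I"): 8.0, ("H", "I"): 7.0}
print(simple_hilltops(heights))  # -> [('F', 'I'), ('G', 'I'), ('H', 'I')]
```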
  • In the first step, links of the top community landscape level without any neighboring assigned links are marked, and distinct sets of these links that fulfill the previously defined hilltop criterion are sought. Each such set of links becomes a new module core. Each link of all these sets is assigned to its own module with an assignment strength equal to its strength.
  • Unassigned links of the actual community landscape level which have at least one neighboring assigned link are assigned to modules, with their strength distributed as assignment strength based on their already assigned neighboring links.
  • A link is assigned to the modules to which its assigned neighbors are already assigned, in proportion to the community landscape height values ($m_{ij}$) of these neighbors.
  • Local maxima are the links with no neighboring link of higher community landscape height value. There are five such links, namely B→A, C→D, F→I, G→I and H→I, so the network has five modules, and said links become the module cores of the respective modules.
  • Modules will be referenced by a number starting from 1 according to the position of their module core link in this list; for example, module '1' refers to the module whose module core is the link B→A, and module '4' refers to the module whose module core is the link G→I.
  • A module assignment vector $[x_1, x_2, x_3, x_4, x_5]$ is assigned to each link of the network, where $x_i$ is the assignment strength to module i.
  • All components of the module assignment vector of a module core link of module i are zero except component i, which equals the strength of said link, because module core links are totally assigned to their respective module. So the following module assignment vectors are initially known: B→A [1,0,0,0,0]; C→D [0,5,0,0,0]; F→I [0,0,3,0,0]; G→I [0,0,0,8,0]; H→I [0,0,0,0,7].
  • The initial strengths of the links of the network are shown in Figure 1A.
  • Link F→G is to be assigned to modules.
  • Said link has two already assigned neighbors, which are assigned to different modules: link G→I belongs totally to module '4', while link G→H belongs totally to module '5'.
  • The assignment formula (1) described in the proportional distribution method is used, so the module assignment vector of link F→G is given by calculating the weighted sum of the module assignment vectors of links G→I and G→H, where the weight is the community landscape height of the given link, and renormalizing the calculated weighted sum to the strength of link F→G.
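The proportional distribution step for link F→G can be sketched in Python as follows. Only the G→I vector [0,0,0,8,0] comes from the text; the strength of F→G (4), the heights of G→I (3) and G→H (2), and the strength of G→H (4) are hypothetical values chosen to give round numbers, not the values of the example network.

```python
from typing import List, Sequence, Tuple


def proportional_assignment(
    strength: float,
    assigned_neighbours: Sequence[Tuple[float, Sequence[float]]],
) -> List[float]:
    """Weighted sum of neighbour vectors (weight = landscape height), rescaled to `strength`.

    `assigned_neighbours` holds (community landscape height, module assignment vector)
    pairs of the already assigned neighbouring links.
    """
    n_modules = len(assigned_neighbours[0][1])
    weighted = [0.0] * n_modules
    for height, vector in assigned_neighbours:
        for m in range(n_modules):
            weighted[m] += height * vector[m]
    total = sum(weighted)
    if total == 0:
        return [0.0] * n_modules  # nothing to distribute
    # renormalize so that the components add up to the strength of the link
    return [strength * w / total for w in weighted]


g_to_i = (3.0, [0, 0, 0, 8, 0])  # hypothetical height 3; vector from the text
g_to_h = (2.0, [0, 0, 0, 0, 4])  # hypothetical height 2 and strength 4
print(proportional_assignment(4.0, [g_to_i, g_to_h]))  # -> [0.0, 0.0, 0.0, 3.0, 1.0]
```

Under these assumed numbers, three quarters of the strength of F→G goes to module '4' and one quarter to module '5'.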
  • the inbound assignment value vector of said element is calculated by summing the assignment value vectors of the inbound links of said element.
  • The outbound assignment value vector of said element is calculated by summing the assignment value vectors of the outbound links of said element. Note that in some practical cases the in- and outbound assignment value vectors of said element may be combined (in the simplest case, added) to render a single module assignment value vector of said element. Analysis of module overlaps: as seen in the introduction, overlapping regions of signaling networks are important for determining potential elements of cross-talk and regulation. The extent to which an element lies in an overlapping region of the network can be quantified using the n(M) measure, whose meaning was defined immediately preceding the present example (see Figure U).
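Summing link vectors into element vectors is a single pass over the links. In the Python sketch below the link vectors are hypothetical (the F→G vector stands for an illustrative result of a proportional split, not a value from the patent), and the combined vector uses the simplest rule mentioned in the text, addition.

```python
from typing import Dict, List, Sequence, Tuple


def element_vectors(
    link_vectors: Dict[Tuple[str, str], Sequence[float]], element: str
) -> Tuple[List[float], List[float], List[float]]:
    """Return (inbound M_e, outbound N_e, combined) assignment value vectors of `element`."""
    n_modules = len(next(iter(link_vectors.values())))
    inbound = [0.0] * n_modules
    outbound = [0.0] * n_modules
    for (source, target), vector in link_vectors.items():
        if target == element:  # inbound link of the element
            inbound = [a + b for a, b in zip(inbound, vector)]
        if source == element:  # outbound link of the element
            outbound = [a + b for a, b in zip(outbound, vector)]
    combined = [a + b for a, b in zip(inbound, outbound)]  # simplest combination: addition
    return inbound, outbound, combined


vectors = {
    ("F", "G"): [0, 0, 0, 3, 1],  # hypothetical
    ("G", "I"): [0, 0, 0, 8, 0],
    ("G", "H"): [0, 0, 0, 0, 4],  # hypothetical
}
print(element_vectors(vectors, "G"))
```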
  • Element K is identified as the element of maximal modular overlap.
  • In the rationalization of the putative signaling network of the example, element K has been described as a negative feedback element through which the dominant pathway-D down-regulates receptor E, while also providing cross-talk to pathway-F. This description is in complete agreement with the paramount modular overlap of this element, since an extreme modular overlap is expected for cross-talk elements of signaling pathways.
  • Let $\eta^{KL}$ be the sum of $\eta^{KL}_e$ for $e$ running over all the elements of the original network.
  • The meaning of the measure $\eta^{KL}_e$ was defined immediately preceding the present example. Note that the condition $K \neq L$ has the desired effect of eliminating loops from the new network, since loops do not represent inter-modular connections in this example.
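The renormalization step can be sketched as building a module-level network whose link K→L carries the summed bridgeness $\eta^{KL}$ of all elements, with K = L skipped so that no loops arise. In this Python sketch each element is given by its hypothetical inbound and outbound assignment vectors (modules indexed from 0), and zero-weight links are dropped, which is a convenience choice of the sketch rather than something the text prescribes.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def module_network(
    vectors: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> Dict[Tuple[int, int], float]:
    """Map each element to (M_e, N_e); return {(K, L): eta^{KL}} for K != L."""
    weights: Dict[Tuple[int, int], float] = defaultdict(float)
    for inbound, outbound in vectors.values():
        for k, m_k in enumerate(inbound):
            for l, n_l in enumerate(outbound):
                if k != l:  # loops do not represent inter-modular connections
                    weights[(k, l)] += m_k * n_l  # add eta^{KL}_e of this element
    return {pair: w for pair, w in weights.items() if w > 0}


example = {
    "e1": ([1.0, 0.0], [0.0, 2.0]),
    "e2": ([0.0, 3.0], [1.0, 1.0]),
}
print(module_network(example))  # -> {(0, 1): 2.0, (1, 0): 3.0}
```

Repeating module detection on the returned weighted network would give the next hierarchical level described below.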
  • the structure of the resulting new network represents an essential backbone of the original network.
  • Modules of said new network can be determined, and even higher hierarchical levels can be constructed by repeating the described renormalization process for the new network. Note that in this process we define 'modules of modules' (called 'supermodules'), and we can project these larger modules onto the original elements of the network, as can be seen in Figure 1J.
  • This renormalization greatly helps in understanding the modular structure of any complex network, for example a signaling network with thousands of proteins.
  • Example 2 Identification of targets for hub-based multi-target drug design using the yeast protein-protein interaction network
  • Hubs, i.e. network elements which have a much higher number of neighbors than the average degree of the network, are of paramount importance as primary targets for the modification of various networks.
  • The inhibition of hubs has proved to be a very efficient way to disturb a large number of self-organized complex systems (Barabasi & Albert, 1999; Albert et al., 2000).
  • hubs of yeast protein-protein interaction networks may be used as targets of anti-fungal drugs.
  • Multi-target drugs may behave significantly better than conventional, single-target drugs (Borisy et al., 2003; Csermely et al., 2005; Huang, 2002), decreasing the risk that resistance develops, which is a primary concern with fungicides.
  • Modules of the yeast protein-protein interaction network correspond to grossly different functions of the respective protein complexes in the yeast cell (Valente et al., 2005). Therefore, it is advisable to select as non-redundant hubs those hubs which are in different modules and satisfy the first condition. An even higher anti-fungal efficiency of the proposed multi-target drug can be expected if we select hubs in the cores of different modules. Lastly, the least functional overlap can be expected if we select hubs in the cores of non-adjacent modules.
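The three selection criteria (hub status, location in a module core, and non-adjacency of the chosen modules) can be combined greedily. The Python sketch below is one possible reading, not the patent's procedure: the hub threshold of twice the average degree, the greedy order by decreasing degree, and all input names and data are assumptions.

```python
from typing import Dict, FrozenSet, List, Set


def select_hub_targets(
    degree: Dict[str, int],
    core_module: Dict[str, int],
    adjacent_modules: Set[FrozenSet[int]],
    hub_factor: float = 2.0,  # ASSUMPTION: "much higher than average" = 2x average degree
) -> List[str]:
    """Greedily pick hubs lying in the cores of distinct, mutually non-adjacent modules."""
    average = sum(degree.values()) / len(degree)
    hubs = sorted(
        (p for p, d in degree.items() if d >= hub_factor * average),
        key=lambda p: (-degree[p], p),  # highest degree first, ties broken by name
    )
    chosen: List[str] = []
    used: List[int] = []
    for protein in hubs:
        module = core_module.get(protein)
        if module is None:  # the hub is not in any module core
            continue
        if any(module == u or frozenset((module, u)) in adjacent_modules for u in used):
            continue  # same or adjacent module as an already selected hub
        chosen.append(protein)
        used.append(module)
    return chosen


# Hypothetical degrees, core memberships and module adjacencies.
degrees = {"a": 10, "b": 9, "c": 8, "v": 1, "w": 1, "x": 1, "y": 1, "z": 1}
cores = {"a": 1, "b": 1, "c": 4}
print(select_hub_targets(degrees, cores, {frozenset((1, 2))}))  # -> ['a', 'c']
```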
  • our network analysis method can be efficiently used for the determination of multiple targets for anti-fungal drugs.
  • the described selection method can also be applied in many other, therapeutically relevant settings using different networks, such as the determination of multiple target-sets of signaling networks or human/cancer-specific protein-protein interaction networks for anti-cancer drugs, or finding the most efficient intervention point-sets of various technological networks, such as power-grids, the internet, etc.
  • The community landscape height value of a protein-protein interaction was set as the sum of the community heap values of said interaction for all 2445 'starting proteins' of the giant component of the network. Module assignment of the protein-protein interactions and proteins was performed using the 'Total Distribution Method', which is described in detail in the box below.
  • E is the set of links of the network. Note: in the undirected case, every link-indexed quantity is symmetric under reversing the endpoints of the link in its index (for example, $q_{ij} = q_{ji}$ for a link-indexed quantity $q$). In the total distribution method, in the undirected case, the following can be applied to non-module-core links:
  • the link $(i, j)$ is assigned to the links connected to it, in proportion to the community landscape height of these connecting links.
  • This method results in a large number of highly overlapping primary modules. If this overlap is not convenient, we may use a higher level of the hierarchical representation of the modular structure of the network, where many of the highly overlapping modules are already merged.
  • YER012W (module 1): The PRE1 proteasome core protein is unique among the proteasome regulatory complexes. However, it has a unique nuclear co-localization with regulatory elements. Russell et al. (1999) J. Biol. Chem. 274, 21943-21952.
  • YEL037C: RAD23 is an excision-repair protein; however, it is a ubiquitin receptor and part of the proteasome complex. Guerrero et al. (2006) Mol. Cell Proteomics 5, 366-378.
  • YER148W: SPT15 is a TATA-binding protein but has not been shown to associate with the proteasome. However, at least two TATA-binding proteins (PRH and TIP120) were shown to interact with the proteasome. Bess et al. (2003) Biochem. J. 374, 667-675; Makino et al. (1999) Genes Cells 4, 529-539.
  • YML092C: PRE8 is a part of the proteasome. However, it may be involved in the nuclear transport of the proteasome. Kiyomiya et al. (2001) Cancer Res. 61, 2467-2471.
  • YGR092W (module 18): The two hubs of this complex, the CAF4 RNA polymerase regulator and the DBF2 kinase, were shown to interact functionally. Liu et al. (2001) J. Biol. Chem. 276, 7541-7548.
  • YAL043C (module 23): PTA1 was shown to recruit GLC2 to the site of RNA maturation. He and Moore (2005) Mol. Cell 19, 619-629.
  • YNL307C (module 36): Yeast GSK3 is a well-known regulator of transcriptional activity. Neigeborn and Mitchell (1991) Genes Dev. 5,
  • The TOR1 (PI-3-kinase) complex may be a novel complex regulator of yeast gene transcription. Loewith et al. (2002) Mol. Cell 10, 457-468.
  • YNL006W (module 43): LST8 has been shown to be a part of the TOR1 complex. Loewith et al. (2002) Mol. Cell 10, 457-468.
  • the additional primary modules are highly overlapping but have two additional separate peaks of the community landscape.
  • One of these macro-modules is best represented by primary modules #2 and #6. These modules are the cores of the yeast cytoskeletal apparatus. (It is remarkable that the cyclin-dependent protein kinase cdc28 segregates best together with this movement-related complex.)
  • The last separate macro-module is organized around primary modules #14 and #16 and is centered on ribosome biogenesis. From the above list of macro-modules, the following hubs in non-adjacent module cores may be selected as the most separate targets for multi-target anti-fungal drugs:
  • Party hubs will be discriminated from date hubs using two measures. It is expected that party hubs will generally belong to fewer modules than date hubs. (In other words, using the general and well-applicable measure of the 'extended number of modules', n(M), as defined in the section of the description immediately preceding Example 1, date hubs will generally have a higher extended number of modules than party hubs.)
  • VTP-elements (proteins)
  • Centrality is analyzed not at the level of local topology, as in the case of hubs (which are central only for their neighbors), but from the point of view of the entire network topology.
  • VTP-proteins are intra-modular elements which significantly belong to more than two modules. According to this definition, VTP-proteins are not bridges between two modules, but share their contacts among at least three modules of the protein-protein interaction network. Since these modules represent different functions, as we have seen in Example 2, VTP-proteins are highly flexible interfaces, mediators and regulators of a variety of cellular functions.
  • This mediator and regulatory role will be especially important, when the complex system (yeast cell) experiences stress, since VTP-proteins have a unique potential to re-direct, and re-organize inter-modular contacts.
  • the re-organization of inter-modular contacts is the most efficient way to change the cellular response-repertoire, which becomes especially important, when the cell experiences stress, which substantially limits its resources.
  • the stress in the above module-reorganization scenario can be generalized as any outside stimulus, which is either novel or too sudden, and for which the complex system (yeast cell) does not have a pre-made (learned) adaptive response.
  • the high resolution analytical method provided by the present invention is the first method, which is able to identify overlapping modules with sufficient details to assess the position of VTP- elements (proteins).
  • We used the same high-confidence yeast protein-protein interaction network already introduced in Example 2 (Ekman et al., 2006). This network contained 6379 protein-protein interactions of 2633 yeast proteins in total, where 2445 proteins form a connected giant component and the residual 188 proteins lie outside this giant component.
  • We first searched the protein-protein interaction network for VTP-proteins using their property of not belonging to any module by more than 50% (proteins in modular overlaps). Most of the proteins in the giant component fell into this category. We continued the search using the more stringent criterion that VTP-proteins may not belong to any two modules by more than 70%. This still resulted in too large a sample in this particular example (where protein-protein connections are overlapping and dense). Therefore, we further discriminated VTP-proteins by means of the $n(h^{KL}_e)$ measure. As already defined in the section of the specification immediately preceding Example 1, the $\eta^{KL}_e$ measure characterizes the absolute bridge function of the respective protein between modules K and L.
  • The measure $h^{KL}_e$ characterizes the relative bridge function of the same protein between modules K and L, i.e. the fraction of the total bridge function between modules K and L exemplified by the given protein.
  • The measure $n(h^{KL}_e)$ gives the extended number of module pairs for which the given protein plays a relatively important bridge function between the two modules of the pair. If $n(h^{KL}_e)$ is large, the respective protein is an important (and relatively equally important) bridge between a large number of module pairs.
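The filtering cascade and the two bridge measures can be sketched in Python as follows. The relative measure $h^{KL}_e$ follows the text (the share of protein $e$ in the total bridgeness between K and L). The definition of the 'extended number' $n(\cdot)$ is given in a section not reproduced here, so the sketch assumes the exponential of the Shannon entropy of the normalized values, a common 'effective number'. Reading the 70% criterion as the combined share of the two strongest modules is likewise an assumption, as are all function names and example data.

```python
import math
from typing import Dict, Hashable, Iterable, Sequence


def effective_number(values: Iterable[float]) -> float:
    """ASSUMED 'extended number': exp of the Shannon entropy of the normalized values."""
    positive = [v for v in values if v > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    entropy = -sum((v / total) * math.log(v / total) for v in positive)
    return math.exp(entropy)


def relative_bridgeness(
    eta: Dict[str, Dict[Hashable, float]]
) -> Dict[str, Dict[Hashable, float]]:
    """h^{KL}_e = eta^{KL}_e divided by the sum of eta^{KL} over all elements."""
    totals: Dict[Hashable, float] = {}
    for per_pair in eta.values():
        for pair, value in per_pair.items():
            totals[pair] = totals.get(pair, 0.0) + value
    return {
        e: {pair: value / totals[pair] for pair, value in per_pair.items() if totals[pair] > 0}
        for e, per_pair in eta.items()
    }


def n_h(relative: Dict[str, Dict[Hashable, float]], element: str) -> float:
    """Extended number of module pairs bridged by `element` (assumed formula)."""
    return effective_number(relative[element].values())


def passes_overlap_filters(
    assignment: Sequence[float], single_max: float = 0.5, pair_max: float = 0.7
) -> bool:
    """No module above 50%, and (ASSUMED reading) the two largest shares together at most 70%."""
    total = sum(assignment)
    if total == 0:
        return False
    shares = sorted((a / total for a in assignment), reverse=True)
    top_two = shares[0] + (shares[1] if len(shares) > 1 else 0.0)
    return shares[0] <= single_max and top_two <= pair_max


# Hypothetical bridgeness values eta^{KL}_e keyed by module pair (K, L).
eta = {"p": {(0, 1): 1.0}, "q": {(0, 1): 3.0, (1, 2): 2.0}}
rel = relative_bridgeness(eta)
print(rel["p"], n_h(rel, "q"))
```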
  • The analyzed network is an undirected, non-weighted representation of the topology of the Western States Power Grid of the United States, compiled by Duncan Watts and Steven Strogatz.
  • the data are from the web site of Prof. Duncan Watts at Columbia University. (Watts & Strogatz, 1998)
  • The community landscape height value can also be regarded as a kind of centrality value. It is important to recognize that centrality measures based upon the degree of the elements utilize the local properties of an element, whereas betweenness centrality, which takes into account the shortest paths of the network, considers the whole network. While both values can be used for industrial purposes, neither approach is perfect, because nearby regions (belonging to the same or a close module) are generally more decisive for the centrality of an element than other parts of the network in distant modules.
  • Our community heaps allow us to specify the extent of importance of an element compared to other elements in the network. They take into consideration the less important, distant regions, but with a smaller weight. So the community landscape value, derived from the community heap values, eliminates the above problem and can associate a more applicable centrality value with the elements and links of the network.
  • The NodeLand heap method assigns a community heap value to all links (CHLV) and elements (CHEV) of an undirected network, where the community heap value of a given element or link can only be zero or one.
  • the NodeLand method starts from a specified element of the network, called the 'starting element'. In the beginning, each CHEV and CHLV is set to zero, except the CHEV of the starting element which is initialized to one.
  • a neighboring value (NV) is calculated for every element (NEV) and link (NLV) of the network.
  • the NLV of a given link is set to one, if one end point element of the given link belongs to the community heap while the other end point element does not, otherwise the NLV of the given link is set to zero.
  • A threshold value is defined by the formula S/E, where S and E have the same meaning as above. Elements with the maximal NEV are selected, and if the NEV of the selected elements is equal to or higher than said threshold value, the CHEV of said selected elements is set to one and the CHLV of the links connected to said selected elements and having a non-zero NLV is set to one.
  • In step (b), the CHLV of links both of whose end point elements have a CHEV of one but whose CHLV equals zero is set to one.
  • Step (b) is repeated while the maximal NEV calculated in the actual step (b) is non-zero, and is not lower than the threshold value of the given step (b).
  • the NodeLand method is finished for the given starting element once the maximal NEV is zero or is lower than said threshold.
  • the actual CHEV of each element and the actual CHLV of each link belonging to the community heap of the given starting element will be the final community heap value of the given element or link.
  • a community heap was assigned to every element of the network resulting in a set of binary community heap values for each link.
  • the community landscape value of a link is calculated by multiplying the initial strength of the given link by the number of community heaps the given link belongs to.
  • the initial strengths of links were set to one, so the multiplication can be omitted.
  • the resulting community landscape values serve as our centrality measures.
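A compact Python sketch of the NodeLand heap and the resulting landscape for an undirected, unweighted network follows. The text above does not reproduce the NEV formula or the meaning of S and E, so the sketch assumes that the NEV of an element outside the heap is the sum of the NLVs of its links (i.e. its number of links into the heap), that S is the number of links and E the number of elements currently in the heap, and that only elements outside the heap are candidates. With unit link strengths, the landscape value of a link is simply the number of heaps containing it. The five-element test network is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Link = FrozenSet[str]


def nodeland_heap(links: Set[Link], start: str) -> Tuple[Set[str], Set[Link]]:
    """Binary community heap (elements with CHEV = 1, links with CHLV = 1) of `start`."""
    heap_elements = {start}
    heap_links: Set[Link] = set()
    while True:
        # NLV = 1 for links with exactly one end point in the heap
        boundary = {link for link in links if len(link & heap_elements) == 1}
        # ASSUMPTION: NEV of an outside element = sum of the NLVs of its links
        nev: Dict[str, int] = {}
        for link in boundary:
            (outside,) = link - heap_elements
            nev[outside] = nev.get(outside, 0) + 1
        if not nev:
            break  # the maximal NEV is zero
        # ASSUMPTION: S = links in the heap, E = elements in the heap
        threshold = len(heap_links) / len(heap_elements)
        best = max(nev.values())
        if best < threshold:
            break
        selected = {e for e, value in nev.items() if value == best}
        heap_elements |= selected
        heap_links |= {link for link in boundary if link & selected}
        # step (b): links whose end points are both in the heap get CHLV = 1
        heap_links |= {link for link in links if link <= heap_elements}
    return heap_elements, heap_links


def community_landscape(links: Set[Link], elements: Iterable[str]) -> Dict[Link, int]:
    """Number of community heaps each link belongs to (unit link strengths)."""
    counts = {link: 0 for link in links}
    for start in elements:
        for link in nodeland_heap(links, start)[1]:
            counts[link] += 1
    return counts


# Hypothetical network: a four-clique A-B-C-D with a pendant element E attached to D.
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
network = {frozenset(p) for p in pairs}
landscape = community_landscape(network, "ABCDE")
print(landscape[frozenset(("D", "E"))])  # -> 2
```

Under these assumptions the clique links lie in all five heaps, while the pendant link D-E lies only in the heaps of D and E, so it gets a lower landscape value.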
  • The high resolution network analyzing methods of the invention enable the present inventors to modularize complex networks with unprecedentedly high resolution, thereby identifying and analyzing the structure of so far hidden module overlaps and identifying so far hidden key elements of complex networks which play important roles in the interactions among apparently very distant parts of the analyzed complex networks.

Landscapes

  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physiology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention disclosed herein relates to methods for analyzing the fine structure of networks comprising a plurality of elements of identical nature and directed or undirected links between said elements having commensurable strengths. The invention further concerns methods for high resolution modularization of networks as well as methods for identifying distinguished links or elements or groups of links or elements supposedly playing special roles in said networks (such as links or elements situated in module overlaps and VIP-links or VTP-elements situated in the overlaps of a plurality of modules). The network analysis methods of the invention are based on rendering a novel index to all elements and/or links of the network called 'community landscape height value' quantitatively characterizing the centrality of each element and/or link from the viewpoint of the network as a whole and using said novel index for exploring the so far hidden fine structure of the network. The invention also concerns pre-programmed non-programmable processing devices capable of carrying out the methods of the invention as well as computer readable data carriers comprising computer readable algorithms suitable for carrying out such methods.

Description

Method for analyzing the fine structure of networks
Field of the invention
The invention disclosed herein relates to methods for analyzing the fine structure of networks comprising a plurality of elements of identical nature and directed or undirected links between said elements having commensurable strengths. The invention further concerns methods for high resolution modularization of networks as well as methods for identifying distinguished links or elements or groups of links or elements supposedly playing special roles in said networks (such as links or elements situated in module overlaps and VTP-links or VIP-elements situated in the overlaps of a plurality of modules). The network analysis methods of the invention are based on rendering a novel index to all elements and/or links of the network called 'community landscape height value' quantitatively characterizing the centrality of each element and/or link from the viewpoint of the network as a whole and using said novel index for exploring the so far hidden fine structure of the network. The invention also concerns pre-programmed non-programmable processing devices capable of carrying out the methods of the invention as well as computer readable data carriers comprising computer readable algorithms suitable for carrying out such methods.
Definitions
With respect to the present specification and the appended claims, the following technical terms will be used in accordance with the definitions given below. With regard to the interpretation of the present invention, it shall be understood that the terms defined below are used in accordance with the given definitions even if said definitions might not be in perfect harmony with the usual interpretation of said technical term in the art.
By a 'community heap' of a given element and/or link is meant a special group of elements and/or links of a network which can interact with said element to any extent. Theoretically, the whole analyzed network may belong to the community heap of any element and/or link of said network; nevertheless, elements and/or links which practically do not interact with said given element are considered as practically not belonging to said community heap. The extent to which a given element or link belongs to the community heap of another element or link is characterized by a 'community heap value' (CHV), said extent being higher when the interaction with said other element or link is more intense. A CHV of a given element is called a 'community heap element value' (CHEV) and a CHV of a given link is called a 'community heap link value' (CHLV). In the methods of the invention, the CHEVs and CHLVs of all elements and/or links are calculated in step-by-step processes, wherein during said calculation processes the 'actual' CHEVs and/or CHLVs can be altered in every step, while the actual CHEV and/or CHLV in the final calculation step is called the 'final' CHEV and/or CHLV which, consequently, equals said CHEV and/or CHLV as defined above. During said step-by-step calculation processes in accordance with the invention, the group of elements and/or links whose actual CHVs differ from zero is considered as the 'actual community heap' of the element and/or link from which said calculation process has started. Accordingly, an element and/or link is said to 'belong to a community heap' if its CHV relating to said community heap differs from zero. Generally, the 'centrality' of a given element and/or link of a network can be characterized by a set of CHEVs and/or CHLVs, said set being indicative of the extent to which said element and/or link belongs to all community heaps of all other elements and/or links.
In the methods of the invention, said set of CHEVs and/or CHLVs is usually represented by a single number called 'community landscape height value', obtained by adding up the individual values of said set according to predetermined integration rules. Said community landscape height value is, therefore, a single number for each element and/or link quantitatively characterizing the 'centrality' of said element and/or link from the viewpoint of the whole network. In the following, the centrality of a given element and/or link, in accordance with the methods of the invention, will consequently be characterized by said newly introduced community landscape height value, even though the centrality of elements and/or links of known networks has already been characterized in the art by other indices calculated by different methods.
When calculating the actual CHEVs and/or CHLVs of the elements and/or links of a network of interest with respect to a given element and/or link in accordance with the methods of the invention, it is important that in every step the effects of the most 'neighboring' elements and/or links, i.e. those in the most intense interaction with the actual community heap of said given element and/or link, be considered first. Therefore, in the methods of the invention said 'neighbority' is characterized by a quantitative 'neighboring value' (NV) calculated in accordance with predetermined rules. The neighboring value of an element is called the 'neighboring element value' (NEV), while the neighboring value of a link is called the 'neighboring link value' (NLV). Based on the community landscape height values as defined by the methods of the invention, it is possible to characterize all elements and/or links of a network by specific 'module core assignment values' differing from zero with respect to all identified module cores of said network, enabling the identification of module overlaps not identified so far in the art.
By the term 'module assignment vector' or 'module assignment values vector' of a link or element is meant a vector whose components are the so-called 'module core assignment values', said values being indicative of the intensity with which said link or element belongs to the corresponding module core.
With respect to the methods of the invention, only one undirected (weighted or non-weighted) link and at most two directed (weighted or non-weighted) links are considered between any two selected elements of the analyzed network. When, in a given network to be analyzed, more than one link of the same direction or more than one undirected link between two selected elements can be construed, in the methods of the invention, for calculation purposes, we replace said links with one single link having the same direction (or being undirected) and having the cumulated strength of said links.
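Merging parallel links into one link of cumulated strength is a simple preprocessing step. A minimal Python sketch, assuming links are given as (source, target, strength) triples (a hypothetical input format); undirected links are keyed by the unordered pair of their end points, so a→b and b→a merge in the undirected case but stay separate in the directed case.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def merge_parallel_links(
    raw_links: Iterable[Tuple[str, str, float]], directed: bool
) -> Dict[Hashable, float]:
    """Replace parallel links by one link carrying their cumulated strength."""
    merged: Dict[Hashable, float] = defaultdict(float)
    for source, target, strength in raw_links:
        key = (source, target) if directed else frozenset((source, target))
        merged[key] += strength  # cumulate the strength of parallel links
    return dict(merged)


raw = [("a", "b", 1.0), ("a", "b", 2.0), ("b", "a", 1.0)]
print(merge_parallel_links(raw, directed=True))  # -> {('a', 'b'): 3.0, ('b', 'a'): 1.0}
print(merge_parallel_links(raw, directed=False))
```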
Background of the invention
Processes and methods used in the art for network structure examination
The methods used for module determination can basically be classified in two ways. The first classifies the methods as either 'clustering' or 'divisive' by the starting conditions of the elements. In clustering methods, the elements are initially regarded as independent, unlinked entities, among which similarities or links create clusters. Divisive methods start from the entire network, which is split at the proper locations and is thereby eventually divided into modules. The second classification is based on the kind of information required for module determination. The required information may be local (knowing the direct neighborhood of a given element is sufficient for module determination) or global (knowledge of the entire network is required). Naturally, clustering methods mostly require local information, whilst divisive methods need global information.
Clustering methods
1. Clustering and renormalization. These methods are called clustering methods because their starting condition is that all connections between the elements are deleted, and the network clusters are then constructed on the basis of the interconnection density. In one of the clustering methods the network structure is 'renormalized': the clusters constructed in the first step are regarded as elements, and clusters of such 'cluster-elements' are searched for in the rest of the process. These steps are iterated until no further clustering is possible. In this way the network is ultimately transformed into a tree-like branching structure, i.e. a dendrogram (Wasserman and Faust, 1994). The fractal structure of networks can be derived from this dendrogram-like hierarchical module structure (Alessina and Bodini, 2004; Garlaschelli et al, 2003; Goh et al, 2005; Song et al, 2005).
2. The optimization of random clusters. Clustering may also be started from a random assignment of the network elements to clusters. The clusters formed are characterized by a 'module parameter' corresponding to one of the module definitions presented hereinabove. The process then optimizes this module parameter by randomly relocating elements between the random clusters (Newman, 2004).
3. The Potts model. A special type of clustering uses the Potts model, which was originally developed to describe paramagnetic clusters. In this model, the clusters are identified by assigning a spin to each element. The spins of neighboring elements are correlated with each other and form clusters in which the spins are identical (Spirin and Mirny, 2003).
4. Co-neighbors and pathways method. A recently published model used high clustering coefficients (i.e. many co-neighbors) and the similarity of shortest paths to arrange network elements into modules (Poyatos and Hurst, 2004).
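Item 2 above leaves the 'module parameter' open. As a hedged illustration only, the sketch below assumes that this parameter is Newman's modularity Q and implements the random relocation as a simple hill climb that keeps a random move whenever Q does not decrease. The function names are hypothetical, and the sketch is not presented as the cited author's exact procedure.

```python
import random


def modularity(edges, assignment):
    """Newman's modularity Q of an undirected, unweighted network.

    edges: list of (u, v) pairs; assignment: dict element -> module label.
    Q = sum over modules c of [L_c / m - (d_c / (2 m))**2], where L_c is the
    number of links inside c, d_c the total degree of c and m the number of links.
    """
    m = len(edges)
    inside, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = assignment[u], assignment[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def random_relocation(edges, n_modules, steps=2000, seed=0):
    """Start from random clusters, then relocate random elements while Q does not drop."""
    rng = random.Random(seed)
    elements = sorted({x for edge in edges for x in edge})
    assignment = {x: rng.randrange(n_modules) for x in elements}
    best = modularity(edges, assignment)
    for _ in range(steps):
        x = rng.choice(elements)
        old = assignment[x]
        assignment[x] = rng.randrange(n_modules)
        q = modularity(edges, assignment)
        if q >= best:
            best = q               # keep the relocation
        else:
            assignment[x] = old    # undo it
    return assignment, best


# Two triangles joined by the bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))  # about 0.357 (5/14)
```

A pure hill climb like this can stop in a local optimum; annealing-type acceptance of worse moves is a common alternative.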
Divisive methods
5. Community detection method of Girvan and Newman. The most widespread divisive method is based on the detection of bridge-like links. If there are fewer links between modules than within them, then many of the shortest paths connecting elements of two different modules pass through the links between the modules; in other words, the betweenness centrality of the links between modules is very high. The method of module determination based on betweenness centrality first searches for the link of the network with the highest betweenness centrality value and removes it. It then repeatedly searches for and removes the link with the highest betweenness centrality until the network splits up. After this, it repeats the process within both parts and within all parts generated this way. This method requires tremendous computation, since the betweenness centrality values change with the removal of each link and must therefore be recalculated after each step (Girvan and Newman, 2002; Newman and Girvan, 2004).
6. Neighboring edge method. The method based on the clustering coefficient¹ of the edges relies on the property of networks that the edges between modules rarely form part of a triangle of edges. If the edges that are part of the fewest edge triangles are removed from the network, the network will most likely disintegrate into its modules (Radicchi et al., 2004).
7. Horizontal random cut model. In another method, the network is cut randomly into two parts. Next, by applying any of the module definitions hereinabove, the 'conformity' of the elements is calculated in each of the resulting modules. The element with the lowest 'conformity value' is relocated into the other module. When the relocations bring no further improvement in the 'conformity value', the network is divided into the modules generated this way; in the next step these modules are again cut randomly and the iteration starts again (Duch and Arenas, 2005). Guimera and Amaral (2005) recently elaborated a related method, applying simulated annealing to detect the modules within the network.
¹ The clustering coefficient gives the probability that two neighbors of a given element are also neighbors of each other. Its value varies between zero and one. Averaging the clustering coefficients over the entire network yields a general clustering index characteristic of the network (Barabasi and Oltvai, 2004).
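The coefficients behind item 6 and the footnote can be computed as follows. This is a minimal sketch for an undirected, unweighted network stored as a dict of neighbor sets (an assumed representation). The edge coefficient uses the form (z_ij + 1) / min(k_i - 1, k_j - 1) of Radicchi et al., where z_ij is the number of triangles containing the edge i-j; treating edges at degree-one elements as infinitely strong is a convention chosen here, and the function names are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(adj, x):
    """Fraction of pairs of neighbours of x that are themselves neighbours."""
    k = len(adj[x])
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; reported as 0 here
    linked = sum(1 for a, b in combinations(adj[x], 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Network-level clustering index: mean of the element coefficients."""
    return sum(clustering_coefficient(adj, x) for x in adj) / len(adj)


def edge_clustering_coefficient(adj, i, j):
    """(z_ij + 1) / min(k_i - 1, k_j - 1), z_ij = triangles through edge i-j."""
    z = len(adj[i] & adj[j])  # each common neighbour closes a triangle with i-j
    denom = min(len(adj[i]) - 1, len(adj[j]) - 1)
    return float("inf") if denom == 0 else (z + 1) / denom


def weakest_edge(adj):
    """Edge with the lowest coefficient: the next candidate for removal."""
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    return min(edges, key=lambda e: edge_clustering_coefficient(adj, *e))


# Two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(weakest_edge(adj))  # (2, 3)
```

Removing the returned edge and recomputing the coefficients after each removal gives the divisive procedure of item 6.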
8. Dynamic methods. Dynamic module determination methods place into one module those network elements whose links always occur jointly (Papin et al., 2004).
Methods resulting in overlapping modules
None of the methods introduced hereinabove yields overlapping modules if applied exactly according to the original concepts, so such methods cannot be used at all to reach the objects of the invention defined below. The following methods are known for the determination of overlapping modules.
9. Fuzzy Potts model. Reichardt and Bornholdt (2004) described a fuzzy version of the Potts model listed hereinabove. In their method, not only are islands of identical neighboring spins regarded as the 'outcome', but they also introduced another 'conformity variable', which increases over the whole network when spins remain identical. With an appropriate ratio of these two opposing effects, overlapping modules can be identified. Fuzziness can also be introduced into this method by randomly modifying the weights assigned to the network links (Gfeller et al, 2005).
10. Fuzzy divisive methods. Fuzziness can be introduced into the 'traditional' Girvan-Newman (2002) method by selecting the link to be removed randomly from among the links having the highest betweenness centrality values. This method is based on the observation that elements located in module overlaps are placed into different modules if the removal order of the links with high betweenness centrality values is altered (Wilkinson and Huberman, 2004).
11. Topological overlap methods. These methods exploit the property of networks that elements within the same module overlap much more in their second and third neighbors than elements placed in different modules (Ravasz et al, 2002; Yip and Horvath, 2005).
12. The k-clique network-walk method. Overlapping modules can be identified by finding the k-cliques of the network and determining, by a 'network walk', the network formed by these interconnected k-cliques. This method starts with the removal of a certain part of the weak links found in the network. These weak links are frequently the ones linking modules, so the overlaps generated this way are probably 'the most secure' and the smallest existing overlaps between modules (Palla et al, 2005).
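Items 11 and 12 can be sketched for small undirected, unweighted networks stored as dicts of neighbor sets (an assumed representation). The first function is the first-order topological overlap (shared neighbors plus a direct link, normalized by the smaller degree) in the form used by Ravasz et al.; generalized versions looking at more distant neighbors are not reproduced. The second is a brute-force k-clique percolation: k-cliques sharing k - 1 elements are merged, so one element may end up in several modules. The preliminary removal of weak links is left out, and the function names are hypothetical.

```python
from itertools import combinations


def topological_overlap(adj, i, j):
    """(shared neighbours + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    shared = len(adj[i] & adj[j])
    a_ij = 1 if j in adj[i] else 0
    return (shared + a_ij) / (min(len(adj[i]), len(adj[j])) + 1 - a_ij)


def k_clique_communities(adj, k):
    """Union k-cliques that share k - 1 elements; communities may overlap."""
    # Brute-force k-clique enumeration: only suitable for small networks.
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Adjacent k-cliques (sharing k - 1 elements) belong to the same community.
    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)
    groups = {}
    for idx, clique in enumerate(cliques):
        groups.setdefault(find(idx), set()).update(clique)
    return [frozenset(g) for g in groups.values()]


# Triangles {0,1,2} and {1,2,3} share link 1-2; triangle {3,4,5} shares only element 3.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(sorted(c) for c in k_clique_communities(adj, 3)))  # [[0, 1, 2, 3], [3, 4, 5]]
```

Element 3 appears in both communities, which is the kind of overlap that methods 1-8 cannot express.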
Studies examining the distinguished roles of the elements in the overlaps of network modules
Distinguished role of the module fringe areas
13. Module fringe areas. A distinguished role of the elements within the overlaps of network modules has so far been presumed only in neuronal networks. Agnati et al. (2004) proposed that the overlaps (fringe areas) of neuronal networks could be important in the regulation of networks. Csermely (2005; 2006) generalized this notion, but did not draw any concrete conclusion regarding the specific applications and properties of the elements within the overlaps.
Distinguished elements within the networks
14. Hubs. Nodes having many links are key factors of networks. Following the work of László Barabási (Barabasi and Oltvai, 2004), an extremely wide range of research has been conducted over the past six years on the distinguished role of hubs in the viability and protection of networks.
15. Bridges, brokers and structural holes. Bridges and brokers, which create links between two or more communities (i.e. modules), are important elements of networks. Both link remote parts of the networks whose connection is essential for network operation. Because of this, the places occupied by these elements are often called structural holes (Burt, 1995; 2005). Brokers usually create transient links, whilst bridges provide more durable ones, yet the distinction between their roles is not entirely clear in the literature. The role of brokers in the diffusion process described by Rogers (2003) and in marketing (Rosen, 2000) has been revealed, yet no method has been elaborated for their localization and identification.
16. Weak links. Weak links play a distinguished role in stabilizing networks (Granovetter, 1973). These links have been generalized, and the positions of some weak links between modules have been described (Csermely 2005; 2006). However, no method has been elaborated to identify them, nor has their key role in giving optimal responses to unexpected effects been recognized.
17. VIP-elements. VIP-elements are the best, most talented and most creative elements of a network in a given situation. Masuda and Konno (2005) pointed out that these elements have relatively few links. These few links, on the other hand, are very important, since almost without exception they lead to the most important nodes of the network. Therefore, VIP-elements constitute de facto the elite of the network and have the potential to alert the entire network promptly and efficiently. The positions of VIP-elements within the network were unknown, and no algorithm was available that could identify them.
The methods used in the art start from either the local or the global properties of networks in the course of module determination. Accordingly, none of the methods known to date is 'mesoscopic', i.e. one in which the local network properties continuously blend into the global ones. Where local and more remote properties are combined, with local properties weighted more heavily than remote ones, the relative contribution of the two effects is selected by the expert seeking modules on the basis of particular criteria (a biased process). No process is known that inherently contains this weighting and sets it by itself according to the complex structure of the network (an unbiased process).
The vast majority of module determination methods (methods 1-8) are unable to identify the overlapping elements of modules. The remaining module determination methods (methods 9-11) either used the properties of the given elements in their direct neighborhood only (method 11), or combined them with properties of the whole network (methods 9 and 10). The known methods are either unable to consider the different weights and directions of network edges, or, if they can be modified to do so, their weighted and directed versions become so slow that they are nearly useless for the large networks occurring in practice. Moreover, the methods known to date cannot assign all elements and edges to modules while providing objective indexes of how strongly a given element or edge belongs to its own module, or to the several modules that actually contain it. The methods of the invention therefore fill a gap, provide new results in mapping the overlaps linking network modules, and are applicable even to the largest networks, where currently known methods have failed.
Publications on elements having a distinguished role in network module overlaps have not recognized the properties of these elements, so their applicability in the case of unexpected effects on networks has not been defined (Items 13 and 15-17), or they have not focused on these elements at all (Item 14).
Complex organizations that arose in the course of evolution (macromolecules, cells, cellular networks, living organisms, groups of living organisms), complex devices created by man (computers, mobile telephones, transportation vehicles, space-crafts, robots etc.) and technical networks (pipeline systems, power supply grids, computer networks, transportation route networks, commercial networks, communication networks etc.) often comprise more than ten thousand elements. The complexity of these networks cannot be described by a series of simple logical functions and exceeds the capacity of the human brain; it requires the network approach. Beyond responding to ordinary effects that are planned and calculated in advance, it is very important in the life of complex systems and in the use of technical devices and networks that they can also respond to unexpected effects, for which the complex system could not prepare or which the designer of the technical device or network did not anticipate during design. Simple systems are defenseless against unexpected effects (e.g. a kettle breaks if dropped).
Therefore, an object of the present invention is to provide methods for the high resolution analysis of such complex networks, enabling the identification of elements likely to play important roles in key functions of said complex networks.
An additional object of the invention is to explore the topology of complex networks in unprecedented detail and to provide a hierarchical description of their community (modular) structure. Among other things, this helps to identify network elements that play a key role in the integration of the network and therefore require special protection.
Furthermore, another object of the invention is to provide methods for producing, improving or developing complex networks that are better able to respond to unexpected stimuli (e.g. are more durable, more resistant and have a higher stress resistance potential) than their original versions. Still another object of the present invention is to identify key elements, unknown to date, within said complex networks that may trigger a sudden and extensive alteration in the behavior of said complex systems upon stimuli.
Detailed description of the invention
The present inventors have recognized that the network analyzing methods available in the art are not suitable for the high resolution analysis and the identification of all possibly important elements of complex network systems, because the measures and indices applied so far to characterize complex networks consider only the local or the global characteristics of parts of said networks, and mostly fail to provide a high resolution analysis of the intermediate topological levels between said local and global scales. Such a complex representation and analysis may become very important in a number of special and important circumstances, e.g. when the network to be analyzed is exposed to unusual stimuli. Therefore, the present inventors have elaborated high resolution network analytical methods based on an entirely novel concept. The network analytical methods of the invention examine the structure and features of the whole network from the viewpoint of each element and/or link comprised in the network and then, by cumulating the information obtained thereby, characterize the centrality of all elements and/or links of the network from the viewpoint of the whole network, assigning a continuously varying 'community landscape height value' to each of said elements and/or links. The analysis of said community landscape height values enables the modularization of complex networks with unprecedented high resolution, thereby identifying and analyzing the structure of previously hidden module overlaps and identifying previously hidden key elements of complex networks that play important roles in the interactions among apparently very distant parts of the analyzed networks.
In accordance with the above, the present invention concerns a method for analyzing the fine structure of a network comprising a plurality of elements of identical nature and links between said elements wherein each link being a directed or undirected connection between two elements and having a strength representing the intensity of the connection between said connected elements, the strengths of each of said links being commensurable with each other and each of said elements being connected to at least one other element; said elements representing predetermined physical, chemical or biological entities; said method comprising the following steps: (i) taking each element or link of said network, one by one, as a starting point, calculating a 'community heap element value' (CHEV) for each element and/or a 'community heap link value' (CHLV) for each link, in a step-by-step process, characterizing the extent to which each element and/or each link belongs to the community heap of said starting element or starting link, by gradually exploring said network starting from said starting element or starting link; said step-by-step process comprising the following steps:
(a) setting the starting CHEV and/or CHLV to zero for each element and link, except for said starting element or link, the starting CHEV or CHLV for which being set to a predetermined positive value differing from zero;
(b) calculating, in a predetermined manner, a 'neighboring value' (NV) for each element (NEV) and/or each link (NLV) having a zero CHEV and/or CHLV, characterizing the intensity of the cumulated connectedness of said element and/or link to the actual community heap of said starting element and/or link; adding the element(s) and/or link(s) having the highest NV to the actual community heap of said starting element or link; and calculating, in a predetermined manner, an actual CHEV and/or CHLV for all elements and links belonging to the actual community heap of said starting element or starting link;
(c) repeating step (b) until the maximum NV defined in actual step (b) becomes zero or becomes lower than a threshold value calculated in a predetermined manner and considering the actual CHLV and CHEV of each link and element, respectively, as the final CHLV and CHEV characterizing the extent to which each link and element belongs to said community heap of said starting element;
(ii) repeating steps (a)-(c) for each element taken as starting element, rendering thereby a set of final CHLV and CHEV to each link and element, respectively, said set representing the extent to which said link or element belongs to the community heaps of all elements of the network;
(iii) representing said set of CHLVs or CHEVs of each link or element, respectively, by a corresponding number calculated according to predetermined integration rules and considering said number as the 'community landscape height value' of said link or element characterizing the centrality of said link or element in said network from the cumulated viewpoint of all links or elements of said network;
(iv) analyzing the fine structure of said network and the special roles certain elements play in said network on the basis of combined consideration of the topology, strengths and directions of the links connecting certain elements or groups of elements of interest together with the community landscape height values of said links and/or elements.
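Steps (i) to (iii) can be illustrated with a minimal Python sketch for an undirected, weighted network that computes element values only. The method leaves the NEV, CHEV and integration rules to 'a predetermined manner', so the rules below are assumptions chosen for illustration: the NEV of an outside element is the CHEV-weighted share of its link strength that points into the current heap, a newly added element keeps its NEV as its CHEV and is never updated again, growth stops when the highest NEV falls below a fixed threshold, and the landscape height is the plain sum of an element's CHEVs over all starting elements. The function names and the threshold value are hypothetical.

```python
import math
from collections import defaultdict


def community_heap(adj, start, threshold=0.3):
    """Final CHEVs of the community heap grown from `start` (steps (a)-(c)).

    adj: dict element -> dict neighbour -> link strength (undirected network).
    Assumed rules, for illustration only:
      NEV(x) = sum of s(x, y) * CHEV(y) over heap members y linked to x,
               divided by the total link strength of x;
      an added element gets CHEV = NEV and is not updated later.
    """
    chev = {start: 1.0}  # step (a): all other elements implicitly have CHEV 0
    total = {x: sum(nbrs.values()) for x, nbrs in adj.items()}
    while True:
        # Step (b): NEVs of the elements outside the current heap.
        nev = defaultdict(float)
        for y, cy in chev.items():
            for x, s in adj[y].items():
                if x not in chev:
                    nev[x] += s * cy
        nev = {x: v / total[x] for x, v in nev.items()}
        if not nev or max(nev.values()) < threshold:
            return chev  # step (c): stop when the maximum NV is below the threshold
        best = max(nev.values())
        for x, v in nev.items():
            if math.isclose(v, best):
                chev[x] = v  # add every element tied for the highest NEV


def landscape_heights(adj, threshold=0.3):
    """Steps (ii)-(iii), assuming the integration rule is a plain sum."""
    height = defaultdict(float)
    for start in adj:
        for x, value in community_heap(adj, start, threshold).items():
            height[x] += value
    return dict(height)


# Two triangles joined by the bridge 2-3, all strengths 1.
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
print(community_heap(adj, 0))  # {0: 1.0, 1: 0.5, 2: 0.5}
print(landscape_heights(adj))  # elements 2 and 3 are highest (about 2.33)
```

In this toy network the elements on the bridge receive the largest heights (7/3 versus 2 for the others); other NEV or integration rules would give different numbers.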
In an advantageous embodiment of the method of the invention, all links of the analyzed network are directed links, and the CHEV and CHLV of each element and link are defined in one single step-by-step process, wherein both said actual CHEV and CHLV values are calculated in the above-defined step (b) as follows: defining a 'neighboring link value' (NLV) for each link, said value being set to zero for all links whose actual CHLV differs from zero or whose starting point element has a zero actual CHEV, wherein said NLV for each link is set to be higher when the strength of said link and/or the actual CHEV of its starting point element is higher; then defining a 'neighboring element value' (NEV) for each element whose actual CHEV is zero and which is the end point element of one of said links having a non-zero NLV, wherein said NEV of said element is set to be higher when the cumulated NLV of all links ending at said element is higher, while said NEV of all other elements is set to zero; selecting one or more elements having zero actual CHEV and the highest NEV; selecting all links starting from elements having an actual CHEV greater than zero and ending at said selected elements, all links starting from said selected elements and ending at elements having an actual CHEV greater than zero, and all links starting from any of said selected elements and ending at any of said selected elements; increasing the actual CHEV of each said selected element, wherein the extent of said increase is higher when said NEV is higher; and increasing the actual CHLV of each said selected link, wherein the extent of said increase is higher when said NLV of said link is higher.
In a further advantageous method of the invention, said starting CHEV of each starting element is set to one and said starting CHLV of each link is set to zero; when a zero CHEV or CHLV is increased in any step, said increased actual CHEV or CHLV is set to remain zero until it reaches a threshold value, calculated in a predetermined manner, and becomes one when exceeding said threshold value; and when a CHEV or CHLV equal to one is increased in any step, it remains one, whereby all sets of CHEVs and CHLVs of the elements and links of said network exclusively comprise zero and one values.
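One step (b) of the directed embodiment can be sketched as below. The NLV and NEV definitions and the three kinds of selected links follow the text; the concrete formulas are assumptions for illustration: NLV = link strength times the CHEV of the link's starting element, a selected element's CHEV is raised from zero to its NEV, and a selected link's CHLV is raised by its strength times the (updated) CHEV of its starting element. The threshold-based zero/one variant is not implemented, and the function name is hypothetical.

```python
def directed_step(links, chev, chlv):
    """One step (b) of the directed embodiment; chev/chlv are updated in place.

    links: dict (u, v) -> strength of the directed link u -> v.
    chev, chlv: actual CHEVs / CHLVs (missing keys mean zero).
    Returns the newly selected elements (an empty set when every NEV is zero).
    """
    # NLV: zero unless the link's CHLV is still zero and its start element is in
    # the heap; assumed here to be strength * CHEV(start element).
    nlv = {(u, v): s * chev.get(u, 0.0) for (u, v), s in links.items()
           if chlv.get((u, v), 0.0) == 0.0 and chev.get(u, 0.0) > 0.0}
    # NEV: cumulated NLV of the links ending at an element whose CHEV is still zero.
    nev = {}
    for (u, v), value in nlv.items():
        if chev.get(v, 0.0) == 0.0 and value > 0.0:
            nev[v] = nev.get(v, 0.0) + value
    if not nev:
        return set()
    best = max(nev.values())
    selected = {x for x, value in nev.items() if value == best}
    heap = {x for x, c in chev.items() if c > 0.0}
    for x in selected:
        chev[x] = nev[x]  # assumed increase from zero: the element's NEV
    for (u, v), s in links.items():
        # Selected links: heap -> selected, selected -> heap, selected -> selected.
        if (u in heap and v in selected) or (u in selected and (v in heap or v in selected)):
            # Assumed increase: strength * CHEV(start); equals the NLV for
            # heap -> selected links.
            chlv[(u, v)] = chlv.get((u, v), 0.0) + s * chev[u]
    return selected


links = {("A", "B"): 1.0, ("A", "C"): 2.0, ("C", "A"): 1.0, ("B", "C"): 0.5}
chev, chlv = {"A": 1.0}, {}
while directed_step(links, chev, chlv):
    pass
print(chev)  # {'A': 1.0, 'C': 2.0, 'B': 1.0}
```

Starting from A, element C joins first because the stronger link A -> C gives it the higher NEV; B follows in the next step, after which every link has a non-zero CHLV and the loop ends.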
In another advantageous method of the invention, said increasing of the actual CHLV or CHEV from zero is carried out only once for each link and each element, and any further increase of said actual CHLV or CHEV in any further step is omitted.
In still another advantageous method of the invention, the fine structure of an undirected network is analyzed, wherein each connection between the elements of the analyzed network is an undirected connection, and wherein said actual CHEV or CHLV values are calculated in step (b) as follows: defining a NLV for each link, said NLV being set to zero for all links whose two end point elements both have zero actual CHEVs or both have non-zero actual CHEVs, wherein said NLV for each further link is set to be higher when the strength of said link and/or the actual CHEV of its end point element having a non-zero CHEV is higher; then defining a NEV for each element whose actual CHEV is zero and which is connected to one of said links having a non-zero NLV, wherein said NEV of said element is set to be higher when the cumulated NLV of all links connected to said element is higher, while said NEV of all other elements is set to zero; selecting one or more elements having zero actual CHEV and the highest NEV; selecting all links connecting elements having an actual CHEV greater than zero with one of said selected elements, and further selecting all links connected exclusively to said selected elements; increasing the actual CHEV of each said selected element, wherein the extent of said increase is higher when said NEV is higher; and increasing the actual CHLV of each said selected link, wherein the extent of said increase is higher when the NLV of said link and/or the strength of said link and/or the actual CHEVs of the elements connected by said link are higher.
The invention further provides a method for the high resolution modularization of a network of interest, comprising the steps of:
(i) rendering a community landscape height value to all elements or links of said network according to any previously described method of the invention;
(ii) identifying the elements or links having local maximum community landscape height values as module core elements or links of said network;
(iii) determining the extent to which each element or link belongs to any particular module core element or link of said network by gradually exploring the surrounding of each module core element or link, and rendering a set of 'module core assignment values' to each element or link, each of said values characterizing the extent of the assignment of said element or link to a respective module core element or link, wherein each said module core assignment value for each element or link is determined as a function of the module core assignment values of all neighboring elements or links, and the sum of said module core assignment values for each element or link correlating with the community landscape height value of said element or link in accordance with predetermined calculation rules; and
(iv) arbitrarily defining a threshold module core assignment value, wherein each element or link having a module core assignment value higher than said threshold value with respect to any module core element or link is considered as belonging to the module defined by said module core element or link.
The invention further concerns a method for identifying elements or links of a network of interest typically situated in module overlaps, comprising the following steps:
(i) allocating all elements or links of said network to all identified module cores according to the above modularization method of the invention; and
(ii) identifying elements or links not assigned to any module core by more than 50% of their community landscape height value as being situated in module overlaps.
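The modularization steps and the 50% overlap rule above can be sketched for an undirected network stored as a dict of neighbor sets, with given community landscape heights. The propagation rule is an assumption chosen for illustration: each module core is fully assigned to itself, every other element repeatedly takes the normalized sum of its neighbors' assignment fractions, and the fractions are finally multiplied by the element's height so that each assignment vector sums to that height. Treating ties as local maxima, the number of rounds and the function names are likewise hypothetical.

```python
def module_cores(adj, height):
    """Elements whose landscape height is not exceeded by any neighbour."""
    return sorted(x for x in adj if all(height[x] >= height[y] for y in adj[x]))


def core_assignment(adj, height, rounds=50):
    """Module core assignment vectors (assumed propagation rule).

    Each core is fully assigned to itself; every other element repeatedly takes
    the normalised sum of its neighbours' assignment fractions. The fractions are
    finally scaled by the element's height, so each vector sums to that height.
    """
    cores = module_cores(adj, height)
    n = len(cores)
    frac = {x: [0.0] * n for x in adj}
    for idx, core in enumerate(cores):
        frac[core][idx] = 1.0
    for _ in range(rounds):
        new = {}
        for x in adj:
            summed = [sum(frac[y][i] for y in adj[x]) for i in range(n)]
            total = sum(summed)
            keep = x in cores or total == 0.0
            new[x] = frac[x] if keep else [v / total for v in summed]
        frac = new
    return cores, {x: [f * height[x] for f in frac[x]] for x in adj}


def modules(cores, assign, threshold):
    """Step (iv): elements above the threshold for a core belong to its module."""
    return {core: {x for x, vec in assign.items() if vec[i] > threshold}
            for i, core in enumerate(cores)}


def overlap_elements(assign, height):
    """Elements not assigned to any single core by more than 50% of their height."""
    return {x for x, vec in assign.items() if max(vec) <= 0.5 * height[x]}


# Two triangles linked through element 3; the heights are illustrative values.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
height = {0: 1.0, 1: 1.0, 2: 3.0, 3: 1.0, 4: 3.0, 5: 1.0, 6: 1.0}
cores, assign = core_assignment(adj, height)
print(cores, assign[3])                   # [2, 4] [0.5, 0.5]
print(overlap_elements(assign, height))   # {3}
```

With a threshold of 0.4, element 3 belongs to both modules, which is exactly the overlap reported by the 50% rule.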
The invention also concerns a method for identifying elements or links of a network as defined above that supposedly play special roles in said network (VIP-elements or VIP-links), comprising the following steps:
(i) identifying elements or links typically situated in module overlaps according to the above defined method of the invention; and
(ii) identifying, as VIP-elements or VIP-links, those elements or links typically situated in module overlaps that are not assigned to any two module cores together by more than 70% of their community landscape height value.
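The 50% and 70% rules can be written as two small predicates on a module core assignment vector and the corresponding community landscape height. This is a minimal sketch assuming that the vector components sum to the height; the function names is_overlap and is_vip are hypothetical.

```python
def is_overlap(vec, height):
    """Not assigned to any single module core by more than 50% of its height."""
    return max(vec) <= 0.5 * height


def is_vip(vec, height):
    """Overlap element whose two largest core assignments stay within 70% of its height."""
    top_two = sum(sorted(vec, reverse=True)[:2])
    return is_overlap(vec, height) and top_two <= 0.7 * height


print(is_vip([0.3, 0.3, 0.3], 0.9))  # True: spread over three module cores
print(is_vip([0.5, 0.5, 0.0], 1.0))  # False: an overlap of just two cores
```

The second rule thus separates elements that merely sit between two modules from those that connect three or more modules at once.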
The elements of the networks analyzed by the methods of the invention are typically selected from the group consisting of atoms of a macromolecule, such as a protein, a DNA-molecule, an RNA-molecule or a polysaccharide; proteins, such as proteins of a cell's signaling network, a cell's cytoskeletal network or a cell's gene expression regulatory network, proteins present in a particular cell membrane or cell organelle, proteins having special enzymatic or regulatory functions; coenzymes; cells of an organism, such as nerve cells or immune cells; microorganisms, groups of microorganisms; technical devices, such as computers, computer or microchip controlled devices, robots, transportation or communication devices, telephones, mobile telephones, radios, televisions, elements of a pipeline, communication or transportation network elements, power grid elements, digital organisms and elements of a technical device.
The links of the networks analyzed by the methods of the invention are typically selected from the group consisting of covalent or non-covalent bonds between atoms of a macromolecule, such as a protein, a DNA-molecule, an RNA-molecule or a polysaccharide; protein-protein interactions; enzyme-coenzyme interactions; intracellular or intercellular interactions; specific interactions between microorganisms or groups of microorganisms; and specific interactions between technical devices or parts of technical devices, advantageously communicative interactions.
The invention further provides a method for improving at least one important characteristic of a network of interest, comprising the following steps:
(i) identifying at least one VIP-element or VIP-link of said network according to the above defined method of the invention;
(ii) altering, multiplying, deleting or replacing said at least one VIP-element or VIP-link of said network, thereby producing an altered network;
(iii) comparing at least one important characteristic of said altered network to that of the original network, and effecting another replacement or alteration on the same or another VIP-element or VIP-link of the original network if said important characteristic is not sufficiently improved.
The methods of the invention are advantageously carried out using a computing device, advantageously a computer, a microprocessor or a chip. Therefore, the invention also provides non-programmable processing devices, advantageously microprocessors or chips, capable of carrying out the methods of the invention.
The invention also concerns the use of any computing device, advantageously a computer, a microprocessor or a chip, for carrying out the methods of the invention. The invention further relates to computer readable data carriers comprising computer readable algorithms suitable for carrying out the methods of the invention.
Brief description of the drawings
Figure 1.A: Graphical representation of the model signaling network analyzed in Example 1. Designation of elements and directions, strengths and topology of links connecting them are shown.
Figure 1.B: Graphical representation of the model signaling network showing the end results of step 1 of the spreading process used in Example 1. The CHLVs of links and the CHEVs of elements are shown. Links with 0 CHLV are dashed, and elements with 0 CHEV are empty.
Figure 1.C: Graphical representation of the model signaling network showing the NLVs calculated in the second step of the spreading process used in Example 1. The NLVs of the links are shown. The dashed links and empty elements show the initial state of the second step.
Figure 1.D: Graphical representation of the model signaling network showing the end results of step 2 of the spreading process used in Example 1. The CHLVs of the links and the CHEVs of the elements are shown. Links with 0 CHLV are dashed, and elements with 0 CHEV are empty.
Figure 1.E: Graphical representation of the model signaling network showing the final CHLE and CHEV of heap starting from element K in Example 1. Figure 1.F: Graphical representation of the model signaling network showing the community landscape height values of all links in the network.
Figure LG: Graphical representation of the model signaling network showing an intermediate step of link module assignment in Example 1. The dashed links are not assigned to any modules yet
Figure LH: Graphical representation of the model signaling network showing assignment of the links to the five mod- ules ofthe network
Figure 1.1: Graphical representation of the model signaling network showing module overlapness of all proteins in the signaling network.
Figure LJ: Hierarchical representation of the signaling network and the projections of the hierarchical levels, wherein the hierarchic levels of the model network is shown on the left side of the figure and their back projection to the original network is shown on the right side and the dashed ellipses comprise elements belonging to the same module based on their maximal projected module assignment values.
Figure 2: A diagram showing the results of the analysis of the yeast protein interaction network performed in Example 2. Individual hub molecules are represented by triangles (party hubs) and by + signs (date hubs); axis X represents the n(M) measure (which can be interpreted as the extent of 'overlapness') and axis Y the $n(\tilde{H})$ measure (characterizing the 'intramodular bridge property'). Figure 2 shows that party hubs belong to relatively few modules of the network, while, among elements belonging to the same number of modules, a party hub is the most likely to be an intramodular bridge.
Figure 3: A diagram showing the inverse geodesic length as a function of the number of removed links in the case of the power grid analyzed in Example 4. The figure clearly demonstrates that the network collapses earliest when the links are removed in decreasing order of their community landscape height value.
Examples
In the following illustrative examples, which demonstrate the applicability of the methods of the invention to the analysis of the fine structure of existing real-life networks, we use the following indices to characterize special important features of the analyzed networks and their elements or links.
Centrality
As opposed to formerly used local or global centrality measures (such as the degree or betweenness centrality of elements or links), we define the centrality of a link or element using all scales of network topology, setting it equal to the community landscape height value of the given element or link.
The n(M) measure of a given element or link
The n(M) measure represents the number of modules, to which a given element or link significantly belongs. The n(M) measure can be interpreted as the extent of 'overlapness' of the given element or link. In the following we give an example of the determination of the 'overlapness' of an element, in accordance with Examples 1 and 2.
The entropy of a discrete distribution $a_i$ ($i = 1, \dots, N$) is given by the expression $S = -\sum_i p_i \ln p_i$, where $p_i = a_i / \sum_j a_j$. In the continuous case, the entropy of a function $a(x)$ is defined as $S = -\int p(x) \ln p(x)\, dx$, where $p(x) = a(x) / \int a(x)\, dx$. For a given set of non-zero entries, $S$ (and hence $e^S$) is maximal when the distribution takes only two distinct values: zero and a non-zero constant. In that case, in the discrete case $e^S$ equals the number of non-zero entries ($N$ if all entries are non-zero), and in the continuous case $e^S$ gives the total length of the intervals on which the function is non-zero. If the non-zero values of the distribution are not equal, $e^S$ can only take smaller values. The quantity $e^S$ will be referred to as the extended number.
Using the entropy defined above, we define the value of n(M) as the extended number of a discrete distribution, namely the module assignment vector (A[]) of the given link or element. (In the directed case, where an element has an inbound module assignment vector M and an outbound module assignment vector N, the two are summed as A[i] = N[i] + M[i].)
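The extended number and n(M) can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation; the function names `extended_number` and `n_m` are hypothetical, and the natural logarithm is assumed, matching the ln in the entropy formula.

```python
import math
from typing import Sequence


def extended_number(values: Sequence[float]) -> float:
    """exp(S) for the distribution obtained by normalizing `values` to sum 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("the distribution must have a positive sum")
    entropy = 0.0
    for a in values:
        if a > 0:  # zero entries contribute nothing, since p*ln(p) -> 0
            p = a / total
            entropy -= p * math.log(p)
    return math.exp(entropy)


def n_m(inbound: Sequence[float], outbound: Sequence[float]) -> float:
    """n(M) of an element of a directed network: extended number of A[i] = N[i] + M[i]."""
    combined = [m + n for m, n in zip(inbound, outbound)]
    return extended_number(combined)


print(extended_number([2.0, 2.0, 0.0, 2.0]))  # three equal non-zero entries: close to 3
print(extended_number([5.0, 1.0, 0.0, 0.0]))  # unequal entries: between 1 and 2
print(n_m([1.0, 0.0], [0.0, 1.0]))            # split evenly over two modules: close to 2
```

Because $e^S$ counts equally weighted entries, an element spread evenly over two modules scores about 2, while one dominated by a single module scores close to 1.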
The $H^{KL}$ measure of a given element or link
The $H^{KL}$ measure defines the extent of the 'bridgeness' function of the given element or link between modules K and L. This measure can be formalized in many ways based upon the strength, the community landscape height value and the module assignment vector of the elements and links of the network. We now show an example of the $H^{KL}$ measure of the elements, in accordance with Examples 1 and 2. Let $S_e^{\mathrm{out}}$ denote the sum of the weights of those outbound links of element 'e' which have non-zero community landscape height values. Similarly, let $S_e^{\mathrm{in}}$ denote the sum of the weights of those inbound links of element 'e' which have non-zero community landscape height values. Let $S_e$ be the larger of $S_e^{\mathrm{out}}$ and $S_e^{\mathrm{in}}$.
If the module assignment vectors of the links of the network are not normalized to the initial strength of the links, the module assignment vector of each link should first be renormalized to the initial strength of the given link. (In Examples 1 and 2 this normalization is not required, since the links already fulfill this condition.) Let $N_e$ denote the outbound assignment value vector of element 'e' and, similarly, let $M_e$ denote the inbound assignment value vector of element 'e'. Then the bridgeness of element 'e' between modules K and L can be calculated as
$H_e^{KL} = \dfrac{M_e[K]\, N_e[L]}{S_e}$
The more the inbound links of element 'e' belong to module K, or the more its outbound links belong to module L, the higher the bridgeness between modules K and L. The maximal value of $H_e^{KL}$ is $S_e$.
Note that if the analyzed network is undirected, then $N_e[i] = M_e[i]$ for $i = 1, \dots, m$, where m is the number of modules. Also, in the undirected case $S_e^{\mathrm{out}} = S_e^{\mathrm{in}} = S_e$, where $S_e$ is the sum of the strengths of the links connected to the given element e. Therefore the $H_e^{KL}$ measure in the given example is defined the same way for undirected networks.
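A minimal sketch of the element bridgeness $H_e^{KL}$ as defined above. The function name and argument layout are hypothetical, modules are indexed from 0, and returning 0 for an element with $S_e = 0$ is an assumption of this sketch rather than something the text specifies.

```python
from typing import Sequence


def bridgeness(inbound: Sequence[float], outbound: Sequence[float],
               s_in: float, s_out: float, k: int, l: int) -> float:
    """H_e^{KL} = M_e[K] * N_e[L] / S_e, with S_e = max(S_e^in, S_e^out).

    inbound     -- M_e, inbound assignment value vector of element e
    outbound    -- N_e, outbound assignment value vector of element e
    s_in, s_out -- summed weights of e's inbound / outbound links with
                   non-zero community landscape height
    k, l        -- module indices (0-based here)
    """
    s_e = max(s_in, s_out)
    if s_e == 0:
        return 0.0  # isolated element: no bridge function (sketch convention)
    return inbound[k] * outbound[l] / s_e


# Hypothetical element: both inbound links in module 0, outbound links split
# between modules 1 and 2.
m_e, n_e = [2.0, 0.0, 0.0], [0.0, 1.0, 1.0]
print(bridgeness(m_e, n_e, 2.0, 2.0, 0, 1))  # 1.0
print(bridgeness(m_e, n_e, 2.0, 2.0, 1, 0))  # 0.0
```

The measure is directional: the element above bridges from module 0 into module 1, but not from module 1 into module 0.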
The $n(\tilde{H})$ measure of a given element or link
The $n(\tilde{H})$ measure represents the 'VTP-ness' of a given element or link situated in module overlaps. As can be seen from the definition of the $H^{KL}$ measure, $H^{KL}$ characterizes the absolute bridge function of the respective element or link between modules K and L. In contrast, the measure $\tilde{H}^{KL}$ characterizes the relative bridge function of the given element or link between modules K and L, where "relative" is understood as the fraction of the total bridge function between modules K and L. The measure $n(\tilde{H})$ gives the extended number of module pairs between which the given element or link plays a relative bridge function. If $n(\tilde{H})$ is large, the respective element or link is an important (and relatively equally important) bridge between a large number of module pairs.
Note that various definitions of the measure $n(\tilde{H})$ can be given, depending on the definition of $\tilde{H}^{KL}$ and on how the case K = L is treated (special treatment may be necessary in this case because $\tilde{H}^{KK}$ gives information about the intramodular relative bridge function of the given element or link for module K, while one may only be interested in its intermodular relative bridge function). Below we give an example definition of the $n(\tilde{H})$ measure, which is used in Example 4. First we define the $\tilde{H}^{KL}$ measure as
$\tilde{H}_e^{KL} = \dfrac{H_e^{KL}}{\sum_i H_i^{KL}}$
where the measure $H_e^{KL}$ is as defined for undirected networks above, and i runs over the elements of the network. Also note that we choose to define $\tilde{H}_e^{KL}$ as zero if K = L, for the reason given above.
Then, using the extended number (entropy) formula defined above, we define the value of $n(\tilde{H})$ as the extended number of the discrete distribution of the $\tilde{H}_e^{KL}$ values of the given element e, where this discrete distribution is taken over each possible pair (K, L). Note that considering the pairs (K, L) as ordered or as unordered does not affect the definition of the $n(\tilde{H})$ measure in the case of undirected networks, since in the undirected case $\tilde{H}_e^{KL} = \tilde{H}_e^{LK}$.
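The relative bridgeness and $n(\tilde{H})$ can be sketched as follows for an undirected network, taking precomputed $H_e^{KL}$ values as input. This is a hypothetical illustration: the function names, the dictionary layout and the convention of returning 0 for an element with no bridge function at all are assumptions. The sketch uses unordered pairs K < L; with ordered pairs every value appears twice, which multiplies $e^S$ by 2 for every element and leaves the ranking of elements unchanged.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def extended_number(values: List[float]) -> float:
    """exp(S) of the normalized distribution; 0.0 if there is no mass (sketch convention)."""
    total = sum(values)
    if total <= 0:
        return 0.0
    return math.exp(-sum(v / total * math.log(v / total) for v in values if v > 0))


def relative_bridgeness(h: Dict[str, Dict[Pair, float]]) -> Dict[str, Dict[Pair, float]]:
    """~H_e^{KL} = H_e^{KL} / sum_i H_i^{KL}; set to 0 when K == L."""
    totals: Dict[Pair, float] = {}
    for per_pair in h.values():
        for pair, value in per_pair.items():
            totals[pair] = totals.get(pair, 0.0) + value
    return {
        e: {(k, l): 0.0 if k == l or totals[(k, l)] == 0 else v / totals[(k, l)]
            for (k, l), v in per_pair.items()}
        for e, per_pair in h.items()
    }


def n_h_tilde(rel_e: Dict[Pair, float], n_modules: int) -> float:
    """n(~H) of one element: extended number over unordered module pairs K < L."""
    values = [rel_e.get(pair, 0.0) for pair in combinations(range(n_modules), 2)]
    return extended_number(values)


# Hypothetical H_e^{KL} values (undirected network, 3 modules, pairs with K < L).
h = {
    "a": {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0},
    "b": {(0, 1): 1.0, (0, 2): 0.0, (1, 2): 0.0},
    "c": {(0, 1): 0.0, (0, 2): 1.0, (1, 2): 1.0},
}
rel = relative_bridgeness(h)
for e in h:
    print(e, round(n_h_tilde(rel[e], 3), 6))  # a 3.0, b 1.0, c 2.0
```

Element "a" carries an equal share of the bridging between all three module pairs, so its $n(\tilde{H})$ is 3.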
The examples below are included solely to illustrate several advantageous embodiments of the invention in more detail and to facilitate a more comprehensive understanding of the principles of the methods of the claimed invention. These examples, therefore, cannot be interpreted in any way as limiting the claimed scope of the invention, which is limited exclusively by the specific terms and expressions used in the appended claims.
Example 1
Refined topological analysis of a model signaling network
In the following example we demonstrate the analysis of a model network representing a small segment of the signaling network of human cells. Signaling networks of human cells are a technologically relevant example of directed networks. In these biological networks, elements are various proteins (occasionally RNAs and RNA or DNA segments), and the directed links between them are the effects of one element on another when a signaling step occurs. Such an effect might be phosphorylation, methylation, modification of binding affinity, induction of a conformational change, activation or inhibition, or an increase or decrease in the synthesis rate of the downstream element induced by the upstream element of the network (Balazsi and Oltvai, 2005; Papin et al., 2005). Finding key elements of signaling networks, such as central elements of signaling modules, can be used as a discovery tool to help identify drug targets, e.g. central elements of signaling pathways in cancer cells to be attacked by anticancer drugs (Adjei and Hidalgo, 2005; Chen et al., 2005). The identification of overlaps and bridges between signaling modules gives unbiased information on the cross-talk between signaling pathways, which may be crucial for designing therapeutic interventions that de-couple one signaling pathway from another and, therefore, allow the selective inhibition of a certain pathway. Finally, the identification of signaling modules gives crucial information on possible back-up mechanisms, which may provide an alternative pathway once a certain pathway has been blocked (Hornberg et al., 2006). Models of such altered networks can be used to predict the behavior of the signaling network after the action of a proposed drug candidate. All the above information opens the way to solving the heretofore unsolved problem of finding target sets for multi-target drugs (Borisy et al., 2003; Csermely et al., 2005; Huang, 2002).
Such target sets may contain elements of the core of a signaling module, as well as major bridges of the same module to adjacent modules. Larger, mostly linear segments of signaling networks are often called pathways. However, multifunctional signaling molecules provide a large number of cross-talks between pathways. Similarly, adaptor and scaffold proteins function as collection platforms for various signaling steps. These provide a rationale for organizing smaller sets of signaling elements into larger cohorts, introducing a hierarchical organization of signaling (Yu and Gerstein, 2006).
In the following hypothetical, illustrative example we give a grossly simplified version of human signaling networks (see Figure 1.A) to illustrate the applicability of the modularization method of the invention to directed networks. Element E might be considered as a receptor, with elements D and F as its distinct primary targets. Element K represents a negative feedback mechanism by which the dominant 'pathway-D' down-regulates receptor E, and it provides a cross-talk to 'pathway-F'. Elements B and G serve as adaptor molecules (switchboards) of pathway-D and pathway-F, respectively. Elements L and C provide a reverse cross-talk from pathway-F to pathway-D, which is regulated by elements A and B via element C. Element A is the final effector of pathway-D, while elements J, H and I are the final effectors of pathway-F. The final effectors J, H and I form a direct signaling cascade, which coordinates their action. The final effector I is under triple control: it is directly influenced by the primary target F, receives a secondary amplification loop from the adaptor G, and is subject to coordinative regulation from the competing final effector H.
Defining the community heap of a given network element using the SpreadLand method
To present an example of a refined analysis of directed and weighted signaling networks, we calculate the community landscape height value for every element and link of the network, which indicates the centrality of the position of said element or link in the network. The community landscape value is a global measure of centrality, which means that it represents how much the said element or link is affected by a change anywhere in the network, or how much the change of said element or link affects the whole network. The method for calculating the community landscape height values used in the present example will be referred to herein as the 'SpreadLand method'.
Step (i): Calculation of the community heap values
The global topological information represented by the community landscape height values is explored by a calculation process involving many elements of local topological information. As a first step of this process we define a community heap of each (starting) element of the network. To characterize and define the community heap of a given starting element, a community heap value is calculated for each element and link of the network, which is the local information representing the extent of the effect the starting element has on all the other elements and links of the network. In the following example we use a specific method (the SpreadLand method), in accordance with an advantageous embodiment of the present invention, to calculate the community heap values for each element and link of the illustrative network described above. While community heap values of community heaps of all elements of the network are required to calculate the community landscape height values, for the sake of compactness and clarity, the first part of the example will guide us through the calculation of the community heap values of elements and links for the single community heap of element K of our signaling network example.
This community heap value calculation, as used in the specific method of the invention herein, represents a simplified model of information spreading. By initially placing a minor effect/information/perturbation at the starting element, that effect spreads from element to element according to the topology of the network and the strength and direction of its links. This effect-spreading is simulated in a step-by-step process. The community heap value of an element or link of the network represents how much of the effect has reached that element or link, that is, how much the element or link belongs to the community heap of the starting element K. Note that it does not matter which element is chosen first as the starting element, since community heaps have to be assigned to every element of the network. The choice of element K, representing a negative feedback, for this example is therefore arbitrary. A similar information flow might result if protein K became oxidatively damaged, conformationally changed, phosphorylated, etc., affecting all its links to neighboring proteins.
Step (i)/(a): Initialization
First we initialize the community heap element values of elements (from now on: CHEV) and the community heap link values of links (from now on: CHLV) to zero. The only exception is the starting element, here element K, whose CHEV is set to two (because two is the summed strength of the outbound links of this element).
Step (i)/(b): Start of the spreading process
In each step, the starting effect spreads from one or more elements (reached by the starting effect in one of the previous steps) to one or more other elements along the outbound links of the source elements. In the first step, only element K has a non-zero CHEV, so it is naturally the initial source of spreading. Element K has two outbound links (note that the direction of links is important), both with a strength of one. First we calculate the 'neighboring link value' (from now on: NLV) of each link. The NLV of a link represents how much of the effect would spread through that link if the effect spread along it. In the specific method, the NLV of a given directed link X→Y is given by the formula
$C \cdot s / S$, where C is the CHEV of element X, s is the strength of the given link, and S is the sum of the strengths of all outbound links of element X, if the CHLV of the given link is zero; otherwise the NLV is zero. In the given situation, the NLV of the link K→E is 1, because in this formula C = 2, s = 1, S = 2, and the CHLV of the link is still zero. The same is true for the link K→G. After calculating the NLV of each link, we assign a 'neighboring element value' (from now on: NEV) to each end element of links with non-zero NLV, calculated by summing the NLVs of the inbound links of the given element if this end element has zero CHEV; otherwise the NEV of the end element is left undefined. In the present case each such end element has only one inbound link with non-zero NLV, and the NEVs of elements E and G will both be 1. After the NEV calculation, we choose the elements with maximal NEV, set the CHEV of each chosen element to its NEV, and set the CHLV of each link pointing to any chosen element to the NLV of that link (see Figure 1.B).
Repeating step (i)/(b): Continuation of the spreading process
The second step of the spreading process, in which the NLV of each link is recalculated, is illustrated in Figure 1.C. Here quite a few links have non-zero NLV. After calculating the NEV of the respective elements, the maximal NEV turns out to be 1/2; therefore elements D, F and I are selected and their CHEV is set to 1/2, and the CHLV of links E→D, E→F and G→I, which point to one of the selected elements, is set to the NLV of that link, which is 1/2 in each case.
Recalculation of neighboring link values (NLVs) after repeated step (i)/(b)
Now there are links whose CHLV is zero, whose start element is one of the elements newly assigned a CHEV in the previous step, and whose end element has non-zero CHEV. These are the links D→K, F→G and F→I. The NLV of these links is recalculated the same way as in the first step (i)/(b), and then the CHLV of each of these links is set to its NLV. In parallel, for each of these links the CHEV of the end element is increased by the NLV of the link. Figure 1.D shows the result after completion of the repeated step (i)/(b).
If we continue this spreading procedure, all elements and links of the network will be assigned a non-zero CHEV or CHLV sooner or later, representing the belonging of said element or link to the community heap of element K. Figure 1.E illustrates the final result, where the community heap values of the community heap of element K are already assigned to all elements and links of the network.
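The spreading rules of steps (i)/(a) and (i)/(b) can be sketched for a single starting element as follows. This is a minimal sketch, not the reference SpreadLand implementation: it follows only the rules spelled out above, runs on a hypothetical toy network (the link strengths of Figure 1.A are not given in the text), uses `math.isclose` to detect ties in the maximal NEV, and all names are illustrative.

```python
import math
from typing import Dict, Tuple

Link = Tuple[str, str]


def community_heap(links: Dict[Link, float], start: str):
    """Spread an effect from `start`; return (CHEV per element, CHLV per link)."""
    nodes = {n for link in links for n in link}
    out_strength = {n: 0.0 for n in nodes}
    for (x, _), s in links.items():
        out_strength[x] += s

    # Step (i)/(a): everything zero except the starting element, whose CHEV
    # is the summed strength of its outbound links.
    chev = {n: 0.0 for n in nodes}
    chlv = {link: 0.0 for link in links}
    chev[start] = out_strength[start]

    def nlv(link: Link) -> float:
        # NLV of x->y is C*s/S while the link's CHLV is still zero, else 0.
        x = link[0]
        if chlv[link] != 0 or out_strength[x] == 0:
            return 0.0
        return chev[x] * links[link] / out_strength[x]

    while True:
        # Step (i)/(b): NLVs, then NEVs of end elements that still have zero CHEV.
        nlvs = {link: nlv(link) for link in links}
        nev: Dict[str, float] = {}
        for (x, y), value in nlvs.items():
            if value > 0 and chev[y] == 0:
                nev[y] = nev.get(y, 0.0) + value
        if not nev:
            return chev, chlv  # no further element can be reached
        best = max(nev.values())
        chosen = {y for y, v in nev.items() if math.isclose(v, best)}
        for y in chosen:
            chev[y] = nev[y]
        for (x, y), value in nlvs.items():
            if y in chosen and value > 0:
                chlv[(x, y)] = value

        # Recalculation: zero-CHLV links leaving a newly reached element and
        # ending at an element that already has non-zero CHEV.
        closing = {link: nlv(link) for link in links
                   if link[0] in chosen and chlv[link] == 0 and chev[link[1]] != 0}
        for (x, y), value in closing.items():
            chlv[(x, y)] = value
            chev[y] += value


# Hypothetical toy network (not the strengths of Figure 1.A).
toy = {("K", "E"): 1.0, ("K", "G"): 1.0, ("E", "D"): 1.0, ("E", "F"): 1.0,
       ("G", "I"): 1.0, ("G", "H"): 1.0, ("D", "K"): 1.0, ("F", "G"): 1.0,
       ("F", "I"): 1.0, ("H", "I"): 1.0}
chev, chlv = community_heap(toy, "K")
print(sorted(chev.items()))
# [('D', 0.5), ('E', 1.0), ('F', 0.5), ('G', 1.25), ('H', 0.5), ('I', 1.25), ('K', 2.5)]
```

In this toy network, as in the text, the heap started from H reaches only H and I, because no other element is reachable from H along a directed path.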
Step (ii): Determination of community heap values and the community landscape values for all elements of the network
In the following steps the CHEV and CHLV are determined for each element and link of the network, taking each element as the starting point of a spreading process in the same way as we did for element K. However, it is important to note that a community heap need not have a non-zero CHEV or CHLV for all of its elements and links. Clearly, only elements and links reachable from the starting element by a directed path will have a non-zero CHEV or CHLV; for example, the community heap of element H has only the elements H and I with non-zero CHEV and the respective link with non-zero CHLV.
After the completion of these steps, we can proceed to determine the community landscape value of each link. For this, we sum the Δ values over all community heaps in which the given link has a non-zero CHLV. Δ is defined as $\Delta = C \cdot A + S \cdot B$, where C is the CHLV of the given link, A is the summed inbound strength of the end element of the given link, S is the strength of the given link and B is the CHEV of the end element of the given link. The summed Δ value is the community landscape height of the given link, which measures the centrality of the given link by the quantity of effect that reached it from all starting elements. In other words, more densely interconnected parts of the network (with farther parts of the network taken into account with a smaller weight) have higher community landscape heights. Figure 1.F illustrates the community landscape of the signaling network, where the numbers beside the links represent their community landscape heights.
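Summing Δ over all community heaps can be sketched as below. The input layout (one CHEV dictionary and one CHLV dictionary per heap) and the small two-heap example are hypothetical; the heaps are written by hand rather than produced by a spreading run.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]
Heap = Tuple[Dict[str, float], Dict[Link, float]]  # (CHEV per element, CHLV per link)


def landscape_heights(links: Dict[Link, float], heaps: List[Heap]) -> Dict[Link, float]:
    """Height of a link = sum over heaps with non-zero CHLV of Delta = C*A + S*B."""
    in_strength: Dict[str, float] = {}
    for (_, y), s in links.items():
        in_strength[y] = in_strength.get(y, 0.0) + s
    heights = {link: 0.0 for link in links}
    for chev, chlv in heaps:
        for (x, y), s in links.items():
            c = chlv.get((x, y), 0.0)  # C: CHLV of the link in this heap
            if c == 0:
                continue
            a = in_strength[y]         # A: summed inbound strength of the end element
            b = chev.get(y, 0.0)       # B: CHEV of the end element in this heap
            heights[(x, y)] += c * a + s * b
    return heights


links = {("A", "B"): 2.0, ("C", "B"): 1.0}
heaps = [({"B": 0.5}, {("A", "B"): 1.0}),
         ({"B": 1.0}, {("A", "B"): 0.5, ("C", "B"): 0.5})]
print(landscape_heights(links, heaps))  # {('A', 'B'): 7.5, ('C', 'B'): 2.5}
```

Link C→B gets no contribution from the first heap because its CHLV there is zero, which is the "belongs with non-zero CHLV" condition in the text.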
At this point in our process we already have the community landscape height value of each link of the network, which we can use for the fine analysis of the network.
Identifying module cores and the extent to which links belong to any such module core using the Proportional Distribution method
The centers (or cores) of modules are defined as the hilltops of the community landscape. By definition, a hilltop is either a link whose outbound links are all of smaller height (by the outbound links of a link i→j we mean the outbound links of element j), or a strongly connected component consisting of equally high links which together have the previous property. In this method, therefore (as opposed to most previously known methods), there is no need to set a predetermined number of modules to be found.
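The first (single-link) case of the hilltop definition can be sketched as follows. This simplified, hypothetical helper only finds links whose neighbouring links are all strictly lower; plateaus of equally high links forming a strongly connected component are not handled here.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def hilltop_links(heights: Dict[Link, float]) -> List[Link]:
    """Links i->j whose neighbouring links (all outbound links of j) are strictly lower."""
    outbound: Dict[str, List[Link]] = {}
    for link in heights:
        outbound.setdefault(link[0], []).append(link)
    return [
        (i, j) for (i, j), h in heights.items()
        if all(heights[nb] < h for nb in outbound.get(j, []))
    ]


heights = {("B", "C"): 3.0, ("C", "D"): 5.0, ("D", "B"): 2.0,
           ("X", "D"): 1.0, ("D", "Z"): 0.5}
print(hilltop_links(heights))  # [('C', 'D'), ('D', 'Z')]
```

A link ending in an element without outbound links (here D→Z) qualifies vacuously. This is consistent with the example network, where links into the final effectors A and I become module cores.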
After determining the module center links, we assign all the remaining links of the network to the various hilltops/highlands in a complete procedure, in which the original strength of each link is fully distributed among the communities it belongs to. In this method, the assignment of a link depends only on the outbound links of its end element.
The Proportional Distribution Method
By the term 'proportional distribution' we mean that links of the network are assigned to the modules of neighboring links of equal or higher community landscape height value (we define the neighboring links of link i→j as the outbound links of element j). To demonstrate the method, a simplified, comprehensive procedure is presented instead of a detailed concrete implementation. In this procedure, module centers are not required to be determined beforehand. All links are 'removed' from the network and then, using an appropriate rule, put back, while the links are assigned to modules. At the start of the proportional distribution method all links are marked as unassigned. After this, multiple rounds of link assignment are performed: in each round, links are assigned to modules based on the assignment of previously assigned links. In each round we descend to the next level of links, starting from the top community landscape level, where links of the same community landscape height value are considered to constitute one community landscape level. The steps of a single round are as follows:
- The first step: links of the actual community landscape level without any neighboring assigned links are marked, and among them distinct sets of links that fulfill the previously defined hilltop criterion are sought. Each such set of links becomes a new module core. Each link of all these sets is assigned to its own module with an assignment strength equal to its strength.
- In the consecutive steps, unassigned links of the actual community landscape level which have at least one neighboring assigned link are assigned to modules, with their strength distributed as assignment strength, based on their already assigned neighboring links. A link is assigned to the modules to which its assigned neighbors are already assigned, in proportion to the community landscape height values ($m_{jk}$) of these neighbors. When considering the assignment strengths of the neighbors (which are normalized to the link strength), we renormalize them to unity. This results in the following assignment formula for link i→j ($S_{ij}$ is the initial strength and $\mathbf{V}_{ij}$ is the assignment strength vector of link i→j; k runs over the already assigned neighboring links j→k):
$\mathbf{V}_{ij} = S_{ij}\,\dfrac{\sum_k m_{jk}\,\mathbf{V}_{jk}/S_{jk}}{\sum_k m_{jk}} \qquad (1)$
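Formula (1) for a single link can be sketched as follows. The function name and the tuple layout of a neighbour are hypothetical, and the heights 5.0 and 2.0 in the usage line are made-up values chosen only to illustrate the arithmetic.

```python
from typing import List, Sequence, Tuple

# (community landscape height m_jk, assignment vector V_jk, strength S_jk)
Neighbour = Tuple[float, Sequence[float], float]


def assign_link(strength: float, neighbours: List[Neighbour]) -> List[float]:
    """Formula (1): V_ij = S_ij * sum_k m_jk * V_jk / S_jk / sum_k m_jk."""
    if not neighbours:
        raise ValueError("a link without assigned neighbours starts a new module core")
    total_height = sum(m for m, _, _ in neighbours)
    mixed = [0.0] * len(neighbours[0][1])
    for m, vector, s in neighbours:
        for idx, x in enumerate(vector):
            mixed[idx] += m * x / s  # V_jk / S_jk is renormalized to unit sum
    return [strength * value / total_height for value in mixed]


# Hypothetical heights 5.0 and 2.0 for two neighbours belonging to modules 4 and 5.
v = assign_link(2.0, [(5.0, [0, 0, 0, 8, 0], 8.0), (2.0, [0, 0, 0, 0, 3], 3.0)])
print([round(x, 2) for x in v])  # [0.0, 0.0, 0.0, 1.43, 0.57]
```

Because each neighbour vector is first divided by its own strength, the result always sums to the strength of the link being assigned, whatever the neighbours' strengths are.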
Within such a step, links assigned in the current step are not considered 'assigned neighbors' for the whole duration of that step. The step described here is repeated while any unassigned links remain on the actual community landscape level. Once all links of the actual community landscape level are assigned, the round is over, and the next round begins, unless there are no more community landscape levels. As an outcome of the assignment process, the assignment strength values of each link to the various modules satisfy the requirement that their sum equals the initial strength of that link.
Applying the Proportional Distribution Method for the modularization of the model signaling network
The community landscape height values of the links of our signaling network are illustrated in Figure 1.F. Since no two links have equal community landscape height values, we can disregard community landscape levels in this particular example: every link now constitutes a separate community landscape level, so at each step the unassigned link with the highest community landscape height is assigned based on its neighboring links of higher community landscape height value, which are already assigned to modules.
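With all heights distinct, the whole assignment reduces to one top-down pass, which can be sketched as follows. This is a hypothetical simplification for the no-ties case only (it raises an error otherwise); module cores are numbered in the order they are found, and the four-link example uses made-up strengths and heights.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def proportional_distribution(strengths: Dict[Link, float], heights: Dict[Link, float]):
    """Top-down pass for the no-ties case; returns (assignment vectors, module cores)."""
    if len(set(heights.values())) != len(heights):
        raise ValueError("this sketch assumes pairwise distinct heights")
    cores: List[Link] = []
    unit: Dict[Link, Dict[int, float]] = {}  # assignment vector / strength (sparse)
    for link in sorted(strengths, key=lambda lk: heights[lk], reverse=True):
        j = link[1]
        # Neighbours of i->j are the outbound links of j; in descending order the
        # already assigned ones are exactly those with higher height.
        nbrs = [nb for nb in unit if nb[0] == j]
        if not nbrs:  # local maximum of the landscape: new module core
            cores.append(link)
            unit[link] = {len(cores) - 1: 1.0}
            continue
        total = sum(heights[nb] for nb in nbrs)
        mix: Dict[int, float] = {}
        for nb in nbrs:  # formula (1) with unit-normalized neighbour vectors
            for module, share in unit[nb].items():
                mix[module] = mix.get(module, 0.0) + heights[nb] * share / total
        unit[link] = mix
    vectors = {lk: [strengths[lk] * unit[lk].get(k, 0.0) for k in range(len(cores))]
               for lk in strengths}
    return vectors, cores


strengths = {("B", "A"): 1.0, ("C", "D"): 5.0, ("B", "C"): 2.0, ("D", "B"): 1.0}
heights = {("B", "A"): 10.0, ("C", "D"): 9.0, ("B", "C"): 6.0, ("D", "B"): 4.0}
vectors, cores = proportional_distribution(strengths, heights)
print(cores)                 # [('B', 'A'), ('C', 'D')]
print(vectors[("D", "B")])   # [0.625, 0.375]
```

Link D→B overlaps both modules because its two neighbours, B→A and B→C, already belong to different modules; its strength is split in the ratio of their heights (10:6).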
As a first step, local maxima of the community landscape are sought. Local maxima are links with no neighboring link of higher community landscape height value; there are five such links, namely B→A, C→D, F→I, G→I and H→I. The network therefore has five modules, and these links become the module cores of the respective modules. (Modules will be referenced by numbers starting from 1, according to the position of their module core in this list; for example, module '1' refers to the module whose core is the link B→A, and module '4' refers to the module whose core is the link G→I.) Five modules may seem many for such a small signaling network, but this reflects the special illustrative circumstances selected for the hypothetical example as well as the sensitivity of our method. It is important to note, however, that the method is able to create a new level of hierarchy by merging some of these modules, as described later.
Through the module assignment process, a module assignment vector $[x_1, x_2, x_3, x_4, x_5]$ is assigned to each link of the network, where $x_i$ is the assignment strength to module i. The components of the module assignment vector of a module core link of module i are zero, except component i, which equals the strength of that link, because module core links are assigned entirely to their module. So the following module assignment vectors are initially known: B→A [1,0,0,0,0]; C→D [0,5,0,0,0]; F→I [0,0,3,0,0]; G→I [0,0,0,8,0]; H→I [0,0,0,0,7]. The initial strengths of the links of the network are shown in Figure 1.A.
In the following steps, the still unassigned links are assigned to modules in descending order of their community landscape height value. For a given link, its already assigned neighboring links with higher community landscape height values are used to calculate its module assignment vector. First the link B→C is processed: its only assigned neighboring link, C→D, belongs to module '2' only, so link B→C will too, with the module assignment vector [0,5,0,0,0]. The situation is similar for the next link, D→B, whose only assigned neighboring link is also B→C. Next come link G→H and then link J→H, which both have link H→I as their only assigned neighboring link; H→I belongs entirely to module '5', so links G→H and J→H also belong to module '5'. The module assignment vectors of the links assigned so far are illustrated in Figure 1.G.
In the next step, link F→G is to be assigned to modules. This is the first non-trivial step. This link has two already assigned neighbors, which are assigned to different modules: link G→I belongs entirely to module '4', while link G→H belongs entirely to module '5'. The assignment formula (1) described in the proportional distribution method is used, so the module assignment vector of link F→G is given by calculating the weighted sum of the module assignment vectors of links G→I and G→H, where the weight is the community landscape height of the given link, and renormalizing the calculated weighted sum to the strength of link F→G. This results in the module assignment vector [0,0,0,1.43,0.56], so link F→G belongs to modules '4' and '5' in an approximate ratio of 2.5:1. As we descend the community landscape, overlapping links become more and more frequent; their module assignment vectors are determined the same way as described here. Figure 1.H illustrates the result of the final assignment of each link to the network modules.
Assignment of elements of the model signaling network to modules
Element assignment values can be derived from the link assignment values as follows. An element has two assignment value vectors: the 'inbound assignment value vector' represents how much the modules are connected to the element, while the 'outbound assignment value vector' represents how much the element is connected to the modules. The inbound assignment value vector of an element is calculated by summing the assignment value vectors of its inbound links. The outbound assignment value vector of an element is calculated by summing the assignment value vectors of its outbound links. Note that in some practical cases, the inbound and outbound assignment value vectors of an element may be combined (in the simplest case, added) to give a single module assignment value vector for that element.
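Deriving element vectors from link vectors is a pair of sums, sketched below together with the simplest (additive) combination mentioned above. The function names and the four-link input are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def element_vectors(link_vectors: Dict[Link, List[float]]):
    """Return (inbound, outbound) assignment value vectors of every element."""
    n = len(next(iter(link_vectors.values())))
    inbound: Dict[str, List[float]] = {}
    outbound: Dict[str, List[float]] = {}
    for (x, y), vec in link_vectors.items():
        out_x = outbound.setdefault(x, [0.0] * n)  # x -> y is outbound for x
        in_y = inbound.setdefault(y, [0.0] * n)    # ... and inbound for y
        for k, value in enumerate(vec):
            out_x[k] += value
            in_y[k] += value
    return inbound, outbound


def combined_vector(inbound, outbound, element: str, n: int) -> List[float]:
    """Simplest combination mentioned in the text: add the two vectors."""
    a = inbound.get(element, [0.0] * n)
    b = outbound.get(element, [0.0] * n)
    return [p + q for p, q in zip(a, b)]


link_vectors = {("B", "A"): [1.0, 0.0], ("C", "D"): [0.0, 5.0],
                ("B", "C"): [0.0, 2.0], ("D", "B"): [0.625, 0.375]}
inbound, outbound = element_vectors(link_vectors)
print(inbound["B"], outbound["B"])                       # [0.625, 0.375] [1.0, 2.0]
print(combined_vector(inbound, outbound, "B", 2))        # [1.625, 2.375]
```

Elements without outbound links (A above) simply have no outbound vector; `combined_vector` treats a missing vector as all zeros.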
Analysis of module overlaps
As noted in the introduction, overlapping regions of signaling networks are important for determining potential elements of cross-talk and regulation. The extent to which an element lies in an overlapping region of the network can be quantified using the n(M) measure, defined above immediately preceding the present example (see Figure 1.I).
Analyzing the signaling network example, we identified element K as the element of maximal modular overlap. In the rationale of the putative signaling network of the example, element K was described as a negative feedback mechanism by which the dominant 'pathway-D' down-regulates receptor E, and as a cross-talk to 'pathway-F'. This description is in complete agreement with the paramount modular overlap of this element, since we would expect an extreme modular overlap of cross-talk elements in signaling pathways. The agreement between the detailed analysis and the 'common-sense' identification of element K as a cross-talk element supports the idea that the modular analysis will also reveal cross-talk elements in real-world examples, where the complexity of the signaling network precludes an easy, 'common-sense' or experimental identification of cross-talk elements.
Analyzing the higher hierarchical level of the model signaling network
The determination of the hierarchical organization of modules is of extreme importance for assigning elements of the network to larger units in an overlapping manner (Yu and Gerstein, 2006). Let us construct a new network which represents a higher hierarchical level of the original network as follows. Let the new network initially contain no elements or links. For each module 'm' of the original network, add an element 'm' to the new network. For each ordered pair (k, l) of modules of the original network with k ≠ l, add a directed link to the new network starting at element 'k' and ending at element 'l', and let the weight of this link be the sum of $H_e^{kl}$ for e running over all elements of the original network. The meaning of the measure $H_e^{KL}$ was defined above immediately preceding the present example. Note that the condition k ≠ l has the desired effect of eliminating loops in the new network, since loops do not represent inter-modular connections in the current example.
The structure of the resulting new network represents an essential backbone of the original network. Modules of this new network can be determined, and even higher hierarchical levels can be constructed by repeating the described renormalization process on the new network. Note that this process defines 'modules of modules' (called 'supermodules'), and these larger modules can be projected back onto the original elements of the network, as can be seen in Figure IJ. This renormalization greatly aids the understanding of the modular structure of any complex network, for example a signaling network with thousands of proteins.
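A minimal Python sketch of one renormalization step follows. It assumes that module assignment strengths are available as a nested dict `eta[element][module]` and that the weight of the link from module k to module l is the sum over elements of the product of the element's strengths in k and l; this product form is our reading of the weight definition, and the function name `supermodule_network` is hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def supermodule_network(eta):
    """Construct the next hierarchical level of a modular network.

    eta: dict element -> {module: assignment strength of the element in that module}
    Returns (nodes, weights): the set of modules, which become the elements of the
    new network, and a dict {(k, l): w_kl} of directed links with k != l.
    Assumption (hypothetical reading): w_kl = sum over e of eta[e][k] * eta[e][l].
    """
    nodes = set()
    weights = defaultdict(float)
    for memberships in eta.values():
        nodes.update(memberships)
        # permutations(..., 2) yields only ordered pairs of distinct modules,
        # so no loops (k == l) are created in the new network
        for k, l in permutations(memberships, 2):
            weights[(k, l)] += memberships[k] * memberships[l]
    return nodes, dict(weights)
```

Applying the same function to the assignment strengths computed on the new network gives the next ('supermodule') level. With a product the weights come out symmetric, so the directed representation only matters if an asymmetric weighting is used instead.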
Example 2
Identification of targets for hub-based multi-target drug design using the yeast protein-protein interaction network
The rationale of hub-based multi-target drug design can be summarized as follows. Hubs, i.e. network elements that have a much higher number of neighbors than the average degree of the network, are of paramount importance as primary targets for modifying various networks. Importantly, the inhibition of hubs proved to be a very efficient way to disturb a large number of self-organized complex systems (Barabasi & Albert, 1999; Albert et al., 2000). By analogy with this intervention, hubs of yeast protein-protein interaction networks (where elements are the proteins of the yeast cell, links are the direct physical interactions between them, and weights may be defined as connection probabilities roughly correlated with the binding affinities of the respective protein pair) may be used as targets of anti-fungal drugs. However, recent examples indicated that multi-target drugs may perform significantly better than conventional single-target drugs (Borisy et al., 2003; Csermely et al., 2005; Huang, 2002), decreasing the danger of resistance development, which is a primary concern with fungicides. To achieve efficient targeting in hub-based multi-target drug design, two basic problems have to be solved:
- Condition 1: selection of non-redundant hubs. Only non-redundant hubs should be selected as elements of the multiple target set.
- Condition 2: selection of multi-interface (party) hubs. Additionally, to achieve high efficiency it is very useful to select hubs that have their interactions at the same time (multi-interface, or party hubs), since these were shown to be much more essential for yeast survival than hubs that bind to their neighbors consecutively (single-interface, or date hubs) (Ekman et al., 2006; Han et al., 2004; Kim et al., 2006).
Condition 1: selection of non-redundant hubs
Modules of the yeast protein-protein interaction network correspond to grossly different functions of the respective protein complexes in the yeast cell (Valente et al., 2005). Therefore, hubs located in different modules should be selected as non-redundant hubs satisfying the first condition. An even higher anti-fungal efficiency of the proposed multi-target drug can be expected if the selected hubs lie in the cores of different modules. Finally, the least functional overlap can be expected if the selected hubs lie in the cores of non-adjacent modules. In this example we show that the current method solves this task efficiently: by defining the module cores and using a hierarchical representation of the modular structure of the yeast protein-protein interaction network, it reveals the positions of the desired non-adjacent core hubs.
Condition 2: selection of multi-interface (party) hubs
To satisfy the second condition, independent information on the temporal order of protein expression or on protein structure has so far been required (Ekman et al., 2006; Han et al., 2004; Kim et al., 2006). In this example we show that party and date hubs (multi- and single-interface hubs, respectively) can be discriminated efficiently by separating hubs of module cores (representing party hubs) from hubs of modular overlaps (representing date hubs).
In summary, in this example we show that our network analysis method can be used efficiently to determine multiple targets for anti-fungal drugs. The described selection method can also be applied in many other therapeutically relevant settings using different networks, such as the determination of multiple target sets in signaling networks or in human/cancer-specific protein-protein interaction networks for anti-cancer drugs, or finding the most efficient sets of intervention points in various technological networks, such as power grids, the internet, etc.
Description of the yeast protein-protein interaction network
In the current example we used the high-confidence yeast protein interaction network of Ekman et al. (2006). This network contains 6379 protein-protein interactions among 2633 yeast proteins in total, comprising approximately half of the yeast genome. In this network 2445 proteins form a connected giant component, and the remaining 188 proteins are disconnected from this giant component. The original paper identified 519 hubs, defined as proteins having more than 8 neighbors in this network; we retained this classification of hubs in our example. Since this protein-protein interaction network is undirected, applying our general selection method for directed networks requires that each undirected link be represented as two directed links of equal strength (half of the original strength) in opposite directions.
Description of the method used to determine network modules
In the first step of the analysis of the protein network of this example (represented as an undirected network), a community heap according to the method of the invention was assigned to each protein of the yeast protein-protein interaction network, regarding that protein as the 'starting protein' of its community heap. A simplified method according to the invention was used, producing a set of binary values as the final set of CHLV for all links of the network. Subsequently, for each community heap, a binary community heap value (zero or one) was assigned to each protein-protein interaction, indicating whether the interaction belongs to the heap or not. To calculate the community heap values of the protein-protein interactions, we used the 'NodeLand' method defined below in Example 4 for each 'starting protein'. The community landscape height of a protein-protein interaction was set as the sum of the community heap values of that interaction over all 2445 'starting proteins' of the giant component of the network.
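The bookkeeping of this step can be sketched as follows. The community heaps themselves come from the 'NodeLand' method of Example 4, which is not reproduced here, so the sketch takes each heap as a given set of links (binary heap value 1 for members, 0 otherwise). Representing undirected links as `frozenset` pairs and the helper names are our assumptions. The second helper shows the splitting of an undirected link into two directed half-strength links described above.

```python
from collections import Counter


def community_landscape_heights(heaps):
    """Sum binary community heap values over all starting proteins.

    heaps: dict starting_protein -> set of links belonging to its heap, where a
    link is a frozenset({u, v}) because the network is undirected (assumption).
    Returns a Counter: link -> community landscape height.
    """
    heights = Counter()
    for heap_links in heaps.values():
        heights.update(heap_links)  # each member link gains a heap value of 1
    return heights


def to_directed(undirected_links):
    """Split each undirected link (u, v) of strength s into u->v and v->u of strength s/2.

    Assumes every undirected link is listed only once.
    """
    directed = {}
    for (u, v), strength in undirected_links.items():
        directed[(u, v)] = strength / 2
        directed[(v, u)] = strength / 2
    return directed
```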
Module assignment of the protein-protein interactions and proteins was performed using the 'Total Distribution Method', which is described in detail below.
Description of the Total Distribution Method of module assignment
In the 'total distribution method', module core links are assigned in the same way as in the 'proportional' method described in the first patent example. However, when additional links are assigned to modules in proportion to the absolute community landscape heights of their neighboring links, neighboring links of both higher and lower community landscape height are considered, not only neighboring links of higher or equal height (as in the 'proportional' method described before). We used this distribution method in the current example, since it reveals the structure of the network in the greatest detail. We define the following quantities.
• m: number of modules
• s_ij: the initial strength of link ij
• w_ij: the community landscape height of link ij
• c_ij: a vector of dimension m whose k-th component is the module assignment strength of the non-module-core link ij in module k. It is normalized to the initial strength of the link. It is the zero vector for module core links.
• b_ij: a vector of dimension m whose k-th component is the module assignment strength of the module core link ij in module k. It is normalized to the initial strength of the link. For non-module-core links it is the zero vector. If the link ij belongs to the core of module s, then b_ij^s = 1, while the other components of b_ij are zero.
[Definitions illegible in the source.] The module assignment strength of element i is calculated by an equation that is illegible in the source.
• E: the set of links of the network
Note: in the undirected case, every link-indexed quantity is symmetric under reversal of the endpoints shown in the index (for example, w_ij = w_ji). In the total distribution method, in the undirected case, the following applies to non-module-core links:
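\[ c_{ij} = \frac{1}{A_{ij}} \Big( \sum_{(k,i)\in E,\; k\neq j} w_{ki}\, d_{ki} + \sum_{(j,l)\in E,\; l\neq i} w_{jl}\, d_{jl} \Big) \quad \text{if } A_{ij} \neq 0, \]
where
\[ A_{ij} = \sum_{(k,i)\in E,\; k\neq j} w_{ki} + \sum_{(j,l)\in E,\; l\neq i} w_{jl}, \]
and c_ij is the zero vector otherwise.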
The link ij is thus assigned to the links connected to it, in proportion to the community landscape heights of these connecting links. The module assignment strength of a link is d_ij = b_ij + c_ij. Note that using directed links would result in an assignment procedure different from the one described here.
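The following Python sketch implements the non-core assignment rule for an undirected network under stated assumptions: we read the rule as a community-landscape-height-weighted average of the neighbouring links' assignment vectors d, we set the core component of b_ij to 1, and, because c_ij then depends on the c of neighbouring non-core links, we solve the definition by fixed-point (Jacobi) iteration. The iteration scheme, the function name and the input layout are our choices, not taken from the source.

```python
from collections import defaultdict


def total_distribution(links, height, core_module, n_modules, tol=1e-12, max_iter=10_000):
    """Sketch of the total distribution method for an undirected network.

    links:       list of undirected links (i, j), each listed once
    height:      dict link -> community landscape height w_ij
    core_module: dict link -> module index s, for module core links only
    n_modules:   number of modules m
    Returns dict link -> d_ij = b_ij + c_ij as a list of length m.
    """
    incident = defaultdict(list)
    for link in links:
        i, j = link
        incident[i].append(link)
        incident[j].append(link)

    # b_ij: unit vector for core links; non-core links start from the zero vector
    d = {}
    for link in links:
        vec = [0.0] * n_modules
        if link in core_module:
            vec[core_module[link]] = 1.0
        d[link] = vec

    non_core = [link for link in links if link not in core_module]
    # neighbours of ij: links (k, i) with k != j and (j, l) with l != i
    neighbours = {
        link: [e for e in incident[link[0]] + incident[link[1]] if e != link]
        for link in non_core
    }

    # c_ij refers to d of neighbouring non-core links, so iterate to a fixed point
    for _ in range(max_iter):
        new = {}
        delta = 0.0
        for link in non_core:
            nbrs = neighbours[link]
            a = sum(height[e] for e in nbrs)  # A_ij
            if a == 0:
                vec = [0.0] * n_modules  # c_ij is the zero vector
            else:
                vec = [sum(height[e] * d[e][k] for e in nbrs) / a for k in range(n_modules)]
            delta = max([delta] + [abs(x - y) for x, y in zip(vec, d[link])])
            new[link] = vec
        d.update(new)
        if delta < tol:
            break
    return d
```

For example, on the path a-b-c-d with a-b in the core of module 0 (height 2), c-d in the core of module 1 (height 1) and b-c non-core, b-c receives d = [2/3, 1/3]. Non-core links that are not connected to any core link through links of positive height stay at the zero vector.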
This method results in a large number of highly overlapping primary modules. If this overlap is inconvenient, a higher level of the hierarchical representation of the modular structure of the network may be used, where many of the highly overlapping modules are already merged.
Satisfying condition 1: Selection of non-redundant hubs of non-adjacent module cores
Applying the total distribution method, we obtained 50 primary protein-protein interaction modules of the 2445 connected yeast proteins in the Ekman et al. (2006) dataset. These modules represent 'protein complexes' in the sense of dynamic sets of dense protein-protein interactions. In the following table we list these primary modules in decreasing order of their maximal community landscape height (i.e., the 'densest' protein complexes are followed by increasingly 'loose' protein complexes in terms of their general interaction intensity with the whole network). To identify each module we also give the name of the hub protein (by definition, a protein with more than 8 neighbors in the dataset) that has the highest community landscape height in the respective primary module. (In some modules the name of the protein having the maximal community landscape height is written in brackets, meaning that this protein was not a hub.) We also give the number of hubs in the core of each primary module, where core hubs are defined as having at least 50% of the maximal community landscape height of the respective module. In parentheses we give the number of date hubs among these, as determined by Ekman et al. (2006); an asterisk marks modules in which at least 50% of the core hubs are date hubs, meaning that the respective module is highly dynamic. Finally, we give a possible function based on the identified functions of the module core hubs/proteins.
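How one row of Table 1 can be derived for a single module is sketched below. The sketch assumes that a community landscape height is available for each protein of the module (the text defines heights for interactions, so per-protein values are our assumption) and that the hub and date-hub sets are given. The function name `describe_module` is hypothetical.

```python
def describe_module(heights, hubs, date_hubs, core_fraction=0.5):
    """Summarize one primary module in the style of a Table 1 row.

    heights:   dict protein -> community landscape height within the module (assumed input)
    hubs:      set of proteins with more than 8 neighbours
    date_hubs: set of proteins classified as date hubs
    Returns (label, core_hubs, n_date, dynamic).
    """
    top = max(heights, key=heights.get)
    label = top if top in hubs else f"[{top}]"  # brackets mark a non-hub
    cutoff = core_fraction * heights[top]  # core: at least 50% of the maximal height
    core_hubs = sorted(p for p, h in heights.items() if p in hubs and h >= cutoff)
    n_date = sum(p in date_hubs for p in core_hubs)
    # asterisk in Table 1: at least half of the core hubs are date hubs
    dynamic = bool(core_hubs) and n_date >= 0.5 * len(core_hubs)
    return label, core_hubs, n_date, dynamic
```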
Table 1
Functional assignment of yeast protein-protein interaction modules
Rank of module | Most central hub [protein] | Number of core hubs (date hubs) | Possible function based on function of module core hubs [proteins]
1 | YER012W | 8 (2) | proteasome regulatory complex
2 | YBR160W | [illegible] | cell cycle regulation
3 | YDR394W | 1 | proteasome regulatory complex
4 | YMR047C | 14 (8)* | nuclear pore complex
5 | YGL048C | 2 | proteasome regulatory complex
6 | YDR388W | 1 (3) | cytoskeleton (actomyosin)
7 | YPL043W | 2 (1)* | nucleolar complex
8 | YNL189W | 2 (2)* | nuclear pore complex
9 | YDR427W | 2 | proteasome regulatory complex
10 | YJR022W | 14 (12)* | RNA maturation (spliceosome)
11 | YBR198C | 18 (18)* | general transcription factor (TATA) complex
12 | YCR057C | 6 (2) | Hsp60-complex
13 | YIL142W | 5 (5)* | Hsp60-complex
14 | YMR049C | 5 | nucleolar complex
15 | YCR088W | 5 (1) | cytoskeleton (actin)
16 | YNL110C | 14 | ribosome (with ribosome biogenesis)
17 | YNL263C | 5 (4)* | vesicle transport
18 | YKR036C | 2 (2)* | transcriptional regulation
19 | YML109W | 3 (3)* | transcriptional regulation
20 | YOL149W | 3 (3)* | RNA maturation
21 | YPL204W | 1 (1)* | [casein kinase I]
22 | YER133W | 5 (3)* | RNA maturation
23 | YMR061W | 7 (5)* | RNA maturation
24 | YDL140C | 1 (1) | RNA polymerase
25 | YNL093W | 2 (2)* | vesicle transport
26 | YLR208W | 2 (1)* | nuclear pore complex
27 | YMR197C | 4 (2)* | vesicle transport
28 | YBR245C | 1 (6)* | nucleosome
29 | YPR181C | [illegible] | vesicle transport
30 | [YKL058W] | | general transcription factor (TATA) complex
31 | YDR099W | 1 (1)* | vesicle transport (coated pits)
32 | [YLR148W] | | vesicle transport (vacuoles)
33 | YLR342W | 3 (3)* | cell wall synthesis (and ER-Golgi transport)
34 | [YKR020W] | | cytoskeleton (tubulin)
35 | YCR002C | 2 | cell cycle regulation (cell fission, septin)
36 | [YIL105C] | | transcriptional regulation
37 | YBR289W | 2 (2)* | transcriptional regulation (SWI/SNF complex)
38 | YJR121W | 2 | mitochondrial ATPase
39 | [YDL030W] | | RNA maturation (splicing)
40 | [YDL220C] | | telomere complex
41 | [YER062C] | | [stress-response complex (new)]
42 | YBR087W | 5 | DNA replication
43 | YJR066W | 2 (2)* | transcriptional regulation
44 | [YOR254C] | | ER-protein translocon
45 | [YOR069W] | | vesicle transport (Golgi-vacuole sorting)
46 | YDL126C | [illegible]* | proteasome regulatory complex
47 | [YDR510W] | | proteasome regulatory complex
48 | [YDR092W] | | proteasome regulatory complex (DNA repair)
49 | [YBR251W] | | mitochondrial ribosome
50 | [YHR005C-A] | 1 (1)* | mitochondrial protein import
It is important to note that the modular functions in Table 1 were assigned in a 'consensus-based' manner, meaning that in most cases all core hubs/proteins had the same function, which supports the general validity of our refined analysis for identifying functional modules of the yeast cell. It is noteworthy that most of the 50 modules contain at least one date hub, and most of the time a majority of date hubs, and are therefore dynamic. The completely static modules belong to the proteasome regulatory complex, vesicle transport and cell fission, and also include the ribosome, DNA, RNA synthesis and the mitochondrial ATPase. These latter 5 functions are key elements of cellular life, which require a very stringent, error-free operation. A static setup of the respective protein complexes efficiently ensures this requirement.
In the following Table 2 we list the 19 'functional outlier proteins' found alongside their 181 consensually assigned peers. These outlier proteins may in fact be a source of important additional and novel biochemical information, since many of the assignments that at first seemed to be irregularities were supported by primary experimental data in the PubMed database (www.pubmed.com). The assignment of 25 core proteins with no known function to functional modules may also help the biochemical analysis of these proteins.
Table 2
Functional assessment of 'outlier' proteins
ORF name of outlier protein | Rank of respective module | Assessment of function | Reference
YER012W | 1 | PRE1 proteasome core protein is unique among the proteasome regulatory complexes. However, it has a unique nuclear co-localization with regulatory elements. | Russell et al. (1999) J. Biol. Chem. 274, 21943-21952.
YEL037C | | RAD23 is an excision-repair protein; however, it is a ubiquitin receptor and part of the proteasome complex. | Guerrero et al. (2006) Mol. Cell. Proteomics 5, 366-378.
YER148W | | SPT15 is a TATA-binding protein but has not been shown to associate with the proteasome. However, at least two TATA-binding proteins (PRH and TIP120) were shown to interact with the proteasome. | Bess et al. (2003) Biochem. J. 374, 667-675; Makino et al. (1999) Genes Cells 4, 529-539.
YML092C | | PRE8 is a part of the proteasome. However, it may be involved in the nuclear transport of the proteasome. | Kiyomiya et al. (2001) Cancer Res. 61, 2467-2471.
YGR092W | 18 | The two hubs of this complex, the CAF4 RNA polymerase regulator and the DBF2 kinase, were shown to interact functionally. | Liu et al. (2001) J. Biol. Chem. 276, 7541-7548.
YHR061C | 19 | The GIC1 GTPase-binding protein was shown to interact with the other two transcriptional activator hubs to regulate yeast cell polarity. | Zanelli and Valentini (2005) Genetics 171, 1571-1581.
YPL204W | 21 | Casein kinase 1 was shown to be involved in the regulation of RNA maturation. | He and Moore (2005) Mol. Cell 19, 619-629.
YER133W | 22 | GLC2 phosphatase was shown to counteract casein kinase 1-dependent regulation of RNA maturation. | He and Moore (2005) Mol. Cell 19, 619-629.
YAL043C | 23 | PTA1 was shown to recruit GLC2 to the site of RNA maturation. | He and Moore (2005) Mol. Cell 19, 619-629.
YKR001C | 28 | Histone-related modularization of the putative peroxisome-related Vps1 protein may reveal its novel function, similar to the RNA polymerase binding of the related Mx proteins. | Pavlovic et al. (1993) Ciba Found. Symp. 176, 233-243.
YKR068C | 33 | Linkage of the TRAPP Golgi-transport complex to cell wall synthesis might reveal its functional role in the functionally related SEDL proteins in a human skeletal dysplasia. | Gecz et al. (2003) Gene 320, 137-144.
YBR254C | | Linkage of the TRAPP Golgi-transport complex to cell wall synthesis might reveal its functional role in the functionally related SEDL proteins in a human skeletal dysplasia. | Gecz et al. (2003) Gene 320, 137-144.
YNL307C | 36 | Yeast GSK3 is a well-known regulator of transcriptional activity. | Neigeborn and Mitchell (1991) Genes Dev. 5, 533-548.
YBL008W | 37 | HIR complex-mediated resistance of nucleosomes against SWI/SNF remodeling has been shown recently. | Prochasson et al. (2005) Genes Dev. 19, 2534-2539.
YOR038C | 37 | HIR complex-mediated resistance of nucleosomes against SWI/SNF remodeling has been shown recently. | Prochasson et al. (2005) Genes Dev. 19, 2534-2539.
YJR066W | 43 | The TOR1 (PI-3-kinase) complex may be a novel complex regulator of yeast gene transcription. | Loewith et al. (2002) Mol. Cell 10, 457-468.
YNL006W | 43 | LST8 has been shown to be a part of the TOR1 complex. | Loewith et al. (2002) Mol. Cell 10, 457-468.
YJR101W | 49 | Superoxide dismutase was shown to be involved in the regulation of the ribosome. | Zielinski et al. (2002) Biochem. Biophys. Res. Commun. 296, 1310-1316.
YHR005C-A | 50 | Regulation of the mitochondrial transport complex TIM22 may be a novel element of the complexity of GPA1-mediated signaling pathways. | Ma (2001) Curr. Biol. 11, R869-R871.
To select the hub proteins of cores in non-adjacent modules, we have to map the 50 modules and find those that lie far from each other in the overall network topology. The most convenient way to do this is to treat the 50 modules as 50 elements of a hierarchically higher representation of the original protein-protein interaction network and to find their modules at this higher level (as shown in Example 1 for directed networks). When we repeated the modularization of the 50 primary modules, they could be sorted into 4 macro-modules. At this level there is a small and rather separate macro-module organized around primary module #25 and containing smaller parts of primary modules #17, #34, #27 and #32. This macro-module is responsible for the core of vesicle traffic in yeast. The remaining primary modules are highly overlapping but show two additional separate peaks of the community landscape. One of these macro-modules is best represented by primary modules #2 and #6, which are the cores of the yeast cytoskeletal apparatus. (Remarkably, the cyclin-dependent protein kinase cdc28 segregates best with this movement-related complex.) The last separate macro-module is organized around primary modules #14 and #16 and is centered on ribosome biogenesis. From the above list of macro-modules, the following hubs in non-adjacent module cores may be selected as the most separate targets for multi-target anti-fungal drugs:
Table 3
List of hubs present in non-adjacent module cores that may be selected as the most separate targets for multi-target antifungal drugs
Ranks of primary modules | Most central hubs of primary modules (# in parentheses in order of community landscape height) | Essential for yeast viability (++ lethal, + synthetic lethal/sick) | Deletion induces drug sensitivity
25 (17, 34, 27, 32) | YNL263C (17) | ++ | N.D.
 | YGL161C (17) | + | 44
 | YGL198W (17) | | 42
 | YNL093W (25) | | 46
 | YER136W (17) | ++ | N.D.
 | YOR089C (25) | + | 41
 | YMR197C (27) | ++ | N.D.
 | YBL050W (27) | ++ | N.D.
2, 6 | YBR160W (2) | | N.D.
 | YDR388W (6) | + | 43
 | YOR181W (6) | | N.D.
 | YDL029W (6) | | N.D.
 | YMR109W (6) | + | 42
14, 16 | YMR049C (14) | ++ | N.D.
 | YDL213C (14) | | N.D.
 | YNL110C (16) | | 43
 | YMR290C (16) | ++ | N.D.
residual 35 primary modules | YER012W (1) | ++ | N.D.
 | YDR394W (3) | ++ | N.D.
 | YKL145W (1) | | N.D.
 | YHR200W (1) | + | N.D.
 | YDL147W (1) | | N.D.
 | YMR047C (4) | | N.D.
 | YGR119C (4) | | N.D.
 | YMR308C (4) | ++ | N.D.
 | YGL048C (5) | | N.D.
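As an arithmetic check of the counts reported for Table 3, the sketch below transcribes its viability and drug-sensitivity columns (N.D. as None) and tallies them. We assume that the numbers in the last column are counts of the 82 drugs to which the deletion strain was sensitive; this is consistent with the 'approximately half of the drugs' statement but is our interpretation.

```python
# (ORF, viability effect: "++" lethal, "+" synthetic lethal/sick, "" none,
#  value of the drug-sensitivity column or None for N.D.), transcribed from Table 3
ROWS = [
    ("YNL263C", "++", None), ("YGL161C", "+", 44), ("YGL198W", "", 42),
    ("YNL093W", "", 46), ("YER136W", "++", None), ("YOR089C", "+", 41),
    ("YMR197C", "++", None), ("YBL050W", "++", None), ("YBR160W", "", None),
    ("YDR388W", "+", 43), ("YOR181W", "", None), ("YDL029W", "", None),
    ("YMR109W", "+", 42), ("YMR049C", "++", None), ("YDL213C", "", None),
    ("YNL110C", "", 43), ("YMR290C", "++", None), ("YER012W", "++", None),
    ("YDR394W", "++", None), ("YKL145W", "", None), ("YHR200W", "+", None),
    ("YDL147W", "", None), ("YMR047C", "", None), ("YGR119C", "", None),
    ("YMR308C", "++", None), ("YGL048C", "", None),
]
N_DRUGS = 82  # size of the Parsons et al. (2006) drug panel

essential = [orf for orf, via, _ in ROWS if via in ("++", "+")]
tested_plain = [d for _, via, d in ROWS if via == "" and d is not None]
tested_sick = [d for _, via, d in ROWS if via == "+" and d is not None]

print(len(essential), "of", len(ROWS), "hubs are lethal or synthetic lethal/sick")  # 14 of 26 ...
print("non-essential, tested:", tested_plain,
      "mean fraction of drugs:", sum(tested_plain) / len(tested_plain) / N_DRUGS)
print("synthetic lethal/sick, tested:", tested_sick,
      "mean fraction of drugs:", sum(tested_sick) / len(tested_sick) / N_DRUGS)
```

Both groups of tested hubs come out at slightly more than half of the 82 drugs under this reading.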
In Table 3 we list, for each selected hub, whether its deletion is lethal (++) or synthetic lethal/sick (+) in yeast deletion strains (Tong et al., 2004; Winzeler et al., 2000), as well as the effect of a large variety of 82 drugs on the growth rate of the corresponding yeast deletion strains (Parsons et al., 2006). 14 of the 26 hubs either kill the yeast cell or significantly hinder its growth if deleted alone (lethal) or in pair with another gene (synthetic lethal/sick). The 3 of the remaining 12 hubs that were tested in the Parsons et al. (2006) experiment induce drug sensitivity against approximately half of the drugs administered, a potency equal to that of the roughly equal number (4) of tested synthetic lethal/sick hubs.
Methodological considerations to satisfy condition 2: discrimination between multi-interface (party) and single-interface (date) hubs
As described in condition 2, to achieve an even higher multi-target efficiency it is very useful to select hubs that have their interactions at the same time (multi-interface, or party hubs), since these were shown to be much more essential for yeast survival than hubs that bind to their neighbors consecutively (single-interface, or date hubs) (Ekman et al., 2006; Han et al., 2004; Kim et al., 2006).
Party hubs are discriminated from date hubs using two measures. First, party hubs are expected generally to belong to fewer modules than date hubs. (In other words, using the general and well-applicable measure of the 'extended number of modules', n(M), as defined in the section of the description immediately preceding Example 1, party hubs will generally have a lower extended number of modules than date hubs.)
To refine the discrimination between party hubs and date hubs further, we used an additional measure defined in the same section, the 'intramodular bridge property'. Due to the extremely overlapping nature of the modules determined by the total distribution method used here, date hubs generally belong strongly to a few modules but weakly to multiple others, while party hubs generally belong to multiple modules.
To assess the general validity of this approach, we examined the combination of the two measures proposed to discriminate between date and party hubs. Date hubs (crosses) and party hubs (triangles) were assigned according to the procedure of Han et al. (2004). Figure 2 shows that party hubs belong to relatively few modules of the network, while among elements that belong to the same number of modules, party hubs are the most likely to be intra-modular bridges. Obviously, these are general tendencies that may not hold for each individual hub. More importantly, the large uncertainty and incompleteness of the protein-protein interaction data (Ekman et al., 2006) make the classification of several elements only approximate. Taking these considerations into account, we provide a simple and quick method to predict/select the party hubs of a protein network using only the topological information of the network.
Satisfying condition 2: Selection of multi-interface (party) hubs
We refined our selection of potential targets for hub-based multi-target drug design using the above two measures to discriminate between party hubs and date hubs. From the hubs identified under condition 1, we further selected hubs having at most half of the average extended number of modules (i.e., n(M) below 13.3 in this case). From these, we kept those that are important intra-modular bridges in at least (n(M)/2) - 1 modules. This selection identified 33.7% of all party hubs of the network, and 76.6% of the selected hubs were party hubs according to Ekman et al. (2006). Therefore, our analysis provides a more stringent criterion for the identification of intra-modular hubs. The above refinement selected the following hubs from our original list:
Table 4
List of multi-interface (party) hubs
ORFs of selected hubs | Common name of gene | Party or date hub | Known effect in drug sensitivity | References
YMR049C | ERB1 | party hub | Partial inhibition induces chromosome instability and protein synthesis errors. | Kilian et al. (2004) Oncogene 23, 8597-8602.
YDL213C | FYV14 | party hub | N.D. |
YNL110C | | party hub | N.D. |
YMR290C | HAS1 | party hub | Essential component of ribosome biogenesis; partial inhibition induces drug sensitivity. | Mnaimneh et al. (2004) Cell 118, 31-44.
YER012W | PRE1 | party hub | Essential component of the proteasome; partial inhibition induces drug sensitivity. | Haugen et al. (2004) Genome Biol. 5, R95.
YDR394W | RPT3 | party hub | Essential ATPase of the proteasome involved in cell cycle regulation. | Mayer and Fujita (2006) Biochem. Soc. Trans. 34, 746-748.
YKL145W | RPT1 | party hub | ATPase of the proteasome, essential in processing damaged proteins. | Chuang and Madura (2005) Genetics 171, 1477-1484.
YHR200W | RPN10 | party hub | Ubiquitin receptor of the proteasome, essential for the recognition of damaged proteins and for transcription regulation. | Kiss et al. (2005) Biochem. J. 391, 301-310.
YDL147W | RPN5 | party hub | Essential for proteasome assembly; partial inhibition leads to mitotic abnormalities. | Yen et al. (2003) J. Biol. Chem. 278, 30669-30676.
YGL048C | RPT6 | party hub | Essential ATPase of the proteasome involved in transcription regulation and development of the histone code. | Ezhkova and Tansey (2004) Mol. Cell 13, 435-442; Ferdous et al. (2002) Biochemistry 41, 12798-12805.
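The two-step filter that produced Table 4 can be sketched as follows. The sketch assumes that, for each candidate hub from condition 1, n(M) and the number of modules in which the hub is an important intra-modular bridge have already been computed; how 'important' is decided is defined elsewhere in the specification and is not modelled here. The function name `select_party_hubs` is hypothetical.

```python
def select_party_hubs(candidates, average_n_modules):
    """Keep candidate hubs that look like multi-interface (party) hubs.

    candidates: dict hub -> (n_m, n_bridge), where n_m is the hub's extended number
                of modules n(M) and n_bridge the number of modules in which it is an
                important intra-modular bridge (both assumed precomputed)
    average_n_modules: average n(M) over the network
    """
    selected = []
    for hub, (n_m, n_bridge) in candidates.items():
        few_modules = n_m < average_n_modules / 2  # step 1: at most half the average n(M)
        bridging = n_bridge >= n_m / 2 - 1  # step 2: bridge in at least n(M)/2 - 1 modules
        if few_modules and bridging:
            selected.append(hub)
    return selected
```

With an average n(M) of 26.6, the first step reproduces the threshold of 13.3 quoted above.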
The remaining targets are all party hubs in the Ekman et al. (2006) definition and display a variety of essential functions in yeast. Literature data indicate that even their partial inhibition leads to major disturbances in yeast transcription, protein synthesis, protein degradation and the cell cycle, and induces drug sensitivity. Parallel disturbance of transcription, protein synthesis and protein degradation at their key points is expected to act synergistically, since these processes are activated consecutively in yeast. Therefore, our analysis suggests that parallel but partial inhibition of (ERB1 OR HAS1) AND (PRE1 OR RPT1 OR RPN10 OR RPT6) may be a successful strategy for developing multi-target anti-fungal drugs. Such target combinations are expected to lead to multi-target drug candidates that have a significant advantage against the development of drug resistance.
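Expanding the Boolean target expression above shows how many minimal two-target combinations it covers: each combination takes one hub from each group. This is a plain enumeration with Python's `itertools.product`; the group variable names are ours.

```python
from itertools import product

# Groups taken from the expression (ERB1 OR HAS1) AND (PRE1 OR RPT1 OR RPN10 OR RPT6)
synthesis_group = ["ERB1", "HAS1"]  # ribosome biogenesis / protein synthesis hubs (Table 4)
proteasome_group = ["PRE1", "RPT1", "RPN10", "RPT6"]  # proteasome hubs (Table 4)

# Every minimal target set satisfying the AND of the two ORs is one hub from each group
minimal_target_pairs = list(product(synthesis_group, proteasome_group))
for pair in minimal_target_pairs:
    print(" + ".join(pair))
```

The expression therefore covers 2 × 4 = 8 candidate two-target combinations.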
Example 3
Determination of VIP-elements of the yeast protein-protein interaction network
The discrimination of date and party hubs (Ekman et al., 2006; Han et al., 2004; Kim et al., 2006) already gave an initial example of identifying proteins with a significant level of centrality (in this representation: proteins with a large number of neighbors, i.e. hubs) that have changing, dynamic contacts. As we have seen in Example 2, date hubs are not superior targets for disrupting key cellular functions. However, proteins with a high level of centrality and dynamism are of paramount importance in regulating cellular functions. The 'critical nodes' of signaling networks identified by Taniguchi et al. (2006) are good examples of such dynamically central proteins. These proteins are primary targets of signaling therapies in the development of, e.g., anti-cancer drugs.
The VIP-elements (proteins) of protein-protein interaction networks are a much better representation of proteins with a high level of centrality and dynamism. Here centrality is analyzed not at the level of local topology, as in the case of hubs (which are central only for their neighbors), but from the point of view of the entire network topology. As defined herein, VIP-proteins are inter-modular elements that significantly belong to more than two modules. According to this definition, VIP-proteins are not bridges between two modules, but share their contacts among at least 3 modules of the protein-protein interaction network. Since these modules represent different functions, as we have seen in Example 2, VIP-proteins are highly flexible interfaces, mediators and regulators of a variety of cellular functions. This mediator and regulatory role will be especially important when the complex system (the yeast cell) experiences stress, since VIP-proteins have a unique potential to re-direct and re-organize inter-modular contacts. The re-organization of inter-modular contacts is the most efficient way to change the cellular response repertoire, which becomes especially important when the cell experiences stress that substantially limits its resources. The stress in the above module-reorganization scenario can be generalized as any outside stimulus that is either novel or too sudden, and for which the complex system (the yeast cell) does not have a pre-made (learned) adaptive response. Current analytical tools are either unable to identify modular overlaps, or need to delete weak links (low-affinity interactions) to do so (Palla et al., 2005). The high-resolution analytical method provided by the present invention is the first method able to identify overlapping modules in sufficient detail to assess the position of VIP-elements (proteins).
In the current example we used the same high-confidence yeast protein-protein interaction network already introduced in Example 2 (Ekman et al., 2006). This network contains 6379 protein-protein interactions among 2633 yeast proteins in total, of which 2445 proteins form a connected giant component, while the remaining 188 proteins are disconnected from this giant component.
First, we screened the protein-protein interaction network for VIP-proteins using the criterion that they do not belong to any module by more than 50% (i.e., they lie in modular overlaps). Most of the proteins in the giant component fell into this category. We continued the search for VIP-proteins using the more stringent criterion that they could not belong to any two modules by more than 70%. This still resulted in too large a sample in this particular example, where protein-protein connections are overlapping and dense. Therefore, we further discriminated VIP-proteins by means of the n(h') measure. As defined in the section of the specification immediately preceding Example 1, the measure H_KL characterizes the absolute bridge function of the respective protein between modules K and L. In contrast, the measure h'_KL characterizes the relative bridge function of the same protein between modules K and L, i.e. the fraction of the total bridge function between modules K and L exerted by the given protein. The measure n(h') gives the extended number of module pairs for which the given protein plays a relatively important bridge function between the two modules of the pair. If n(h') is large, the respective protein is an important (and relatively equally important) bridge between a large number of module pairs.
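The two screening steps and the relative bridge measure can be sketched as follows. Assumptions: each protein's module memberships are given as fractions of its total assignment, the 70% criterion is read as the sum of the two largest memberships, and the absolute bridge values H_KL are precomputed. The extended-number formula behind n(h') is defined earlier in the specification and is not reproduced here. Function names are hypothetical.

```python
def passes_overlap_screens(memberships, single_max=0.5, pair_max=0.7):
    """Screens for candidate VIP-proteins.

    memberships: dict module -> fraction of the protein's module assignment.
    Screen 1: no single module above single_max (the protein lies in an overlap).
    Screen 2: no two modules together above pair_max (our reading of the 70% rule).
    """
    top = sorted(memberships.values(), reverse=True)
    if top and top[0] > single_max:
        return False
    if len(top) >= 2 and top[0] + top[1] > pair_max:
        return False
    return True


def relative_bridge(h_pair):
    """h'_KL: each protein's fraction of the total bridge function H_KL between K and L.

    h_pair: dict protein -> absolute bridge value H_KL for one module pair (K, L).
    """
    total = sum(h_pair.values())
    return {p: (h / total if total else 0.0) for p, h in h_pair.items()}
```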
In the following table we list the first 24 proteins (top 1%) having the largest n(h') values in the yeast protein-protein interaction network of Ekman et al. (2006). We also note whether any of these proteins was a core date hub (c-dh), a core party hub (c-ph), a peripheral date hub (p-dh) or a peripheral party hub (p-ph) identified in the analysis of Example 2. To assess the long-term dynamism of these proteins, we list their evolutionary rate using the data of Hirsh et al. (2005). Finally, to assess the involvement of these proteins in the response to an unusual stimulus, such as the effect of xenobiotics, we list whether the deletion of these proteins induced drug hypersensitivity in yeast in the experiment of Parsons et al. (2006) using a set of 82 drugs.
Table 5
List of the VIP-proteins having the largest $n(h')$ values (top 1%) in the yeast protein-protein interaction network
* Extrapolated numbers, taking into account the drug interactions that were not determined
The data of Table 5 show that most of the top 1% VIP-proteins are actually not hubs (i.e. they have fewer than 8 neighbors). This agrees with our expectations, since the Ekman et al. (2006) dataset does not contain the majority of weak links (low-affinity protein-protein interactions), which are the most relevant type of interaction for VIP-proteins. The 4 hubs that do appear in the list are all peripheral hubs, i.e. they lie outside the cores of all 50 primary modules, as expected for an inter-modular bridge between multiple modules. All VIP-hubs are date hubs, which again agrees with our expectation of an increased dynamism of VIP-proteins. Interestingly, one of the proteins, YDR529C, is classified as a party hub under the definition of Kim et al. (2006). This apparent discrepancy makes sense if we take into account that Kim et al. (2006) defined party hubs as proteins with multiple binding sites that are used simultaneously, which is exactly the property we would expect from a VIP-protein. The evolution rate clearly splits the group into two subgroups: 7 VIP-proteins have a lower evolution rate than the median, while another 7 VIP-proteins have a substantially higher evolution rate than the median of the 3036 yeast genes tested by Hirsh et al. (2005). In all cases tested, deletion of the VIP-proteins induces sensitivity to a large variety of drugs and xenobiotics. This suggests their involvement in developing a cellular response to a novel, previously unknown stimulus. We have to note, however, that currently available protein-protein interaction datasets are unfortunately depleted in low-affinity interactions. This is due to the current limitations of experimental techniques, which produce a large number of false-positive results in the low-affinity range.
Example 4 Analysis of Western States Power Grid, USA
The analysis of electricity networks is a practically very useful industrial application of network examination. Many studies have addressed the defense of complex networks against failures and disasters, both in general (Barabasi & Albert, 1999; Albert et al., 2000) and in the special case of electric networks (Kinney et al., 2005). Revealing the weak points of the network structure provides design guidelines and helps prevent extremely expensive troubleshooting and repair work. Accordingly, many excellent scientific studies have analyzed electricity networks.
In this example, the analyzed network is an undirected, unweighted representation of the topology of the Western States Power Grid of the United States, compiled by Duncan Watts and Steven Strogatz (Watts & Strogatz, 1998). The data were obtained from the web site of Prof. Duncan Watts at Columbia University.
In general, several centrality measures are of vital practical importance for the defense and stability of networks. These measures describe how central a given link or element is with respect to the whole network. One of the milestones of network studies was the discovery that, in many cases, the degree distribution of natural networks is scale-free (Barabasi & Albert, 1999); the degree of an element is the number of its links. Consequently, such networks contain elements, called 'hubs', whose degree is orders of magnitude larger than the average degree of the elements. More specifically, in a scale-free distribution $P(k) \propto k^{-\gamma}$ the probability of finding an element whose degree is one order of magnitude larger drops only by a constant factor ($10^{\gamma}$) rather than exponentially. A sufficiently large system therefore inevitably contains elements of very large degree. Accordingly, the degree of an element is usually regarded as a measure of its centrality. The corresponding centrality of a link is defined as the product deg(A) · deg(B), where A and B are the two end points of the link.
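The degree-based link centrality described above can be sketched in a few lines. The sketch assumes a simple undirected network given as a list of (A, B) pairs without duplicate links. The function name is illustrative.

```python
from collections import Counter


def degree_product_centrality(edges):
    """Degree-based centrality of each link (A, B): deg(A) * deg(B).

    edges: list of (A, B) pairs of a simple undirected network.
    """
    degree = Counter()
    for a, b in edges:  # the degree of an element is the number of its links
        degree[a] += 1
        degree[b] += 1
    return {(a, b): degree[a] * degree[b] for a, b in edges}


edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
print(degree_product_centrality(edges))
# {(0, 1): 3, (0, 2): 3, (0, 3): 6, (3, 4): 2}
```

Only each link's immediate end points are inspected, which is why this measure captures purely local information.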
Along with degree-based centrality measures, betweenness centrality, here assigned to the links of the network, has long been used in the analysis of complex systems. It was first developed in sociology for the study of social networks (Anthonisse, 1971; Freeman, 1977). The betweenness of a link is the number of shortest paths between pairs of elements of the network that pass through the given link. When a pair is connected by several shortest paths, each path is usually counted with a weight of one over the number of such paths. Betweenness is a preferred centrality measure in network clustering (Newman, 2004) and in studies of network stability (Holme et al., 2002).
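For an unweighted network, link betweenness can be computed with Brandes' accumulation technique, which runs one breadth-first search per element. The sketch below only illustrates the measure and is not the implementation used in this example. It assumes a simple undirected network, which may be disconnected. Each unordered pair of elements is counted once, and the pair's contribution is split evenly over its shortest paths. The function name is hypothetical.

```python
from collections import defaultdict, deque


def edge_betweenness(edges):
    """Link betweenness of a simple, unweighted, undirected network.

    Each unordered pair of elements contributes a total of 1, split evenly
    over all shortest paths between them. Returns {frozenset({a, b}): value}.
    """
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    betweenness = {frozenset(e): 0.0 for e in edges}
    for source in list(adj):
        # Breadth-first search: distances, shortest-path counts, predecessors
        dist = {source: 0}
        sigma = defaultdict(int)
        sigma[source] = 1
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest elements back to the source
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                betweenness[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair was counted once from each of its two end points
    return {e: value / 2 for e, value in betweenness.items()}
```

On a path a-b-c, for example, each link lies on the shortest paths of two pairs and gets the value 2.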
The community landscape height value can also be regarded as a kind of centrality value. It is important to recognize that degree-based centrality measures use only the local properties of an element. Betweenness centrality, in contrast, relies on the shortest paths of the network and therefore considers the whole network. Both values can be used for industrial purposes, but neither approach is perfect, because nearby regions (belonging to the same or a close module) usually influence the centrality of an element more than regions in distant modules. Degree-based measures ignore distant regions altogether, while betweenness counts paths to distant regions on the same footing as paths within the element's own neighborhood. Our community heaps allow us to quantify the importance of an element relative to the other elements of the network: distant, less important regions are taken into account, but with a smaller weight. The community landscape value, derived from the community heap values, thus overcomes the above problem and assigns a more applicable centrality value to the elements and links of the network.
Determination of community heap values using the NodeLand method for a given starting element
The NodeLand heap method assigns a community heap value to all links (CHLV) and elements (CHEV) of an undirected network, where the community heap value of a given element or link can only be zero or one. The idea behind this choice is that the community heap value of a given element or link indicates whether that element or link 'belongs' to the community heap of the given starting element.
(a) The NodeLand method starts from a specified element of the network, called the 'starting element'. In the beginning, each CHEV and CHLV is set to zero, except the CHEV of the starting element which is initialized to one.
(b) In this step, a neighboring value (NV) is calculated for every element (NEV) and link (NLV) of the network. The NLV of a given link is set to one if exactly one of its end point elements belongs to the community heap; otherwise the NLV of the link is set to zero. The NEV of a given element is set to zero if the element belongs to the community heap; otherwise it is calculated as $\mathrm{NEV} = (S+N)/(E+1)$. Here S is the sum of the strengths of the links belonging to the community heap, N is the sum of the strengths of the links that are connected to the given element and have an NLV of one, and E is the number of elements belonging to the community heap.
A threshold value is defined as S/E, where S and E have the same meaning as above. The elements with the maximal NEV are selected. If their NEV is equal to or higher than this threshold value, the CHEV of the selected elements is set to one. The CHLV of the links that are connected to the selected elements and have a non-zero NLV is also set to one.
At the end of step (b), the CHLV of every link whose two end point elements both have a CHEV of one, but whose CHLV is still zero, is set to one.
(c) Step (b) is repeated as long as the maximal NEV calculated in the current step (b) is non-zero and not lower than the threshold value of that step. The NodeLand method finishes for the given starting element once the maximal NEV is zero or lower than the threshold. At that point, the current CHEV of each element and the current CHLV of each link are the final community heap values of that element or link with respect to the community heap of the given starting element.
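Because E ≥ 1 throughout, the threshold test of step (b) can be rearranged to show what it demands of a candidate element:

```latex
\frac{S+N}{E+1} \ge \frac{S}{E}
\iff E\,(S+N) \ge S\,(E+1)
\iff E\,N \ge S
\iff N \ge \frac{S}{E}
```

A candidate therefore joins the heap only if the total strength N of its links into the heap is at least the current average link strength per heap element, S/E. As a result, S/E never decreases as the heap grows. In the first iteration S = 0, so the threshold is zero and the neighbors most strongly linked to the starting element are always added. Once S > 0, an element with no link into the heap (N = 0) can never pass the test.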
We applied the NodeLand method, a particularly useful embodiment of the present invention, to the power grid. A community heap was assigned to every element of the network, yielding for each link a set of binary community heap values, one per starting element. The community landscape value of a link is calculated by multiplying the initial strength of the link by the number of community heaps the link belongs to. As the input network used in this example was unweighted, the initial strengths of all links were set to one, so the multiplication can be omitted. The resulting community landscape values serve as our centrality measure.
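The NodeLand steps (a) to (c) and the landscape aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation. The network is undirected, has no self-loops, and is given as a dictionary that maps frozenset({u, v}) link keys to positive strengths. Exact fractions are used so that ties in NEV are detected reliably. Following claim 5, NEV is evaluated only for elements that touch a link with NLV = 1. The remaining elements would get NEV = S/(E+1), which is zero or below the threshold S/E, so they can never be selected. All function names are hypothetical.

```python
from fractions import Fraction


def nodeland_heap(strengths, start):
    """Binary community heap of `start` (steps (a)-(c) of the NodeLand method).

    strengths: {frozenset({u, v}): strength > 0} for an undirected network
    without self-loops. Returns (chev, chlv) mapping elements / links to 0 or 1.
    """
    adj = {}
    for link in strengths:
        u, v = tuple(link)
        adj.setdefault(u, []).append((v, link))
        adj.setdefault(v, []).append((u, link))
    chev = {n: 0 for n in adj}
    chlv = {link: 0 for link in strengths}
    chev[start] = 1                                    # step (a)

    while True:                                        # steps (b) and (c)
        S = sum((Fraction(w) for link, w in strengths.items() if chlv[link]), Fraction(0))
        E = sum(chev.values())
        threshold = S / E
        # NEV = (S + N) / (E + 1) for outside elements touching a link with NLV = 1
        nev = {}
        for n, neighbours in adj.items():
            if chev[n]:
                continue
            N = sum((Fraction(strengths[link]) for m, link in neighbours if chev[m]), Fraction(0))
            if N > 0:
                nev[n] = (S + N) / (E + 1)
        if not nev:
            break
        best = max(nev.values())
        if best == 0 or best < threshold:              # stopping rule of step (c)
            break
        selected = [n for n, value in nev.items() if value == best]
        for n in selected:                             # links with NLV = 1 join the heap
            for m, link in adj[n]:
                if chev[m]:
                    chlv[link] = 1
        for n in selected:
            chev[n] = 1
        for link in strengths:                         # end of step (b): close inner links
            if all(chev[x] for x in link):
                chlv[link] = 1
    return chev, chlv


def community_landscape(strengths):
    """Landscape value of each link: initial strength x number of heaps containing it."""
    elements = {x for link in strengths for x in link}
    counts = {link: 0 for link in strengths}
    for start in elements:                             # one heap per starting element
        _, chlv = nodeland_heap(strengths, start)
        for link, value in chlv.items():
            counts[link] += value
    return {link: strengths[link] * counts[link] for link in strengths}


def clique(first):
    """Hypothetical test helper: links of a 4-clique on first..first+3."""
    return [(first + i, first + j) for i in range(4) for j in range(i + 1, 4)]


network = {frozenset(e): 1 for e in clique(0) + clique(4) + [(3, 4)]}
landscape = community_landscape(network)
print(landscape[frozenset((3, 4))], landscape[frozenset((0, 1))])  # 2 4
```

Consider two four-element cliques joined by a single bridge. The heap of an element inside a clique stays within that clique, so the bridge belongs only to the heaps of its own two end points (landscape value 2). Each clique link belongs to four heaps (landscape value 4).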
Next, we determined two other widely used centrality values for the links: betweenness centrality and degree-based centrality. For each of these three measures, the links of the network were sorted separately in descending order, and three tests were run in parallel on separate copies of the network. Each centrality measure thus had its own copy of the power grid described above. In every step, the link with the largest centrality value was removed from the corresponding network, so the number of links in each network decreased by one per iteration. To measure the transmissivity of the reduced networks, their so-called inverse geodesic length was calculated (Holme et al., 2002). The results are summarized in Figure 3.
The smaller the inverse geodesic length, the harder it is for elements to reach each other, i.e. the larger the average number of steps between two elements. As Figure 3 shows, removing links in descending order of community landscape height makes the network collapse earlier than removing them according to the other two centrality measures. In this sense, the community landscape height value defined herein characterizes the importance of the links of the network better than the formerly used centrality values, and it helps to select the power lines that need special protection.
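The robustness test above can be sketched using the inverse geodesic length of Holme et al. (2002). This quantity is the average of 1/d(i, j) over all ordered pairs of distinct elements, where unreachable pairs contribute zero. The sketch makes two assumptions. First, the removal order is fixed in advance from the initial centrality ranking, as the sorting step above suggests. Second, all distances are recomputed by breadth-first search after every removal. That approach is simple but slow for a grid of several thousand elements. Function names are hypothetical.

```python
from collections import deque


def inverse_geodesic_length(nodes, edges):
    """Average of 1/d(i, j) over ordered pairs i != j; unreachable pairs add 0."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for s in adj:
        # Breadth-first search gives hop distances from s in an unweighted network
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


def attack_curve(nodes, edges, ranking):
    """Remove links in the given precomputed order (highest centrality first)
    and record the inverse geodesic length before and after each removal."""
    remaining = list(edges)
    curve = [inverse_geodesic_length(nodes, remaining)]
    for link in ranking:
        remaining.remove(link)
        curve.append(inverse_geodesic_length(nodes, remaining))
    return curve
```

For a three-element path a-b-c the inverse geodesic length is 5/6. It drops to 1/3 after removing link (a, b) and to 0 once both links are gone. Comparing such curves for the three rankings corresponds to the comparison summarized in Figure 3.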
As clearly demonstrated by the above description and examples, the present inventors have elaborated high-resolution network analyzing methods based on an entirely novel concept. These methods enable complex networks to be modularized with unprecedented resolution. They thereby identify and analyze the structure of previously hidden module overlaps, and identify previously hidden key elements of complex networks that play important roles in the interactions among apparently very distant parts of the analyzed networks.
List of references
1. Adjei, A.A. and Hidalgo, M. (2005) Intracellular signal transduction pathway proteins as targets for cancer therapy. J. Clin. Oncol. 23, 5386-5403.
2. Agnati, L.F., Santarossa, L., Genedani, S., Canela, E.I., Leo, G., Franco, R., Woods, A., Lluis, C., Ferre, S. and Fuxe, K. (2004) On the nested hierarchical organization of CNS: basic characteristics of neuronal molecular networks. In: Lecture Notes in Computer Science (eds.: P. Erdi, A. Esposito, M. Marinaro and S. Scarpetta), Springer Verlag, pp. 24-54.
3. Albert, R., Jeong, H. and Barabasi, A.L. (2000) Error and attack tolerance of complex networks. Nature 406, 378-382.
4. Allesina, S. and Bodini, A. (2004) Who dominates whom in the ecosystem? Energy flow bottlenecks and cascading extinctions. J. Theor. Biol. 230, 351-358.
5. Anthonisse, J.M. (1971) The rush in a directed graph. Technical Report BN9/71, Stichting Mathematisch Centrum, Amsterdam.
6. Balazsi, G. and Oltvai, Z.N. (2005) Sensing your surroundings: how transcription-regulatory networks of the cell discern environmental signals. Sci. STKE, pe20.
7. Batagelj, V. and Mrvar, A. (1998) Pajek - Program for Large Network Analysis. Connections 21, 47-57.
8. Barabasi, A.L. and Albert, R. (1999) Emergence of scaling in random networks. Science 286, 509-512.
9. Barabasi, A.L. and Oltvai, Z.N. (2004) Network biology: understanding the cell's functional organization. Nat. Rev. Genet. 5, 101-114.
10. Borisy, A.A., Elliott, P.J., Hurst, N.W., Lee, M.S., Lehar, J., Price, E.R., Serbedzija, G., Zimmermann, G.R., Foley, M.A., Stockwell, B.R. and Keith, C.T. (2003) Systematic discovery of multicomponent therapeutics. Proc. Natl. Acad. Sci. U. S. A. 100, 7977-7982.
11. Breitkreutz, B.J., Stark, C. and Tyers, M. (2003) Osprey: a network visualization system. Genome Biol. 4, R22.
12. Burt, R. (1995) Structural holes: the social structure of competition. Harvard University Press, Cambridge MA, USA.
13. Burt, R. (2005) Brokerage and closure: an introduction to social capital. Oxford University Press, Oxford UK.
14. Chen, Y.L., Law, P.Y. and Loh, H.H. (2005) Inhibition of PI3K/Akt signaling: an emerging paradigm for targeted cancer therapy. Curr. Med. Chem. Anticancer Agents 5, 575-589.
15. Csermely, P. (2005) A rejtett hálózatok ereje. Hogyan stabilizálják a világot a gyenge kapcsolatok? [The power of hidden networks: how do weak links stabilize the world?] (Hung.) Vince Publishers, 376 pp. (www.weaklink.sote.hu/halozat.html)
16. Csermely, P. (2006) Weak links: Stabilizers of Complex Systems from Proteins to Social Networks, Springer Verlag, 396 pp. (www.weaklink.sote.hu/weakbook.html)
17. Csermely, P., Agoston, V. and Pongor, S. (2005) The efficiency of multi-target drugs: the network approach might help drug design. Trends Pharmacol. Sci. 26, 178-182.
18. Duch, J. and Arenas, A. (2005) Community detection in complex networks using extremal optimization. Phys. Rev. E. 72, 027104
19. Ekman, D., Light, S., Bjorklund, A.K. and Elofsson, A. (2006) What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae? Genome Biol. 7, R45.
20. Freeman, L.C. (1977) A set of measures of centrality based on betweenness. Sociometry 40, 35-41.
21. Garlaschelli, D., Caldarelli, G. and Pietronero, L. (2003) Universal scaling relations in food webs. Nature 423, 165-168.
22. Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dumpelfeld, B., Edelmann, A., Heurtier, M.A., Hoffman, V., Hoefert, C., Klein, K., Hudak, M., Michon, A.M., Schelder, M., Schirle, M., Remor, M., Rudi, T., Hooper, S., Bauer, A., Bouwmeester, T., Casari, G., Drewes, G., Neubauer, G., Rick, J.M., Kuster, B., Bork, P., Russell, R.B. and Superti-Furga, G. (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631-636.
23. Gfeller, D., Chappelier, J.-C. and De Los Rios, P. (2005) Finding instabilities in the community structure of complex networks. cond-mat/0503593.
24. Girvan, M. and Newman, M.E.J. (2002) Community structure in social and biological networks. Proc. Natl. Acad. Sci. U. S. A. 99, 7821-7826.
25. Goh, K.-I., Salvi, G., Kahng, B. and Kim, D. (2005) Skeleton and fractal scaling in complex networks. cond-mat/0508332.
26. Granovetter, M. (1973) The strength of weak ties. Am. J. Sociology 78, 1360-1380.
27. Guimera, R. and Amaral, L.A.N. (2005) Functional cartography of complex metabolic networks. Nature 433, 895-900.
28. Han, J.-D.J., Bertin, N., Hao, T., Goldberg, D.S., Berriz, G.F., Zhang, L.V., Dupuy, D., Walhout, A.J.M., Cusick, M.E., Roth, F.P. and Vidal, M. (2004) Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88-93.
29. Hirsh, A.E., Fraser, H.B. and Wall, D.P. (2005) Adjusting for selection on synonymous sites in estimates of evolutionary distance. Mol. Biol. Evol. 22, 174-177.
30. Holme, P., Kim, B.J., Yoon, C.N. and Han, S.K. (2002) Attack vulnerability of complex networks. Phys. Rev. E 65, 056109.
31. Hornberg, J.J., Bruggeman, F.J., Westerhoff, H.V. and Lankelma, J. (2006) Cancer: a Systems Biology disease. Biosystems 83, 81-90.
32. Huang, S. (2002) Rational drug discovery: what can we learn from regulatory networks? Drug Discov. Today 7, S163-S169.
33. Kim, P.M., Lu, L.J., Xia, Y. and Gerstein, M.B. (2006) Relating three-dimensional structures to protein networks provides evolutionary insights. Science 314, 1938-1941.
34. Kinney, R., Crucitti, P., Albert, R. and Latora, V. (2005) Modeling Cascading Failures in the North American Power Grid. American Physical Society, APS March Meeting, March 21-25, 2005, abstract #A24.010.
35. Kohn, K.W. (1999) Molecular Interaction Map of the Mammalian Cell Cycle Control and DNA Repair Systems. Mol. Biol. Cell 10, 2703-2734.
36. Masuda, N. and Konno, N. (2005) VIP-club phenomenon: inevitable emergence of elites and masterminds in social networks. cond-mat/0501129. Social Networks, in press.
37. Moreno, Y., Gomez, J.B. and Pacheco, A.F. (2002) Instability of scale-free networks under node-breaking avalanches. Europhys. Lett. 58, 630.
38. Newman, M.E.J. (2004) Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133.
39. Newman, M.E.J. and Girvan, M. (2004) Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113.
40. Palla, G., Derenyi, I., Farkas, I. and Vicsek, T. (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814-818.
41. Papin, J.A., Hunter, T., Palsson, B.O. and Subramaniam, S. (2005) Reconstruction of cellular signaling networks and analysis of their properties. Nat. Rev. Mol. Cell Biol. 6, 99-111.
42. Papin, J.A., Reed, J.L. and Palsson, B.O. (2004) Hierarchical thinking in network biology: the unbiased modularization of biochemical networks. Trends Biochem. Sci. 29, 641-647.
43. Parsons, A.B., Lopez, A., Givoni, I.E., Williams, D.E., Gray, C.A., Porter, J., Chua, G., Sopko, R., Brost, R.L., Ho, C.H., Wang, J., Ketela, T., Brenner, C., Brill, J.A., Fernandez, G.E., Lorenz, T.C., Payne, G.S., Ishihara, S., Ohya, Y., Andrews, B., Hughes, T.R., Frey, B.J., Graham, T.R., Andersen, R.J. and Boone, C. (2006) Exploring the mode-of-action of bioactive compounds by chemical-genetic profiling in yeast. Cell 126, 611-625.
44. Poyatos, J.F. and Hurst, L.D. (2004) How biologically relevant are interaction-based modules in protein networks? Genome Biol. 5, R93.
45. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V. and Parisi, D. (2004) Defining and identifying communities in networks. Proc. Natl. Acad. Sci. U. S. A. 101, 2658-2663.
46. Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N. and Barabasi, A.L. (2002) Hierarchical organization of modularity in metabolic networks. Science 297, 1551-1555.
47. Reichardt, J. and Bornholdt, S. (2004) Detecting fuzzy community structures in complex networks with a Potts model. Phys. Rev. Lett. 93, 218701.
48. Rogers, E.M. (2003) Diffusion of innovations. Free Press, New York NY, USA.
49. Rosen, E. (2000) The anatomy of buzz. Doubleday, New York NY, USA.
50. Song, C., Havlin, S. and Makse, H.A. (2005) Self-similarity of complex networks. Nature 433, 392-395.
51. Spirin, V. and Mirny, L.A. (2003) Protein complexes and functional modules in molecular networks. Proc. Natl. Acad. Sci. U. S. A. 100, 12123-12128.
52. Taniguchi, C.M., Emanuelli, B. and Kahn, C.R. (2006) Critical nodes in signaling pathways: insights into insulin action. Nat. Rev. Mol. Cell Biol. 7, 85-96.
53. Tong, A.H., et al. (2004) Global mapping of the yeast genetic interaction network. Science 303, 808-813.
54. Valente, A.X.C.N. and Cusick, M.E. (2006) Yeast protein interactome topology provides framework for coordinated-functionality. Nucleic Acids Res. 22, 2812-2819.
55. Wasserman, S. and Faust, K. (1994) Social network analysis. Cambridge University Press. Cambridge, UK.
56. Watts, D.J. and Strogatz, S.H. (1998) Collective dynamics of 'small-world' networks. Nature 393, 440-442.
57. Winzeler, E.A., et al. (1999) Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901-906.
58. Wilkinson, D.M. and Huberman, BA. (2004) A method for finding communities of related genes. Proc. Natl. Acad. Sci. U. S. A. 101, 5241-5248.
59. Yip, A.M. and Horvath, S. (2005) The generalized topology overlap matrix for detecting modules in gene networks. http://www.genetics.ucla.edu/labs/horvath/GTOM
60. Yu, H. and Gerstein, M. (2006) Genomic analysis of the hierarchical structure of regulatory networks. Proc. Natl. Acad. Sci. U. S. A. 103, 14724-14731.

Claims

What is claimed is:
1. A method for analyzing the fine structure of a network comprising a plurality of elements of identical nature and links between said elements wherein each link being a directed or undirected connection between two elements and having a strength representing the intensity of the connection between said connected elements, the strengths of each of said links being commensurable with each other and each of said elements being connected to at least one other element; said elements representing predetermined physical, chemical or biological entities; said method comprising the following steps:
(i) taking each element or link of said network, one by one, as a starting point, calculating a 'community heap element value' (CHEV) for each element and/or a 'community heap link value' (CHLV) for each link, in a step-by-step process, characterizing the extent to which each element and/or each link belongs to the community heap of said starting element or starting link, by gradually exploring said network starting from said starting element or starting link; said step-by-step process comprising the following steps:
(a) setting the starting CHEV and/or CHLV to zero for each element and link, except for said starting element or link, the starting CHEV or CHLV for which being set to a predetermined positive value differing from zero;
(b) calculating, in a predetermined manner, a 'neighboring value' (NV) for each element (NEV) and/or each link (NLV) having a zero CHEV and/or CHLV, characterizing the intensity of the cumulated connectedness of said element and/or link to the actual community heap of said starting element and/or link; adding the element(s) and/or link(s) having the highest NV to the actual community heap of said starting element or link; and calculating, in a predetermined manner, an actual CHEV and/or CHLV for all elements and links belonging to the actual community heap of said starting element or starting link;
(c) repeating step (b) until the maximum NV defined in actual step (b) becomes zero or becomes lower than a threshold value calculated in a predetermined manner and considering the actual CHLV and CHEV of each link and element, respectively, as the final CHLV and CHEV characterizing the extent to which each link and element belongs to said community heap of said starting element;
(ii) repeating steps (a)-(c) for each element taken as starting element, rendering thereby a set of final CHLV and CHEV to each link and element, respectively, said set representing the extent to which said link or element belongs to the community heaps of all elements of the network;
(iii) representing said set of CHLVs or CHEVs of each link or element, respectively, by a corresponding number calculated according to predetermined integration rules and considering said number as the 'community landscape height value' of said link or element characterizing the centrality of said link or element in said network from the cumulated viewpoint of all links or elements of said network;
(iv) analyzing the fine structure of said network and the special roles certain elements play in said network on the basis of combined consideration of the topology, strengths and directions of the links connecting certain elements or groups of elements of interest together with the community landscape height values of said links and/or elements.
2. A method according to claim 1, wherein all said links being directed links and wherein said CHEV and CHLV for each element and link of said network is defined in one single step-by-step process, wherein both said actual CHEV and CHLV values being calculated in step (b) as follows: defining a 'neighboring link value' (NLV) for each link, said value being set to zero for all links the actual CHLV of which differs from zero or the actual CHEV of the starting point element of which is zero, wherein said NLV for each link being set to be higher when the strength and/or the actual CHEV of the starting point element of said link is higher; then defining a 'neighboring element value' (NEV) for each element the actual CHEV of which being zero and being the end point element of one of said links having a NLV differing from zero, wherein said NEV of said element being set to be higher when the cumulated NLV of all links ending at said element is higher while said NEV of all other elements being set to zero; and selecting one or more elements having zero as the actual CHEV and having the highest NEV; selecting all links starting from elements having greater than zero actual CHEV and ending at said selected elements and selecting all links, starting from said selected elements and ending at elements having greater than zero actual CHEV, and selecting all links starting from any of said selected elements and ending at any of said selected elements, and increasing the actual CHEV of each said selected element, wherein the extent of said increasing being higher when said NEV is higher, and increasing the actual CHLV of each said selected link, wherein the extent of said increasing being higher when said NLV of said link is higher.
3. A method according to claim 1 or claim 2, wherein said CHEV of each starting element being set to one, said starting CHLV of each link being set to zero, and when a zero CHEV or CHLV is increased in any step, said increased actual CHEV or CHLV is set to remain zero until said actual CHEV or CHLV reaches a threshold value, calculated in a predetermined manner, and becomes one when exceeding said threshold value, and when a CHEV or CHLV being one is increased in any step, said increased CHEV or CHLV remains one, whereby all set of CHEVs and CHLVs of said elements and links of said network will exclusively comprise zero and one values.
4. A method according to any of claims 1-3, wherein said increasing of the actual CHLV or CHEV from zero is carried out only once for each link and each element and any further increasing of said actual CHLV or CHEV in any further step is omitted.
5. A method according to any of claims 1-4 for analyzing the fine structure of an undirected network, wherein each said connection between said elements being undirected connection, wherein said actual CHEV or CHLV values being calculated in step (b) as follows: defining a NLV for each link, said NLV being set to zero for all links the actual CHEVs of the end point elements of which are both zero or non-zero, respectively, wherein said NLV for each further link being set to be higher when the strength and/or the actual CHEV of the end point element of said link having a non-zero CHEV is higher; then defining a NEV for each element the actual CHEV of which being zero and being connected to one of said links having a NLV differing from zero, wherein said NEV of said element being set to be higher when the cumulated NLV of all links connected to said element is higher while said NEV of all other elements being set to zero; and selecting one or more elements having zero as the actual CHEV and having the highest NEV; selecting all links connecting elements having greater than zero actual CHEV and one of said selected elements, further selecting all links being connected exclusively to said selected elements, and increasing the actual CHEV of each said selected element, wherein the extent of said increasing being higher when said NEV is higher, and increasing the actual CHLV of each said selected link, wherein the extent of said increasing being higher when said NLV of said link and/or the strength of said link and/or the actual CHEVs of said elements connected by said links are higher.
6. A method for high resolution modularization of a network as defined in claim 1 comprising the steps of:
(i) rendering a community landscape height value to all elements or links of said network according to the method of any of claims 1-5;
(ii) identifying the elements or links having local maximum community landscape height values as module core elements or links of said network;
(iii) determining the extent to which each element or link belongs to any particular module core element or link of said network by gradually exploring the surrounding of each module core element or link, and rendering a set of 'module core assignment values' to each element or link, each of said values characterizing the extent of the assignment of said element or link to a respective module core element or link, wherein each said module core assignment value for each element or link is determined as a function of the module core assignment values of all neighboring elements or links, and the sum of said module core assignment values for each element or link correlating with the community landscape height value of said element or link in accordance with predetermined calculation rules; and
(iv) arbitrarily defining a threshold module core assignment value and each element or link having a higher module core assignment value than said threshold value with respect to any module core element or link is considered as belonging to the module defined by said module core element or link.
7. A method for identifying elements or links of a network as defined in claim 1 being typically situated in module overlaps, comprising the following steps:
(i) allocating all elements or links of said network to all identified module cores according to the method of claim 6; and
(ii) identifying elements or links not being assigned to belong to any module core by more than 50% of their community landscape height values as being situated in module overlaps.
8. A method for identifying elements or links of a network as defined in claim 1 supposedly playing special roles in said network ('VIP-elements' or 'VIP-links'), comprising the following steps: (i) identifying elements or links typically situated in module overlaps according to the method of claim 7; and
(ii) identifying elements or links being typically situated in module overlaps and not being assigned to belong to any two module cores aggregately by more than 70% of their community landscape height values as VIP-elements or VIP-links.
9. A method according to any one of claims 1-8, wherein said elements are selected from the group consisting of atoms of a macromolecule, such as a protein, a DNA-molecule, an RNA-molecule or a polysaccharide; proteins, such as proteins of a cell's signaling network, a cell's cytoskeletal network or a cell's gene expression regulatory network, proteins present in a particular cell membrane or cell organelle, proteins having special enzymatic or regulatory functions; coenzymes; cells of an organism, such as nerve cells or immune cells; microorganisms, groups of microorganisms; technical devices, such as computers, computer or microchip controlled devices, robots, transportation or communication devices, telephones, mobile telephones, radios, televisions, elements of a pipeline, communication or transportation network elements, power grid elements, digital organisms and elements of a technical device.
10. A method according to any one of claims 1-8, wherein said links are selected from the group consisting of covalent or non-covalent bonds between atoms of a macromolecule, such as proteins, DNA-molecules, RNA-molecules or polysaccharides; protein-protein interactions; enzyme-coenzyme interactions; intracellular or intercellular interactions; specific interactions between microorganisms or groups of microorganisms and specific interactions between technical devices or parts of technical devices, advantageously communicative interactions.
11. A method for improving at least one important characteristic of a network as defined in claim 1, comprising: (i) identifying at least one VIP-element or VIP-link of said network according to the method of claim 8;
(ii) altering, multiplying, deleting or replacing said at least one VIP-element or VIP-link of said network, producing thereby an altered network; (iii) comparing at least one important characteristic of said altered network to that of the original network and effecting another replacement or alteration on the same or another VIP-element or VIP-link of the original network if said important characteristic is not improved sufficiently.
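The trial-and-compare loop of claim 11 can be sketched as follows, under stated assumptions. The network is a hypothetical undirected adjacency dictionary. Deletion is the only alteration implemented, although the claim also covers altering, multiplying and replacing. Further alterations can be passed in the `alterations` sequence. "Improved sufficiently" is modelled as a hypothetical `required_gain` threshold on a caller-supplied `characteristic` function. The example characteristic, `count_components`, is purely illustrative; for instance, fragmentation might be the goal when disrupting a pathogen's network. None of these names come from the patent.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Set

# Hypothetical representation: each element maps to the set of its neighbours.
Graph = Dict[str, Set[str]]
Alteration = Callable[[Graph, str], Graph]


def delete_element(graph: Graph, element: str) -> Graph:
    """Return a copy of the network with one element and all its links removed."""
    return {n: nbrs - {element} for n, nbrs in graph.items() if n != element}


def count_components(graph: Graph) -> int:
    """Illustrative characteristic: number of connected components."""
    seen: Set[str] = set()
    count = 0
    for start in graph:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nbr in graph.get(node, set()):
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return count


def improve_network(
    graph: Graph,
    vip: Iterable[str],
    characteristic: Callable[[Graph], float],
    required_gain: float,
    alterations: Sequence[Alteration] = (delete_element,),
) -> Optional[Graph]:
    """Try alterations on VIP elements until the characteristic improves
    by at least required_gain; return the altered network, or None."""
    baseline = characteristic(graph)
    for element in vip:
        for alter in alterations:
            # Each attempt starts again from the ORIGINAL network, as in step (iii).
            altered = alter(graph, element)
            if characteristic(altered) - baseline >= required_gain:
                return altered
    return None
```

On the path network `a-b-c` with candidate VIP elements `["a", "b"]` and a required gain of one extra component, deleting `a` leaves a single component, so the loop moves on. Deleting `b` splits the network into two components and is returned. The input network itself is never modified.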
12. The method according to any one of claims 1-11, wherein said method is carried out by using a computing device, advantageously a computer, a microprocessor or a chip.
13. A non-programmable processing device, advantageously a microprocessor or a chip, capable of carrying out the method according to any one of claims 1-11.
14. Use of a computing device, advantageously a computer, a microprocessor or a chip for carrying out the method according to any one of claims 1-11.
15. A computer readable data carrier comprising a computer readable algorithm suitable for carrying out the method according to any one of claims 1-10.
PCT/IB2007/050471 2006-02-13 2007-02-13 Method for analyzing the fine structure of networks WO2007093960A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
HUP0600116 2006-02-13
HU0600116A HU0600116D0 (en) 2006-02-13 2006-02-13 Method of selecting elements of technical and other networks reacting optimal to sudden effects, capable to develop in leaps, needing special protection and development, or being capable to act efficiently

Publications (1)

Publication Number Publication Date
WO2007093960A2 true WO2007093960A2 (en) 2007-08-23

Family

ID=89986580

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2007/050471 WO2007093960A2 (en) 2006-02-13 2007-02-13 Method for analyzing the fine structure of networks

Country Status (2)

Country Link
HU (1) HU0600116D0 (en)
WO (1) WO2007093960A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045574A (en) * 2015-06-24 2015-11-11 广东电网有限责任公司电力科学研究院 Software key function identification method based on complex network fault propagation

Also Published As

Publication number Publication date
HU0600116D0 (en) 2006-04-28

Similar Documents

Publication Publication Date Title
Moore et al. A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods
Huang et al. Constructing prediction models from expression profiles for large scale lncRNA–miRNA interaction profiling
Li et al. GeNets: a unified web platform for network-based genomic analyses
Jin et al. Finding research trend of convergence technology based on Korean R&D network
Kabir et al. Identification of active signaling pathways by integrating gene expression and protein interaction data
Alcalá-Corona et al. The hierarchical modular structure of HER2+ breast cancer network
Cui et al. Protein evolution on a human signaling network
Tu et al. Differential network analysis by simultaneously considering changes in gene interactions and gene expression
Rizman Žalik Evolution algorithm for community detection in social networks using node centrality
Chirom et al. Network medicine in ovarian cancer: topological properties to drug discovery
Cui et al. MMCO-Clus–an evolutionary co-clustering algorithm for gene selection
Nowakowska et al. Topological analysis as a tool for detection of abnormalities in protein–protein interaction data
Sun et al. Drug repositioning with adaptive graph convolutional networks
Gan et al. Entropy-based inference of transition states and cellular trajectory for single-cell transcriptomics
Bonomo et al. Topological ranks reveal functional knowledge encoded in biological networks: a comparative analysis
Zhang et al. Signed network propagation for detecting differential gene expressions and DNA copy number variations
Juan et al. Systems biology: applications in cancer-related research
Vimaladevi et al. A microarray gene expression data classification using hybrid back propagation neural network
WO2007093960A2 (en) Method for analyzing the fine structure of networks
Varrone et al. CellCharter: a scalable framework to chart and compare cell niches across multiple samples and spatial-omics technologies
Sree et al. Identification of Promoter Region in Genomic DNA Using Cellular Automata Based Text Clustering.
Nguyen et al. MGKA: A genetic algorithm-based clustering technique for genomic data
Chereddy et al. Predicting the Driver Variants and Mutations in Lung Cancer Genome using Transcriptional Regulation Network
Roy et al. Soft computing approaches to extract biologically significant gene network modules
Zhang et al. Definition of a new metric with mutual exclusivity and coverage for identifying cancer driver modules

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 07705868

Country of ref document: EP

Kind code of ref document: A2