WO2012051757A1 - Method and tool suite device for identifying modular structure in complex network - Google Patents

Method and tool suite device for identifying modular structure in complex network

Info

Publication number
WO2012051757A1
WO2012051757A1, PCT/CN2010/077949, CN2010077949W
Authority
WO
WIPO (PCT)
Prior art keywords
parallel processing
task
data
block
processing units
Prior art date
Application number
PCT/CN2010/077949
Other languages
French (fr)
Inventor
Rui Wang
Xingyuan Chen
Huachang Li
Original Assignee
Beijing Prosperous Biopharm Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Prosperous Biopharm Co., Ltd. filed Critical Beijing Prosperous Biopharm Co., Ltd.
Priority to CN201080051364.2A priority Critical patent/CN102667710B/en
Priority to PCT/CN2010/077949 priority patent/WO2012051757A1/en
Publication of WO2012051757A1 publication Critical patent/WO2012051757A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising

Definitions

  • the present invention relates generally to modular structure identification, and in particular, to a method and a tool suite device for identifying modular structure in a complex network, and a computing system.
  • a complex network is a network with non-trivial topological features.
  • the study of complex networks is inspired by empirical study of real networks such as networks in biotechnology (cell networks, protein-protein interaction networks, neuro-networks), Internet/WWW (World Wide Web) networks, social networks, etc.
  • One of the most pervasive features in such real networks is the existence of modular structure, or clustering, i.e., if a graph is used to represent a complex network, organization of vertices in the clusters, with many edges within the same cluster and relatively few edges connecting vertices from different clusters.
  • Identifying modular structure in a complex network is of great importance for understanding real problems the graph represents, e.g., tracking online viruses, community behaviors analysis in, for example, social network services, detecting important gene functions, etc.
  • There have been some methods in the related art of identifying the modular structure in a complex network which are based on sequential computing devices, namely, CPUs (Central Processing Units).
  • Traditional methods, such as hierarchical clustering, partitioning clustering and spectral clustering, divisive methods, e.g., the Girvan-Newman algorithm, and modularity-based greedy algorithms, take hours to complete the computation due to the massive scale of the complex networks. For instance, in the field of social networks, Facebook announced 400 million users in February 2010.
  • a corresponding graph has millions of edges and vertices.
  • the existing real complex networks are characterized by a massive amount of data and high computational complexity in both time and storage space. Detecting modular structures in the complex networks using CPUs therefore suffers from long execution time, poor user interaction and energy inefficiency.
  • supercomputer workstations or high-performance computing clusters, though capable of completing the computation in a short time, are expensive, developer-unfriendly, and raise an entry barrier for common research and business entities.
  • the present invention provides a tool suite device, a method and a system for identifying a modular structure in a complex network that are capable of completing the computation in a short time while saving in cost.
  • a tool suite device for identifying a modular structure in a complex network using a computing system with a CPU and a parallel processing device, the tool suite device comprising a data reading means on the CPU for reading task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network; a block storage means on the CPU for storing a predefined set of sub-blocks each of which indicates a particular process; a determining means on the CPU for determining a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from the predefined set of sub-blocks stored in the block storage means according to the task data; a first interface on the CPU for receiving the task block transferred from the determining means; a dispatcher means on the CPU for dividing the task data into a plurality of data subsets with respect to the plurality of parallel processing units; a second interface on the CPU for receiving
  • a method of identifying a modular structure in a complex network using a computing system with a CPU and a parallel processing device comprising reading, by a data reading means on the CPU, task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network; determining, by a determining means on the CPU, a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from a predefined set of sub-blocks each of which indicates a particular process, according to the task data, and transferring the task block to a first interface on the CPU; transferring the task block, by the first interface, to a first frontend on the parallel processing device; passing, by the first frontend, the task block to an assembler means on the parallel processing device; generating, by the assembler means, the subtask process readable by the plurality of parallel processing units from the task block
  • a system for identifying modular structure in a complex network comprising a CPU and a parallel processing device.
  • the CPU includes a data reading means for reading task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network; a block storage means for storing a predefined set of sub-blocks each of which indicates a particular process; a determining means for determining a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from the predefined set of sub-blocks stored in the block storage means according to the task data; a first interface for receiving the task block transferred from the determining means; a dispatcher means for dividing the task data into a plurality of data subsets with respect to the plurality of parallel processing units; and a second interface for receiving the task data transferred from the dispatcher means.
  • the parallel processing device includes: a first frontend on the parallel processing device connected to the first interface for receiving the task block transferred from the first interface; an assembler means on the parallel processing device for receiving the task block passed from the first frontend, generating the subtask process readable by the plurality of parallel processing units from the task block and assigning the subtask process to the plurality of parallel processing units; a second frontend on the parallel processing device connected to the second interface for receiving the plurality of data subsets from the second interface and passing the plurality of data subsets to the plurality of parallel processing units, respectively; the plurality of parallel processing units for performing in parallel, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results; and a classification means for processing the parallel results to obtain the modular structure in the complex network.
  • FIG.1 is a block diagram illustrating a computing system in accordance with an embodiment of the invention.
  • FIG.2 is a block diagram illustrating a computing system in accordance with another embodiment of the invention.
  • FIG.3 illustrates a structure example of the complex network data.
  • FIG.4 shows an embodiment of the block storage means storing the predefined set of sub-blocks.
  • FIG.5 is a flow chart of a BFS routine that is applicable in the graph search sub-block.
  • FIG.6 is a flow chart of the routine indicated by the centrality measure sub-block.
  • FIG.7 is a flow chart of a routine performed by the classification means.
  • FIG.8 shows a flow chart of a method of identifying a modular structure in a complex network in accordance with another embodiment of the invention.
  • FIG.1 is a block diagram illustrating a computing system 1000 in accordance with an embodiment of the invention.
  • the computing system 1000 includes a CPU 200 and a GPU 300 as an example of a parallel processing device.
  • the numbers of the CPUs and GPUs contained in the computing system 1000 are not limited to only one and may be altered as necessary.
  • GPU 300 is shown in FIG.1 as the parallel processing device, the specific product of the parallel processing device may be changed in different cases.
  • the parallel processing device may be a plurality of processing units distributed on a network that can perform parallel processing and communicate data or information with each other, such as a LAN (Local Area Network) or a WAN (Wide Area Network).
  • GPU 300 may use the general-purpose graphics processing unit (GPGPU) platform Compute Unified Device Architecture (CUDA) developed by nVidia.
  • other commercially available GPUs can be applied under the spirit and scope of the present invention.
  • the CPU 200 includes a data reading means 210, a dispatcher means 220, a block storage means 230, a determining means 240, a first interface 250 and a second interface 255.
  • the GPU 300 comprises a plurality of parallel processing units 310-1, 310-2 ..., 310-N, where N is an integer.
  • the plurality of parallel processing units 310-1, 310-2, ..., 310-N will be collectively referred to as the parallel processing units 310 hereinafter as appropriate.
  • the GPU 300 further comprises a first frontend 320, a second frontend 325, an assembler means 330, and a classification means 340.
  • the classification means 340 is shown as included in the GPU 300, it can be located alternatively.
  • the classification means 340 may be located in the CPU 200, and in such case, the present invention is also applicable.
  • the data reading means 210 reads task data which includes nodes in a complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network.
  • the modular structure to be determined by the computing system 1000 may be comprised of a plurality of communities each of which contains the nodes that have closer relationships.
  • in a protein-protein interaction network, the interactions between proteins are important for almost all biological functions.
  • signals from the exterior of a cell are mediated to the inside of that cell by protein-protein interactions of the signaling molecules.
  • This process, called signal transduction, plays a fundamental role in many biological processes and in many diseases, e.g. cancers.
  • the nodes in the task data can be various proteins, respectively.
  • the relationships among the nodes indicate interactions among the proteins.
  • the task parameters describe the task for determining the modular structure of the proteins, and each community in the modular structure contains the proteins that tend to interact among them.
  • the nodes can represent different members in the social network, and the edges with corresponding values represent specific relationships among the nodes.
  • the relationships, denoted by edges with corresponding values may represent spatial distances between work locations of two arbitrary staff members. In such case, the staff members having closer distances can be deemed as in a department. Therefore, the modular structure obtained finally contains the staff members in the same department.
  • the task parameter may be used to control the modular structure obtained finally.
  • the task parameter may specify a condition that should be met by the final modular structure, such as a maximum size of the modules in the modular structure or a minimum number of modules in the modular structure.
  • the block storage means 230 stores a predefined set of sub-blocks each of which indicates a particular process.
  • the predefined sub-blocks will be described in detail later.
  • the determining means 240 determines a task block from the predefined set of sub-blocks stored in the block storage means 230 according to the task data. Then the determining means 240 transfers the task block to the first interface 250.
  • the task block is used to assign the subtask processes to be performed in the plurality of parallel processing units 310, respectively.
  • the determination of the task block may be customized to specific tasks and will be exemplified below.
  • the dispatcher means 220 divides the task data into a plurality of data subsets with respect to the plurality of parallel processing units 310 on the parallel processing device (GPU) 300.
  • the dispatcher means 220 may divide the task data by checking the GPU configurations and then setting size of the data subsets.
  • the first frontend 320 on the GPU 300 is connected to the first interface 250 on the CPU 200.
  • the first frontend 320 receives the task block transferred from the first interface 250, and passes the task block to the assembler means 330.
  • the assembler means 330 receives the task block passed from the first frontend 320 and generates the subtask process readable by the plurality of parallel processing units 310 from the task block. For example, the assembler means 330 translates the task block into the subtask process in the form of GPU-readable machine code for the units 310. Then, the assembler means 330 assigns the subtask process to the plurality of parallel processing units 310, respectively.
  • the second frontend 325 on the GPU 300 is connected to the second interface 255 on the CPU 200 and receives the plurality of data subsets from the second interface 255. Then the second frontend 325 passes the plurality of data subsets to the plurality of parallel processing units 310, respectively.
  • the plurality of parallel processing units 310 performs in parallel, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results.
  • the classification means 340 processes the parallel results to obtain the modular structure in the complex network.
  • interfaces 250 and 255 are shown in FIG.1 separately, they can be combined. That is, the interfaces 250 and 255 may be embodied by one component. This holds true for the first frontend 320 and the second frontend 325, and they may be embodied by one component.
  • the components in FIG.1 are shown as distributed on the CPU and the parallel processing device. However, these components may be integrated into one entity, i.e. a tool suite device according to the present invention.
  • the present invention proposes a parallel computing system based on a low-cost parallel processing device, such as a GPU, to identify the modular structures in complex networks, reduces the execution time of the computation and the cost significantly, and provides a complex network research platform for commercial and academic entities.
  • FIG.2 is a block diagram illustrating a computing system 1100 in accordance with another embodiment of the invention. Similar reference numbers are used to indicate the same or similar parts as those in FIG.1 and therefore the detailed descriptions thereof will be omitted for the purpose of clarity.
  • the computing system 1100 in FIG.2 further comprises a visualization means 260, a network data means 270 and a data storage means 280.
  • the visualization means 260 receives the modular structure obtained by the classification means 340 directly (if the classification means 340 is located in the CPU 200) or via other intermediary components (such as the frontends and the interfaces), and displays on a monitor the modular structure. Therefore, a user-friendly way of interpreting the data is provided.
  • the network data means 270 can extract a complex network data representing the nodes and the edges with corresponding values.
  • the network data means 270 can convert a real problem in a specific network, such as a biological or social network, into the complex network data.
  • FIG.3 illustrates a structure example of the complex network data.
  • an entry of the complex network data may comprise an index of a node (Node 1) and an index of a node adjacent to the Node 1 (Node 2), and a value representing the edge between the Node 1 and the adjacent Node 2.
  • FIG.3 shows only one entry of the complex network data for the purpose of explanation. However, there is no limitation on the number of the entries in the complex network data.
  • the number of the entries in the complex network data may be from 1 to C(M,2).
  • the edges have directions, e.g. the edge from Node 1 to Node 2 is different from that from Node 2 to Node 1 and therefore has different value, the number of the entries in the complex network data may be from 1 to P(M,2).
  • the data storage means 280 stores the complex network data as a part of the task data, to be read by the data reading means 210. It should be noted that although the data storage means 280 is shown located in CPU 200 in FIG.2, it may be located alternatively.
  • the data storage means 280 may be a non-volatile or volatile memory independent of the CPU 200 and the GPU 300, such as a ROM (Read-Only Memory), a RAM (Random Access Memory), a hard disk, an optical disk, a flash memory, or the like.
  • let G(V, E, C) be a complex network with vertices v ∈ V and edges e ∈ E with corresponding costs c ∈ C.
  • one node may include gene/protein names, and edges with values may refer to specific interactions therebetween, such as catalysis or binding.
  • the task data may be determined in advance or generated by the network data means 270.
  • the network data means 270 may parse protein-protein interaction networks into a numerical format G(V, E, C).
  • Betweenness is a centrality measure of an edge e within a graph. Edges that occur on many shortest paths between other nodes have higher betweenness than those that do not.
  • the data storage means 280 stores the task data for the complex network G(V, E, C), e.g. in the form of adjacency matrix for further processing.
  • FIG.4 shows an embodiment of the block storage means 230 storing the predefined set of sub-blocks.
  • some processing sub-blocks each of which indicates a particular processing are illustrated, such as a graph search sub-block 231, a shortest path sub-block 233, and a centrality measure sub-block 235.
  • other kinds of processing sub-block(s) can be incorporated into the block storage means 230, alternatively or additionally.
  • a task block may be determined by the determining means 240 by selecting/combining one or more sub-blocks from the sub-blocks stored in the block storage means 230.
  • the task block may comprise a combination of the graph search sub-block 231 and the centrality measure sub-block 235, or a combination of the shortest path sub-block 233 and the centrality measure sub-block 235.
  • the graph search sub-block 231 indicates a routine to cause the parallel processing units 310 to perform Breadth First Search (BFS).
  • inputs to the graph search sub-block 231 may be a preprocessed input matrix and source node.
  • Output of the graph search sub-block 231 is a breadth first tree from a source node s in the network.
  • FIG.5 is a flow chart of a BFS routine 500 that is applicable in the graph search sub-block 231.
  • data is input to the graph search sub-block 231, such as one of the plurality of the data subsets.
  • each node is mapped to a thread on one CUDA streaming multiprocessor (parallel processing unit).
  • the graph search sub-block 231 specifies a source node s and an initial frontier set F. Let an array F denote the frontier of the search, an array X denote the visited nodes. During each iteration, every frontier node explores its neighbor nodes and adds them to the frontier node set F.
  • the graph search sub-block 231 finds the connected nodes of F in the next level and adds them to the visited array X. In the meantime, upon the completion of searching neighbor nodes, the current node adds itself to the visited node set X.
  • the frontier node set F is updated. For example, if the current path is shorter than the existing path, then the length of the path is updated and one iteration is completed.
  • the graph search sub-block 231 may further or alternatively indicate DFS (Depth First Search), which is well-known in the art and can be applied in the present invention similarly to the BFS.
  • the centrality measure sub-block 235 has a routine to cause the parallel processing units to calculate a breadth first tree for all of the nodes.
  • FIG.6 is a flow chart of the routine 600 indicated by the centrality measure sub-block 235.
  • the centrality measure sub-block 235 may call the graph search sub-block 231 to obtain the breadth first trees for the nodes in the network.
  • the routine goes to S630 where a parallel reduction is performed on the betweenness of each edge to obtain correlation coefficients between the nodes.
  • the correlation coefficient of an edge is obtained using the formula w/b, where w denotes the weight of the edge in the complex network, and b denotes the betweenness of the edge.
  • the routines 500 and 600 are of parallel processes, and can be performed on the parallel processing units 310, respectively, to obtain parallel results therefrom, such as the correlation coefficients described above.
  • the classification means 340 uses the global correlation coefficients input from the centrality measure sub-block 235 to obtain the modular structures of the network.
  • FIG.7 is a flow chart of a routine 700 performed by the classification means 340.
  • the correlation coefficients are input from all of the plurality of parallel processing units 310, as the parallel results.
  • the classification means 340 identifies the edge with the largest correlation coefficient.
  • the edge with the largest correlation coefficient is deleted. Then at S740, it is determined whether the network with the edge deleted satisfies the condition specified by the task parameter. For example, it is determined at S740 whether all the communities remaining in the network have sizes that are smaller than the maximum size.
  • if the condition is not satisfied at S740, the routine 700 goes to S705, where the centrality measure sub-block 235 is called again to obtain correlation coefficients for the network from which that edge has been deleted. If the condition is satisfied at S740, the routine 700 terminates at S750, where a modular structure with modules satisfying the condition specified by the task parameter is obtained.
  • the shortest path sub-block 233 may have a routine for finding shortest paths between every pair of nodes in the graph (network). For example, routines for APSP (All pairs shortest path), SPSP (Single Pair Shortest Path), SSSP (Single Source Shortest Path), SDSP (Single Destination Shortest Path) or the like can be applied in the shortest path sub-block 233. These shortest path methods are all well-known in the art, therefore omitted herein.
  • the centrality measure sub-block 235 has a routine to cause the parallel processing units to build a hierarchical clustering tree.
  • the classification means 340 cuts the hierarchical clustering tree using the task parameters as constraints to obtain the modular structure.
  • A hierarchical clustering procedure produces a series of partitions of the data, Pn, Pn-1, ..., P1, where the first partition Pn consists of n single-object 'clusters' and the last partition P1 consists of a single group containing all n cases.
  • at each stage, the method joins together the two clusters which are closest together (most similar). (At the first stage, of course, this amounts to joining together the two objects that are closest together, since at the initial stage each cluster has one object.)
  • Differences between methods arise because of the different ways of defining distance (or similarity) between clusters. In this embodiment, the distance is calculated using a matrix-multiplication-based all-pairs shortest path algorithm.
  • the invention develops a GPU-based parallel computing system to identify the modular structures in complex networks, reduces the execution time of computation and the cost significantly, and provides a complex network research platform for commercial and academic entities.
  • FIG.8 shows a flow chart of a method 2000 of identifying a modular structure in a complex network using a computing system with a CPU and a parallel processing device (e.g. a GPU) in accordance with an embodiment of the present invention.
  • the method 2000 may be performed by the respective components in the computing system 1000 in FIG.1.
  • task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameters for a task to be performed on the complex network, may be read by the data reading means 210 on the CPU 200.
  • the determining means 240 on the CPU 200 determines a task block from a predefined set of sub-blocks each of which indicates a particular process, according to the task data, and transferring the task block to the first interface 250 on the CPU 200.
  • the task block is used for assigning subtask process to be performed in the plurality of parallel processing units 310-1, 310-2, ..., 310-N on the parallel processing device 300, respectively, where N is an integer.
  • the task block is transferred by the first interface 250 to a first frontend 320 on the parallel processing device 300. Then at S2240, the first frontend 320 passes the task block to an assembler means 330 on the parallel processing device 300.
  • the assembler means 330 generates the subtask process readable by the plurality of parallel processing units 310 from the task block and assigns the subtask process to the plurality of parallel processing units 310.
  • the dispatcher means 220 may divide the task data into a plurality of data subsets with respect to the plurality of parallel processing units 310.
  • a second interface 255 on the CPU 200 transfers the plurality of data subsets to a second frontend 325 on the parallel processing device. Then at S2340, the second frontend 325 passes the plurality of data subsets to the plurality of parallel processing units, respectively.
  • the plurality of parallel processing units 310-1, 310-2, ..., 310-N perform, in parallel, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results.
  • the classification means 340 (located in the CPU 200 or GPU 300) processes the parallel results to obtain the modular structure in the complex network.
  • the method of identifying modular structure in a complex network according to the present invention may incorporate one or more aspects described above with reference to FIGs.1-7.
  • the method may further comprise extracting a complex network data representing the nodes and the edges with values and storing the complex network data as a part of the task data.
  • the task block may comprise a combination of a graph search sub-block and a centrality measure sub-block, or a combination of a shortest path sub-block and a centrality measure sub-block.
  • the routines described in FIGs.5-7 can also be applied in the method 2000.
  • steps in method 2000 as shown in FIG.8 do not have to be performed in the order as shown.
  • steps S2200-S2260 may be performed after or at the same time as S2300-S2340.
  • a computer system with an associated computer-readable medium containing instructions for controlling the computer system can be utilized to implement the exemplary embodiments that are disclosed herein.
  • the computer system may include at least one computer such as a microprocessor, digital signal processor, and associated peripheral electronic circuitry.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Multi Processors (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed is a tool suite device for identifying a modular structure in a complex network using a computing system with a CPU and a parallel processing device. The tool suite device comprises a data reading means for reading task data; a block storage means for storing a predefined set of sub-blocks each of which indicates a particular process; a determining means for determining a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from the predefined set of sub-blocks stored in the block storage means; a first interface for receiving the task block transferred from the determining means; a dispatcher means for dividing the task data into data subsets; a second interface for receiving the task data from the dispatcher means; a first frontend for receiving the task block from the first interface; an assembler means for receiving the task block passed from the first frontend, generating the subtask process readable by the parallel processing units from the task block and assigning the subtask process to the parallel processing units; a second frontend for receiving the data subsets from the second interface and passing data subsets to the parallel processing units; the parallel processing units for performing in parallel, the subtask process on the data subsets, respectively, to obtain parallel results; and a classification means for processing the parallel results to obtain the modular structure in the complex network.

Description

METHOD AND TOOL SUITE DEVICE FOR IDENTIFYING
MODULAR STRUCTURE IN COMPLEX NETWORK
FIELD OF THE INVENTION [0001] The present invention relates generally to modular structure identification, and in particular, to a method and a tool suite device for identifying modular structure in a complex network, and a computing system.
BACKGROUND OF THE INVENTION
[0002] A complex network is a network with non-trivial topological features. The study of complex networks is inspired by empirical study of real networks such as networks in biotechnology (cell networks, protein-protein interaction networks, neuro-networks), Internet/WWW (World Wide Web) networks, social networks, etc. One of the most pervasive features in such real networks is the existence of modular structure, or clustering, i.e., if a graph is used to represent a complex network, organization of vertices in the clusters, with many edges within the same cluster and relatively few edges connecting vertices from different clusters. Identifying modular structure in a complex network is of great importance for understanding the real problems the graph represents, e.g., tracking online viruses, community behavior analysis in, for example, social network services, detecting important gene functions, etc. [0003] There have been some methods in the related art of identifying the modular structure in a complex network which are based on sequential computing devices, namely, CPUs (Central Processing Units). Traditional methods, such as hierarchical clustering, partitioning clustering and spectral clustering, divisive methods, e.g., the Girvan-Newman algorithm, and modularity-based greedy algorithms, take hours to complete the computation due to the massive scale of the complex networks. For instance, in the field of social networks, Facebook announced 400 million users in February 2010. Hence, a corresponding graph has millions of edges and vertices. As a result, the existing real complex networks are characterized by a massive amount of data and high computational complexity in both time and storage space. Detecting modular structures in the complex networks using CPUs therefore suffers from long execution time, poor user interaction and energy inefficiency. On the other hand, supercomputer workstations or high-performance computing clusters, though capable of completing the computation in a short time, are expensive, developer-unfriendly, and raise an entry barrier for common research and business entities.
[0004] Thus, there is a need for a method and a tool suite device for identifying modular structure in a complex network, and a computing system using a computing system with a CPU and a parallel processing device such as GPU (Graphic Processing Unit), or processing units distributed on a network, which are capable of completing the computation in a short time while saving in cost.
SUMMARY OF THE INVENTION
[0005] The present invention provides a tool suite device, a method and a system for identifying a modular structure in a complex network that are capable of completing the computation in a short time while saving in cost.
[0006] According to one aspect of the present invention, a tool suite device for identifying a modular structure in a complex network using a computing system with a CPU and a parallel processing device, the tool suite device comprising a data reading means on the CPU for reading task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network; a block storage means on the CPU for storing a predefined set of sub-blocks each of which indicates a particular process; a determining means on the CPU for determining a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from the predefined set of sub-blocks stored in the block storage means according to the task data; a first interface on the CPU for receiving the task block transferred from the determining means; a dispatcher means on the CPU for dividing the task data into a plurality of data subsets with respect to the plurality of parallel processing units; a second interface on the CPU for receiving the task data transferred from the dispatcher means; a first frontend on the parallel processing device connected to the first interface for receiving the task block transferred from the first interface; an assembler means on the parallel processing device for receiving the task block passed from the first frontend, generating the subtask process readable by the plurality of parallel processing units from the task block and assigning the subtask process to the plurality of parallel processing units; a second frontend on the parallel processing device connected to the second interface for receiving the plurality of data subsets from the second interface and passing the plurality of data subsets to the plurality of parallel processing units, respectively; the plurality of parallel processing units for performing in parallel, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results; and a classification means for processing the parallel results to obtain the modular structure in the complex network.
[0007] According to another aspect of the present invention, a method of identifying a modular structure in a complex network using a computing system with a CPU and a parallel processing device, the method comprising reading, by a data reading means on the CPU, task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network; determining, by a determining means on the CPU, a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from a predefined set of sub-blocks each of which indicates a particular process, according to the task data, and transferring the task block to a first interface on the CPU; transferring the task block, by the first interface, to a first frontend on the parallel processing device; passing, by the first frontend, the task block to an assembler means on the parallel processing device; generating, by the assembler means, the subtask process readable by the plurality of parallel processing units from the task block and assigning the subtask process to the plurality of parallel processing units; dividing, by a dispatcher means on the CPU, the task data into a plurality of data subsets with respect to the plurality of parallel processing units; transferring, by a second interface on the CPU, the plurality of data subsets to a second frontend on the parallel processing device; passing, by the second frontend, the plurality of data subsets to the plurality of parallel processing units, respectively; performing in parallel, by the plurality of parallel processing units, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results; and processing, by a classification means, the parallel results to obtain the modular structure in the complex network.
[0008] According to another aspect of the present invention, A system for identifying modular structure in a complex network comprising a CPU and a parallel processing device. The CPU includes a data reading means for reading task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network; a block storage means for storing a predefined set of sub-blocks each of which indicates a particular process; a determining means for determining a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from the predefined set of sub-blocks stored in the block storage means according to the task data; a first interface for receiving the task block transferred from the determining means; a dispatcher means for dividing the task data into a plurality of data subsets with respect to the plurality of parallel processing units; and a second interface for receiving the task data transferred from the dispatcher means. The parallel processing device includes: a first frontend on the parallel processing device connected to the first interface for receiving the task block transferred from the first interface; an assembler means on the parallel processing device for receiving the task block passed from the first frontend, generating the subtask process readable by the plurality of parallel processing units from the task block and assigning the subtask process to the plurality of parallel processing units; a second frontend on the parallel processing device connected to the second interface for receiving the plurality of data subsets from the second interface and passing the plurality of data subsets to the plurality of parallel processing units, respectively; the plurality of parallel processing units for performing in parallel, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results; and a classification means for processing the parallel results to obtain the modular structure in the complex network.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The foregoing summary of the invention, as well as the following detailed description of exemplary embodiments of the invention, is better understood when read in conjunction with the accompanying drawings, which are included by way of example, and not by way of limitation with regard to the claimed invention.
[0010] FIG.1 is a block diagram illustrating a computing system in accordance with an embodiment of the invention.
[0011] FIG.2 is a block diagram illustrating a computing system in accordance with another embodiment of the invention.
[0012] FIG.3 illustrates a structure example of the complex network data. [0013] FIG.4 shows an embodiment of the block storage means storing the predefined set of sub-blocks.
[0014] FIG.5 is a flow chart of a BFS routine that is applicable in the graph search sub-block.
[0015] FIG.6 is a flow chart of the routine indicated by the centrality measure sub-block.
[0016] FIG.7 is a flow chart of a routine performed by the classification means.
[0017] FIG.8 shows a flow chart of a method of identifying a modular structure in a complex network in accordance with another embodiment of the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0018] Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention.
[0019] FIG.1 is a block diagram illustrating a computing system 1000 in accordance with an embodiment of the invention. The computing system 1000 includes a CPU 200 and a GPU 300 as an example of a parallel processing device. It should be noted that although only one CPU 200 and one GPU 300 are illustrated in FIG.1, the numbers of the CPUs and GPUs contained in the computing system 1000 are not limited to only one and may be altered as necessary. Also it is to be understood that although GPU 300 is shown in FIG.1 as the parallel processing device, the specific product of the parallel processing device may be changed in different cases. For example, the parallel processing device may be a plurality of processing units distributed on a network that can perform parallel processing and communicate data or information with each other, such as a LAN (Local Area Network) or a WAN (Wide Area Network).
[0020] For example, the GPU 300 may use the general-purpose graphics processing unit (GPGPU) platform Compute Unified Device Architecture (CUDA) developed by nVidia. However, other commercially available GPUs can be applied within the spirit and scope of the present invention.
[0021] Referring to FIG.1, the CPU 200 includes a data reading means 210, a dispatcher means 220, a block storage means 230, a determining means 240, a first interface 250 and a second interface 255. The GPU 300 comprises a plurality of parallel processing units 310-1, 310-2, ..., 310-N, where N is an integer. For clarity of description, the plurality of parallel processing units 310-1, 310-2, ..., 310-N will be collectively referred to as the parallel processing units 310 hereinafter as appropriate.
[0022] The GPU 300 further comprises a first frontend 320, a second frontend 325, an assembler means 330, and a classification means 340. However, it should be noted that although the classification means 340 is shown as included in the GPU 300, it can be located alternatively. For example, the classification means 340 may be located in the CPU 200, and in such case, the present invention is also applicable.
[0023] In particular, the data reading means 210 reads task data which includes nodes in a complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network.
[0024] The modular structure to be determined by the computing system 1000 may be comprised of a plurality of communities each of which contains the nodes that have closer relationships. For example, in a protein-protein interaction network, the interactions between proteins are important for almost all biological functions. For example, signals from the exterior of a cell are mediated to the inside of that cell by protein-protein interactions of the signaling molecules. This process, called signal transduction, plays a fundamental role in many biological processes and in many diseases e.g. cancers. It is common practice to visualize protein interactions in a network representation where characterizing the embedded modular structure is of great importance. In such case, the nodes in the task data can be various proteins, respectively. The relationships among the nodes indicate interactions among the proteins. The task parameters describe the task for determining the modular structure of the proteins, and each community in the modular structure contains the proteins that tend to interact among them.
[0025] As another example, in a social network, the nodes can represent different members in the social network, and the edges with corresponding values represent specific relationships among the nodes. For instance, in the social network of a company, the nodes refer to respective staff members in the company. The relationships, denoted by edges with corresponding values, may represent spatial distances between work locations of two arbitrary staff members. In such case, the staff members having closer distances can be deemed as belonging to the same department. Therefore, the modular structure obtained finally contains the staff members in the same department. [0026] The task parameter may be used to control the modular structure obtained finally. For example, the task parameter may specify a condition that should be met by the final modular structure, such as a maximum size of the modules in the modular structure or a minimum number of modules in the modular structure.
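As a purely illustrative, non-limiting sketch of how such task data and task parameters might be represented on the CPU side (the names TaskData, TaskParameter and max_module_size below are assumptions made for illustration, not terms used by the claims):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TaskParameter:
    """Hypothetical task parameter: a condition the final modular structure must meet."""
    max_module_size: int = 50   # e.g. maximum size of a module
    min_module_count: int = 2   # e.g. minimum number of modules

@dataclass
class TaskData:
    """Hypothetical task data: nodes, weighted edges, and the task parameter."""
    nodes: List[str]                     # e.g. proteins or staff members
    edges: Dict[Tuple[str, str], float]  # (node 1, node 2) -> edge value
    parameter: TaskParameter = field(default_factory=TaskParameter)

# Example: a tiny protein-interaction-like network.
task = TaskData(
    nodes=["P1", "P2", "P3", "P4"],
    edges={("P1", "P2"): 1.0, ("P2", "P3"): 0.8, ("P3", "P4"): 1.2},
)
```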
[0027] The block storage means 230 stores a predefined set of sub-blocks each of which indicates a particular process. The predefined sub-blocks will be described in detail later.
[0028] The determining means 240 determines a task block from the predefined set of sub-blocks stored in the block storage means 230 according to the task data. Then the determining means 240 transfers the task block to the first interface 250. The task block is used to assign the subtask processes to be performed in the plurality of parallel processing units 310, respectively. The determination of the task block may be customized to specific tasks and will be exemplified below.
[0029] The dispatcher means 220 divides the task data into a plurality of data subsets with respect to the plurality of parallel processing units 310 on the parallel processing device (GPU) 300. The dispatcher means 220 may divide the task data by checking the GPU configurations and then setting size of the data subsets.
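For illustration only, the division performed by the dispatcher means 220 can be pictured as the following round-robin split of an edge list into one data subset per parallel processing unit; an actual dispatcher would derive num_units and the subset sizes from the GPU configuration rather than hard-coding them, and all names here are assumptions:

```python
from typing import List, Sequence, Tuple

Edge = Tuple[str, str, float]  # (node 1, node 2, edge value)

def dispatch(edges: Sequence[Edge], num_units: int) -> List[List[Edge]]:
    """Divide the task data into one data subset per parallel processing unit."""
    subsets: List[List[Edge]] = [[] for _ in range(num_units)]
    for i, edge in enumerate(edges):
        subsets[i % num_units].append(edge)  # simple round-robin assignment
    return subsets

edges = [("P1", "P2", 1.0), ("P2", "P3", 0.8), ("P3", "P4", 1.2), ("P1", "P4", 0.5)]
print(dispatch(edges, num_units=2))
```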
[0030] As shown in FIG.1, the first frontend 320 on the GPU 300 is connected to the first interface 250 on the CPU 200. The first frontend 320 receives the task block transferred from the first interface 250, and passes the task block to the assembler means 330.
[0031] The assembler means 330 receives the task block passed from the first frontend 320 and generates the subtask process readable by the plurality of parallel processing units 310 from the task block. For example, the assembler means 330 translates the task block into the subtask process in the form of GPU-readable machine code for the units 310. Then, the assembler means 330 assigns the subtask process to the plurality of parallel processing units 310, respectively.
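The assembler means 330 itself emits GPU-readable machine code, which cannot be reproduced here; as a loose CPU-side analogue (an assumption for illustration only), a task block listing sub-block identifiers could be resolved into an ordered subtask process of callables:

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the predefined sub-blocks; a real assembler means
# would emit GPU machine code rather than Python callables.
def graph_search(data_subset):       return ("breadth first tree", data_subset)
def centrality_measure(data_subset): return ("edge betweenness", data_subset)
def shortest_path(data_subset):      return ("all-pairs shortest paths", data_subset)

SUB_BLOCKS: Dict[str, Callable] = {
    "graph_search": graph_search,
    "centrality_measure": centrality_measure,
    "shortest_path": shortest_path,
}

def assemble(task_block: List[str]) -> List[Callable]:
    """Resolve a task block (ordered sub-block names) into a subtask process."""
    return [SUB_BLOCKS[name] for name in task_block]

subtask_process = assemble(["graph_search", "centrality_measure"])
```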
[0032] The second frontend 325 on the GPU 300 is connected to the second interface 255 on the CPU 200 and receives the plurality of data subsets from the second interface 255. Then the second frontend 325 passes the plurality of data subsets to the plurality of parallel processing units 310, respectively.
[0033] The plurality of parallel processing units 310 performs in parallel, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results. The classification means 340 processes the parallel results to obtain the modular structure in the complex network.
[0034] It should be noted that although two interfaces 250 and 255 are shown in FIG.1 separately, they can be combined. That is, the interfaces 250 and 255 may be embodied by one component. This holds true for the first frontend 320 and the second frontend 325, and they may be embodied by one component.
[0035] On the other hand, the components in FIG.1 are shown as distributed on the CPU and the parallel processing device. However, these components may be integrated into one entity, i.e. a tool suite device according to the present invention.
[0036] Therefore, the present invention proposes a parallel computing system based on a low-cost parallel processing device, such as a GPU, to identify the modular structures in complex networks, reduces the execution time of the computation and the cost significantly, and provides a complex network research platform for commercial and academic entities.
[0037] FIG.2 is a block diagram illustrating a computing system 1100 in accordance with another embodiment of the invention. Similar reference numbers are used to indicate the same or similar parts as those in FIG.1 and therefore the detailed descriptions thereof will be omitted for the purpose of clarity.
[0038] In addition to those components of the computing system 1000 as shown in FIG.l, the computing system 1100 in FIG.2 further comprises a visualization means 260, a network data means 270 and a data storage means 280.
[0039] Particularly, in the case of FIG.2, the visualization means 260 receives the modular structure obtained by the classification means 340 directly (if the classification means 340 is located in the CPU 200) or via other intermediary components (such as the frontends and the interfaces), and displays on a monitor the modular structure. Therefore, a user-friendly way of interpreting the data is provided.
[0040] The network data means 270 can extract a complex network data representing the nodes and the edges with corresponding values. For example, the network data means 270 can convert a real problem in a specific network, such as a biological or social network, into the complex network data. FIG.3 illustrates a structure example of the complex network data. As shown in FIG.3, an entry of the complex network data may comprise an index of a node (Node 1), an index of a node adjacent to the Node 1 (Node 2), and a value representing the edge between the Node 1 and the adjacent Node 2. Note that FIG.3 shows only one entry of the complex network data for the purpose of explanation. However, there is no limitation on the number of the entries in the complex network data. For example, if there are M nodes in the complex network and the edges among the nodes have no direction (i.e. the edge from Node 1 to Node 2 is equivalent to that from Node 2 to Node 1), then the number of the entries in the complex network data may be from 1 to C(M,2). In the case that the edges have directions, e.g. the edge from Node 1 to Node 2 is different from that from Node 2 to Node 1 and therefore may have a different value, the number of the entries in the complex network data may be from 1 to P(M,2).
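To make the bounds concrete, C(M,2) = M(M-1)/2 and P(M,2) = M(M-1); a short, purely illustrative check:

```python
def max_undirected_entries(m: int) -> int:
    """C(M,2): one entry per unordered pair of nodes (undirected edges)."""
    return m * (m - 1) // 2

def max_directed_entries(m: int) -> int:
    """P(M,2): one entry per ordered pair of nodes (directed edges)."""
    return m * (m - 1)

print(max_undirected_entries(5), max_directed_entries(5))  # 10 20
```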
[0041] Returning to FIG.2, the data storage means 280 stores the complex network data as a part of the task data, to be read by the data reading means 210. It should be noted that although the data storage means 280 is shown located in CPU 200 in FIG.2, it may be located alternatively. For example, the data storage means 280 may be a non-volatile or volatile memory independent of the CPU 200 and the GPU 300, such as a ROM (Read-Only Memory), a RAM (Random Access Memory), a hard disk, an optical disk, a flash memory, or the like.
[0042] Let G(V, E, C) be a complex network with vertices v ∈ V and edges e ∈ E with corresponding costs c ∈ C. In the case of real complex networks such as biological networks, depending on the particular network representation, one node may include gene/protein names, and edges with values may refer to specific interactions therebetween, such as catalysis or binding. For example, the task data may be determined in advance or generated by the network data means 270. The network data means 270 may parse protein-protein interaction networks into a numerical format G(V, E, C).
[0043] Define the betweenness of an edge e as the number of all-pairs shortest paths that pass through the edge e. Betweenness is a centrality measure of an edge e within a graph. Edges that occur on many shortest paths between other nodes have higher betweenness than those that do not.
[0044] The data storage means 280 stores the task data for the complex network G(V, E, C), e.g. in the form of an adjacency matrix, for further processing.
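As a minimal sketch (the patent does not prescribe a concrete parser, so the helper below is an assumption), entries of the form (Node 1, Node 2, cost) for G(V, E, C) could be turned into the adjacency matrix stored by the data storage means 280 as follows:

```python
from typing import Dict, List, Tuple

def build_adjacency_matrix(entries: List[Tuple[str, str, float]],
                           directed: bool = False):
    """Convert (node 1, node 2, cost) entries of G(V, E, C) into an adjacency matrix."""
    nodes = sorted({n for n1, n2, _ in entries for n in (n1, n2)})
    index: Dict[str, int] = {n: i for i, n in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for n1, n2, cost in entries:
        matrix[index[n1]][index[n2]] = cost
        if not directed:
            matrix[index[n2]][index[n1]] = cost  # mirror undirected edges
    return nodes, matrix

# Example: two protein-protein interactions with their costs.
nodes, adj = build_adjacency_matrix([("geneA", "geneB", 1.0), ("geneB", "geneC", 2.0)])
```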
[0045] FIG.4 shows an embodiment of the block storage means 230 storing the predefined set of sub-blocks. In the block storage means 230, several processing sub-blocks, each of which indicates a particular process, are illustrated, such as a graph search sub-block 231, a shortest path sub-block 233, and a centrality measure sub-block 235. However, other kinds of processing sub-block(s) can be incorporated into the block storage means 230, alternatively or additionally.
[0046] As described above, a task block may be determined by the determining means 240 by selecting/combining one or more sub-blocks from the sub-blocks stored in the block storage means 230. For example, the task block may comprise a combination of the graph search sub-block 231 and the centrality measure sub-block 235, or a combination of the shortest path sub-block 233 and the centrality measure sub-block 235.
[0047] The graph search sub-block 231 indicates a routine to cause the parallel processing units 310 to perform Breadth First Search (BFS). In such case, inputs to the graph search sub-block 231 may be a preprocessed input matrix and source node. Output of the graph search sub-block 231 is a breadth first tree from a source node s in the network. For example, FIG.5 is a flow chart of a BFS routine 500 that is applicable in the graph search sub-block 231.
[0048] At S520, data is input to the graph search sub-block 231, such as one of the plurality of the data subsets.
[0049] At S530, each node is mapped to a thread on one CUDA streaming multiprocessor (parallel processing unit). At S540, based on the topological features, the graph search sub-block 231 specifies a source node s and an initial frontier set F. Let an array F denote the frontier of the search and an array X denote the visited nodes. During each iteration, every frontier node explores its neighbor nodes and adds them to the frontier node set F. At S550, the graph search sub-block 231 finds the connected nodes of F in the next level and adds them to the visited array X. In the meantime, upon the completion of searching neighbor nodes, the current node adds itself to the visited node set X. At S560, the frontier node set F is updated. For example, if the current path is shorter than the existing path, then the length of the path is updated and one iteration is completed.
[0050] Then, at S570, it is determined whether all the nodes have been discovered. If there are nodes still to be discovered, the routine 500 returns to S550 and starts another iteration. If the frontier set F is empty and there is no node left to be discovered, the routine of the graph search sub-block 231 terminates at S580, where a breadth first tree is generated. The breadth first tree contains information on the betweenness of its edges.
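By way of a non-limiting illustration, one level-synchronous iteration of the BFS routine 500 may be implemented as the CUDA kernel sketched below. The sketch assumes a CSR (compressed sparse row) adjacency layout; the names (bfs_level_kernel, row_offsets, col_indices and so on) are illustrative assumptions and do not correspond to the actual implementation of the embodiment.

// One level-synchronous BFS step (S550-S560): each thread owns one node; a node
// on the current frontier F visits its neighbors, adds undiscovered neighbors to
// the next frontier, and adds itself to the visited set X.
__global__ void bfs_level_kernel(const int *row_offsets,    // CSR row pointers, size n+1
                                 const int *col_indices,    // CSR column indices
                                 int *level,                // BFS depth per node, -1 if undiscovered
                                 bool *frontier,            // F: nodes discovered in the previous step
                                 bool *next_frontier,       // frontier for the next step
                                 bool *visited,             // X: nodes already explored
                                 int n, int current_level, bool *done)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= n || !frontier[v]) return;

    frontier[v] = false;            // leave the current frontier
    visited[v]  = true;             // the current node adds itself to X
    for (int i = row_offsets[v]; i < row_offsets[v + 1]; ++i) {
        int u = col_indices[i];
        if (level[u] == -1) {       // undiscovered neighbor in the next level
            level[u] = current_level + 1;
            next_frontier[u] = true;
            *done = false;          // at least one node is still to be discovered
        }
    }
}

On the host side, the done flag is set to true before each launch; the frontier and next_frontier buffers are swapped after each launch, and the loop stops once the flag remains true (S570), at which point the level array encodes the breadth first tree of S580. Concurrent writes to level[u] by different threads all store the same value in this scheme and are therefore benign.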
[0051] The graph search sub-block 231 may further or alternatively indicate DFS (Depth First Search), which is well known in the art and can be applied in the present invention similarly to BFS.

[0052] When the task block comprises a combination of the graph search sub-block 231 and the centrality measure sub-block 235, serially in this order, the centrality measure sub-block 235 has a routine to cause the parallel processing units to calculate a breadth first tree for each of the nodes. FIG.6 is a flow chart of the routine 600 indicated by the centrality measure sub-block 235.
[0053] At S610 of the routine 600, the centrality measure sub-block 235 may call the graph search sub-block 231 to obtain the breadth first trees for the nodes in the network. At S620, it is determined whether the breadth first trees for all of the nodes as sources have been found. If it is determined that there is still a breadth first tree to be found for a certain node, the routine 600 returns to S610 to obtain the tree with that node as the source.
[0054] If it is determined at S620 that all the breadth first trees have been found, the routine goes to S630, where a parallel reduction is performed on the betweenness of each edge to obtain correlation coefficients between the nodes. The correlation coefficient of an edge is obtained using the formula w/b, where w denotes the weight of the edge from the complex network, and b denotes the betweenness of the edge.
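Purely as an illustrative sketch, the reduction of S630 may be written as the CUDA kernel below, which assumes that the per-source betweenness contributions produced by the routines above have been stored in a dense array of size num_sources x num_edges. The names (edge_correlation_kernel, betweenness_per_source and so on) are illustrative assumptions, and the reduction is written as a per-edge loop over sources rather than a tree-shaped reduction purely for brevity.

// S630: accumulate the betweenness b of each edge over all source nodes and
// form the correlation coefficient w / b for that edge.
__global__ void edge_correlation_kernel(const float *betweenness_per_source, // [num_sources * num_edges]
                                        const float *edge_weights,           // w: one weight per edge
                                        float *corr_coeff,                   // output: w / b per edge
                                        int num_sources, int num_edges)
{
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= num_edges) return;

    float b = 0.0f;                                        // total betweenness of edge e
    for (int s = 0; s < num_sources; ++s)
        b += betweenness_per_source[s * num_edges + e];

    // Edges lying on no shortest path (b == 0) are given a coefficient of zero
    // here; their handling is not specified in the embodiment.
    corr_coeff[e] = (b > 0.0f) ? edge_weights[e] / b : 0.0f;
}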
[0055] The routines 500 and 600 are parallel processes and can be performed on the parallel processing units 310, respectively, to obtain parallel results therefrom, such as the correlation coefficients described above.
[0056] In such case, the classification means 340 uses the global correlation coefficients input from the centrality measure sub-block 235 to obtain the modular structures of the network. FIG.7 is a flow chart of a routine 700 performed by the classification means 340.
[0057] At S710, the correlation coefficients are input from all of the plurality of parallel processing units 310, as the parallel results. At S720, the classification means 340 identifies the edge with the largest correlation coefficient.
[0058] At S730, the edge with the largest correlation coefficient is deleted. Then, at S740, it is determined whether the network with this edge deleted satisfies the condition specified by the task parameter. For example, it is determined at S740 whether all the communities remaining in the network have sizes smaller than the maximum size.
[0059] If the condition is not satisfied at S740, the routine 700 goes to S705, where the centrality measure sub-block 235 is called again to obtain correlation coefficients for the network with the edge deleted.

[0060] If the condition is satisfied at S740, the routine 700 terminates at S750, where a modular structure with modules satisfying the condition specified by the task parameter is obtained.
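By way of a non-limiting illustration, the control flow of the routine 700 may be sketched on the CPU side as follows. The helper functions compute_correlation_coefficients and community_sizes, as well as the Edge and Graph types, are hypothetical placeholders for the GPU routines 500/600 and for a connected-component pass over the current network.

#include <vector>
#include <algorithm>
#include <iterator>

struct Edge  { int u, v; double weight; };
struct Graph { int num_nodes; std::vector<Edge> edges; };

// Placeholders for the parallel routines described above (declarations only).
std::vector<double> compute_correlation_coefficients(const Graph &g); // w / b per edge (S705/S710)
std::vector<int>    community_sizes(const Graph &g);                  // size of each remaining community

void routine_700(Graph &g, int max_community_size)
{
    while (!g.edges.empty()) {
        std::vector<double> cc = compute_correlation_coefficients(g);  // S710 (or S705 on re-entry)

        // S720/S730: identify and delete the edge with the largest coefficient.
        auto best = std::distance(cc.begin(), std::max_element(cc.begin(), cc.end()));
        g.edges.erase(g.edges.begin() + best);

        // S740: check the condition specified by the task parameter.
        std::vector<int> sizes = community_sizes(g);
        bool satisfied = std::all_of(sizes.begin(), sizes.end(),
                                     [&](int s) { return s < max_community_size; });
        if (satisfied) break;                                          // S750: modular structure obtained
    }
}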
[0061] On the other hand, the shortest path sub-block 233 may have a routine for finding shortest paths between pairs of nodes in the graph (network). For example, routines for APSP (All Pairs Shortest Path), SPSP (Single Pair Shortest Path), SSSP (Single Source Shortest Path), SDSP (Single Destination Shortest Path) or the like can be applied in the shortest path sub-block 233. These shortest path methods are all well known in the art and are therefore not described in detail herein.
[0062] When the task block comprises a combination of the shortest path sub-block 233 and the centrality measure sub-block 235 serially in this order, the centrality measure sub-block 235 has a routine to cause the parallel processing units to build a hierarchical clustering tree.
[0063] In such a case, the classification means 340 cuts the hierarchical clustering tree using the task parameters as constraints to obtain the modular structure.
[0064] Thus, a modular structure with modules satisfying the condition specified in the task parameter is obtained.
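For illustration only, one simple way of cutting the hierarchical clustering tree under a maximum-community-size constraint is sketched below. The Cluster type, the max_size parameter and the recursive strategy are illustrative assumptions; the embodiment does not prescribe a particular cutting procedure beyond using the task parameters as constraints.

#include <vector>
#include <cstddef>

// A node of the hierarchical clustering tree: a leaf holds one network node,
// an internal node holds the union of its two merged sub-clusters.
struct Cluster {
    std::vector<int> members;                     // network nodes contained in this cluster
    Cluster *left = nullptr, *right = nullptr;
};

// Emit a cluster as a module if it satisfies the size constraint,
// otherwise split it further by descending into its children.
void cut_tree(const Cluster *c, std::size_t max_size,
              std::vector<std::vector<int>> &modules)
{
    if (c == nullptr) return;
    bool is_leaf = (c->left == nullptr && c->right == nullptr);
    if (c->members.size() <= max_size || is_leaf) {
        modules.push_back(c->members);            // this cluster becomes one module
        return;
    }
    cut_tree(c->left,  max_size, modules);
    cut_tree(c->right, max_size, modules);
}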
[0065] In traditional hierarchical clustering algorithms, a typical procedure is as follows. A hierarchical clustering procedure produces a series of partitions of the data, Pn, Pn-1, ..., P1. The first partition Pn consists of n single-object 'clusters', and the last partition P1 consists of a single group containing all n cases. At each stage the method joins together the two clusters that are closest together (most similar). (At the first stage, of course, this amounts to joining together the two objects that are closest together, since at the initial stage each cluster has one object.) Differences between methods arise because of the different ways of defining the distance (or similarity) between clusters. In this embodiment, the distance is calculated using a matrix multiplication-based all-pairs shortest path algorithm.
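As a non-limiting illustration, the distance computation mentioned above amounts to repeatedly squaring the weighted adjacency matrix W in the min-plus (tropical) semiring until the all-pairs distance matrix D stops changing, which takes on the order of log2(n) iterations:

\[
D^{(0)} = W, \qquad D^{(k+1)}[i][j] \;=\; \min_{1 \le l \le n} \left( D^{(k)}[i][l] + D^{(k)}[l][j] \right).
\]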
[0066] Our implementation modifies the matrix multiplication routine given by Volkov and Demmel by replacing the multiplication and addition operations with addition and minimum operations. Shared memory is used as a user-managed cache to improve performance. Volkov and Demmel bring sections of matrices R, C and Di into shared memory in blocks: R is brought in 64x4 blocks, C in 16x16 blocks and Di in 64x16 blocks. These values are selected to maximize throughput of the CUDA device. During execution, each thread block computes a 64x16 block of Di. Algorithm 1 describes the modified matrix multiplication kernel. Please see http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-49.html for full details on the matrix multiplication kernel.
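By way of a non-limiting illustration, a deliberately simplified, shared-memory-tiled min-plus kernel is sketched below. It shows only the substitution of multiplication/addition by addition/minimum; it does not reproduce the 64x4 / 16x16 / 64x16 blocking of the Volkov-Demmel kernel referenced above, and the tile size and names are illustrative assumptions.

#define TILE 16
#define INF  1e30f

// One min-plus "multiplication": D_out[i][j] = min_k ( D[i][k] + D[k][j] ).
// The multiply-add of an ordinary matrix product is replaced by add-minimum.
__global__ void minplus_kernel(const float *D, float *D_out, int n)
{
    __shared__ float As[TILE][TILE];      // tile of rows of D
    __shared__ float Bs[TILE][TILE];      // tile of columns of D

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float best = INF;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int ka = t * TILE + threadIdx.x;  // column index loaded into As
        int kb = t * TILE + threadIdx.y;  // row index loaded into Bs
        As[threadIdx.y][threadIdx.x] = (row < n && ka < n) ? D[row * n + ka] : INF;
        Bs[threadIdx.y][threadIdx.x] = (kb < n && col < n) ? D[kb * n + col] : INF;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            best = fminf(best, As[threadIdx.y][k] + Bs[k][threadIdx.x]);   // add + min
        __syncthreads();
    }
    if (row < n && col < n) D_out[row * n + col] = best;
}

The host launches this kernel repeatedly, swapping D and D_out, for roughly log2(n) iterations (or until the distance matrix stops changing), with D initialized to the weighted adjacency matrix.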
[0067] The invention provides a GPU-based parallel computing system to identify the modular structures in complex networks, significantly reduces the execution time and cost of the computation, and provides a complex network research platform for commercial and academic entities.
[0068] FIG.8 shows a flow chart of a method 2000 of identifying modular structure in a complex network using a computing system with a CPU and a parallel processing device (e.g., a GPU) in accordance with an embodiment of the present invention. The method 2000 may be performed by the respective components of the computing system 1000 in FIG.1.
[0069] In particular, at S2100, task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameters for a task to be performed on the complex network, may be read by the data reading means 210 on the CPU 200.
[0070] At S2200, the determining means 240 on the CPU 200 determines a task block from a predefined set of sub-blocks, each of which indicates a particular process, according to the task data, and transfers the task block to the first interface 250 on the CPU 200. The task block is used for assigning the subtask process to be performed in the plurality of parallel processing units 310-1, 310-2, ..., 310-N on the parallel processing device 300, respectively, where N is an integer.
[0071] At S2220, the task block is transferred by the first interface 250 to a first frontend 320 on the parallel processing device 300. Then, at S2240, the first frontend passes the task block to an assembler means 330 on the parallel processing device 300.
[0072] At S2260, the assembler means 330 generates the subtask process readable by the plurality of parallel processing units 310 from the task block and assigns the subtask process to the plurality of parallel processing units 310.
[0073] On the other hand, at S2300, the dispatcher means 220 may divide the task data into a plurality of data subsets with respect to the plurality of parallel processing units 310.
[0074] At S2320, the second interface 255 on the CPU 200 transfers the plurality of data subsets to a second frontend 325 on the parallel processing device. Then, at S2340, the second frontend 325 passes the plurality of data subsets to the plurality of parallel processing units, respectively.
[0075] Having received the subtask process and the data subsets, at S2400 the plurality of parallel processing units 310-1, 310-2, ..., 310-N perform the subtask process assigned by the assembler means on the data subsets, respectively, in parallel to obtain parallel results.
[0076] At S2500, the classification means 340 (located in the CPU 200 or GPU 300) processes the parallel results to obtain the modular structure in the complex network.
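As a rough, non-limiting host-side illustration of S2300-S2400, the sketch below divides the per-source workload into N subsets and launches one kernel per subset on its own CUDA stream, treating each parallel processing unit as one stream purely for illustration. The kernel subtask_kernel and the partitioning of the task data by source node are assumptions made only to keep the example short.

#include <cuda_runtime.h>
#include <vector>
#include <algorithm>

// Hypothetical subtask kernel: processes the source nodes in [first, last).
__global__ void subtask_kernel(const int *graph_data, float *results, int first, int last);

void dispatch_subtasks(const int *d_graph, float *d_results,
                       int num_sources, int num_units /* N parallel processing units */)
{
    std::vector<cudaStream_t> streams(num_units);
    int chunk = (num_sources + num_units - 1) / num_units;

    for (int i = 0; i < num_units; ++i) {                 // S2300/S2340: one data subset per unit
        cudaStreamCreate(&streams[i]);
        int first = i * chunk;
        int last  = std::min(num_sources, first + chunk);
        if (first < last)                                 // S2400: perform the subtasks in parallel
            subtask_kernel<<<(last - first + 255) / 256, 256, 0, streams[i]>>>(
                d_graph, d_results, first, last);
    }
    cudaDeviceSynchronize();                              // gather the parallel results before S2500
    for (cudaStream_t &s : streams) cudaStreamDestroy(s);
}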
[0077] The method of identifying modular structure in a complex network according to the present invention may incorporate one or more of the aspects described above with reference to FIGs.1-7.
[0078] For example, the method may further comprise extracting a complex network data representing the nodes and the edges with values and storing the complex network data as a part of the task data.
[0079] Also, the task block may comprise a combination of a graph search sub-block and a centrality measure sub-block, or a combination of a shortest path sub-block and a centrality measure sub-block. The routines described in FIGs.5-7 can also be applied in the method 2000.
[0080] It should be noted that the steps in method 2000 as shown in FIG.8 do not have to be performed in the order as shown. For example, steps S2200-S2260 may be performed after or at the same time as S2300-S2340.
[0081] As can be appreciated by one skilled in the art, a computer system with an associated computer-readable medium containing instructions for controlling the computer system can be utilized to implement the exemplary embodiments that are disclosed herein. The computer system may include at least one computer such as a microprocessor, digital signal processor, and associated peripheral electronic circuitry.
[0082] Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

What is claimed is:
1. A method of identifying a modular structure in a complex network using a computing system with a CPU and a parallel processing device, the method comprising
reading, by a data reading means on the CPU, task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network;
determining, by a determining means on the CPU, a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from a predefined set of sub-blocks each of which indicates a particular process, according to the task data, and transferring the task block to a first interface on the CPU;
transferring the task block, by the first interface, to a first frontend on the parallel processing device;
passing, by the first frontend, the task block to an assembler means on the parallel processing device;
generating, by the assembler means, the subtask process readable by the plurality of parallel processing units from the task block and assigning the subtask process to the plurality of parallel processing units;
dividing, by a dispatcher means on the CPU, the task data into a plurality of data subsets with respect to the plurality of parallel processing units;
transferring, by a second interface on the CPU, the plurality of data subsets to a second frontend on the parallel processing device;
passing, by the second frontend, the plurality of data subsets to the plurality of parallel processing units, respectively;
performing in parallel, by the plurality of parallel processing units, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results; and processing, by a classification means, the parallel results to obtain the modular structure in the complex network.
2. The method of claim 1, wherein the parallel processing device is a Graphic Processing Unit, or is distributed on a local area network or a wide area network.
3. The method of claim 1, wherein the task block comprises a combination of a graph search sub-block and a centrality measure sub-block, or a combination of a shortest path sub-block and a centrality measure sub-block.
4. The method of claim 1, wherein the nodes represent different genes or proteins, and the values represent specific interactions among the nodes.
5. The method of claim 1, wherein the nodes represent different members in a group, and the values represent specific relationships among the nodes.
6. The method of claim 1, further comprising:
extracting, by a network data means on the CPU, a complex network data representing the nodes and the edges with values; and
storing, by a data storage means on the CPU, the complex network data as a part of the task data.
7. The method of claim 1, wherein the modular structure is comprised of a plurality of communities each of which contains the nodes that have closer relationships.
8. A tool suite device for identifying a modular structure in a complex network using a computing system with a CPU and a parallel processing device, the tool suite device comprising
a data reading means on the CPU for reading task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network;
a block storage means on the CPU for storing a predefined set of sub-blocks each of which indicates a particular process;
a determining means on the CPU for determining a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from the predefined set of sub-blocks stored in the block storage means according to the task data;
a first interface on the CPU for receiving the task block transferred from the determining means;
a dispatcher means on the CPU for dividing the task data into a plurality of data subsets with respect to the plurality of parallel processing units;
a second interface on the CPU for receiving the task data transferred from the dispatcher means;
a first frontend on the parallel processing device connected to the first interface for receiving the task block transferred from the first interface;
an assembler means on the parallel processing device for receiving the task block passed from the first frontend, generating the subtask process readable by the plurality of parallel processing units from the task block and assigning the subtask process to the plurality of parallel processing units;
a second frontend on the parallel processing device connected to the second interface for receiving the plurality of data subsets from the second interface and passing the plurality of data subsets to the plurality of parallel processing units, respectively;
the plurality of parallel processing units for performing in parallel, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results; and a classification means for processing the parallel results to obtain the modular structure in the complex network.
9. The tool suite device of claim 8, wherein the parallel processing device is a
Graphic Processing Unit, or is distributed on a local area network or a wide area network.
10. The tool suite device of claim 8, wherein the task block comprises a combination of a graph search sub-block and a centrality measure sub-block, or a combination of a shortest path sub-block and a centrality measure sub-block.
11. The tool suite device of claim 8, wherein the nodes represent different genes or proteins, and the edges with values represent specific interactions among the nodes.
12. The tool suite device of claim 8, wherein the nodes represent different members in a group, and the edges with values represent specific relationships among the nodes.
13. The tool suite device of claim 8, further comprising:
a network data unit for extracting a graph representation for the nodes and the edges with values; and
a data storage unit for storing the graph representation as the task data.
14. The tool suite device of claim 8, wherein the modular structure is comprised of a plurality of communities each of which contains the nodes that have closer relationships.
15. A system for identifying modular structure in a complex network comprising a CPU and a parallel processing device, wherein
the CPU includes
a data reading means for reading task data which includes nodes in the complex network, edges with values indicating relationships among the nodes, and task parameter for a task to be performed on the complex network;
a block storage means for storing a predefined set of sub-blocks each of which indicates a particular process;
a determining means for determining a task block for assigning subtask process to be performed in a plurality of parallel processing units on the parallel processing device, respectively, from the predefined set of sub-blocks stored in the block storage means according to the task data;
a first interface for receiving the task block transferred from the determining means;
a dispatcher means for dividing the task data into a plurality of data subsets with respect to the plurality of parallel processing units; and a second interface for receiving the task data transferred from the dispatcher means, and
the parallel processing device includes:
a first frontend on the parallel processing device connected to the first interface for receiving the task block transferred from the first interface;
an assembler means on the parallel processing device for receiving the task block passed from the first frontend, generating the subtask process readable by the plurality of parallel processing units from the task block and assigning the subtask process to the plurality of parallel processing units;
a second frontend on the parallel processing device connected to the second interface for receiving the plurality of data subsets from the second interface and passing the plurality of data subsets to the plurality of parallel processing units, respectively; the plurality of parallel processing units for performing in parallel, the subtask process assigned by the assembler means on the data subsets, respectively, to obtain parallel results; and
a classification means for processing the parallel results to obtain the modular structure in the complex network.
PCT/CN2010/077949 2010-10-21 2010-10-21 Method and tool suite device for identifying modular structure in complex network WO2012051757A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201080051364.2A CN102667710B (en) 2010-10-21 2010-10-21 Method and tool suite device for identifying modular structure in complex network
PCT/CN2010/077949 WO2012051757A1 (en) 2010-10-21 2010-10-21 Method and tool suite device for identifying modular structure in complex network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/077949 WO2012051757A1 (en) 2010-10-21 2010-10-21 Method and tool suite device for identifying modular structure in complex network

Publications (1)

Publication Number Publication Date
WO2012051757A1 true WO2012051757A1 (en) 2012-04-26

Family

ID=45974623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/077949 WO2012051757A1 (en) 2010-10-21 2010-10-21 Method and tool suite device for identifying modular structure in complex network

Country Status (2)

Country Link
CN (1) CN102667710B (en)
WO (1) WO2012051757A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10255105B2 (en) * 2017-04-11 2019-04-09 Imagination Technologies Limited Parallel computing architecture for use with a non-greedy scheduling algorithm

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030009621A1 (en) * 2001-07-06 2003-01-09 Fred Gruner First tier cache memory preventing stale data storage
WO2006044258A1 (en) * 2004-10-12 2006-04-27 International Business Machines Corporation Optimizing layout of an aplication on a massively parallel supercomputer
CN101278257A (en) * 2005-05-10 2008-10-01 奈特希尔公司 Method and apparatus for distributed community finding
CN101272328A (en) * 2008-02-29 2008-09-24 吉林大学 Dispersion type community network clustering method based on intelligent proxy system
CN101383748A (en) * 2008-10-24 2009-03-11 北京航空航天大学 Community division method in complex network

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014020122A1 (en) * 2012-08-01 2014-02-06 Netwave System for processing data for connecting to a platform of an internet site
FR2994358A1 (en) * 2012-08-01 2014-02-07 Netwave SYSTEM FOR PROCESSING CONNECTION DATA TO A PLATFORM OF AN INTERNET SITE
CN104737520A (en) * 2012-08-01 2015-06-24 诺夫尔公司 System for processing data for connecting to a platform of an Internet site
CN104737159A (en) * 2012-08-01 2015-06-24 诺夫尔公司 Data-processing method for situational analysis
RU2654171C2 (en) * 2012-08-01 2018-05-16 Нетвэйв System for processing data for connecting to platform of internet site
CN104737159B (en) * 2012-08-01 2018-05-25 诺夫尔公司 For the data processing method of scenario analysis

Also Published As

Publication number Publication date
CN102667710A (en) 2012-09-12
CN102667710B (en) 2014-09-03

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 201080051364.2; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10858545; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10858545; Country of ref document: EP; Kind code of ref document: A1)