WO2014092536A1 - A system and method for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks


Publication number
WO2014092536A1
WO2014092536A1 PCT/MY2013/000233 MY2013000233W WO2014092536A1 WO 2014092536 A1 WO2014092536 A1 WO 2014092536A1 MY 2013000233 W MY2013000233 W MY 2013000233W WO 2014092536 A1 WO2014092536 A1 WO 2014092536A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
network
graph
nodes
analysis
Prior art date
Application number
PCT/MY2013/000233
Other languages
French (fr)
Inventor
Ambiah Norbaitiah
Sieow Yeek Tan
Lukose Dickson
Original Assignee
Mimos Berhad
Priority date
Filing date
Publication date
Application filed by Mimos Berhad
Publication of WO2014092536A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Definitions

  • the present invention relates to a system and method for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment.
  • the present invention utilizes a pruned network which extracts the Sub Graph from network graph based on feature set extraction (non-dependent Sub Graph) and estimating the resource cost required to perform each of the given task which further maps the said task to the appropriate server.
  • Existing mechanisms relating to data parallelism and task parallelism are described in United States Patent No. 6,675,189 B2 (hereinafter referred to as the US 189 Patent).
  • computer system i.e. memory utilization and Central Processing Unit (CPU)
  • CPU Central Processing Unit
  • the said system as described in the US 189 Patent deals only with single server instead of resources of multiple servers.
  • application Data Color Tracker system as described in the US 189 Patent does not provide for network graph analysis task as the distribution plan focus only on Color Tracker System Task Distribution wherein task is divided into several subtasks.
  • Jiang's paper proposes task allocation plan and load balancing wherein available resources are identified and it involves resources of multiple servers.
  • Task allocation model is based on contextual resource negotiation in the complex software system.
  • a principal agent that has a high contextual enrichment factor is provided with the required resources, wherein the principal agent negotiates with its contextual agents to execute the assigned task, and the acceptance capabilities of the principal agent and contextual agents are reduced upon receiving an overly large number of new tasks.
  • Another mechanism for task distribution is described in a published paper entitled “Assignment and Scheduling Communicating Periodic Tasks in Distributed Real-Time Systems” by Dar-Tzen Peng et al., published in IEEE Transactions on Software Engineering, Vol. 23, No. 12, December 1997 (hereinafter referred to as Peng's paper).
  • task allocation and distribution plan is provided by allocation of communicating periodic tasks to heterogeneous processing nodes (PNs) in a distributed real-time system. Resource cost is proposed using an algorithm.
  • Tasks are represented by a task graph and are then assigned to PNs using a branch and bound (B&B) search algorithm; the said task graph is pruned in polynomial time, and no feature set extractor (i.e. temporal/spatial/user-defined extractor) as provided in the present invention is available.
  • B&B branch and bound
  • the present invention relates to a system and method for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment.
  • One aspect of the present invention provides a system for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks.
  • the system comprising at least one Processing Environment Profiler (202); at least one Network Graph Analysis Task Profiler (204); at least one Resource Cost Analyzer (206); at least one Distribution Planner (208); and at least one Task Distributer (210).
  • the at least one Network Graph Analysis Task Profiler (204) further comprises at least one Network Graph Pruning module having means to eliminate unnecessary links and nodes from network graph to produce accurate analysis.
  • Another aspect of the invention provides for the at least one Processing Environment Profiler (202) wherein said Processing Environment Profiler (202) further having means for profiling each processing environment by calculating work load of each analysis server using workload analyzer; and identifying any free analysis server and available computational resources.
  • a further aspect of the invention provides for the at least one Resource Cost Analyzer (206) wherein said Resource Cost Analyzer (206) further having means for calculating resource cost for each task by calculating resource cost for each sub graph by calculating memory cost of sub graph based on summation of memory cost required to process all nodes and links in sub graph; and calculating total resource cost for each task based on sub graph resource cost plus pre-defined application resource cost.
  • Yet another aspect of the invention provides for the at least one Distribution Planner (208) wherein said Distribution Planner (208) further having means for generating distribution plan for each task by matching resource cost with environment profile; and identifying most appropriate server based on distribution heuristic.
  • the invention provides for the at least one Task Distributer (210) wherein said Task Distributer (210) further having means for distributing task to appropriate analysis server.
  • the invention provides method (300) for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment. The method comprising steps of profiling each processing environment (302); profiling network graph analysis task (304); calculating resource cost for each task (306); generating distribution plan for each task (308); and distributing task to appropriate analysis server (310).
  • SNA social network analysis
  • the step of profiling network graph analysis task further comprises network graph pruning (400) to eliminate unnecessary links and nodes from network graph to produce accurate analysis by identifying candidate links from first candidate link (402, 404); removing candidate link (406); and identifying associated nodes by first determining existence of nodes (408). If nodes exist; the said methodology proceeds by determining if there is dependency on associated nodes (412); removing associated node if said node does not have any dependency (416); and iterating step to identify existence of nodes when there is dependency of node with associated nodes. If nodes do not exist; the said methodology proceeds by checking next candidate link to determine existence of link to arrive at a pruned network when there are no links to candidate (418); and iterating step to remove candidate link and subsequent steps if there is a link to candidate.
  • profiling each processing environment further comprises steps of calculating work load of each analysis server using workload analyzer; and identifying any free analysis server and available computational resources.
  • the invention provides a method wherein profiling network graph analysis task upon arriving at a pruned network further comprises steps of extracting data from pruned network using an extractor engine for a specific task to obtain sub graph network (502, 504, 506); and analyzing sub graph network to obtain sub graph network data which includes graph density, node description set and link description set (516, 518, 520).
  • calculating resource cost for each task further comprises steps of calculating resource cost for each sub graph by calculating memory cost of sub graph based on summation of memory cost required to process all nodes and links in sub graph (602); and calculating total resource cost for each task based on sub graph resource cost plus pre-defined application resource cost (604).
  • the invention provides a method wherein generating distribution plan for each task further comprises steps of matching resource cost with environment profile; and identifying most appropriate server based on distribution heuristic.
  • FIG. 1.0 illustrates a general background problem of performing network graph analysis tasks in a distributed environment.
  • FIG. 2.0 illustrates a general system in accordance with the present invention.
  • FIG. 3.0 is a flowchart illustrating an embodiment of the method of the present invention.
  • FIG. 4.0 is a flowchart illustrating the methodology of network pruning of the present invention.
  • FIG. 5.0 illustrates Feature Set Extractor and Sub Graph Profile Generator in profiling network graph analysis task upon arriving at a pruned network.
  • FIG. 6.0 illustrates Resource Cost Analyzer and Distribution Planner.
  • FIG. 7.0 illustrates dynamic profiling of the processing environment.
  • FIG. 8.0 illustrates an example of network graph pruning.
  • FIG. 9.0 illustrates an example of dynamically calculating resource cost for each task.
  • FIG. 10.0 illustrates an example of execution of the Distribution Planner.
  • FIG. 11.0 illustrates an example of Task Distributer.
  • the present invention provides a system and method for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment.
  • the present invention utilizes a pruned network which extracts the Sub Graph from network graph based on feature set extraction (non- dependent Sub Graph) and estimating the resource cost required to perform each of the given tasks which further maps the said task to the appropriate server.
  • referring to FIG. 1.0, network graph analysis tasks execute inefficiently when there is no proper distribution plan, because network graph data must be communicated between computer units. Consequently, huge network graph data leads to longer response times in obtaining analysis results, which degrades application and system performance.
  • turning to FIG. 2.0, the architecture of the system (200) of an embodiment of the present invention is illustrated.
  • the system of the present invention for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment comprising a Processing Environment Profiler (202); a Network Graph Analysis Task Profiler (204); a Resource Cost Analyzer (206); a Distribution Planner (208); and a Task Distributer (210).
  • the methodology of an embodiment of the present invention is illustrated generally in FIG. 3.0, while FIGs. 4.0, 5.0 and 6.0 respectively illustrate the process flow of the components of the system and examples of the execution process of the components of the system of the present invention.
  • the method (300) for dynamic generation of a distribution plan for intensive social network analysis (SNA) tasks in a distributed environment comprises the steps of first profiling each processing environment (302), wherein a Processing Environment Profiler (202) profiles each processing environment (700) by calculating the work load of each analysis server using a workload analyzer. A plurality of parameters serve as indicators, such as CPU utilization, memory usage and the execution time of a unit on each dedicated server. Free analysis servers and available computational resources are identified to determine the number of free analysis servers as well as the computational resources available on the servers.
  • a Network Graph Analysis Task Profiler (204) profiles network graph analysis task to eliminate unnecessary links and nodes from network graph to produce accurate analysis.
  • the said Network Graph Analysis Task Profiler (204) further utilizes a Network Graph Pruning module to prune the network by first identifying candidate links, starting from the first candidate link (402, 404), and removing the identified candidate link (406). Thereafter, associated nodes are identified by first determining whether such nodes exist in the network (408). If nodes exist, their dependency on associated nodes is determined (412), and an associated node is removed if it does not have any dependency (416). When a node does have a dependency on associated nodes, the step of identifying the existence of nodes is iterated. If no nodes exist, the next candidate link is checked (418); the pruned network is obtained when no candidate links remain.
  • the pruned network will be forwarded to Feature Set Extractor wherein Feature Set Extractor extracts the data from the pruned network using an extractor engine particularly to the specific task to obtain sub graph network (502, 504, 506).
  • a Temporal Feature Extractor engine will extract the data based on a temporal feature set (e.g. time range, day, month, year, etc.).
  • a Spatial Feature Extractor engine will extract the data based on a geo-spatial feature set (e.g. location, latitude, longitude, etc.).
  • the User-Defined Extractor engine is specified for any kind of task defined by the user.
  • the task can be any network graph analysis of particular properties, and the feature set will be based on those specified properties.
  • the result of this process will be a sub graph network which will be processed further in Sub Graph Profile Generator.
  • the Sub Graph Profile Generator analyzes the sub graph network (516, 518, 520).
  • a plurality of parameters that can be obtained from the sub graph network data includes the graph density, the node description set (e.g. number of nodes, type of nodes) and the link description set (e.g. number of links, direction of links, weight of links, type of links).
  • the output of this process is a sub graph profile which will be forwarded to the Resource Cost Analyzer.
  • the Resource Cost Analyzer calculates the resource cost for each task of the sub graph network. It takes the information of nodes and links from the sub graph profile and calculates the resource cost required to process the sub graph. For example, the memory cost required for the sub graph is calculated based on the summation of memory cost needed to process all nodes and links in the sub graph (602). The total resource cost for each task is calculated based on the sub graph resource cost plus the pre-defined application resource cost where the pre-defined application resource is pre-calculated before the present invention process starts (604).
  • the result of Resource Cost Analyzer module is the total required cost to perform each dedicated task and said resource cost is forwarded to Distribution Planner module wherein the required resource for each task is mapped and matched to each of environment profile resulted from Processing Environment Profiler module (1000).
  • the process uses a set of distribution heuristic to identify the most appropriate server for each task (1000). Each task distributed to the most appropriate server is identified from the Distribution Planner module.
  • FIG. 7.0 illustrates the dynamic profiling of each processing environment module.
  • an automated back-end program can be developed to call the command "free -t -m" to get the memory usage (reported in mebibytes, with a totals row) and the command "top" to get the details of the resources occupied by each application running on the server.
  • FIG. 8.0 illustrates an example of the Network Graph Pruning process, a sub process under the Network Graph Task Profiler module, while FIG. 9.0 illustrates an example of another sub process under the Network Graph Task Profiler module, namely dynamically calculating the resource cost for each task. This illustrative example shows the methodology used to calculate the memory cost of performing each task.
  • FIG. 10.0 illustrates the example on execution of the Distribution Planner. As illustrated in FIG. 10.0, the resource cost for each task is matched to each environment profile in order to identify the most appropriate server to perform the task.
  • the Distribution Heuristic can be any combination of resource indicator (e.g. CPU level, Memory Space and etc.) and is used as the main factor to determine the right server.
  • FIG. 11.0 illustrates the network graph analysis task distributed to the appropriate server wherein each process involved is important and is a key factor to dynamically generate an appropriate distribution plan for intensive network graph analysis tasks.
  • the present invention dynamically generates a distribution plan for Intensive Social Network (SNA) Tasks by utilizing a pruned network which extracts the Sub Graph from network graph based on feature set extraction (non-dependent Sub Graph) and estimating the resource cost required to perform each of the given tasks which further map the said task to the appropriate server.
  • SNA Intensive Social Network
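
The back-end profiling program mentioned in the list above (calling "free -t -m") could be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names parse_free_output and read_memory_profile are hypothetical, the sample text is made-up illustrative output in the layout printed by recent procps versions of free, and values are in mebibytes because of the -m option.

```python
import subprocess


def parse_free_output(text):
    """Parse the table printed by `free -t -m` into {row: {column: MiB}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = lines[0].split()  # header row: total, used, free, ...
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # Rows such as "Swap:" and "Total:" carry fewer columns than "Mem:",
        # so zip() simply stops at the shorter sequence.
        table[label.rstrip(":")] = {c: int(v) for c, v in zip(columns, values)}
    return table


def read_memory_profile():
    """Run `free -t -m` on a Linux server and parse its output."""
    result = subprocess.run(
        ["free", "-t", "-m"], capture_output=True, text=True, check=True
    )
    return parse_free_output(result.stdout)


# Illustrative sample only (not measured data).
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:           7976        2103        3310         210        2562        5365
Swap:          2047           0        2047
Total:        10023        2103        5357
"""

profile = parse_free_output(SAMPLE)
print(profile["Mem"]["free"], profile["Total"]["total"])
```

Parsing by header name rather than by fixed column position keeps the sketch tolerant of rows that omit trailing columns.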

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system (200) and method (300) for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment comprising at least one Processing Environment Profiler (202); at least one Network Graph Analysis Task Profiler (204); at least one Resource Cost Analyzer (206); at least one Distribution Planner (208); and at least one Task Distributer (210). The at least one Network Graph Analysis Task Profiler (204) further comprises at least one Network Graph Pruning module having means to eliminate unnecessary links and nodes from network graph to produce accurate analysis. A distribution plan for Intensive Social Network (SNA) Tasks is achieved by utilizing a pruned network which extracts the Sub Graph from network graph based on feature set extraction (non-dependent Sub Graph) and estimating the resource cost required to perform each of the given tasks which further map the said task to the appropriate server.

Description

A SYSTEM AND METHOD FOR DYNAMIC GENERATION OF DISTRIBUTION PLAN FOR INTENSIVE SOCIAL NETWORK ANALYSIS (SNA) TASKS
FIELD OF INVENTION
The present invention relates to a system and method for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment. In particular, the present invention utilizes a pruned network which extracts the Sub Graph from network graph based on feature set extraction (non-dependent Sub Graph) and estimating the resource cost required to perform each of the given task which further maps the said task to the appropriate server.
BACKGROUND ART
Huge network graph data causes inefficient execution, especially in a distributed computing environment, resulting in low performance and long response times when Social Network Analysis (SNA) tasks are executed without a proper distribution plan. Further, load imbalance in the network wastes computational resources.
Existing mechanisms relating to data parallelism and task parallelism are described in United States Patent No. 6,675,189 B2 (hereinafter referred to as the US 189 Patent). In this system, computer system resources (i.e. memory utilization and Central Processing Unit (CPU) utilization) are measured while identifying available resources, in order to propose a task distribution plan for parallel scheduling of distribution. However, the system described in the US 189 Patent deals only with a single server rather than with the resources of multiple servers. Further, the Color Tracker application described in the US 189 Patent does not provide for network graph analysis tasks, as its distribution plan focuses only on Color Tracker System Task Distribution, wherein a task is divided into several subtasks.
Another mechanism for task allocation is described in a published paper entitled "Contextual Resource Negotiation-Based Task Allocation and Load Balancing in Complex Software Systems" by Yichuan Jiang and Jiuchuan Jiang, published in IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 5, May 2009 (hereinafter referred to as Jiang's paper). Jiang's paper proposes a task allocation plan and load balancing wherein available resources are identified, and it involves the resources of multiple servers. The task allocation model is based on contextual resource negotiation in a complex software system. A principal agent that has a high contextual enrichment factor is provided with the required resources, wherein the principal agent negotiates with its contextual agents to execute the assigned task, and the acceptance capabilities of the principal agent and contextual agents are reduced upon receiving an overly large number of new tasks.
Another mechanism for task distribution is described in a published paper entitled "Assignment and Scheduling Communicating Periodic Tasks in Distributed Real-Time Systems" by Dar-Tzen Peng et al., published in IEEE Transactions on Software Engineering, Vol. 23, No. 12, December 1997 (hereinafter referred to as Peng's paper). In Peng's paper, a task allocation and distribution plan is provided by allocating communicating periodic tasks to heterogeneous processing nodes (PNs) in a distributed real-time system. Resource cost is proposed using an algorithm. Tasks are represented by a task graph and are then assigned to PNs using a branch and bound (B&B) search algorithm; the said task graph is pruned in polynomial time, and no feature set extractor (i.e. temporal/spatial/user-defined extractor) as provided in the present invention is available.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.
SUMMARY OF INVENTION
The present invention relates to a system and method for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment. One aspect of the present invention provides a system for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks. The system comprising at least one Processing Environment Profiler (202); at least one Network Graph Analysis Task Profiler (204); at least one Resource Cost Analyzer (206); at least one Distribution Planner (208); and at least one Task Distributer (210). The at least one Network Graph Analysis Task Profiler (204) further comprises at least one Network Graph Pruning module having means to eliminate unnecessary links and nodes from network graph to produce accurate analysis.
Another aspect of the invention provides for the at least one Processing Environment Profiler (202) wherein said Processing Environment Profiler (202) further having means for profiling each processing environment by calculating work load of each analysis server using workload analyzer; and identifying any free analysis server and available computational resources. A further aspect of the invention provides for the at least one Resource Cost Analyzer (206) wherein said Resource Cost Analyzer (206) further having means for calculating resource cost for each task by calculating resource cost for each sub graph by calculating memory cost of sub graph based on summation of memory cost required to process all nodes and links in sub graph; and calculating total resource cost for each task based on sub graph resource cost plus pre-defined application resource cost.
Yet another aspect of the invention provides for the at least one Distribution Planner (208) wherein said Distribution Planner (208) further having means for generating distribution plan for each task by matching resource cost with environment profile; and identifying most appropriate server based on distribution heuristic. In still another aspect the invention provides for the at least one Task Distributer (210) wherein said Task Distributer (210) further having means for distributing task to appropriate analysis server. In another aspect the invention provides method (300) for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment. The method comprising steps of profiling each processing environment (302); profiling network graph analysis task (304); calculating resource cost for each task (306); generating distribution plan for each task (308); and distributing task to appropriate analysis server (310). The step of profiling network graph analysis task further comprises network graph pruning (400) to eliminate unnecessary links and nodes from network graph to produce accurate analysis by identifying candidate links from first candidate link (402, 404); removing candidate link (406); and identifying associated nodes by first determining existence of nodes (408). If nodes exist; the said methodology proceeds by determining if there is dependency on associated nodes (412); removing associated node if said node does not have any dependency (416); and iterating step to identify existence of nodes when there is dependency of node with associated nodes. If nodes do not exist; the said methodology proceeds by checking next candidate link to determine existence of link to arrive at a pruned network when there are no links to candidate (418); and iterating step to remove candidate link and subsequent steps if there is a link to candidate.
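The pruning steps (402 to 418) described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not define what makes a link a "candidate", so the sketch takes a caller-supplied predicate is_candidate (hypothetical), and it assumes a node has a "dependency" when at least one remaining link still touches it. The function name prune_network is also hypothetical.

```python
def prune_network(nodes, links, is_candidate):
    """Return (nodes, links) after removing candidate links and orphaned nodes.

    nodes: set of node identifiers
    links: set of (source, target) tuples
    is_candidate: assumed predicate deciding whether a link should be pruned
    """
    kept_nodes = set(nodes)
    kept_links = set(links)
    candidates = [link for link in links if is_candidate(link)]  # steps 402/404
    for link in candidates:
        kept_links.discard(link)  # step 406: remove the candidate link
        for node in link:  # step 408: nodes associated with the link
            if node not in kept_nodes:
                continue
            # Step 412 (assumed meaning): a node is still depended upon
            # when another remaining link uses it.
            has_dependency = any(node in other for other in kept_links)
            if not has_dependency:
                kept_nodes.discard(node)  # step 416: remove orphaned node
    # Step 418: no candidate links remain, so this is the pruned network.
    return kept_nodes, kept_links


# Usage: prune every link that touches node "D".
nodes_out, links_out = prune_network(
    {"A", "B", "C", "D"},
    {("A", "B"), ("B", "C"), ("C", "D")},
    lambda link: "D" in link,
)
print(sorted(nodes_out), sorted(links_out))
```

In the usage example, node "C" survives because the link ("B", "C") still depends on it, while "D" is removed together with its only link.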
In a further aspect the invention provides a method wherein profiling each processing environment further comprises steps of calculating work load of each analysis server using workload analyzer; and identifying any free analysis server and available computational resources.
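A minimal sketch of this profiling step is given below. The record layout (ServerSample), the function name profile_environment and the 10% idle-CPU threshold are all assumptions made for illustration; the text only states that work load is calculated per analysis server and that free servers and available resources are identified.

```python
from dataclasses import dataclass


@dataclass
class ServerSample:
    """One workload measurement for an analysis server (hypothetical layout)."""
    name: str
    cpu_utilization: float  # fraction between 0.0 and 1.0
    memory_total_mb: int
    memory_used_mb: int
    running_tasks: int


def profile_environment(samples, idle_cpu=0.10):
    """Build one environment profile per server.

    A server is treated as "free" when it runs no task and its CPU
    utilization is at or below idle_cpu (an assumed threshold).
    """
    profiles = []
    for s in samples:
        profiles.append({
            "server": s.name,
            "cpu_available": round(1.0 - s.cpu_utilization, 2),
            "memory_available_mb": s.memory_total_mb - s.memory_used_mb,
            "is_free": s.running_tasks == 0 and s.cpu_utilization <= idle_cpu,
        })
    return profiles


# Illustrative values only.
samples = [
    ServerSample("server-1", 0.85, 8192, 6144, 3),
    ServerSample("server-2", 0.05, 16384, 2048, 0),
]
profiles = profile_environment(samples)
free_servers = [p["server"] for p in profiles if p["is_free"]]
print(free_servers)
```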
In yet another aspect the invention provides a method wherein profiling network graph analysis task upon arriving at a pruned network further comprises steps of extracting data from pruned network using an extractor engine for a specific task to obtain sub graph network (502, 504, 506); and analyzing sub graph network to obtain sub graph network data which includes graph density, node description set and link description set (516, 518, 520). In a further aspect the invention provides a method wherein calculating resource cost for each task further comprises steps of calculating resource cost for each sub graph by calculating memory cost of sub graph based on summation of memory cost required to process all nodes and links in sub graph (602); and calculating total resource cost for each task based on sub graph resource cost plus pre-defined application resource cost (604).
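The sub graph profile and the two cost steps (602, 604) can be illustrated with a short sketch. The function names and the uniform per-node and per-link byte costs are assumptions; the text only states that the memory cost is the summation over all nodes and links, to which a pre-defined application resource cost is added. Graph density uses the standard definition: links divided by the number of possible links.

```python
def profile_sub_graph(nodes, links, directed=True):
    """Summarize a sub graph: node count, link count and graph density."""
    n, m = len(nodes), len(links)
    # Possible links: n(n-1) for a directed graph, n(n-1)/2 for undirected.
    possible = n * (n - 1) if directed else n * (n - 1) // 2
    density = m / possible if possible else 0.0
    return {"node_count": n, "link_count": m, "density": density}


def total_resource_cost(profile, node_cost_bytes, link_cost_bytes,
                        application_cost_bytes):
    """Memory cost of a task in bytes.

    Step 602: sum the cost of all nodes and links (uniform costs assumed).
    Step 604: add the pre-defined application resource cost.
    """
    sub_graph_cost = (profile["node_count"] * node_cost_bytes
                      + profile["link_count"] * link_cost_bytes)
    return sub_graph_cost + application_cost_bytes


# Illustrative values only.
sub_graph = profile_sub_graph(
    {"A", "B", "C", "D"}, {("A", "B"), ("B", "C"), ("C", "D")}
)
cost = total_resource_cost(sub_graph, node_cost_bytes=64, link_cost_bytes=32,
                           application_cost_bytes=1024)
print(sub_graph["density"], cost)
```

With 4 nodes and 3 directed links the density is 3 / 12 = 0.25, and the cost is 4 × 64 + 3 × 32 + 1024 = 1376 bytes.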
In still another aspect the invention provides a method wherein generating distribution plan for each task further comprises steps of matching resource cost with environment profile; and identifying most appropriate server based on distribution heuristic.
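The matching of resource cost against the environment profile can be sketched as below. The distribution heuristic is not specified in this section, so the sketch substitutes a simple best-fit rule on free memory (largest task first, placed on the server that leaves the least slack); this is an assumed heuristic chosen for illustration, and the function name plan_distribution is hypothetical.

```python
def plan_distribution(task_costs, server_free_memory):
    """Map each task to a server, or to None when no server can hold it.

    task_costs: {task name: required memory}
    server_free_memory: {server name: available memory}, same unit
    """
    remaining = dict(server_free_memory)
    plan = {}
    # Place the most expensive tasks first so they are not starved.
    ordered = sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True)
    for task, cost in ordered:
        candidates = [s for s, free in remaining.items() if free >= cost]
        if not candidates:
            plan[task] = None  # no server currently has enough memory
            continue
        # Best fit: the server left with the least unused memory.
        best = min(candidates, key=lambda s: remaining[s] - cost)
        plan[task] = best
        remaining[best] -= cost
    return plan


# Illustrative values only.
plan = plan_distribution(
    {"task-1": 500, "task-2": 200, "task-3": 900},
    {"server-1": 1000, "server-2": 600},
)
print(plan)
```

A real heuristic could combine several indicators, such as CPU level and memory space; the single-resource rule here only shows the matching step.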
The present invention consists of features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by references to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings in which:
FIG. 1.0 illustrates a general background problem of performing network graph analysis tasks in a distributed environment.
FIG. 2.0 illustrates a general system in accordance with the present invention.
FIG. 3.0 is a flowchart illustrating an embodiment of the method of the present invention.
FIG. 4.0 is a flowchart illustrating the methodology of network pruning of the present invention.
FIG. 5.0 illustrates Feature Set Extractor and Sub Graph Profile Generator in profiling network graph analysis task upon arriving at a pruned network.
FIG. 6.0 illustrates Resource Cost Analyzer and Distribution Planner.
FIG. 7.0 illustrates dynamic profiling of the processing environment.
FIG. 8.0 illustrates an example of network graph pruning.
FIG. 9.0 illustrates an example of dynamically calculating resource cost for each task.
FIG. 10.0 illustrates an example of execution of the Distribution Planner.
FIG. 11.0 illustrates an example of Task Distributer.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides a system and method for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment. In particular, the present invention utilizes a pruned network which extracts the Sub Graph from network graph based on feature set extraction (non-dependent Sub Graph) and estimating the resource cost required to perform each of the given tasks which further maps the said task to the appropriate server.
Referring to FIG. 1.0, network graph analysis tasks execute inefficiently when there is no proper distribution plan, because network graph data must be communicated between computer units. Consequently, huge network graph data leads to longer response times in obtaining analysis results, which degrades application and system performance. Although particular servers are equipped with additional computational resources, such as a faster CPU (Central Processing Unit) and more memory, those resources are not loaded with the tasks that require higher computation power. On the other hand, other servers equipped with fewer resources may carry the greatest load of tasks. This is due to improper planning of task distribution among the available computational resources, which wastes resources in a distributed computing environment.
Turning to FIG. 2.0, the architecture of the system (200) of an embodiment of the present invention is illustrated. The system of the present invention for dynamic generation of a distribution plan for intensive social network analysis (SNA) tasks in a distributed environment comprises a Processing Environment Profiler (202); a Network Graph Analysis Task Profiler (204); a Resource Cost Analyzer (206); a Distribution Planner (208); and a Task Distributer (210). The methodology of an embodiment of the present invention is illustrated generally in FIG. 3.0, while FIGs. 4.0, 5.0 and 6.0 respectively illustrate the process flow of the components of the system and examples of the execution process of the components of the system of the present invention.

The method (300) for dynamic generation of a distribution plan for intensive social network analysis (SNA) tasks in a distributed environment comprises the steps of first profiling each processing environment (302), wherein a Processing Environment Profiler (202) profiles each processing environment (700) by calculating the work load of each analysis server using a workload analyzer. A plurality of parameters serve as indicators, such as CPU utilization, memory usage and the execution time of a unit in each dedicated server. Free analysis servers and available computational resources are then identified to determine the number of free analysis servers as well as the computational resources available in the servers.

Upon profiling each processing environment, a Network Graph Analysis Task Profiler (204) profiles the network graph analysis task to eliminate unnecessary links and nodes from the network graph to produce accurate analysis. The said Network Graph Analysis Task Profiler (204) further utilizes a Network Graph Pruning module to prune the network by first identifying candidate links, starting from the first candidate link (402, 404), and removing the identified candidate link (406).
Thereafter, associated nodes are identified by first determining the existence of nodes in the network (408). If nodes exist, dependency on the associated nodes is determined (412), and an associated node is removed if said node does not have any dependency (416). When a node has a dependency with associated nodes, the step of identifying the existence of nodes is iterated. If nodes do not exist, the next candidate link is checked to determine whether a further candidate link exists (418); when there are no further candidate links, the pruned network is obtained. If there is a further candidate link, the step of removing the candidate link and the subsequent steps are iterated.

The pruned network is forwarded to the Feature Set Extractor, which extracts the data from the pruned network using an extractor engine particular to the specific task to obtain a sub graph network (502, 504, 506). For example, if the network graph task is a temporal-based analysis, then a Temporal Feature Extractor engine will extract the data based on a temporal feature set (e.g. time range, day, month, year, etc.). Similarly, if the task is a spatial-based analysis, then a Spatial Feature Extractor engine will extract the data based on a geo-spatial feature set (e.g. location, latitude, longitude, etc.). A User-Defined Extractor engine is specified for any kind of task defined by the user. The task can be any network graph analysis of particular properties, and the feature set will be based on those specified properties. The result of this process is a sub graph network, which is processed further in the Sub Graph Profile Generator.

The Sub Graph Profile Generator analyzes the sub graph network (516, 518, 520). The parameters that can be obtained from the sub graph network data include the graph density, the node description set (e.g. number of nodes, type of nodes) and the link description set (e.g. number of links, direction of links, weight of links, type of links). The output of this process is a sub graph profile, which is forwarded to the Resource Cost Analyzer.

The Resource Cost Analyzer calculates the resource cost for each task of the sub graph network. It takes the information on nodes and links from the sub graph profile and calculates the resource cost required to process the sub graph. For example, the memory cost required for the sub graph is calculated as the summation of the memory cost needed to process all nodes and links in the sub graph (602). The total resource cost for each task is calculated as the sub graph resource cost plus the pre-defined application resource cost, where the pre-defined application resource cost is calculated before the process of the present invention starts (604). The result of the Resource Cost Analyzer module is the total cost required to perform each dedicated task. Said resource cost is forwarded to the Distribution Planner module, wherein the resource required for each task is mapped and matched to each environment profile produced by the Processing Environment Profiler module (1000). The process uses a set of distribution heuristics to identify the most appropriate server for each task (1000). Each task is then distributed to the most appropriate server as identified by the Distribution Planner module.
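The resource cost calculation (602, 604) and the matching performed by the Distribution Planner can be sketched as follows. This is a minimal illustration only, not the implementation of the invention: the names SubGraphProfile, task_cost and plan_distribution, the per-node and per-link byte figures, and the best-fit heuristic (largest task first, placed on the smallest server that still has enough free memory) are all assumptions introduced for the example. The description leaves the distribution heuristic open.

```python
from dataclasses import dataclass


@dataclass
class SubGraphProfile:
    """Hypothetical sub graph profile: node and link counts only."""
    node_count: int
    link_count: int

    @property
    def density(self):
        # Density of an undirected simple graph: 2m / (n * (n - 1)).
        n, m = self.node_count, self.link_count
        return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def task_cost(profile, node_bytes, link_bytes, app_bytes):
    """Total memory cost of one task, in bytes.

    Assumes every node costs node_bytes and every link costs link_bytes.
    """
    # Step 602: sum the memory cost over all nodes and links.
    sub_graph_cost = (profile.node_count * node_bytes
                      + profile.link_count * link_bytes)
    # Step 604: add the pre-defined application resource cost.
    return sub_graph_cost + app_bytes


def plan_distribution(task_costs, free_memory):
    """Map each task to a server using an assumed best-fit heuristic.

    task_costs: {task name: required bytes}
    free_memory: {server name: free bytes}, e.g. from environment profiling.
    Returns {task name: server name, or None if no server fits}.
    """
    free = dict(free_memory)  # work on a copy
    plan = {}
    # Place the most expensive tasks first.
    for task, cost in sorted(task_costs.items(),
                             key=lambda kv: kv[1], reverse=True):
        fitting = [s for s, mem in free.items() if mem >= cost]
        if not fitting:
            plan[task] = None
            continue
        # Smallest server that still fits the task.
        server = min(fitting, key=lambda s: free[s])
        plan[task] = server
        free[server] -= cost
    return plan


if __name__ == "__main__":
    costs = {
        "temporal": task_cost(SubGraphProfile(1000, 5000), 64, 96, 1_000_000),
        "spatial": task_cost(SubGraphProfile(200, 300), 64, 96, 1_000_000),
    }
    print(plan_distribution(costs, {"s1": 2_000_000, "s2": 1_200_000}))
```

Memory is used here as the only resource indicator for brevity; a heuristic combining CPU level and memory space would replace the single comparison with a check over several indicators.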
Turning to FIGs. 7.0, 8.0, 9.0, 10.0 and 11.0, examples of each of the processes involved are illustrated. FIG. 7.0 illustrates the dynamic profiling of each processing environment. For instance, in a Linux environment, an automated back-end program can be developed to call the command "free -t -m" to get the memory usage (reported in mebibytes with the -m option) and the command "top" to get the details of the resources occupied by each application running on the server.

FIG. 8.0 illustrates an example of the Network Graph Pruning process, a sub process under the Network Graph Analysis Task Profiler module, while FIG. 9.0 illustrates an example of another sub process under the Network Graph Analysis Task Profiler module, Dynamically Calculate Resource Cost for each task. This illustrative example shows the methodology for calculating the memory cost to perform each task. The example given is based on the assumption that the object type of the node and link is represented as a Java String Object.

FIG. 10.0 illustrates an example of the execution of the Distribution Planner. As illustrated in FIG. 10.0, the resource cost for each task is matched to each environment profile in order to identify the most appropriate server to perform the task. The Distribution Heuristic can be any combination of resource indicators (e.g. CPU level, memory space, etc.) and is used as the main factor in determining the right server. FIG. 11.0 illustrates the network graph analysis task distributed to the appropriate server. Each process involved is a key factor in dynamically generating an appropriate distribution plan for intensive network graph analysis tasks.

The present invention dynamically generates a distribution plan for intensive social network analysis (SNA) tasks by utilizing a pruned network, extracting the Sub Graph from the network graph based on feature set extraction (non-dependent Sub Graph), and estimating the resource cost required to perform each of the given tasks, which is then used to map each task to the appropriate server.
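The network graph pruning flow of FIG. 4.0 (steps 402 to 418) can be sketched as follows. This is a sketch under stated assumptions and not the claimed implementation: the function name prune_network and the is_candidate predicate are hypothetical, links are treated as undirected (u, v, weight) tuples, and a node is taken to have a "dependency" when it still has at least one remaining link. The description does not define how candidate links are chosen or how dependency is tested.

```python
from collections import defaultdict


def prune_network(nodes, links, is_candidate):
    """Remove candidate links, then any node left without a dependency.

    nodes: iterable of hashable node identifiers.
    links: iterable of (u, v, weight) tuples, treated as undirected.
    is_candidate: hypothetical predicate marking a link as unnecessary.
    Returns (remaining_nodes, remaining_links).
    """
    remaining_nodes = set(nodes)
    remaining_links = list(links)

    # Count links per node so that "dependency" can be tested as
    # "the node still has at least one other link".
    degree = defaultdict(int)
    for u, v, _ in remaining_links:
        degree[u] += 1
        degree[v] += 1

    # Steps 402, 404: identify the candidate links, starting from the first.
    candidates = [link for link in remaining_links if is_candidate(link)]

    # Step 418: move on to the next candidate link until none are left.
    for link in candidates:
        remaining_links.remove(link)  # step 406: remove the candidate link
        u, v, _ = link
        for node in (u, v):  # step 408: nodes associated with the link
            degree[node] -= 1
            if degree[node] == 0:  # step 412: no dependency remains
                remaining_nodes.discard(node)  # step 416: remove the node

    return remaining_nodes, remaining_links


if __name__ == "__main__":
    nodes = {"A", "B", "C", "D"}
    links = [("A", "B", 5), ("B", "C", 1), ("C", "D", 1), ("A", "C", 4)]
    # Assumed rule for the example: links with weight below 2 are candidates.
    print(prune_network(nodes, links, lambda link: link[2] < 2))
```

In this example the two low-weight links are removed, and node D is dropped because no other link depends on it, while B and C are kept because each still has a link to A.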
Unless the context requires otherwise or specifically stated to the contrary, integers, steps or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements.
Throughout this specification, unless the context requires otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated step or element or integer or group of steps or elements or integers, but not the exclusion of any other step or element or integer or group of steps, elements or integers. Thus, in the context of this specification, the term "comprising" is used in an inclusive sense and thus should be understood as meaning "including principally, but not necessarily solely". It will be appreciated that the foregoing description has been given by way of illustrative example of the invention and that all such modifications and variations thereto as would be apparent to persons of skill in the art are deemed to fall within the broad scope and ambit of the invention as herein set forth.

Claims

1. A system (200) for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment, the system comprising:
at least one Processing Environment Profiler (202);
at least one Network Graph Analysis Task Profiler (204);
at least one Resource Cost Analyzer (206);
at least one Distribution Planner (208); and
at least one Task Distributer (210)
characterized in that
the at least one Network Graph Analysis Task Profiler (204) further comprises at least one Network Graph Pruning module having means to eliminate unnecessary links and nodes from network graph to produce accurate analysis by
identifying candidate links from first candidate link;
removing candidate link;
identifying associated nodes by first determining existence of nodes;
if nodes exist;
determining if there is dependency on associated nodes;
removing associated node if said node does not have any dependency;
iterating step to identify existence of nodes when there is dependency of node with associated nodes;
if nodes do not exist;
checking next candidate link to determine existence of link to arrive at a pruned network when there are no links to candidate; and
iterating step to remove candidate link and subsequent steps if there is a link to candidate.
2. A system (200) according to Claim 1, wherein the at least one Processing Environment Profiler (202) further having means for profiling each processing environment by:
calculating work load of each analysis server using workload analyzer; and
identifying any free analysis server and available computational resources.
3. A system (200) according to Claim 1, wherein the at least one Resource Cost Analyzer (206) further having means for calculating resource cost for each task by:
calculating resource cost for each sub graph by calculating memory cost of sub graph based on summation of memory cost required to process all nodes and links in sub graph; and
calculating total resource cost for each task based on sub graph resource cost plus pre-defined application resource cost.
4. A system (200) according to Claim 1, wherein the at least one Distribution Planner (208) further having means for generating distribution plan for each task by:
matching resource cost with environment profile; and
identifying most appropriate server based on distribution heuristic.
5. A system (200) according to Claim 1, wherein the at least one Task Distributer (210) further having means for distributing task to appropriate analysis server.
6. A method (300) for dynamic generation of distribution plan for intensive social network analysis (SNA) tasks in a distributed environment, the method comprising steps of:
profiling each processing environment (302);
profiling network graph analysis task (304);
calculating resource cost for each task (306);
generating distribution plan for each task (308); and
distributing task to appropriate analysis server (310) characterized in that
profiling network graph analysis task further comprises network graph pruning (400) to eliminate unnecessary links and nodes from network graph to produce accurate analysis by
identifying candidate links from first candidate link (402, 404);
removing candidate link (406);
identifying associated nodes by first determining existence of nodes (408);
if nodes exist;
determining if there is dependency on associated nodes (412);
removing associated node if said node does not have any dependency (416);
iterating step to identify existence of nodes when there is dependency of node with associated nodes;
if nodes do not exist;
checking next candidate link to determine existence of link to arrive at a pruned network when there are no links to candidate (418); and
iterating step to remove candidate link and subsequent steps if there is a link to candidate.
7. A method (700) according to Claim 3, wherein profiling each processing environment further comprises steps of:
calculating work load of each analysis server using workload analyzer; and
identifying any free analysis server and available computational resources.
8. A method (500) according to Claim 3, wherein profiling network graph analysis task upon arriving at a pruned network further comprises steps of:
extracting data from pruned network using an extractor engine for a specific task to obtain sub graph network (502, 504, 506); and
analyzing sub graph network to obtain sub graph network data which includes graph density, node description set and link description set (516, 518, 520).
9. A method according to Claim 3, wherein calculating resource cost for each task further comprises steps of:
calculating resource cost for each sub graph by calculating memory cost of sub graph based on summation of memory cost required to process all nodes and links in sub graph (602); and
calculating total resource cost for each task based on sub graph resource cost plus pre-defined application resource cost (604).
10. A method (1000) according to Claim 3, wherein generating distribution plan for each task further comprises steps of:
matching resource cost with environment profile; and
identifying most appropriate server based on distribution heuristic.
PCT/MY2013/000233 2012-12-14 2013-12-06 A system and method for dynamic generation of distribution plan for intensive social network analysis (sna) tasks WO2014092536A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2012005442 2012-12-14
MYPI2012005442 2012-12-14

Publications (1)

Publication Number Publication Date
WO2014092536A1 true WO2014092536A1 (en) 2014-06-19

Family

ID=50031441

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2013/000233 WO2014092536A1 (en) 2012-12-14 2013-12-06 A system and method for dynamic generation of distribution plan for intensive social network analysis (sna) tasks

Country Status (1)

Country Link
WO (1) WO2014092536A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6675189B2 (en) 1998-05-28 2004-01-06 Hewlett-Packard Development Company, L.P. System for learning and applying integrated task and data parallel strategies in dynamic applications
US20050060287A1 (en) * 2003-05-16 2005-03-17 Hellman Ziv Z. System and method for automatic clustering, sub-clustering and cluster hierarchization of search results in cross-referenced databases using articulation nodes
US20090109872A1 (en) * 2007-10-25 2009-04-30 Siemens Aktiengesellschaft Method and an apparatus for analyzing a communication network
US20120284340A1 (en) * 2010-01-29 2012-11-08 E-Therapeutics Plc Social media analysis system
US20110302144A1 (en) * 2010-06-03 2011-12-08 International Business Machines Corporation Dynamic Real-Time Reports Based on Social Networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DAR-TZEN PENG ET AL.: "Assignment and Scheduling Communicating Periodic Tasks in Distributed Real-Time Systems", IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, vol. 23, no. 12, December 1997 (1997-12-01)
DAR-TZEN PENG ET AL.: "Assignment and Scheduling Communicating Periodic Tasks in Distributed Real-Time Systems", IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, vol. 23, no. 12, December 1997 (1997-12-01), XP002724187 *
DING XIAO ET AL: "Community Ranking in Social Network", COMPUTER AND COMPUTATIONAL SCIENCES, 2007. IMSCCS 2007. SECOND INTERNATIONAL MULTI-SYMPOSIUMS ON, IEEE, PISCATAWAY, NJ, USA, 13 August 2007 (2007-08-13), pages 322 - 329, XP031338419, ISBN: 978-0-7695-3039-0 *
YICHUAN JIANG; JIUCHUAN JIANG: "Contextual Resource Negotiation-Based Task Allocation and Load Balancing in Complex Software Systems", IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, vol. 20, no. 5, May 2009 (2009-05-01), XP011247827

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808779A (en) * 2016-03-30 2016-07-27 北京大学 Picture roaming parallel computing method based on pruning and application
WO2018004829A1 (en) * 2016-06-29 2018-01-04 Intel Corporation Methods and apparatus for subgraph matching in big data analysis
US11423082B2 (en) 2016-06-29 2022-08-23 Intel Corporation Methods and apparatus for subgraph matching in big data analysis
CN114298391A (en) * 2021-12-23 2022-04-08 拉扎斯网络科技(上海)有限公司 Distribution route determining method, device and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13826776

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13826776

Country of ref document: EP

Kind code of ref document: A1