CN107220111B

CN107220111B - Task scheduling method and system based on task stealing

Info

Publication number: CN107220111B
Application number: CN201710290460.6A
Authority: CN
Inventors: 金海; 李陈希; 廖小飞; 石翔
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2017-04-28
Filing date: 2017-04-28
Publication date: 2019-08-09
Anticipated expiration: 2037-04-28
Also published as: CN107220111A

Abstract

The invention discloses a task scheduling method and system based on task stealing. The method comprises: constructing a task dependency graph, in which each dependent node is registered as a callback function in the callback container of every node it depends on; allocating a lock-free double-ended queue (deque) to each thread in a thread pool and emptying it, and placing the root nodes at the bottom of the threads' lock-free deques in round-robin order; if a thread's lock-free deque is not empty, taking a node from the bottom of that deque and executing it; if a thread's lock-free deque is empty, stealing a node from the top of another thread's lock-free deque, pushing the stolen node onto the bottom of the thread's own lock-free deque, and then taking the stolen node out and executing it; and, after the tasks of all nodes have been executed, restoring the in-degree of each node in the task dependency graph to its original value and ending the blocking of the main thread. The invention targets large-scale task-level parallel applications and can effectively improve the performance of conventional task-level parallel applications.

Description

Task scheduling method and system based on task stealing

Technical field

The invention belongs to the technical field of parallel computer programming, and more particularly relates to a task scheduling method and system based on task stealing.

Background art

Transistor sizes that keep approaching physical limits, together with power consumption, have severely restricted the development of single-core processors. Program performance can no longer be improved, as it once could, simply by waiting for chip manufacturers to release new processors. The only way to further improve application performance is to integrate multiple cores into a single CPU and to parallelize the application. The single-core serial era is over, and programmers have entered the multi-core parallel era.

Traditional parallel programming models (including MPI and early versions of OpenMP) are either aimed only at expert-level, senior programmers or suited only to regular applications. The multi-core era calls for parallel programming tools that cover a broader range of applications, are easy to program, and are highly productive. Many new parallel programming models have emerged in recent years. Among them, the task-level parallel programming model has become the preferred model on multi-core platforms because of its wide applicability, convenient programming, and high utilization of computing resources. A task-level parallel programming model takes the task as the basic unit of parallelism and provides programming interfaces for task partitioning and synchronization. Partitioning and synchronization are left to the programmer, who can divide an application into a large number of fine-grained tasks. However, whether each task is executed in parallel or serially, which physical core it executes on, and how synchronization between tasks is implemented are all decided by the runtime system. Task-level parallel programming models encourage nested, recursive tasks and introduce user-level thread scheduling built around a task stealing algorithm to achieve high performance and dynamic load balancing.

As in ordinary programs, a task-level parallel programming model lets the programmer use control flow to express program logic. At the end of a basic block in the control flow, a synchronization operation is added, either explicitly by the programmer or implicitly by the runtime, that waits for all tasks in the basic block to finish, so that data races cannot occur between basic blocks. For large-scale parallel applications with complex control flow, however, these synchronization operations cause the following problems:

(1) Suppose tasks in different basic blocks of the same control flow have no dependencies, or only partial dependencies, on each other. Because a synchronization operation sits at the end of each basic block, all tasks in a later basic block can only be scheduled after every task in the earlier basic block has finished. Every task in the later basic block therefore suffers an artificial delay, introduced by inter-block synchronization, between becoming ready and actually being scheduled to run.

(2) Modern computers use a cache hierarchy, in which reused data is held temporarily in the CPU cache. When tasks communicate through a shared-memory model, tasks that depend on each other often share the same memory region. Suppose such tasks are spread across different basic blocks in the control flow. By the time the basic block containing the dependent task starts executing, the many unrelated tasks in the basic block of the depended-on task have already run, so the required data has very likely been evicted from the cache, which results in poor program locality.

Summary of the invention

In view of the above defects or improvement requirements of the prior art, the present invention provides a task scheduling method and system based on task stealing. For large-scale task-level parallel applications it proposes task scheduling driven by a task dependency graph, which can effectively improve the performance of conventional task-level parallel applications.

To achieve the above object, according to one aspect of the present invention, a task scheduling method based on task stealing is provided, comprising:

(1) describing the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and registering each dependent node as a callback function in the callback container of each node it depends on;

(2) obtaining the root nodes and leaf nodes of the task dependency graph and adding, for all leaf nodes, one virtual dependency sink node, which is used to block the main thread;

(3) allocating a lock-free deque to each thread in the thread pool and emptying it, and placing all root nodes at the bottom of the threads' lock-free deques in round-robin order;

(4) for each thread: if the thread's lock-free deque is not empty, taking a node from the bottom of the deque, executing the task contained in the node and, after the task finishes, executing all callbacks in the node's callback container; if the thread's lock-free deque is empty, attempting to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushing the stolen node onto the bottom of the thread's own lock-free deque, executing the task in the stolen node, and executing all callbacks in the stolen node's callback container;

(5) after the tasks in all nodes of the task dependency graph have been executed, restoring the in-degree of each node in the task dependency graph to its original value and ending the blocking of the main thread.

Preferably, step (1) specifically comprises the following steps:

(1.1) defining a task dependency graph object;

(1.2) dividing the computing task into several subtasks according to the properties of the computing task, calling the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph and encapsulate it as a node object, and returning a pointer to each node object;

(1.3) constructing the dependencies between subtasks through the pointers to the node objects: registering each dependent node as a callback function in the callback container of the depended-on node, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1.
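Steps (1.1) to (1.3) can be illustrated with a minimal Python sketch. The names `TaskGraph`, `Node`, `insert`, `callbacks`, `in_degree` and `out_degree` are assumptions made for illustration; only `depends` appears in the text, where it is called through a node pointer in a C++-like style. A Python object reference stands in for the returned pointer.

```python
from typing import Callable, List


class Node:
    """A subtask wrapped with the bookkeeping described in step (1.3)."""

    def __init__(self, task: Callable[[], None]) -> None:
        self.task = task
        self.callbacks: List["Node"] = []  # dependents registered on this node
        self.in_degree = 0                 # how many nodes this node waits for
        self.out_degree = 0                # how many nodes wait for this node

    def depends(self, *prerequisites: "Node") -> None:
        # Register self in the callback container of every depended-on node.
        for pre in prerequisites:
            pre.callbacks.append(self)
            self.in_degree += 1
            pre.out_degree += 1


class TaskGraph:
    """Hypothetical graph object that owns the node objects."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []

    def insert(self, task: Callable[[], None]) -> Node:
        node = Node(task)      # step (1.2): wrap the subtask as a node object
        self.nodes.append(node)
        return node            # reference used later to declare dependencies


graph = TaskGraph()
a = graph.insert(lambda: None)
b = graph.insert(lambda: None)
c = graph.insert(lambda: None)
a.depends(b, c)                # A may run only after B and C have finished
```

No edge objects are stored anywhere in this sketch: the only trace of a dependency is the entry in the depended-on node's callback container plus the two counters.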

Preferably, step (2) specifically comprises the following steps:

(2.1) adding all nodes of the task dependency graph whose in-degree is 0 to the root node set, and adding all nodes whose out-degree is 0 to the leaf node set;

(2.2) for the leaf node set L = {L1, L2, …}, adding a virtual dependency sink node virtual_sink_node and calling virtual_sink_node->depends(L1, L2, …), the virtual dependency sink node being used to block the main thread until all tasks are complete, preventing the main thread from ending early.
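A small sketch of the preprocessing in steps (2.1) and (2.2) follows, again in Python. The `Node` class and the `preprocess` helper are hypothetical; `virtual_sink_node` and `depends` are the names used in the text.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.callbacks = []   # dependents to notify when this node finishes
        self.in_degree = 0
        self.out_degree = 0

    def depends(self, *prerequisites):
        for pre in prerequisites:
            pre.callbacks.append(self)
            self.in_degree += 1
            pre.out_degree += 1


def preprocess(nodes):
    # Step (2.1): roots have in-degree 0, leaves have out-degree 0.
    roots = [n for n in nodes if n.in_degree == 0]
    leaves = [n for n in nodes if n.out_degree == 0]
    # Step (2.2): one sink that depends on every leaf, so it becomes ready last.
    virtual_sink_node = Node("virtual_sink")
    virtual_sink_node.depends(*leaves)
    return roots, leaves, virtual_sink_node


b, c, a, d = Node("B"), Node("C"), Node("A"), Node("D")
a.depends(b, c)
d.depends(b)
roots, leaves, sink = preprocess([a, b, c, d])
```

Because the sink depends on every leaf, its in-degree can only reach 0 after the whole graph has run, which is what makes it usable as the signal that releases the main thread.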

Preferably, step (4) specifically comprises the following steps:

(4.1) for each thread, if the thread's lock-free deque is not empty, taking a node from the bottom of the deque and executing the task contained in the node; after the task finishes, executing all callbacks in the node's callback container, where each executed callback decrements the in-degree of the node that registered it by 1; if the in-degree of a node that registered a callback drops to 0, the current thread pushes that node onto the bottom of its own lock-free deque;

(4.2) if the thread's lock-free deque is empty, attempting to steal a node from the top of another thread's lock-free deque; if the steal succeeds, pushing the stolen node onto the bottom of the thread's own lock-free deque and executing step (4.1); otherwise giving up the current CPU time slice;

(4.3) repeating steps (4.1) and (4.2) until the tasks in all nodes of the task dependency graph have been executed.
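The callback handling of step (4.1) can be traced with a single-threaded Python sketch. It deliberately leaves out stealing and concurrency; `Node`, `drain` and `log` are hypothetical names, and the right end of `collections.deque` is assumed to play the role of the deque bottom.

```python
from collections import deque


class Node:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.callbacks = []   # nodes that depend on this node
        self.in_degree = 0

    def depends(self, *prerequisites):
        for pre in prerequisites:
            pre.callbacks.append(self)
            self.in_degree += 1

    def task(self):
        self.log.append(self.name)   # the "work" of this toy task


def drain(local_deque):
    while local_deque:
        node = local_deque.pop()             # take from the bottom
        node.task()
        for dependent in node.callbacks:     # fire every registered callback
            dependent.in_degree -= 1
            if dependent.in_degree == 0:     # all prerequisites are done
                local_deque.append(dependent)   # push onto the own bottom


log = []
b, c, a = Node("B", log), Node("C", log), Node("A", log)
a.depends(b, c)
drain(deque([b, c]))   # A is pushed only after both B and C have run
```

Pushing a newly ready node onto the bottom of the same deque means the thread that just finished the parent is also the first candidate to run the child.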

Preferably, the number of threads in the thread pool equals the number of CPU hardware cores.
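The initialization of step (3), combined with the preferred thread count, might look like the following Python sketch. `init_deques` is a hypothetical helper, the strings stand in for root nodes, and `os.cpu_count()` reports logical CPUs, which is only an approximation of the hardware core count the text refers to.

```python
import os
from collections import deque


def init_deques(root_nodes, num_threads):
    # One empty deque per thread; append() is treated as "push at the bottom".
    deques = [deque() for _ in range(num_threads)]
    for i, root in enumerate(root_nodes):
        deques[i % num_threads].append(root)   # round-robin placement
    return deques


# Thread count the runtime would normally use (an approximation, see above).
num_threads = os.cpu_count() or 1

# A fixed count of 3 keeps the example output independent of the machine.
deques = init_deques(["r0", "r1", "r2", "r3", "r4", "r5", "r6"], 3)
```

With seven roots and three threads, the first deque receives three roots and the other two receive two each, so every thread starts with work whenever there are at least as many roots as threads.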

According to another aspect of the present invention, a task scheduling system based on task stealing is provided, comprising:

a task dependency graph construction module, configured to describe the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and to register each dependent node as a callback function in the callback container of each node it depends on;

a preprocessing module, configured to obtain the root nodes and leaf nodes of the task dependency graph and to add, for all leaf nodes, one virtual dependency sink node, which is used to block the main thread;

an initialization module, configured to allocate a lock-free deque to each thread in the thread pool and empty it, and to place all root nodes at the bottom of the threads' lock-free deques in round-robin order;

a task scheduling module, configured to, for each thread: if the thread's lock-free deque is not empty, take a node from the bottom of the deque, execute the task contained in the node and, after the task finishes, execute all callbacks in the node's callback container; if the thread's lock-free deque is empty, attempt to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, push the stolen node onto the bottom of the thread's own lock-free deque, execute the task in the stolen node, and execute all callbacks in the stolen node's callback container;

a state restoration module, configured to, after the tasks in all nodes of the task dependency graph have been executed, restore the in-degree of each node in the task dependency graph to its original value and end the blocking of the main thread.

Preferably, the task dependency graph construction module comprises:

a definition module, configured to define a task dependency graph object;

a node encapsulation module, configured to divide the computing task into several subtasks according to the properties of the computing task, call the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph and encapsulate it as a node object, and return a pointer to each node object;

a callback registration module, configured to construct the dependencies between subtasks through the pointers to the node objects, register each dependent node as a callback function in the callback container of the depended-on node, increment the in-degree of the dependent node by 1, and increment the out-degree of the depended-on node by 1.

Preferably, the preprocessing module comprises:

a node partitioning module, configured to add all nodes of the task dependency graph whose in-degree is 0 to the root node set and all nodes whose out-degree is 0 to the leaf node set;

a virtual dependency sink node construction module, configured to add, for the leaf node set L = {L1, L2, …}, a virtual dependency sink node virtual_sink_node and to call virtual_sink_node->depends(L1, L2, …), the virtual dependency sink node being used to block the main thread until all tasks are complete, preventing the main thread from ending early.

Preferably, the task scheduling module comprises:

a task execution module, configured to, for each thread, if the thread's lock-free deque is not empty, take a node from the bottom of the deque and execute the task contained in the node and, after the task finishes, execute all callbacks in the node's callback container, where each executed callback decrements the in-degree of the node that registered it by 1 and, if the in-degree of a node that registered a callback drops to 0, the current thread pushes that node onto the bottom of its own lock-free deque;

a task stealing module, configured to, when the thread's lock-free deque is empty, attempt to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, push the stolen node onto the bottom of the thread's own lock-free deque and take the stolen node out for execution, or otherwise give up the current CPU time slice; the operations of the task execution module and the task stealing module are repeated until the tasks in all nodes of the task dependency graph have been executed.

In general, compared with the prior art, the above technical solutions conceived by the present invention have the following main technical advantages:

(1) Dependencies between tasks are constructed through a callback mechanism. No dependency-edge entity exists between tasks; a dependent node in the task dependency graph expresses a dependency solely by registering a callback with the depended-on node. This reduces the implementation complexity of the task dependency graph: a task node needs to consist only of a task entity, a reference count, and a callback container.

(2) Load balancing is achieved through task stealing.

(3) The task dependency graph can easily be restored to its original state after all tasks have been executed. Suppose a task dependency graph has V nodes and E dependency edges; then the time and space complexity of constructing it are both O(V+E). When there are many task nodes or the relationships between them are complex, constructing the graph is costly in both time and space. Real problems are often large, so reusing an already constructed task dependency graph can greatly reduce the share of the cost taken up by graph construction. Before each reuse the user replaces the data source, which achieves "construct once, use many times".

(4) Scheduling is strictly based on the task dependency graph. Starting from the root nodes, as soon as the task in a node finishes, any child nodes that have become ready are pushed immediately onto the ready queue, ready to run. This reduces unnecessary delays introduced by control flow in the program. Moreover, there is usually a data dependency between a parent node and a child node, and this scheduling method schedules the child node onto the thread on which the parent ran. It thereby makes the greatest possible use of the data the parent left in the CPU cache, reducing memory accesses and improving performance.
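The "construct once, use many times" idea in advantage (3) can be sketched as follows. This is an assumed, single-threaded illustration: `run`, `restore`, and the `saved` dictionary are hypothetical, and the in-degree backup is the only state that needs restoring because running the graph changes nothing else.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.callbacks = []
        self.in_degree = 0

    def depends(self, *prerequisites):
        for pre in prerequisites:
            pre.callbacks.append(self)
            self.in_degree += 1


def run(roots):
    ready, order = list(roots), []
    while ready:
        node = ready.pop()
        order.append(node.name)
        for dependent in node.callbacks:
            dependent.in_degree -= 1       # running the graph consumes in-degrees
            if dependent.in_degree == 0:
                ready.append(dependent)
    return order


def restore(nodes, saved):
    for node in nodes:
        node.in_degree = saved[node.name]  # O(V) reset, no edges are rebuilt


b, c, a = Node("B"), Node("C"), Node("A")
a.depends(b, c)
nodes = [a, b, c]
roots = [n for n in nodes if n.in_degree == 0]
saved = {n.name: n.in_degree for n in nodes}   # backup taken before the first run

first = run(roots)
restore(nodes, saved)
second = run(roots)    # same graph, no reconstruction
restore(nodes, saved)
```

Restoring touches each node once, whereas rebuilding the graph would repeat the O(V+E) construction cost on every run.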

Brief description of the drawings

Fig. 1 is a schematic flowchart of a task scheduling method based on task stealing disclosed in an embodiment of the present invention;

Fig. 2 is a schematic flowchart of preprocessing a task dependency graph disclosed in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the data structures of a lock-free deque and a node disclosed in an embodiment of the present invention;

Fig. 4 is a schematic flowchart of a load-balancing method disclosed in an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the present invention and do not limit it. In addition, the technical features involved in the embodiments described below may be combined with one another provided they do not conflict.

Fig. 1 is a schematic flowchart of a task scheduling method based on task stealing disclosed in an embodiment of the present invention, which comprises the following steps:

(1) Constructing the task dependency graph: describing the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and registering each dependent node as a callback function in the callback container of each node it depends on;

As an optional embodiment, constructing the task dependency graph specifically comprises the following steps:

(1.1) defining a task dependency graph object task_graph;

(1.2) dividing the computing task into several subtasks according to the properties of the computing task, calling the insertion method provided by the task dependency graph object task_graph to add each subtask to task_graph and encapsulate it as a node object, and returning a pointer to each node object. Specifically, task_graph internally holds a copy of the task passed to the insertion method and wraps it in an additional layer to form a node object. After the call, task_graph returns a pointer to the node object for subsequent operations.

(1.3) constructing the dependencies between subtasks through the pointers to the node objects: registering each dependent node as a callback function in the callback container of the depended-on node, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1. Specifically, once the node-object pointers have been obtained, the depends method is called through them to construct the dependencies between tasks. Suppose node A logically depends on the completion of node B, node C, and so on; then A->depends(B, C, …) is called explicitly. Internally, the depends method registers the execution part of node A, in the form of a callback, in the callback containers of node B, node C, and so on.

(2) Preprocessing the task dependency graph: obtaining the root nodes and leaf nodes of the task dependency graph and adding, for all leaf nodes, one virtual dependency sink node, which is used to block the main thread;

As an optional embodiment, step (2) specifically comprises steps (2.1) and (2.2) as set out in the summary above.

(3) Initializing the runtime environment: allocating a lock-free deque to each thread in the thread pool and emptying it, and placing all root nodes at the bottom of the threads' lock-free deques in round-robin order;

As an optional embodiment, the number of threads in the thread pool equals the number of CPU hardware cores.

(4) Executing all tasks in the task dependency graph: for each thread, if the thread's lock-free deque is not empty, taking a node from the bottom of the deque, executing the task contained in the node and, after the task finishes, executing all callbacks in the node's callback container; if the thread's lock-free deque is empty, attempting to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushing the stolen node onto the bottom of the thread's own lock-free deque, executing the task in the stolen node, and executing all callbacks in the stolen node's callback container;

As an optional embodiment, step (4) specifically comprises steps (4.1) to (4.3) as set out in the summary above.

(5) Restoring state: after the tasks in all nodes of the task dependency graph have been executed, restoring the in-degree of each node in the task dependency graph to its original value and ending the blocking of the main thread.

In general, the task scheduling method based on task stealing proposed by the present invention is applied to a task-level programming model and is implemented as follows. First, a task_graph object is defined to represent the task dependency graph. According to the actual circumstances of the overall computing task, the task is divided into a large number of subtasks, which are added to the task_graph object; the task_graph object encapsulates them as node objects and manages them. While the task dependency graph is being constructed, the depends method provided by the task_graph object is used to specify the dependent object and the depended-on object of each dependency. During the call to depends, the execution part of the dependent object is registered, in the form of a callback, in the callback container of the depended-on object, and the in-degree of the dependent object and the out-degree of the depended-on object are each incremented by 1. Because the in-degree may be read and written by multiple threads, it is an atomic variable. Once the task dependency graph has been constructed, calling the start method provided by the task_graph object begins preprocessing of the graph and execution of the computation. Preprocessing performs two operations. The first backs up the in-degree information of all nodes managed by task_graph. The second selects all root nodes and leaf nodes from the nodes managed by task_graph (the number of root nodes is unrestricted) and adds one virtual dependency sink node for all leaf nodes. This virtual dependency sink node is invisible to the user. Its role is to block the main thread so that it does not end before all computing tasks have finished and, once all computing tasks have finished, to restore the task dependency graph to its original state from the previously backed-up information so that the graph can be reused. Fig. 2 shows a schematic flowchart of preprocessing a task dependency graph disclosed in an embodiment of the present invention.
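The reason the in-degree has to be atomic can be shown with a short Python sketch. Python has no built-in atomic integer, so the hypothetical `AtomicCounter` below uses a lock as a stand-in for the atomic variable mentioned in the text; `decrement_and_get` and `finish_one_prerequisite` are likewise assumed names.

```python
import threading


class AtomicCounter:
    """Lock-guarded counter standing in for an atomic in-degree."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def decrement_and_get(self):
        # Decrement and read back as one indivisible step.
        with self._lock:
            self._value -= 1
            return self._value


in_degree = AtomicCounter(8)   # a node waiting for 8 prerequisites
zero_seen = []                 # records which thread observed the value 0


def finish_one_prerequisite():
    # Each prerequisite's callback decrements once; only one caller may see 0.
    if in_degree.decrement_and_get() == 0:
        zero_seen.append(threading.get_ident())


threads = [threading.Thread(target=finish_one_prerequisite) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the decrement and the comparison with 0 belong to one indivisible operation, exactly one thread observes the transition to 0, so the node is pushed onto a deque exactly once even when several prerequisites finish at the same time on different threads.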

Fig. 3 is a schematic diagram of the data structures of the lock-free deque and the node disclosed in an embodiment of the present invention. In the initialization phase, the runtime allocates threads according to the actual number of hardware cores of the computer, with a one-to-one correspondence between threads and cores, and allocates a private lock-free deque to each thread. All root nodes in the root node set are pushed in turn, in round-robin order, onto the bottoms of the threads' deques. Thereafter, every thread repeatedly takes a node from the bottom of its deque and executes the task it contains. After the task in a node has finished, the thread invokes in turn all callbacks registered in the node's callback container. Depending on how many nodes a given node depends on, its callback may be executed several times. The callback registered by a node includes a check for the critical condition: each time the callback executes, the in-degree of the node that registered it is decremented by 1. When the in-degree of a node reaches 0, all nodes it depends on have finished and the node can be run immediately; the approach taken is to push the node, from within the callback, onto the bottom of the current thread's own deque at once.
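The access pattern of the per-thread deque (the owner works at the bottom, thieves take from the top) can be sketched in Python as below. This is not a lock-free structure: the text does not say which lock-free algorithm is used, so the hypothetical `WorkDeque` class guards a `collections.deque` with a mutex purely to show the interface. The method names `push_bottom`, `pop_bottom` and `steal_top` are assumptions.

```python
import threading
from collections import deque


class WorkDeque:
    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()   # stand-in; the described deque is lock-free

    def push_bottom(self, node):
        # Called only by the owning thread.
        with self._lock:
            self._items.append(node)

    def pop_bottom(self):
        # Called only by the owning thread; returns None when empty.
        with self._lock:
            return self._items.pop() if self._items else None

    def steal_top(self):
        # Called by other threads; returns None when there is nothing to steal.
        with self._lock:
            return self._items.popleft() if self._items else None


dq = WorkDeque()
for name in ("n1", "n2", "n3"):
    dq.push_bottom(name)

owner_gets = dq.pop_bottom()   # most recently pushed node
thief_gets = dq.steal_top()    # oldest node, from the opposite end
```

Having the owner and the thieves operate on opposite ends keeps them apart except when the deque is nearly empty, and it leaves the most recently produced (and most likely cache-warm) node to the owner.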

Fig. 4 is a schematic flowchart of a load-balancing method disclosed in an embodiment of the present invention. Scheduling based on the task dependency graph means that each thread in effect executes one subgraph of the task dependency graph. Depending on the size of the subgraphs and their amount of work, some subgraphs take longer to execute and others shorter. The overall computation time is then determined by the thread with the longest execution time, and the load is unbalanced. The present invention uses a task stealing algorithm to achieve load balancing, with the following steps:

1) If the thread's own lock-free deque is not empty, the thread takes a node from the bottom of the deque and executes it; otherwise, go to step 2).

2) If the thread's own lock-free deque is empty, the thread visits the lock-free deques of the other threads in round-robin order. If it finds a thread whose lock-free deque is not empty, it attempts to steal a node from the top of that deque, pushes the node onto the bottom of its own lock-free deque, and executes step 1); otherwise, go to step 3).

3) The thread gives up the current CPU time slice and goes to sleep; the next time it is woken, it executes step 1).
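Steps 1) to 3) can be put together in a small multi-threaded Python sketch. It rests on several assumptions: locks replace the lock-free deques and the atomic in-degree, a node named "sink" plays the role of the virtual dependency sink node, and the hypothetical `worker` function runs a stolen node directly instead of pushing it onto its own deque and popping it again.

```python
import threading
import time
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.callbacks = []            # dependents to notify on completion
        self.in_degree = 0
        self.lock = threading.Lock()   # stand-in for an atomic in-degree

    def depends(self, *prerequisites):
        for pre in prerequisites:
            pre.callbacks.append(self)
            self.in_degree += 1


def worker(me, deques, locks, done, executed):
    n = len(deques)
    while not done.is_set():
        node = None
        with locks[me]:                          # step 1: own deque, bottom end
            if deques[me]:
                node = deques[me].pop()
        if node is None:                         # step 2: poll the other deques
            for k in range(1, n):
                victim = (me + k) % n
                with locks[victim]:
                    if deques[victim]:
                        node = deques[victim].popleft()   # steal from the top
                        break
        if node is None:                         # step 3: give up the time slice
            time.sleep(0.001)
            continue
        executed.append(node.name)               # the node's "task"
        if node.name == "sink":
            done.set()                           # the sink releases the main thread
        for dependent in node.callbacks:
            with dependent.lock:
                dependent.in_degree -= 1
                ready = dependent.in_degree == 0
            if ready:
                with locks[me]:
                    deques[me].append(dependent)  # push onto the own bottom


b, c, a, d, sink = (Node(x) for x in ("B", "C", "A", "D", "sink"))
a.depends(b, c)
d.depends(b)
sink.depends(a, d)

deques = [deque([b]), deque([c])]                # roots placed round-robin
locks = [threading.Lock() for _ in deques]
done, executed = threading.Event(), []
threads = [threading.Thread(target=worker, args=(i, deques, locks, done, executed))
           for i in range(len(deques))]
for t in threads:
    t.start()
done.wait()                                      # main thread blocks until the sink runs
for t in threads:
    t.join()
```

The order in which A and D run, and on which thread, varies between runs, but every node runs after its prerequisites and the sink always runs last.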

An embodiment of the present invention further provides a task scheduling system based on task stealing, comprising the modules set out in the summary above.

In the embodiments of the present invention, the specific implementation of each functional module can be found in the description of the method embodiment and is not repeated here.

Adopting the above scheme, the present invention outperforms other parallel algorithm schemes and greatly improves parallel program performance, specifically as follows:

1) Tasks are scheduled purely on the basis of the task dependency graph, which avoids the task execution delays caused by manually imposed control flow;

2) Threads correspond one-to-one with cores, and at run time nodes with dependencies between them are placed on the same thread as far as possible, so a later task can make the greatest possible use of data that was used by the preceding task and is still in the CPU cache, reducing the number of memory accesses made by the program;

3) Load balancing is achieved with a task stealing algorithm.

Those skilled in the art will readily understand that the foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A task scheduling method based on task stealing, characterized by comprising:

(1) describing the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and registering each dependent node as a callback function in the callback container of each node it depends on;

(2) obtaining the root nodes and leaf nodes of the task dependency graph and adding, for all leaf nodes, one virtual dependency sink node, the virtual dependency sink node being used to block the main thread;

(3) allocating a lock-free deque to each thread in the thread pool and emptying it, and placing all root nodes at the bottom of the threads' lock-free deques in round-robin order;

(4) for each thread: if the thread's lock-free deque is not empty, taking a node from the bottom of the deque, executing the task contained in the node and, after the task finishes, executing all callbacks in the node's callback container; if the thread's lock-free deque is empty, attempting to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushing the stolen node onto the bottom of the thread's own lock-free deque, executing the task in the stolen node, and executing all callbacks in the stolen node's callback container;

(5) after the tasks in all nodes of the task dependency graph have been executed, restoring the in-degree of each node in the task dependency graph to its original value and ending the blocking of the main thread.

2. The method according to claim 1, characterized in that step (1) specifically comprises the following steps:

(1.1) defining a task dependency graph object;

(1.2) dividing the computing task into several subtasks according to the properties of the computing task, calling the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph and encapsulate it as a node object, and returning a pointer to each node object;

(1.3) constructing the dependencies between subtasks through the pointers to the node objects: registering each dependent node as a callback function in the callback container of the depended-on node, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1.

3. The method according to claim 2, characterized in that step (2) specifically comprises the following steps:

(2.1) adding all nodes of the task dependency graph whose in-degree is 0 to the root node set, and adding all nodes whose out-degree is 0 to the leaf node set;

(2.2) for the leaf node set L = {L1, L2, …}, adding a virtual dependency sink node virtual_sink_node and calling virtual_sink_node->depends(L1, L2, …), the virtual dependency sink node being used to block the main thread until all tasks are complete, preventing the main thread from ending early.

4. The method according to claim 2, characterized in that step (4) specifically comprises the following steps:

(4.1) for each thread, if the thread's lock-free deque is not empty, taking a node from the bottom of the thread's lock-free deque and executing the task contained in the node; after the task has been executed, executing all callbacks in the node's callback container, wherein each executed callback decrements by 1 the in-degree of the node that registered the callback, and if the in-degree of a node that registered a callback is reduced to 0, the current thread pushes that node onto the bottom of the current thread's lock-free deque;

(4.2) if the thread's lock-free deque is empty, the thread attempts to steal a node from the top of the lock-free deque of another thread; if the steal succeeds, the stolen node is pushed onto the bottom of the thread's lock-free deque and step (4.1) is executed; otherwise the thread gives up its current CPU time slice;

(4.3) repeating steps (4.1) to (4.2) until the tasks in all nodes of the task dependency graph have been executed.
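The loop of steps (4.1) to (4.3) might be sketched as follows. This is an illustrative assumption, not the claimed implementation: in particular, `WorkDeque` is a mutex-protected stand-in with the same bottom/top interface as the lock-free deque of the claims, and the `remaining` counter, `worker` and `run` are hypothetical names. The sketch assumes `num_threads` is at least 1.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct Node {
    std::function<void()> task;
    std::vector<Node*> callbacks;      // nodes that depend on this node
    std::atomic<int> in_degree{0};
    explicit Node(std::function<void()> t) : task(std::move(t)) {}
    void depends(Node* dep) { dep->callbacks.push_back(this); ++in_degree; }
};

// Stand-in for the lock-free deque: same operations, but guarded by a mutex.
class WorkDeque {
public:
    void push_bottom(Node* n) { std::lock_guard<std::mutex> g(m_); q_.push_back(n); }
    Node* pop_bottom() {               // used by the owning thread
        std::lock_guard<std::mutex> g(m_);
        if (q_.empty()) return nullptr;
        Node* n = q_.back(); q_.pop_back(); return n;
    }
    Node* steal_top() {                // used by other threads
        std::lock_guard<std::mutex> g(m_);
        if (q_.empty()) return nullptr;
        Node* n = q_.front(); q_.pop_front(); return n;
    }
private:
    std::mutex m_;
    std::deque<Node*> q_;
};

void worker(std::size_t self, std::vector<WorkDeque>& deques,
            std::atomic<std::size_t>& remaining) {
    while (remaining.load() > 0) {                        // step (4.3)
        Node* n = deques[self].pop_bottom();              // step (4.1)
        if (!n) {                                         // step (4.2)
            for (std::size_t i = 1; i < deques.size() && !n; ++i)
                n = deques[(self + i) % deques.size()].steal_top();
            if (!n) { std::this_thread::yield(); continue; }  // give up the time slice
            deques[self].push_bottom(n);                  // push the stolen node, then take it
            n = deques[self].pop_bottom();
            if (!n) continue;                             // another thread stole it meanwhile
        }
        n->task();
        for (Node* d : n->callbacks)                      // run callbacks: decrement in-degrees
            if (d->in_degree.fetch_sub(1) == 1) deques[self].push_bottom(d);
        remaining.fetch_sub(1);
    }
}

void run(const std::vector<Node*>& roots, std::size_t total_nodes, std::size_t num_threads) {
    std::vector<WorkDeque> deques(num_threads);
    std::atomic<std::size_t> remaining{total_nodes};
    for (std::size_t i = 0; i < roots.size(); ++i)        // round-robin placement of roots
        deques[i % num_threads].push_bottom(roots[i]);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < num_threads; ++t)
        pool.emplace_back(worker, t, std::ref(deques), std::ref(remaining));
    for (auto& th : pool) th.join();
}
```

Taking from the bottom while stealing from the top means the owning thread and a stealing thread normally touch opposite ends of the deque, which is the property a lock-free deque exploits. Note that this sketch leaves the in-degrees at zero after a run; restoring them is a separate step.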

5. The method according to claim 1, characterized in that the number of threads in the thread pool is equal to the number of CPU hardware cores.
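As a minimal sketch of how the thread count might be chosen, the hypothetical helper `pool_size` below queries the standard library. It is an assumption of this sketch that `std::thread::hardware_concurrency()` is an acceptable proxy for the number of hardware cores; the function reports the number of concurrent threads supported, which may count logical rather than physical cores, and may return 0 when the value cannot be determined.

```cpp
#include <thread>

// Hypothetical helper: number of worker threads to create for the thread pool.
unsigned pool_size() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;   // fall back to a single thread if the count is unknown
}
```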

6. A task scheduling system based on task stealing, characterized by comprising:

a task dependency graph construction module, configured to describe the overall computing task as a task dependency graph composed of subtask nodes and dependency edges between the subtask nodes, and to register each dependent node as a callback function in the callback container of the node it depends on;

a preprocessing module, configured to obtain the root nodes and leaf nodes in the task dependency graph and to add one virtual dependency sink node for all leaf nodes, the virtual dependency sink node being used to block the main thread;

an initialization module, configured to allocate a lock-free deque to each thread in the thread pool and empty it, and to place all root nodes at the bottom of the lock-free deques of the threads in round-robin fashion;

a task scheduling module, configured to, for each thread: if the thread's lock-free deque is not empty, take a node from the bottom of the thread's lock-free deque and execute the task contained in the node, and, after the task has been executed, execute all callbacks in the node's callback container; if the thread's lock-free deque is empty, attempt to steal a node from the top of the lock-free deque of another thread and, if the steal succeeds, push the stolen node onto the bottom of the thread's lock-free deque, execute the task in the stolen node, and execute all callbacks in the stolen node's callback container;

a state restoration module, configured to, after the tasks in all nodes of the task dependency graph have been executed, restore the in-degree of each node in the task dependency graph to its original value and end the blocking of the main thread.
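Restoring the in-degrees could, for example, be done by keeping a copy of each node's original in-degree, as in the sketch below. The `original_in_degree` field and the `restore_in_degrees` function are assumptions introduced for illustration; the claims do not specify how the original value is retained.

```cpp
#include <atomic>
#include <vector>

// Hypothetical node fragment: only the fields needed for restoration are shown.
struct Node {
    int original_in_degree = 0;        // recorded when the graph is constructed
    std::atomic<int> in_degree{0};     // decremented during scheduling
};

// Reset every node so that the same task dependency graph can be executed again.
void restore_in_degrees(const std::vector<Node*>& nodes) {
    for (Node* n : nodes) n->in_degree.store(n->original_in_degree);
}
```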

7. The system according to claim 6, characterized in that the task dependency graph construction module comprises:

a definition module, configured to define a task dependency graph object;

a node encapsulation module, configured to divide the computing task into several subtasks according to the attributes of the computing task, to call the insertion method provided by the task dependency graph object so as to add each subtask to the task dependency graph and encapsulate each subtask as a node object, and to return a pointer to each node object;

a callback registration module, configured to construct the dependencies between the subtasks through the pointers to the node objects, to register each dependent node as a callback function in the callback function container of the node it depends on, to increment the in-degree of the dependent node by 1, and to increment the out-degree of the depended-on node by 1.

8. The system according to claim 7, characterized in that the preprocessing module comprises:

a node division module, configured to add all nodes whose in-degree is 0 in the task dependency graph to a root node set, and to add all nodes whose out-degree is 0 to a leaf node set;

a virtual dependency sink node construction module, configured to add, for the leaf node set L = {L1, L2, ...}, a virtual dependency sink node virtual_sink_node and to call virtual_sink_node->depends(L1, L2, ...), the virtual dependency sink node being used to block the main thread until all tasks are completed, thereby preventing the main thread from terminating early.

9. The system according to claim 7, characterized in that the task scheduling module comprises:

a task execution module, configured to, for each thread, if the thread's lock-free deque is not empty, take a node from the bottom of the thread's lock-free deque and execute the task contained in the node, and, after the task has been executed, execute all callbacks in the node's callback container, wherein each executed callback decrements by 1 the in-degree of the node that registered the callback, and if the in-degree of a node that registered a callback is reduced to 0, the current thread pushes that node onto the bottom of the current thread's lock-free deque;

a task stealing module, configured to, when the thread's lock-free deque is empty, have the thread attempt to steal a node from the top of the lock-free deque of another thread; if the steal succeeds, the stolen node is pushed onto the bottom of the thread's lock-free deque and is then taken out and executed; otherwise the thread gives up its current CPU time slice; the operations of the task execution module and the task stealing module being repeated until the tasks in all nodes of the task dependency graph have been executed.

10. The system according to any one of claims 6 to 9, characterized in that the number of threads in the thread pool is equal to the number of CPU hardware cores.