CN107220111A

CN107220111A - Task scheduling method and system based on task stealing

Info

Publication number: CN107220111A
Application number: CN201710290460.6A
Authority: CN
Inventors: 金海; 李陈希; 廖小飞; 石翔
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2017-04-28
Filing date: 2017-04-28
Publication date: 2017-09-29
Anticipated expiration: 2037-04-28
Also published as: CN107220111B

Abstract

The invention discloses a task scheduling method and system based on task stealing. The method comprises: constructing a task dependency graph, in which each dependent node is registered as a callback function in the callback container of the node it depends on; allocating an initially empty lock-free deque to each thread in a thread pool and pushing the root nodes onto the bottoms of the threads' lock-free deques in round-robin order; if a thread's lock-free deque is not empty, popping a node from the bottom of that deque and executing it; if a thread's lock-free deque is empty, stealing a node from the top of another thread's lock-free deque, pushing the stolen node onto the bottom of the thread's own deque, and then taking out and executing the stolen node; and, after the tasks of all nodes have been executed, restoring the in-degree of every node in the task dependency graph to its original value and releasing the blocked main thread. Aimed at large-scale task-level parallel applications, the invention can effectively improve the performance of traditional task-level parallel applications.

Description

Task scheduling method and system based on task stealing

Technical field

The invention belongs to the technical field of parallel computer programming, and more particularly relates to a task scheduling method and system based on task stealing.

Background art

Transistor sizes and power consumption that keep approaching physical limits severely constrain the development of single-core processors. Programmers can no longer simply wait for chip manufacturers to release a new processor and obtain better program performance as they could before. Further improvements in application performance now depend on integrating multiple cores into a single CPU and parallelizing applications. The era of single-core serial computing is over, and programmers are entering the era of multi-core parallelism.

Traditional parallel programming models (including MPI and earlier versions of OpenMP) target only expert or senior programmers, or suit only regular applications. The many-core era calls for parallel programming tools that serve a wider range of application domains, are easy to program, and deliver high productivity. Many new parallel programming models have emerged in recent years. Among them, the task-level parallel programming model has become the preferred parallel programming model on multi-core platforms because of its wide applicability, ease of programming, and high utilization of computing resources. A task-level parallel programming model uses the task as the basic unit of parallelism and provides programming interfaces for task division and synchronization. Task division and synchronization are left to the programmer, so users can divide an application into a large number of fine-grained tasks. Whether each specific task actually runs in parallel or serially, which physical core it runs on, and how synchronization between tasks is achieved are decided by the runtime system. The task-level parallel programming model favors nested recursive tasks and schedules user-level threads onto cores with a task-stealing algorithm, achieving high program performance and dynamic load balancing.

As in general programs, programmers in a task-level parallel programming model implement program logic using control flow. At the end of each basic block in the control flow, a synchronization operation, added explicitly by the programmer or implicitly by the runtime, waits for all tasks in the basic block to finish so that data races do not occur between basic blocks at execution time. For large-scale parallel applications with complex control flow, however, these synchronization operations cause the following problems:

(1) Tasks distributed across different basic blocks of the same control flow may have no dependences, or only partial dependences, on one another. Because of the synchronization operation at the end of each basic block, all tasks in a later basic block must wait until every task in the earlier basic block has finished before they can be scheduled. The inter-block synchronization therefore introduces an artificial delay between the moment a task in a later basic block becomes ready and the moment it is actually scheduled to run.

(2) Modern computers use a hierarchical cache structure, and reused data can be kept temporarily in the CPU cache. When tasks communicate through a shared-memory model, tasks that depend on each other often share the same memory region. If dependent tasks are placed in different basic blocks of the control flow, then by the time the basic block containing the dependent task starts executing, the many unrelated tasks executed in the basic block of the task it depends on have likely evicted the required data from the cache, resulting in poor program locality.

Summary of the invention

In view of the above shortcomings of, or improvement needs in, the prior art, the invention provides a task scheduling method and system based on task stealing. For large-scale task-level parallel applications, it proposes task scheduling driven by the task dependency graph, which can effectively improve the performance of traditional task-level parallel applications.

To achieve the above object, according to one aspect of the invention, a task scheduling method based on task stealing is provided, comprising:

(1) describing the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and registering each dependent node as a callback function in the callback container of the node it depends on;

(2) obtaining the root nodes and leaf nodes of the task dependency graph and adding one virtual dependency sink node for all leaf nodes, the virtual dependency sink node being used to block the main thread;

(3) allocating an initially empty lock-free deque to each thread in a thread pool and pushing all root nodes onto the bottoms of the threads' lock-free deques in round-robin order;

(4) for each thread: if the thread's lock-free deque is not empty, popping a node from the bottom of that deque, executing the task contained in the node, and, after the task finishes, executing all callbacks in the node's callback container; if the thread's lock-free deque is empty, having the thread attempt to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushing the stolen node onto the bottom of its own lock-free deque, then executing the task in the stolen node and all callbacks in the stolen node's callback container;

(5) after the tasks in all nodes of the task dependency graph have been executed, restoring the in-degree of each node in the task dependency graph to its original value and releasing the blocked main thread.

Preferably, step (1) specifically comprises the following steps:

(1.1) defining a task dependency graph object;

(1.2) dividing the computing task into several subtasks according to its attributes, calling the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph, encapsulating each subtask as a node object, and returning a pointer to each node object;

(1.3) constructing the dependences between subtasks through the pointers to the node objects: registering each dependent node as a callback function in the callback container of the node it depends on, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1.

Preferably, step (2) specifically comprises the following steps:

(2.1) adding all nodes with in-degree 0 in the task dependency graph to a root node set, and all nodes with out-degree 0 to a leaf node set;

(2.2) for the leaf node set L = {L1, L2, ...}, adding a virtual dependency sink node virtual_sink_node and calling virtual_sink_node->depends(L1, L2, ...); the virtual dependency sink node blocks the main thread until all tasks have completed, preventing the main thread from exiting early.
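Steps (2.1) and (2.2) can be illustrated with the following C++ sketch, which tracks only the degree counters. The names add_dependency, classify, attach_virtual_sink, and VirtualSink are hypothetical, and using std::promise/std::future to release the main thread is an assumption about one way the sink could unblock it, not something the patent specifies.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <vector>

// Only the degree counters are needed for pre-processing.
struct Node {
  int in_degree = 0;   // number of nodes this node depends on
  int out_degree = 0;  // number of nodes that depend on this node
};

// Record that `succ` depends on `pred` (the degree bookkeeping of step 1.3).
void add_dependency(Node& succ, Node& pred) {
  succ.in_degree += 1;
  pred.out_degree += 1;
}

// Step (2.1): in-degree 0 -> root set, out-degree 0 -> leaf set.
void classify(std::vector<Node>& nodes, std::vector<Node*>& roots,
              std::vector<Node*>& leaves) {
  for (Node& n : nodes) {
    if (n.in_degree == 0) roots.push_back(&n);
    if (n.out_degree == 0) leaves.push_back(&n);
  }
}

// Hypothetical virtual sink: its task fulfils a promise the main thread waits on.
struct VirtualSink {
  Node node;
  std::promise<void> done;
  void run() { done.set_value(); }  // executed once every leaf has finished
};

// Step (2.2): make the virtual sink depend on every leaf.
void attach_virtual_sink(VirtualSink& sink, const std::vector<Node*>& leaves) {
  for (Node* leaf : leaves) add_dependency(sink.node, *leaf);
}
```

In a full scheduler, the main thread would call get_future() before starting the workers and then wait on the future; the worker that drops the sink's in-degree to zero would call run() and release it.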

Preferably, step (4) specifically comprises the following steps:

(4.1) for each thread, if the thread's lock-free deque is not empty, popping a node from the bottom of the deque, executing the task contained in the node, and, after the task finishes, executing all callbacks in the node's callback container; each executed callback decrements by 1 the in-degree of the node that registered it, and if the in-degree of such a node drops to 0, the current thread pushes that node onto the bottom of its own lock-free deque;

(4.2) if the thread's lock-free deque is empty, having the thread attempt to steal a node from the top of another thread's lock-free deque; if the steal succeeds, pushing the stolen node onto the bottom of its own lock-free deque and performing step (4.1); otherwise, giving up the current CPU time slice;

(4.3) repeating steps (4.1) to (4.2) until the tasks in all nodes of the task dependency graph have been executed.
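Step (4.1) can be seen in isolation with a single worker and no stealing. In this hypothetical C++ sketch, a std::deque stands in for the thread's lock-free deque (its back is the bottom), and each registered callback decrements the dependent node's in-degree and pushes that node onto the current thread's deque when the count reaches zero. The names Node, depends_on, and run_worker are illustrative only.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

struct Node;
using Deque = std::deque<Node*>;  // stand-in for one thread's lock-free deque; back() is the bottom

struct Node {
  std::function<void()> task;
  std::atomic<int> in_degree{0};
  std::vector<std::function<void(Deque&)>> callbacks;  // callback container

  // Register this node as a dependent of `pred` (step 1.3).
  void depends_on(Node& pred) {
    in_degree.fetch_add(1);
    pred.callbacks.push_back([this](Deque& own) {
      // Step (4.1): decrement; the callback that brings the count to 0
      // pushes the now-ready node onto the current thread's deque.
      if (in_degree.fetch_sub(1) == 1) own.push_back(this);
    });
  }
};

// One worker without stealing: pop from the bottom, run the task, then run
// every callback registered on the node.
void run_worker(Deque& own) {
  while (!own.empty()) {
    Node* n = own.back();
    own.pop_back();
    n->task();
    for (auto& cb : n->callbacks) cb(own);
  }
}
```

For a diamond graph in which b and c depend on a and d depends on both b and c, starting from a gives the execution order a, c, b, d: c is pushed last, so it is popped first from the bottom, and d becomes ready only after both b and c have run.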

Preferably, the number of threads in the thread pool equals the number of CPU hardware cores.
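A minimal C++ sketch of step (3) together with the preferred thread count follows. It assumes std::thread::hardware_concurrency() as the source of the core count (the function may return 0 when the count cannot be determined, hence the fallback) and uses a plain std::deque per worker in place of a lock-free deque; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <thread>
#include <vector>

// Preferred thread count: one worker per hardware core. hardware_concurrency()
// may return 0 when the value is not computable, so fall back to 1.
unsigned worker_count() {
  unsigned n = std::thread::hardware_concurrency();
  return n == 0 ? 1u : n;
}

// Step (3): every worker starts with an empty deque; root nodes are then
// pushed onto the bottoms (back) of the deques in round-robin order.
// `workers` must be at least 1.
template <typename NodePtr>
std::vector<std::deque<NodePtr>> distribute_roots(const std::vector<NodePtr>& roots,
                                                  unsigned workers) {
  std::vector<std::deque<NodePtr>> deques(workers);
  for (std::size_t i = 0; i < roots.size(); ++i) {
    deques[i % workers].push_back(roots[i]);
  }
  return deques;
}
```

With five roots and two workers, worker 0 receives roots 0, 2, and 4 and worker 1 receives roots 1 and 3.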

According to another aspect of the invention, a task scheduling system based on task stealing is provided, comprising:

a task dependency graph construction module, configured to describe the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and to register each dependent node as a callback function in the callback container of the node it depends on;

a preprocessing module, configured to obtain the root nodes and leaf nodes of the task dependency graph and to add one virtual dependency sink node for all leaf nodes, the virtual dependency sink node being used to block the main thread;

an initialization module, configured to allocate an initially empty lock-free deque to each thread in the thread pool and to push all root nodes onto the bottoms of the threads' lock-free deques in round-robin order;

a task scheduling module, configured, for each thread, to pop a node from the bottom of the thread's lock-free deque if it is not empty, execute the task contained in the node, and, after the task finishes, execute all callbacks in the node's callback container; and, if the thread's lock-free deque is empty, to have the thread attempt to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, push the stolen node onto the bottom of the thread's own lock-free deque, then execute the task in the stolen node and all callbacks in the stolen node's callback container;

a state restoration module, configured to restore the in-degree of each node in the task dependency graph to its original value after the tasks in all nodes have been executed, and to release the blocked main thread.

Preferably, the task dependency graph construction module comprises:

a definition module, configured to define a task dependency graph object;

a node encapsulation module, configured to divide the computing task into several subtasks according to its attributes, call the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph and encapsulate it as a node object, and return a pointer to each node object;

a callback registration module, configured to construct the dependences between subtasks through the pointers to the node objects, register each dependent node as a callback function in the callback container of the node it depends on, increment the in-degree of the dependent node by 1, and increment the out-degree of the depended-on node by 1.

Preferably, the preprocessing module comprises:

a node classification module, configured to add all nodes with in-degree 0 in the task dependency graph to a root node set and all nodes with out-degree 0 to a leaf node set;

a virtual dependency sink node construction module, configured, for the leaf node set L = {L1, L2, ...}, to add a virtual dependency sink node virtual_sink_node and call virtual_sink_node->depends(L1, L2, ...); the virtual dependency sink node blocks the main thread until all tasks have completed, preventing the main thread from exiting early.

Preferably, the task scheduling module comprises:

a task execution module, configured, for each thread, to pop a node from the bottom of the thread's lock-free deque if it is not empty, execute the task contained in the node, and, after the task finishes, execute all callbacks in the node's callback container; each executed callback decrements by 1 the in-degree of the node that registered it, and if the in-degree of such a node drops to 0, the current thread pushes that node onto the bottom of its own lock-free deque;

a task stealing module, configured so that, when a thread's lock-free deque is empty, the thread attempts to steal a node from the top of another thread's lock-free deque; if the steal succeeds, the stolen node is pushed onto the bottom of the thread's own lock-free deque and then taken out and executed; otherwise the thread gives up the current CPU time slice. The operations of the task execution module and the task stealing module are repeated until the tasks in all nodes of the task dependency graph have been executed.

In general, compared with the prior art, the above technical solutions conceived by the invention mainly have the following technical advantages:

(1) Dependences between tasks are built with a callback mechanism. No separate entity represents a dependency edge; a dependent node in the task dependency graph expresses a dependence only by registering a callback with the node it depends on. This reduces the implementation complexity of the task dependency graph: a task node can consist of just a task entity, a reference count, and a callback container.

(2) Load balancing is achieved with task stealing.

(3) After all tasks have finished, the task dependency graph can easily be restored to its original state. For a task dependency graph with V nodes and E dependency edges, constructing the graph takes O(V+E) time and space. When there are many task nodes or the relationships between nodes are complex, constructing the graph is comparatively time- and space-consuming. Real-world problems are often large, so reusing a constructed task dependency graph substantially reduces the share of total cost spent on construction. Having the user change the data source before each reuse achieves "construct once, use many times".

(4) Scheduling is based strictly on the task dependency graph: starting from the root nodes, as soon as the task in a node finishes, its newly ready child nodes are immediately pushed onto the ready queue for execution. This reduces unnecessary delays introduced by the program's control flow. Moreover, because parent and child nodes usually have data dependences, this scheduling scheme tends to run a child node on the thread of its parent, making the most of data the parent has brought into the CPU cache, reducing memory accesses, and improving performance.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a task scheduling method based on task stealing disclosed in an embodiment of the invention;

Fig. 2 is a schematic flowchart of preprocessing a task dependency graph disclosed in an embodiment of the invention;

Fig. 3 is a schematic diagram of the data structures of the lock-free deque and the node disclosed in an embodiment of the invention;

Fig. 4 is a schematic flowchart of a load-balancing method disclosed in an embodiment of the invention.

Embodiment

To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the invention described below can be combined with each other as long as they do not conflict.

Fig. 1 shows a schematic flowchart of the task scheduling method based on task stealing disclosed in an embodiment of the invention, which comprises the following steps:

(1) Constructing the task dependency graph: describing the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and registering each dependent node as a callback function in the callback container of the node it depends on;

As an optional embodiment, constructing the task dependency graph specifically comprises the following steps:

(1.1) defining a task dependency graph object task_graph;

(1.2) dividing the computing task into several subtasks according to its attributes, calling the insertion method provided by task_graph to add each subtask to the task dependency graph task_graph, encapsulating each subtask as a node object, and returning a pointer to each node object. Specifically, task_graph internally keeps a copy of the task passed to the insertion method and wraps it in a layer of encapsulation to form a node object. When the call returns, task_graph returns a pointer to the node object for subsequent operations.

(1.3) constructing the dependences between subtasks through the pointers to the node objects: registering each dependent node as a callback function in the callback container of the node it depends on, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1. Specifically, after obtaining the pointer to a node object, the depends method is called through that pointer to construct the dependences between tasks. Suppose node A logically depends on the completion of nodes B, C, …; then A->depends(B, C, …) is called explicitly. Inside the depends method, the executable part of node A is registered as a callback in the callback containers of nodes B, C, and so on.
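The node layout and the depends call of steps (1.2) and (1.3) can be sketched in C++ as follows. The description names task_graph, depends, the callback container, and the in-degree and out-degree counters; the class names TaskGraph and Node, the method name insert, and the exact member types are assumptions made for illustration, not the patented implementation. In this sketch each registered callback only decrements the dependent node's in-degree; a scheduler would additionally push the node onto a deque when the count reaches zero.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node layout: a copy of the task, an atomic in-degree
// (reference count), an out-degree, and a callback container.
struct Node {
  std::function<void()> task;
  std::atomic<int> in_degree{0};                 // unfinished predecessors
  int out_degree = 0;                            // number of dependents
  std::vector<std::function<void()>> callbacks;  // run after `task` finishes

  explicit Node(std::function<void()> t) : task(std::move(t)) {}

  // this->depends(B, C, ...): register a callback in each predecessor and
  // update both degree counters. Graph construction is single-threaded.
  template <typename... Preds>
  void depends(Preds*... preds) {
    (register_on(preds), ...);
  }

 private:
  void register_on(Node* pred) {
    pred->callbacks.push_back([this] { in_degree.fetch_sub(1); });
    in_degree.fetch_add(1);
    pred->out_degree += 1;
  }
};

// Hypothetical task graph: owns the nodes and hands out raw pointers.
class TaskGraph {
 public:
  Node* insert(std::function<void()> task) {
    nodes_.push_back(std::make_unique<Node>(std::move(task)));
    return nodes_.back().get();
  }
  const std::vector<std::unique_ptr<Node>>& nodes() const { return nodes_; }

 private:
  std::vector<std::unique_ptr<Node>> nodes_;
};
```

After a->depends(b, c), node a has in-degree 2, and b and c each hold one callback that brings a closer to being ready.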

(2) Preprocessing the task dependency graph: obtaining the root nodes and leaf nodes of the task dependency graph and adding one virtual dependency sink node for all leaf nodes, the virtual dependency sink node being used to block the main thread;

As an optional embodiment, step (2) specifically comprises steps (2.1) and (2.2) described above.

(3) Initializing the runtime environment: allocating an initially empty lock-free deque to each thread in the thread pool and pushing all root nodes onto the bottoms of the threads' lock-free deques in round-robin order;

As an optional embodiment, the number of threads in the thread pool equals the number of CPU hardware cores.

(4) Executing all tasks in the task dependency graph: for each thread, if the thread's lock-free deque is not empty, popping a node from the bottom of that deque, executing the task contained in the node, and, after the task finishes, executing all callbacks in the node's callback container; if the thread's lock-free deque is empty, having the thread attempt to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushing the stolen node onto the bottom of its own lock-free deque, then executing the task in the stolen node and all callbacks in the stolen node's callback container;

As an optional embodiment, step (4) specifically comprises steps (4.1) to (4.3) described above.

(5) Restoring state: after the tasks in all nodes of the task dependency graph have been executed, restoring the in-degree of each node in the task dependency graph to its original value and releasing the blocked main thread.

In general, the task scheduling method based on task stealing proposed by the invention applies to task-level programming models and is implemented as follows. First, a task_graph object is defined to represent the task dependency graph. According to the actual situation of the overall computing task, the overall computing task is divided into a large number of subtasks, which are added to the task_graph object; task_graph encapsulates each task as a node object that it manages. While the task dependency graph is being constructed, the depends method provided by the task_graph object specifies, for each dependence, the dependent object and the depended-on object. During a call to depends, the executable part of the dependent object is registered as a callback in the callback container of the depended-on object, and the in-degree of the dependent object and the out-degree of the depended-on object are each incremented by 1. Because the in-degree may be read and written by multiple threads, it is declared as an atomic variable. After the task dependency graph has been constructed, calling the start method provided by the task_graph object begins preprocessing the task dependency graph and executing the computation. The first preprocessing operation backs up the in-degree information of all nodes managed by task_graph. The second operation picks out all root nodes and leaf nodes among the nodes managed by task_graph (the number of root nodes is not restricted) and adds a virtual dependency sink node for all leaf nodes. This virtual dependency sink node is invisible to the user. Its role is to block the main thread so that it does not exit before all computing tasks have finished and, once all computing tasks have finished, to restore the task dependency graph to its original state for reuse according to the previously backed-up information. Fig. 2 shows a schematic flowchart of preprocessing the task dependency graph disclosed in an embodiment of the invention.
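Backing up the in-degrees during preprocessing and restoring them after execution is what makes the graph reusable. The following C++ sketch assumes the nodes are stored in a std::vector of std::unique_ptr and that the in-degree is a std::atomic<int>, consistent with the description; the function names are hypothetical. Both operations take O(V) time and the backup needs O(V) extra space, compared with O(V+E) for rebuilding the graph.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
  std::atomic<int> in_degree{0};  // decremented to 0 during execution
};

// Preprocessing: snapshot every node's in-degree before execution starts.
std::vector<int> backup_in_degrees(const std::vector<std::unique_ptr<Node>>& nodes) {
  std::vector<int> saved;
  saved.reserve(nodes.size());
  for (const auto& n : nodes) saved.push_back(n->in_degree.load());
  return saved;
}

// Restoration: write the snapshot back so the same graph can run again.
void restore_in_degrees(std::vector<std::unique_ptr<Node>>& nodes,
                        const std::vector<int>& saved) {
  for (std::size_t i = 0; i < nodes.size(); ++i) nodes[i]->in_degree.store(saved[i]);
}
```

Between runs only the data sources of the tasks change, so the callback containers and out-degrees built during construction can be left untouched.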

Fig. 3 shows a schematic diagram of the data structures of the lock-free deque and the node disclosed in an embodiment of the invention. In the initialization phase, the runtime allocates threads according to the actual number of hardware cores of the computer, with a one-to-one correspondence between threads and cores, and gives each thread a private lock-free deque. All root nodes in the root node set are pushed in round-robin order onto the bottoms of the threads' deques. Thereafter, all threads repeatedly pop nodes from the bottoms of their deques and execute the tasks they contain. After the task in a node has finished, the thread executes, one by one, all callbacks registered in the node's callback container. Depending on how many nodes a given node depends on, callbacks targeting it may be executed several times. A check of the critical condition is added to each registered callback: each time the callback executes, the in-degree of the node that registered it is decremented by 1. When a node's in-degree reaches 0, all nodes it depends on have finished executing and it can be put into execution immediately; the callback does this by pushing the node onto the bottom of the current thread's deque at once.
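The description does not say which lock-free deque algorithm is used. One widely used choice for this owner-bottom, thief-top pattern is the Chase-Lev work-stealing deque; the C++ sketch below is a fixed-capacity variant of it, offered as an assumption rather than as the patented data structure. It uses sequentially consistent atomics throughout for simplicity (a tuned implementation would use weaker orderings) and requires the element type to be trivially copyable, such as a node pointer. The class name WorkStealingDeque is hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Fixed-capacity Chase-Lev style deque (sketch). The owning thread calls
// push()/pop() at the bottom; any other thread may call steal() at the top.
template <typename T, std::size_t Capacity>
class WorkStealingDeque {
 public:
  // Owner only. Returns false when the buffer is full.
  bool push(T item) {
    std::int64_t b = bottom_.load();
    std::int64_t t = top_.load();
    if (b - t >= static_cast<std::int64_t>(Capacity)) return false;
    buffer_[b % Capacity].store(item);
    bottom_.store(b + 1);
    return true;
  }

  // Owner only. Takes the most recently pushed item (LIFO at the bottom).
  std::optional<T> pop() {
    std::int64_t b = bottom_.load() - 1;
    bottom_.store(b);
    std::int64_t t = top_.load();
    if (t > b) {  // deque was empty: undo the reservation
      bottom_.store(b + 1);
      return std::nullopt;
    }
    T item = buffer_[b % Capacity].load();
    if (t == b) {  // last item: race against thieves for it
      bool won = top_.compare_exchange_strong(t, t + 1);
      bottom_.store(b + 1);
      if (!won) return std::nullopt;
    }
    return item;
  }

  // Any thread. Takes the oldest item from the top.
  std::optional<T> steal() {
    std::int64_t t = top_.load();
    std::int64_t b = bottom_.load();
    if (t >= b) return std::nullopt;
    T item = buffer_[t % Capacity].load();
    if (!top_.compare_exchange_strong(t, t + 1)) return std::nullopt;  // lost the race
    return item;
  }

 private:
  std::atomic<std::int64_t> top_{0};
  std::atomic<std::int64_t> bottom_{0};
  std::array<std::atomic<T>, Capacity> buffer_{};
};
```

The owner sees its own nodes in LIFO order, which keeps a newly ready child on the thread whose cache holds its parent's data, while thieves take the oldest nodes from the top.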

Fig. 4 shows a schematic flowchart of a load-balancing method disclosed in an embodiment of the invention. With scheduling based on the task dependency graph, each thread effectively executes a subgraph of the task dependency graph. Depending on the size and workload of the subgraphs, some take longer to execute and others less, so the overall computation time is determined by the thread with the longest execution time and the load becomes unbalanced. The invention uses a task-stealing algorithm to achieve load balancing, with the following steps:

1) If the thread's own lock-free deque is not empty, the thread pops a node from the bottom of the deque and executes it; otherwise, go to step 2).

2) If the thread's own lock-free deque is empty, the thread visits the lock-free deques of the other threads in round-robin order. If it finds a thread whose lock-free deque is not empty, it tries to steal a node from the top of that deque, pushes the node onto the bottom of its own lock-free deque, and performs step 1); otherwise, go to step 3).

3) The thread gives up the current CPU time slice and sleeps; when it is next woken, it performs step 1).
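Combining steps 1) to 3) with callback-driven execution gives a worker loop like the following C++ sketch. To keep it short it makes several substitutions that are assumptions, not part of the patent: a mutex-guarded std::deque stands in for the lock-free deque, each node keeps a plain list of dependents instead of a callback container, an atomic count of unfinished nodes replaces the virtual sink as the termination signal, and std::this_thread::yield() stands in for giving up the time slice and sleeping. The graph must be acyclic and `total` must equal the number of nodes, or the workers never stop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct Node {
  std::function<void()> task;
  std::atomic<int> in_degree{0};
  std::vector<Node*> dependents;  // stand-in for the callback container
};

// Stand-in for the lock-free deque: a mutex-guarded std::deque.
class LockedDeque {
 public:
  void push_bottom(Node* n) {
    std::lock_guard<std::mutex> lock(m_);
    d_.push_back(n);
  }
  Node* pop_bottom() {
    std::lock_guard<std::mutex> lock(m_);
    if (d_.empty()) return nullptr;
    Node* n = d_.back();
    d_.pop_back();
    return n;
  }
  Node* steal_top() {
    std::lock_guard<std::mutex> lock(m_);
    if (d_.empty()) return nullptr;
    Node* n = d_.front();
    d_.pop_front();
    return n;
  }

 private:
  std::mutex m_;
  std::deque<Node*> d_;
};

// Runs every node of an acyclic graph on `workers` (>= 1) threads.
void run_graph(const std::vector<Node*>& roots, std::size_t total, unsigned workers) {
  std::vector<LockedDeque> deques(workers);
  for (std::size_t i = 0; i < roots.size(); ++i) deques[i % workers].push_bottom(roots[i]);
  std::atomic<std::size_t> remaining{total};

  auto worker = [&](unsigned self) {
    while (remaining.load() > 0) {
      Node* n = deques[self].pop_bottom();  // step 1): own bottom
      if (n == nullptr) {
        // Step 2): poll the other deques in round-robin order.
        for (unsigned k = 1; k < workers && n == nullptr; ++k)
          n = deques[(self + k) % workers].steal_top();
        if (n != nullptr) deques[self].push_bottom(n);  // then retry step 1)
        else std::this_thread::yield();                 // step 3)
        continue;
      }
      n->task();
      for (Node* d : n->dependents)
        if (d->in_degree.fetch_sub(1) == 1) deques[self].push_bottom(d);
      remaining.fetch_sub(1);
    }
  };

  std::vector<std::thread> threads;
  for (unsigned i = 0; i < workers; ++i) threads.emplace_back(worker, i);
  for (std::thread& t : threads) t.join();
}
```

Because a newly ready dependent is pushed onto the bottom of the deque of the thread that released it, it usually runs on the same thread as its last predecessor unless an idle thread steals it first.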

An embodiment of the invention further provides a task scheduling system based on task stealing, comprising the task dependency graph construction module, preprocessing module, initialization module, task scheduling module, and state restoration module described above.

In the embodiments of the invention, for the specific implementation of each functional module, refer to the description in the method embodiment; it is not repeated here.

By adopting the above solution, the invention performs better than other parallel scheduling schemes and substantially improves parallel program performance, specifically:

1) Tasks are scheduled purely according to the task dependency graph, avoiding the task execution delays caused by manually set control flow;

2) Threads correspond one-to-one with cores, and at run time nodes with dependences are placed on the same thread wherever possible, so a later task can reuse as much as possible the data that the earlier task used and that is still in the CPU cache, reducing the number of memory accesses;

3) Load balancing is achieved with a task-stealing algorithm.

As those skilled in the art will readily appreciate, the foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A task scheduling method based on task stealing, characterized by comprising:

(1) describing the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and registering each dependent node as a callback function in the callback container of the node it depends on;

(2) obtaining the root nodes and leaf nodes of the task dependency graph and adding one virtual dependency sink node for all leaf nodes, the virtual dependency sink node being used to block the main thread;

(3) allocating an initially empty lock-free deque to each thread in a thread pool and pushing all root nodes onto the bottoms of the threads' lock-free deques in round-robin order;

(4) for each thread: if the thread's lock-free deque is not empty, popping a node from the bottom of that deque, executing the task contained in the node, and, after the task finishes, executing all callbacks in the node's callback container; if the thread's lock-free deque is empty, having the thread attempt to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushing the stolen node onto the bottom of its own lock-free deque, then executing the task in the stolen node and all callbacks in the stolen node's callback container;

(5) after the tasks in all nodes of the task dependency graph have been executed, restoring the in-degree of each node in the task dependency graph to its original value and releasing the blocked main thread.

2. The method according to claim 1, characterized in that step (1) specifically comprises the following steps:

(1.1) defining a task dependency graph object;

(1.2) dividing the computing task into several subtasks according to its attributes, calling the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph, encapsulating each subtask as a node object, and returning a pointer to each node object;

(1.3) constructing the dependences between subtasks through the pointers to the node objects: registering each dependent node as a callback function in the callback container of the node it depends on, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1.

3. The method according to claim 2, characterized in that step (2) specifically comprises the following steps:

(2.1) adding all nodes with in-degree 0 in the task dependency graph to a root node set, and all nodes with out-degree 0 to a leaf node set;

(2.2) for the leaf node set L = {L1, L2, ...}, adding a virtual dependency sink node virtual_sink_node and calling virtual_sink_node->depends(L1, L2, ...); the virtual dependency sink node blocks the main thread until all tasks have completed, preventing the main thread from exiting early.

4. method according to claim 2, it is characterised in that step (4) specifically includes following steps：

(4.1) for each thread, if thread is not sky without lock deque, taken from thread without lock deque bottom Egress simultaneously performs being included in node for task, after tasks carrying terminates, and performs all readjustments in readjustment container in node, Often perform and once adjust back, the in-degree of the node of corresponding registered callbacks subtracts 1, if the in-degree of the node of a certain registered callbacks is subtracted To 0, then current thread by the node of the registered callbacks be pressed into current thread without lock deque bottom；

(4.2) if the thread's lock-free deque is empty, the thread attempts to steal a node from the top of another thread's lock-free deque; if the steal succeeds, the stolen node is pushed onto the bottom of the thread's own lock-free deque and step (4.1) is performed; otherwise the thread yields the remainder of its current CPU time slice;

(4.3) repeating steps (4.1) and (4.2) until the tasks in all nodes of the task dependency graph have been executed.
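A deterministic, single-threaded Python simulation of steps (4.1) to (4.3), including the round-robin placement of root nodes. This is a sketch under stated assumptions, not the patented implementation: a real scheduler runs one OS thread per worker over a lock-free work-stealing deque (the claims do not name one; the Chase-Lev deque is a common choice), whereas here `collections.deque` stands in and workers take turns in a fixed order so the output is reproducible. The right end of each deque is the "bottom" and the left end the "top"; the `successors` list is a hypothetical stand-in for the callback container.

```python
from collections import deque
from typing import List, Tuple


class Node:
    def __init__(self, name: str):
        self.name = name
        self.in_degree = 0
        self.successors: List["Node"] = []  # stands in for the callback container

    def depends(self, *others: "Node") -> None:
        for other in others:
            other.successors.append(self)
            self.in_degree += 1


def run_work_stealing(nodes: List[Node], num_workers: int) -> List[Tuple[int, str]]:
    """Return (worker_id, node_name) pairs in execution order."""
    deques = [deque() for _ in range(num_workers)]
    roots = [n for n in nodes if n.in_degree == 0]
    for i, root in enumerate(roots):           # round-robin placement of roots
        deques[i % num_workers].append(root)
    log: List[Tuple[int, str]] = []
    remaining = len(nodes)
    while remaining:
        progressed = False
        for w in range(num_workers):
            own = deques[w]
            if not own:                        # step (4.2): try to steal
                for victim in range(num_workers):
                    if victim != w and deques[victim]:
                        # Steal from the victim's top, push onto own bottom.
                        own.append(deques[victim].popleft())
                        break
                else:
                    continue                   # nothing to steal: yield this turn
            node = own.pop()                   # step (4.1): take from own bottom
            log.append((w, node.name))
            remaining -= 1
            progressed = True
            for succ in node.successors:       # run the "callbacks"
                succ.in_degree -= 1
                if succ.in_degree == 0:
                    own.append(succ)           # ready node goes to this worker's bottom
        if not progressed:
            raise RuntimeError("no runnable node left: the graph contains a cycle")
    return log
```

For example, with `c.depends(a, b)` and `d.depends(c)` on two workers, worker 1 finishes `b` last among the roots, so it is the one that pushes `c`; worker 0, now idle, steals `c` and later pushes `d`, which worker 1 steals in turn. Taking from the bottom while thieves take from the top keeps the owner and the thieves at opposite ends of each deque, which is what lets a real implementation avoid locks on the common path.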

5. The method according to any one of claims 1 to 4, characterized in that the number of threads in the thread pool equals the number of CPU hardware cores.
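One way to size a pool as in claim 5, sketched in Python with the standard library; the helper name `default_pool_size` is hypothetical. Note that `os.cpu_count()` reports logical CPUs, which can exceed the physical core count on machines with simultaneous multithreading, and it may return `None`. In CPython, threads running pure-Python tasks are also limited by the global interpreter lock.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_pool_size() -> int:
    """One worker per CPU reported by the OS; fall back to 1 if unknown."""
    return os.cpu_count() or 1


with ThreadPoolExecutor(max_workers=default_pool_size()) as pool:
    squares = list(pool.map(lambda x: x * x, range(4)))
```

Matching the thread count to the core count means no worker has to be descheduled to let another run, so idle workers spend their time stealing rather than waiting for the OS.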

6. A task scheduling system based on task stealing, characterized by comprising:

a task dependency graph construction module, configured to describe an overall computation task as a task dependency graph composed of subtask nodes and dependency edges between them, and to register each dependent node as a callback function in the callback container of the node it depends on;

a preprocessing module, configured to obtain the root nodes and leaf nodes of the task dependency graph and to add one virtual dependency sink node for all the leaf nodes, the virtual dependency sink node being used to block the main thread;

an initialization module, configured to allocate an initially empty lock-free deque to each thread in a thread pool and to place all root nodes onto the bottoms of the threads' lock-free deques in round-robin order;

a task scheduling module, configured, for each thread, to: if the thread's lock-free deque is not empty, take a node from the bottom of the deque, execute the task contained in the node and, after the task finishes, execute all callbacks in the node's callback container; if the thread's lock-free deque is empty, attempt to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, push the stolen node onto the bottom of the thread's own lock-free deque, execute the task in the stolen node, and execute all callbacks in the stolen node's callback container;

a reset module, configured to, after the tasks in all nodes of the task dependency graph have been executed, restore the in-degree of each node in the task dependency graph to its original value and release the block on the main thread.
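A minimal Python sketch of the reset module, assuming each node keeps a snapshot of its in-degree taken when the graph is built (the field `original_in_degree` and the function `reset_and_release` are hypothetical) and that the main thread blocks on a `threading.Event`.

```python
import threading
from typing import List


class Node:
    def __init__(self, name: str, in_degree: int = 0):
        self.name = name
        self.in_degree = in_degree           # decremented by callbacks during a run
        self.original_in_degree = in_degree  # snapshot used by the reset step


def reset_and_release(nodes: List[Node], all_done: threading.Event) -> None:
    """Restore every in-degree, then unblock the main thread."""
    for node in nodes:
        node.in_degree = node.original_in_degree
    # Set the event only after the reset, so the woken main thread
    # never observes a partially reset graph.
    all_done.set()
```

Restoring the in-degrees presumably lets the same graph object be executed again without being rebuilt, since a run consumes every in-degree down to 0.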

7. The system according to claim 6, characterized in that the task dependency graph construction module comprises:

a definition module, configured to define a task dependency graph object;

a node encapsulation module, configured to divide the computation task into several subtasks according to its attributes, call the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph and encapsulate it as a node object, and return a pointer to each node object;

a callback registration module, configured to construct the dependencies between the subtasks through the pointers of the node objects, register each dependent node as a callback function in the callback function container of the node it depends on, increment the in-degree of the dependent node by 1, and increment the out-degree of the depended-upon node by 1.

8. The system according to claim 7, characterized in that the preprocessing module comprises:

a node partitioning module, configured to add all nodes in the task dependency graph whose in-degree is 0 to a root node set and all nodes whose out-degree is 0 to a leaf node set;

a virtual dependency sink node construction module, configured to, for the leaf node set L = {L1, L2, ...}, add a virtual dependency sink node virtual_sink_node and call virtual_sink_node->Depends(L1, L2, ...), the virtual dependency sink node being used to block the main thread until all tasks are completed, preventing the main thread from exiting prematurely.

9. The system according to claim 7, characterized in that the task scheduling module comprises:

a task execution module, configured, for each thread whose lock-free deque is not empty, to take a node from the bottom of the deque and execute the task contained in the node; after the task finishes, to execute all callbacks in the node's callback container, where each executed callback decrements the in-degree of the corresponding registered node by 1, and if the in-degree of a registered node drops to 0, the current thread pushes that node onto the bottom of its own lock-free deque;

a task stealing module, configured, when a thread's lock-free deque is empty, to have the thread attempt to steal a node from the top of another thread's lock-free deque; if the steal succeeds, the stolen node is pushed onto the bottom of the thread's own lock-free deque and then taken out and executed; otherwise the thread yields the remainder of its current CPU time slice; the operations of the task execution module and the task stealing module are repeated until the tasks in all nodes of the task dependency graph have been executed.

10. The system according to any one of claims 6 to 9, characterized in that the number of threads in the thread pool equals the number of CPU hardware cores.