CN107220111B - Task scheduling method and system based on task stealing - Google Patents

Task scheduling method and system based on task stealing

Info

Publication number
CN107220111B
Authority
CN
China
Prior art keywords
node
task
thread
lock
deque
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710290460.6A
Other languages
Chinese (zh)
Other versions
CN107220111A (en
Inventor
金海
李陈希
廖小飞
石翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201710290460.6A priority Critical patent/CN107220111B/en
Publication of CN107220111A publication Critical patent/CN107220111A/en
Application granted granted Critical
Publication of CN107220111B publication Critical patent/CN107220111B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/483 Multiproc
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/485 Resource constraint
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/486 Scheduler internals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/508 Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a task scheduling method and system based on task stealing. The method comprises: constructing a task dependency graph, in which each dependent node is registered as a callback function in the callback container of every node it depends on; allocating a lock-free deque to each thread in a thread pool and emptying it, and placing the root nodes at the bottom of the threads' lock-free deques in round-robin fashion; if a thread's lock-free deque is not empty, taking a node from the bottom of that deque and executing it; if a thread's lock-free deque is empty, stealing a node from the top of another thread's lock-free deque, pushing the stolen node onto the bottom of the thread's own lock-free deque, and then taking out and executing the stolen node; and, after the tasks of all nodes have been executed, restoring the in-degree of each node in the task dependency graph to its original value and ending the blocking of the main thread. The invention targets large-scale task-level parallel applications and can effectively improve the performance of conventional task-level parallel applications.

Description

Task scheduling method and system based on task stealing
Technical field
The invention belongs to the technical field of computer parallel programming, and more particularly relates to a task scheduling method and system based on task stealing.
Background art
Transistor sizes approaching their physical limit, together with power consumption, have severely constrained the development of single-core processors. Programmers can no longer obtain better program performance simply by waiting for chip vendors to release new processors, as they once could. Further gains in application performance now depend on integrating multiple cores into a single CPU and parallelizing applications. The era of single-core serial execution is over, and programmers have entered the era of multi-core parallelism.
Traditional parallel programming models (including MPI and early versions of OpenMP) are either aimed only at expert-level, senior programmers or suited only to regular applications. The multi-core era calls for parallel programming tools that cover a broader range of applications, are easy to program with, and are highly productive. Many new parallel programming models have emerged in recent years. Among them, the task-level parallel programming model has become the preferred model on multi-core platforms because of its wide applicability, programming convenience, and high utilization of computing resources. The task-level parallel programming model takes the task as the basic unit of parallelism and provides programming interfaces for task division and synchronization. Task division and synchronization are left to the programmer, who can divide an application into a large number of fine-grained tasks. However, whether each task is executed in parallel or serially, on which physical core it runs, and how synchronization between tasks is implemented are all decided by the runtime system. The task-level parallel programming model encourages nested, recursive tasks and introduces user-level thread scheduling built around a task-stealing algorithm to achieve high performance and dynamic load balancing.
As in ordinary programs, the task-level parallel programming model lets programmers use control flow to express program logic. At the end of a basic block in the control flow, a synchronization operation, added explicitly by the programmer or implicitly by the runtime, waits for all tasks in the basic block to finish so that data races do not occur between basic blocks. However, for large-scale parallel applications with complex control flow, these synchronization operations cause the following problems:
(1) If tasks in different basic blocks of the same control flow have no dependencies, or only partial dependencies, the synchronization operation at the end of each basic block still forces every task in a later basic block to wait until all tasks in the earlier basic block have finished before it can be scheduled. Every task in the later basic block therefore suffers an artificial delay, introduced by the inter-block synchronization, between becoming ready and actually being scheduled to run.
(2) Modern computers use a cache hierarchy, and reused data can be held temporarily in the CPU cache. When tasks communicate through a shared-memory model, tasks that depend on one another often share the same region of memory. Suppose such tasks are distributed across different basic blocks of the control flow. By the time the basic block containing the dependent task begins to execute, the many unrelated tasks executed in the basic block containing the depended-on task are likely to have evicted the required data from the cache, resulting in poor program locality.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides a task scheduling method and system based on task stealing. For large-scale task-level parallel applications, it proposes scheduling tasks driven by a task dependency graph, which can effectively improve the performance of conventional task-level parallel applications.
To achieve the above object, according to one aspect of the present invention, a task scheduling method based on task stealing is provided, comprising:
(1) describing the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and registering each dependent node as a callback function in the callback container of each node it depends on;
(2) obtaining the root nodes and leaf nodes of the task dependency graph, and adding a virtual dependency sink node that depends on all leaf nodes, the virtual dependency sink node being used to block the main thread;
(3) allocating a lock-free deque to each thread in the thread pool and emptying it, and placing all root nodes at the bottom of the threads' lock-free deques in round-robin fashion;
(4) for each thread: if the thread's lock-free deque is not empty, taking a node from the bottom of the deque, executing the task contained in the node and, after the task finishes, executing all callbacks in the node's callback container; if the thread's lock-free deque is empty, attempting to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushing the stolen node onto the bottom of the thread's own lock-free deque, then executing the task in the stolen node and all callbacks in its callback container;
(5) after the tasks in all nodes of the task dependency graph have been executed, restoring the in-degree of each node in the task dependency graph to its original value and ending the blocking of the main thread.
Preferably, step (1) specifically comprises the following sub-steps:
(1.1) defining a task dependency graph object;
(1.2) dividing the computing task into several subtasks according to its attributes, calling the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph and wrap it as a node object, and returning a pointer to each node object;
(1.3) constructing the dependencies between subtasks through the node object pointers: registering each dependent node as a callback function in the callback container of the node it depends on, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1.
Preferably, step (2) specifically comprises the following sub-steps:
(2.1) adding every node with in-degree 0 in the task dependency graph to the root node set, and every node with out-degree 0 to the leaf node set;
(2.2) for the leaf node set L = {L1, L2, …}, adding a virtual dependency sink node virtual_sink_node and calling virtual_sink_node->depends(L1, L2, …); the virtual dependency sink node blocks the main thread until all tasks are complete, preventing the main thread from ending early.
Preferably, step (4) specifically comprises the following sub-steps:
(4.1) for each thread, if the thread's lock-free deque is not empty, taking a node from the bottom of the deque and executing the task contained in the node; after the task finishes, executing all callbacks in the node's callback container, where each executed callback decrements the in-degree of the node that registered it by 1 and, if the in-degree of a registering node drops to 0, the current thread pushes that node onto the bottom of its own lock-free deque;
(4.2) if the thread's lock-free deque is empty, attempting to steal a node from the top of another thread's lock-free deque; if the steal succeeds, pushing the stolen node onto the bottom of the thread's own lock-free deque and performing step (4.1); otherwise giving up the current CPU time slice;
(4.3) repeating steps (4.1) and (4.2) until the tasks in all nodes of the task dependency graph have been executed.
Preferably, the number of threads in the thread pool equals the number of CPU hardware cores.
According to another aspect of the present invention, a task scheduling system based on task stealing is provided, comprising:
a task dependency graph construction module, configured to describe the overall computing task as a task dependency graph composed of subtask nodes and the dependency edges between them, and to register each dependent node as a callback function in the callback container of each node it depends on;
a preprocessing module, configured to obtain the root nodes and leaf nodes of the task dependency graph and to add a virtual dependency sink node that depends on all leaf nodes, the virtual dependency sink node being used to block the main thread;
an initialization module, configured to allocate a lock-free deque to each thread in the thread pool and empty it, and to place all root nodes at the bottom of the threads' lock-free deques in round-robin fashion;
a task scheduling module, configured, for each thread, to take a node from the bottom of the thread's lock-free deque if that deque is not empty, execute the task contained in the node and, after the task finishes, execute all callbacks in the node's callback container; and, if the thread's lock-free deque is empty, to attempt to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, push the stolen node onto the bottom of the thread's own lock-free deque, then execute the task in the stolen node and all callbacks in its callback container;
a restoration module, configured to restore the in-degree of each node in the task dependency graph to its original value and end the blocking of the main thread after the tasks in all nodes of the task dependency graph have been executed.
Preferably, the task dependency graph construction module comprises:
a definition module, configured to define a task dependency graph object;
a node wrapping module, configured to divide the computing task into several subtasks according to its attributes, to call the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph and wrap it as a node object, and to return a pointer to each node object;
a callback registration module, configured to construct the dependencies between subtasks through the node object pointers, registering each dependent node as a callback function in the callback function container of the node it depends on, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1.
Preferably, the preprocessing module comprises:
a node division module, configured to add every node with in-degree 0 in the task dependency graph to the root node set and every node with out-degree 0 to the leaf node set;
a virtual dependency sink node construction module, configured, for the leaf node set L = {L1, L2, …}, to add a virtual dependency sink node virtual_sink_node and call virtual_sink_node->depends(L1, L2, …), the virtual dependency sink node blocking the main thread until all tasks are complete and preventing the main thread from ending early.
Preferably, the task scheduling module comprises:
a task execution module, configured, for each thread whose lock-free deque is not empty, to take a node from the bottom of the deque and execute the task contained in the node and, after the task finishes, to execute all callbacks in the node's callback container, where each executed callback decrements the in-degree of the node that registered it by 1 and, if the in-degree of a registering node drops to 0, the current thread pushes that node onto the bottom of its own lock-free deque;
a task stealing module, configured, when a thread's lock-free deque is empty, to have the thread attempt to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, push the stolen node onto the bottom of the thread's own lock-free deque and take out the stolen node for execution, and otherwise give up the current CPU time slice; the operations of the task execution module and the task stealing module are repeated until the tasks in all nodes of the task dependency graph have been executed.
Preferably, the number of threads in the thread pool equals the number of CPU hardware cores.
In general, compared with the prior art, the above technical solutions conceived by the present invention mainly have the following technical advantages:
(1) Dependencies between tasks are constructed with a callback mechanism. No entity represents a dependency edge between tasks; a dependent node in the task dependency graph expresses its dependency solely by registering a callback with the node it depends on. This reduces the implementation complexity of the task dependency graph: a task node need only consist of a task entity, a reference count, and a callback container.
(2) Load balancing is achieved through task stealing.
(3) The task dependency graph is conveniently restored to its original state after all tasks have been executed. Assuming a task dependency graph has V nodes and E dependency edges, the time and space complexity of constructing the graph are both O(V+E). When there are many task nodes or the relationships between nodes are complex, constructing the task dependency graph is costly in both time and space. Real-world problems are often large; if a constructed task dependency graph is reused, the share of total cost taken up by graph construction falls substantially. By replacing the data source before each reuse, the user achieves "construct once, use many times".
(4) Scheduling is based strictly on the task dependency graph: starting from the root nodes, whenever the task in a node finishes, any child node that has become ready is immediately pushed onto the ready queue for execution. This reduces the unnecessary delay introduced by control flow in the program. Moreover, parent and child nodes generally have a data dependency. This scheduling method lets a child node run on the thread that ran its parent, so that data the parent left in the CPU cache is reused as far as possible, reducing memory accesses and improving performance.
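Advantage (3) can be illustrated with a small Python sketch. It is not the patent's implementation: the dict-based graph, the function `run_once`, and the node names are hypothetical, and the run is sequential. It only shows that one run consumes the in-degree counters, and that restoring them from a backup touches each node once, O(V), instead of rebuilding the graph at O(V+E).

```python
from collections import deque

def run_once(dependents, in_degree):
    """Execute a dependency graph sequentially; consumes the in_degree counters."""
    ready = deque(n for n, d in in_degree.items() if d == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)                 # "execute" the node
        for child in dependents[node]:     # notify every node that depends on it
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)
    return order

# Hypothetical graph: C depends on A and B.
dependents = {"A": ["C"], "B": ["C"], "C": []}
in_degree = {"A": 0, "B": 0, "C": 2}

original = dict(in_degree)                 # backup taken before the first run
first = run_once(dependents, in_degree)    # afterwards every counter is 0
in_degree.update(original)                 # restore: one write per node
second = run_once(dependents, in_degree)   # the same graph is reused as built
```

Without the restore step, the second run would start with C already counted as ready, which is why the method backs up the in-degrees before execution.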
Brief description of the drawings
Fig. 1 is a flow diagram of a task scheduling method based on task stealing disclosed in an embodiment of the present invention;
Fig. 2 is a flow diagram of preprocessing a task dependency graph, disclosed in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the data structures of a lock-free deque and a node, disclosed in an embodiment of the present invention;
Fig. 4 is a flow diagram of a load-balancing method disclosed in an embodiment of the present invention.
Detailed description of embodiments
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.
Fig. 1 shows a flow diagram of a task scheduling method based on task stealing disclosed in an embodiment of the present invention, comprising the following steps:
(1) Constructing the task dependency graph: the overall computing task is described as a task dependency graph composed of subtask nodes and the dependency edges between them, and each dependent node is registered as a callback function in the callback container of each node it depends on;
As an optional embodiment, constructing the task dependency graph specifically comprises the following sub-steps:
(1.1) defining a task dependency graph object task_graph;
(1.2) dividing the computing task into several subtasks according to its attributes, calling the insertion method provided by the task dependency graph object task_graph to add each subtask to task_graph and wrap it as a node object, and returning a pointer to each node object. Specifically, task_graph internally holds a copy of the task passed to the insertion method and wraps it to form a node object. After the call, task_graph returns a pointer to the node object for subsequent operations.
(1.3) constructing the dependencies between subtasks through the node object pointers: registering each dependent node as a callback function in the callback container of the node it depends on, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1. Specifically, after the node object pointers are obtained, the depends method is called through a pointer to construct the dependencies between tasks. If node A logically depends on the completion of node B, node C, and so on, A->depends(B, C, …) is called explicitly. Internally, the depends method registers the execution part of node A, in the form of a callback, in the callback containers of node B, node C, and so on.
(2) Preprocessing the task dependency graph: the root nodes and leaf nodes of the task dependency graph are obtained, and a virtual dependency sink node that depends on all leaf nodes is added, the virtual dependency sink node being used to block the main thread;
As an optional embodiment, step (2) specifically comprises the following sub-steps:
(2.1) adding every node with in-degree 0 in the task dependency graph to the root node set, and every node with out-degree 0 to the leaf node set;
(2.2) for the leaf node set L = {L1, L2, …}, adding a virtual dependency sink node virtual_sink_node and calling virtual_sink_node->depends(L1, L2, …); the virtual dependency sink node blocks the main thread until all tasks are complete, preventing the main thread from ending early.
(3) Initializing the runtime environment: a lock-free deque is allocated to each thread in the thread pool and emptied, and all root nodes are placed at the bottom of the threads' lock-free deques in round-robin fashion;
As an optional embodiment, the number of threads in the thread pool equals the number of CPU hardware cores.
(4) Executing all tasks in the task dependency graph: for each thread, if the thread's lock-free deque is not empty, a node is taken from the bottom of the deque and the task contained in the node is executed; after the task finishes, all callbacks in the node's callback container are executed. If the thread's lock-free deque is empty, the thread attempts to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushes the stolen node onto the bottom of its own lock-free deque, then executes the task in the stolen node and all callbacks in its callback container;
As an optional embodiment, step (4) specifically comprises the following sub-steps:
(4.1) for each thread, if the thread's lock-free deque is not empty, taking a node from the bottom of the deque and executing the task contained in the node; after the task finishes, executing all callbacks in the node's callback container, where each executed callback decrements the in-degree of the node that registered it by 1 and, if the in-degree of a registering node drops to 0, the current thread pushes that node onto the bottom of its own lock-free deque;
(4.2) if the thread's lock-free deque is empty, attempting to steal a node from the top of another thread's lock-free deque; if the steal succeeds, pushing the stolen node onto the bottom of the thread's own lock-free deque and performing step (4.1); otherwise giving up the current CPU time slice;
(4.3) repeating steps (4.1) and (4.2) until the tasks in all nodes of the task dependency graph have been executed.
(5) Restoring the initial state: after the tasks in all nodes of the task dependency graph have been executed, the in-degree of each node in the task dependency graph is restored to its original value, and the blocking of the main thread is ended.
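Steps (1) and (2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the class names `Node` and `TaskGraph` and the helpers `insert`, `roots`, `leaves`, and `_on_dependency_done` are hypothetical, the patent's `A->depends(B, C, …)` is rendered as a Python method call, and the graph stores a reference to each task where the patent describes storing a copy.

```python
class Node:
    """Hypothetical node: a task, degree counters, and a callback container."""
    def __init__(self, task):
        self.task = task
        self.in_degree = 0
        self.out_degree = 0
        self.callbacks = []

    def _on_dependency_done(self):
        # Callback body: one dependency finished; hand back the node once ready.
        self.in_degree -= 1
        return self if self.in_degree == 0 else None

    def depends(self, *dependencies):
        for dep in dependencies:
            dep.callbacks.append(self._on_dependency_done)  # register with dep
            self.in_degree += 1                             # dependent node
            dep.out_degree += 1                             # depended-on node

class TaskGraph:
    def __init__(self):
        self.nodes = []

    def insert(self, task):
        node = Node(task)          # wrap the subtask as a node object
        self.nodes.append(node)
        return node                # handle used later to declare dependencies

    def roots(self):
        return [n for n in self.nodes if n.in_degree == 0]

    def leaves(self):
        return [n for n in self.nodes if n.out_degree == 0]

graph = TaskGraph()
a = graph.insert(lambda: "A")
b = graph.insert(lambda: "B")
c = graph.insert(lambda: "C")
a.depends(b, c)                    # A runs only after B and C have finished
root_set, leaf_set = graph.roots(), graph.leaves()
```

Note that no edge object exists anywhere: the dependency of A on B is visible only as one entry in B's callback container and one count in A's in-degree.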
In general, the task scheduling method based on task stealing proposed by the present invention is applied to a task-level programming model and is implemented as follows. First, a task_graph object is defined to represent the task dependency graph. According to the actual circumstances of the overall computing task, the task is divided into a large number of subtasks, which are added to the task_graph object and wrapped as node objects that it hosts. During construction of the task dependency graph, the depends method provided by the task_graph object specifies the dependent object and the depended-on object of each dependency. During the call to depends, the execution part of the dependent object is registered as a callback in the callback container of the depended-on object, and the in-degree of the dependent object and the out-degree of the depended-on object are each incremented by 1; because the in-degree may be read and written by multiple threads, it is an atomic variable. After the task dependency graph has been constructed, calling the start method provided by the task_graph object begins preprocessing of the graph and execution of the computation. Preprocessing consists of two operations. The first backs up the in-degree information of all nodes hosted in task_graph. The second picks out all root nodes and leaf nodes from the hosted nodes, where the number of root nodes is unrestricted, and adds a virtual dependency sink node for all leaf nodes. The virtual dependency sink node is invisible to the user. It blocks the main thread so that it does not end before all computing tasks have finished and, once all computing tasks are complete, restores the task dependency graph to its original state from the backed-up information so that the graph can be reused. Fig. 2 shows a flow diagram of preprocessing a task dependency graph, disclosed in an embodiment of the present invention.
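The blocking role of the virtual dependency sink node can be sketched as below. The class `VirtualSink` and its methods are hypothetical names; Python has no atomic integer type, so a `threading.Lock` stands in for the atomic in-degree the text describes, and a `threading.Event` stands in for whatever mechanism the real runtime uses to block the main thread.

```python
import threading

class VirtualSink:
    """Hypothetical stand-in for the virtual dependency sink node."""
    def __init__(self, leaf_count):
        self._in_degree = leaf_count          # one dependency per leaf node
        self._lock = threading.Lock()         # substitute for an atomic counter
        self._all_done = threading.Event()

    def on_leaf_done(self):
        # Callback registered with every leaf; the last leaf releases the main thread.
        with self._lock:
            self._in_degree -= 1
            if self._in_degree == 0:
                self._all_done.set()

    def wait(self, timeout=None):
        return self._all_done.wait(timeout)

finished = []
sink = VirtualSink(leaf_count=3)

def leaf(name):
    finished.append(name)                     # the leaf's task
    sink.on_leaf_done()                       # then its callback

threads = [threading.Thread(target=leaf, args=(n,)) for n in ("L1", "L2", "L3")]
for t in threads:
    t.start()
completed = sink.wait(timeout=5)              # main thread blocks here
for t in threads:
    t.join()
```

Because the sink depends on every leaf, and every node lies on a path to some leaf, the sink becoming ready implies that the whole graph has finished.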
Fig. 3 shows a schematic diagram of the data structures of a lock-free deque and a node, disclosed in an embodiment of the present invention. In the initialization phase, the runtime allocates threads according to the actual number of hardware cores of the computer, with threads and cores in one-to-one correspondence, and allocates a private lock-free deque to each thread. All root nodes in the root node set are pushed in turn, in round-robin fashion, onto the bottoms of the threads' deques. Thereafter every thread repeatedly takes a node from the bottom of its deque and executes the task it contains. After the task in a node finishes, the thread executes in turn all callbacks registered in the node's callback container. Depending on how many nodes a given node depends on, its callback may be executed multiple times. The callback registered by a node includes a check for the critical condition: each time the callback executes, the in-degree of the node that registered it is decremented by 1. When the in-degree of a node drops to 0, all nodes it depends on have finished executing and it can run immediately; the approach taken is for the callback to push the node at once onto the bottom of the current thread's deque.
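A minimal Python sketch of the initialization and execution phases just described follows. Several things are assumptions rather than the patent's design: `collections.deque` (whose appends and pops are thread-safe in CPython) stands in for the lock-free deque, a per-node lock stands in for the atomic in-degree, a list of dependent nodes stands in for the callback container, the names `Node`, `worker`, and `run` are hypothetical, and stealing is deliberately left out so that only the pop-execute-callback cycle is shown. Two threads are used for a small demo; the text's choice would be one thread per hardware core (for example `os.cpu_count()`).

```python
import threading
import time
from collections import deque

class Node:
    def __init__(self, task):
        self.task = task
        self.in_degree = 0
        self.dependents = []            # stands in for the callback container
        self.lock = threading.Lock()    # stands in for the atomic in-degree

    def depends(self, *dependencies):
        for dep in dependencies:
            dep.dependents.append(self)
            self.in_degree += 1

def worker(my_deque, done):
    while not done.is_set():
        try:
            node = my_deque.pop()       # bottom of this thread's own deque
        except IndexError:
            time.sleep(0.001)           # nothing ready here (no stealing in this sketch)
            continue
        node.task()
        for child in node.dependents:   # "execute the callbacks"
            with child.lock:
                child.in_degree -= 1
                ready = child.in_degree == 0
            if ready:
                my_deque.append(child)  # ready node goes to this thread's bottom

def run(roots, done, num_threads):
    deques = [deque() for _ in range(num_threads)]
    for i, root in enumerate(roots):    # round-robin placement of root nodes
        deques[i % num_threads].append(root)
    threads = [threading.Thread(target=worker, args=(d, done)) for d in deques]
    for t in threads:
        t.start()
    done.wait()                         # main thread blocked until the sink runs
    for t in threads:
        t.join()

log, done = [], threading.Event()
a = Node(lambda: log.append("A"))
b = Node(lambda: log.append("B"))
c = Node(lambda: log.append("C"))
sink = Node(done.set)                   # virtual sink: its task releases the main thread
c.depends(a, b)
sink.depends(c)
run([a, b], done, num_threads=2)
```

Whichever thread finishes the last of A and B pushes C onto its own deque and runs it next, which is the locality effect the description attributes to this scheme.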
Figure 4 is a flow diagram of the load-balancing method disclosed in an embodiment of the present invention. Scheduling based on the task dependency graph in effect makes each thread execute one subgraph of the task dependency graph. However, depending on the size and workload of the subgraphs, some subgraphs take longer to execute and others take less time, so the overall computation time is determined by the thread with the longest execution time and the load becomes unbalanced. The present invention uses a task-stealing algorithm to achieve load balancing, with the following steps:
1) If the thread's own lock-free deque is not empty, the thread takes a node from the bottom of its lock-free deque and executes it; otherwise go to step 2).
2) If the thread's own lock-free deque is empty, the thread visits the lock-free deques of the other threads in round-robin order. If it finds a thread whose lock-free deque is not empty, it attempts to steal a node from the top of that deque and push it onto the bottom of its own lock-free deque, and then executes step 1); otherwise go to step 3).
3) The thread gives up the current CPU time slice and goes to sleep; when it is next woken up, it executes step 1).
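Steps 1) to 3) above can be sketched as a small acquisition routine plus a worker loop. This is a hedged illustration under stated assumptions: `acquire_node` and `worker_loop` are hypothetical names, `collections.deque` stands in for the lock-free deque (bottom = right end, top = left end), and `time.sleep(0)` is used as an approximation of giving up the current time slice.

```python
import time
from collections import deque


def acquire_node(me, deques):
    """Return the next node for worker `me`, or None if nothing could be found."""
    own = deques[me]
    # Step 1: own deque is not empty, so take a node from its bottom.
    if own:
        return own.pop()
    # Step 2: visit the other workers' deques in round-robin order.
    n = len(deques)
    for offset in range(1, n):
        victim = deques[(me + offset) % n]
        try:
            stolen = victim.popleft()   # steal from the top of the victim's deque
        except IndexError:
            continue                    # victim is empty; try the next one
        own.append(stolen)              # push onto the bottom of our own deque
        return own.pop()                # back to step 1
    return None                         # caller falls through to step 3


def worker_loop(me, deques, execute, is_done):
    """Run steps 1) to 3) until `is_done()` reports that all work has finished."""
    while not is_done():
        node = acquire_node(me, deques)
        if node is None:
            time.sleep(0)               # Step 3: yield, then retry from step 1
        else:
            execute(node)
```

Owners take from the bottom and thieves take from the top, so the two ends of a deque are contended by different parties; the owner keeps working on the most recently pushed nodes while a thief removes the oldest one.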
The embodiment of the invention also provides a kind of task scheduling systems that task based access control is stolen, comprising:
Task dependency graph constructing module, for being by between subtask node and subtask node by overall calculation task description The task dependency graph of dependence edge composition will rely on node and is registered in the readjustment container for being relied on node as call back function;
Preprocessing module, for obtaining root node and leaf node in the task dependency graph, for all leaf nodes Addition one virtual dependence sink nodes, the virtual dependence sink nodes are for blocking main thread;
Initialization module saves all for distributing one without lock deque for thread each in thread pool and emptying Point according to polling mode be put into each thread without lock deque bottom;
Task scheduling modules are used for for per thread, if thread is not sky without lock deque, from the nothing of thread Lock deque bottom takes out node and executes task of including in node, after task execution, executes and adjusts back in node All readjustments in container;If thread is sky without lock deque, which is attempted from other threads without lock both-end team Node is stolen at column top, if stealing successfully by the node stolen be pressed into the thread without deque bottom is locked, execution steals Task in node and steal all readjustments adjusted back in container in node;
In-situ FTIR spectroelectrochemitry module, after the completion of being performed both by for the task in nodes all in task dependency graph, by task according to Rely the in-degree of each node in figure to be restored to original value, and terminates the obstruction to main thread.
In embodiments of the present invention, the specific implementation of each functional module can refer to the description in embodiment of the method, The embodiment of the present invention will not be repeated.
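The preprocessing and restoration modules listed above can be sketched as follows. The text does not say how the main thread is blocked, so the use of `threading.Event` here is an assumption, as are the names `preprocess`, `restore` and `initial_in_degree`; only `virtual_sink_node` and `depends` come from the source.

```python
import threading


class Node:
    """Graph node that tracks only the degrees needed for preprocessing."""

    def __init__(self, name):
        self.name = name
        self.in_degree = 0
        self.out_degree = 0
        self.initial_in_degree = 0   # hypothetical field used for restoration

    def depends(self, *parents):
        """Add one dependency edge per parent and update both degrees."""
        for parent in parents:
            self.in_degree += 1
            parent.out_degree += 1


def preprocess(nodes):
    """Collect roots and leaves, attach the virtual sink, record original in-degrees."""
    roots = [n for n in nodes if n.in_degree == 0]
    leaves = [n for n in nodes if n.out_degree == 0]
    sink = Node("virtual_sink_node")
    sink.depends(*leaves)            # the sink becomes runnable only after every leaf
    for n in nodes + [sink]:
        n.initial_in_degree = n.in_degree
    return roots, leaves, sink


def restore(nodes, sink, all_done):
    """Reset every in-degree to its original value and release the main thread."""
    for n in nodes + [sink]:
        n.in_degree = n.initial_in_degree
    all_done.set()                   # assumption: the main thread waits on this Event
```

In this sketch the main thread would call `all_done.wait()` after starting the workers, and the sink node's task would call `restore`. Because the in-degrees are reset in place, the same graph can be submitted again without being rebuilt.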
By adopting the above scheme, the present invention outperforms other parallel algorithm schemes and greatly improves parallel program performance, specifically as follows:
1) Tasks are scheduled purely on the basis of the task dependency graph, which avoids the task execution delays caused by manually specified control flow;
2) Threads are bound one-to-one to cores, and at run time nodes that have dependency relationships are placed on the same thread as far as possible, so that a later task can make the best possible use of data that was used by the preceding task and is still in the CPU cache, reducing the number of memory accesses;
3) Load balancing is achieved using a task-stealing algorithm.
Those skilled in the art will readily understand that the foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A task scheduling method based on task stealing, characterized by comprising:
(1) describing the overall computation task as a task dependency graph composed of subtask nodes and dependency edges between subtask nodes, and registering each dependent node, as a callback function, in the callback container of the node it depends on;
(2) obtaining the root nodes and leaf nodes of the task dependency graph, and adding one virtual dependency sink node for all leaf nodes, the virtual dependency sink node being used to block the main thread;
(3) allocating and emptying one lock-free deque for each thread in the thread pool, and placing all root nodes, in round-robin order, at the bottom of the lock-free deques of the threads;
(4) for each thread, if the thread's lock-free deque is not empty, taking a node from the bottom of the thread's lock-free deque and executing the task contained in the node, and, after the task has been executed, executing all callbacks in the node's callback container; if the thread's lock-free deque is empty, the thread attempts to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushes the stolen node onto the bottom of its own lock-free deque and executes the task in the stolen node and all callbacks in the stolen node's callback container;
(5) after the tasks in all nodes of the task dependency graph have been executed, restoring the in-degree of each node in the task dependency graph to its original value, and ending the blocking of the main thread.
2. The method according to claim 1, characterized in that step (1) specifically comprises the following steps:
(1.1) defining a task dependency graph object;
(1.2) dividing the computation task into several subtasks according to the attributes of the computation task, calling the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph and encapsulate each subtask as a node object, and returning a pointer to each node object;
(1.3) constructing the dependencies between the subtasks through the pointers to the node objects, registering each dependent node, as a callback function, in the callback container of the node it depends on, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1.
3. The method according to claim 2, characterized in that step (2) specifically comprises the following steps:
(2.1) adding all nodes of the task dependency graph whose in-degree is 0 to the root node set, and adding all nodes whose out-degree is 0 to the leaf node set;
(2.2) for the leaf node set L = {L1, L2, ...}, adding a virtual dependency sink node virtual_sink_node and calling virtual_sink_node->depends(L1, L2, ...), the virtual dependency sink node being used to block the main thread until all tasks are completed, preventing the main thread from terminating early.
4. The method according to claim 2, characterized in that step (4) specifically comprises the following steps:
(4.1) for each thread, if the thread's lock-free deque is not empty, taking a node from the bottom of the thread's lock-free deque and executing the task contained in the node; after the task has been executed, executing all callbacks in the node's callback container, wherein each execution of a callback decrements by 1 the in-degree of the node that registered that callback, and if the in-degree of a node that registered a callback is reduced to 0, the current thread pushes that node onto the bottom of the current thread's lock-free deque;
(4.2) if the thread's lock-free deque is empty, the thread attempts to steal a node from the top of another thread's lock-free deque; if the steal succeeds, the stolen node is pushed onto the bottom of the thread's lock-free deque and step (4.1) is executed; otherwise the current CPU time slice is given up;
(4.3) repeating steps (4.1) to (4.2) until the tasks in all nodes of the task dependency graph have been executed.
5. The method according to claim 1, characterized in that the number of threads in the thread pool is equal to the number of CPU hardware cores.
6. A task scheduling system based on task stealing, characterized by comprising:
A task dependency graph construction module, for describing the overall computation task as a task dependency graph composed of subtask nodes and dependency edges between subtask nodes, and registering each dependent node, as a callback function, in the callback container of the node it depends on;
A preprocessing module, for obtaining the root nodes and leaf nodes of the task dependency graph and adding one virtual dependency sink node for all leaf nodes, the virtual dependency sink node being used to block the main thread;
An initialization module, for allocating and emptying one lock-free deque for each thread in the thread pool, and placing all root nodes, in round-robin order, at the bottom of the lock-free deques of the threads;
A task scheduling module, for performing the following for each thread: if the thread's lock-free deque is not empty, taking a node from the bottom of the thread's lock-free deque and executing the task contained in the node, and, after the task has been executed, executing all callbacks in the node's callback container; if the thread's lock-free deque is empty, the thread attempts to steal a node from the top of another thread's lock-free deque and, if the steal succeeds, pushes the stolen node onto the bottom of its own lock-free deque and executes the task in the stolen node and all callbacks in the stolen node's callback container;
An in-place restoration module, for restoring the in-degree of each node in the task dependency graph to its original value after the tasks in all nodes of the task dependency graph have been executed, and ending the blocking of the main thread.
7. The system according to claim 6, characterized in that the task dependency graph construction module comprises:
A definition module, for defining a task dependency graph object;
A node encapsulation module, for dividing the computation task into several subtasks according to the attributes of the computation task, calling the insertion method provided by the task dependency graph object to add each subtask to the task dependency graph and encapsulate each subtask as a node object, and returning a pointer to each node object;
A callback registration module, for constructing the dependencies between the subtasks through the pointers to the node objects, registering each dependent node, as a callback function, in the callback function container of the node it depends on, incrementing the in-degree of the dependent node by 1, and incrementing the out-degree of the depended-on node by 1.
8. The system according to claim 7, characterized in that the preprocessing module comprises:
A node division module, for adding all nodes of the task dependency graph whose in-degree is 0 to the root node set, and adding all nodes whose out-degree is 0 to the leaf node set;
A virtual dependency sink node construction module, for adding, for the leaf node set L = {L1, L2, ...}, a virtual dependency sink node virtual_sink_node and calling virtual_sink_node->depends(L1, L2, ...), the virtual dependency sink node being used to block the main thread until all tasks are completed, preventing the main thread from terminating early.
9. The system according to claim 7, characterized in that the task scheduling module comprises:
A task execution module, for performing the following for each thread: if the thread's lock-free deque is not empty, taking a node from the bottom of the thread's lock-free deque and executing the task contained in the node; after the task has been executed, executing all callbacks in the node's callback container, wherein each execution of a callback decrements by 1 the in-degree of the node that registered that callback, and if the in-degree of a node that registered a callback is reduced to 0, the current thread pushes that node onto the bottom of the current thread's lock-free deque;
A task stealing module, for, when the thread's lock-free deque is empty, having the thread attempt to steal a node from the top of another thread's lock-free deque; if the steal succeeds, the stolen node is pushed onto the bottom of the thread's lock-free deque and is then taken out and executed; otherwise the current CPU time slice is given up; the operations of the task execution module and the task stealing module are repeated until the tasks in all nodes of the task dependency graph have been executed.
10. The system according to any one of claims 6 to 9, characterized in that the number of threads in the thread pool is equal to the number of CPU hardware cores.
CN201710290460.6A 2017-04-28 2017-04-28 A kind of method for scheduling task that task based access control is stolen and system Active CN107220111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710290460.6A CN107220111B (en) 2017-04-28 2017-04-28 A kind of method for scheduling task that task based access control is stolen and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710290460.6A CN107220111B (en) 2017-04-28 2017-04-28 A kind of method for scheduling task that task based access control is stolen and system

Publications (2)

Publication Number Publication Date
CN107220111A CN107220111A (en) 2017-09-29
CN107220111B true CN107220111B (en) 2019-08-09

Family

ID=59943696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710290460.6A Active CN107220111B (en) 2017-04-28 2017-04-28 A kind of method for scheduling task that task based access control is stolen and system

Country Status (1)

Country Link
CN (1) CN107220111B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344136A (en) * 2018-12-13 2019-02-15 浪潮(北京)电子信息产业有限公司 A kind of access method of shared-file system, device and equipment
CN110597606B (en) * 2019-08-13 2022-02-18 中国电子科技集团公司第二十八研究所 Cache-friendly user-level thread scheduling method
CN110908794B (en) * 2019-10-09 2023-04-28 上海交通大学 Task stealing method and system based on task stealing algorithm
CN113703939B (en) * 2021-08-30 2024-06-14 竞技世界(北京)网络技术有限公司 Task scheduling method and system and electronic equipment
CN113703941B (en) * 2021-08-30 2024-08-06 竞技世界(北京)网络技术有限公司 Task scheduling method and system and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101226487A (en) * 2008-01-30 2008-07-23 中国船舶重工集团公司第七〇九研究所 Method for implementing inner core level thread library based on built-in Linux operating system
CN104156260A (en) * 2014-08-07 2014-11-19 北京航空航天大学 Concurrent queue access control method and system based on task eavesdropping
CN104965754A (en) * 2015-03-31 2015-10-07 腾讯科技(深圳)有限公司 Task scheduling method and task scheduling apparatus
CN105528243A (en) * 2015-07-02 2016-04-27 中国科学院计算技术研究所 A priority packet scheduling method and system utilizing data topological information
CN106055311A (en) * 2016-05-26 2016-10-26 浙江工业大学 Multi-threading Map Reduce task parallelizing method based on assembly line

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150363229A1 (en) * 2014-06-11 2015-12-17 Futurewei Technologies, Inc. Resolving task dependencies in task queues for improved resource management

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
任务并行编程模型研究与进展;王蕾;《软件学报》;20121123;第24卷(第1期);全文 *

Also Published As

Publication number Publication date
CN107220111A (en) 2017-09-29

Similar Documents

Publication Publication Date Title
CN107220111B (en) A kind of method for scheduling task that task based access control is stolen and system
Lelli et al. An efficient and scalable implementation of global EDF in Linux
US7647590B2 (en) Parallel computing system using coordinator and master nodes for load balancing and distributing work
US7975269B2 (en) Parallel processor methods and apparatus
CN102027452B (en) Scheduling collections in a scheduler
US7280558B1 (en) Asynchronous pattern
Mukherjee et al. A comprehensive performance analysis of HSA and OpenCL 2.0
CN110597606B (en) Cache-friendly user-level thread scheduling method
US8141076B2 (en) Cell processor methods and apparatus
CN105045658A (en) Method for realizing dynamic dispatching distribution of task by multi-core embedded DSP (Data Structure Processor)
CN111861412B (en) Completion time optimization-oriented scientific workflow scheduling method and system
Lin et al. An efficient work-stealing scheduler for task dependency graph
JP2015516633A (en) Apparatus, system, and memory management method
US20100036641A1 (en) System and method of estimating multi-tasking performance
CN111459622B (en) Method, device, computer equipment and storage medium for scheduling virtual CPU
EP2282265A1 (en) A hardware task scheduler
CN107133099B (en) A kind of cloud computing method
Radulescu et al. LLB: A fast and effective scheduling algorithm for distributed-memory systems
US11861416B2 (en) Critical section speedup using help-enabled locks
Bosch et al. Characterizing and improving the performance of many-core task-based parallel programming runtimes
Dang et al. Eliminating contention bottlenecks in multithreaded MPI
Hippold et al. Task pool teams for implementing irregular algorithms on clusters of SMPs
KR20180082560A (en) Method and apparatus for time-based scheduling of tasks
Wagner et al. User-land work stealing schedulers: Towards a standard
Hippold et al. A communication API for implementing irregular algorithms on SMP clusters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant