CN117222980A - Task scheduling method and device - Google Patents

Task scheduling method and device

Info

Publication number
CN117222980A
CN117222980A (application CN202180097744.8A)
Authority
CN
China
Prior art keywords
task
graph
barrier
information table
task graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180097744.8A
Other languages
Chinese (zh)
Inventor
张森
赵庆贺
杨意
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN117222980A
Legal status: Pending

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A task scheduling method and device relate to the field of computer technology and solve the problem that, each time a scheduling device executes a task graph, the CPU must re-initialize the dependency relationships of that task graph to the scheduling device, which lengthens initialization time and reduces computing efficiency. The method comprises the following steps: the task scheduling device includes one or more task graph templates, where each task graph template indicates the dependency relationships among the tasks it includes and the processing manner of each task; the task scheduling device is configured to: acquire input data of a first task graph and a task graph template identifier corresponding to the first task graph; determine, from the one or more task graph templates, the task graph template corresponding to the first task graph based on the task graph template identifier; and schedule the first task graph based on its input data and the corresponding task graph template.

Description

Task scheduling method and device
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a task scheduling method and device.
Background
Currently, to increase the computing power of a computing device, the computing device may generally employ a multi-core heterogeneous system architecture that includes a multi-core central processing unit (central processing unit, CPU) for performing general-purpose computing tasks and an accelerator for performing dedicated computing tasks.
In multi-core heterogeneous systems, for more complex task graphs, there may be interdependencies between multiple tasks (e.g., execution of one task depends on the computation result of another task). To alleviate the difficulty of parallel programming, scheduling software may determine dependencies between tasks based on their inputs and outputs and schedule ready tasks for execution on CPU cores or accelerators capable of executing them. However, in a multi-core heterogeneous system, for multiple task graphs with the same dependency relationships, each time a task graph is executed the CPU must re-initialize the dependency relationships of that task graph to the scheduling device, after which the scheduling device maintains the dependencies among the tasks in the task graph to ensure that computation proceeds normally. As a result, when the scheduling device schedules a task graph and parses its dependency relationships, the initialization time is long, redundant interaction arises between the CPU and the scheduling device, and computing efficiency is reduced.
Disclosure of Invention
The embodiment of the application provides a task scheduling method and device, which can save the time of loading a dependency relationship into a task scheduling device and improve the calculation efficiency.
In order to achieve the above purpose, the embodiment of the application adopts the following technical scheme:
in a first aspect of the embodiments of the present application, a task scheduling device is provided, where the task scheduling device includes one or more task graph templates, each task graph template is configured to indicate a dependency relationship between a plurality of tasks included in the task graph template, and a processing manner of each task; task scheduling means for: acquiring task information of a first task graph; the task information of the first task graph comprises input data of the first task graph and task graph template identifications corresponding to the first task graph; determining a task graph template corresponding to the first task graph from one or more task graph templates included in the task scheduling device based on the task graph template identification corresponding to the first task graph; and scheduling the first task graph based on the input data of the first task graph and the task graph template corresponding to the first task graph.
Optionally, task graphs with the same dependency relationships and processing modes correspond to the same task graph template. For example, task graph 1 is used to calculate (1+2)×4-3 and task graph 2 is used to calculate (5+6)×8-7; the dependency relationships between the tasks in task graph 1 are the same as those between the tasks in task graph 2, and the manner of calculating the tasks is also the same. Therefore, task graph 1 and task graph 2 share the same task graph template, (a+b)×c-d, and differ only in their input data.
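The template/instance split in this example can be sketched in a few lines of Python; this is an illustrative sketch only, and the function and variable names are assumptions, not from the patent.

```python
# Hypothetical sketch: two task graphs sharing one template (a+b)*c-d and
# differing only in input data. Names are illustrative, not from the patent.
def run_template(a, b, c, d):
    t1 = a + b   # task 1: addition
    t2 = t1 * c  # task 2 depends on task 1's result
    t3 = t2 - d  # task 3 depends on task 2's result
    return t3

result1 = run_template(1, 2, 4, 3)  # task graph 1: (1+2)*4-3 = 9
result2 = run_template(5, 6, 8, 7)  # task graph 2: (5+6)*8-7 = 81
```

The point of the scheme is that the body of `run_template` (the static dependencies and processing modes) is loaded into the scheduling device once, while only the arguments (the dynamic input data) change per task graph.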
Optionally, the tasks included in the task graph template may be executed in series or in parallel. For example, the task graph template may include a plurality of tasks, some of which may be executed serially when executed and some of which may be executed in parallel when executed.
Based on the scheme, the task scheduling device supports the embedding of the static task graph template in the task scheduling device, so that when a plurality of task graphs with the same dependency relationship and processing mode as those of the task graph template are executed later, the dependency relationship and processing mode are not required to be initialized into the task scheduling device again, and therefore, when the plurality of task graphs are executed later, only dynamic input data and identifiers of the task graph templates to be used are acquired, and the time for loading the dependency relationship into the task scheduling device can be saved. That is, by creating the task graph template once, the task graph with the same processing mode and dependency relationship as those of the task graph template can be repeatedly executed, and when the task graphs are executed subsequently, the static processing mode and dependency relationship are not required to be loaded into the task scheduling device again, so that the time for loading the static processing mode and dependency relationship into the task scheduling device can be saved, and the calculation efficiency can be improved.
In a possible implementation manner, the task scheduling device is further configured to obtain one or more task graph templates; each task graph template comprises a task information table, a first synchronization information table and a second synchronization information table; the task information table comprises a plurality of task identifiers and the processing mode corresponding to each task identifier; the first synchronization information table comprises a plurality of events and identifiers of one or more barriers corresponding to each event, the plurality of events are in one-to-one correspondence with the plurality of tasks, and each event is used for indicating that the corresponding task has been executed; the second synchronization information table comprises a plurality of barriers, one or more trigger conditions corresponding to each barrier, and the identifier of the task to be executed when each barrier meets its corresponding trigger condition.
Optionally, the task graph template acquired by the task scheduling device may be sent to the task scheduling device by the CPU, or may be preset in the task scheduling device, which is not limited by the embodiment of the present application.
Based on the scheme, the data structure of each task graph template can be described by adopting three tables, namely a task information table, a first synchronization information table and a second synchronization information table, and the task scheduling device acquires the task graph template, namely the task information table, the first synchronization information table and the second synchronization information table corresponding to the task graph template, so that the task scheduling device can schedule based on the three tables when subsequently scheduling the task graph, and the calculation of a plurality of tasks in the task graph template can be normally performed. And when the task scheduling device schedules a plurality of task graphs with the same processing mode and dependency relationship as the task graph template in the follow-up scheduling, the task scheduling device can schedule tasks directly based on the task graph template without loading the processing mode and the dependency relationship into the task scheduling device again, so that the time for loading the static processing mode and the dependency relationship into the task scheduling device can be saved, and the calculation efficiency is improved.
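The three tables described above can be sketched as plain dictionaries; the field names and encodings here are assumptions for illustration and are not the patent's concrete format.

```python
# Hedged sketch of the three tables that describe one task graph template
# for (a+b)*c-d. All names and encodings are illustrative assumptions.
task_info = {             # task identifier -> processing mode
    "T1": "add", "T2": "mul", "T3": "sub",
}
first_sync_table = {      # completion event -> barrier identifiers it updates
    "T1_done": ["b1"], "T2_done": ["b2"],
}
second_sync_table = {     # barrier -> list of (trigger value, tasks to launch)
    "b1": [(1, ["T2"])],  # when b1 reaches 1, task T2 becomes ready
    "b2": [(1, ["T3"])],  # when b2 reaches 1, task T3 becomes ready
}
```

Together the three tables encode both the static processing modes (`task_info`) and the static dependencies (`first_sync_table` plus `second_sync_table`), which is exactly what no longer needs to be reloaded for each task graph instance.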
In another possible implementation manner, the task scheduling device includes a first interface, a task graph control circuit, a task state machine, and a second interface that are coupled and connected; the task graph control circuit is configured to acquire the task graph template and the task information of the first task graph through the first interface; the task state machine is configured to, when it determines based on the second synchronization information table that the value of the first barrier meets its corresponding first trigger condition, acquire the first task corresponding to the first task identifier from the task graph control circuit according to the first task identifier, the input data of the first task graph, and the task information table corresponding to the first task graph, and send the first task to the computing unit through the second interface; the first task identifier is the identifier of the task to be executed when the value of the first barrier meets the first trigger condition.
Optionally, when the second synchronization information table includes a trigger condition of the first task (for example, a trigger condition b1=0 corresponding to the first task T1), for the first task in the first task graph template, the task graph control circuit may send a first task trigger signal to the task state machine, where the first task trigger signal is used to instruct the task state machine to query the second synchronization information table, and determine whether the value of b1 meets the trigger condition corresponding to the first task. When the task state machine determines that the value of b1 meets the triggering condition corresponding to the first task, the task state machine acquires the task content of the first task from the task graph control circuit according to the first task identification, the input data of the first task graph and the task information table corresponding to the first task graph, and sends the task content of the first task to the computing unit through the second interface.
Optionally, when the second synchronization information table does not include the triggering condition of the first task, for the first task in the first task graph template, the task graph control circuit may also send a first task execution signal to the task state machine, and the task state machine obtains, according to the first task execution signal, task content of the first task from the task graph control circuit, and sends the task content of the first task to the computing unit through the second interface.
Optionally, for other tasks after the first task, the task state machine may query the second synchronization information table when the value of the barrier is updated, to determine whether the value of the barrier meets the trigger condition corresponding to the value of the barrier.
Based on the scheme, the task state machine determines whether the value of each barrier meets the corresponding trigger condition or not based on the second synchronous information table, determines the task identification to be executed when the value of each barrier meets the trigger condition, acquires the task content of the task to be executed from the task graph control circuit, and sends the task to the computing unit. Therefore, each task in the task graph can be scheduled according to the corresponding task graph template when the task scheduling device schedules the task, and the task scheduling device can acquire the processing mode according to the task graph template in the task scheduling device when the task is scheduled, so that the task processing mode does not need to be reloaded into the task scheduling device, the time for loading the static processing mode into the task scheduling device can be saved, and the calculation efficiency is improved.
In yet another possible implementation manner, when there are multiple first tasks, the computing unit executes the multiple first tasks in parallel.
Based on the scheme, the plurality of tasks with the same triggering conditions are executed in parallel, so that the dependency relationship among the plurality of tasks can be maintained, and the calculation efficiency is improved.
In yet another possible implementation manner, the task scheduling device further includes an event parsing circuit and a synchronization counting circuit, which are coupled and connected, wherein the synchronization counting circuit includes a plurality of counters, each counter corresponding to one barrier; the event parsing circuit is configured to receive a first event through the first interface when execution of the first task is completed, determine the identifier of the second barrier corresponding to the first event based on the first synchronization information table, and notify the synchronization counting circuit to modify the value of the counter corresponding to the second barrier; wherein the first event is used for indicating that execution of the first task is completed; and the synchronization counting circuit is configured to modify the value of the counter corresponding to the second barrier.
Optionally, when the synchronization counting circuit modifies the value of the counter corresponding to the second barrier, it may increase the value by one, decrease it by one, or add or subtract another value. In practice, whether the counter value corresponding to a barrier is increased (for example, by one) or decreased (for example, by one) is related to the initial value of the counter.
Illustratively, after execution of the first task is completed, a first event indicating that the first task has been executed is sent to the task scheduling device; the first interface receives the first event and forwards it to the event parsing circuit. The event parsing circuit receives the first event, queries the first synchronization information table, determines the identifier of the second barrier corresponding to the first event, and notifies the synchronization counting circuit to modify the value of the counter corresponding to the second barrier. After the synchronization counting circuit modifies the value of the counter corresponding to the second barrier, it notifies the task state machine of the identifier of the second barrier. The task state machine judges, based on the second synchronization information table, whether the value of the second barrier meets its corresponding trigger condition and, if so, acquires the next task to be executed from the task graph control circuit and sends it to the computing unit, until all tasks in the first task graph have been executed.
Based on the scheme, the dependency relationship of a plurality of tasks in the task graph can be correctly maintained through the first synchronization information table and the second synchronization information table, so that normal execution of each task is ensured. According to the scheme, when the dependency relationship of a plurality of tasks in the task graph is maintained, the dependency relationship is not required to be loaded into the task scheduling device again, and the first synchronization information table and the second synchronization information table in the task scheduling device are directly used, so that the time for loading the static dependency relationship into the task scheduling device can be saved, and the calculation efficiency is improved.
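The event → counter → trigger flow maintained by the two synchronization tables can be modeled in a few lines; this is a minimal illustrative sketch, not the hardware circuit, and all table contents and names are assumptions.

```python
# Minimal sketch of the flow: a completion event increments the barrier
# counters it maps to (synchronization counting circuit), and tasks whose
# trigger condition is met become ready (task state machine).
counters = {"b1": 0, "b2": 0}

def on_event(event, first_sync_table, second_sync_table, ready):
    for barrier in first_sync_table.get(event, []):
        counters[barrier] += 1                     # modify the barrier's counter
        for value, tasks in second_sync_table.get(barrier, []):
            if counters[barrier] == value:         # trigger condition met?
                ready.extend(tasks)                # hand tasks to the computing unit

ready = []
on_event("T1_done", {"T1_done": ["b1"]}, {"b1": [(1, ["T2"])]}, ready)
# completing T1 increments b1, which releases T2
```

Because the tables are part of the template resident in the device, this loop runs without any further interaction with the CPU.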
In yet another possible implementation manner, the task graph control circuit is further configured to modify or delete a task graph template.
Based on the scheme, the task scheduling device can modify, delete and add a plurality of task graph templates stored in the task scheduling device, so that the task graph templates in the task scheduling device are more flexible and can be suitable for more scenes.
In yet another possible implementation, the task graph template includes a first task and a second task, where the first task and the second task multiplex the same barrier.
Optionally, multiplexing the same barrier among multiple tasks means that the triggering of those tasks may depend on the same barrier. That is, in the second synchronization information table, when multiple tasks multiplex the same barrier and the value of that barrier satisfies one or more trigger conditions, the corresponding tasks to be executed are those multiple tasks.
Based on the scheme, a plurality of tasks in the task graph template can multiplex the same barrier. Since the value of one barrier can be maintained by one counter, when a plurality of tasks multiplex the same barrier, the number of counters can be reduced, thereby reducing the chip area.
In yet another possible implementation, the first task and the second task satisfy at least one of the following: neither the first task nor the second task has a parent node; or the first task and the second task have the same parent node; or the first task is the only parent node of the second task; or the root nodes of the first task and the second task multiplex the same barrier, and the first task is the only parent node of the second task.
It should be noted that if a plurality of tasks in one task graph template satisfy any one or more of the above four cases, those tasks may multiplex the same barrier. In practical application, the cases in which multiple tasks multiplex the same barrier are not limited to these four; specifically, whether multiple tasks can multiplex the same barrier can be determined according to the dependency relationships of those tasks in the task graph template.
Based on the scheme, when a plurality of tasks multiplex the same barrier, the number of counters can be reduced, thereby reducing the chip area.
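The counter saving from barrier multiplexing can be illustrated with the sibling-task case above (two tasks sharing the same parent); the table contents and names are illustrative assumptions.

```python
# Illustrative sketch: sibling tasks T2 and T3 have the same parent T1 and
# multiplex one barrier b1, so a single counter gates both of them.
second_sync_table = {"b1": [(1, ["T2", "T3"])]}  # one counter, two tasks
counters = {"b1": 0}

counters["b1"] += 1   # T1 completes; its event increments b1
ready = []
for value, tasks in second_sync_table["b1"]:
    if counters["b1"] == value:
        ready = tasks  # both T2 and T3 become ready (and may run in parallel)
```

Without multiplexing, T2 and T3 would each need a dedicated barrier and counter; sharing b1 halves the counter count for this pair.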
In yet another possible implementation, one barrier corresponds to a plurality of trigger conditions, including a first trigger condition and other trigger conditions, the trigger sequence of the first trigger condition being earlier than that of the other trigger conditions; the second synchronization information table comprises a first sub-information table and a second sub-information table, the first sub-information table comprises a plurality of barriers, the first trigger condition corresponding to each barrier, and the identifier of the task to be executed when each barrier meets its corresponding first trigger condition; the second sub-information table comprises a plurality of barriers, the other trigger conditions corresponding to each barrier, and the identifiers of the tasks to be executed when each barrier meets the corresponding other trigger conditions.
Optionally, the first sub-information table is stored in a cache of the task scheduling device, and the second sub-information table is stored in the memory.
Based on the scheme, the first trigger condition corresponding to each barrier is stored in the cache of the task scheduling device, and the other trigger conditions corresponding to each barrier are stored in the DDR, so that the chip area of the task scheduling device can be reduced.
In another possible implementation manner, when a barrier corresponds to multiple other trigger conditions, those other trigger conditions are arranged in the second sub-information table in their trigger sequence; and the task graph control circuit is further configured to, when the value of the barrier meets the first trigger condition corresponding to the barrier, read the next other trigger condition from the memory according to the trigger sequence of the multiple other trigger conditions corresponding to the barrier in the second sub-information table, and replace the first trigger condition corresponding to the barrier with that other trigger condition.
Optionally, the next other trigger condition is the one whose trigger sequence immediately follows the first trigger condition among the plurality of trigger conditions corresponding to the barrier; that is, it is the trigger condition that will be triggered by the first barrier after the value of the first barrier has fulfilled the first trigger condition, such as a second trigger condition.
Optionally, the task graph control circuit replaces the first trigger condition corresponding to the barrier stored in the cache with the second trigger condition corresponding to the barrier. Therefore, the task graph control circuit is further configured to, when the value of the barrier satisfies the second trigger condition in the cache, read a third trigger condition from the memory according to the trigger sequence of the plurality of other trigger conditions corresponding to the barrier in the second sub-information table, and replace the second trigger condition in the cache with the third trigger condition, and so on, until all trigger conditions corresponding to the same barrier have been traversed.
Based on the scheme, the first trigger condition is stored in the cache of the task scheduling device, other trigger conditions are stored in the DDR, and the trigger conditions in the cache can be replaced dynamically, so that the trigger conditions can be loaded into the cache in sequence.
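The cache/DDR replacement scheme above can be sketched with a queue per barrier; this is a minimal illustrative model with assumed names, not the hardware design.

```python
# Sketch of the replacement scheme: only the earliest pending trigger
# condition per barrier sits in the cache; when it fires, the next one is
# streamed in from memory (DDR). All names and values are illustrative.
from collections import deque

memory = {"b1": deque([(2, ["T3"]), (3, ["T4"])])}  # later conditions, in trigger order
cache = {"b1": (1, ["T2"])}                          # earliest condition per barrier

def check_barrier(barrier, counter_value, ready):
    value, tasks = cache[barrier]
    if counter_value == value:
        ready.extend(tasks)
        # replace the fired condition with the next one from memory, if any
        cache[barrier] = memory[barrier].popleft() if memory[barrier] else None

ready = []
check_barrier("b1", 1, ready)  # first condition fires; the second moves into the cache
```

Keeping only one condition per barrier on-chip bounds the cache size regardless of how many conditions a barrier accumulates, which is the source of the chip-area saving.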
In a second aspect of the embodiment of the present application, a task scheduling method is provided, which is applied to a task scheduling device, where the task scheduling device includes one or more task graph templates, each task graph template is used to indicate a dependency relationship between a plurality of tasks included in the task graph template, and a processing manner of each task; the method comprises the following steps: the task scheduling device acquires task information of a first task graph; the task information of the first task graph comprises input data of the first task graph and task graph template identifications corresponding to the first task graph. The task scheduling device determines a task graph template corresponding to the first task graph from one or more task graph templates included in the task scheduling device based on the task graph template identification corresponding to the first task graph. The task scheduling device schedules the first task graph based on the input data of the first task graph and a task graph template corresponding to the first task graph.
In one possible implementation manner, the method further includes: the task scheduling device acquires one or more task graph templates; each task graph template comprises a task information table, a first synchronization information table and a second synchronization information table; the task information table comprises a plurality of task identifiers and the processing mode corresponding to each task identifier; the first synchronization information table comprises a plurality of events and identifiers of one or more barriers corresponding to each event, the plurality of events are in one-to-one correspondence with the plurality of tasks, and each event is used for indicating that the corresponding task has been executed; the second synchronization information table comprises a plurality of barriers, one or more trigger conditions corresponding to each barrier, and the identifier of the task to be executed when each barrier meets its corresponding trigger condition.
In another possible implementation manner, the task scheduling device includes a first interface, a task graph control circuit, a task state machine, and a second interface that are coupled and connected; the task scheduling device acquiring the task graph template and the task information of the first task graph comprises: the task graph control circuit acquiring the task graph template and the task information of the first task graph through the first interface; the task scheduling device scheduling the first task graph based on the input data of the first task graph and the task graph template corresponding to the first task graph comprises: the task state machine, when determining based on the second synchronization information table that the value of the first barrier meets its corresponding first trigger condition, acquiring the first task corresponding to the first task identifier from the task graph control circuit according to the first task identifier, the input data of the first task graph, and the task information table corresponding to the first task graph, and sending the first task to the computing unit through the second interface; the first task identifier is the identifier of the task to be executed when the value of the first barrier meets the first trigger condition.
In yet another possible implementation manner, when there are multiple first tasks, the computing unit executes the multiple first tasks in parallel.
In yet another possible implementation manner, the task scheduling device further includes an event parsing circuit and a synchronization counting circuit that are coupled and connected, where the synchronization counting circuit includes a plurality of counters, each counter corresponding to one barrier; the task scheduling device scheduling the first task graph based on the input data of the first task graph and the task graph template corresponding to the first task graph further comprises: the event parsing circuit receiving a first event through the first interface when execution of the first task is completed, determining the identifier of the second barrier corresponding to the first event based on the first synchronization information table, and notifying the synchronization counting circuit to modify the value of the counter corresponding to the second barrier; wherein the first event is used for indicating that execution of the first task is completed; and the synchronization counting circuit modifying the value of the counter corresponding to the second barrier.
In yet another possible implementation manner, the method further includes: the task graph control circuitry modifies or deletes the task graph templates.
In yet another possible implementation manner, the task graph template includes a first task and a second task, where the first task and the second task multiplex the same barrier.
In yet another possible implementation, the first task and the second task satisfy at least one of the following: neither the first task nor the second task has a parent node; or the first task and the second task have the same parent node; or the first task is the only parent node of the second task; or the root nodes of the first task and the second task multiplex the same barrier, and the first task is the only parent node of the second task.
In yet another possible implementation, one barrier corresponds to a plurality of trigger conditions, where the plurality of trigger conditions includes a first trigger condition and other trigger conditions, and the trigger sequence of the first trigger condition is earlier than that of the other trigger conditions; the second synchronization information table comprises a first sub-information table and a second sub-information table, the first sub-information table comprises a plurality of barriers, the first trigger condition corresponding to each barrier, and the identifier of the task to be executed when each barrier meets its corresponding first trigger condition; the second sub-information table comprises a plurality of barriers, the other trigger conditions corresponding to each barrier, and the identifiers of the tasks to be executed when each barrier meets the corresponding other trigger conditions.
In yet another possible implementation manner, the first sub-information table is stored in a cache of the task scheduling device, and the second sub-information table is stored in a memory.
In another possible implementation manner, when a barrier corresponds to multiple other trigger conditions, those other trigger conditions are arranged in the second sub-information table in their trigger sequence; the method further comprises: when the value of the barrier meets the first trigger condition corresponding to the barrier, the task graph control circuit reading the next other trigger condition from the memory according to the trigger sequence of the multiple other trigger conditions corresponding to the barrier in the second sub-information table, and replacing the first trigger condition corresponding to the barrier with that other trigger condition.
For the description of the effects of the second aspect and the implementations of the second aspect, refer to the description of the corresponding effects of the first aspect; details are not repeated herein.
A third aspect of an embodiment of the present application provides a computing device, where the computing device includes a central processing unit (CPU) and the task scheduling device according to the first aspect, and the CPU is configured to send the task graph template to the task scheduling device.
In one possible implementation, the computing device further includes an enhanced short message service (EMS) and a computing unit, where the EMS is configured to receive a task to be executed from the task scheduling device and allocate the task to the computing unit, and the computing unit is configured to execute the task.
Drawings
Fig. 1 is a schematic structural diagram of a scheduling device according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of another scheduling apparatus according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a task graph template according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another task graph template according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a task graph template according to another embodiment of the present application;
fig. 6 is a schematic structural diagram of a task graph template in which multiple tasks multiplex the same barrier according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a task graph template without barrier multiplexing and with barrier multiplexing according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a task scheduling device according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a computing device according to an embodiment of the present application;
Fig. 10 is a flowchart of a task scheduling method according to an embodiment of the present application;
fig. 11 is a flowchart of another task scheduling method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. In the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of a plurality of items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may be singular or plural. In addition, to clearly describe the technical solutions of the embodiments of the present application, words such as "first" and "second" are used to distinguish between identical or similar items having substantially the same functions and effects. A person skilled in the art will understand that the words "first" and "second" do not limit a quantity or an execution order. For example, the "first" in the first sub-information table and the "second" in the second sub-information table in the embodiments of the present application are only used to distinguish different sub-information tables; they neither indicate an order nor limit the number of devices, and should not be construed as limiting the embodiments of the present application.
In the present application, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Currently, in contexts such as artificial intelligence and high performance computing (HPC), to increase the computing power of a computing device, the computing architecture of the computing device is usually a heterogeneous hardware architecture that includes a central processing unit (CPU) and one or more accelerators. The CPU is used to execute general-purpose computing tasks, and the accelerator is used to execute special-purpose computing tasks. The special-purpose computing tasks may include artificial intelligence (AI) processing, such as artificial neural networks, machine learning (ML) training, ML optimization/learning, inference, and classification, as well as visual data processing, network data processing, object detection, rule analysis, and content processing operations. The accelerator may be one or more of a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), a system on chip (SoC), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or the like. The accelerator may also be the task scheduling device in the following embodiments of the present application.
In heterogeneous systems, a more complex task graph may contain interdependencies among multiple tasks (for example, the execution of one task depends on the computation result of another task). To reduce the difficulty of parallel programming, scheduling software may determine the dependencies between tasks based on their inputs and outputs and schedule ready tasks for execution on the CPU cores or accelerators capable of executing them.
For example, as shown in fig. 1, the workflow of a scheduling apparatus, which may be called a task master (Task Maestro), may include the following steps:
1. The main processor core (Master Core) adds a task descriptor (Task Descriptor) to the task master (Task Maestro);
2. The task master (Task Maestro) stores the task descriptor to a task pool (Task Pool);
3. A checking module (Check Tips) checks the dependency relationships among the tasks, dispatches ready tasks to the computing units (Worker Core) for execution through a dispatching module, and modifies a dependency table (Dependency Table);
4. The checking module (Check Tips) continues to check whether a task is ready and continues to schedule ready tasks for execution by the computing units (Worker Core) until all tasks are completed.
However, when the scheduling device shown in fig. 1 schedules tasks, for a plurality of task graphs with the same processing modes and dependency relationships, each time a task graph is executed the main processor core (Master Core) needs to re-initialize the task descriptors corresponding to the task graph (a task descriptor includes the processing mode of a task and its dependencies) into the scheduling device, and the scheduling device then maintains the dependencies among the tasks in the task graph to ensure that the computation proceeds correctly. Therefore, the scheduling device shown in fig. 1 does not support embedding static dependency relationships and processing modes in the scheduling device, which leads to long initialization time when the scheduling device parses the dependencies of a task graph, and to redundant interaction between the Master Core and the scheduling device that affects computing efficiency.
As another example, a task scheduling graph (TSG) device, as shown in fig. 2, includes a task library, event counters, and a refresh command module. The method for scheduling tasks by the TSG device may include the following steps:
1. The main processor core (Master Core) initializes the task descriptors (Task Descriptors), the dependency table, and the event counters;
2. The TSG device schedules ready tasks to the computing units (Worker Core);
3. After a computing unit (Worker Core) finishes executing a task, an event is generated; the TSG device modifies the corresponding event counter through the refresh command module; after an event counter satisfies a trigger condition, a ready task is generated, and the TSG device continues to schedule the ready task to the computing unit (Worker Core);
4. After all tasks have been executed, the TSG device notifies the main processor core (Master Core), which then re-initializes a new task.
However, when the scheduling device shown in fig. 2 schedules tasks, for a plurality of task graphs with the same processing modes and dependency relationships, the Master Core needs to re-initialize the task descriptors corresponding to the task graph into the TSG device every time a task graph is executed, and the TSG device then maintains the dependencies among the tasks in the task graph through the event counters and refresh commands to ensure that the computation proceeds correctly. Therefore, the scheduling apparatus shown in fig. 2 also does not support embedding static dependency relationships and processing modes in the scheduling apparatus, and each task graph needs to be re-initialized into the TSG device, which leads to long initialization time and affects computing efficiency. In addition, because each task is provided with its own counter in this scheme, the number of counters is large, resulting in a large chip area for the TSG device.
To alleviate the problem that, when a scheduling device schedules a plurality of task graphs with the same processing modes and dependency relationships, the CPU needs to re-initialize the processing modes and dependencies corresponding to the task graph into the scheduling device each time, causing long initialization time and reduced computing efficiency, an embodiment of the present application provides a task scheduling device. That is, the task scheduling device provided by the embodiment of the application supports embedding a static task graph template, which saves the time of loading the static processing modes and dependency relationships into the task scheduling device and thereby improves computing efficiency.
The embodiment of the application provides a task scheduling device which can be applied to the fields of communication processors, HPC, AI calculation and the like. The task scheduler may be a chip in a communication device or a computing device.
The task scheduling device is configured to acquire one or more task graph templates. Optionally, a task graph template acquired by the task scheduling device may be sent to it by the CPU, or may be preset in the task scheduling device; this is not limited in the embodiments of the present application.
Each task graph template in the task scheduling device is used for indicating the dependency relationship among a plurality of tasks included in the task graph template and the processing mode of each task.
Illustratively, among the plurality of tasks included in the task graph template, if the execution of one task depends on the computation result of another task, there is a dependency relationship between the two tasks. The processing mode of each task may include the computation mode, data copy mode, data movement mode, and the like of the task.
For example, take a task graph template including task 1, task 2, and task 3, where the task graph template is used to calculate (a+b)×(c-d): task 1 computes a+b, task 2 computes c-d, and task 3 multiplies the result e of task 1 by the result f of task 2. Task 3 can begin execution only after task 1 and task 2 in the task graph template are completed; that is, the execution of task 3 depends on the computation results of task 1 and task 2. The task graph template indicates the computation modes of task 1, task 2, and task 3 and the dependency relationships among them.
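The (a+b)×(c-d) example above can be sketched as a small task graph template. This is an illustrative model only; the dictionary layout and the `run` helper are assumptions, not the patent's data structures. The point is that the template (functions and dependencies) is static while the input data is dynamic:

```python
# Static template: processing modes (fn) and dependencies (deps) are fixed.
template = {
    "task1": {"fn": lambda a, b: a + b, "deps": []},
    "task2": {"fn": lambda c, d: c - d, "deps": []},
    "task3": {"fn": lambda e, f: e * f, "deps": ["task1", "task2"]},
}

def run(template, inputs):
    """Execute each task once all of its dependencies have produced results."""
    results, pending = {}, set(template)
    while pending:
        for name in sorted(pending):
            task = template[name]
            if all(d in results for d in task["deps"]):
                # Dependent tasks consume upstream results; roots consume inputs.
                args = [results[d] for d in task["deps"]] or inputs[name]
                results[name] = task["fn"](*args)
                pending.discard(name)
                break
    return results["task3"]

# Same template, different dynamic input data: no re-initialization needed.
print(run(template, {"task1": (1, 2), "task2": (4, 3)}))  # (1+2)*(4-3) = 3
print(run(template, {"task1": (5, 6), "task2": (8, 7)}))  # (5+6)*(8-7) = 11
```

Running the same template twice with different inputs requires no re-initialization of the dependency structure, which is the behavior the template mechanism is meant to enable.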
Task graphs with the same processing modes and dependency relationships correspond to the same task graph template. For example, task graph 1 is used to calculate (1+2)×4-3 and task graph 2 is used to calculate (5+6)×8-7; the dependencies between the tasks in task graph 1 are the same as those between the tasks in task graph 2, and the computation modes of the tasks are also the same. Therefore, task graph 1 and task graph 2 share the same task graph template, (a+b)×c-d, and differ only in their input data. That is, the task scheduling device in the embodiment of the application supports embedding the static task graph template in the device, so that when a plurality of task graphs corresponding to the template are executed subsequently, the dependency relationships and processing modes do not need to be initialized into the device again; only the dynamic input data and the identifier of the task graph template to be used need to be acquired, saving the time of loading the dependency relationships into the task scheduling device.
Optionally, the tasks included in the task graph template may be executed serially or in parallel. For example, among the plurality of tasks included in the task graph template, some tasks may be executed serially and others in parallel.
The task graph templates in the task scheduling device in the embodiment of the application can be used for any scene of multi-task calculation, and the task graph templates in three scenes are briefly introduced below.
Fig. 3 is a task graph template for L2 scheduling in wireless communication. As shown in fig. 3, in a 5G commercial downlink shared channel (DL-SCH) in a wireless communication system, the scheduling procedure can be abstracted into the task graph template shown in fig. 3, where T0 is a transmission time interval (TTI) interrupt timing trigger task, T1 is a cell-level scheduling task, T2 is a spatial-domain scheduling task, T3 is a frequency-domain scheduling task, and T4 is a user-level post-processing task. The main processor core (Master Core) can load the task graph template shown in fig. 3 into the task scheduling device, and the task scheduling device of the present application completes the parsing and scheduling of the dependencies of the plurality of tasks, ensuring parallel task execution.
FIG. 4 is a task graph template of matrix computation in HPC. Taking the matrix computation Aij = (aij + bij) × (cij + dij) as an example, the computation may be abstracted into the task graph template shown in FIG. 4, where T0 is an addition operation and T1 is a multiplication operation. The Master Core abstracts the task graph template in FIG. 4 into two T0 tasks and one T1 task and loads them into the task scheduling device, and the task scheduling device completes the parsing and scheduling of the dependencies of the plurality of tasks, ensuring parallel task execution. It will be appreciated that, in addition to multiplication and addition, matrix computation in HPC may include various other operations, such as matrix LU decomposition; FIG. 4 merely illustrates a task graph template including addition and multiplication.
Fig. 5 is a task graph template of a convolutional neural network (CNN) in an AI computing scenario; each computing step of the CNN shown in fig. 5 (a) may be abstracted into the task graph template shown in fig. 5 (b), where T1 is preprocessing, T2, T3, and T4 are direct memory access (DMA) tasks, T5 is a vector addition (VADD), T6 and T7 are convolutions (CONV), T8 is pooling (Pool), and T9 is a DMA task. The main processor core (Master Core) loads the task graph template shown in fig. 5 into the task scheduling device, and the task scheduling device of the present application completes the parsing and scheduling of the dependencies of the plurality of tasks, ensuring parallel task execution.
The task graph templates in the task scheduling device are not limited to those of the above three scenarios; in any multi-task computing scenario, the dependency relationships and processing modes among a plurality of tasks can be abstracted into a task graph template.
Optionally, each task graph template may include a task information table, a first synchronization information table, and a second synchronization information table. That is, the data structure of each task graph template in the embodiment of the application can be described by these three tables, which are introduced below.
The task information table comprises a plurality of task identifiers and processing modes corresponding to the task identifiers. Alternatively, the processing mode corresponding to each task identifier may be a specific computing mode of each task.
Illustratively, taking the task graph template including N tasks as an example, the task information table may be as shown in table 1.
TABLE 1
The task scheduling device can acquire the processing mode of each task based on the task information table shown in table 1. The task information table shown in table 1 is used to indicate the static processing mode of the task graph template.
TaskType0 to TaskTypeN in table 1 represent task identifiers; each task identifier corresponds to a function (function) in the task information table, TaskInfo indicates the pointer positions of the specific variables used by the function, and the task scheduling device can obtain the specific computation mode corresponding to each task identifier based on the TaskInfo and function fields in table 1.
For example, taking the task corresponding to TaskType0 as a+b as an example, function0 is an addition and TaskInfo0 indicates the pointer positions of variables a and b; by looking up table 1 with the task identifier TaskType0, it can be determined that the task corresponding to TaskType0 is a+b. It can be understood that the specific values of variables a and b are dynamic data, which can be obtained in real time for different task graphs. The processing mode of each task stored in the task information table is static data, so the processing mode obtained from the task information table can be reused by a plurality of task graphs whose dynamic data differ.
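The lookup described above can be sketched as follows, with illustrative names standing in for the TaskType, function, and TaskInfo fields; the real table stores pointer positions rather than a Python dictionary:

```python
import operator

# Static task information table (cf. Table 1): function is the processing
# mode, task_info names the dynamic variables the function consumes.
task_info_table = {
    "TaskType0": {"function": operator.add, "task_info": ("a", "b")},
    "TaskType1": {"function": operator.sub, "task_info": ("c", "d")},
}

def resolve_task(task_type, variables):
    """Look up the static processing mode, then bind the dynamic data."""
    entry = task_info_table[task_type]
    args = [variables[name] for name in entry["task_info"]]
    return entry["function"](*args)

# The same static table serves many task graphs; only `variables` changes.
print(resolve_task("TaskType0", {"a": 1, "b": 2}))  # 3
print(resolve_task("TaskType1", {"c": 4, "d": 3}))  # 1
```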
Optionally, the task information table may further include information such as task priority, queue number, task computation amount, and affinity TAG (tasks with the same affinity TAG may be sent to the same computing unit for execution). After receiving the tasks scheduled by the task scheduling device, a load balancing module can perform operations such as load balancing, priority scheduling, and affinity scheduling based on this information.
The first synchronization information table includes a plurality of events and the identifiers of one or more barriers corresponding to each event; the plurality of events correspond one-to-one to the plurality of tasks, and each event indicates that its corresponding task has been executed.
A barrier is used to coordinate the parallel execution of multiple tasks: the next task can be executed only when the barrier satisfies its trigger condition. Each barrier can correspond to a counter, and the value of the barrier is the value of the counter corresponding to the barrier.
For example, taking a task graph template including N tasks as an example, each task may correspond to an event, each event indicating that the corresponding task has been executed; the first synchronization information table may be as shown in table 2.
TABLE 2
After each task is executed, the task scheduling device can obtain the identifier of the barrier corresponding to the event based on the first synchronization information table, and modify the value of the counter corresponding to that barrier. Optionally, when modifying the value of the counter corresponding to the barrier, the task scheduling device may increment it by one or decrement it by one; whether it increments or decrements in practice is related to the initial value of the counter. In the following embodiments, the initial values of the barriers are all 0, and the description takes as an example the task scheduling device incrementing the counter by one on each modification.
Optionally, an event may correspond to one or more barriers. As shown in table 2, when an event corresponds to a plurality of barriers, the values of all the barriers corresponding to the event are modified after the task corresponding to the event is executed.
Optionally, multiple events may correspond to the same barrier; that is, the value of the barrier is modified after each of the tasks corresponding to those events is executed. For example, suppose Event0, Event1, and Event2 indicate that Task0, Task1, and Task2 have been executed, respectively. As shown in table 2, since the barrier identifiers corresponding to Event0, Event1, and Event2 all include 0x1, the task scheduling device increments the value of barrier 0x1 by one after Task0 is executed, again after Task1 is executed, and again after Task2 is executed.
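The event-to-barrier update path can be sketched as follows; the table contents are illustrative, following the Event0/Event1/Event2 example above in which all three events modify barrier 0x1:

```python
# First synchronization table (cf. Table 2): each event maps to the
# identifiers of the barriers whose counters it modifies on completion.
first_sync_table = {
    "Event0": ["0x1"],
    "Event1": ["0x1"],
    "Event2": ["0x1", "0x2"],  # one event may touch several barriers
}

barrier_counters = {"0x1": 0, "0x2": 0}

def on_task_complete(event):
    """Increment every barrier counter listed for this event."""
    for barrier_id in first_sync_table[event]:
        barrier_counters[barrier_id] += 1

for e in ("Event0", "Event1", "Event2"):
    on_task_complete(e)
print(barrier_counters)  # {'0x1': 3, '0x2': 1}
```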
The second synchronization information table includes a plurality of barriers, the trigger conditions corresponding to each barrier, and the identifiers of the tasks to be executed when each barrier satisfies its corresponding trigger conditions. In the second synchronization information table, each barrier may correspond to one or more trigger conditions, and there may be one or more tasks to be executed when a barrier satisfies a corresponding trigger condition.
Optionally, the second synchronization information table may further include a valid bit for each barrier, indicating whether the barrier is valid. The second synchronization information table may further include the number of executions corresponding to each task identifier to be executed.
Illustratively, taking one barrier with two trigger conditions as an example, the second synchronization information table may be as shown in table 3.
TABLE 3 Table 3
When the value of a barrier is updated, the task scheduling device queries the second synchronization information table based on the identifier of the barrier and determines whether the value of the barrier satisfies a corresponding trigger condition. When it does, the task scheduling device determines the identifier of the task to be executed based on the second synchronization information table, obtains the task content corresponding to the task identifier based on the task information table (table 1), and then sends the task to the computing unit.
As shown in table 3, one barrier may correspond to a plurality of trigger conditions, and the task to be executed differs depending on which trigger condition the value of the barrier satisfies. When one barrier corresponds to a plurality of trigger conditions, the trigger conditions in the second synchronization information table may be arranged in trigger order; for example, the trigger order of Trigger_Condition0 in table 3 is earlier than that of Trigger_Condition1.
From table 3 it can be determined whether the value of each barrier satisfies its corresponding trigger condition; if so, the identifier of the next task to be executed can be obtained from table 3.
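The query-and-dispatch step can be sketched as follows. The trigger values and task identifiers are illustrative; a real implementation would then look up the task content in the task information table (table 1) and send it to a computing unit:

```python
# Second synchronization table (cf. Table 3): per barrier, a list of
# (trigger value, task identifiers) pairs, arranged in trigger order.
second_sync_table = {
    "0x1": [
        (3, ["TaskType3"]),
        (4, ["TaskType4"]),
    ],
}

def on_barrier_update(barrier_id, value):
    """Return the identifiers of tasks that become ready at this value."""
    ready = []
    for trigger_value, task_ids in second_sync_table[barrier_id]:
        if value == trigger_value:
            ready.extend(task_ids)  # next: look up Table 1 and dispatch
    return ready

print(on_barrier_update("0x1", 3))  # ['TaskType3']
print(on_barrier_update("0x1", 2))  # []
```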
Optionally, multiple tasks may multiplex the same barrier. Multiplexing the same barrier across multiple tasks means that the triggering of those tasks can depend on the same barrier: in the second synchronization information table, when multiple tasks multiplex the same barrier and the value of the barrier satisfies one or more trigger conditions, the corresponding tasks to be executed are those multiple tasks. Since the value of one barrier is maintained by one counter, multiplexing the same barrier across multiple tasks reduces the number of counters, thereby reducing the chip area.
Illustratively, taking a task graph template including a first task and a second task as an example, the first task and the second task may multiplex the same barrier. Taking the first task and the second task multiplexing barrier 0 as an example, in the second synchronization information table the first task may be the task to be executed when barrier 0 satisfies a first trigger condition, and the second task may be the task to be executed when barrier 0 satisfies a second trigger condition; the first trigger condition and the second trigger condition may be the same or different. That is, the trigger conditions of multiple tasks multiplexing the same barrier may be the same or different. When the first and second trigger conditions are the same, the first task and the second task are executed in parallel.
In the embodiment of the present application, when the first task and the second task satisfy at least one of the following four cases, they may multiplex the same barrier.
In case one, neither the first task nor the second task has a parent node.
Optionally, when the first task and the second task in the task graph template are root nodes, they may multiplex the same barrier.
For example, take a task graph template including tasks T1 to T4, where T1, T2, and T3 are parent nodes of T4. As shown in (a) of fig. 6, since T1, T2, and T3 are root nodes, i.e., have no parent node, they can multiplex the same barrier. As shown in (a) of fig. 6, the trigger conditions of T1, T2, and T3 may be the same: taking T1, T2, and T3 multiplexing barrier 0 with the trigger condition barrier 0 = 0 as an example, when barrier 0 is 0, tasks T1, T2, and T3 are triggered, and the computing unit executes them in parallel.
In case two, the first task and the second task have the same parent node.
For example, take a task graph template including tasks T1 to T5, where T1, T2, and T3 are parent nodes of T4 and T5, and T4 and T5 may be executed in parallel. As shown in (b) of fig. 6, since the parent nodes of both T4 and T5 are T1 to T3, T4 and T5 have the same parent nodes and can therefore multiplex the same barrier. As shown in (b) of fig. 6, the trigger conditions of T4 and T5 may be the same: taking T4 and T5 multiplexing barrier 0 with the trigger condition barrier 0 = 3 as an example, when barrier 0 is 3, tasks T4 and T5 are triggered, and the computing unit executes them in parallel.
As can be seen from cases one and two, multiple tasks with the same trigger condition in the task graph template can multiplex the same barrier.
In case three, the first task is the unique parent node of the second task.
For example, take a task graph template including tasks T1 to T5, where T1, T2, and T3 are parent nodes of T4, and T4 is the parent node of T5. As shown in (c) of fig. 6, since T4 is the only parent node of T5, T4 and T5 can multiplex the same barrier. As shown in (c) of fig. 6, the trigger conditions of T4 and T5 differ: taking the trigger condition of T4 as barrier 0 = 3 and that of T5 as barrier 0 = 4 as an example, when barrier 0 is 3, T4 is triggered; after T4 is executed, the value of barrier 0 is modified to 4, satisfying the trigger condition of T5, so T5 is triggered and the computing unit executes it.
In case four, the root nodes of the first task and the second task multiplex the same barrier, and the first task is the unique parent node of the second task.
For example, take a task graph template including tasks T1 to T5, where T1, T2, and T3 are parent nodes of T4, and T4 is the parent node of T5. As shown in (d) of fig. 6, since the root nodes of T4 and T5 are T1, T2, and T3, which can multiplex the same barrier, and T4 is the unique parent node of T5, T4 and T5 can also multiplex the same barrier. As shown in (d) of fig. 6, the trigger conditions of T4 and T5 differ: taking the trigger condition of T4 as barrier 0 = 3 and that of T5 as barrier 0 = 4 as an example, when barrier 0 is 3, T4 is triggered; after T4 is executed, the value of barrier 0 is modified to 4, satisfying the trigger condition of T5, so T5 is triggered and the computing unit executes it.
As can be seen from cases three and four, multiple tasks with different trigger conditions in the task graph template may also multiplex the same barrier.
It should be noted that if multiple tasks in one task graph template satisfy any one or more of the above four cases, those tasks may multiplex the same barrier. In practical applications, the situations in which multiple tasks multiplex the same barrier are not limited to these four cases; specifically, whether multiple tasks can multiplex the same barrier can be determined according to their dependency relationships in the task graph template.
For example, the task graph template includes 8 tasks, namely task T1 through task T8. As shown in (a) of fig. 7, without multiplexing barriers (or counters), the triggering of tasks T1 to T8 in the task graph template shown in (a) of fig. 7 depends on barriers b1 to b7. That is, the triggering of tasks T1 to T8 depends on different barriers. Without multiplexing barriers (or counters), the second information table corresponding to the task graph template shown in (a) of fig. 7 is shown in table 4 below.
TABLE 4

Barrier trigger condition | Identification of task to be executed
b1=0 | T1
b2=1 | T2, T3
b3=1 | T4
b4=2 | T5
b5=2 | T6
b6=2 | T7
b7=1 | T8
For another example, the task graph template includes 8 tasks, namely task T1 through task T8. As shown in (b) of fig. 7, in the case of multiplexing barriers (or counters): since task T1 is the unique parent node of task T2, task T1 is the unique parent node of task T3, task T2 and task T3 have the same parent node (task T1), and task T2 is the unique parent node of task T4, tasks T1, T2, T3, and T4 can all multiplex the same barrier (or counter), denoted as b1 in (b) of fig. 7. Since task T7 is the unique parent node of task T8, task T7 and task T8 may multiplex the same barrier, denoted as b4 in (b) of fig. 7. In the case of multiplexing barriers (or counters), the second information table corresponding to the task graph template shown in (b) of fig. 7 is shown in table 5 below.
TABLE 5

Barrier trigger condition | Identification of task to be executed
b1=0 | T1
b1=1 | T2, T3
b1=2 | T4
b2=2 | T6
b3=2 | T5
b4=2 | T7
b4=3 | T8
As can be seen from the above tables 4 and 5, for the same task graph template, the triggering of tasks T1 to T8 depends on 7 barriers (barrier b1 to barrier b7) without multiplexing barriers (or counters), but on only 4 barriers (barrier b1 to barrier b4) when multiplexing barriers (or counters). Because the value of each barrier is maintained by one counter, for the same task graph template, multiplexing barriers can greatly reduce the number of counters and the chip area compared with not multiplexing barriers.
Optionally, if there are multiple tasks to be executed when one barrier meets its corresponding trigger condition, the multiple tasks to be executed may be executed in parallel. For example, as shown in table 5, when the value of b1 is 1, the tasks to be executed are task T2 and task T3, and the computing unit may execute task T2 and task T3 in parallel.
Optionally, the task information table, the first synchronization information table, and the second synchronization information table stored in the task scheduling device may be sent by the CPU (e.g., master core) to the task scheduling device, or may be preconfigured in the task scheduling device, which is not limited by the embodiment of the present application.
It can be appreciated that the data structure of the task graph template in the embodiment of the present application may be described by three tables, i.e., a task information table (table 1), a first synchronization information table (table 2), and a second synchronization information table (table 3). The task scheduling device may schedule a plurality of tasks based on the three tables.
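For illustration only, the three tables can be sketched as plain lookup structures. The field names and dictionary encoding are assumptions; the patent fixes only the roles of the three tables:

```python
# Table 1 (task information table): task identifier -> task content to execute.
task_info = {"T1": "kernel_T1", "T2": "kernel_T2"}

# Table 2 (first synchronization information table): completion event ->
# identifiers of the barriers whose counters must be modified.
first_sync = {"Event1": ["b1"]}

# Table 3 (second synchronization information table): (barrier, value) ->
# identifiers of the tasks triggered when the counter reaches that value.
second_sync = {("b1", 0): ["T1"], ("b1", 1): ["T2"]}

# The scheduler needs only these three static tables plus per-run input
# data, so one template can drive many task graph executions.
assert second_sync[("b1", 1)] == ["T2"]
```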
Optionally, the task scheduling device may modify and delete multiple task graph templates stored in the task scheduling device, and may also add a task graph template.
The task scheduling device schedules tasks, which comprises the following steps: acquiring task information of a first task graph; determining a task graph template corresponding to the first task graph from one or more task graph templates stored in the task scheduling device based on the task graph template identification corresponding to the first task graph; and scheduling the first task graph based on the input data of the first task graph and the task graph template corresponding to the first task graph.
The task information of the first task graph comprises input data of the first task graph and task graph template identifications corresponding to the first task graph. For example, the task scheduler may receive input data of a first task graph from the CPU, master core, or accelerator and a task graph template identification corresponding to the first task graph.
Alternatively, since a plurality of task graph templates are stored in the task scheduling device, the data structure of each task graph template may be described by the above task information table (table 1), the first synchronization information table (table 2), and the second synchronization information table (table 3). The task scheduling device may determine, from the plurality of task graph templates stored in the task scheduling device, a task information table (table 1), a first synchronization information table (table 2), and a second synchronization information table (table 3) corresponding to the first task graph according to the task graph template identifier corresponding to the first task graph.
Optionally, the task scheduling device may schedule the first task graph based on a task information table, a first synchronization information table, and a second synchronization information table corresponding to the first task graph.
The task scheduling device includes a plurality of circuit modules, and the task scheduling process of the task scheduling device will be described in detail with reference to each circuit module.
For example, as shown in fig. 8, the task scheduling device provided by the embodiment of the present application may include a first interface 801, a task graph control circuit 802, a task state machine 803, and a second interface 804 that are coupled.
The task graph control circuit 802 is configured to obtain a task graph template and task information of a first task graph through the first interface 801.
Optionally, the first interface 801 is responsible for receiving and identifying commands from upstream modules and routing different commands to different modules. For example, after the first interface 801 receives the task graph template sent by the CPU, the task graph template is routed to the task graph control circuit 802. For another example, after receiving an event indicating that execution of a task is completed, which is sent by the computing unit, the first interface 801 parses the event and routes the event to the event parsing circuit.
Illustratively, the task graph control circuitry 802 may receive a task graph template from the CPU via the first interface 801. The task graph template is created only once in the task scheduling device, and can be used for executing the subsequent task graph for a plurality of times. It can be understood that the dependency relationship and the processing mode in the task graph template in the embodiment of the application are static information, and only the dynamic data of the task graph and the identification of the task graph template to be used are acquired when the task graph is executed for multiple times later. For example, taking the same task graph template corresponding to the first task graph and the second task graph as an example, the task scheduling device creates the task graph template once, and when the first task graph and the second task graph are executed subsequently, the dependency relationship and the processing mode do not need to be loaded into the task scheduling device again, and only the dynamic data of the first task graph and the second task graph and the identification of the task graph template to be used are loaded, so that the initialization time of the task graph can be saved.
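The create-once, run-many idea described above can be sketched as follows. This is a minimal sketch; `register_template` and `launch` are assumed names, not the patent's interface:

```python
templates = {}  # static task graph templates kept inside the scheduler

def register_template(template_id, tables):
    # One-time load of the static dependency relationships and processing modes.
    templates[template_id] = tables

def launch(template_id, input_data):
    # Later launches carry only the dynamic data plus the template identifier.
    return templates[template_id], input_data

register_template("tmpl0", {"task_info": {"T1": "kernel_T1"}})
g1 = launch("tmpl0", input_data=[1, 2, 3])   # first task graph
g2 = launch("tmpl0", input_data=[4, 5, 6])   # second graph, no re-initialization
assert g1[0] is g2[0]                        # static tables loaded exactly once
```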
The task state machine 803 is configured to, based on the second synchronization information table, when determining that the value of the first barrier meets its corresponding first trigger condition, obtain the first task corresponding to the first task identifier from the task graph control circuit 802 according to the first task identifier, the input data of the first task graph, and the task information table corresponding to the first task graph, and send the first task to the computing unit through the second interface 804. The first task identifier is the identifier of the task to be executed when the value of the first barrier meets the first trigger condition.
Optionally, the second interface 804 is responsible for interacting with a downstream module for sending the ready task to the load balancing unit or the computing unit. The first interface 801 and the second interface 804 may be two different physical interfaces, or may be the same physical interface. When the first interface 801 and the second interface 804 are the same physical interface, the physical interface may receive a command or data, or may transmit a command or data. Fig. 8 illustrates only an example in which the first interface 801 and the second interface 804 are different physical interfaces.
Illustratively, the timing of the task state machine 803 determining whether the value of the barrier satisfies its corresponding trigger condition may include the following two cases.
In the first case, when the second synchronization information table includes a trigger condition of the first task (for example, trigger condition b1=0 corresponding to the first task T1 in table 5), for the first task in the first task graph template, the task graph control circuit 802 may send a first task trigger signal to the task state machine 803, where the first task trigger signal is used to instruct the task state machine 803 to query the second synchronization information table, and determine whether the value of b1 meets the trigger condition corresponding to the first task. When the task state machine 803 determines that the value of b1 meets the trigger condition corresponding to the first task, the task state machine 803 obtains the task content of the first task from the task graph control circuit 802 according to the first task identifier, the input data of the first task graph and the task information table corresponding to the first task graph, and sends the task content of the first task to the computing unit through the second interface 804.
In the second case, for other tasks after the first task, the task state machine 803 may query the second synchronization information table when the value of the barrier is updated, to determine whether the value of the barrier meets the trigger condition corresponding to the value of the barrier.
Optionally, when the second synchronization information table does not include the trigger condition of the first task (for example, if the trigger condition b1=0 corresponding to the first task T1 were not included in table 5), the task graph control circuit 802 may send a first task execution signal to the task state machine 803. According to the first task execution signal, the task state machine 803 obtains the task content of the first task from the task graph control circuit 802 and sends the task content of the first task to the computing unit through the second interface 804.
Optionally, when the value of the first barrier meets its corresponding first trigger condition, if there are multiple first tasks to be executed, the computing unit executes the multiple first tasks in parallel. For example, when there are multiple first tasks, the task state machine 803 may send the multiple first tasks to multiple computing units, respectively, so that the multiple computing units execute the multiple first tasks in parallel.
As shown in fig. 8, the task scheduling device may further include an event parsing circuit 805 and a synchronous counting circuit 806, where the synchronous counting circuit 806 includes a plurality of counters, each barrier corresponds to one counter, and the value of each barrier is the value of its corresponding counter.
The event parsing circuit 805 is configured to receive a first event through the first interface 801 when the execution of the first task is completed, determine an identifier of a second barrier corresponding to the first event based on the first synchronization information table, and notify the synchronization counting circuit 806 to modify a value of a counter corresponding to the second barrier. Wherein the first event is used to indicate that the first task execution is complete.
The synchronous counting circuit 806 is configured to modify the value of the counter corresponding to the second barrier.
Alternatively, the second barrier and the first barrier may be the same barrier or different barriers.
Optionally, when the synchronous counting circuit 806 modifies the value of the counter corresponding to the second barrier, it may increase the value by one, decrease the value by one, or increase or decrease the value by another amount. In practice, whether the synchronous counting circuit 806 increases the counter value (for example, by one) or decreases it (for example, by one) when modifying the counter corresponding to a barrier is related to the initial value of the counter.
For example, when the initial value of the counter corresponding to a barrier is 0, the synchronous counting circuit 806 may increment the value of the counter corresponding to the second barrier by one when modifying it. In this implementation, when the value of the barrier increases to a certain value, the barrier satisfies its corresponding trigger condition.
For another example, when the initial value of the counter corresponding to a barrier is a non-zero value preset according to the dependency relationships between tasks, the synchronous counting circuit 806 may decrease the value of the counter corresponding to the second barrier by one when modifying it. In this implementation, when the value of the barrier decreases to 0, the barrier satisfies its corresponding trigger condition.
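The two counting conventions can be summarized in a short sketch. The helper names are assumptions; the patent leaves the modification method open:

```python
# Up-counting: counter starts at 0 and is incremented per completion event;
# the barrier fires when the counter reaches the preset trigger value.
def fires_up(counter_value, trigger_value):
    return counter_value == trigger_value

# Down-counting: counter is preset to the number of unfinished dependencies
# and decremented per completion event; the barrier fires at 0.
def fires_down(counter_value):
    return counter_value == 0

assert fires_up(2, 2)        # e.g. both parents of a task have completed
assert not fires_up(1, 2)    # one parent still outstanding
assert fires_down(3 - 3)     # preset value 3 after three decrements
```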
The specific method by which the synchronous counting circuit 806 modifies the counter value corresponding to a barrier is not limited in the embodiment of the present application. The following embodiments are described taking as an example that the initial value of each barrier is 0 and that each modification by the synchronous counting circuit 806 adds one to the counter value.
For example, after the computing unit performs the first task, a first event indicating that the execution of the first task is completed may be sent to the task scheduling device, and the first interface 801 parses the first event and sends it to the event parsing circuit 805. The event parsing circuit 805 receives the first event, queries the first synchronization information table, determines the identifier of the second barrier corresponding to the first event, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to the second barrier. After the synchronous counting circuit 806 modifies the value of the counter corresponding to the second barrier, the task state machine 803 is notified of the identity of the second barrier. The task state machine 803 determines, based on the second synchronization information table, whether the value of the second barrier satisfies its corresponding trigger condition, and if so, acquires the next task to be executed from the task graph control circuit 802 and sends the task to the computing unit. This process repeats until all tasks in the first task graph template are executed.
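The event-driven flow above (execute, parse the completion event, bump the counter, check trigger conditions, dispatch) can be sketched as a sequential loop. Keying the first synchronization table by task instead of by event name is a simplification, and the hardware would dispatch ready tasks in parallel rather than one at a time:

```python
from collections import defaultdict

def schedule(first_sync, second_sync, first_tasks, execute):
    """Sequential sketch of the scheduling loop in fig. 8 (an assumption,
    not the circuit itself)."""
    counters = defaultdict(int)          # synchronous counting circuit 806
    pending = list(first_tasks)          # tasks whose trigger condition holds
    order = []
    while pending:
        task = pending.pop(0)
        execute(task)                    # computing unit runs the task
        order.append(task)
        for b in first_sync.get(task, []):        # event parsing circuit 805
            counters[b] += 1                      # modify the barrier's counter
            # task state machine 803: fire tasks whose condition is now met
            pending += second_sync.get((b, counters[b]), [])
    return order

# Tiny example: T1 is the unique parent of T2 and T3.
order = schedule({"T1": ["b1"]}, {("b1", 1): ["T2", "T3"]}, ["T1"], lambda t: None)
assert order == ["T1", "T2", "T3"]
```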
It can be understood that the task scheduling device provided by the embodiment of the application supports built-in static task graph templates, so that when the task scheduling device executes a plurality of task graphs with the same processing modes and dependency relationships, the CPU is not required to initialize the processing modes and dependency relationships corresponding to the task graphs into the task scheduling device each time, which reduces the initialization time of the task graphs. That is, by creating the task graph template once, any task graph with the same processing modes and dependency relationships as the template can be repeatedly executed; when such task graphs are executed subsequently, the static processing modes and dependency relationships do not need to be loaded into the task scheduling device again, so the loading time can be saved and the computing efficiency improved. In addition, the tasks in the task graph template provided by the embodiment of the application can multiplex barriers, so that the number of counters in the synchronous counting circuit can be reduced, the area of the task scheduling device is reduced, and the expandability of the chip is improved.
The following describes a task scheduling process of the task scheduling device according to the embodiment of the present application with reference to fig. 7 and fig. 9.
Illustratively, in connection with the task graph template shown in (a) of fig. 7, take as an example that the plurality of tasks in the task graph template do not multiplex the same barrier and that the initial values of b1 to b7 are 0. The first synchronization information table corresponding to the task graph template shown in (a) of fig. 7 is shown in table 6 below.
TABLE 6

Event | Identification of barrier to be modified
Event1 | b2
Event2 | b3, b5
Event3 | b4, b5
Event4 | b4
Event5 | b6
Event6 | b6
Event7 | b7
The task scheduling device according to the embodiment of the present application will be described with reference to the above tables 1, 6 and 4.
As shown in fig. 9, the task graph control circuit 802 receives a task graph template from the CPU through the first interface 801 and stores it; the data structure of the task graph template can be described using three tables: table 1, table 4, and table 6. The task graph control circuit 802 receives task information of the first task graph from the CPU through the first interface 801, where the task graph template corresponding to the first task graph is shown in (a) of fig. 7. The task graph control circuit 802 sends a first task trigger signal to the task state machine 803. Based on the first task trigger signal, the task state machine 803 queries table 4 and confirms that the initial value 0 of b1 satisfies the trigger condition b1=0 corresponding to the first task. The task state machine 803 then obtains the task content of T1 from the task graph control circuit 802 according to the first task identifier T1, the input data of the first task graph, and table 1, and sends task T1 to the computing unit through the second interface 804.
After the computing unit completes execution of the task T1, an Event1 indicating completion of execution of the task T1 is sent to the first interface 801. The first interface 801 parses the Event1 and sends the Event1 to the Event parsing circuit 805, the Event parsing circuit 805 receives the Event1 and queries the table 6, determines the identifier of the barrier corresponding to the Event1 as b2, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b 2. The synchronization counter circuit 806 modifies the value of b2 to 1 and informs the task state machine 803 of the identity of b 2. Based on table 4, the task state machine 803 determines that the value of b2 satisfies the trigger condition b2=1 corresponding thereto, the task to be executed is identified as T2 and T3, and based on the task to be executed and table 1, the task state machine 803 acquires the task T2 and the task T3 from the task graph control circuit 802 and sends the task T2 and the task T3 to the computing unit 1 and the computing unit 2.
The computing unit executes task T2 and task T3 in parallel, and after they are executed, sends an Event2 and an Event3 indicating that task T2 and task T3 are executed to the first interface 801. The first interface 801 parses the Event2 and the Event3 and sends them to the event parsing circuit 805. The event parsing circuit 805 receives the Event2 and the Event3, queries table 6, determines that the identifiers of the barriers corresponding to the Event2 are b3 and b5 and that the identifiers of the barriers corresponding to the Event3 are b4 and b5, and notifies the synchronous counting circuit 806 to modify the values of the counters corresponding to b3, b4, and b5. The synchronous counting circuit 806 modifies the value of b3 to 1, the value of b4 to 1, and the value of b5 to 2, and informs the task state machine 803 of the identities of b3, b4, and b5. The task state machine 803 determines, based on table 4, that the value of b3 satisfies its corresponding trigger condition b3=1 and that the value of b5 satisfies its corresponding trigger condition b5=2; the tasks to be executed are identified as T4 and T6. Based on the task-to-be-executed identifications and table 1, the task state machine 803 obtains task T4 and task T6 from the task graph control circuit 802 and sends them to the computing unit.
After the computing unit completes execution of task T4, an Event4 indicating completion of execution of task T4 is sent to the first interface 801. The first interface 801 parses the Event4 and sends it to the event parsing circuit 805. The event parsing circuit 805 receives the Event4, queries table 6, determines the identifier of the barrier corresponding to the Event4 as b4, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b4. The synchronous counting circuit 806 modifies the value of b4 to 2 and informs the task state machine 803 of the identity of b4. The task state machine 803 determines, based on table 4, that the value of b4 satisfies its corresponding trigger condition b4=2; the task to be executed is identified as T5. Based on the task-to-be-executed identification and table 1, the task state machine 803 obtains task T5 from the task graph control circuit 802 and sends it to the computing unit.
After the computing unit completes execution of the task T6, an Event6 indicating completion of execution of the task T6 is sent to the first interface 801. The first interface 801 parses the Event6 and sends the Event6 to the Event parsing circuit 805, the Event parsing circuit 805 receives the Event6 and queries the table 6, determines the identifier of the barrier corresponding to the Event6 as b6, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b 6. The synchronization counter circuit 806 modifies the value of b6 to 1 and informs the task state machine 803 of the identity of b 6. The task state machine 803 determines that the value of b6 does not satisfy its corresponding trigger condition b6=2 based on table 4. Alternatively, the computing unit may execute task T4 and task T6 in parallel.
After the computing unit completes execution of the task T5, an Event5 indicating completion of execution of the task T5 is sent to the first interface 801. The first interface 801 parses the Event5 and sends the Event5 to the Event parsing circuit 805, the Event parsing circuit 805 receives the Event5 and queries the table 6, determines the identifier of the barrier corresponding to the Event5 as b6, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b 6. The synchronization counter circuit 806 modifies the value of b6 to 2 and informs the task state machine 803 of the identity of b 6. The task state machine 803 determines that the value of b6 satisfies the trigger condition b6=2 corresponding thereto based on table 4, the task to be executed is identified as T7, and the task state machine 803 acquires the task to be executed T7 from the task map control circuit 802 based on the task to be executed identification and table 1 and sends the task T7 to the calculation unit.
After the computing unit completes execution of task T7, an Event7 indicating completion of execution of task T7 is sent to the first interface 801. The first interface 801 parses the Event7 and sends it to the event parsing circuit 805. The event parsing circuit 805 receives the Event7, queries table 6, determines the identifier of the barrier corresponding to the Event7 as b7, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b7. The synchronous counting circuit 806 modifies the value of b7 to 1 and informs the task state machine 803 of the identity of b7. The task state machine 803 determines, based on table 4, that the value of b7 satisfies its corresponding trigger condition b7=1; the task to be executed is identified as T8. Based on the task-to-be-executed identification and table 1, the task state machine 803 obtains task T8 from the task graph control circuit 802 and sends it to the computing unit. After the execution of task T8 is completed, all the tasks in the first task graph template are completed.
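The walkthrough above can be reproduced with a small sequential simulation using tables 4 and 6 as stated in the text. This is a sketch: ready tasks are executed in FIFO order here, whereas the device may run T2/T3 and T4/T6 in parallel:

```python
from collections import defaultdict

# Table 6: completion event -> barriers to modify (fig. 7 (a), no multiplexing).
first_sync = {"Event1": ["b2"], "Event2": ["b3", "b5"], "Event3": ["b4", "b5"],
              "Event4": ["b4"], "Event5": ["b6"], "Event6": ["b6"],
              "Event7": ["b7"], "Event8": []}
# Table 4: (barrier, value) -> tasks triggered at that value.
second_sync = {("b2", 1): ["T2", "T3"], ("b3", 1): ["T4"], ("b5", 2): ["T6"],
               ("b4", 2): ["T5"], ("b6", 2): ["T7"], ("b7", 1): ["T8"]}

counters = defaultdict(int)      # all barriers b1..b7 start at 0
pending, executed = ["T1"], []   # b1=0 already satisfies T1's trigger condition
while pending:
    task = pending.pop(0)                     # computing unit executes the task
    executed.append(task)
    for b in first_sync["Event" + task[1:]]:  # e.g. task "T1" emits "Event1"
        counters[b] += 1                                    # counter modified
        pending += second_sync.get((b, counters[b]), [])    # newly-ready tasks

assert executed == ["T1", "T2", "T3", "T4", "T6", "T5", "T7", "T8"]
```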
Illustratively, in connection with the task graph template shown in (b) of fig. 7, take as an example that a plurality of tasks in the task graph template multiplex the same barrier and that the initial values of b1 to b4 are 0. The first synchronization information table corresponding to the task graph template shown in (b) of fig. 7 is shown in table 7 below.
TABLE 7

Event | Identification of barrier to be modified
Event1 | b1
Event2 | b1, b2
Event3 | b2, b3
Event4 | b3
Event5 | b4
Event6 | b4
Event7 | b4
The task scheduling device according to the embodiment of the present application will be described with reference to the above tables 1, 7 and 5.
As shown in fig. 9, the task graph control circuit 802 receives a task graph template from the CPU through the first interface 801 and stores it; the data structure of the task graph template can be described using three tables: table 1, table 5, and table 7. The task graph control circuit 802 receives task information of the first task graph from the CPU through the first interface 801, where the task graph template corresponding to the first task graph is shown in (b) of fig. 7. The task graph control circuit 802 sends a first task trigger signal to the task state machine 803. Based on the first task trigger signal, the task state machine 803 queries table 5 and confirms that the initial value 0 of b1 satisfies the trigger condition b1=0 corresponding to the first task. The task state machine 803 then obtains the task content of T1 from the task graph control circuit 802 according to the first task identifier T1, the input data of the first task graph, and table 1, and sends task T1 to the computing unit through the second interface 804.
After the computing unit completes execution of the task T1, an Event1 indicating completion of execution of the task T1 is sent to the first interface 801. The first interface 801 parses the Event1 and sends the Event1 to the Event parsing circuit 805, the Event parsing circuit 805 receives the Event1 and queries the table 7, determines the identifier of the barrier corresponding to the Event1 as b1, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b 1. The synchronization counter circuit 806 modifies the value of b1 to 1 and notifies the task state machine 803 of the identity of b 1. Based on table 5, the task state machine 803 determines that the value of b1 satisfies the trigger condition b1=1 corresponding thereto, the task to be executed is identified as T2 and T3, and based on the task to be executed and table 1, the task state machine 803 acquires the task T2 and the task T3 from the task graph control circuit 802 and sends the task T2 and the task T3 to the computing unit 1 and the computing unit 2.
The computing unit executes task T2 and task T3 in parallel, and after they are executed, sends an Event2 and an Event3 indicating that task T2 and task T3 are executed to the first interface 801. The first interface 801 parses the Event2 and the Event3 and sends them to the event parsing circuit 805. The event parsing circuit 805 receives the Event2 and the Event3, queries table 7, determines that the identifiers of the barriers corresponding to the Event2 are b1 and b2 and that the identifiers of the barriers corresponding to the Event3 are b2 and b3, and notifies the synchronous counting circuit 806 to modify the values of the counters corresponding to b1, b2, and b3. The synchronous counting circuit 806 modifies the value of b1 to 2, the value of b2 to 2, and the value of b3 to 1, and informs the task state machine 803 of the identities of b1, b2, and b3. The task state machine 803 determines, based on table 5, that the value of b1 satisfies its corresponding trigger condition b1=2 and that the value of b2 satisfies its corresponding trigger condition b2=2; the tasks to be executed are identified as T4 and T6. Based on the task-to-be-executed identifications and table 1, the task state machine 803 obtains task T4 and task T6 from the task graph control circuit 802 and sends them to the computing unit.
After the computing unit completes execution of task T4, an Event4 indicating completion of execution of task T4 is sent to the first interface 801. The first interface 801 parses the Event4 and sends it to the event parsing circuit 805. The event parsing circuit 805 receives the Event4, queries table 7, determines the identifier of the barrier corresponding to the Event4 as b3, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b3. The synchronous counting circuit 806 modifies the value of b3 to 2 and informs the task state machine 803 of the identity of b3. The task state machine 803 determines, based on table 5, that the value of b3 satisfies its corresponding trigger condition b3=2; the task to be executed is identified as T5. Based on the task-to-be-executed identification and table 1, the task state machine 803 obtains task T5 from the task graph control circuit 802 and sends it to the computing unit.
After the computing unit completes execution of the task T6, an Event6 indicating completion of execution of the task T6 is sent to the first interface 801. The first interface 801 parses the Event6 and sends the Event6 to the Event parsing circuit 805, the Event parsing circuit 805 receives the Event6 and queries the table 7, determines the identifier of the barrier corresponding to the Event6 as b4, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b 4. The synchronization counter circuit 806 modifies the value of b4 to 1 and informs the task state machine 803 of the identity of b 4. The task state machine 803 determines that the value of b4 does not satisfy its corresponding trigger condition b4=2 based on table 5. Alternatively, the computing unit may execute task T4 and task T6 in parallel.
After the computing unit completes execution of task T5, an Event5 indicating completion of execution of task T5 is sent to the first interface 801. The first interface 801 parses the Event5 and sends it to the event parsing circuit 805. The event parsing circuit 805 receives the Event5, queries table 7, determines the identifier of the barrier corresponding to the Event5 as b4, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b4. The synchronous counting circuit 806 modifies the value of b4 to 2 and informs the task state machine 803 of the identity of b4. The task state machine 803 determines, based on table 5, that the value of b4 satisfies its corresponding trigger condition b4=2; the task to be executed is identified as T7. Based on the task-to-be-executed identification and table 1, the task state machine 803 obtains task T7 from the task graph control circuit 802 and sends it to the computing unit.
After the computing unit completes execution of the task T7, an Event7 indicating completion of execution of the task T7 is sent to the first interface 801. The first interface 801 parses the Event7 and sends the Event7 to the Event parsing circuit 805, the Event parsing circuit 805 receives the Event7 and queries the table 7, determines the identifier of the barrier corresponding to the Event7 as b4, and notifies the synchronous counting circuit 806 to modify the value of the counter corresponding to b 4. The synchronization counter circuit 806 modifies the value of b4 to 3 and informs the task state machine 803 of the identity of b 4. Based on table 5, the task state machine 803 determines that the value of b4 satisfies the trigger condition b4=3 corresponding thereto, the task to be executed is identified as T8, and based on the task to be executed identification and table 1, the task state machine 803 acquires the task to be executed T8 from the task map control circuit 802 and sends the task T8 to the calculation unit. And after the execution of the task T8 is completed, all the tasks in the first task graph template are completed.
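The event-driven dispatch loop walked through above (Events 4 to 7 driving barriers b3 and b4) can be sketched in a few lines of Python. This is an illustrative model only: the mappings below are transcribed from the walkthrough, not from Tables 1, 5, and 7 themselves.

```python
# Minimal sketch of the event -> barrier -> trigger-condition dispatch
# loop described above. Counter values and mappings follow the T4..T8
# walkthrough; the real Tables 1, 5 and 7 are not reproduced here.

# Table 7 analogue: which barrier each completion event updates.
EVENT_TO_BARRIER = {"Event4": "b3", "Event5": "b4", "Event6": "b4", "Event7": "b4"}

# Table 5 analogue: ordered trigger conditions per barrier and the task
# to dispatch when the counter reaches each value.
TRIGGERS = {"b3": [(2, "T5")], "b4": [(2, "T7"), (3, "T8")]}

def run(events, counters):
    """Process completion events in order; return the tasks dispatched."""
    dispatched = []
    for event in events:
        barrier = EVENT_TO_BARRIER[event]       # event parsing circuit
        counters[barrier] += 1                  # synchronization counting circuit
        for value, task in TRIGGERS[barrier]:   # task state machine
            if counters[barrier] == value:
                dispatched.append(task)         # send task to computing unit
    return dispatched

# b3 already counted one earlier completion before Event4 arrives.
tasks = run(["Event4", "Event6", "Event5", "Event7"], {"b3": 1, "b4": 0})
```

Running the model with the event order of the walkthrough yields the same dispatch order T5, T7, T8.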
It can be understood that the task scheduling device provided by the embodiment of the application supports built-in static task graph templates. When the task scheduling device executes a plurality of task graphs that share the same processing modes and dependency relationships, the CPU therefore does not need to initialize those processing modes and dependency relationships into the task scheduling device each time, which reduces the initialization time of the task graphs. That is, by creating a task graph template once, task graphs with the same processing modes and dependency relationships as the template can be executed repeatedly; when these task graphs are executed subsequently, the static processing modes and dependency relationships do not need to be loaded into the task scheduling device again, saving the corresponding loading time and improving computing efficiency. In addition, tasks in the task graph template provided by the embodiment of the application can multiplex barriers, which reduces the number of counters in the synchronization counting circuit, reduces the area of the task scheduling device, and improves the scalability of the chip.
Optionally, when one barrier in the second synchronization information table corresponds to a plurality of trigger conditions, these trigger conditions occupy substantial storage resources and would require a larger storage device. To reduce the chip area of the task scheduling device, the second synchronization information table may be divided into a first sub-information table and a second sub-information table; the first sub-information table is stored in the task scheduling device, and the second sub-information table is stored in a memory.
The first sub-information table includes a plurality of barriers, the first trigger condition corresponding to each barrier, and the identifier of the task to be executed when each barrier satisfies its corresponding first trigger condition. The second sub-information table includes the same barriers, the other trigger conditions corresponding to each barrier, and the identifiers of the tasks to be executed when each barrier satisfies those other trigger conditions. For the same barrier, the first trigger condition is triggered earlier than the other trigger conditions corresponding to that barrier.
For example, taking the first trigger condition corresponding to each barrier as trigger_condition0 and the other trigger conditions corresponding to each barrier as trigger_condition1 and trigger_condition2, the first sub-information table and the second sub-information table are shown in Table 8 and Table 9, respectively.
TABLE 8
TABLE 9
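As a hedged illustration of the split described above (the entries below are hypothetical examples, not the contents of Table 8 or Table 9), the two sub-information tables can be modeled as plain dictionaries: the first sub-information table keeps only the earliest trigger condition per barrier on-chip, while the second sub-information table holds the remaining conditions in trigger order.

```python
# Hypothetical illustration of splitting the second synchronization
# information table: the first trigger condition of each barrier stays
# on-chip, later ones go to external memory (DDR). Values are examples.

# First sub-information table (cache): barrier -> (first trigger value, task id)
first_sub_table = {"b0": (1, "T1"), "b1": (2, "T3")}

# Second sub-information table (DDR): barrier -> later conditions, in trigger order
second_sub_table = {"b0": [(2, "T2"), (3, "T4")], "b1": [(3, "T5")]}

def conditions_in_order(barrier):
    """All trigger conditions for a barrier, earliest first."""
    return [first_sub_table[barrier]] + second_sub_table.get(barrier, [])
```

Concatenating the cached entry with the DDR entries recovers the full, ordered condition list for each barrier.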
Alternatively, the first sub-information table may be stored in a cache of the task scheduling device, and the second sub-information table may be stored in a memory external to the task scheduling device, for example a double data rate (DDR) synchronous dynamic random access memory. It can be appreciated that storing part of the trigger conditions in the DDR reduces the chip area of the task scheduling device.
For example, when the initial value of the counter corresponding to a barrier is 0, the value at which the barrier satisfies the first trigger condition may be smaller than the values at which it satisfies the other trigger conditions, so that the first trigger condition is triggered earlier than the other trigger conditions. When the initial value of the counter corresponding to the barrier is a non-zero value preset according to the dependency relationships between tasks, the value at which the barrier satisfies the first trigger condition may be larger than the values at which it satisfies the other trigger conditions, which likewise ensures that the first trigger condition is triggered earlier than the other trigger conditions.
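The direction of the comparison above can be illustrated with a small sketch (all values hypothetical): an up-counting barrier reaches its smaller trigger values first, while a barrier preset to a non-zero value and decrementing reaches its larger trigger values first.

```python
# Hypothetical illustration of why the earliest-triggered condition has
# the smallest value for an up-counter but the largest for a down-counter.

def firing_order(initial, step, conditions, n_events):
    """Return the trigger values in the order they are reached."""
    value, fired = initial, []
    for _ in range(n_events):
        value += step          # one completion event updates the counter
        if value in conditions:
            fired.append(value)
    return fired

# Counting up from 0: the first trigger condition (value 2) fires before 3.
up = firing_order(0, +1, {2, 3}, 3)
# Counting down from a preset 3: the first trigger condition (value 2) fires before 1.
down = firing_order(3, -1, {1, 2}, 3)
```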
Alternatively, the plurality of trigger conditions in the second sub-information table may be arranged sequentially in trigger order.
The task graph control circuit 802 is further configured to, when the value of a barrier satisfies the first trigger condition corresponding to that barrier, read the next other trigger condition from the memory according to the trigger order of the plurality of other trigger conditions corresponding to the barrier in the second sub-information table, and replace the first trigger condition corresponding to the barrier with that other trigger condition.
Among the trigger conditions corresponding to the barrier, the next other trigger condition is the one immediately after the first trigger condition in trigger order; that is, it is the trigger condition that the barrier will satisfy next after its value has satisfied the first trigger condition, for example a second trigger condition.
Optionally, the task graph control circuit replaces the first trigger condition corresponding to the barrier stored in the cache with the second trigger condition corresponding to the barrier. Accordingly, the task graph control circuit 802 is further configured to, when the value of the barrier satisfies the second trigger condition in the cache, read a third trigger condition from the memory according to the trigger order of the other trigger conditions corresponding to the barrier in the second sub-information table, and replace the second trigger condition in the cache with the third trigger condition, and so on, until all trigger conditions corresponding to the same barrier have been traversed.
Taking a barrier with 3 trigger conditions as an example: in trigger order, the 3 conditions are the first trigger condition, the second trigger condition, and the third trigger condition (the second and third trigger conditions being the other trigger conditions); the first trigger condition is stored in the cache of the task scheduling device, and the second and third trigger conditions are stored in the DDR. When the value of the barrier satisfies the first trigger condition, the task graph control circuit 802 reads the second trigger condition from the DDR and replaces the first trigger condition in the cache with it. When the value of the barrier satisfies the second trigger condition, the task graph control circuit 802 reads the next other trigger condition (i.e., the third trigger condition) from the DDR and replaces the second trigger condition in the cache with the third trigger condition.
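A minimal sketch of this replacement scheme (barrier name, task names, and trigger values are hypothetical): only the currently active trigger condition of each barrier is held in the cache dictionary, and each hit pulls the next condition from the DDR list.

```python
# Sketch of the dynamic replacement scheme: only the current trigger
# condition of each barrier lives in the on-chip cache; later ones are
# fetched from DDR in trigger order. Names and values are illustrative.

cache = {"b0": (1, "TA")}                 # first trigger condition, on-chip
ddr = {"b0": [(2, "TB"), (3, "TC")]}      # remaining conditions, in trigger order

def on_counter_update(barrier, value, dispatched):
    """Check the cached condition; on a hit, pull the next one from DDR."""
    cond = cache.get(barrier)
    if cond and value == cond[0]:
        dispatched.append(cond[1])        # task to execute
        remaining = ddr.get(barrier, [])
        cache[barrier] = remaining.pop(0) if remaining else None

fired = []
for v in (1, 2, 3):                       # counter values as events arrive
    on_counter_update("b0", v, fired)
```

After the three updates, the cache entry has been traversed through all three conditions and the DDR list is exhausted.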
It can be understood that in the embodiment of the application, the first trigger condition is stored in the cache of the task scheduling device, the other trigger conditions are stored in the DDR, and the trigger conditions in the cache are dynamically replaced, so that the trigger conditions can be loaded into the cache in sequence.
The embodiment of the application also provides a computing device, as shown in fig. 9, which comprises a Central Processing Unit (CPU) and a task scheduling device shown in fig. 8, wherein the CPU is used for sending a task graph template to the task scheduling device.
Optionally, the computing device may further comprise an enhanced message service (EMS) module and a computing unit. The EMS is configured to receive tasks to be executed from the task scheduling device and to assign them to the computing unit; it is a hardware queue management and load balancing module that distributes tasks to be executed among the computing units in a balanced manner. The computing unit is configured to execute the tasks to be executed and may be an accelerator or a processor.
Illustratively, an embodiment of the present application further provides a task scheduling method, as shown in fig. 10. The task scheduling method is applied to the task scheduling device shown in fig. 8 and includes the following steps:
S1001, the task scheduling device acquires task information of the first task graph.
The task information of the first task graph includes the input data of the first task graph and the identifier of the task graph template corresponding to the first task graph.
The task scheduling device comprises one or more task graph templates, wherein the task graph templates in the task scheduling device can be the task graph templates received from a CPU or the task graph templates preset in the task scheduling device.
Alternatively, the data structure of the task graph template may be described using the three tables introduced above, namely the task information table, the first synchronization information table, and the second synchronization information table.
Alternatively, the above step S1001 may be performed by the task graph control circuit 802 in the task scheduling device shown in fig. 8; the task graph control circuit 802 may receive the task information of the first task graph from the CPU or the accelerator through the first interface 801.
S1002, determining a task graph template corresponding to the first task graph in one or more task graph templates based on the task graph template identification corresponding to the first task graph.
Optionally, the step S1002 may be performed by the task graph control circuit 802 in the task scheduling device shown in fig. 8, where the task graph control circuit 802 may determine, from among the multiple task graph templates stored in the task graph control circuit, a task graph template corresponding to the first task graph according to the task graph template identifier corresponding to the first task graph.
S1003, scheduling the first task graph based on the input data of the first task graph and a task graph template corresponding to the first task graph.
The specific execution procedure of step S1003 is described below, taking the task scheduling device shown in fig. 8 as an example. As shown in fig. 11, step S1003 may include the following steps:
S10031, when the task state machine determines, based on the second synchronization information table, that the value of the first barrier satisfies the first trigger condition corresponding to the first barrier, it obtains the first task corresponding to the first task identifier from the task graph control circuit according to the first task identifier, the input data of the first task graph, and the task information table corresponding to the first task graph, and sends the first task to the computing unit through the second interface.
The first task identifier is the identifier of the task to be executed when the value of the first barrier satisfies the first trigger condition.
Optionally, for the first task in the first task graph, before step S10031 the task graph control circuit 802 may send a first task trigger signal to the task state machine 803. The first task trigger signal instructs the task state machine 803 to query the second synchronization information table and determine whether the initial value of the relevant barrier (the barrier whose trigger condition, in the second synchronization information table, has the first task as the task to be executed) satisfies the trigger condition corresponding to the first task. When the task state machine 803 determines that this initial value satisfies the trigger condition corresponding to the first task, it obtains the task content of the first task from the task graph control circuit 802 according to the first task identifier, the input data of the first task graph, and the task information table corresponding to the first task graph, and sends the task content of the first task to the computing unit through the second interface 804.
Optionally, for the other tasks after the first task in the first task graph, the task state machine 803 may query the second synchronization information table whenever the value of a barrier is updated, to determine whether the value of that barrier satisfies its corresponding trigger condition.
Optionally, if there are multiple first tasks to be executed when the value of the first barrier satisfies the corresponding first trigger condition, the computing unit executes these first tasks in parallel.
Alternatively, multiple tasks in the task graph template may multiplex the same barrier. Multiple tasks multiplexing the same barrier means that the triggering of those tasks can depend on the same barrier. That is, in the second synchronization information table, if multiple tasks multiplex the same barrier, then when the value of the barrier satisfies one or more trigger conditions, the corresponding tasks to be executed are those multiple tasks. For a description of multiple tasks multiplexing the same barrier, reference may be made to the foregoing embodiments; details are not repeated here.
S10032, when execution of the first task is completed, the event parsing circuit receives the first event through the first interface, determines the identifier of the second barrier corresponding to the first event based on the first synchronization information table, and notifies the synchronization counting circuit to modify the value of the counter corresponding to the second barrier.
Wherein the first event is used to indicate that the first task execution is complete.
There may be one or more second barriers corresponding to the first event. The second barrier may be the same as, or different from, the first barrier.
Optionally, after the computing unit executes the first task, it sends a first event indicating that execution of the first task is completed to the first interface; the first interface parses the first event and routes it to the event parsing circuit 805. The event parsing circuit 805 queries the first synchronization information table based on the identifier of the first event, determines the identifiers of the one or more second barriers corresponding to the first event, and notifies the synchronization counting circuit 806 to modify the values of the counters corresponding to those second barriers.
S10033, the synchronization counting circuit modifies the value of the counter corresponding to the second barrier.
Illustratively, the synchronization counting circuit modifies the value of the counter corresponding to the second barrier, so that the value of the second barrier is updated.
Optionally, after the synchronization counting circuit 806 modifies the value of the counter corresponding to the second barrier, it may notify the task state machine 803 of the identifier of the second barrier. The task state machine 803 determines, based on the second synchronization information table, whether the value of the second barrier satisfies its corresponding trigger condition; if it does, steps S10031 to S10033 continue to be executed until all tasks in the first task graph have been executed.
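Steps S10031 to S10033 can be sketched as follows, under the assumption stated above that one completion event may correspond to several second barriers; all table contents here are illustrative, not taken from any actual embodiment.

```python
# Sketch of steps S10031-S10033: a completion event may update one or
# more barriers (first synchronization information table), and each
# updated barrier is checked against its trigger condition (second
# synchronization information table). Table contents are illustrative.

event_to_barriers = {"E1": ["b1", "b2"]}            # one event, two second barriers
trigger_table = {"b1": (1, "T2"), "b2": (2, "T3")}  # barrier -> (trigger value, task)
counters = {"b1": 0, "b2": 1}

def handle_completion(event):
    """S10032/S10033: update the counters, then re-check trigger conditions."""
    ready = []
    for barrier in event_to_barriers[event]:   # event parsing circuit
        counters[barrier] += 1                 # synchronization counting circuit
        value, task = trigger_table[barrier]   # task state machine
        if counters[barrier] == value:
            ready.append(task)                 # next tasks to dispatch (S10031)
    return ready

next_tasks = handle_completion("E1")
```

In this hypothetical configuration a single completion event satisfies both barriers at once, so two follow-on tasks become ready together.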
Optionally, when the first barrier corresponds to a plurality of trigger conditions, if the first trigger condition corresponding to the first barrier is stored in the cache of the task scheduling device and the other trigger conditions corresponding to the first barrier are stored in the DDR, step S1003 may further include:
S10034, when the value of the first barrier satisfies its corresponding first trigger condition, the task graph control circuit reads the next other trigger condition from the memory according to the trigger order of the other trigger conditions corresponding to the first barrier in the second sub-information table, and replaces the trigger condition corresponding to the first barrier in the task scheduling device with that next other trigger condition.
Among the trigger conditions corresponding to the first barrier, the next other trigger condition is the one immediately after the first trigger condition in trigger order; that is, it is the trigger condition that the first barrier will satisfy next after its value has satisfied the first trigger condition. For example, when the first trigger condition corresponding to the first barrier is the initial trigger condition, the next other trigger condition is the second trigger condition corresponding to the first barrier; when the first trigger condition corresponding to the first barrier is the second trigger condition, the next other trigger condition is the third trigger condition corresponding to the first barrier.
In the case where the first trigger condition is the initial trigger condition in trigger order, step S10034 may include: when the value of the first barrier satisfies the first trigger condition corresponding to the first barrier in the cache, the task graph control circuit reads the next other trigger condition from the memory according to the trigger order of the other trigger conditions corresponding to the first barrier in the second sub-information table, and replaces the first trigger condition corresponding to the first barrier in the cache with that next other trigger condition.
In the case where the first trigger condition is one of the other trigger conditions after the initial trigger condition, step S10034 may include the same operations: when the value of the first barrier satisfies the currently cached trigger condition corresponding to the first barrier, the task graph control circuit reads the next other trigger condition from the memory according to the trigger order of the other trigger conditions corresponding to the first barrier in the second sub-information table, and replaces the cached trigger condition corresponding to the first barrier with that next other trigger condition.
Taking the first barrier with 3 trigger conditions as an example: in trigger order, the 3 conditions are the first trigger condition, the second trigger condition, and the third trigger condition (the second and third trigger conditions being the other trigger conditions); the first trigger condition is stored in the cache of the task scheduling device, and the second and third trigger conditions are stored in the DDR. When the value of the first barrier satisfies the first trigger condition corresponding to the first barrier, the task graph control circuit 802 reads the second trigger condition corresponding to the first barrier from the DDR and replaces the first trigger condition corresponding to the first barrier in the cache with the second trigger condition. When the value of the first barrier satisfies the second trigger condition corresponding to the first barrier, the task graph control circuit 802 reads the third trigger condition corresponding to the first barrier from the DDR and replaces the second trigger condition corresponding to the first barrier in the cache with the third trigger condition.
It can be understood that in the embodiment of the application, the first trigger condition is stored in the cache of the task scheduling device, the other trigger conditions are stored in the DDR, and the trigger conditions in the cache are dynamically replaced, so that the trigger conditions can be loaded into the cache in sequence.
Alternatively, step S10034 may be performed after step S10031 or may be performed simultaneously with step S10031, which is not limited by the embodiment of the application.
It should be noted that, the specific implementation manner of the steps S10031 to S10034 may refer to the related description of the foregoing embodiment, which is not repeated herein.
According to the task scheduling method provided by the embodiment of the application, the static task graph template is stored in the task scheduling device, so the dependency relationships and processing modes do not need to be initialized into the task scheduling device again each time a task graph is executed; only the dynamic data of the task graph needs to be initialized into the task scheduling device, which reduces the time spent initializing the dependency relationships and processing modes. Compared with the prior art, in which the CPU must initialize the processing modes and dependency relationships corresponding to a task graph into the task scheduling device every time, the embodiment of the application can, by creating a task graph template once, repeatedly execute multiple task graphs with the same processing modes and dependency relationships as the template; when these task graphs are executed subsequently, the static processing modes and dependency relationships do not need to be loaded into the task scheduling device again, saving the corresponding loading time and improving computing efficiency. In addition, tasks in the task graph template provided by the embodiment of the application can multiplex barriers, which reduces the number of counters in the synchronization counting circuit, reduces the area of the task scheduling device, and improves the scalability of the chip. The embodiment of the application further stores the first trigger condition in the cache and the other trigger conditions in the DDR, and dynamically replaces the trigger condition in the cache, so that the trigger conditions can be loaded into the cache in sequence.
The steps of a method or algorithm described in connection with the present disclosure may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may be located in a core network interface device. The processor and the storage medium may also reside as discrete components in a core network interface device.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing embodiments are merely intended to describe the technical solutions of the present invention in further detail and are not to be construed as limiting its scope; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.

Claims (24)

  1. A task scheduling device, characterized in that the task scheduling device comprises one or more task graph templates, each task graph template is used for indicating the dependency relationship among a plurality of tasks included in the task graph template, and the processing mode of each task; the task scheduling device is used for:
    acquiring task information of a first task graph; the task information of the first task graph comprises input data of the first task graph and task graph template identifications corresponding to the first task graph;
    determining a task graph template corresponding to the first task graph in the one or more task graph templates based on the task graph template identification corresponding to the first task graph;
    and scheduling the first task graph based on the input data of the first task graph and a task graph template corresponding to the first task graph.
  2. The apparatus according to claim 1, wherein
    the task scheduling device is further configured to acquire the one or more task graph templates; each task graph template comprises a task information table, a first synchronization information table, and a second synchronization information table; the task information table comprises a plurality of task identifiers and the processing mode corresponding to each task identifier; the first synchronization information table comprises a plurality of events and the identifiers of one or more barriers corresponding to each event, wherein the plurality of events are in one-to-one correspondence with the plurality of tasks, and each event is used for indicating that the corresponding task has been executed; the second synchronization information table comprises a plurality of barriers, one or more trigger conditions corresponding to each barrier, and the identifier of the task to be executed when each barrier satisfies its corresponding trigger condition.
  3. The apparatus of claim 2, wherein the task scheduling apparatus comprises a first interface, a task graph control circuit, a task state machine, and a second interface that are coupled; wherein,
    the task graph control circuit is used for acquiring the task graph template and task information of the first task graph through the first interface;
    the task state machine is configured to, when it is determined based on the second synchronization information table that the value of a first barrier satisfies a first trigger condition corresponding to the first barrier, obtain, from the task graph control circuit, a first task corresponding to a first task identifier according to the first task identifier, the input data of the first task graph, and the task information table corresponding to the first task graph, and send the first task to a computing unit through the second interface; wherein the first task identifier is the identifier of the task to be executed when the value of the first barrier satisfies the first trigger condition.
  4. The apparatus of claim 3, wherein when there are a plurality of the first tasks, the computing unit executes the plurality of first tasks in parallel.
  5. The apparatus of claim 3 or 4, wherein the task scheduling apparatus further comprises an event parsing circuit and a synchronization counting circuit coupled to each other, the synchronization counting circuit comprising a plurality of counters, each counter corresponding to one barrier;
    the event parsing circuit is configured to, when execution of the first task is completed, receive a first event through the first interface, determine, based on the first synchronization information table, the identifier of a second barrier corresponding to the first event, and notify the synchronization counting circuit to modify the value of the counter corresponding to the second barrier; wherein the first event is used for indicating that execution of the first task is completed;
    the synchronization counting circuit is configured to modify the value of the counter corresponding to the second barrier.
  6. The device according to any one of claims 3 to 5, wherein,
    the task graph control circuit is also used for modifying or deleting the task graph template.
  7. The apparatus of any of claims 2-6, wherein the task graph template comprises a first task and a second task, and the first task and the second task multiplex the same barrier.
  8. The apparatus of claim 7, wherein the first task and the second task satisfy at least one of the following:
    the first task and the second task have no parent node; or,
    the first task and the second task have the same parent node; or,
    the first task is the unique parent node of the second task; or,
    the root nodes of the first task and the second task multiplex the same barrier, and the first task is the unique parent node of the second task.
  9. The apparatus according to any one of claims 2-8, wherein one of the barriers corresponds to a plurality of trigger conditions, the plurality of trigger conditions comprising a first trigger condition and other trigger conditions, the first trigger condition being triggered earlier than the other trigger conditions;
    the second synchronization information table comprises a first sub-information table and a second sub-information table; the first sub-information table comprises a plurality of barriers, the first trigger condition corresponding to each barrier, and the identifier of the task to be executed when each barrier satisfies its corresponding first trigger condition; the second sub-information table comprises the plurality of barriers, the other trigger conditions corresponding to each barrier, and the identifiers of the tasks to be executed when each barrier satisfies its corresponding other trigger conditions.
  10. The apparatus of claim 9, wherein the first sub-information table is stored in a cache of the task scheduling apparatus, and the second sub-information table is stored in a memory.
  11. The apparatus of claim 10, wherein, when there are a plurality of the other trigger conditions corresponding to a barrier, the plurality of other trigger conditions corresponding to the barrier in the second sub-information table are arranged sequentially in trigger order;
    and the task graph control circuit is further configured to, when the value of the barrier satisfies the first trigger condition corresponding to the barrier, read the next other trigger condition from the memory according to the trigger order of the other trigger conditions corresponding to the barrier in the second sub-information table, and replace the first trigger condition corresponding to the barrier with that other trigger condition.
  12. A task scheduling method, applied to a task scheduling device, wherein the task scheduling device comprises one or more task graph templates, and each task graph template is used to indicate the dependency relationships among a plurality of tasks included in the task graph template and the processing mode of each task; the method comprises:
    the task scheduling device acquires task information of a first task graph, wherein the task information of the first task graph comprises input data of the first task graph and an identifier of the task graph template corresponding to the first task graph;
    the task scheduling device determines, from the one or more task graph templates and based on the task graph template identifier corresponding to the first task graph, the task graph template corresponding to the first task graph;
    the task scheduling device schedules the first task graph based on the input data of the first task graph and the task graph template corresponding to the first task graph.
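The template-lookup-then-schedule flow of claim 12 can be sketched in software as follows. All names (`TaskGraphTemplate`, `TaskScheduler`, the field layout) are hypothetical illustrations of the claimed steps, not the patent's implementation.

```python
from dataclasses import dataclass

@dataclass
class TaskGraphTemplate:
    template_id: int
    processing_modes: dict   # task id -> processing mode for that task
    dependencies: dict       # task id -> list of parent task ids

class TaskScheduler:
    def __init__(self):
        self.templates = {}  # registered templates, keyed by identifier

    def register_template(self, template: TaskGraphTemplate):
        self.templates[template.template_id] = template

    def schedule(self, template_id: int, input_data):
        # Step 1 (claim 12): determine the template corresponding to the
        # task graph from the identifier carried in its task information.
        template = self.templates[template_id]
        # Step 2: schedule using the template's dependency structure plus
        # the per-instance input data; here we just emit the initially
        # ready tasks (those with no parents).
        ready = [t for t, parents in template.dependencies.items() if not parents]
        return [(t, template.processing_modes[t], input_data) for t in ready]
```

Because the dependency structure lives in the reusable template, repeated launches of the same graph shape only need to supply new input data and a template identifier.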
  13. The method according to claim 12, wherein the method further comprises:
    the task scheduling device acquires the one or more task graph templates; each task graph template comprises a task information table, a first synchronization information table, and a second synchronization information table; the task information table comprises a plurality of task identifiers and the processing mode corresponding to each task identifier; the first synchronization information table comprises a plurality of events and the identifier of one or more barriers corresponding to each event, wherein the plurality of events are in one-to-one correspondence with the plurality of tasks and each event is used to indicate that the corresponding task has been executed; the second synchronization information table comprises a plurality of barriers, one or more trigger conditions corresponding to each barrier, and the identifier of the task to be executed when each barrier satisfies its corresponding trigger condition.
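The three tables claim 13 attributes to a task graph template can be rendered concretely as follows. This is a minimal sketch; the dictionary layout, task ids, and threshold-style trigger conditions are assumptions for illustration.

```python
# One task graph template as three tables (claim 13).
template = {
    # Task information table: task id -> processing mode.
    "task_info": {0: "conv_kernel", 1: "pool_kernel", 2: "fc_kernel"},
    # First synchronization information table: completion event of a task
    # (events map 1:1 to tasks) -> barrier id(s) that event updates.
    "event_to_barriers": {0: [0], 1: [0], 2: [1]},
    # Second synchronization information table: barrier id -> list of
    # (trigger condition on the barrier's value, tasks released).
    "barrier_triggers": {0: [(2, [2])], 1: [(1, [])]},
}

def tasks_released(barrier_id, barrier_value):
    """Tasks to execute once `barrier_value` satisfies a trigger condition."""
    for threshold, tasks in template["barrier_triggers"][barrier_id]:
        if barrier_value >= threshold:
            return tasks
    return []
```

Here barrier 0 releases task 2 only after both of its predecessor tasks (0 and 1) have reported completion, which is exactly the fan-in pattern the two synchronization tables encode.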
  14. The method of claim 13, wherein the task scheduling device comprises a first interface, a task graph control circuit, a task state machine, and a second interface that are coupled;
    the task scheduling device acquiring the task graph templates and the task information of the first task graph comprises: the task graph control circuit acquires the task graph templates and the task information of the first task graph through the first interface;
    the task scheduling device scheduling the first task graph based on the input data of the first task graph and the task graph template corresponding to the first task graph comprises: when determining, based on the second synchronization information table, that the value of a first barrier satisfies a first trigger condition corresponding to a first task identifier, the task state machine obtains, from the task graph control circuit according to the first task identifier, the input data of the first task graph, and the task information table corresponding to the first task graph, the first task corresponding to the first task identifier, and sends the first task to a computing unit through the second interface; wherein the first task identifier is the identifier of the task to be executed when the value of the first barrier satisfies the first trigger condition.
  15. The method of claim 14, wherein, when there are a plurality of first tasks, the computing unit executes the plurality of first tasks in parallel.
  16. The method according to claim 14 or 15, wherein the task scheduling device further comprises an event parsing circuit and a synchronous counting circuit that are coupled, the synchronous counting circuit comprising a plurality of counters, one counter corresponding to each barrier; the task scheduling device scheduling the first task graph based on the input data of the first task graph and the task graph template corresponding to the first task graph further comprises:
    in a case that execution of the first task is completed, the event parsing circuit receives a first event through the first interface, determines the identifier of a second barrier corresponding to the first event based on the first synchronization information table, and notifies the synchronous counting circuit to modify the value of the counter corresponding to the second barrier, wherein the first event is used to indicate that execution of the first task is completed;
    the synchronous counting circuit modifies the value of the counter corresponding to the second barrier.
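The event-to-barrier path of claims 14-16 (a completion event is parsed to a barrier, that barrier's counter is modified, and tasks are dispatched when the counter meets a trigger condition) can be modeled as below. This is a hedged software sketch of the claimed circuits; the class name, the increment-by-one counter policy, and the single-condition trigger table are assumptions.

```python
from collections import defaultdict

class SyncCounters:
    """Software model of the event parsing + synchronous counting circuits."""

    def __init__(self, event_to_barriers, barrier_triggers):
        self.event_to_barriers = event_to_barriers  # event id -> barrier ids
        self.barrier_triggers = barrier_triggers    # barrier id -> (threshold, tasks)
        self.counters = defaultdict(int)            # one counter per barrier

    def on_event(self, event_id):
        # Claim 16: the event parsing circuit maps the completion event to
        # its barrier(s); the counting circuit then modifies each counter.
        released = []
        for b in self.event_to_barriers[event_id]:
            self.counters[b] += 1
            threshold, tasks = self.barrier_triggers[b]
            if self.counters[b] == threshold:
                released.extend(tasks)  # trigger condition met -> dispatch
        return released
```

With two predecessor tasks both mapped to barrier 0 and a threshold of 2, the dependent task is released only by the second completion event, matching the fan-in synchronization the claims describe.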
  17. The method according to any one of claims 14-16, further comprising:
    the task graph control circuit modifies or deletes the task graph template.
  18. The method according to any one of claims 14-17, wherein the task graph template comprises a first task and a second task, and the first task and the second task multiplex the same barrier.
  19. The method of claim 18, wherein the first task and the second task satisfy at least one of:
    the first task and the second task have no parent node; or,
    the first task and the second task have the same parent node; or,
    the first task is a unique parent node of the second task; or,
    the root nodes of the first task and the second task multiplex the same barrier, and the first task is the unique parent node of the second task.
  20. The method according to any one of claims 14-19, wherein one of the barriers corresponds to a plurality of trigger conditions, the plurality of trigger conditions comprising a first trigger condition and other trigger conditions, wherein the first trigger condition is earlier in trigger sequence than the other trigger conditions;
    the second synchronization information table comprises a first sub-information table and a second sub-information table; the first sub-information table comprises the plurality of barriers, the first trigger condition corresponding to each barrier, and the identifier of the task to be executed when each barrier satisfies its corresponding first trigger condition; the second sub-information table comprises the plurality of barriers, the other trigger conditions corresponding to each barrier, and the identifiers of the tasks to be executed when each barrier satisfies the other trigger conditions corresponding to it.
  21. The method of claim 20, wherein the first sub-information table is stored in a cache of the task scheduling device and the second sub-information table is stored in a memory.
  22. The method of claim 21, wherein, in a case that a barrier corresponds to a plurality of other trigger conditions, the plurality of other trigger conditions corresponding to that barrier are arranged in the second sub-information table in trigger order; the method further comprises:
    when the value of the barrier satisfies the first trigger condition corresponding to the barrier, the task graph control circuit reads the next other trigger condition from the memory according to the trigger order of the plurality of other trigger conditions corresponding to the barrier in the second sub-information table, and replaces the first trigger condition corresponding to the barrier with that other trigger condition.
  23. A computing device, comprising a central processing unit (CPU) and the task scheduling apparatus according to any one of claims 1-11, wherein the CPU is configured to send the task graph template to the task scheduling apparatus.
  24. The computing device of claim 23, further comprising an EMS and a computing unit, wherein the EMS is configured to receive a task to be executed from the task scheduling apparatus and to assign the task to be executed to the computing unit, and the computing unit is configured to execute the task to be executed.
CN202180097744.8A 2021-06-16 2021-06-16 Task scheduling method and device Pending CN117222980A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/100415 WO2022261867A1 (en) 2021-06-16 2021-06-16 Task scheduling method and apparatus

Publications (1)

Publication Number Publication Date
CN117222980A 2023-12-12

Family

ID=84526884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180097744.8A Pending CN117222980A (en) 2021-06-16 2021-06-16 Task scheduling method and device

Country Status (2)

Country Link
CN (1) CN117222980A (en)
WO (1) WO2022261867A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140039954A1 (en) * 2012-07-31 2014-02-06 Wj Global Llc Project management with task templification and concentration, and automated provider identification and scheduling
CN104166590A (en) * 2013-05-20 2014-11-26 阿里巴巴集团控股有限公司 Task scheduling method and system
CN110895486B (en) * 2018-09-12 2022-08-12 北京奇虎科技有限公司 Distributed task scheduling system
CN110888721A (en) * 2019-10-15 2020-03-17 平安科技(深圳)有限公司 Task scheduling method and related device
CN111522635B (en) * 2019-12-31 2023-10-20 支付宝实验室(新加坡)有限公司 Computing task processing method, computing task processing device, server and storage medium

Also Published As

Publication number Publication date
WO2022261867A1 (en) 2022-12-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination