CN115904681A - Task scheduling method and device and related products - Google Patents


Info

Publication number: CN115904681A
Application number: CN202111108265.XA
Authority: CN (China)
Original language: Chinese (zh)
Inventor: name withheld at the applicant's request
Current and original assignee: Cambricon Technologies Corp Ltd
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application filed by Cambricon Technologies Corp Ltd; priority to CN202111108265.XA


Abstract

Embodiments of the present application provide a task scheduling method, a task scheduling device, and a related product. The method includes: acquiring a target graph task, where the target graph task includes a plurality of target tasks; and splitting the target graph task into at least one target sub-graph task and sending the target sub-graph task to the device side, so that the device side executes the target sub-graph task. Each target sub-graph task includes at least one target task, and the split target sub-graph tasks have no dependency relationship with one another. Because there is no dependency relationship between the target sub-graph tasks, the device side does not need to maintain them. Moreover, because each split target sub-graph task includes at least one target task, issuing target sub-graph tasks to the device side effectively reduces the number of issuing operations compared with issuing single tasks in sequence, improving issuing efficiency.

Description

Task scheduling method and device and related products
Technical Field
Embodiments of the present application relate to the technical field of data transmission, and in particular to a task scheduling method and device and a related product.
Background
Because a plurality of consecutive tasks may have dependency relationships among them, such tasks are represented as a graph structure in order to capture those dependencies. Each task is a node in the graph structure, and a plurality of consecutive tasks represented in this form is intuitively called a graph task.
When a graph task is processed, its tasks are generally issued by a host (the host side) to a device (the device side), and a processing unit of the device side executes them. At present, when the host side issues tasks to the device side, it generally either issues single tasks one by one in sequence or issues the graph task to the device side as a whole.
Issuing single tasks in sequence requires repeated interaction between the host side and the device side, so the issuing efficiency is low. Issuing the graph task as a whole requires the device side to maintain the dependency relationships among the tasks in the graph task, so that the execution order of each task satisfies those dependencies, which increases the maintenance cost on the device side.
Disclosure of Invention
Embodiments of the present application provide a task scheduling method, a task scheduling device, and a related product, aiming to solve the technical problems in the prior art of low issuing efficiency or increased device-side maintenance cost.
In a first aspect, an embodiment of the present application provides a task scheduling method applied to a host side, the method including:
acquiring a target graph task, where the target graph task includes a plurality of target tasks; and
splitting the target graph task into at least one target sub-graph task, and sending the target sub-graph task to a device side, so that the device side executes the target sub-graph task;
where each target sub-graph task includes at least one target task, and the split target sub-graph tasks have no dependency relationship with one another.
In a second aspect, an embodiment of the present application provides a task scheduling apparatus located in a host side, the apparatus including:
an acquisition module, configured to acquire a target graph task, where the target graph task includes a plurality of target tasks; and
a scheduling module, configured to split the target graph task into at least one target sub-graph task and send the target sub-graph task to a device side, so that the device side executes the target sub-graph task;
where each target sub-graph task includes at least one target task, and the split target sub-graph tasks are independent of each other.
In a third aspect, an embodiment of the present application provides a task scheduling apparatus including at least one processor, a memory, and a transceiver;
the processor, the memory, and the transceiver are interconnected through circuitry;
the memory stores computer-executable instructions, and the transceiver is configured to transmit data to and receive data from the device side;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by at least one processor, implements the method of any one of the first aspect.
According to the task scheduling method, the task scheduling device, and the related product provided above, a target graph task including a plurality of target tasks is acquired; the target graph task is split into at least one target sub-graph task, and the target sub-graph task is sent to the device side so that the device side executes it; each target sub-graph task includes at least one target task, and the split target sub-graph tasks have no dependency relationship with one another. Because the target sub-graph tasks have no dependency relationship, the device side does not need to maintain them. Moreover, because each split target sub-graph task includes at least one target task, issuing target sub-graph tasks to the device side effectively reduces the number of issuing operations compared with issuing single tasks in sequence, improving issuing efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of a task scheduling method according to an embodiment of the present application;
Fig. 2 is a network architecture diagram of a task scheduling method according to an embodiment of the present application;
Fig. 3 is a schematic illustration of a graph task in the present application;
Fig. 4 is a flowchart of a task scheduling method according to another embodiment of the present application;
Fig. 5 is a flowchart of a task scheduling method according to still another embodiment of the present application;
Fig. 6 is a working schematic diagram of a task scheduling method according to yet another embodiment of the present application;
Fig. 7 is a flowchart of a task scheduling method according to a further embodiment of the present application;
Fig. 8 is a working schematic diagram of a task scheduling method according to yet another embodiment of the present application;
Fig. 9 is a schematic structural diagram of a task scheduling device according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a task scheduling device according to another embodiment of the present application;
Fig. 11 is a structural diagram of a board according to an embodiment of the present application;
Fig. 12 is a block diagram of a combined processing apparatus according to an embodiment of the present application;
Fig. 13 is a schematic diagram of the internal structure of a single-core computing device according to an embodiment of the present application;
Fig. 14 is a schematic diagram of the internal architecture of a multi-core computing device according to an embodiment of the present application;
Fig. 15 is a schematic diagram of the internal structure of a processor core according to an embodiment of the present application.
Specific embodiments of the present disclosure have been shown by way of example in the drawings and will be described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
For a clear understanding of the technical solutions of the present application, a detailed description of the prior art solutions will be given first.
When a heterogeneous system (which may include a host side and a device side) is used to process a graph task, the tasks are generally issued to the device side through the host side, and a processing unit of the device side executes them. At present, when the host side issues tasks to the device side, there are two ways:
In the first mode, the host side issues single tasks to the device side in sequence. Every issued task requires an interaction between the host side and the device side, which brings a large overhead, so the issuing efficiency is low.
In the second mode, the host side sends the graph task to the device side as a whole in one pass. However, because the graph task includes multiple tasks with dependency relationships, in order to ensure that the execution order of each task satisfies those dependencies, the device side needs to add software or hardware dependency-handling logic to maintain the dependency relationships among the tasks in the graph task, which increases the development and maintenance costs of the device-side software and hardware.
Facing these technical problems in the prior art, the inventors found through creative research that not all tasks in a graph task have dependency relationships with one another. Therefore, after the target graph task to be issued is acquired, it can be split into at least one target sub-graph task; if multiple target sub-graph tasks are obtained, the target sub-graph tasks split off at each round have no dependency relationship with one another. After each split, the host side can issue the split target sub-graph task to the device side so that the device side executes it. Because the target sub-graph tasks have no dependency relationship, the device side does not need to maintain them after they are issued. Moreover, because each split target sub-graph task includes at least one target task, issuing target sub-graph tasks to the device side effectively reduces the number of issuing operations compared with issuing single tasks in sequence, improving issuing efficiency.
The technical solutions of the present application, and how they solve the above technical problems, are described below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a task scheduling method according to an embodiment of the present application; the method may be applied to the heterogeneous computing system shown in fig. 2. As shown in fig. 2, a host side 21 and a device side 22 form a heterogeneous computing system 2. The host side 21 includes a processor 211 (e.g., a CPU) and a host memory 212. The device side 22 communicates with the host side 21; in this embodiment, the device side may be a chip or a board and may include an artificial intelligence processor having a plurality of arithmetic units.
As shown in fig. 1, the execution subject of this embodiment is the host side. The task scheduling method provided by this embodiment includes the following steps:
step 101, a target graph task is obtained, wherein the target graph task comprises a plurality of target tasks.
The target graph task is a graph task that needs to be issued to the device side for execution. The target graph task includes a plurality of target tasks, at least two of which have a dependency relationship. Illustratively, as shown in fig. 3, the target graph task includes five target tasks: A, B, C, D, and E.
B, C, and D each depend on A, and E depends on B, C, and D.
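As an illustrative sketch only (not part of the claimed method), the dependency structure of fig. 3 can be modeled on the host side as a small dictionary of successors plus per-node in-degree counts; the names `deps` and `in_degrees` below are hypothetical:

```python
# Hypothetical model of the fig. 3 target graph task: an edge X -> Y
# means that Y depends on X.
deps = {
    "A": ["B", "C", "D"],  # B, C, D depend on A
    "B": ["E"],            # E depends on B, C, and D
    "C": ["E"],
    "D": ["E"],
    "E": [],
}

def in_degrees(successors):
    """Count, for every task, how many tasks it still depends on."""
    indeg = {task: 0 for task in successors}
    for dependents in successors.values():
        for t in dependents:
            indeg[t] += 1
    return indeg

print(in_degrees(deps))
```

Running this yields in-degrees A: 0, B, C, D: 1 each, and E: 3, matching the node states that the splitting strategies described below operate on.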
Step 102, splitting the target graph task into at least one target sub-graph task, and sending the target sub-graph task to the device side, so that the device side executes the target sub-graph task.
Each target sub-graph task includes at least one target task, and the split target sub-graph tasks have no dependency relationship with one another.
Specifically, in this embodiment, the target graph task may be split multiple times: at least one target sub-graph task is split off each time and sent to the device side. Because the split target sub-graph tasks have no dependency relationship, once a target sub-graph task is sent to the device side, the device side can execute it directly without maintaining it. If the target sub-graph task includes a single target task, that task can be executed directly. If the target sub-graph task includes multiple target tasks, the device side determines whether a dependency relationship exists among them. If no dependency exists, different processing units can be allocated to the target tasks so that they are processed in parallel, improving execution efficiency. If a dependency does exist, at least one processing unit can be allocated to each target task according to a first-in first-out policy so that the execution order of the target tasks satisfies the dependency. For example, the dependent target tasks may be assigned to the same processing unit; alternatively, they may be allocated to the same hardware queue, whose tasks are executed in an order-preserving first-in first-out manner, so that the target tasks in the queue are dispatched in first-in first-out order to the same or different processing units for execution.
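The allocation logic described above can be sketched as follows. This is a simplified illustration, not the patent's implementation: the function `dispatch`, its parameters, and the queue numbering are all assumptions.

```python
# Hypothetical device-side allocation for a received target sub-graph
# task: dependent tasks share one FIFO hardware queue so their order
# is preserved; independent tasks are spread over processing units.
def dispatch(tasks, has_dependency, num_units):
    """Map a queue/processing-unit id to the list of tasks it runs."""
    if len(tasks) == 1 or has_dependency:
        # One FIFO queue: tasks leave in the order they entered, so
        # the execution order satisfies the dependency relationship.
        return {0: list(tasks)}
    # No dependency: one task per unit, executed in parallel.
    return {i % num_units: [t] for i, t in enumerate(tasks)}

print(dispatch(["B", "C", "D"], has_dependency=False, num_units=4))
print(dispatch(["A", "E"], has_dependency=True, num_units=4))
```

Here the three independent tasks B, C, and D land on three different units, while the dependent pair shares queue 0 and is executed in order.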
According to the task scheduling method provided by this embodiment, a target graph task including a plurality of target tasks is acquired; the target graph task is split into at least one target sub-graph task, and the target sub-graph task is sent to the device side so that the device side executes it; each target sub-graph task includes at least one target task, and the split target sub-graph tasks are independent of each other. Because the target sub-graph tasks have no dependency relationship, the device side does not need to maintain them. Moreover, because each split target sub-graph task includes at least one target task, issuing target sub-graph tasks to the device side effectively reduces the number of issuing operations compared with issuing single tasks in sequence, improving issuing efficiency.
Fig. 4 is a flowchart of a task scheduling method according to another embodiment of the present application. As shown in fig. 4, the task scheduling method of this embodiment further refines step 102 of the embodiment shown in fig. 1 and includes the following steps:
step 201, obtaining a target graph task, where the target graph task includes multiple target tasks.
In this embodiment, the implementation manner of step 201 is similar to that of step 101 in the first embodiment, and is not described in detail here.
Step 202, performing the following steps in a loop until all target sub-graph tasks have been sent to the device side: splitting a target sub-graph task from the current target graph task and sending it to the device side; and, if a target sub-graph task execution completion response sent by the device side is received, updating the node state corresponding to each target task in the current target graph task.
It should be noted that, in this embodiment, step 202 is a further refinement of step 102.
First, the target graph task is split into at least one target sub-graph task according to a preset splitting strategy. The split target sub-graph task has no dependency relationship with the remaining target graph task. The split target sub-graph task is sent to the device side, so that the device side executes it. After finishing executing the target sub-graph task, the device side sends a target sub-graph task execution completion response to the host side. According to that response, the host side updates the node states corresponding to the target tasks in the current target graph task. Because at least one target sub-graph task has been split off, the number of target tasks in the current target graph task is smaller than before the split. After the node states have been updated, the first loop iteration ends. The second iteration then proceeds in the same way: at least one further target sub-graph task is split off according to the preset splitting strategy and sent to the device side for execution; after the device side finishes executing it and returns an execution completion response, the host side again updates the node states of the target tasks in the current target graph task, ending the second iteration. By analogy, after the N-th iteration, if only the last target sub-graph task remains in the target graph task, it is sent to the device side and the loop stops.
When the node states corresponding to the target tasks in the current target graph task are updated, the in-degree in the node state corresponding to each target task can be updated.
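This in-degree update on each execution completion response can be sketched as follows (an illustrative sketch; the function and variable names are hypothetical, not from the patent):

```python
# Hypothetical host-side handling of a completion response: for every
# finished target task, decrement the in-degree of each task that
# depended on it, then drop the finished task from the node states.
def update_node_states(successors, indeg, finished):
    for task in finished:
        for dependent in successors[task]:
            indeg[dependent] -= 1
        del indeg[task]
    return indeg

successors = {"A": ["B", "C", "D"], "B": ["E"], "C": ["E"], "D": ["E"], "E": []}
indeg = {"A": 0, "B": 1, "C": 1, "D": 1, "E": 3}
# After the first sub-graph task (containing only A) completes:
print(update_node_states(successors, indeg, ["A"]))
```

After A completes, B, C, and D have an in-degree of zero and become candidates for the next sub-graph task, while E still waits on three predecessors.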
The preset splitting strategy may be a breadth-first search strategy, a depth-first search strategy, or another splitting strategy, which is not limited in this embodiment.
In this embodiment, because the next target sub-graph task is split only after the target sub-graph task previously sent to the device side has finished executing, each target sub-graph task sent to the device side can be executed independently without depending on other unexecuted target sub-graph tasks, which shows that the split target sub-graph tasks have no dependency relationship with one another.
In the task scheduling method provided by this embodiment, splitting the target graph task into at least one target sub-graph task and sending it to the device side for execution specifically includes performing the following steps in a loop until all target sub-graph tasks have been sent to the device side: splitting a target sub-graph task from the current target graph task and sending it to the device side; and, upon receiving a target sub-graph task execution completion response from the device side, updating the node state corresponding to each target task in the current target graph task. Through this loop, each target sub-graph task sent to the device side can be executed independently without depending on other unexecuted target sub-graph tasks, so the target sub-graph tasks are executed in the order in which they are sent and the device side does not need to maintain them.
Fig. 5 is a flowchart of a task scheduling method according to still another embodiment of the present application. As shown in fig. 5, in the task scheduling method of this embodiment, in the above loop step, the current target graph task may be split into target sub-graph tasks according to a breadth-first search strategy, which includes the following steps:
step 2021, at least one target task with an in-degree of zero in the node states in the current target graph task is determined.
Step 2022, determining at least one target task with an in-degree of zero as a target sub-graph task.
Specifically, in this embodiment, when the current target graph task is split into target sub-graph tasks according to the breadth-first search strategy, at least one target task whose node-state in-degree is zero in the current target graph task is first determined; these zero-in-degree target tasks are combined into a target sub-graph task and issued to the device side. The remaining target graph task then contains no target task with a node-state in-degree of zero. After the device side finishes executing the target sub-graph task, it sends a target sub-graph task execution completion response to the host side. According to that response, the host side updates the node states corresponding to the target tasks in the current target graph task, after which the in-degree in the node state of at least one target task becomes zero. This completes the first iteration. In the second iteration, at least one target task whose node-state in-degree is zero in the current target graph task is again determined, combined into a target sub-graph task, and issued to the device side. After the device side finishes executing it and returns an execution completion response, if the host side determines that target tasks still remain in the current target graph task, it continues to update their node states, after which the in-degree of at least one target task becomes zero, completing the second iteration. By analogy, after the N-th iteration, if only the last target sub-graph task remains in the target graph task, it is sent to the device side and the loop stops.
For clarity, the loop operation is described below by way of example with reference to fig. 6.
As shown in fig. 6, the only target task whose node-state in-degree is zero in the target graph task is A, so A forms target sub-graph task a1 and is issued to the device side. After finishing executing target sub-graph task a1, the device side sends a target sub-graph task execution completion response to the host side. According to that response, the host side updates the node states corresponding to B, C, D, and E in the current target graph task: the in-degrees in the node states of B, C, and D are updated from 1 to 0, while the in-degree in the node state of E remains 3. This completes the first iteration. In the second iteration, the target tasks with zero in-degree are determined to be B, C, and D; they are combined into target sub-graph task a2, which is issued to the device side. After the device side finishes executing target sub-graph task a2 and returns an execution completion response, the host side determines that target task E remains in the current target graph task, updates the in-degree in E's node state from 3 to 0, and forms E into target sub-graph task a3, which is issued to the device side. At this point, all target sub-graph tasks in the target graph task have been sent to the device side.
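Under the assumption that the target graph task is acyclic, the breadth-first loop illustrated by fig. 6 can be sketched as below. This is an illustrative sketch, not the patent's implementation; the function name and the simulation of completion responses are assumptions.

```python
# Hypothetical breadth-first split: each round, the target tasks whose
# node-state in-degree is zero form one target sub-graph task; after
# the (simulated) completion response, successor in-degrees decrease.
def bfs_split(successors, indeg):
    indeg = dict(indeg)            # work on a copy of the node states
    remaining = set(successors)
    subgraphs = []
    while remaining:               # assumes the graph is acyclic
        ready = sorted(t for t in remaining if indeg[t] == 0)
        subgraphs.append(ready)    # one sub-graph task per round
        for task in ready:
            remaining.discard(task)
            for dependent in successors[task]:
                indeg[dependent] -= 1
    return subgraphs

graph = {"A": ["B", "C", "D"], "B": ["E"], "C": ["E"], "D": ["E"], "E": []}
print(bfs_split(graph, {"A": 0, "B": 1, "C": 1, "D": 1, "E": 3}))
```

For the fig. 6 graph this produces the three rounds [A], [B, C, D], [E], matching sub-graph tasks a1, a2, and a3.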
According to the task scheduling method provided by this embodiment, when the current target graph task is split into target sub-graph tasks, the splitting is performed according to a breadth-first search strategy: at least one target task whose node-state in-degree is zero in the current target graph task is determined, and the at least one zero-in-degree target task is determined as a target sub-graph task. This provides a first way of splitting a target graph task into target sub-graph tasks, which splits the target graph task accurately.
Fig. 7 is a flowchart of a task scheduling method according to a further embodiment of the present application. As shown in fig. 7, in the task scheduling method of this embodiment, in the above loop step, the current target graph task may be split into target sub-graph tasks according to a depth-first search strategy, which includes the following steps:
step 202a, splitting a target task with a node state of zero in the current target graph task.
In this embodiment, when splitting the current target graph task into target sub-graph tasks according to a depth-first search strategy, the number of entries in the node states corresponding to each target task in the current target graph task is first determined, and the target tasks with zero entries in the node states are split.
Step 202b, determining whether the zero-in-degree target task is followed by a corresponding single-chain task; if so, executing step 202c; otherwise, executing step 202d.
As an optional implementation manner, in this embodiment, step 202b includes the following steps:
Step 202b1, determining whether the in-degree and out-degree of the node state corresponding to each intermediate target task associated with the zero-in-degree target task are both the preset value; if so, executing step 202b2; otherwise, executing step 202b3.
Step 202b2, it is determined that a corresponding single-chain task follows.
Step 202b3, it is determined that no corresponding single-chain task is followed.
Wherein the predetermined value may be 1.
In this embodiment, after the zero-in-degree target task is split off, the search continues along the depth direction of the current target graph task with the zero-in-degree target task as the starting point. If the zero-in-degree target task has associated intermediate target tasks, where an intermediate target task is a target task that is located at an intermediate position and depends on the zero-in-degree target task, the in-degree and out-degree of the node state corresponding to each associated intermediate target task are determined. If the in-degree and out-degree of the node state corresponding to the at least one associated intermediate target task are both 1, it is determined that the zero-in-degree target task is followed by a corresponding single-chain task. If they are not both 1, it is determined that the zero-in-degree target task is not followed by a corresponding single-chain task.
As an alternative embodiment, if the zero-in-degree target task has no associated intermediate target task, step 202b includes the following steps:
Step 202b4, determining whether the in-degree of the node state corresponding to the last target task node associated with the zero-in-degree target task is the preset value; if so, executing step 202b5; otherwise, executing step 202b6.
Step 202b5, determining that a corresponding single-chain task follows.
Step 202b6, it is determined that no corresponding single-chain task is followed.
Specifically, in this embodiment, since step 202 is executed in a loop, after the node states are updated, an intermediate target task at an intermediate position in the original target graph task may have an in-degree of zero while having no associated intermediate target task. As in fig. 3, after the in-degrees in the node states of C and D are updated to zero, C and D have no associated intermediate target tasks and are associated only with the last target task. In this case, it is judged whether the in-degree of the node state corresponding to the last target task node associated with the zero-in-degree target task is 1. If it is 1, it is determined that the zero-in-degree target task is followed by a corresponding single-chain task. If it is not 1, it is determined that the zero-in-degree target task is not followed by a corresponding single-chain task.
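The judgments of steps 202b1 to 202b6 can be sketched in Python as follows (an illustrative sketch only, not the patent's implementation; the successor lists `succ` and the node-state maps `indeg`/`outdeg` are assumed representations, and the preset value is fixed at 1):

```python
def followed_by_single_chain(task, succ, indeg, outdeg, preset=1):
    """Judge whether a zero-in-degree target task is followed by a
    corresponding single-chain task (steps 202b1-202b6).

    succ maps each task to its direct successors; indeg/outdeg hold the
    current node states. A successor that itself has successors is an
    intermediate target task; one without is the last target task.
    """
    intermediates = [v for v in succ[task] if succ[v]]
    if intermediates:
        # Step 202b1: at least one associated intermediate target task
        # whose in-degree and out-degree are both the preset value.
        return any(indeg[v] == preset and outdeg[v] == preset
                   for v in intermediates)
    # Step 202b4: no associated intermediate target task, so check the
    # in-degree of the associated last target task instead.
    return any(indeg[v] == preset for v in succ[task])
```

Using the graph of fig. 8 (A precedes B, C, D, which all precede E), task A is followed by a single-chain task, while after the first cycle task C is not, because E's in-degree is then 2.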
And step 202c, determining the corresponding single-chain task, and determining the single-chain task as a target subgraph task.
As an alternative implementation manner, in this embodiment, in step 202c, the determining the corresponding single-chain task includes the following scheme:
determining, together as a single-chain task, the target task with zero in-degree, the associated intermediate target tasks whose node states have both an in-degree and an out-degree equal to the preset value, and the last target task whose node state has an in-degree equal to the preset value; or determining, as a single-chain task, the target task with zero in-degree together with either the associated intermediate target tasks whose node states have both an in-degree and an out-degree equal to the preset value or the last target task whose node state has an in-degree equal to the preset value.
Specifically, in this embodiment, if it is determined that the zero-in-degree target task is followed by a corresponding single-chain task, a search is performed along the depth direction of the current target graph task with the zero-in-degree target task as a starting point. If the zero-in-degree target task has both associated intermediate target tasks whose node states have an in-degree and an out-degree of 1 and a last target task whose node state has an in-degree of 1, then the zero-in-degree target task, the associated intermediate target tasks whose node states have an in-degree and an out-degree of 1, and the last target task whose node state has an in-degree of 1 are determined together as the single-chain task. If the zero-in-degree target task has only associated intermediate target tasks whose node states have an in-degree and an out-degree of 1, but no last target task whose node state has an in-degree of 1, then the zero-in-degree target task and those intermediate target tasks are determined as the single-chain task. If the zero-in-degree target task has no associated intermediate target task whose node state has an in-degree and an out-degree of 1, but has a last target task whose node state has an in-degree of 1, then the zero-in-degree target task and that last target task are determined as the single-chain task.
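The search described above can be sketched as follows (illustrative Python only, not the patent's implementation; `succ`, `indeg` and `outdeg` are assumed representations of the graph and node states, with the preset value taken as 1):

```python
def collect_single_chain(start, succ, indeg, outdeg):
    """Collect the single-chain task that begins at a zero-in-degree
    target task, walking along the depth direction of the graph."""
    chain = [start]
    cur = start
    while True:
        # The next link is an intermediate target task whose in-degree
        # and out-degree are both 1, or a last target task (no further
        # successors) whose in-degree is 1.
        nxt = next((v for v in succ[cur]
                    if indeg[v] == 1 and (outdeg[v] == 1 or not succ[v])),
                   None)
        if nxt is None:
            break
        chain.append(nxt)
        if not succ[nxt]:        # reached the last target task
            break
        cur = nxt
    return chain
```

On the fig. 8 graph with the initial node states, the chain starting at A is A → B; after the node states are updated so that E's in-degree is 1, the chain starting at D is D → E.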
Step 202d, the target task with the zero in-degree is determined as a target sub-graph task.
In this embodiment, if it is determined that the target task with the zero in-degree is not followed by the corresponding single-chain task, the target task with the zero in-degree is determined as the target sub-graph task.
For clarity, the loop operation of this embodiment is described below by way of example with reference to fig. 8.
As shown in fig. 8, the only target task whose node state has an in-degree of zero among the target graph tasks is A, and it is judged whether target task A is followed by a corresponding single-chain task. Specifically, the intermediate target tasks associated with target task A are B, C and D. It is determined that the in-degrees and out-degrees of the node states corresponding to target tasks B, C and D are all 1, indicating that target task A is followed by a single-chain task. It is then judged whether the in-degree of the node state corresponding to the last target task E associated with target task A is 1; since the in-degree of the node state corresponding to target task E is 3, not 1, A → B is determined as the single-chain task corresponding to target task A. A → B is issued to the device side as target sub-graph task b1. After executing target sub-graph task b1, the device side sends a target sub-graph task execution completion response to the host side. According to this response, the host side updates the node states corresponding to C, D and E in the current target graph task: the in-degrees in the node states corresponding to C and D are updated from 1 to 0, and the in-degree in the node state corresponding to E is updated from 3 to 2. This completes the first cycle. In the second cycle, the target tasks whose node states have an in-degree of zero in the current target graph task are determined to be C and D, and it is judged whether target task C is followed by a corresponding single-chain task. Specifically, the last target task associated with target task C is E. Since the in-degree of the node state corresponding to target task E is 2, not 1, it is determined that target task C is not followed by a single-chain task, and target task C is determined as target sub-graph task b2.
Target sub-graph task b2 is issued to the device side. After executing target sub-graph task b2, the device side sends a target sub-graph task execution completion response to the host side. According to this response, the host side updates the node states corresponding to D and E in the current target graph task: the in-degree in the node state corresponding to D is updated from 1 to 0, and the in-degree in the node state corresponding to E is updated from 2 to 1. The target task whose node state has an in-degree of zero in the current target graph task is then determined to be D, and it is judged whether target task D is followed by a corresponding single-chain task; the last target task associated with target task D is E. Since the in-degree of the node state corresponding to target task E is 1, D → E forms a single-chain task, and D → E is issued to the device side as target sub-graph task b3.
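The split-and-update loop walked through above can be sketched end to end as follows (an illustrative Python sketch, not the patent's implementation; the successor-list representation, the tie-breaking order among zero-in-degree tasks, and the synchronous simulation of the device-side execution completion response are all assumptions):

```python
from collections import defaultdict

def depth_first_split(nodes, edges):
    """Split a target graph task into target sub-graph tasks with the
    depth-first strategy: take a zero-in-degree target task, extend it
    into a single-chain task where possible, then update node states."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    outdeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        outdeg[u] += 1

    remaining = set(nodes)
    subgraph_tasks = []
    while remaining:
        # Pick a target task whose node state has an in-degree of zero.
        start = min(n for n in remaining if indeg[n] == 0)
        chain, cur = [start], start
        while True:
            nxt = next((v for v in sorted(succ[cur]) if v in remaining
                        and indeg[v] == 1
                        and (outdeg[v] == 1 or not succ[v])), None)
            if nxt is None:
                break
            chain.append(nxt)
            if not succ[nxt]:          # the last target task ends the chain
                break
            cur = nxt
        subgraph_tasks.append(chain)
        # Simulate the execution completion response: update node states.
        for n in chain:
            remaining.discard(n)
        for n in chain:
            for v in succ[n]:
                if v in remaining:
                    indeg[v] -= 1
    return subgraph_tasks

# The graph of fig. 8: A precedes B, C, D, which all precede E.
tasks = depth_first_split(
    "ABCDE",
    [("A", "B"), ("A", "C"), ("A", "D"),
     ("B", "E"), ("C", "E"), ("D", "E")])
print(tasks)  # b1 = A→B, b2 = C, b3 = D→E
```

On the fig. 8 graph this reproduces the walkthrough: the first cycle yields A → B, the second yields C, and the third yields D → E.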
In another embodiment, as shown in fig. 8, the only target task whose node state has an in-degree of zero among the target graph tasks is A, and it is judged whether target task A is followed by a corresponding single-chain task. Specifically, the intermediate target tasks associated with target task A are B, C and D. In the first cycle, since the in-degrees and out-degrees of the node states corresponding to target tasks B, C and D are all 1, target task A is followed by a single-chain task. It is then judged whether the in-degree of the node state corresponding to the last target task E associated with target task A is 1; since the in-degree of the node state corresponding to target task E is 3, not 1, A → B is determined as the single-chain task corresponding to target task A. A → B is issued to the device side as target sub-graph task b1. After executing target sub-graph task b1, the device side sends a target sub-graph task execution completion response to the host side. According to this response, the host side updates the node states corresponding to C, D and E in the current target graph task: the in-degrees in the node states corresponding to C and D are updated from 1 to 0, and the in-degree in the node state corresponding to E is updated from 3 to 2. This completes the first cycle, which is consistent with the above embodiment. In the second cycle, the target tasks whose node states have an in-degree of zero in the current target graph task are determined to be C and D, and whether target tasks C and D are followed by corresponding single-chain tasks is judged respectively.
Specifically, target tasks C and D are associated only with the last target task E, and the in-degree of the node state corresponding to target task E is 2, not 1, so it is determined that target tasks C and D are not followed by single-chain tasks; target tasks C and D are therefore determined as target sub-graph tasks b2 and b3, respectively. Target sub-graph tasks b2 and b3 may then be issued to the device side in sequence, and the device side may execute them in parallel. After finishing target sub-graph tasks b2 and b3, the device side sends target sub-graph task execution completion responses to the host side. According to these responses, the host side updates the node state corresponding to E in the current target graph task: the in-degree in the node state corresponding to E is updated from 2 to 0. Since E is the last node, node E is determined as target sub-graph task b4 in the third cycle and issued to the device side.
According to the task scheduling method provided by this embodiment, when the current target graph task is split into target sub-graph tasks, it is split according to a depth-first search strategy: a target task whose node state has an in-degree of zero is split from the current target graph task; whether the zero-in-degree target task is followed by a corresponding single-chain task is judged; if a corresponding single-chain task follows, the single-chain task is determined and taken as a target sub-graph task; if no corresponding single-chain task follows, the zero-in-degree target task itself is determined as a target sub-graph task. This provides a second manner of splitting the target graph task into target sub-graph tasks, and enables the target graph task to be split accurately.
Fig. 9 is a schematic structural diagram of a task scheduling device according to an embodiment of the present application. As shown in fig. 9, the task scheduling device of this embodiment is located on the host side. The task scheduling device 90 provided by this embodiment includes: an obtaining module 91 and a scheduling module 92.
The obtaining module 91 is configured to obtain a target graph task, where the target graph task includes multiple target tasks. The scheduling module 92 is configured to split the target graph task into at least one target sub-graph task, and send the target sub-graph task to the device side, so that the device side executes the target sub-graph task; each target sub-graph task comprises at least one target task, and the split target sub-graph tasks have no dependency relationship.
The task scheduling apparatus provided in this embodiment may execute the technical solution of the method embodiment shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again.
Optionally, the scheduling module 92 is specifically configured to circularly execute the following steps until all target sub-graph tasks are sent to the device side:
and splitting the current target graph task into target sub-graph tasks, sending the target sub-graph tasks to the equipment end, and updating the node state corresponding to each target task in the current target graph task if a target sub-graph task execution completion response sent by the equipment end is received.
Optionally, the scheduling module 92 is specifically configured to split the current target graph task into the target sub-graph task according to a breadth-first search strategy when splitting the current target graph task into the target sub-graph task.
Optionally, when splitting the current target graph task into the target sub-graph tasks according to the breadth-first search policy, the scheduling module 92 is specifically configured to:
determining at least one target task with zero in-degree in node states in the current target graph task; and determining at least one target task with zero in degree as a target sub-graph task.
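This breadth-first strategy can be sketched as follows (illustrative Python only, not the patent's implementation; the edge-list input and the synchronous simulation of the execution completion response are assumptions):

```python
def breadth_first_split(nodes, edges):
    """Split a target graph task breadth-first: in each cycle, every
    target task whose node state has an in-degree of zero is taken
    together as one target sub-graph task, and the node states are
    updated after the simulated execution completion response."""
    succ = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    remaining = set(nodes)
    subgraph_tasks = []
    while remaining:
        layer = sorted(n for n in remaining if indeg[n] == 0)
        subgraph_tasks.append(layer)
        for n in layer:
            remaining.remove(n)
            for v in succ[n]:
                indeg[v] -= 1   # update node states of successors
    return subgraph_tasks
```

On the fig. 8 graph (A precedes B, C, D, which all precede E), this yields the layers [A], [B, C, D], [E], each issued to the device side as one target sub-graph task.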
Optionally, the scheduling module 92 is specifically configured to split the current target graph task into the target sub-graph task according to a depth-first search policy when the current target graph task is split into the target sub-graph task.
Optionally, when splitting the current target graph task into the target sub-graph task according to the depth-first search policy, the scheduling module 92 is specifically configured to:
splitting a target task whose node state has an in-degree of zero from the current target graph task; judging whether the zero-in-degree target task is followed by a corresponding single-chain task; if a corresponding single-chain task follows, determining the single-chain task and determining it as a target sub-graph task; and if no corresponding single-chain task follows, determining the zero-in-degree target task as a target sub-graph task.
Optionally, when judging whether the target task with zero in-degree is followed by a corresponding single-chain task, the scheduling module 92 is specifically configured to: judge whether the in-degrees and out-degrees of the node states corresponding to the intermediate target tasks associated with the zero-in-degree target task are all a preset value; if they are all the preset value, determine that a corresponding single-chain task follows; and if they are not all the preset value, determine that no corresponding single-chain task follows.
Optionally, if the zero-in-degree target task has no associated intermediate target task, when judging whether the zero-in-degree target task is followed by a corresponding single-chain task, the scheduling module 92 is specifically configured to:
judge whether the in-degree of the node state corresponding to the last target task node associated with the zero-in-degree target task is a preset value; if the in-degree of the node state corresponding to the associated last target task node is the preset value, determine that a corresponding single-chain task follows; and if it is not the preset value, determine that no corresponding single-chain task follows.
Optionally, when determining the corresponding single-chain task and determining the single-chain task as a target sub-graph task, the scheduling module 92 is specifically configured to:
if it is determined that a corresponding single-chain task follows, determine, together as the single-chain task, the zero-in-degree target task, the associated intermediate target tasks whose node states have both an in-degree and an out-degree equal to the preset value, and the last target task whose node state has an in-degree equal to the preset value; or determine, as the single-chain task, the zero-in-degree target task together with either the associated intermediate target tasks whose node states have both an in-degree and an out-degree equal to the preset value or the last target task whose node state has an in-degree equal to the preset value.
The task scheduling apparatus provided in this embodiment may execute the technical solutions of the method embodiments shown in fig. 4, fig. 5, and fig. 7, and the implementation principles and technical effects thereof are similar, and are not described herein again.
Fig. 10 is a schematic structural diagram of a task scheduling apparatus according to another embodiment of the present application, and as shown in fig. 10, the task scheduling apparatus according to the embodiment of the present application includes: at least one processor 1001, memory 1002, and transceiver 1003;
the processor 1001, the memory 1002, and the transceiver 1003 are electrically interconnected.
The memory 1002 stores computer-executable instructions; the transceiver 1003 is configured to transmit data to and receive data from the device side;
the at least one processor 1001 executes computer-executable instructions stored by the memory 1002 to cause the at least one processor to perform a task scheduling method as provided by any one of the embodiments of fig. 1, 4, 5, and 7.
In a possible implementation manner, a computer-readable storage medium is further disclosed, in which a computer program is stored, and when the computer program is executed by at least one processor, the computer program implements the task scheduling method provided in any one of the embodiments of fig. 1, fig. 4, fig. 5, and fig. 7.
In a possible implementation manner, a board card is also disclosed, and the board card may be a board card on the device side. Fig. 11 shows a schematic structural diagram of a board card 110 according to an embodiment of the present application. As shown in fig. 11, the board card 110 includes a chip 1101, which is a System-on-Chip (SoC), also called a system on chip, integrated with one or more combined processing devices. A combined processing device is an artificial intelligence computing unit that supports various deep learning and machine learning algorithms and meets the intelligent processing requirements of complex scenarios in fields such as computer vision, speech, natural language processing, and data mining. In particular, deep learning technology is widely applied in the cloud intelligence field; one notable characteristic of cloud intelligence applications is the large amount of input data, which places high requirements on the storage and computing capabilities of the platform.
The chip 1101 is connected to an external device 1103 through an external interface device 1102. The external device 1103 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card, or a wifi interface. The data to be processed may be transferred to the chip 1101 by the external device 1103 through the external interface device 1102. The results of the calculations of chip 1101 may be communicated back to external device 1103 via external interface device 1102. The external interface device 1102 may have different interface forms, such as a PCIe interface, according to different application scenarios.
The board card 110 also includes a storage device 1104 for storing data, which includes one or more storage units 1105. The storage device 1104 is connected to the control device 1106 and the chip 1101 via a bus for data transfer. The control device 1106 in the board card 110 is configured to regulate the state of the chip 1101. To this end, in an application scenario, the control device 1106 may include a single-chip microcomputer (MCU).
In a possible implementation manner, a combined processing device is further provided, and fig. 12 is a structural diagram illustrating the combined processing device in the chip 1101 of this embodiment. As shown in fig. 12, the combination processing means 120 includes a computing means 1201, an interface means 1202, a processing means 1203 and a storage means 1204.
The computing device 1201 is configured to perform user-specified operations and is mainly implemented as a single-core or multi-core intelligent processor for deep learning or machine learning computation; it may interact with the processing device 1203 through the interface device 1202 to jointly complete the user-specified operations.
The interface means 1202 is used for transferring data and control instructions between the computing means 1201 and the processing means 1203. For example, the computing device 1201 may obtain input data from the processing device 1203 via the interface device 1202, and write the input data to a storage device on the computing device 1201 chip. Further, the computing device 1201 may obtain control instructions from the processing device 1203 via the interface device 1202, and write the control instructions into a control cache on the computing device 1201 chip. Alternatively or optionally, the interface means 1202 may also read data in storage means of the computing means 1201 and transmit them to the processing means 1203.
The processing device 1203, as a general-purpose processing device, performs basic control including, but not limited to, data transfer and turning the computing device 1201 on and/or off. Depending on the implementation, the processing device 1203 may be one or more types of central processing unit (CPU), graphics processing unit (GPU), or other general-purpose and/or special-purpose processor, including but not limited to a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and the number thereof may be determined according to actual needs. As mentioned above, the computing device 1201 of the present application considered alone may be regarded as having a single-core structure or a homogeneous multi-core structure; however, when the computing device 1201 and the processing device 1203 are considered together, they form a heterogeneous multi-core structure.
The storage device 1204 is used to store data to be processed and may be a DRAM, i.e., a DDR memory, typically 16 GB or larger in size, for storing data of the computing device 1201 and/or the processing device 1203.
Fig. 13 is a schematic diagram illustrating an internal structure of the computing apparatus 1201 as a single core. The single-core computing device 1301 is used for processing input data such as computer vision, voice, natural language, data mining, and the like, and the single-core computing device 1301 includes three modules: a control module 131, an operation module 132 and a storage module 133.
The control module 131 is used for coordinating and controlling operations of the operation module 132 and the storage module 133 to complete a deep learning task, and includes an Instruction Fetch Unit (IFU) 1311 and an Instruction Decode Unit (IDU) 1312. The instruction fetch unit 1311 is configured to fetch an instruction from the processing device 1203, and the instruction decoding unit 1312 decodes the fetched instruction and sends the decoded result to the operation module 132 and the storage module 133 as control information.
The operation module 132 includes a vector operation unit 1321 and a matrix operation unit 1322. The vector operation unit 1321 is used for performing vector operations, and can support complex operations such as vector multiplication, addition, nonlinear transformation, and the like; the matrix operation unit 1322 is responsible for the core calculation of the deep learning algorithm, i.e., matrix multiplication and convolution.
The storage module 133 is used to store or transfer related data, and includes a neuron storage unit (neuron RAM, NRAM) 1331, a parameter storage unit (weight RAM, WRAM) 1332, and a direct memory access (DMA) unit 1333. The NRAM 1331 is used to store input neurons, output neurons, and intermediate results after computation; the WRAM 1332 is used to store the convolution kernels, i.e., weights, of the deep learning network; and the DMA 1333 is connected to the DRAM 1204 via the bus 134 and is responsible for data transfer between the single-core computing device 1301 and the DRAM 1204.
Fig. 14 shows a schematic diagram of the internal structure of the computing device 1201 as a multi-core device. The multi-core computing device 1401 adopts a hierarchical design: as a system on chip, it includes at least one cluster, and each cluster includes a plurality of processor cores. In other words, the multi-core computing device 1401 is organized in a system-on-chip / cluster / processor-core hierarchy.
In a system-on-chip hierarchy, as shown in FIG. 14, a multi-core computing device 1401 includes an external storage controller 1401, a peripheral communication module 1402, an on-chip interconnect module 1403, a synchronization module 1404, and a plurality of clusters 1405.
There may be a plurality of external memory controllers 1401 (two are shown in the figure by way of example), which are used to access an external storage device, such as the DRAM 1204 in fig. 12, in response to access requests issued by the processor cores, so as to read data from or write data to off-chip memory. The peripheral communication module 1402 is configured to receive control signals from the processing device 1203 through the interface device 1202 and start the computing device 1201 to execute tasks. The on-chip interconnect module 1403 connects the external memory controllers 1401, the peripheral communication module 1402, and the plurality of clusters 1405, and is used to transmit data and control signals between the modules. The synchronization module 1404 is a global synchronization barrier controller (GBC) for coordinating the work progress of the clusters and ensuring synchronization of information. The plurality of clusters 1405 are the computing cores of the multi-core computing device 1401; four are exemplarily shown in the figure, and as hardware evolves, the multi-core computing device 1401 of the present application may include 8, 16, 64, or even more clusters 1405. The clusters 1405 are used to efficiently execute deep learning algorithms.
At the cluster level, as shown in fig. 14, each cluster 1405 includes a plurality of processor cores (IPU cores) 1406 and a memory core (MEM core) 1407.
Four processor cores 1406 are exemplarily shown in the figure; the present application does not limit the number of processor cores 1406. The internal structure of a processor core is shown in fig. 15. Each processor core 1406 is similar to the single-core computing device 1301 of fig. 13 and likewise includes three modules: a control module 151, an operation module 152, and a storage module 153. The functions and structures of the control module 151, the operation module 152, and the storage module 153 are substantially the same as those of the control module 131, the operation module 132, and the storage module 133, and are not repeated here. It should be noted that the storage module 153 includes an input/output direct memory access (IODMA) module 1533 and a move direct memory access (MVDMA) module 1534. The IODMA 1533 controls access between the NRAM 1531/WRAM 1532 and the DRAM 1204 via the broadcast bus 1409; the MVDMA 1534 is used to control access between the NRAM 1531/WRAM 1532 and the storage unit (SRAM) 1408.
Returning to fig. 14, the storage core 1407 is primarily used to store and communicate, i.e., to store shared data or intermediate results among the processor cores 1406, as well as to perform communications between the cluster 1405 and the DRAM 1204, communications among the clusters 1405 with each other, communications among the processor cores 1406 with each other, and so forth. In other embodiments, the memory core 1407 has the capability of scalar operations to perform scalar operations.
The memory core 1407 includes an SRAM 1408, a broadcast bus 1409, a cluster direct memory access (CDMA) module 1410, and a global direct memory access (GDMA) module 1411. The SRAM 1408 plays the role of a high-performance data transfer station: data multiplexed between different processor cores 1406 in the same cluster 1405 need not be fetched from the DRAM 1204 by each processor core 1406 separately, but is transferred among the processor cores 1406 through the SRAM 1408. The memory core 1407 only needs to rapidly distribute the multiplexed data from the SRAM 1408 to the plurality of processor cores 1406, which improves inter-core communication efficiency and greatly reduces on-chip and off-chip input/output accesses.
Broadcast bus 1409, CDMA 1410, and GDMA 1411 are used to perform communication among processor cores 1406, cluster 1405, and cluster 1405 and DRAM 1204, respectively. As will be described separately below.
The broadcast bus 1409 is used to complete high-speed communication among the processor cores 1406 in the cluster 1405, and the broadcast bus 1409 of this embodiment supports inter-core communication modes including unicast, multicast and broadcast. Unicast refers to point-to-point (e.g., from a single processor core to a single processor core) data transfer, multicast is a communication that transfers a copy of data from SRAM 1408 to a particular number of processor cores 1406, and broadcast is a communication that transfers a copy of data from SRAM 1408 to all processor cores 1406, and is a special case of multicast.
CDMA 1410 is used to control access to SRAM 1408 between different clusters 1405 in the same computing device 1201.
The GDMA 1411 cooperates with the external memory controller 1401 to control access from the SRAM 1408 of the cluster 1405 to the DRAM 1204, or to read data from the DRAM 1204 into the SRAM 1408. As can be seen from the foregoing, communication between the DRAM 1204 and the NRAM 1531 or WRAM 1532 can be achieved via two channels. The first channel directly connects the DRAM 1204 with the NRAM 1531 or WRAM 1532 through the IODMA 1533; the second channel first transfers data between the DRAM 1204 and the SRAM 1408 via the GDMA 1411, and then between the SRAM 1408 and the NRAM 1531 or WRAM 1532 via the MVDMA 1534. Although the second channel seemingly requires more components and a longer data path, in some embodiments its bandwidth is substantially greater than that of the first channel, so communication between the DRAM 1204 and the NRAM 1531 or WRAM 1532 may be more efficient over the second channel. The embodiments of the present application may select the data transmission channel according to the hardware conditions.
In other embodiments, the functions of GDMA 1411 and IODMA 1533 may be integrated into the same component. For convenience of description, the GDMA 1411 and the IODMA 1533 are regarded as different components, and it is within the scope of the present application for those skilled in the art to achieve the same functions and achieve the same technical effects as the present application. Further, the functions of GDMA 1411, IODMA 1533, CDMA 1410 and MVDMA 1534 may be implemented by the same component.
The foregoing may be better understood in light of the following clauses:
clause 1, a task scheduling method, the method being applied to a host side, the method comprising:
acquiring a target graph task, wherein the target graph task comprises a plurality of target tasks;
splitting the target graph task into at least one target sub-graph task, and sending the target sub-graph task to an equipment end so that the equipment end executes the target sub-graph task;
each target sub-graph task comprises at least one target task, and the split target sub-graph tasks have no dependency relationship.
Clause 2, the method according to clause 1, wherein the splitting the target graph task into at least one target sub-graph task and sending the target sub-graph task to a device side comprises:
cyclically performing the following steps until all target sub-graph tasks have been sent to the device side:
splitting the current target graph task into a target sub-graph task, sending the target sub-graph task to the device side, and, if a target sub-graph task execution completion response sent by the device side is received, updating the node states corresponding to the target tasks in the current target graph task.
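The cyclic split-send-update procedure of clause 2 can be sketched as follows. This is an illustrative host-side sketch only: the graph representation, the `send_to_device` and `wait_for_completion` callbacks, and the use of zero in-degree as the splitting rule are our assumptions, not an API defined by the application.

```python
def schedule_graph_task(graph, send_to_device, wait_for_completion):
    """Hypothetical host-side loop for clause 2: split a sub-graph task off
    the current target graph task, send it to the device side, and update
    node states when a completion response arrives."""
    # graph: task id -> collection of predecessor task ids (its dependencies)
    remaining = {t: set(deps) for t, deps in graph.items()}
    while remaining:
        # Split: the current sub-graph task is every task whose node-state
        # in-degree (number of unfinished predecessors) is zero.
        sub_graph_task = [t for t, deps in remaining.items() if not deps]
        send_to_device(sub_graph_task)
        # Completion response from the device side.
        finished = wait_for_completion()
        for t in finished:
            del remaining[t]
        for deps in remaining.values():
            deps.difference_update(finished)  # update node states
```

For a diamond-shaped graph task a→{b, c}→d, this loop dispatches three dependency-free sub-graph tasks in order: [a], then [b, c], then [d].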
Clause 3, the method according to clause 2, wherein the splitting the current target graph task into target sub-graph tasks comprises:
splitting the current target graph task into target sub-graph tasks according to a breadth-first search strategy.
Clause 4, the method according to clause 3, wherein the splitting the current target graph task into target sub-graph tasks according to the breadth-first search strategy comprises:
determining at least one target task whose node-state in-degree is zero in the current target graph task;
and determining the at least one target task with an in-degree of zero as a target sub-graph task.
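As we read clause 4, each breadth-first round gathers every target task whose node-state in-degree is zero into one target sub-graph task. A minimal sketch under that assumption (the successor-adjacency representation and the function name are ours):

```python
def bfs_split(graph):
    """Split a target graph task into successive target sub-graph tasks:
    each round takes all tasks whose current in-degree is zero (clause 4)."""
    # graph: task id -> list of successor task ids
    indegree = {t: 0 for t in graph}
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1
    rounds = []
    remaining = set(graph)
    while remaining:
        sub_graph_task = sorted(t for t in remaining if indegree[t] == 0)
        if not sub_graph_task:
            raise ValueError("graph task contains a dependency cycle")
        rounds.append(sub_graph_task)
        for t in sub_graph_task:
            remaining.remove(t)
            for s in graph[t]:
                indegree[s] -= 1  # finishing t lowers its successors' in-degree
    return rounds
```

Each returned round contains only mutually independent tasks, matching the requirement that a split target sub-graph task have no internal dependency relationship.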
Clause 5, the method according to clause 2, wherein the splitting the current target graph task into target sub-graph tasks comprises:
splitting the current target graph task into target sub-graph tasks according to a depth-first search strategy.
Clause 6, the method according to clause 5, wherein the splitting the current target graph task into target sub-graph tasks according to the depth-first search strategy comprises:
splitting out, from the current target graph task, a target task whose node-state in-degree is zero;
determining whether the target task with an in-degree of zero is followed by a corresponding single-chain task;
if it is determined that a corresponding single-chain task follows, determining the corresponding single-chain task and determining the single-chain task as a target sub-graph task;
and if no corresponding single-chain task follows, determining the target task with an in-degree of zero as the target sub-graph task.
Clause 7, the method according to clause 6, wherein the determining whether the target task with an in-degree of zero is followed by a corresponding single-chain task comprises:
determining whether the in-degrees and out-degrees of the node states corresponding to the intermediate target tasks associated with the target task with an in-degree of zero are all a preset value;
if it is determined that the in-degrees and out-degrees of the node states corresponding to the associated intermediate target tasks are all the preset value, determining that a corresponding single-chain task follows;
and if it is determined that the in-degrees and out-degrees of the node states corresponding to the associated intermediate target tasks are not all the preset value, determining that no corresponding single-chain task follows.
Clause 8, the method according to clause 7, wherein, if the target task with an in-degree of zero is not associated with any intermediate target task, the determining whether the target task with an in-degree of zero is followed by a corresponding single-chain task comprises:
determining whether the in-degree of the node state corresponding to the last target task associated with the target task with an in-degree of zero is the preset value;
if it is determined that the in-degree of the node state corresponding to the associated last target task is the preset value, determining that a corresponding single-chain task follows;
and if it is determined that the in-degree of the node state corresponding to the associated last target task is not the preset value, determining that no single-chain task follows.
Clause 9, the method according to clause 8, wherein the determining the corresponding single-chain task comprises:
determining, as a single-chain task, the target task with an in-degree of zero, the associated intermediate target tasks whose node-state in-degrees and out-degrees are both the preset value, and the last target task whose node-state in-degree is the preset value;
or determining, as a single-chain task, the target task with an in-degree of zero together with either the associated intermediate target tasks whose node-state in-degrees and out-degrees are both the preset value or the last target task whose node-state in-degree is the preset value.
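Clauses 6-9 can be read as follows: starting from a target task with in-degree zero, the scheduler follows a single chain of successors in which each intermediate task has in-degree and out-degree both equal to a preset value, and the last task has in-degree equal to that value; the whole chain is then one target sub-graph task. A sketch of that reading, with the preset value assumed to be 1 and the graph representation and names being ours:

```python
def dfs_chain_split(graph):
    """For each task with in-degree zero, emit the single-chain task that
    follows it, or the task alone if no single chain follows (clauses 6-9,
    with the 'preset value' assumed to be 1)."""
    # graph: task id -> list of successor task ids
    indegree = {t: 0 for t in graph}
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1
    sub_graph_tasks = []
    for start in graph:
        if indegree[start] != 0:
            continue  # only zero in-degree tasks can start a sub-graph task
        chain = [start]
        node = start
        # Follow while the current task has exactly one successor whose
        # in-degree is exactly one (the single-chain condition).
        while len(graph[node]) == 1 and indegree[graph[node][0]] == 1:
            node = graph[node][0]
            chain.append(node)
        sub_graph_tasks.append(chain)
    return sub_graph_tasks
```

A join task with in-degree greater than one is deliberately not absorbed into any chain here; in the clause-2 loop it would be split off in a later round, once its node state has been updated by the completion responses for its predecessors.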
Clause 10, a task scheduling apparatus, the apparatus being located in a host side, the apparatus comprising:
an acquisition module, configured to acquire a target graph task, wherein the target graph task comprises a plurality of target tasks;
a scheduling module, configured to split the target graph task into at least one target sub-graph task and send the target sub-graph task to a device side so that the device side executes the target sub-graph task;
wherein each target sub-graph task comprises at least one target task, and the split target sub-graph tasks have no dependency relationship with one another.
Clause 11, a task scheduling apparatus, comprising at least one processor, a memory and a transceiver;
wherein the processor, the memory and the transceiver are interconnected through a circuit;
the memory stores computer-executable instructions; the transceiver is configured to transmit data to and receive data from the device side;
and execution of the computer-executable instructions stored in the memory by the at least one processor causes the at least one processor to perform the method according to any one of clauses 1-9.
Clause 12, a computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by at least one processor, implements the method according to any one of clauses 1-9.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of combinations of acts; however, those skilled in the art will recognize that the present application is not limited by the order of the acts described, since some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments, and that the acts and modules involved are not necessarily required by the present application.
It should be further noted that, although the steps in the flowchart are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the order of performing the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowchart may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; the order of performing these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It will be appreciated that the apparatus embodiments described above are merely illustrative and that the apparatus of the present application may be implemented in other ways. For example, the division of the units/modules in the above embodiments is only one kind of logical function division, and there may be other division manners in actual implementation. For example, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not implemented.
In addition, unless otherwise specified, each functional unit/module in the embodiments of the present application may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated unit/module may be implemented in the form of hardware, or may also be implemented in the form of a software program module.
If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, or the like. Physical implementations of the hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the artificial intelligence processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the memory unit may be any suitable magnetic or magneto-optical storage medium, such as Resistive Random Access Memory (RRAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Enhanced Dynamic Random Access Memory (EDRAM), High-Bandwidth Memory (HBM), Hybrid Memory Cube (HMC), and so on.
The integrated units/modules, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as such combinations are not contradictory, they should be considered within the scope of this specification.

Claims (12)

1. A task scheduling method, applied to a host side, the method comprising:
acquiring a target graph task, wherein the target graph task comprises a plurality of target tasks;
splitting the target graph task into at least one target sub-graph task, and sending the target sub-graph task to a device side so that the device side executes the target sub-graph task;
wherein each target sub-graph task comprises at least one target task, and the split target sub-graph tasks have no dependency relationship with one another.
2. The method according to claim 1, wherein the splitting the target graph task into at least one target sub-graph task and sending the target sub-graph task to a device side comprises:
cyclically performing the following steps until all target sub-graph tasks have been sent to the device side:
splitting the current target graph task into a target sub-graph task, sending the target sub-graph task to the device side, and, if a target sub-graph task execution completion response sent by the device side is received, updating the node states corresponding to the target tasks in the current target graph task.
3. The method of claim 2, wherein the splitting the current target graph task into target sub-graph tasks comprises:
splitting the current target graph task into target sub-graph tasks according to a breadth-first search strategy.
4. The method of claim 3, wherein the splitting the current target graph task into target sub-graph tasks according to the breadth-first search strategy comprises:
determining at least one target task whose node-state in-degree is zero in the current target graph task;
and determining the at least one target task with an in-degree of zero as a target sub-graph task.
5. The method of claim 2, wherein the splitting the current target graph task into target sub-graph tasks comprises:
splitting the current target graph task into target sub-graph tasks according to a depth-first search strategy.
6. The method of claim 5, wherein the splitting the current target graph task into target sub-graph tasks according to the depth-first search strategy comprises:
splitting out, from the current target graph task, a target task whose node-state in-degree is zero;
determining whether the target task with an in-degree of zero is followed by a corresponding single-chain task;
if it is determined that a corresponding single-chain task follows, determining the corresponding single-chain task and determining the single-chain task as a target sub-graph task;
and if no corresponding single-chain task follows, determining the target task with an in-degree of zero as the target sub-graph task.
7. The method of claim 6, wherein the determining whether the target task with an in-degree of zero is followed by a corresponding single-chain task comprises:
determining whether the in-degrees and out-degrees of the node states corresponding to the intermediate target tasks associated with the target task with an in-degree of zero are all a preset value;
if it is determined that the in-degrees and out-degrees of the node states corresponding to the associated intermediate target tasks are all the preset value, determining that a corresponding single-chain task follows;
and if it is determined that the in-degrees and out-degrees of the node states corresponding to the associated intermediate target tasks are not all the preset value, determining that no corresponding single-chain task follows.
8. The method of claim 7, wherein, if the target task with an in-degree of zero is not associated with any intermediate target task, the determining whether the target task with an in-degree of zero is followed by a corresponding single-chain task comprises:
determining whether the in-degree of the node state corresponding to the last target task associated with the target task with an in-degree of zero is the preset value;
if it is determined that the in-degree of the node state corresponding to the associated last target task is the preset value, determining that a corresponding single-chain task follows;
and if it is determined that the in-degree of the node state corresponding to the associated last target task is not the preset value, determining that no single-chain task follows.
9. The method of claim 8, wherein the determining the corresponding single-chain task comprises:
determining, as a single-chain task, the target task with an in-degree of zero, the associated intermediate target tasks whose node-state in-degrees and out-degrees are both the preset value, and the last target task whose node-state in-degree is the preset value;
or determining, as a single-chain task, the target task with an in-degree of zero together with either the associated intermediate target tasks whose node-state in-degrees and out-degrees are both the preset value or the last target task whose node-state in-degree is the preset value.
10. A task scheduling apparatus, the apparatus being located in a host side, the apparatus comprising:
an acquisition module, configured to acquire a target graph task, wherein the target graph task comprises a plurality of target tasks;
a scheduling module, configured to split the target graph task into at least one target sub-graph task and send the target sub-graph task to a device side so that the device side executes the target sub-graph task;
wherein each target sub-graph task comprises at least one target task, and the split target sub-graph tasks have no dependency relationship with one another.
11. A task scheduling apparatus, comprising at least one processor, a memory and a transceiver;
wherein the processor, the memory and the transceiver are interconnected through a circuit;
the memory stores computer-executable instructions; the transceiver is configured to transmit data to and receive data from the device side;
and execution of the computer-executable instructions stored in the memory by the at least one processor causes the at least one processor to perform the method according to any one of claims 1-9.
12. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by at least one processor, implements the method according to any one of claims 1-9.
CN202111108265.XA 2021-09-22 2021-09-22 Task scheduling method and device and related products Pending CN115904681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111108265.XA CN115904681A (en) 2021-09-22 2021-09-22 Task scheduling method and device and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111108265.XA CN115904681A (en) 2021-09-22 2021-09-22 Task scheduling method and device and related products

Publications (1)

Publication Number Publication Date
CN115904681A true CN115904681A (en) 2023-04-04

Family

ID=86486950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111108265.XA Pending CN115904681A (en) 2021-09-22 2021-09-22 Task scheduling method and device and related products

Country Status (1)

Country Link
CN (1) CN115904681A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116795519A (en) * 2023-08-25 2023-09-22 江苏盖睿健康科技有限公司 Internet-based remote intelligent debugging method and system
CN116795519B (en) * 2023-08-25 2023-12-05 江苏盖睿健康科技有限公司 Internet-based remote intelligent debugging method and system

Similar Documents

Publication Publication Date Title
CN112799726B (en) Data processing device, method and related product
WO2024093292A1 (en) Automatic operator fusion method for computational graph and related product
CN110059797B (en) Computing device and related product
CN114035916A (en) Method for compiling and scheduling calculation graph and related product
CN112686379A (en) Integrated circuit device, electronic equipment, board card and calculation method
CN115904681A (en) Task scheduling method and device and related products
CN113434814B (en) Matrix multiplication operation method based on neural network and related device
CN114595813A (en) Heterogeneous acceleration processor and data calculation method
CN113434813A (en) Matrix multiplication method based on neural network and related device
CN113704156A (en) Sensing data processing device, board card, system and method
CN112948001A (en) Method for setting tensor hardware configuration, readable storage medium and device
CN115794604B (en) Data generation method, device, equipment, medium and program product
CN118035618B (en) Data processor, data processing method, electronic device, and storage medium
WO2023045478A1 (en) Graph task scheduling method, execution-end device, storage medium, and program product
CN113742266B (en) Integrated circuit device, electronic apparatus, board and computing method
CN115904643A (en) Graph task scheduling method, execution device, storage medium, and program product
CN113469328B (en) Device, board, method and readable storage medium for executing revolution passing
CN113792867B (en) Arithmetic circuit, chip and board card
CN113469327B (en) Integrated circuit device for performing rotation number advance
CN113469365A (en) Inference and compilation method based on neural network model and related products thereof
CN115904682A (en) Graph task scheduling method, device and storage medium
CN115878299A (en) Graph task scheduling method and device and related products
CN118210552A (en) Instruction generation method, device and storage medium
Deng et al. Design and Implementation of SSSP Accelerator based-on Reconfigurable and Programmable Computing Array
CN115495157A (en) Grouping operation device, method, integrated circuit chip, calculation device and board card

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination