WO2021139726A1 - Task migration method and apparatus, and computer device and readable storage medium


Info

Publication number
WO2021139726A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
node
target
migratable
executed
Prior art date
Application number
PCT/CN2021/070663
Other languages
French (fr)
Chinese (zh)
Inventor
高燕强
柴庆龙
张鑫宇
徐远超
Original Assignee
中科寒武纪科技股份有限公司
Priority date
Filing date
Publication date
Priority claimed from CN202010012242.8A external-priority patent/CN113157427B/en
Priority claimed from CN202010012302.6A external-priority patent/CN113157403A/en
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021139726A1 publication Critical patent/WO2021139726A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • This application relates to the field of computer technology, and in particular to a method, device, computer equipment, and readable storage medium for task migration, and a method, device, computer equipment, and readable storage medium for job processing.
  • NUMA (Non-Uniform Memory Access) is a memory access architecture.
  • A chip based on the NUMA architecture usually includes a processor with multiple arithmetic units and multiple storage units. The arithmetic units are usually divided into a plurality of arithmetic unit groups, each arithmetic unit group is equipped with at least one storage unit, and an arithmetic unit group together with its corresponding storage unit constitutes a node. In this way, the data required by the arithmetic units in a node can be read and written through the storage unit in the same node.
  • A task or job to be executed needs to be assigned to a certain node for execution, but current task and job processing still has problems.
  • the present application provides a task migration method, device, computer equipment, and readable storage medium.
  • a method for task migration includes:
  • when it is detected that a migratable task meets a preset migration condition, a target node matching the migratable task is determined among the nodes according to a task attribute of the migratable task, where the task attribute includes the target number of arithmetic units required to execute the migratable task;
  • Migrating the migratable task to the target node so as to execute the migratable task through the target node.
  • a device for task migration includes:
  • the first determining module is configured to determine a target node matching the migratable task in each node according to the task attribute of the migratable task when it is detected that the migratable task meets the preset migration condition, where the task attribute includes the target number of arithmetic units required to execute the migratable task;
  • the migration module is configured to migrate the migratable task to the target node, so as to execute the migratable task through the target node.
  • a computer device including a memory and a processor
  • the memory stores a computer program that can run on the processor, and the processor implements the steps of any one of the methods described above when executing the computer program.
  • a computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, it realizes the steps of any one of the above-mentioned methods.
  • This application provides a method, device, computer equipment, and readable storage medium for task migration.
  • When it is detected that a migratable task meets the preset migration condition, the CPU determines the target node matching the migratable task among the nodes according to the task attribute of the migratable task.
  • The task attribute includes the target number of arithmetic units required to execute the migratable task. Then, the CPU migrates the migratable task to the target node so as to execute the migratable task through the target node.
  • In this way, the CPU can migrate the migratable task to the target node, thereby reducing the waiting time of the migratable task and improving its execution efficiency.
  • This application also provides a method, device, computer equipment and readable storage medium for job processing.
  • a method for job processing comprising:
  • when the preset processing condition is met, a first node matching the target job is determined among the nodes according to the job attribute of the target job contained in the target task, where the job attribute includes the target number of arithmetic units required to execute the target job;
  • the target job included in the target task is executed through the first node and the second node where the arithmetic unit that executes the target task is located.
  • the determining the first node matching the target job in each node according to the job attribute of the target job included in the target task includes:
  • for each of the nodes, if the number of free arithmetic units in the node is greater than or equal to the target number, the node is determined as the first node.
  • the method further includes:
  • before the first node matching the target job is determined among the nodes according to the job attribute of the target job contained in the target task when the preset processing condition is met, the method further includes:
  • adding the target task to the splittable task list.
  • before the target job included in the target task is executed through the first node and the second node where the arithmetic unit that executes the target task is located, the method further includes:
  • setting the position corresponding to the first node to 1 in the usage mask of the target task.
  • the method before the execution of the target job included in the target task through the first node and the second node where the arithmetic unit that executes the target task is located, the method further includes:
  • if the bits corresponding to the first node and the second node in the affinity mask and the usage mask of the target job are both 1, executing the step of executing, through the first node and the second node where the arithmetic unit that executes the target task is located, the target job included in the target task.
  • the affinity mask of the target job is the same as the affinity mask of the target task
  • the usage mask of the target job is the same as the usage mask of the target task.
  • a device for job processing comprising:
  • the first determining module is used to determine, when the preset processing condition is met, the first node matching the target job among the nodes according to the job attribute of the target job contained in the target task, where the job attribute includes the target number of arithmetic units required to execute the target job;
  • the execution module is configured to execute the target job included in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.
  • a computer device includes a memory and a processor, the memory stores a computer program that can run on the processor, and the processor implements the steps of any one of the methods when the computer program is executed.
  • a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of any one of the methods.
  • the embodiments of the present application provide a method, device, computer equipment, and readable storage medium for job processing.
  • the CPU determines the first node matching the target job among the nodes according to the job attributes of the target job included in the target task.
  • the job attribute includes the target number of arithmetic units required to execute the target job. Then, the CPU executes the target job included in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.
  • In this way, when the target job would otherwise need to wait a long time, the CPU can jointly execute the target job through the first node and the second node, thereby reducing the waiting time of the target job and improving its execution efficiency.
  • Figure 1-1 is a schematic diagram of an intelligent processor provided by an embodiment of the application.
  • Figure 1-2 is a schematic flowchart of a task migration method provided by an embodiment of the application.
  • Figure 1-3 is a schematic structural diagram of a task migration device provided by an embodiment of this application.
  • Figure 1-4 is a schematic structural diagram of a computer device provided by an embodiment of this application.
  • Figure 2-1 is a schematic flowchart of a method for job splitting and affinity mask modification provided by an embodiment of the application.
  • Figure 2-3 is a schematic flowchart of a method for determining processing conditions provided by an embodiment of the application.
  • Figure 2-4 is a schematic structural diagram of a job processing device provided by an embodiment of the application.
  • the term “if” can be interpreted as “when” or “once” or “in response to determination” or “in response to detection” depending on the context.
  • the phrase "if determined" or "if [the described condition or event] is detected" can be interpreted, depending on the context, as meaning "once determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
  • the task to be executed needs to be allocated to a node for execution.
  • The specific allocation process is: first determine the memory size required to execute the task, and then, according to the remaining memory space of the storage unit corresponding to each node, determine a target node whose remaining memory space meets the memory size. For example, the node with the largest remaining memory space may be used as the target node, or a node may be randomly selected as the target node from among the nodes whose remaining memory space is greater than the memory size. Then, based on the affinity binding principle, the task is assigned to the target node for execution.
  • the embodiment of the present application provides a task migration method.
  • the method can be applied to a chip.
  • the chip may include at least one processor.
  • the chip may have heterogeneous multiprocessors.
  • the chip may include an intelligent processor with a NUMA architecture and a general-purpose processor.
  • The general-purpose processor may be a CPU (Central Processing Unit), and the intelligent processor may be an accelerator, an IPU (Intelligent Processing Unit), or a GPU (Graphics Processing Unit), and may also be another type of intelligent processor, which is not limited in the embodiments of the present application.
  • The method can be applied to a chip, and the CPU in the chip can execute the task migration method described above to schedule multiple tasks to the intelligent processor for processing.
  • the intelligent processor of the chip can also execute the above-mentioned task migration method.
  • For the specific execution process of the task migration method in the embodiment of the present application, please refer to the following description.
  • The intelligent processor with the NUMA architecture includes multiple arithmetic units and multiple storage units.
  • Multiple arithmetic units are usually divided into multiple arithmetic unit groups, and each arithmetic unit group is equipped with at least one storage unit, and an arithmetic unit group and its corresponding storage unit constitute a node.
  • the reading and writing of data required by the arithmetic unit in a node can all be realized through the storage unit in the node, and the reading and writing of data between different nodes is realized through the communication interface.
  • Figure 1-1 is a schematic diagram of an intelligent processor with a NUMA architecture provided by an embodiment of the application.
  • the smart processor contains 16 arithmetic units and 4 storage units.
  • the smart processor is divided into 4 nodes, and each node contains 4 arithmetic units and 1 storage unit.
  • Figure 1-1 only shows an intelligent processor schematically.
  • In practice, each node may include more than four arithmetic units and one storage unit, and the storage unit may include multiple sub-storage units.
  • each node may include four sub-nodes, that is, each node may include 16 arithmetic units.
  • Each sub-node contains four arithmetic units and one sub-storage unit, and the four sub-nodes can be arranged in the same manner as the four nodes. Further, the task migration method described above can be executed among the sub-nodes of a single node, and the execution process of the method is detailed in the description of the task migration method below.
  • For a task to be executed, the processor can allocate, according to the number of arithmetic units required to execute the task, the arithmetic units expected by the task within the node to which the storage unit storing the task data of the task belongs, and add 1 to the waiting reference count (i.e., clu_wait_ref) of each arithmetic unit expected by the task. For example, as shown in Figure 1-1, if the number of arithmetic units required for the task is 2 and the storage unit storing the task data of the task is storage unit 1, the processor can determine arithmetic unit 1 and arithmetic unit 2 in node 1 as the arithmetic units expected by the task, and increment the waiting reference counts of arithmetic unit 1 and arithmetic unit 2 by one.
  • When the processor determines the arithmetic units that execute the task, the task is scheduled to the hardware queue, and the real reference count (i.e., clu_real_ref) of each arithmetic unit that executes the task is incremented by 1.
  • the processor can add 1 to the true reference counts of arithmetic unit 1 and arithmetic unit 2.
  • When the task is completed, the waiting reference count of each arithmetic unit expected by the task is decreased by 1, and at the same time, the real reference count of each arithmetic unit that executes the task is decreased by 1.
  • For example, the processor may decrement the waiting reference count and the real reference count of arithmetic unit 1 and arithmetic unit 2 by one. If the arithmetic units expected by the task are migrated, the waiting reference count of each source arithmetic unit expected by the task is decreased by 1, and the waiting reference count of each destination arithmetic unit expected by the task is increased by 1.
  • For example, if the arithmetic units expected by the task are migrated from arithmetic unit 1 and arithmetic unit 2 to arithmetic unit 3 and arithmetic unit 4, the waiting reference counts of arithmetic unit 1 and arithmetic unit 2 are decreased by 1, and the waiting reference counts of arithmetic unit 3 and arithmetic unit 4 are increased by 1.
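  • As an illustrative aid, the following C sketch shows this wait/real reference-count bookkeeping. Only the counter names clu_wait_ref and clu_real_ref come from the description above; the struct layout and helper function names are assumptions made for the example.

```c
/* Illustrative sketch of the wait/real reference-count bookkeeping described
 * above. Only the counter names clu_wait_ref and clu_real_ref come from the
 * text; the struct layout and helper names are assumptions. */
#include <stddef.h>

struct clu {                  /* one arithmetic unit */
    int clu_wait_ref;         /* tasks expecting to run on this unit */
    int clu_real_ref;         /* tasks scheduled on / executing on this unit */
};

/* A task now expects to run on these n units: bump their waiting counts. */
static void task_expect(struct clu **units, size_t n) {
    for (size_t i = 0; i < n; i++)
        units[i]->clu_wait_ref++;
}

/* The task is dispatched to the hardware queue on these n units. */
static void task_schedule(struct clu **units, size_t n) {
    for (size_t i = 0; i < n; i++)
        units[i]->clu_real_ref++;
}

/* The task finished: release both counts on the units that executed it. */
static void task_complete(struct clu **units, size_t n) {
    for (size_t i = 0; i < n; i++) {
        units[i]->clu_wait_ref--;
        units[i]->clu_real_ref--;
    }
}

/* The task's expected units are migrated from src to dst. */
static void task_migrate_expected(struct clu **src, struct clu **dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        src[i]->clu_wait_ref--;
        dst[i]->clu_wait_ref++;
    }
}
```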
  • Step 1 Obtain the target task to be executed, and determine the task type of the target task, the task execution duration, the minimum cross-node memory access delay of the node to which the third arithmetic unit expected by the target task belongs, and the number of tasks expected to be executed on the third arithmetic unit.
  • the processor needs to determine whether a certain task (that is, the target task) is a migratable task.
  • The processor can obtain the task type of the target task, the task execution duration, the minimum cross-node memory access delay of the node to which the third arithmetic unit expected by the target task belongs, the number of tasks expected to be executed on the third arithmetic unit (that is, the waiting reference count of the third arithmetic unit), and so on.
  • The task types can include memory-intensive (that is, the task contains many I/O (Input/Output) instructions and requires frequent reading and writing of data in the storage unit during execution) and compute-intensive (that is, the task contains many calculation instructions and requires a large amount of computing resources during execution), and may also include other task types, which are not limited in the embodiments of the present application.
  • The processor can determine whether the task type of the target task is compute-intensive, whether the task execution duration of the target task is greater than the minimum cross-node memory access delay, and whether the waiting reference count of the third arithmetic unit is greater than or equal to the third preset number threshold.
  • the third preset number threshold can be set by a technician based on experience.
  • Step 2 If the task type is compute-intensive, and/or the task execution duration is greater than the minimum cross-node memory access delay, and/or the number of tasks expected to be executed on the third arithmetic unit is greater than or equal to the third preset number threshold, the target task is determined to be a migratable task, and the affinity mask of the target task is modified according to the preset affinity mask modification rule.
  • the processor can determine that the target task is a migratable task. Then, the processor can modify the affinity mask of the target task according to the preset affinity mask modification rule.
  • The affinity mask (affinity) of the target task is used to indicate the nodes that can execute the target task among the nodes, and the number of bits in the affinity mask equals the total number of nodes contained in the intelligent processor.
  • Each bit uniquely corresponds to a node: if a bit is 1, it means that the node corresponding to the bit can execute the target task; if a bit is 0, it means that the node corresponding to the bit cannot execute the target task.
  • The affinity mask modification rule can be set by a technician according to the migration scope of the migratable task.
  • For example, if the affinity mask modification rule is that a migratable task can be migrated to all nodes, and the original affinity mask of the target task is 0001, then when the target task is a migratable task, the processor can modify the affinity mask of the target task to 1111 according to the affinity mask modification rule. For another example, if the affinity mask modification rule is that a migratable task can be migrated to node 3 and node 4, and the original affinity mask of the target task is 0001, then when the target task is a migratable task, the processor can modify the affinity mask of the target task to 1101 according to the affinity mask modification rule.
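  • A hedged sketch of this migratable-task check and affinity-mask update is given below in C. The thresholds, field names, and the bit layout (bit 0 for node 1 through bit 3 for node 4) are illustrative assumptions; only the decision criteria and the example mask values come from the description above.

```c
/* Hedged sketch of the migratable-task check and affinity-mask update.
 * Thresholds, field names, and the 4-bit mask layout (bit 0 = node 1,
 * bit 3 = node 4) are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>

enum task_type { MEMORY_INTENSIVE, COMPUTE_INTENSIVE };

struct task {
    enum task_type type;
    uint64_t exec_time_ns;   /* task execution duration */
    uint8_t  affinity_mask;  /* one bit per node, 1 = node may execute the task */
};

/* Decide whether the target task is migratable and, if so, widen its
 * affinity mask according to a preset modification rule. */
static bool mark_migratable(struct task *t,
                            uint64_t min_cross_node_latency_ns,
                            int wait_ref_of_expected_unit,
                            int third_preset_threshold,
                            uint8_t rule_mask /* e.g. 0x0F: all nodes */)
{
    bool migratable =
        t->type == COMPUTE_INTENSIVE ||
        t->exec_time_ns > min_cross_node_latency_ns ||
        wait_ref_of_expected_unit >= third_preset_threshold;

    if (migratable)
        t->affinity_mask |= rule_mask;   /* e.g. 0001 -> 1111, or 0001 -> 1101 */
    return migratable;
}
```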
  • Step 1 If the task attribute of the task executed in the second arithmetic unit expected by the migratable task is different from the task attribute of the migratable task, it is determined that the migratable task meets the preset migration condition.
  • When a certain arithmetic unit is assigned to execute a certain task, the arithmetic unit can only execute tasks whose task attribute is the same as the task attribute of that task. Here, the task attribute is the number of arithmetic units required to execute the task.
  • the processor can obtain the task attributes of the migratable task and the task attributes of the tasks executed in the second computing unit expected by the migratable task. Then, the processor can determine whether the task attribute of the task executed in the second computing unit is the same as the task attribute of the migratable task.
  • If they are different, the processor can determine that the migratable task meets the preset migration condition. In this way, the processor can subsequently migrate the migratable task to another node that can execute the migratable task. If the task attribute of the task executed in the second arithmetic unit is the same as the task attribute of the migratable task, the processor executes step two.
  • Step 2 If the task attribute of the task executed in the second arithmetic unit is the same as the task attribute of the migratable task, it is determined whether the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold.
  • the processor may further determine whether the total number of tasks to be executed in the second arithmetic unit (that is, the true reference count of the second arithmetic unit) is greater than or equal to the first preset number threshold.
  • the first preset number threshold can be set by a technician based on experience.
  • If the total number of tasks to be executed in the second arithmetic unit is less than the first preset number threshold, the processor determines that the migratable task does not meet the preset migration condition. If the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold, the processor executes step three.
  • Step 3 If the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold, it is determined that the migratable task meets the preset migration condition.
  • In this case, the processor can determine that the migratable task satisfies the preset migration condition, so that the processor migrates the migratable task to another node, thereby reducing the waiting time of the migratable task and improving its execution efficiency. If the total number of tasks to be executed in the second arithmetic unit is less than the first preset number threshold, it means that the migratable task can be executed by the second arithmetic unit without waiting a long time. Correspondingly, the processor may determine that the migratable task does not meet the preset migration condition.
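  • The sketch below summarizes this migration-condition check in C; the struct fields and the threshold parameter are assumptions, while the two criteria (attribute mismatch, backlog at or above the first preset number threshold) follow the steps above.

```c
/* Sketch of the preset migration condition from steps 1-3 above. The struct
 * fields are assumptions; "task attribute" is the number of arithmetic units
 * a task requires, as defined in the text. */
#include <stdbool.h>

struct unit_state {
    int bound_task_attr;   /* attribute of the tasks bound to this unit */
    int clu_real_ref;      /* total number of tasks to be executed on this unit */
};

static bool meets_migration_condition(int migratable_task_attr,
                                      const struct unit_state *expected_unit,
                                      int first_preset_threshold)
{
    /* Step 1: a unit only runs tasks with the same attribute, so a mismatch
     * means the migratable task should be moved elsewhere. */
    if (expected_unit->bound_task_attr != migratable_task_attr)
        return true;

    /* Steps 2-3: same attribute, but the unit's backlog is too long. */
    return expected_unit->clu_real_ref >= first_preset_threshold;
}
```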
  • Step 201 When it is detected that the migratable task meets the preset migration condition, a target node that matches the migratable task is determined in each node according to the task attribute of the migratable task.
  • the task attribute includes the target number of arithmetic units required to execute the migratable task.
  • the processor can determine whether the task is a migratable task. If the task is a migratable task, the processor may further detect whether the migratable task meets the preset migration condition. When the processor detects that the migratable task meets the preset migration condition, it can determine the target node matching the migratable task among the nodes according to the task attributes of the migratable task. Wherein, the task attribute of the migratable task includes the target number of computing units required to execute the migratable task.
  • the target number of arithmetic units required by the target task may be represented by a task identifier, and the task identifier may be a Block task or a Union task, etc., which is not specifically limited here.
  • For example, if the task identifier is a Union task, it indicates that more than one arithmetic unit is required to run the target task; if the task identifier is a Block task, it indicates that one arithmetic unit is required to run the target task.
  • The specific processing process by which the processor determines the target node matching the migratable task in each node is as follows.
  • Step 1 If there are candidate nodes containing the target number of free computing units among the nodes, the candidate node with the smallest distance to the node to which the computing units expected by the migratable task belong is determined as the target node.
  • When the processor detects that the migratable task meets the preset migration condition, it can first determine whether there is a candidate node containing the target number of free computing units (that is, computing units whose real reference count equals 0) among the nodes. If there is such a candidate node, the processor can determine, among the candidate nodes, the candidate node with the smallest distance to the node to which the computing units expected by the migratable task belong as the target node. In this way, the processor can subsequently migrate the migratable task to the target node and execute it through the idle computing units in the target node, thereby reducing the waiting time of the migratable task and improving its execution efficiency.
  • For example, if the node to which the computing units expected by the migratable task belong is node 1, the candidate nodes are node 1 and node 2, and the distances from node 1 to node 1 and node 2 are 0 and 1 respectively, then the target node is node 1.
  • For another example, if the node to which the computing units expected by the migratable task belong is node 1, the candidate nodes are node 2 and node 4, and the distances from node 1 to node 2 and node 4 are 1 and 2 respectively, then the target node is node 2.
  • If there are multiple candidate nodes with the smallest distance among the candidate nodes, the processor can determine the target node in ascending or descending order of the node identifiers, or randomly select one of these candidate nodes as the target node, which is not specifically limited here.
  • Step 2 If there is no candidate node containing the target number of idle arithmetic units in each node, the node containing the first arithmetic unit with the smallest total number of tasks to be executed is determined as the target node.
  • Here, the first arithmetic unit is an arithmetic unit in which the task attribute of the executed task is the same as the task attribute of the migratable task.
  • As described above, when a certain arithmetic unit is assigned to execute a certain task, the arithmetic unit can only execute tasks with the same task attribute. Based on this principle, if there is no candidate node containing the target number of idle arithmetic units among the nodes, the processor can further determine, in each node, the first arithmetic units, that is, the arithmetic units in which the task attribute of the executed task is the same as that of the migratable task. Then, the processor may determine, among the first arithmetic units, the first arithmetic unit with the smallest total number of tasks to be executed (that is, the smallest real reference count), and use the node to which that first arithmetic unit belongs as the target node.
  • In this way, the processor can subsequently migrate the migratable task to the target node and execute the migratable task through the target computing unit in the target node, thereby reducing the waiting time of the migratable task and improving its execution efficiency.
  • For example, suppose the number of arithmetic units required to execute the migratable task is 3. In node 1, the arithmetic units bound to tasks requiring 3 arithmetic units are arithmetic units 1 to 3, and the total number of tasks to be executed on arithmetic units 1 to 3 is 10. In node 2, the corresponding arithmetic units are arithmetic units 6 to 8, with a total of 15 tasks to be executed. In node 4, the corresponding arithmetic units are arithmetic units 13 to 15, with a total of 5 tasks to be executed. Then the target node is node 4, to which arithmetic units 13 to 15 belong.
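  • A C sketch of this two-step target-node selection is shown below. The 4-node layout, the distance table, and the per-unit bookkeeping fields are assumptions for a processor like the one in Figure 1-1; the selection logic itself follows steps 1 and 2 above.

```c
/* Sketch of the two-step target-node selection above, assuming a 4-node
 * processor as in Figure 1-1. The distance table and field names are
 * illustrative assumptions. */
#include <limits.h>

#define NODES           4
#define UNITS_PER_NODE  4

struct node_state {
    int free_units;                     /* units whose real reference count is 0 */
    int unit_real_ref[UNITS_PER_NODE];  /* backlog of each unit */
    int unit_task_attr[UNITS_PER_NODE]; /* task attribute bound to each unit */
};

static int pick_target_node(const struct node_state nodes[NODES],
                            int dist[NODES][NODES],
                            int home_node,     /* node of the expected units */
                            int target_units,  /* units the migratable task needs */
                            int task_attr)
{
    /* Step 1: candidate nodes with enough idle units -> smallest distance. */
    int best = -1, best_dist = INT_MAX;
    for (int n = 0; n < NODES; n++) {
        if (nodes[n].free_units >= target_units && dist[home_node][n] < best_dist) {
            best = n;
            best_dist = dist[home_node][n];
        }
    }
    if (best >= 0)
        return best;

    /* Step 2: otherwise pick the node owning the first arithmetic unit
     * (same task attribute) with the smallest backlog of pending tasks. */
    int best_ref = INT_MAX;
    for (int n = 0; n < NODES; n++) {
        for (int u = 0; u < UNITS_PER_NODE; u++) {
            if (nodes[n].unit_task_attr[u] == task_attr &&
                nodes[n].unit_real_ref[u] < best_ref) {
                best_ref = nodes[n].unit_real_ref[u];
                best = n;
            }
        }
    }
    return best;   /* -1 if no suitable unit was found */
}
```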
  • Task migration may affect the execution of other tasks. Therefore, before determining, when it is detected that the migratable task meets the preset migration condition, the target node matching the migratable task among the nodes according to the task attribute of the migratable task, the processor can determine whether the arithmetic units are unevenly loaded. The specific processing process is as follows.
  • Step 1 Obtain the number of tasks expected to be executed in each arithmetic unit.
  • the processor can obtain the number of tasks expected to be executed in each arithmetic unit (that is, the waiting reference count of each arithmetic unit). Then, the processor can determine the maximum waiting reference count and the minimum waiting reference count among the waiting reference counts of each arithmetic unit. After that, the processor may calculate the difference between the maximum waiting reference count and the minimum waiting reference count (that is, the maximum difference), and determine whether the maximum difference is greater than or equal to the second preset number threshold.
  • The second preset number threshold can be set by a technician based on experience. If the maximum difference is less than the second preset number threshold, it means that the computing units in the intelligent processor are not unevenly loaded, and the processor does not need to perform task migration. If the maximum difference is greater than or equal to the second preset number threshold, it means that the computing units in the intelligent processor are unevenly loaded, and the processor executes step two.
  • Step 2 If the maximum difference between the number of tasks expected to be executed in each arithmetic unit is greater than or equal to the second preset number threshold, when it is detected that the migratable task meets the preset migrating condition, according to the task of the migratable task Properties, in each node, determine the target node that matches the migratable task.
  • In this case, when the processor detects that the migratable task meets the preset migration condition, it determines, according to the task attribute of the migratable task, the target node matching the migratable task among the nodes. The process of determining the target node here is similar to step 201 and is not repeated.
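  • The load-imbalance test can be sketched as follows in C; the array of waiting reference counts and the threshold parameter are assumptions, and the spread computation mirrors the maximum-difference check described above.

```c
/* Sketch of the load-imbalance test: migration is only considered when the
 * spread of waiting reference counts across the arithmetic units reaches the
 * second preset number threshold. Assumes n_units >= 1. */
#include <stdbool.h>
#include <stddef.h>

static bool load_is_uneven(const int wait_ref[], size_t n_units,
                           int second_preset_threshold)
{
    int lo = wait_ref[0], hi = wait_ref[0];
    for (size_t i = 1; i < n_units; i++) {
        if (wait_ref[i] < lo) lo = wait_ref[i];
        if (wait_ref[i] > hi) hi = wait_ref[i];
    }
    return (hi - lo) >= second_preset_threshold;
}
```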
  • Step 202 Migrate the migratable task to the target node, so as to execute the migratable task through the target node.
  • the processor determines the target node, it can migrate the migratable task to the target node, so as to execute the migratable task through the target node.
  • the processor may also modify the use mask of the migratable task.
  • The specific processing process is: if the target node is not the same as the node where the arithmetic units expected by the migratable task are located, then in the usage mask of the migratable task, the position corresponding to the target node is set to 1, and the position corresponding to the node where the arithmetic units expected by the migratable task are located is set to 0.
  • The usage mask (usage_mask) of the migratable task is used to indicate the nodes determined to execute the migratable task among the nodes.
  • The number of bits in the usage mask equals the total number of nodes contained in the chip, and each bit uniquely corresponds to a node: if a bit is 1, it means that the node corresponding to the bit is determined to execute the migratable task; if a bit is 0, it means that the node corresponding to the bit does not execute the migratable task.
  • the processor determines the target node of the migratable task, it can determine whether the target node is the same as the node where the computing unit expected by the migratable task is located.
  • If the target node is the same as the node where the computing units expected by the migratable task are located, the processor does not need to modify the usage mask of the migratable task. If the target node is not the same as the node where the computing units expected by the migratable task are located, the processor can set the position corresponding to the target node to 1 in the usage mask of the migratable task, and set the position corresponding to the node where the computing units expected by the migratable task are located to 0. For example, if the node where the computing units expected by the migratable task are located is node 1, the original usage mask of the migratable task is 0001. Assuming that the target node is node 2, the modified usage mask of the migratable task is 0010.
  • the processor can also determine whether the migratable task can be migrated to the target node according to the affinity mask and the usage mask of the migratable task.
  • the specific processing process is: if the bits corresponding to the target node in the affinity mask and the usage mask of the migratable task are both 1, then the migratable task is migrated to the target node.
  • the processor can determine whether the bits corresponding to the target node in the affinity mask and the usage mask of the migratable task are both 1. If the bits corresponding to the target node in the affinity mask and the usage mask of the migratable task are both 1, it means that the migratable task can be migrated to the target node. Correspondingly, the processor can migrate the migratable task to the target node. If the position corresponding to the target node in the affinity mask of the migratable task is 0, it means that the migratable task cannot be migrated to the target node.
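  • The usage-mask update and the affinity/usage check that gate the actual migration can be sketched in C as follows, using the same bit-per-node layout assumed earlier (node 1 on bit 0). The struct and function names are illustrative.

```c
/* Sketch of the usage-mask update and the affinity/usage check that gate the
 * actual migration. Node n is assumed to map to bit n-1, so the node indices
 * passed in are zero-based bit positions. */
#include <stdbool.h>
#include <stdint.h>

struct masked_task {
    uint8_t affinity_mask;  /* nodes allowed to execute the task */
    uint8_t usage_mask;     /* nodes determined to execute the task */
};

/* If the target node differs from the node of the expected units, set the
 * target node's bit and clear the old node's bit in the usage mask,
 * e.g. 0001 -> 0010 when moving from node 1 to node 2. */
static void update_usage_mask(struct masked_task *t, int target_bit, int old_bit)
{
    if (target_bit == old_bit)
        return;
    t->usage_mask |= (uint8_t)(1u << target_bit);
    t->usage_mask &= (uint8_t)~(1u << old_bit);
}

/* The task may only migrate when both masks have the target node's bit set. */
static bool may_migrate_to(const struct masked_task *t, int target_bit)
{
    uint8_t bit = (uint8_t)(1u << target_bit);
    return (t->affinity_mask & bit) && (t->usage_mask & bit);
}
```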
  • the target node that matches the migratable task is determined in each node according to the task attribute of the migratable task.
  • the task attribute includes the target number of arithmetic units required to execute the migratable task.
  • the processor migrates the migratable task to the target node to execute the migratable task through the target node.
  • In this way, the processor can migrate the migratable task to the target node, thereby reducing the waiting time of the migratable task and improving its execution efficiency.
  • the embodiment of the present application also provides a device for task migration. As shown in Figures 1-3, the device includes:
  • the first determining module 310 is configured to, when it is detected that the migratable task meets the preset migration condition, determine the target node matching the migratable task in each node according to the task attribute of the migratable task, where the task attribute includes the target number of arithmetic units required to execute the migratable task;
  • the migration module 320 is configured to migrate the migratable task to the target node, so as to execute the migratable task through the target node.
  • the first determining module 310 is specifically configured to:
  • if there are candidate nodes containing the target number of free computing units among the nodes, the candidate node with the smallest distance to the node to which the computing units expected by the migratable task belong is determined as the target node;
  • if there is no candidate node containing the target number of free computing units among the nodes, the node containing the first arithmetic unit with the smallest total number of tasks to be executed is determined as the target node, where the first arithmetic unit is an arithmetic unit in which the task attribute of the executed task is the same as the task attribute of the migratable task.
  • the device further includes:
  • the second determining module is configured to determine that the migratable task meets the preset migration condition if the task attribute of the task executed in the second computing unit expected by the migratable task is different from the task attribute of the migratable task;
  • a judging module, configured to judge, if the task attribute of the task executed in the second arithmetic unit is the same as the task attribute of the migratable task, whether the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold;
  • the third determining module is configured to determine that the migratable task meets the preset migration condition if the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold.
  • the device further includes:
  • the obtaining module is used to obtain the number of tasks expected to be executed in each arithmetic unit;
  • the fourth determining module is configured to, if the maximum difference between the numbers of tasks expected to be executed in the arithmetic units is greater than or equal to the second preset number threshold, trigger the first determining module 310 to execute the step of determining, when it is detected that the migratable task meets the preset migration condition, the target node matching the migratable task in each node according to the task attribute of the migratable task.
  • the device further includes:
  • the fifth determining module is used to obtain the target task to be executed, and determine the task type of the target task, the task execution duration, the minimum cross-node memory access delay of the node to which the third arithmetic unit expected by the target task belongs, and the number of tasks expected to be executed on the third arithmetic unit;
  • the modification module is used to determine the target task as a migratable task and modify the affinity mask of the target task according to the preset affinity mask modification rule, if the task type is compute-intensive, and/or the task execution duration is greater than the minimum cross-node memory access delay, and/or the number of tasks expected to be executed on the third arithmetic unit is greater than or equal to the third preset number threshold.
  • the device further includes:
  • the setting module is used to, if the target node is not the same as the node where the computing units expected by the migratable task are located, set the position corresponding to the target node to 1 in the usage mask of the migratable task, and set the position corresponding to the node where the computing units expected by the migratable task are located to 0.
  • the device further includes:
  • the sixth determining module is used to trigger the migration module 320 to execute the step of migrating the migratable task to the target node if the bits corresponding to the target node in the affinity mask and the usage mask of the migratable task are both 1.
  • When it is detected that a migratable task meets the preset migration condition, the CPU determines the target node matching the migratable task among the nodes according to the task attribute of the migratable task.
  • The task attribute includes the target number of arithmetic units required to execute the migratable task. Then, the CPU migrates the migratable task to the target node so as to execute the migratable task through the target node.
  • In this way, the CPU can migrate the migratable task to the target node, thereby reducing the waiting time of the migratable task and improving its execution efficiency.
  • The present application also provides a computer device, including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor implements the method steps of the above task migration when executing the computer program.
  • the implementation process of the method in which the processor executes the above-mentioned task migration can refer to FIG. 1-2 and the above description, which will not be repeated here.
  • a computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the steps of the above-mentioned task migration method are realized.
  • a certain task can be split into at least one subtask (hereinafter referred to as a job).
  • Different jobs are also allocated to a certain node for execution. Due to the affinity binding principle in the above allocation process, a job often needs to wait a long time, which seriously affects the execution efficiency of the job.
  • an embodiment of the present application also provides a job processing method, which can be applied to a chip, and the chip can include an intelligent processor with a NUMA architecture and a general-purpose processor.
  • The general-purpose processor can be a CPU (Central Processing Unit) or the like.
  • The intelligent processor using the NUMA architecture can be an accelerated processor, an IPU (Intelligent Processing Unit), a GPU (Graphics Processing Unit), or another type of processor, which is not limited in the embodiments of this application.
  • the method can be applied to the above-mentioned chip, and a general-purpose processor (CPU) in the above-mentioned chip can execute the above-mentioned job processing method to distribute multiple jobs to at least one arithmetic unit in the intelligent processor for execution.
  • For the specific execution process of the job processing method of this application, please refer to the following description.
  • The intelligent processor with the NUMA architecture includes multiple arithmetic units and multiple storage units.
  • Multiple arithmetic units are usually divided into multiple arithmetic unit groups, and each arithmetic unit group is equipped with at least one storage unit, and an arithmetic unit group and its corresponding storage unit constitute a node.
  • the reading and writing of data required by the arithmetic unit in a node can all be realized through the storage unit in the node, and the reading and writing of data between different nodes is realized through the communication interface.
  • Figure 1-1 is a schematic diagram of an intelligent processor with a NUMA architecture provided by an embodiment of the application. As shown in Figure 1-1, the smart processor contains 16 arithmetic units and 4 storage units.
  • each node contains 4 arithmetic units and 1 storage unit.
  • Figure 1-1 only shows an intelligent processor schematically.
  • In practice, each node may include more than four arithmetic units and one storage unit, and the storage unit may include multiple sub-storage units.
  • each node may include four sub-nodes, that is, each node may include 16 arithmetic units.
  • Each sub-node contains four arithmetic units and one sub-storage unit, and the four sub-nodes can be arranged in the same manner as the four nodes.
  • the above-mentioned job processing method may be executed between the sub-nodes of a single node, and the execution process can be referred to the description of the job processing method below.
  • the embodiment of this application first introduces the division of jobs and the modification of the affinity mask, as shown in Figure 2-1.
  • the specific processing process is as follows:
  • Step 201 Obtain the target task to be executed, and determine each dimensional information of the target task and the target number of arithmetic units required to execute the target task.
  • After obtaining the target task to be executed, the processor can determine the dimension information of the target task (that is, dimX, dimY, and dimZ) and the target number of arithmetic units required to execute the target task (i.e., kernel_class).
  • the processor can calculate the ratio of the product of each dimension information (that is, dimX*dimY*dimZ) to the target number, and determine whether the ratio is greater than one. If the ratio is greater than 1, it means that the target task can be split into multiple jobs, and the processor executes step 202. If the ratio is less than or equal to 1, it means that the target task cannot be split into multiple jobs.
  • Step 202 If the ratio of the product of each dimension information to the number of targets is greater than 1, the target task is added to the list of splittable tasks.
  • the processor can add the target task to the list of splittable tasks.
  • the splittable task list is used to store tasks that can be split into multiple jobs; the splittable task list may be a linked list or other types of lists, which is not limited in the embodiment of the present application.
  • the processor may delete the task from the splittable task list.
  • When it is determined that the target task can be split into multiple jobs, the target task can be sent to the scheduler, and the scheduler can split the target task into multiple jobs based on task attributes such as the dimension information of the target task and the target number of arithmetic units required by the target task.
  • the scheduler may be a hardware scheduler placed on a chip, and the hardware scheduler may include multiple circuit modules such as a task splitting unit.
  • the scheduler may also be a software scheduler, which is not specifically limited here.
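  • The splittable check in steps 201-202 can be sketched as follows in C; the struct and field grouping other than dimX, dimY, dimZ, and kernel_class are assumptions, and list handling is omitted. Under this sketch, a task for which is_splittable returns true would be appended to the splittable task list and later handed to the scheduler.

```c
/* Sketch of the splittable-task test in steps 201-202: a task can be split
 * into jobs when dimX*dimY*dimZ exceeds the target number of arithmetic
 * units (kernel_class). Struct and field grouping are assumptions. */
#include <stdbool.h>

struct target_task {
    int dimX, dimY, dimZ;   /* dimension information of the task */
    int kernel_class;       /* target number of arithmetic units required */
};

static bool is_splittable(const struct target_task *t)
{
    /* ratio > 1  <=>  dimX*dimY*dimZ > kernel_class (all values positive) */
    return (t->dimX * t->dimY * t->dimZ) > t->kernel_class;
}
```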
  • Step 203 Modify the affinity mask of the target task according to the preset affinity mask modification rule.
  • the processor can modify the affinity mask of the target task according to the preset affinity mask modification rule.
  • The affinity mask (affinity) of the target task is used to indicate the nodes that can execute the target task among the nodes, and the number of bits in the affinity mask equals the total number of nodes contained in the intelligent processor.
  • a bit uniquely corresponds to a node. If a bit is 1, it means that the node corresponding to the bit can perform the target task; if a bit is 0, it means that the node corresponding to the bit cannot perform the target task.
  • the affinity mask modification rule can be set by the technician according to the range of nodes processed by the job.
  • For example, if the affinity mask modification rule allows the target task to be processed by all nodes, the processor can modify the affinity mask of the target task to 1111 according to the affinity mask modification rule.
  • For another example, if the affinity mask modification rule is that the task can be migrated to node 3 and node 4, and the original affinity mask of the target task is 0001, the processor can modify the affinity mask of the target task to 1101 according to the affinity mask modification rule.
  • the affinity mask of the target job is the same as the affinity mask of the target task.
  • Step 301 When the preset processing conditions are met, the first node matching the target job is determined in each node according to the job attribute of the target job included in the target task.
  • the job attribute includes the target number of arithmetic units required to execute the target job.
  • The processor can determine whether the target task can be split into multiple target jobs (JOB). If the target task can be split into multiple target jobs, it can be determined that the target task is a task whose affinity can be relaxed, and the processor can then further determine whether the preset processing condition is satisfied. The processing procedure by which the processor determines whether the preset processing condition is met will be described in detail later. When the preset processing condition is met, the processor may determine the first node matching the target job among the nodes according to the job attribute of the target job. Here, the job attribute of the target job includes the target number of arithmetic units required to execute the target job.
  • The specific processing process by which the processor determines the first node matching the target job among the nodes is: for each of the nodes, if the number of idle computing units in the node is greater than or equal to the target number, the node is determined as the first node.
  • For each node, the processor can obtain the number of free arithmetic units in the node (that is, arithmetic units whose real reference count equals 0). Then, the processor can determine whether the number of free arithmetic units in the node is greater than or equal to the target number. If the number of free computing units in the node is greater than or equal to the target number, it means that the node can execute the target job included in the target task. Correspondingly, the processor may determine the node as a first node. If the number of free arithmetic units in the node is less than the target number, it means that the node cannot execute the target job included in the target task. Correspondingly, this node is not a first node.
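  • A C sketch of this first-node selection is given below; the node layout and field names are assumptions carried over from the earlier sketches.

```c
/* Sketch of the first-node selection in step 301: a node can serve as a first
 * node when its count of idle arithmetic units (real reference count == 0)
 * reaches the target number. Layout and names are assumptions. */
#define NODES          4
#define UNITS_PER_NODE 4

struct jp_node {
    int unit_real_ref[UNITS_PER_NODE];
};

/* Marks first[n] = 1 for every node able to take the target job and returns
 * how many such nodes were found. */
static int find_first_nodes(const struct jp_node nodes[NODES],
                            int target_units, int first[NODES])
{
    int found = 0;
    for (int n = 0; n < NODES; n++) {
        int idle = 0;
        for (int u = 0; u < UNITS_PER_NODE; u++)
            if (nodes[n].unit_real_ref[u] == 0)
                idle++;
        first[n] = (idle >= target_units);
        found += first[n];
    }
    return found;
}
```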
  • the processing procedure for the processor to determine whether the preset processing condition is satisfied is as follows:
  • Step 401 Obtain idle time lengths of idle computing units in each node.
  • For each node, the processor can obtain the idle duration of each idle arithmetic unit (that is, each arithmetic unit whose real reference count equals 0) in the node.
  • Step 402 If there is a task containing multiple jobs waiting to be executed in the splittable task list, and there is an idle duration greater than or equal to the preset duration threshold among the idle durations of the idle computing units, it is determined that the preset processing condition is satisfied.
  • After obtaining the idle durations, the processor can further determine whether there is a task containing multiple jobs waiting to be executed in the splittable task list, and determine whether there is an idle duration greater than or equal to the preset duration threshold among the idle durations of the idle computing units.
  • The preset duration threshold can be set by a technician based on experience. If there is a task containing multiple jobs waiting to be executed in the splittable task list, and there is an idle duration greater than or equal to the preset duration threshold among the idle durations of the idle computing units, it means that the processor can allocate the tasks waiting to be executed in the splittable task list to the idle arithmetic units in each node for execution.
  • In this case, the processor can determine that the preset processing condition is satisfied. If there is no task waiting to be executed in the splittable task list, or there is no idle duration greater than or equal to the preset duration threshold among the idle durations of the idle computing units, it means that the processor cannot allocate the tasks waiting to be executed in the splittable task list to the idle computing units in each node for execution.
  • the processor may determine that the preset processing condition is not satisfied.
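  • The processing-condition test of steps 401-402 can be sketched as follows in C; the parameter names and the use of nanoseconds for durations are assumptions.

```c
/* Sketch of the preset processing condition from steps 401-402: the splittable
 * task list must hold a waiting multi-job task and at least one idle unit must
 * have been idle for the preset duration. Names and units are assumptions. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool processing_condition_met(bool list_has_waiting_multi_job_task,
                                     const uint64_t idle_ns[], size_t n_idle_units,
                                     uint64_t preset_duration_ns)
{
    if (!list_has_waiting_multi_job_task)
        return false;
    for (size_t i = 0; i < n_idle_units; i++)
        if (idle_ns[i] >= preset_duration_ns)
            return true;
    return false;
}
```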
  • Step 302 Execute the target job included in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.
  • The processor can execute the target job included in the target task through the first node and the second node where the computing unit that executes the target task is located. In this way, when the target job would otherwise need to wait a long time before it can be executed by the arithmetic unit it is waiting for, the processor can execute the target job through the first node and the second node, thereby reducing the waiting time of the target job and improving its execution efficiency.
  • the processor may also modify the use mask of the target job according to the determined first node.
  • the process is: in the use mask of the target task, set the position corresponding to the first node to 1.
  • the usage_mask of the target task is used to indicate the node that determines the execution of the target task in each node.
  • The number of bits in the usage mask equals the total number of nodes contained in the intelligent processor, and each bit uniquely corresponds to a node: if a bit is 1, it means that the node corresponding to the bit is determined to execute the target task; if a bit is 0, it means that the node corresponding to the bit does not execute the target task.
  • After the first node is determined, the position corresponding to the first node can be set to 1 in the usage mask of the target task. For example, the original usage mask of the target task is 0001. Assuming that the first nodes are node 2 and node 4, the modified usage mask of the target task is 1011.
  • the usage mask of the target job is the same as the usage mask of the target task.
  • Optionally, the processor may also determine, according to the affinity mask and the usage mask of the target job, whether the first node and the second node can execute the target job. The specific processing process is: if the bits corresponding to the first node and the second node in the affinity mask and the usage mask of the target job are both 1, the step of executing, through the first node and the second node where the arithmetic unit that executes the target task is located, the target job included in the target task is executed.
  • For each determined first node, the processor can determine whether the bits corresponding to the first node in the affinity mask and the usage mask of the target job are both 1. If the bits corresponding to the first node are both 1, it means that the first node can execute the target job. In the same way, for the second node where the arithmetic unit that executes the target task is located, the processor can determine whether the bits corresponding to the second node in the affinity mask and the usage mask of the target job are both 1. If the bits corresponding to the second node are both 1, it means that the second node can execute the target job. Correspondingly, the processor can execute the target job included in the target task through the first node and the second node.
  • Otherwise, if the bit corresponding to the first node in the affinity mask or the usage mask is 0, the processor will execute the target job only through the second node.
  • For example, suppose the affinity mask of the target job is 1101, the usage mask is 1001, the first nodes are node 2 and node 4, and the second node is node 1. The bits corresponding to node 1 are both 1 and the bits corresponding to node 4 are both 1, but the bit corresponding to node 2 in the affinity mask is 0, so the processor can execute the target job through node 1 and node 4.
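  • The per-node check in this example can be expressed compactly in C as a bitwise AND over the two masks and the candidate nodes; the helper below is an illustrative sketch using the same bit-per-node layout (node 1 on bit 0).

```c
/* Sketch of the per-node check before joint execution: a node participates
 * only if its bit is 1 in both the job's affinity mask and its usage mask.
 * With affinity 1101, usage 1001 and candidates {node 1, node 2, node 4}
 * (node 1 on bit 0), this yields node 1 and node 4, as in the example above. */
#include <stdint.h>

static uint8_t nodes_allowed_to_run(uint8_t affinity_mask, uint8_t usage_mask,
                                    uint8_t candidate_nodes)
{
    return (uint8_t)(affinity_mask & usage_mask & candidate_nodes);
}
```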
  • the processor determines the first node matching the target job among the nodes according to the job attributes of the target job included in the target task.
  • the job attribute includes the target number of arithmetic units required to execute the target job. Then, the processor executes the target job included in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.
  • In this way, the processor can execute the target job through the first node and the second node, thereby reducing the waiting time of the target job and improving its execution efficiency.
  • An embodiment of the present application also provides a device for job processing. As shown in Figures 2-4, the device includes:
  • the first determining module 510 is used to determine, when the preset processing condition is met, the first node matching the target job in each node according to the job attribute of the target job contained in the target task, where the job attribute includes the target number of arithmetic units required to execute the target job;
  • the execution module 520 is configured to execute the target job included in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.
  • the first determining module 510 is specifically configured to:
  • for each of the nodes, if the number of free arithmetic units in the node is greater than or equal to the target number, the node is determined as the first node.
  • the device further includes:
  • the obtaining module is used to obtain the idle time of the idle computing unit in each node;
  • the second determining module is used to determine that the preset processing condition is met if there is a task containing multiple jobs waiting to be executed in the splittable task list and there is an idle duration greater than or equal to the preset duration threshold among the idle durations of the idle computing units.
  • the device further includes:
  • the third determining module is used to obtain the target task to be executed, and determine the dimensional information of the target task and the target number of computing units required to execute the target task;
  • the modification module is used to modify the affinity mask of the target task according to the preset affinity mask modification rule.
  • the device further includes:
  • the setting module is used to set the position corresponding to the first node to 1 in the use mask of the target task.
  • the device further includes:
  • the fourth determining module is used to trigger the execution module 520 to execute the step of executing, through the first node and the second node where the arithmetic unit that executes the target task is located, the target job included in the target task, if the bits corresponding to the first node and the second node in the affinity mask and the usage mask of the target job are both 1.
  • the affinity mask of the target job is the same as the affinity mask of the target task
  • the usage mask of the target job is the same as the usage mask of the target task
  • the CPU determines the first node matching the target job among the nodes according to the job attributes of the target job included in the target task.
  • the job attribute includes the target number of arithmetic units required to execute the target job. Then, the CPU executes the target job included in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.
  • In this way, the CPU can jointly execute the target job through the first node and the second node, thereby reducing the waiting time of the target job and improving its execution efficiency.
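  • The following C++ sketch is a minimal, hypothetical illustration of how the processing-condition check and the first-node selection performed by these modules might be expressed; the structure fields, threshold parameters, and function names are assumptions rather than the disclosed implementation:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-node view used by the job-processing decision.
struct NodeState {
    int id;
    uint32_t idleUnits;          // arithmetic units with no queued tasks
    uint64_t maxIdleDurationNs;  // longest idle duration among its idle units
};

// Preset processing condition: a splittable task is waiting and at least one
// idle unit has been idle for at least the preset duration threshold.
bool processingConditionMet(bool splittableTaskWaiting,
                            const std::vector<NodeState>& nodes,
                            uint64_t presetIdleThresholdNs) {
    if (!splittableTaskWaiting) return false;
    for (const auto& n : nodes)
        if (n.maxIdleDurationNs >= presetIdleThresholdNs) return true;
    return false;
}

// First-node selection: every node with enough idle units for the target job.
std::vector<int> firstNodesForJob(const std::vector<NodeState>& nodes,
                                  uint32_t targetUnitCount) {
    std::vector<int> firstNodes;
    for (const auto& n : nodes)
        if (n.idleUnits >= targetUnitCount) firstNodes.push_back(n.id);
    return firstNodes;
}
```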
  • A computer device provided by the present application includes a memory and a processor, the memory stores a computer program that can run on the processor, and the processor implements the method steps of the above-mentioned job processing when executing the computer program.
  • the implementation process of the method in which the processor executes the above-mentioned job processing can refer to FIG. 2-1, FIG. 2-2, FIG. 2-3, and the above description, which will not be repeated here.
  • a computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the steps of the above-mentioned job processing method are realized.
  • the above device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of units/modules in the above-mentioned embodiments is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • the above-mentioned integrated unit/module can be realized in the form of hardware or software program module.
  • the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory RRAM, dynamic random access memory DRAM, static random access memory SRAM, enhanced dynamic random access memory EDRAM, high-bandwidth memory HBM, hybrid memory cube HMC, and so on.
  • If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • The technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other media that can store program codes.
  • A method for task migration includes:
  • when it is detected that a migratable task meets the preset migration condition, a target node that matches the migratable task is determined in each node according to the task attribute of the migratable task, and the task attribute includes the target number of arithmetic units required to execute the migratable task;
  • Migrating the migratable task to the target node so as to execute the migratable task through the target node.
  • The method according to clause A1, wherein the determining the target node matching the migratable task in each node according to the task attribute of the migratable task includes:
  • if there is a candidate node containing the target number of idle arithmetic units among the nodes, the candidate node with the smallest distance from the node to which the arithmetic unit expected by the migratable task belongs is determined as the target node;
  • if there is no candidate node containing the target number of idle arithmetic units among the nodes, the node containing the first arithmetic unit with the smallest total number of tasks to be executed is determined as the target node, and the first arithmetic unit is an arithmetic unit whose executed task has the same task attribute as the migratable task.
  • Clause A4. When it is detected that the migratable task meets the preset migration condition, before the target node that matches the migratable task is determined in each node according to the task attribute of the migratable task, the method further includes:
  • if the maximum difference between the numbers of tasks expected to be executed in the arithmetic units is greater than or equal to the second preset number threshold, executing the step of determining, in each node, the target node that matches the migratable task according to the task attribute of the migratable task when it is detected that the migratable task meets the preset migration condition.
  • Clause A5. The method according to clause A1, the method further comprising:
  • acquiring the target task to be executed, and determining the task type of the target task, the task execution duration, the minimum cross-node memory access delay of the node to which the third arithmetic unit expected by the target task belongs, and the number of tasks expected to be executed in the third arithmetic unit;
  • if the task type is computation-intensive, and/or the task execution duration is greater than the minimum cross-node memory access delay, and/or the number of tasks expected to be executed in the third arithmetic unit is greater than or equal to the third preset number threshold, determining that the target task is a migratable task, and modifying the affinity mask of the target task according to the preset affinity mask modification rule.
  • Clause A6 The method according to clause A1, before the migrating the migratable task to the target node, the method further includes:
  • the position corresponding to the target node is set to 1, and the position corresponding to the node where the arithmetic unit expected by the migratable task is located is set to 0.
  • Clause A7 The method according to clause A1, before the migrating the migratable task to the target node, the method further includes:
  • the step of migrating the migratable task to the target node is performed.
  • a device for task migration which includes:
  • the first determining module is configured to determine a target node matching the migratable task in each node according to the task attribute of the migratable task when it is detected that the migratable task meets the preset migration condition, and the task The attribute includes the target number of arithmetic units required to execute the migratable task;
  • the migration module is configured to migrate the migratable task to the target node, so as to execute the migratable task through the target node.
  • A computer device, including a memory and a processor, the memory storing a computer program that can run on the processor, and the processor implementing the steps of the method of any one of clauses A1 to A7 when executing the computer program.
  • Clause A10 A computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the steps of the method described in any one of clauses A1 to A7.
  • when the preset processing condition is met, the first node matching the target job is determined in each node according to the job attributes of the target job contained in the target task, and the job attributes include the target number of arithmetic units required to execute the target job;
  • the target job included in the target task is executed through the first node and the second node where the arithmetic unit that executes the target task is located.
  • Clause B2 The method according to clause B1, wherein the determining a first node matching the target job in each node according to the job attribute of the target job contained in the target task includes:
  • for each of the nodes, if the number of idle arithmetic units in the node is greater than or equal to the target number, determining the node as the first node.
  • Clause B3. The method according to clause B1, the method further comprising:
  • acquiring the idle duration of the idle arithmetic units in each node;
  • if there is a task containing multiple jobs waiting to be executed in the splittable task list, and among the idle durations of the idle arithmetic units there is an idle duration greater than or equal to the preset duration threshold, determining that the preset processing condition is met.
  • Clause B4. The method according to clause B1, wherein, before the first node matching the target job is determined in each node according to the job attributes of the target job included in the target task when the preset processing conditions are met, the method further includes:
  • if the ratio of the product of the dimension information of the target task to the target number is greater than 1, the target task is added to the splittable task list;
  • Clause B5. The method according to clause B1, before executing the target job included in the target task through the first node and the second node where the computing unit that executes the target task is located, the method further includes :
  • in the usage mask of the target task, the position corresponding to the first node is set to 1.
  • Clause B6 The method according to clause B1, before executing the target job included in the target task through the first node and the second node where the computing unit that executes the target task is located, the method further includes :
  • if the bits corresponding to the first node and the second node are both 1 in the affinity mask and the usage mask of the target job, executing the step of executing the target job included in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.
  • a device for job processing comprising:
  • the first determining module is used to determine the first node that matches the target job in each node according to the job attributes of the target job contained in the target task when the preset processing conditions are met, and the job attributes include the target number of arithmetic units required to execute the target job;
  • the execution module is configured to execute the target job included in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.
  • Clause B9. A computer device, including a memory and a processor, the memory storing a computer program that can run on the processor, and the processor implementing the steps of the method of any one of clauses B1 to B7 when executing the computer program.
  • Clause B10 A computer-readable storage medium with a computer program stored thereon, which, when executed by a processor, implements the steps of the method described in any one of clauses B1 to B7.
  • The terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements not only includes those elements, but also includes other elements that are not explicitly listed, or also includes elements inherent to such a process, method, article, or device. Without further restrictions, an element defined by the sentence "including a..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application relates to a task migration method and apparatus, and a computer device and a readable storage medium. The method comprises: when it is detected that a migratable task satisfies a preset migration condition, determining a target node matching the migratable task from various nodes according to a task attribute of the migratable task, the task attribute comprising a target number of operation units required for executing the migratable task; and migrating the migratable task to the target node, so that the target node executes the migratable task. According to the present application, the waiting duration of the migratable task can be reduced, and the execution efficiency of the migratable task is improved. The present application relates to an operation processing method and apparatus, and a computer device and a readable storage medium.

Description

Method, device, computer equipment and readable storage medium for task migration
Related application
This application claims priority to the Chinese patent applications filed on January 7, 2020 with application number 202010012242.8, entitled "Task migration method, device, computer equipment and readable storage medium", and application number 202010012302.6, entitled "Job processing method, device, computer equipment and readable storage medium", both of which are hereby incorporated by reference in their entirety.
Technical field
This application relates to the field of computer technology, and in particular to a method, device, computer equipment, and readable storage medium for task migration, and a method, device, computer equipment, and readable storage medium for job processing.
Background technique
Currently, the NUMA (Non Uniform Memory Access Architecture) architecture is commonly used in chip designs for artificial intelligence applications. A chip based on the NUMA architecture usually includes a processor with multiple arithmetic units and multiple storage units. The multiple arithmetic units are usually divided into multiple arithmetic unit groups, each arithmetic unit group is allocated at least one storage unit, and an arithmetic unit group and its corresponding storage unit constitute a node. In this way, the reading and writing of data required by the arithmetic units in a node can be realized through the storage units in that node. During the operation of the chip, the task or job to be executed needs to be assigned to a certain node for execution, but task and job processing still has problems at present.
Summary of the invention
Based on this, in order to solve the above-mentioned problems, the present application provides a task migration method, device, computer equipment, and readable storage medium.
A method for task migration, the method including:
when it is detected that a migratable task meets a preset migration condition, determining, in each node, a target node matching the migratable task according to the task attribute of the migratable task, the task attribute including the target number of arithmetic units required to execute the migratable task;
migrating the migratable task to the target node, so as to execute the migratable task through the target node.
A device for task migration, the device including:
a first determining module, configured to determine, in each node, a target node matching the migratable task according to the task attribute of the migratable task when it is detected that the migratable task meets the preset migration condition, the task attribute including the target number of arithmetic units required to execute the migratable task;
a migration module, configured to migrate the migratable task to the target node, so as to execute the migratable task through the target node.
A computer device, including a memory and a processor, the memory storing a computer program that can run on the processor, wherein the processor implements the steps of any one of the above methods when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any one of the above methods.
This application provides a method, device, computer equipment, and readable storage medium for task migration. When the CPU detects that a migratable task meets the preset migration condition, it determines, in each node, the target node matching the migratable task according to the task attribute of the migratable task, where the task attribute includes the target number of arithmetic units required to execute the migratable task. Then, the CPU migrates the migratable task to the target node so that the migratable task is executed through the target node. In this way, when the arithmetic unit expected by the migratable task cannot execute the migratable task, or the migratable task would have to wait a long time before being executed by the expected arithmetic unit, the CPU can migrate the migratable task to the target node, thereby reducing the waiting time of the migratable task and improving its execution efficiency.
This application also provides a method, device, computer equipment, and readable storage medium for job processing.
A method for job processing, the method including:
when a preset processing condition is met, determining, in each node, a first node matching the target job according to job attributes of a target job contained in a target task, the job attributes including the target number of arithmetic units required to execute the target job;
executing the target job contained in the target task through the first node and a second node where the arithmetic unit executing the target task is located.
As an optional implementation manner, the determining, in each node, the first node matching the target job according to the job attributes of the target job contained in the target task includes:
for each of the nodes, if the number of idle arithmetic units in the node is greater than or equal to the target number, determining the node as a first node.
As an optional implementation manner, the method further includes:
acquiring the idle duration of the idle arithmetic units in each node;
if there is a task containing multiple jobs waiting to be executed in the splittable task list, and among the idle durations of the idle arithmetic units there is an idle duration greater than or equal to a preset duration threshold, determining that the preset processing condition is met.
As an optional implementation manner, before the determining, in each node, the first node matching the target job according to the job attributes of the target job contained in the target task when the preset processing condition is met, the method further includes:
acquiring the target task to be executed, and determining the dimension information of the target task and the target number of arithmetic units required to execute the target task;
if the ratio of the product of the dimension information to the target number is greater than 1, adding the target task to the splittable task list;
modifying the affinity mask of the target task according to a preset affinity mask modification rule.
As an optional implementation manner, before the executing the target job contained in the target task through the first node and the second node where the arithmetic unit executing the target task is located, the method further includes:
setting, in the usage mask of the target task, the bit corresponding to the first node to 1.
As an optional implementation manner, before the executing the target job contained in the target task through the first node and the second node where the arithmetic unit executing the target task is located, the method further includes:
if, in the affinity mask and usage mask of the target job, the bits corresponding to the first node and the second node are both 1, executing the step of executing the target job contained in the target task through the first node and the second node where the arithmetic unit executing the target task is located.
As an optional implementation manner, the affinity mask of the target job is the same as the affinity mask of the target task, and the usage mask of the target job is the same as the usage mask of the target task.
A device for job processing, the device including:
a first determining module, configured to determine, in each node, a first node matching the target job according to job attributes of a target job contained in a target task when a preset processing condition is met, the job attributes including the target number of arithmetic units required to execute the target job;
an execution module, configured to execute the target job contained in the target task through the first node and a second node where the arithmetic unit executing the target task is located.
A computer device, including a memory and a processor, the memory storing a computer program that can run on the processor, the processor implementing the steps of any one of the methods when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of any one of the methods.
The embodiments of the present application provide a method, device, computer equipment, and readable storage medium for job processing. When the preset processing condition is met, the CPU determines, in each node, the first node matching the target job according to the job attributes of the target job contained in the target task, where the job attributes include the target number of arithmetic units required to execute the target job. Then, the CPU executes the target job contained in the target task through the first node and the second node where the arithmetic unit executing the target task is located. In this way, when the target job would have to wait a long time before being executed by the arithmetic unit it is waiting for, the CPU can execute the target job jointly through the first node and the second node, thereby reducing the waiting time of the target job and improving its execution efficiency.
Description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application, and for those of ordinary skill in the art, other drawings can be obtained from the disclosed drawings without creative work.
Figure 1-1 is a schematic diagram of an intelligent processor provided by an embodiment of the application;
Figure 1-2 is a schematic flowchart of a task migration method provided by an embodiment of the application;
Figure 1-3 is a schematic structural diagram of a task migration device provided by an embodiment of the application;
Figure 1-4 is a schematic structural diagram of a computer device provided by an embodiment of the application;
Figure 2-1 is a schematic flowchart of a method for job splitting and affinity mask modification provided by an embodiment of the application;
Figure 2-2 is a schematic flowchart of a job processing method provided by an embodiment of the application;
Figure 2-3 is a schematic flowchart of a method for determining processing conditions provided by an embodiment of the application;
Figure 2-4 is a schematic structural diagram of a job processing device provided by an embodiment of the application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
It should be understood that the terms "first", "second", "third", and "fourth" in the claims, specification, and drawings of the present disclosure are used to distinguish different objects, rather than to describe a specific order. The terms "include" and "comprise" used in the specification and claims of this disclosure indicate the existence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this disclosure are only for the purpose of describing specific embodiments, and are not intended to limit the disclosure. As used in this disclosure and the claims, unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" are intended to include the plural forms. It should be further understood that the term "and/or" used in this disclosure and the claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes these combinations.
As used in this specification and the claims, the term "if" can be interpreted as "when" or "once" or "in response to determining" or "in response to detecting" depending on the context. Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted as meaning "once determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]" depending on the context.
During the operation of the chip, the task to be executed needs to be allocated to a certain node for execution. The specific allocation process is as follows: first, the memory size required to execute the task is determined, and then, according to the storage units corresponding to the nodes, a target node whose remaining memory space meets the memory size is determined. For example, the node with the largest remaining memory space may be used as the target node, or one node may be randomly selected as the target node from among the nodes whose remaining memory space is greater than the memory size. Then, based on the affinity binding principle, the task is allocated to the target node for execution.
However, in the above allocation process, because of the affinity binding principle, a task often has to wait for a long time, which seriously affects the execution efficiency of the task.
In order to solve the above technical problem, an embodiment of the present application provides a task migration method. The method can be applied to a chip, and the chip may include at least one processor. Optionally, the chip may be a chip with heterogeneous multi-processors, and may include an intelligent processor adopting the NUMA architecture and a general-purpose processor. The general-purpose processor may be a CPU (central processing unit), and the intelligent processor may be an accelerator, an IPU (Intelligent Processing Unit), a GPU (Graphics Processing Unit), or another type of intelligent processor, which is not limited in the embodiments of the present application. Specifically, the method can be applied to a chip in which the CPU executes the above task migration method to schedule multiple tasks to the intelligent processor for processing. Of course, in other embodiments, the intelligent processor of the chip can also execute the above task migration method. For the specific execution process of the task migration method in the embodiments of the present application, refer to the following description.
Optionally, the intelligent processor of the NUMA architecture further includes a processor with multiple arithmetic units and multiple storage units. The multiple arithmetic units are usually divided into multiple arithmetic unit groups, each arithmetic unit group is allocated at least one storage unit, and an arithmetic unit group and its corresponding storage unit constitute a node. The reading and writing of data required by the arithmetic units in a node can be realized through the storage units in the node, and data is read and written between different nodes through a communication interface.
Figure 1-1 is a schematic diagram of an intelligent processor with a NUMA architecture provided by an embodiment of the application. As shown in Figure 1-1, the intelligent processor contains 16 arithmetic units and 4 storage units, and is divided into 4 nodes, each node containing 4 arithmetic units and 1 storage unit. Figure 1-1 only provides a schematic diagram of an intelligent processor. In other possible implementations, each node may also contain more than four arithmetic units and one storage unit, and the storage unit may include multiple sub-storage units. For example, each node may include four sub-nodes, that is, each node may include 16 arithmetic units. Each sub-node contains four arithmetic units and one sub-storage unit, and the four sub-nodes may be arranged in the same way as the four nodes. Further, the above task allocation method can be executed among the sub-nodes of a single node; for the execution process, refer to the description of the task allocation method below.
After a task is scheduled to the software queue, the processor can allocate, according to the number of arithmetic units required to execute the task, the arithmetic units expected by the task in the node to which the storage unit storing the task data of the task belongs, and add 1 to the waiting reference count (that is, clu_wait_ref) of each arithmetic unit expected by the task. For example, as shown in Figure 1-1, if the number of arithmetic units required by the task is 2 and the storage unit storing the task data of the task is storage unit 1, the processor can determine arithmetic unit 1 and arithmetic unit 2 in node 1 as the arithmetic units expected by the task, and add 1 to the waiting reference counts of arithmetic unit 1 and arithmetic unit 2.
After the processor determines the arithmetic units that execute the task, the task is scheduled to the hardware queue, and the real reference count (that is, clu_real_ref) of each arithmetic unit that executes the task is incremented by 1. For example, as shown in Figure 1-1, after the processor determines that the arithmetic units executing the task are arithmetic unit 1 and arithmetic unit 2, the processor can add 1 to the real reference counts of arithmetic unit 1 and arithmetic unit 2.
When the execution of the task is completed, the waiting reference count of each arithmetic unit expected by the task is decreased by 1, and at the same time, the real reference count of each arithmetic unit that executes the task is decreased by 1. For example, after arithmetic unit 1 and arithmetic unit 2 have executed the task, the processor can decrease the waiting reference counts and real reference counts of arithmetic unit 1 and arithmetic unit 2 by 1. If the arithmetic units expected by the task are migrated, the waiting reference count of each source arithmetic unit expected by the task is decreased by 1, and the waiting reference count of each destination arithmetic unit expected by the task is increased by 1. For example, as shown in Figure 1-1, if the arithmetic units expected by the task are migrated from arithmetic unit 1 and arithmetic unit 2 to arithmetic unit 3 and arithmetic unit 4, the processor decreases the waiting reference counts of arithmetic unit 1 and arithmetic unit 2 by 1, and increases the waiting reference counts of arithmetic unit 3 and arithmetic unit 4 by 1.
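The following C++ sketch illustrates, under assumed names and data structures, the reference-count bookkeeping described above (clu_wait_ref and clu_real_ref); it is a simplified illustration rather than the disclosed implementation:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-arithmetic-unit bookkeeping mirroring the description.
struct ArithmeticUnit {
    uint32_t clu_wait_ref = 0;  // tasks that expect to run on this unit
    uint32_t clu_real_ref = 0;  // tasks actually queued to run on this unit
};

// Task enters the software queue: its expected units gain a waiting reference.
void onScheduledToSoftwareQueue(const std::vector<ArithmeticUnit*>& expectedUnits) {
    for (auto* u : expectedUnits) ++u->clu_wait_ref;
}

// Executing units are chosen and the task enters the hardware queue.
void onScheduledToHardwareQueue(const std::vector<ArithmeticUnit*>& executingUnits) {
    for (auto* u : executingUnits) ++u->clu_real_ref;
}

// Task finishes: release both kinds of references.
void onTaskCompleted(const std::vector<ArithmeticUnit*>& expectedUnits,
                     const std::vector<ArithmeticUnit*>& executingUnits) {
    for (auto* u : expectedUnits)  --u->clu_wait_ref;
    for (auto* u : executingUnits) --u->clu_real_ref;
}

// Expected units are migrated: move the waiting references from the source
// units to the destination units.
void onExpectedUnitsMigrated(const std::vector<ArithmeticUnit*>& sourceUnits,
                             const std::vector<ArithmeticUnit*>& destUnits) {
    for (auto* u : sourceUnits) --u->clu_wait_ref;
    for (auto* u : destUnits)   ++u->clu_wait_ref;
}
```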
To facilitate understanding, a method for determining a migratable task provided by this application is first introduced. The specific processing process is as follows.
Step 1: Obtain the target task to be executed, and determine the task type of the target task, the task execution duration, the minimum cross-node memory access delay of the node to which the third arithmetic unit expected by the target task belongs, and the number of tasks expected to be executed in the third arithmetic unit.
In implementation, after a task is scheduled to the software queue, the processor needs to determine whether the task (that is, the target task) is a migratable task. Correspondingly, the processor can obtain the task type of the target task, the task execution duration, the minimum cross-node memory access delay of the node to which the third arithmetic unit expected by the target task belongs, and the number of tasks expected to be executed in the third arithmetic unit (that is, the waiting reference count of the third arithmetic unit), and so on. The task types can include memory-access-intensive (that is, tasks with many I/O (Input/Output) instructions that need to frequently read and write data in the storage unit during execution) and computation-intensive (that is, tasks with many computation instructions that occupy a large amount of computing resources during execution), and can also include other task types, which are not limited in the embodiments of the present application. Then, the processor can determine whether the task type of the target task is computation-intensive, whether the task execution duration of the target task is greater than the minimum cross-node memory access delay, and whether the waiting reference count of the third arithmetic unit is greater than or equal to a third preset number threshold. The third preset number threshold can be set by a technician based on experience.
Step 2: If the task type is computation-intensive, and/or the task execution duration is greater than the minimum cross-node memory access delay, and/or the number of tasks expected to be executed in the third arithmetic unit is greater than or equal to the third preset number threshold, determine that the target task is a migratable task, and modify the affinity mask of the target task according to the preset affinity mask modification rule.
In implementation, if the task type of the target task is computation-intensive, and/or the task execution duration of the target task is greater than the minimum cross-node memory access delay, and/or the waiting reference count of the third arithmetic unit is greater than or equal to the third preset number threshold, it means that migrating the target task will not affect its execution efficiency, and that the third arithmetic unit expected by the target task is relatively busy, which may affect the execution efficiency of the target task. Therefore, the processor can determine that the target task is a migratable task. Then, the processor can modify the affinity mask of the target task according to the preset affinity mask modification rule. The affinity mask of the target task is used to indicate which of the nodes can execute the target task. The affinity mask includes as many bits as the total number of nodes contained in the intelligent processor, and each bit uniquely corresponds to one node: if a bit is 1, the node corresponding to the bit can execute the target task; if a bit is 0, the node corresponding to the bit cannot execute the target task. The affinity mask modification rule can be set by a technician according to the migration range of migratable tasks.
For example, if the affinity mask modification rule is that a migratable task can be migrated to all nodes, and the original affinity mask of the target task is 0001, then if the target task is a migratable task, the processor can modify the affinity mask of the target task to 1111 according to the affinity mask modification rule. As another example, if the affinity mask modification rule is that a migratable task can be migrated to node 3 and node 4, and the original affinity mask of the target task is 0001, then if the target task is a migratable task, the processor can modify the affinity mask of the target task to 1101 according to the affinity mask modification rule.
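A minimal C++ sketch of the migratable-task check and affinity-mask modification described above is given below; it assumes the modification rule can be expressed as a bitwise OR with a preset mask covering the allowed migration range, and all field and function names are hypothetical:

```cpp
#include <cstdint>

enum class TaskType { MemoryIntensive, ComputeIntensive, Other };

// Hypothetical view of the fields the check needs.
struct TargetTask {
    TaskType type;
    uint64_t execDurationNs;         // estimated task execution duration
    uint64_t minCrossNodeLatencyNs;  // min cross-node access delay of the node
                                     // owning the expected (third) unit
    uint32_t expectedUnitWaitRef;    // clu_wait_ref of the expected unit
    uint32_t affinityMask;           // one bit per node, bit 0 = node 1
};

// Steps 1 and 2: decide whether the task is migratable and, if so, widen its
// affinity mask according to a preset modification rule.
bool markMigratableIfEligible(TargetTask& task,
                              uint32_t thirdPresetThreshold,
                              uint32_t migrationRangeMask /* preset rule */) {
    bool eligible = task.type == TaskType::ComputeIntensive
                 || task.execDurationNs > task.minCrossNodeLatencyNs
                 || task.expectedUnitWaitRef >= thirdPresetThreshold;
    if (eligible) {
        // e.g. 0b0001 -> 0b1111 (all nodes) or 0b0001 -> 0b1101 (nodes 3, 4 added)
        task.affinityMask |= migrationRangeMask;
    }
    return eligible;
}
```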
Next, to facilitate understanding, a method for judging the migration condition provided by this application is introduced. The specific processing process is as follows.
Step 1: If the task attribute of the tasks executed in the second arithmetic unit expected by the migratable task is different from the task attribute of the migratable task, determine that the migratable task meets the preset migration condition.
In implementation, after an arithmetic unit is assigned to execute a task, the arithmetic unit can only execute tasks whose task attribute is the same as that of the task, where the task attribute is the number of arithmetic units required to execute the task. Based on this principle, after the processor determines that a task is a migratable task, the processor can obtain the task attribute of the migratable task and the task attribute of the tasks executed in the second arithmetic unit expected by the migratable task. Then, the processor can determine whether the task attribute of the tasks executed in the second arithmetic unit is the same as the task attribute of the migratable task. If they are different, it means that the second arithmetic unit cannot execute the migratable task, and the processor can determine that the migratable task meets the preset migration condition. In this way, the processor can subsequently migrate the migratable task to another node that can execute it. If the task attributes are the same, the processor executes Step 2.
Step 2: If the task attribute of the tasks executed in the second arithmetic unit is the same as the task attribute of the migratable task, determine whether the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold.
In implementation, if the task attribute of the tasks executed in the second arithmetic unit is the same as the task attribute of the migratable task, it means that the second arithmetic unit can execute the migratable task. Then, the processor can further determine whether the total number of tasks to be executed in the second arithmetic unit (that is, the real reference count of the second arithmetic unit) is greater than or equal to the first preset number threshold, where the first preset number threshold can be set by a technician based on experience. If the total number of tasks to be executed in the second arithmetic unit is less than the first preset number threshold, it means that the migratable task can be executed by the second arithmetic unit without waiting a long time, and the processor can determine that the migratable task does not meet the preset migration condition. If the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold, the processor executes Step 3.
Step 3: If the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold, determine that the migratable task meets the preset migration condition.
In implementation, if the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold, it means that the migratable task would have to wait a long time before being executed by the second arithmetic unit. Correspondingly, the processor can determine that the migratable task meets the preset migration condition. In this way, the processor migrates the migratable task to another node, thereby reducing the waiting time of the migratable task and improving its execution efficiency. If the total number of tasks to be executed in the second arithmetic unit is less than the first preset number threshold, it means that the migratable task can be executed by the second arithmetic unit without waiting a long time. Correspondingly, the processor can determine that the migratable task does not meet the preset migration condition.
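The three-step migration-condition check above can be summarized roughly as follows; the structure and names are assumptions, not the disclosed implementation:

```cpp
#include <cstdint>

// Hypothetical state of the second (expected) arithmetic unit.
struct ExpectedUnitState {
    uint32_t runningTaskAttribute;  // units required by the tasks it runs
    uint32_t pendingTaskCount;      // total tasks to be executed (clu_real_ref)
};

bool meetsMigrationCondition(uint32_t migratableTaskAttribute,
                             const ExpectedUnitState& secondUnit,
                             uint32_t firstPresetThreshold) {
    // Step 1: attribute mismatch means the expected unit cannot run the task.
    if (secondUnit.runningTaskAttribute != migratableTaskAttribute)
        return true;
    // Steps 2 and 3: the unit could run it, but its backlog is too long.
    return secondUnit.pendingTaskCount >= firstPresetThreshold;
}
```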
The task migration method provided by this application is described in detail below with reference to specific embodiments. As shown in Figure 1-2, the specific steps are as follows.
Step 201: When it is detected that a migratable task meets the preset migration condition, determine, in each node, a target node matching the migratable task according to the task attribute of the migratable task, where the task attribute includes the target number of arithmetic units required to execute the migratable task.
In implementation, after a task is scheduled to the software queue in the chip, the processor can determine whether the task is a migratable task. If the task is a migratable task, the processor can further detect whether the migratable task meets the preset migration condition. When the processor detects that the migratable task meets the preset migration condition, it can determine, in each node, the target node matching the migratable task according to the task attribute of the migratable task, where the task attribute includes the target number of arithmetic units required to execute the migratable task. Optionally, the target number of arithmetic units required by the target task can be represented by a task identifier, which may indicate a Block task or a Union task, and is not specifically limited here. When the task identifier indicates a Union task, the system can determine the target number of arithmetic units according to the value of Union. For example, when Union=1, running the target task requires the four arithmetic units in one node; when Union=2, it requires the eight arithmetic units in two nodes; when Union=3, it requires the twelve arithmetic units in three nodes; and when Union=4, it requires the sixteen arithmetic units in four nodes. When the task identifier indicates a Block task, running the target task requires one arithmetic unit.
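Assuming four arithmetic units per node as in Figure 1-1, the mapping from task identifier to required arithmetic units described above might be sketched as follows (the function and parameter names are hypothetical):

```cpp
#include <stdexcept>

// Hypothetical mapping following the Union/Block convention described above,
// assuming 4 arithmetic units per node.
constexpr int kUnitsPerNode = 4;

int requiredUnits(bool isBlockTask, int unionValue /* 1..4, ignored for Block */) {
    if (isBlockTask) return 1;                 // a Block task needs one unit
    if (unionValue < 1 || unionValue > 4)
        throw std::invalid_argument("unsupported Union value");
    return unionValue * kUnitsPerNode;         // Union=N needs N full nodes
}
```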
Optionally, the specific process by which the processor determines, in each node, the target node matching the migratable task according to the task attribute of the migratable task is as follows.
Step 1: If there is a candidate node containing the target number of idle arithmetic units among the nodes, determine, among the candidate nodes, the candidate node with the smallest distance from the node to which the arithmetic unit expected by the migratable task belongs as the target node.
In implementation, when the processor detects that the migratable task meets the preset migration condition, it can first determine whether there is a candidate node containing the target number of idle arithmetic units (that is, arithmetic units whose real reference count equals 0) among the nodes. If there is a candidate node, the processor can determine, among the candidate nodes, the candidate node with the smallest distance from the node to which the arithmetic unit expected by the migratable task belongs as the target node. In this way, the processor can subsequently migrate the migratable task to the target node and execute it through the idle arithmetic units in the target node, thereby reducing the waiting time of the migratable task and improving its execution efficiency.
For example, if the node to which the arithmetic unit expected by the migratable task belongs is node 1, the candidate nodes are node 1 and node 2, and the distances from node 1 to node 1 and node 2 are 0 and 1 respectively, then the target node is node 1. As another example, if the node to which the arithmetic unit expected by the migratable task belongs is node 1, the candidate nodes are node 2 and node 4, and the distances from node 1 to node 2 and node 4 are 1 and 2 respectively, then the target node is node 2.
It should be noted that if there are multiple candidate nodes with the smallest distance, the processor can determine the target node among them in ascending or descending order of node identifiers. Alternatively, when there are multiple candidate nodes with the smallest distance, one of them can be randomly selected as the target node, which is not specifically limited here.
Step 2: If there is no candidate node containing the target number of idle arithmetic units among the nodes, determine the node containing the first arithmetic unit with the smallest total number of tasks to be executed as the target node, where the first arithmetic unit is an arithmetic unit whose executed task has the same task attribute as the migratable task.
In implementation, after an arithmetic unit is assigned to execute a task, the arithmetic unit can only execute tasks whose task attribute is the same as that of the task. Based on this principle, if there is no candidate node containing the target number of idle arithmetic units among the nodes, the processor can further determine, in each node, the first arithmetic units whose executed tasks have the same task attribute as the migratable task. Then, the processor can determine, among the first arithmetic units, the first arithmetic unit with the smallest total number of tasks to be executed (that is, the smallest real reference count), and use the node to which this first arithmetic unit belongs as the target node. In this way, the processor can subsequently migrate the migratable task to the target node and execute it through the target arithmetic units in the target node, thereby reducing the waiting time of the migratable task and improving its execution efficiency. For example, the number of arithmetic units required to execute the migratable task is 3; in node 1, the arithmetic units executing tasks that require 3 arithmetic units are arithmetic units 1 to 3, with a total of 10 tasks to be executed; in node 2, they are arithmetic units 6 to 8, with a total of 15 tasks to be executed; and in node 4, they are arithmetic units 13 to 15, with a total of 5 tasks to be executed. Then the target node is node 4, to which arithmetic units 13 to 15 belong.
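A rough C++ sketch of the two-step target-node selection is shown below; the node and unit structures, the placeholder distance function, and all names are assumptions made only for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <optional>
#include <vector>

struct Unit {
    uint32_t realRef;        // tasks queued on this unit (clu_real_ref)
    uint32_t taskAttribute;  // attribute (unit count) of the tasks it runs
};

struct Node {
    int id;
    std::vector<Unit> units;
};

// Placeholder NUMA distance; a real system would read this from topology info.
int nodeDistance(int fromNodeId, int toNodeId) { return std::abs(fromNodeId - toNodeId); }

std::optional<int> pickTargetNode(const std::vector<Node>& nodes,
                                  int expectedNodeId,
                                  std::size_t targetUnitCount,
                                  uint32_t taskAttribute) {
    // Step 1: prefer a candidate node with enough idle units (realRef == 0),
    // choosing the one closest to the node the task originally expected.
    std::optional<int> best;
    int bestDist = std::numeric_limits<int>::max();
    for (const auto& node : nodes) {
        auto idle = static_cast<std::size_t>(std::count_if(
            node.units.begin(), node.units.end(),
            [](const Unit& u) { return u.realRef == 0; }));
        int dist = nodeDistance(expectedNodeId, node.id);
        if (idle >= targetUnitCount && dist < bestDist) {
            bestDist = dist;
            best = node.id;
        }
    }
    if (best) return best;

    // Step 2: otherwise pick the node owning the least-loaded unit among units
    // already running tasks with the same task attribute.
    uint32_t minPending = std::numeric_limits<uint32_t>::max();
    for (const auto& node : nodes)
        for (const auto& u : node.units)
            if (u.taskAttribute == taskAttribute && u.realRef < minPending) {
                minPending = u.realRef;
                best = node.id;
            }
    return best;  // empty if no suitable unit exists
}
```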
As an optional implementation, since migrating a task affects the execution of other tasks, before the processor, upon detecting that the migratable task meets the preset migration condition, determines the target node matching the migratable task among the nodes according to the task attributes of the migratable task, it may first check whether the arithmetic units in the intelligent processor are unevenly loaded. The specific process is as follows.

Step 1: obtain the number of tasks expected to be executed by each arithmetic unit.

In implementation, the processor may obtain the number of tasks expected to be executed by each arithmetic unit (that is, the waiting reference count of each unit). The processor may then determine the maximum and minimum waiting reference counts among all units, compute their difference (the maximum difference), and check whether this difference is greater than or equal to a second preset number threshold, which can be set empirically by a technician. If the maximum difference is smaller than the second preset number threshold, the arithmetic units in the intelligent processor are not unevenly loaded and no task migration is needed. If the maximum difference is greater than or equal to the second preset number threshold, the arithmetic units are unevenly loaded and the processor performs step 2.

Step 2: if the maximum difference between the numbers of tasks expected to be executed by the arithmetic units is greater than or equal to the second preset number threshold, then, when it is detected that the migratable task meets the preset migration condition, the target node matching the migratable task is determined among the nodes according to the task attributes of the migratable task.

In implementation, if the maximum difference is greater than or equal to the second preset number threshold, the arithmetic units in the intelligent processor are unevenly loaded. When the processor detects that the migratable task meets the preset migration condition, it determines the target node matching the migratable task among the nodes according to the task attributes of the migratable task. This determination process is similar to step 201 and is not repeated here.
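A hedged sketch of the imbalance check in steps 1 and 2 might look as follows; the field names and the example threshold are assumptions of this illustration.

```python
# Illustrative check for load imbalance based on waiting reference counts;
# the threshold value and field names are assumptions for this sketch.
def is_unbalanced(units, second_threshold):
    waiting = [u.waiting_ref_count for u in units]
    return max(waiting) - min(waiting) >= second_threshold

# Migration is only considered when the imbalance check passes, e.g.:
# if is_unbalanced(all_units, second_threshold=4):
#     target = select_target_node(nodes, task, distance)
```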
Step 202: migrate the migratable task to the target node, so as to execute the migratable task through the target node.

In implementation, after determining the target node, the processor may migrate the migratable task to the target node so that the migratable task is executed through the target node.
As an optional implementation, before migrating the migratable task to the target node, the processor may also modify the usage mask of the migratable task. Specifically, if the target node differs from the node where the arithmetic unit expected by the migratable task is located, the bit corresponding to the target node in the usage mask of the migratable task is set to 1, and the bit corresponding to the node where the expected arithmetic unit is located is set to 0.

In implementation, the usage mask (usage_mask) of a migratable task indicates which nodes are determined to execute the task. The usage mask contains one bit for each node of the chip, and each bit uniquely corresponds to a node: a bit of 1 means the corresponding node is determined to execute the migratable task, and a bit of 0 means it does not. After determining the target node of the migratable task, the processor may check whether the target node is the same as the node where the arithmetic unit expected by the migratable task is located. If they are the same, the usage mask need not be modified. If they differ, the processor may set the bit corresponding to the target node in the usage mask to 1 and the bit corresponding to the node of the expected arithmetic unit to 0. For example, if the arithmetic unit expected by the migratable task is located in node 1, the original usage mask of the task is 0001; assuming the target node is node 2, the modified usage mask is 0010.
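The usage-mask update can be illustrated with a small sketch; the bit layout (node 1 at the least significant bit) is an assumption carried over from the example above.

```python
# Sketch of the usage-mask update; bit i of the mask stands for node i,
# with node 1 at the least significant bit (an assumption of this example).
def update_usage_mask(usage_mask, target_node_id, expected_node_id):
    if target_node_id == expected_node_id:
        return usage_mask                         # no change needed
    usage_mask |= 1 << (target_node_id - 1)       # set the target node's bit
    usage_mask &= ~(1 << (expected_node_id - 1))  # clear the expected node's bit
    return usage_mask

# Example from the text: original mask 0b0001 (node 1), target node 2 -> 0b0010.
assert update_usage_mask(0b0001, target_node_id=2, expected_node_id=1) == 0b0010
```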
As an optional implementation, before migrating the migratable task to the target node, the processor may also use the affinity mask and the usage mask of the migratable task to determine whether the task can be migrated to the target node. Specifically, if the bits corresponding to the target node in both the affinity mask and the usage mask of the migratable task are 1, the migratable task is migrated to the target node.

In implementation, before migrating the migratable task to the target node, the processor may check whether the bits corresponding to the target node in both the affinity mask and the usage mask of the migratable task are 1. If both bits are 1, the migratable task can be migrated to the target node, and the processor migrates it accordingly. If the bit corresponding to the target node in the affinity mask is 0, the migratable task cannot be migrated to the target node.
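For completeness, the pre-migration check described above might be sketched as follows, again assuming the same bit layout.

```python
# Sketch of the pre-migration check: the task may move only when the target
# node's bit is set in both masks. Bit numbering follows the previous sketch.
def may_migrate(affinity_mask, usage_mask, target_node_id):
    bit = 1 << (target_node_id - 1)
    return bool(affinity_mask & bit) and bool(usage_mask & bit)
```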
In the task migration method provided by the embodiments of the present application, when the processor detects that a migratable task meets the preset migration condition, it determines the target node matching the migratable task among the nodes according to the task attributes of the migratable task, where the task attributes include the target number of arithmetic units required to execute the migratable task. The processor then migrates the migratable task to the target node so that it is executed through the target node. In this way, when the arithmetic unit expected by the migratable task cannot execute it, or the task would have to wait a long time before being executed by that unit, the processor can migrate the task to the target node, thereby reducing the waiting time and improving the execution efficiency of the migratable task.
An embodiment of the present application further provides a task migration apparatus. As shown in Figure 1-3, the apparatus includes:

a first determining module 310, configured to determine, when it is detected that a migratable task meets a preset migration condition, a target node matching the migratable task among the nodes according to the task attributes of the migratable task, the task attributes including the target number of arithmetic units required to execute the migratable task;

a migration module 320, configured to migrate the migratable task to the target node, so as to execute the migratable task through the target node.
As an optional implementation, the first determining module 310 is specifically configured to:

if any node is a candidate node containing the target number of idle arithmetic units, determine, among the candidate nodes, the candidate node with the smallest distance to the node to which the arithmetic unit expected by the migratable task belongs as the target node;

if none of the nodes is a candidate node containing the target number of idle arithmetic units, determine the node containing the first arithmetic unit with the smallest total number of tasks to be executed as the target node, where a first arithmetic unit is an arithmetic unit whose executed task has the same task attributes as the migratable task.
As an optional implementation, the apparatus further includes:

a second determining module, configured to determine that the migratable task meets the preset migration condition if the task attributes of the task executed in the second arithmetic unit expected by the migratable task differ from the task attributes of the migratable task;

a judging module, configured to judge, if the task attributes of the task executed in the second arithmetic unit are the same as those of the migratable task, whether the total number of tasks to be executed in the second arithmetic unit is greater than or equal to a first preset number threshold;

a third determining module, configured to determine that the migratable task meets the preset migration condition if the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold.
As an optional implementation, the apparatus further includes:

an obtaining module, configured to obtain the number of tasks expected to be executed by each arithmetic unit;

a fourth determining module, configured to trigger the first determining module 310 to perform the step of determining, when it is detected that the migratable task meets the preset migration condition, the target node matching the migratable task among the nodes according to the task attributes of the migratable task, if the maximum difference between the numbers of tasks expected to be executed by the arithmetic units is greater than or equal to the second preset number threshold.
As an optional implementation, the apparatus further includes:

a fifth determining module, configured to obtain a target task to be executed, and determine the task type of the target task, the task execution duration, the minimum cross-node memory access latency of the node to which the third arithmetic unit expected by the target task belongs, and the number of tasks expected to be executed in the third arithmetic unit;

a modification module, configured to determine that the target task is a migratable task and modify the affinity mask of the target task according to a preset affinity mask modification rule, if the task type is compute-intensive, and/or the task execution duration is greater than the minimum cross-node memory access latency, and/or the number of tasks expected to be executed in the third arithmetic unit is greater than or equal to a third preset number threshold.
As an optional implementation, the apparatus further includes:

a setting module, configured to set, in the usage mask of the migratable task, the bit corresponding to the target node to 1 and the bit corresponding to the node where the arithmetic unit expected by the migratable task is located to 0, if the target node differs from the node where the expected arithmetic unit is located.

As an optional implementation, the apparatus further includes:

a sixth determining module, configured to trigger the migration module 320 to perform the step of migrating the migratable task to the target node if the bits corresponding to the target node in both the affinity mask and the usage mask of the migratable task are 1.
In the task migration apparatus provided by the embodiments of the present application, when the CPU detects that a migratable task meets the preset migration condition, it determines the target node matching the migratable task among the nodes according to the task attributes of the migratable task, where the task attributes include the target number of arithmetic units required to execute the migratable task. The CPU then migrates the migratable task to the target node so that it is executed through the target node. In this way, when the arithmetic unit expected by the migratable task cannot execute it, or the task would have to wait a long time before being executed by that unit, the CPU can migrate the task to the target node, thereby reducing the waiting time and improving the execution efficiency of the migratable task.
In one embodiment, as shown in Figure 1-4, the present application further provides a computer device including a memory and a processor. The memory stores a computer program executable on the processor, and when the processor executes the computer program, the steps of the task migration method described above are implemented. For the implementation process of the task migration method executed by the processor, reference may be made to Figure 1-2 and the description above, which will not be repeated here.

In one embodiment, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the task migration method described above.
In an embodiment of the present application, a task may be split into at least one subtask (hereinafter referred to as a job), in which case different jobs may likewise be allocated to a node for execution. Because of the affinity binding principle in the allocation process described above, a job often has to wait for a long time, which seriously affects its execution efficiency.

On this basis, an embodiment of the present application further provides a job processing method, which can be applied to a chip containing an intelligent processor based on the NUMA architecture and a general-purpose processor. The general-purpose processor may be a CPU (central processing unit) or the like. The intelligent processor based on the NUMA architecture may be an accelerated processor, an IPU (Intelligent Processing Unit), a GPU (Graphics Processing Unit), or another type of processor; the embodiments of the present application impose no limitation. Specifically, the method can be applied to the above chip, and the general-purpose processor (CPU) in the chip can execute the job processing method to distribute multiple jobs to at least one arithmetic unit in the intelligent processor for execution. The specific execution process of the job processing method of the present application is described below.

Optionally, the intelligent processor based on the NUMA architecture includes a processor with multiple arithmetic units and multiple storage units. The arithmetic units are usually divided into multiple groups, each group is allocated at least one storage unit, and a group of arithmetic units together with its corresponding storage unit constitutes a node. The reading and writing of data required by the arithmetic units in a node can be carried out through the storage unit of that node, while data is read and written between different nodes through a communication interface. Figure 1-1 is a schematic diagram of an intelligent processor based on the NUMA architecture provided by an embodiment of the application. As shown in Figure 1-1, the intelligent processor contains 16 arithmetic units and 4 storage units and is divided into 4 nodes, each containing 4 arithmetic units and 1 storage unit. Figure 1-1 is only a schematic illustration; in other possible implementations, each node may contain more than four arithmetic units and one storage unit, and the storage unit may include multiple sub-storage units. For example, each node may include four sub-nodes, that is, each node may include 16 arithmetic units, with each sub-node containing four arithmetic units and one sub-storage unit, the four sub-nodes being arranged in the same manner as the four nodes. Further, the job processing method described above may also be executed among the sub-nodes of a single node; its execution process is described below in relation to the job processing method.
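As an aid to the following description only, a minimal Python data model of the layout of Figure 1-1 could look as follows; the class names, field names, and per-unit reference counts are assumptions of this sketch rather than types defined by the application.

```python
# A minimal data model of the NUMA layout in Figure 1-1, assuming 4 nodes of
# 4 arithmetic units each; names and counts are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArithmeticUnit:
    unit_id: int
    true_ref_count: int = 0      # tasks currently bound to this unit for execution
    waiting_ref_count: int = 0   # tasks expected to run on this unit

@dataclass
class Node:
    node_id: int
    units: List[ArithmeticUnit] = field(default_factory=list)

nodes = [Node(n + 1, [ArithmeticUnit(n * 4 + i + 1) for i in range(4)]) for n in range(4)]
```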
The embodiments of the present application first describe the division of jobs and the modification of the affinity mask. As shown in Figure 2-1, the specific process is as follows:

Step 201: obtain the target task to be executed, and determine the dimension information of the target task and the target number of arithmetic units required to execute it.

In implementation, when a task (the target task) has been scheduled to the software queue, the processor may determine the dimension information of the target task (that is, dimX, dimY, and dimZ) and the target number of arithmetic units required to execute it (that is, kernel_class). The processor may then compute the ratio of the product of the dimensions (dimX*dimY*dimZ) to the target number and check whether this ratio is greater than 1. If the ratio is greater than 1, the target task can be split into multiple jobs, and the processor performs step 202. If the ratio is less than or equal to 1, the target task cannot be split into multiple jobs.

Step 202: if the ratio of the product of the dimension information to the target number is greater than 1, add the target task to the splittable task list.

In implementation, if the ratio is greater than 1, the target task can be split into multiple jobs, and the processor may accordingly add it to the splittable task list. The splittable task list stores tasks that can be split into multiple jobs; it may be a linked list or another type of list, and the embodiments of the present application impose no limitation. In addition, when all jobs contained in a task in the splittable task list have been executed, the processor may delete that task from the list.
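The splittability test of steps 201 and 202 can be sketched as follows; the task object and its field names (dimX, dimY, dimZ, kernel_class) follow the terms used above but are otherwise hypothetical.

```python
# Sketch of the splittability test in steps 201-202: a task is splittable when
# dimX*dimY*dimZ exceeds kernel_class. The task object is a hypothetical stand-in.
def maybe_mark_splittable(task, splittable_tasks):
    ratio = (task.dimX * task.dimY * task.dimZ) / task.kernel_class
    if ratio > 1:
        splittable_tasks.append(task)   # the list may be a linked list in practice
        return True
    return False
```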
Optionally, when it is determined that the target task can be split into multiple jobs, the target task may be sent to a scheduler, which may split it into multiple jobs according to task attributes such as its dimension information and the target number of arithmetic units it requires. Further optionally, the scheduler may be a hardware scheduler placed on the chip, and the hardware scheduler may include multiple circuit modules such as a task splitting unit. The scheduler may of course also be a software scheduler, which is not specifically limited here.

Step 203: modify the affinity mask of the target task according to the preset affinity mask modification rule.

In implementation, after splitting the target task into multiple target jobs, the processor may modify the affinity mask of the target task according to the preset affinity mask modification rule. The affinity mask (affinity) of the target task indicates which nodes may execute the target task. The affinity mask contains one bit for each node of the intelligent processor, and each bit uniquely corresponds to a node: a bit of 1 means the corresponding node may execute the target task, and a bit of 0 means it may not. The affinity mask modification rule may be set by a technician according to the range of nodes over which jobs are processed. For example, if the rule is that the task may migrate to all nodes and the original affinity mask of the target task is 0001, the processor may modify the affinity mask of the target task to 1111 according to the rule. As another example, if the rule is that the task may migrate to node 3 and node 4 and the original affinity mask of the target task is 0001, the processor may modify it to 1101 according to the rule. Step 202 and step 203 may be performed in either order.

It should be noted that, since a target job is obtained by splitting the target task, the affinity mask of the target job is the same as the affinity mask of the target task.
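A small sketch of the affinity relaxation in step 203 follows, under the assumption that node 1 maps to the least significant bit and that the modification rule is expressed as a set of permitted node ids.

```python
# Sketch of step 203: widening the affinity mask according to a configured rule.
# Node numbering (node 1 = least significant bit) and rule format are assumptions.
def relax_affinity(affinity_mask, allowed_node_ids):
    for node_id in allowed_node_ids:
        affinity_mask |= 1 << (node_id - 1)
    return affinity_mask

# Examples from the text: 0b0001 relaxed to all four nodes gives 0b1111,
# and 0b0001 relaxed to nodes 3 and 4 gives 0b1101.
assert relax_affinity(0b0001, [1, 2, 3, 4]) == 0b1111
assert relax_affinity(0b0001, [3, 4]) == 0b1101
```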
A job processing method provided by the present application is described below with reference to specific embodiments. As shown in Figure 2-2, the specific process is as follows.

Step 301: when the preset processing condition is met, determine, among the nodes, the first node matching the target job according to the job attributes of the target job contained in the target task, where the job attributes include the target number of arithmetic units required to execute the target job.

In implementation, after the processor has scheduled a task (the target task) to the software queue, it may determine whether the target task can be split into multiple target jobs (JOBs). If it can, the target task is a task whose affinity can be relaxed, and the processor may further determine whether the preset processing condition is met; the process of making this determination is described in detail later. When the preset processing condition is met, the processor may determine, among the nodes, the first node matching the target job according to the job attributes of the target job, where the job attributes include the target number of arithmetic units required to execute the target job.

Optionally, the processor determines the first node matching the target job among the nodes according to the job attributes of the target job as follows: for each node, if the number of idle arithmetic units in that node is greater than or equal to the target number, the node is determined as a first node.

In implementation, when the preset processing condition is met, the processor may obtain, for each node, the number of idle arithmetic units (that is, units whose true reference count equals 0) in that node. The processor may then check whether this number is greater than or equal to the target number. If it is, the node can execute the target job contained in the target task, and the processor may determine it as a first node. If the number of idle arithmetic units in the node is smaller than the target number, the node cannot execute the target job and is not a first node.
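The per-node test described above admits a very short sketch; the true_ref_count field name is an assumption reused from the earlier sketches.

```python
# Sketch of the first-node test in step 301: a node qualifies when it has at
# least target_units idle arithmetic units (true reference count of 0).
def find_first_nodes(nodes, target_units):
    return [n for n in nodes
            if sum(u.true_ref_count == 0 for u in n.units) >= target_units]
```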
As an optional implementation, as shown in Figure 2-3, the process by which the processor determines whether the preset processing condition is met is as follows:

Step 401: obtain the idle duration of the idle arithmetic units in each node.

In implementation, for each node, the processor may obtain the idle duration of the idle arithmetic units (that is, units whose true reference count equals 0) in that node.

Step 402: if the splittable task list contains a task with multiple jobs waiting to be executed, and the idle durations of the idle arithmetic units include an idle duration greater than or equal to the preset duration threshold, determine that the preset processing condition is met.

In implementation, after obtaining the idle duration of each idle arithmetic unit, the processor may further check whether the splittable task list contains a task with multiple jobs waiting to be executed, and whether the idle durations of the idle arithmetic units include a duration greater than or equal to a preset duration threshold, which may be set empirically by a technician. If the splittable task list contains a task with multiple jobs waiting to be executed and some idle duration reaches the preset duration threshold, the processor can distribute the tasks waiting in the splittable task list to idle arithmetic units in the nodes for execution, and it may determine that the preset processing condition is met. If the splittable task list contains no task waiting to be executed, or no idle duration reaches the preset duration threshold, the processor cannot distribute the waiting tasks in the splittable task list to idle arithmetic units, and it may determine that the preset processing condition is not met.
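Steps 401 and 402 might be sketched as follows; the idle_time() accessor, the jobs attribute, and the threshold are assumptions of this illustration.

```python
# Sketch of steps 401-402: the preset processing condition holds when a
# multi-job task is waiting and some idle unit has been idle long enough.
# idle_time() and the threshold are assumptions of this example.
def processing_condition_met(splittable_tasks, nodes, duration_threshold):
    has_waiting_task = any(len(t.jobs) > 1 and not t.done for t in splittable_tasks)
    longest_idle = max((u.idle_time() for n in nodes for u in n.units
                        if u.true_ref_count == 0), default=0)
    return has_waiting_task and longest_idle >= duration_threshold
```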
Step 302: execute the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.

In implementation, after determining the first node, the processor may execute the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located. In this way, when the target job would otherwise have to wait a long time before being executed by the arithmetic unit it is waiting on, the processor can execute the target job jointly through the first node and the second node, thereby reducing the waiting time of the target job and improving its execution efficiency.
As an optional implementation, before executing the target job contained in the target task through the first node and the second node, the processor may also modify the usage mask of the target job according to the determined first node. Specifically, in the usage mask of the target task, the bit corresponding to the first node is set to 1.

In implementation, the usage mask (usage_mask) of the target task indicates which nodes are determined to execute the target task. The usage mask contains one bit for each node of the intelligent processor, and each bit uniquely corresponds to a node: a bit of 1 means the corresponding node is determined to execute the target task, and a bit of 0 means it does not. After the processor has determined the first node of the target job and before it schedules the target job to the hardware queue, it may set the bit corresponding to the first node to 1 in the usage mask of the target task. For example, if the original usage mask of the target task is 0001 and the first nodes are node 2 and node 4, the modified usage mask of the target task is 1011.

It should be noted that, since the target job is obtained by splitting the target task, the usage mask of the target job is the same as the usage mask of the target task.
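A sketch of this usage-mask update, reusing the bit layout assumed in the earlier examples:

```python
# Sketch of the usage-mask update before the job is pushed to the hardware
# queue: each determined first node has its bit set. Bit layout as before.
def mark_first_nodes(usage_mask, first_node_ids):
    for node_id in first_node_ids:
        usage_mask |= 1 << (node_id - 1)
    return usage_mask

# Example from the text: original mask 0b0001, first nodes 2 and 4 -> 0b1011.
assert mark_first_nodes(0b0001, [2, 4]) == 0b1011
```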
As an optional implementation, before executing the target job contained in the target task through the first node and the second node, the processor may also determine, according to the affinity mask and the usage mask of the target job, whether the first node and the second node can execute the target job. Specifically, if the bits corresponding to the first node and the second node in both the affinity mask and the usage mask of the target job are 1, the step of executing the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located is performed.

In implementation, after obtaining the affinity mask and the usage mask of the target job, the processor may check, for each first node, whether the bits corresponding to that node in both masks are 1. If both bits are 1, the first node can execute the target job. Likewise, for the second node where the arithmetic unit that executes the target task is located, the processor may check whether the corresponding bits in both masks are also 1; if so, the second node can execute the target job. Accordingly, the processor can execute the target job contained in the target task through the first node and the second node. If a bit corresponding to a first node is 0, that first node cannot execute the target job, and the processor will execute the target job through the second node only. For example, if the affinity mask of the target job is 1101 and the usage mask is 1001, the first nodes are node 2 and node 4, and the second node is node 1, then the bits corresponding to node 1 are both 1 and the bits corresponding to node 4 are both 1, while the bit corresponding to node 2 in the affinity mask is 0; the processor can therefore execute the target job through node 1 and node 4.
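The combined mask check can be sketched as follows; the example reproduces the numeric case given above.

```python
# Sketch of the final check: a node executes the job only when its bit is set
# in both the affinity mask and the usage mask of the job.
def nodes_allowed_to_execute(candidate_node_ids, affinity_mask, usage_mask):
    return [nid for nid in candidate_node_ids
            if affinity_mask & usage_mask & (1 << (nid - 1))]

# Example from the text: affinity 0b1101, usage 0b1001, first nodes {2, 4},
# second node 1 -> the job runs on nodes 1 and 4 (node 2 fails the affinity bit).
assert nodes_allowed_to_execute([1, 2, 4], 0b1101, 0b1001) == [1, 4]
```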
In the job processing method provided by the embodiments of the present application, when the preset processing condition is met, the processor determines, among the nodes, the first node matching the target job according to the job attributes of the target job contained in the target task, where the job attributes include the target number of arithmetic units required to execute the target job. The processor then executes the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located. In this way, when the target job would otherwise have to wait a long time before being executed by the arithmetic unit it is waiting on, the processor can execute the target job jointly through the first node and the second node, thereby reducing the waiting time of the target job and improving its execution efficiency.
An embodiment of the present application further provides a job processing apparatus. As shown in Figure 2-4, the apparatus includes:

a first determining module 510, configured to determine, when the preset processing condition is met, the first node matching the target job among the nodes according to the job attributes of the target job contained in the target task, the job attributes including the target number of arithmetic units required to execute the target job;

an execution module 520, configured to execute the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.

As an optional implementation, the first determining module 510 is specifically configured to:

for each node, if the number of idle arithmetic units in that node is greater than or equal to the target number, determine the node as a first node.
As an optional implementation, the apparatus further includes:

an obtaining module, configured to obtain the idle duration of the idle arithmetic units in each node;

a second determining module, configured to determine that the preset processing condition is met if the splittable task list contains a task with multiple jobs waiting to be executed and the idle durations of the idle arithmetic units include an idle duration greater than or equal to the preset duration threshold.

As an optional implementation, the apparatus further includes:

a third determining module, configured to obtain the target task to be executed, and determine the dimension information of the target task and the target number of arithmetic units required to execute it;

an adding module, configured to add the target task to the splittable task list if the ratio of the product of the dimension information to the target number is greater than 1;

a modification module, configured to modify the affinity mask of the target task according to the preset affinity mask modification rule.
As an optional implementation, the apparatus further includes:

a setting module, configured to set the bit corresponding to the first node to 1 in the usage mask of the target task.

As an optional implementation, the apparatus further includes:

a fourth determining module, configured to trigger the execution module 520 to perform the step of executing the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located, if the bits corresponding to the first node and the second node in both the affinity mask and the usage mask of the target job are 1.

As an optional implementation, the affinity mask of the target job is the same as the affinity mask of the target task, and the usage mask of the target job is the same as the usage mask of the target task.

In the job processing apparatus provided by the embodiments of the present application, when the preset processing condition is met, the CPU determines, among the nodes, the first node matching the target job according to the job attributes of the target job contained in the target task, where the job attributes include the target number of arithmetic units required to execute the target job. The CPU then executes the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located. In this way, when the target job would otherwise have to wait a long time before being executed by the arithmetic unit it is waiting on, the CPU can execute the target job jointly through the first node and the second node, thereby reducing the waiting time of the target job and improving its execution efficiency.
In one embodiment, as shown in Figure 1-4, a computer device provided by the present application includes a memory and a processor. The memory stores a computer program executable on the processor, and when the processor executes the computer program, the steps of the job processing method described above are implemented. For the implementation process of the job processing method executed by the processor, reference may be made to Figure 2-1, Figure 2-2, Figure 2-3, and the description above, which will not be repeated here.

In one embodiment, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the job processing method described above.
It should be noted that, for the sake of brevity, the foregoing method embodiments are all described as a series of action combinations, but those skilled in the art should understand that the present disclosure is not limited by the described order of actions, because according to the present disclosure some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

It should further be noted that, although the steps in the flowcharts of Figure 1-2, Figure 2-1, Figure 2-2, and Figure 2-3 are displayed in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

It should be understood that the above apparatus embodiments are only illustrative, and the apparatus of the present disclosure may also be implemented in other ways. For example, the division of units/modules in the above embodiments is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not implemented.

In addition, unless otherwise specified, the functional units/modules in the embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules may be implemented in the form of hardware or in the form of software program modules.

If an integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, and so on. The physical implementation of the hardware structure includes but is not limited to transistors, memristors, and so on. Unless otherwise specified, the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and so on. Unless otherwise specified, the storage unit may be any appropriate magnetic storage medium or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random-access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), and so on.

If an integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disk.

In the above embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily; for brevity, not all possible combinations of these technical features are described, but as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The foregoing may be better understood according to the following clauses:

Clause A1. A task migration method, the method comprising:

when it is detected that a migratable task meets a preset migration condition, determining, among the nodes, a target node matching the migratable task according to the task attributes of the migratable task, the task attributes including the target number of arithmetic units required to execute the migratable task;

migrating the migratable task to the target node, so as to execute the migratable task through the target node.

Clause A2. The method according to clause A1, wherein the determining, among the nodes, a target node matching the migratable task according to the task attributes of the migratable task comprises:

if any node is a candidate node containing the target number of idle arithmetic units, determining, among the candidate nodes, the candidate node with the smallest distance to the node to which the arithmetic unit expected by the migratable task belongs as the target node;

if none of the nodes is a candidate node containing the target number of idle arithmetic units, determining the node containing the first arithmetic unit with the smallest total number of tasks to be executed as the target node, the first arithmetic unit being an arithmetic unit whose executed task has the same task attributes as the migratable task.
Clause A3. The method according to clause A1, the method further comprising:

if the task attributes of the task executed in the second arithmetic unit expected by the migratable task differ from the task attributes of the migratable task, determining that the migratable task meets the preset migration condition;

if the task attributes of the task executed in the second arithmetic unit are the same as the task attributes of the migratable task, judging whether the total number of tasks to be executed in the second arithmetic unit is greater than or equal to a first preset number threshold;

if the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold, determining that the migratable task meets the preset migration condition.

Clause A4. The method according to clause A1, wherein before the determining, among the nodes, a target node matching the migratable task according to the task attributes of the migratable task when it is detected that a migratable task meets the preset migration condition, the method further comprises:

obtaining the number of tasks expected to be executed by each arithmetic unit;

if the maximum difference between the numbers of tasks expected to be executed by the arithmetic units is greater than or equal to a second preset number threshold, performing the step of determining, among the nodes, the target node matching the migratable task according to the task attributes of the migratable task when it is detected that the migratable task meets the preset migration condition.
条款A5、根据条款A1所述的方法,所述方法还包括:Clause A5. The method according to clause A1, the method further comprising:
获取待执行的目标任务,并确定所述目标任务的任务类型、任务执行时长、所述目标任务所期望的第三运算单元所属的节点的最小跨节点访存延时和所述第三运算单元中期望执行的任务的数目;Obtain the target task to be executed, and determine the task type of the target task, the task execution duration, the minimum cross-node memory access delay of the node to which the third arithmetic unit is expected for the target task, and the third arithmetic unit The number of tasks expected to be performed in the
如果所述任务类型为计算密集型,和/或所述任务执行时长大于所述最小跨节点访存延时,和/或所述第三运算单元中期望执行的任务的数目大于或等于第三预设数目阈值,则确定所述目标任务为可迁移任务,并根据预设的亲和性掩码修改规则,修改所述目标任务的亲和性掩码。If the task type is computationally intensive, and/or the task execution time is greater than the minimum cross-node memory access delay, and/or the number of tasks expected to be executed in the third arithmetic unit is greater than or equal to the third If the preset number threshold is set, the target task is determined to be a transferable task, and the affinity mask of the target task is modified according to the preset affinity mask modification rule.
条款A6、根据条款A1所述的方法,所述将所述可迁移任务迁移至所述目标节点之前,所述方法还包括:Clause A6. The method according to clause A1, before the migrating the migratable task to the target node, the method further includes:
如果所述目标节点与所述可迁移任务所期望的运算单元所在的节点不相同,则在所述可迁移任务的使用掩码中,将所述目标节点对应的位置为1,并将所述可迁移任务所期望的运算单元所在的节点对应的位置为0。If the target node is not the same as the node where the computing unit expected by the migratable task is located, then in the use mask of the migratable task, the position corresponding to the target node is set to 1, and the The position corresponding to the node where the computing unit expected by the migratable task is located is 0.
Clause A7. The method according to clause A1, wherein before migrating the migratable task to the target node, the method further includes:
if the bits corresponding to the target node in both the affinity mask and the usage mask of the migratable task are 1, performing the step of migrating the migratable task to the target node.
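Under the same assumed bitmap representation, the pre-migration check of clause A7 reduces to a bitwise test, as the following sketch shows.

```python
def may_migrate(affinity_mask: int, usage_mask: int, target_node: int) -> bool:
    """Migrate only if the target node's bit is 1 in both the affinity mask
    and the usage mask of the migratable task."""
    bit = 1 << target_node
    return bool(affinity_mask & bit) and bool(usage_mask & bit)
```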
Clause A8. A device for task migration, the device including:
a first determining module configured to, when it is detected that a migratable task meets a preset migration condition, determine, in each node according to the task attribute of the migratable task, a target node that matches the migratable task, the task attribute including the target number of arithmetic units required to execute the migratable task;
a migration module configured to migrate the migratable task to the target node, so as to execute the migratable task through the target node.
Clause A9. A computer device, including a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the steps of the method according to any one of clauses A1 to A7 when executing the computer program.
Clause A10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of clauses A1 to A7.
Clause B1. A method of job processing, the method including:
when a preset processing condition is met, determining, in each node according to the job attribute of a target job contained in a target task, a first node that matches the target job, the job attribute including the target number of arithmetic units required to execute the target job;
executing the target job contained in the target task through the first node and a second node where the arithmetic unit that executes the target task is located.
Clause B2. The method according to clause B1, wherein the determining, in each node according to the job attribute of the target job contained in the target task, the first node that matches the target job includes:
for each of the nodes, if the number of idle arithmetic units in the node is greater than or equal to the target number, determining the node as the first node.
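A minimal sketch of the first-node selection in clause B2, assuming the per-node idle-unit counts are available as a mapping; the helper name and the choice of returning the first qualifying node are illustrative.

```python
def find_first_node(idle_units_per_node, target_number):
    """A node qualifies as the first node when it holds at least the target
    number of idle arithmetic units; returns the first match, or None."""
    for node_id, idle_units in idle_units_per_node.items():
        if idle_units >= target_number:
            return node_id
    return None
```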
Clause B3. The method according to clause B1, wherein the method further includes:
obtaining the maximum idle duration of the idle arithmetic units in each node;
if a task containing multiple jobs is waiting to be executed in the splittable task list, and the idle durations of the idle arithmetic units include an idle duration greater than or equal to a preset duration threshold, determining that the preset processing condition is met.
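The preset processing condition of clause B3 can be pictured as the conjunction of two tests, as in the sketch below; the list-of-dicts task representation and the threshold value are assumptions of the example.

```python
PRESET_DURATION_THRESHOLD_US = 1_000  # assumed preset duration threshold (microseconds)

def processing_condition_met(splittable_tasks, idle_durations_us):
    """The condition holds when a task with multiple jobs is waiting in the
    splittable task list and at least one idle arithmetic unit has been idle
    for at least the preset duration threshold."""
    waiting_multi_job = any(len(task["jobs"]) > 1 for task in splittable_tasks)
    long_idle_unit = any(d >= PRESET_DURATION_THRESHOLD_US for d in idle_durations_us)
    return waiting_multi_job and long_idle_unit
```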
Clause B4. The method according to clause B1, wherein before determining, in each node according to the job attribute of the target job contained in the target task, the first node that matches the target job when the preset processing condition is met, the method further includes:
obtaining a target task to be executed, and determining the dimension information of the target task and the target number of arithmetic units required to execute the target task;
if the ratio of the product of the dimension information to the target number is greater than 1, adding the target task to the splittable task list;
modifying the affinity mask of the target task according to a preset affinity mask modification rule.
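The splittability test of clause B4 compares the product of the task's dimension sizes with the number of required arithmetic units; the sketch below is one possible rendering, with the plain-list task list as an assumption.

```python
from math import prod

def enqueue_if_splittable(dims, target_number, splittable_list, task):
    """Add the task to the splittable task list when the ratio of the product
    of its dimension sizes to the target number of arithmetic units exceeds 1,
    i.e. the task contains more work items than units requested."""
    if prod(dims) / target_number > 1:
        splittable_list.append(task)
        return True
    return False

# Example: a task with dimensions (4, 2) requesting 4 units has ratio 2 > 1,
# so it would be added to the list.
```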
Clause B5. The method according to clause B1, wherein before executing the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located, the method further includes:
setting, in the usage mask of the target task, the bit corresponding to the first node to 1.
Clause B6. The method according to clause B1, wherein before executing the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located, the method further includes:
if the bits corresponding to the first node and the second node in both the affinity mask and the usage mask of the target job are 1, performing the step of executing the target job contained in the target task through the first node and the second node where the arithmetic unit that executes the target task is located.
Clause B7. The method according to clause B6, wherein the affinity mask of the target job is the same as the affinity mask of the target task, and the usage mask of the target job is the same as the usage mask of the target task.
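Clauses B5 to B7 can be read together as a mask update followed by a bitwise check before the job is dispatched to the two nodes; the sketch below combines them under the assumed bitmap representation, with the job inheriting both masks from its task.

```python
def prepare_and_check_job(affinity_mask: int, usage_mask: int,
                          first_node: int, second_node: int):
    """Set the first node's bit in the usage mask (clause B5), then require the
    bits of both the first and the second node to be 1 in the affinity mask and
    the usage mask (clause B6).  Returns (updated usage mask, may_execute)."""
    usage_mask |= (1 << first_node)                       # clause B5
    required = (1 << first_node) | (1 << second_node)
    may_execute = ((affinity_mask & required) == required
                   and (usage_mask & required) == required)
    return usage_mask, may_execute
```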
Clause B8. A device for job processing, the device including:
a first determining module configured to, when a preset processing condition is met, determine, in each node according to the job attribute of a target job contained in a target task, a first node that matches the target job, the job attribute including the target number of arithmetic units required to execute the target job;
an execution module configured to execute the target job contained in the target task through the first node and a second node where the arithmetic unit that executes the target task is located.
Clause B9. A computer device, including a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the steps of the method according to any one of clauses B1 to B7 when executing the computer program.
Clause B10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of clauses B1 to B7.
Those skilled in the art can understand that the structure shown in the figures is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.

Finally, it should also be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements that are inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but shall conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for task migration, characterized in that the method comprises:
when it is detected that a migratable task meets a preset migration condition, determining, in each node according to a task attribute of the migratable task, a target node that matches the migratable task, the task attribute including a target number of arithmetic units required to execute the migratable task;
migrating the migratable task to the target node, so as to execute the migratable task through the target node.
2. The method according to claim 1, characterized in that the determining, in each node according to the task attribute of the migratable task, the target node that matches the migratable task comprises:
if a candidate node containing the target number of idle arithmetic units exists among the nodes, determining, among the candidate nodes, the candidate node with the smallest distance to the node to which the arithmetic unit expected by the migratable task belongs as the target node;
if no candidate node containing the target number of idle arithmetic units exists among the nodes, determining the node containing the first arithmetic unit with the smallest total number of tasks to be executed as the target node, the first arithmetic unit being an arithmetic unit on which the executed tasks have the same task attribute as the migratable task.
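Purely as an illustration of the selection logic recited in this claim, the following Python sketch prefers a sufficiently idle candidate node that is closest to the node of the expected arithmetic unit and otherwise falls back to the least loaded matching node; the dictionary fields and the distance callable are assumptions of the example.

```python
def choose_target_node(nodes, target_number, expected_node_id, distance):
    """nodes: assumed list of dicts with 'id', 'idle_units' and
    'pending_on_matching_unit' (tasks queued on the unit whose executed tasks
    share the migratable task's attribute); distance(a, b) is an assumed
    inter-node distance function."""
    candidates = [n for n in nodes if n["idle_units"] >= target_number]
    if candidates:
        # Enough idle units somewhere: pick the candidate closest to the expected node.
        return min(candidates, key=lambda n: distance(n["id"], expected_node_id))["id"]
    # Otherwise: pick the node whose matching-attribute unit has the smallest backlog.
    return min(nodes, key=lambda n: n["pending_on_matching_unit"])["id"]
```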
3. The method according to claim 1, characterized in that the method further comprises:
if the task attribute of the tasks executed in the second arithmetic unit expected by the migratable task is different from the task attribute of the migratable task, determining that the migratable task meets the preset migration condition;
if the task attribute of the tasks executed in the second arithmetic unit is the same as the task attribute of the migratable task, determining whether the total number of tasks to be executed in the second arithmetic unit is greater than or equal to a first preset number threshold;
if the total number of tasks to be executed in the second arithmetic unit is greater than or equal to the first preset number threshold, determining that the migratable task meets the preset migration condition.
4. The method according to claim 1, characterized in that, before determining, in each node according to the task attribute of the migratable task, the target node that matches the migratable task when it is detected that the migratable task meets the preset migration condition, the method further comprises:
obtaining the number of tasks expected to be executed in each arithmetic unit;
if the maximum difference between the numbers of tasks expected to be executed in the respective arithmetic units is greater than or equal to a second preset number threshold, performing the step of determining, in each node according to the task attribute of the migratable task, the target node that matches the migratable task when it is detected that the migratable task meets the preset migration condition.
5. The method according to claim 1, characterized in that the method further comprises:
obtaining a target task to be executed, and determining the task type of the target task, the task execution duration, the minimum cross-node memory access latency of the node to which the third arithmetic unit expected by the target task belongs, and the number of tasks expected to be executed in the third arithmetic unit;
if the task type is compute-intensive, and/or the task execution duration is greater than the minimum cross-node memory access latency, and/or the number of tasks expected to be executed in the third arithmetic unit is greater than or equal to a third preset number threshold, determining that the target task is a migratable task, and modifying the affinity mask of the target task according to a preset affinity mask modification rule.
6. The method according to claim 1, characterized in that, before migrating the migratable task to the target node, the method further comprises:
if the target node is different from the node where the arithmetic unit expected by the migratable task is located, setting, in the usage mask of the migratable task, the bit corresponding to the target node to 1 and setting the bit corresponding to the node where the arithmetic unit expected by the migratable task is located to 0.
7. The method according to claim 1, characterized in that, before migrating the migratable task to the target node, the method further comprises:
if the bits corresponding to the target node in both the affinity mask and the usage mask of the migratable task are 1, performing the step of migrating the migratable task to the target node.
8. A device for task migration, characterized in that the device comprises:
a first determining module configured to, when it is detected that a migratable task meets a preset migration condition, determine, in each node according to the task attribute of the migratable task, a target node that matches the migratable task, the task attribute including a target number of arithmetic units required to execute the migratable task;
a migration module configured to migrate the migratable task to the target node, so as to execute the migratable task through the target node.
9. A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
PCT/CN2021/070663 2020-01-07 2021-01-07 Task migration method and apparatus, and computer device and readable storage medium WO2021139726A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010012302.6 2020-01-07
CN202010012242.8 2020-01-07
CN202010012242.8A CN113157427B (en) 2020-01-07 2020-01-07 Method, device, computer equipment and readable storage medium for task migration
CN202010012302.6A CN113157403A (en) 2020-01-07 2020-01-07 Job processing method and device, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
WO2021139726A1

Family

ID=76788446

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070663 WO2021139726A1 (en) 2020-01-07 2021-01-07 Task migration method and apparatus, and computer device and readable storage medium

Country Status (1)

Country Link
WO (1) WO2021139726A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101126992A (en) * 2006-08-15 2008-02-20 国际商业机器公司 Method and system for dispensing multiple tasks at multiple node of network
CN101889265A (en) * 2007-12-07 2010-11-17 微软公司 Kernel processor grouping
CN101458634B (en) * 2008-01-22 2011-03-16 中兴通讯股份有限公司 Load equilibration scheduling method and device
CN102473161A (en) * 2009-08-18 2012-05-23 国际商业机器公司 Decentralized load distribution to reduce power and/or cooling cost in event-driven system
CN105210038A (en) * 2013-05-15 2015-12-30 英派尔科技开发有限公司 Core affinity bitmask translation
EP3101548A1 (en) * 2015-06-03 2016-12-07 Fujitsu Limited Parallel computer, migration program and migration method
CN106844051A (en) * 2017-01-19 2017-06-13 河海大学 The loading commissions migration algorithm of optimised power consumption in a kind of edge calculations environment

Similar Documents

Publication Publication Date Title
Hashem et al. Honey bee based load balancing in cloud computing
WO2017016421A1 (en) Method of executing tasks in a cluster and device utilizing same
Farzanyar et al. Efficient mining of frequent itemsets in social network data based on MapReduce framework
Li et al. Map-Balance-Reduce: An improved parallel programming model for load balancing of MapReduce
CN108549583B (en) Big data processing method and device, server and readable storage medium
WO2017112077A1 (en) Optimizing skewed joins in big data
US20110265098A1 (en) Message Passing with Queues and Channels
WO2016177279A1 (en) Data processing method and system
CN106250233B (en) MapReduce performance optimization system and optimization method
CN109388486B (en) Data placement and migration method for heterogeneous memory and multi-type application mixed deployment scene
CN108427602B (en) Distributed computing task cooperative scheduling method and device
CN104182278A (en) Method and device for judging busy degree of computer hardware resource
Grosof et al. Optimal scheduling in the multiserver-job model under heavy traffic
Yu et al. Cloud task scheduling algorithm based on three queues and dynamic priority
Farzanyar et al. Accelerating frequent itemsets mining on the cloud: a MapReduce-based approach
US8543722B2 (en) Message passing with queues and channels
Slagter et al. SmartJoin: a network-aware multiway join for MapReduce
WO2021139726A1 (en) Task migration method and apparatus, and computer device and readable storage medium
CN113157427B (en) Method, device, computer equipment and readable storage medium for task migration
CN110175172A (en) Very big two points of groups parallel enumerating method based on sparse bipartite graph
Mao et al. A fine-grained and dynamic MapReduce task scheduling scheme for the heterogeneous cloud environment
CN113157403A (en) Job processing method and device, computer equipment and readable storage medium
US20210149746A1 (en) Method, System, Computer Readable Medium, and Device for Scheduling Computational Operation Based on Graph Data
Rad et al. Brain drain optimization: a novel approach for task scheduling in the cloud computing
Bengre et al. A learning-based scheduler for high volume processing in data warehouse using graph neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21738158

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21738158

Country of ref document: EP

Kind code of ref document: A1
