WO2024055708A1 - Task scheduling method, apparatus, device and medium - Google Patents

Task scheduling method, apparatus, device and medium

Info

Publication number: WO2024055708A1
Authority: WIPO (PCT)
Prior art keywords: task, scheduled, tasks, interval, resources
Application number: PCT/CN2023/104517
Other languages: English (en), French (fr)
Inventors: 冯牧祎, 于涌, 韩栋
Original Assignee: 上海寒武纪信息科技有限公司
Application filed by 上海寒武纪信息科技有限公司
Publication of WO2024055708A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • The present disclosure relates to the field of chip computing technology, and in particular to a task scheduling method, apparatus, device and medium.
  • AI: Artificial Intelligence
  • RAM: Random Access Memory
  • AI computing chips use a cache to hold the data required by the current computing task, thereby reducing the latency incurred when the computing units in the chip access RAM.
  • The present disclosure provides a task scheduling method, apparatus, device and medium to improve chip computing efficiency.
  • the present disclosure provides a task scheduling method, including:
  • for each interval task of the task to be scheduled, occupying interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously scheduling each subtask in the interval task to the affinity resource corresponding to that subtask until all subtasks under the interval task are scheduled; the interval affinity resources comprise the set of affinity resources corresponding to all subtasks in the interval task, and the occupied resources are not visible to other tasks.
  • the method further includes:
  • if all subtasks in the interval task have been scheduled, the interval affinity resources occupied by the interval task are released, and the released resources are visible to other tasks.
  • occupying interval affinity resources for the interval task from the resources currently visible to the task to be scheduled includes: obtaining the resource mask corresponding to each subtask in the interval task, where the resource mask represents the affinity resource corresponding to that subtask; and using the resources represented by the mask, among the resources currently visible to the task to be scheduled, as the interval affinity resources and occupying them for the interval task.
  • the method further includes:
  • the number of consecutive schedulings is obtained by parsing the task scheduling instruction and recorded in a first register;
  • the data in a second register is updated based on a first update condition, where the data in the second register represents the number of completed consecutive schedulings; the first update condition includes: if the current number of successful consecutive schedulings reaches the number recorded in the first register, or after the step of splitting the task to be scheduled into multiple subtasks is executed, or a termination-scheduling instruction for the task to be scheduled is received, or the maximum degree of parallelism is currently reached, the second register is cleared; otherwise, the second register is incremented by 1 each time a subtask completes scheduling;
  • the data in a third register is updated based on a second update condition, where the data in the third register represents the interval affinity resources that the interval task currently needs to occupy; the second update condition includes: if the current number of successful consecutive schedulings reaches the number recorded in the first register, or after the step of splitting the task to be scheduled into multiple subtasks is executed, or a termination-scheduling instruction for the task to be scheduled is received, or the maximum degree of parallelism is currently reached, the data in the third register is cleared; otherwise, the affinity resources of a subtask are deleted from the third register each time that subtask completes scheduling.
  • continuously scheduling each subtask in the interval task to the affinity resource corresponding to that subtask includes: if the data in the second register has not reached the number of consecutive schedulings in the first register, determining the unscheduled subtasks in the interval task; and selecting a subtask to be scheduled from the unscheduled subtasks and scheduling it to the affinity resource corresponding to that subtask, until all subtasks in the interval task are scheduled.
  • the tasks issued by the software have priorities; and determining the tasks to be scheduled from the tasks issued by the software includes:
  • according to the currently available resources and in order of priority from high to low, at least one task is selected from the tasks corresponding to each priority as a task to be scheduled, until the currently available resources no longer include the affinity resources corresponding to any task; the affinity resources corresponding to different tasks to be scheduled are different, and the available resources include resources other than the affinity resources corresponding to all current tasks to be scheduled.
  • selecting at least one task from tasks corresponding to each priority level as the to-be-scheduled task includes:
  • if there are different tasks with overlapping affinity resources among the tasks corresponding to a priority, the task with the highest current weight among those tasks is used as the task to be scheduled, and the weights of the other tasks among them are increased.
  • a task scheduling device including:
  • a determination module, configured to determine tasks to be scheduled from the tasks issued by the software, where the tasks issued by the software are each set with a number of consecutive schedulings;
  • a processing module configured to split the to-be-scheduled task into multiple sub-tasks, and divide the multiple sub-tasks according to the number of consecutive scheduling of the to-be-scheduled task to obtain at least one interval task of the to-be-scheduled task.
  • the processing module is further configured to, for each interval task of the task to be scheduled, occupy interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously schedule each subtask in the interval task to the affinity resource corresponding to that subtask until all subtasks under the interval task have been scheduled; the interval affinity resources comprise the set of affinity resources corresponding to all subtasks in the interval task, and the occupied resources are not visible to other tasks.
  • the processing module is further configured to release the interval affinity resources occupied by the interval task if all subtasks in the interval task have been scheduled, where the released resources are visible to other tasks.
  • the processing module is specifically configured to obtain the resource mask corresponding to the sub-task in the interval task, and the resource mask corresponding to the sub-task represents the affinity resource corresponding to the sub-task;
  • the processing module is specifically configured to use the resource represented by the mask among the resources currently visible to the task to be scheduled as the interval affinity resource, and occupy the interval affinity resource for the interval task.
  • the processing module is specifically configured to obtain the number of consecutive scheduling times by parsing the task scheduling instruction and record it in the first register;
  • the processing module is further specifically configured to update the data in the second register based on the first update condition, where the data in the second register represents the number of completed consecutive schedulings; the first update condition includes: if the current number of successful consecutive schedulings reaches the number recorded in the first register, or after the step of splitting the task to be scheduled into multiple subtasks is executed, or a termination-scheduling instruction for the task to be scheduled is received, or the maximum degree of parallelism is currently reached, the second register is cleared; otherwise, the second register is incremented by 1 after each subtask completes scheduling;
  • the processing module is further specifically configured to update the data in the third register based on the second update condition, where the data in the third register represents the interval affinity resources that the interval task currently needs to occupy; the second update condition includes: if the current number of successful consecutive schedulings reaches the number recorded in the first register, or after the step of splitting the task to be scheduled into multiple subtasks is executed, or a termination-scheduling instruction for the task to be scheduled is received, or the maximum degree of parallelism is currently reached, the data in the third register is cleared; otherwise, the affinity resources of a subtask are deleted from the third register after that subtask completes scheduling.
  • the processing module is specifically configured to determine the unscheduled subtasks in the interval task if the data in the second register does not reach the number of consecutive schedules in the first register;
  • the processing module is specifically configured to select a subtask to be scheduled from the unscheduled subtasks, and schedule the subtask to be scheduled to the affinity resource corresponding to the subtask until the interval All subtasks in the task are scheduled.
  • the processing module is specifically configured to select, according to the currently available resources and in order of priority from high to low, at least one task from the tasks corresponding to each priority as a task to be scheduled, until the currently available resources no longer include affinity resources corresponding to any task; the affinity resources corresponding to different tasks to be scheduled are different, and the available resources include resources other than the affinity resources corresponding to all current tasks to be scheduled.
  • the processing module is specifically configured to, if there are different tasks with overlapping affinity resources among the tasks corresponding to a priority, use the task with the highest current weight among those tasks as the task to be scheduled, and increase the weights of the other tasks among them.
  • the present disclosure provides an electronic device, including: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions;
  • the processor executes the computer-executable instructions stored in the memory to implement the method described above.
  • the present disclosure provides a computer-readable storage medium, which stores computer-executable instructions. When executed by a processor, the computer-executable instructions are used to implement the method as described above.
  • In the scheme of the present disclosure, a task to be scheduled is determined from the tasks issued by the software, the task to be scheduled is split into multiple subtasks, and the subtasks are divided into intervals according to the parsed number of consecutive schedulings corresponding to the task to be scheduled.
  • For each interval task, interval affinity resources are occupied from the resources currently visible to the task to be scheduled, and each subtask in the interval task is continuously scheduled to the affinity resource corresponding to that subtask until all subtasks under the interval task are scheduled.
  • This scheme divides the subtasks into intervals and continuously schedules each subtask in an interval until all subtasks under that interval are scheduled. Since the data required to execute subtasks of the same task are strongly correlated, continuously scheduling the subtasks within one interval effectively reduces the cache's accesses to RAM and the number of data switches in the cache, thereby effectively improving chip computing efficiency.
  • Figure 1 is an example of the operation data access process;
  • Figure 2 is a schematic flowchart of a task scheduling method provided by Embodiment 1 of the present disclosure;
  • Figure 3 is a schematic flowchart of another task scheduling method provided by Embodiment 1 of the present disclosure;
  • Figure 4 is a schematic flowchart of yet another task scheduling method provided by Embodiment 1 of the present disclosure;
  • Figure 5 is a schematic structural diagram of a task scheduling device provided by Embodiment 3 of the present disclosure;
  • Figure 6 is a device block diagram of a central control unit according to an exemplary embodiment;
  • Figure 7 is a schematic structural diagram of an electronic device provided by Embodiment 5 of the present disclosure.
  • FIG. 1 shows an example of the operation data access process.
  • The chip's own storage is very limited; most operation data is stored in RAM, but the latency of accessing RAM is very high.
  • When the chip performs operations, it requires a large number of computing units to complete a task, and every computing unit requires the corresponding computing data. If the computing units in the chip fetched data directly from RAM, the chip's computing efficiency would be very low. A cache is therefore added: it fetches the computing data required by the chip's current task from RAM in advance, and the computing units on the chip obtain the required computing data directly from the cache. This reduces the delay of fetching data from RAM and improves the computing efficiency of the chip.
  • Figure 2 is a schematic flowchart of a task scheduling method provided by Embodiment 1 of the present disclosure. As shown in Figure 2, the method includes:
  • Step 101: Determine the tasks to be scheduled from the tasks issued by the software.
  • the tasks issued by the software are set with a number of consecutive scheduling times;
  • Step 102: Split the task to be scheduled into multiple subtasks, and divide the subtasks according to the number of consecutive schedulings of the task to be scheduled to obtain at least one interval task of the task to be scheduled;
  • Step 103: For each interval task of the task to be scheduled, occupy interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously schedule each subtask in the interval task to the affinity resource corresponding to that subtask until all subtasks under the interval task have been scheduled; the interval affinity resources include the set of affinity resources corresponding to all subtasks in the interval task, and the occupied resources are not visible to other tasks.
  • Kernel tasks have a relatively large granularity and can be split into multiple subtasks.
  • When the software delivers a task to the chip at kernel-task granularity, it also delivers the continuous-scheduling attribute corresponding to that kernel task, i.e., the number of times the kernel task can be continuously scheduled, which is written into the kernel as a parameter.
  • the continuous scheduling properties described above can be set flexibly through software.
  • After receiving the tasks, the chip identifies, among the many kernel tasks to be executed, the task that needs to be executed at the current moment, i.e., the current task to be scheduled, and controls the continuous-scheduling interval by parsing the continuous-scheduling parameter of the current task to be scheduled. This allows the software to use the continuous-scheduling attribute flexibly.
  • Continuous scheduling refers to the uninterrupted scheduling of the subtasks within the same continuous-scheduling interval until all subtasks in that interval are scheduled.
  • For example, if the current task to be scheduled is a kernel task that can be split into 10 subtasks, job0-job9, and the continuous-scheduling attribute specified by the software is 4, then every 4 subtasks form an interval: job0-job3 is one interval, job4-job7 is another, and the remaining job8 and job9 form a third. Subtasks in the same interval are scheduled continuously without interruption.
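  • The interval division above can be sketched as follows; this is an illustrative snippet, not part of the disclosure, and the function name is an assumption.

```python
# Minimal sketch of interval division: group a kernel task's subtasks into
# intervals whose size equals the continuous-scheduling attribute.
def split_into_intervals(num_jobs: int, consecutive: int) -> list[list[str]]:
    jobs = [f"job{i}" for i in range(num_jobs)]
    return [jobs[i:i + consecutive] for i in range(0, num_jobs, consecutive)]

# 10 subtasks with a continuous-scheduling attribute of 4 yields the
# intervals [job0-job3], [job4-job7], [job8, job9] from the example above.
print(split_into_intervals(10, 4))
```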
  • In addition to requiring the relevant computing data, the chip also needs to occupy certain computing resources to complete each scheduling task.
  • the computing resources can be understood as computing units.
  • the chip needs to complete each scheduling task within a certain computing unit through the computing data corresponding to the scheduling task.
  • One IPU core is a computing unit.
  • Each kernel task is split into multiple subtasks.
  • Subtasks are of two types: block tasks and UnionX tasks. Because the task types of block tasks and UnionX tasks differ, the number of computing units they need to occupy differs: one IPU core is required to complete a block task, and X IPU clusters are required to complete a UnionX task.
  • the task types of all subtasks under the same kernel task are the same.
  • The kernel tasks issued by the software, and the subtasks split from them, are completed by certain specific computing units; the computing units that can complete all the calculations of a given task are that task's affinity resources. Besides the corresponding computing data, each task also requires computing units to be completed, and the required computing units must be the affinity resources corresponding to that task.
  • The affinity resources corresponding to each subtask in an interval are invisible to other kernel tasks, and tasks cannot occupy computing units that are not visible to them; the affinity resources corresponding to the subtasks in an interval provide computing units only for the subtasks in that interval.
  • The affinity resources corresponding to all subtasks under the same kernel task are consistent. For example, for the interval job0-job3 above, after the first subtask job0 is successfully scheduled, the affinity resources required by the jobs under the kernel to which job0 belongs, namely IPU cluster1, IPU cluster2 and IPU cluster3, are occupied and become invisible to other kernel tasks.
  • The affinity resources corresponding to a task can be much larger than the resources actually required to complete it. For example, completing job0 only needs one computing unit in IPU cluster1, that is, one IPU core; the remaining three IPU cores in IPU cluster1 then serve as affinity resources for the other subtasks in the same interval to choose from.
  • This example divides the subtasks into intervals, makes the affinity resources of all subtasks in an interval invisible to other kernel tasks, and continuously schedules each subtask in the interval until all subtasks in the interval are scheduled. Because the processing data of subtasks from the same kernel task are highly correlated (instruction addresses, data addresses, parameters and other information repeat heavily), having computing units execute subtasks from the same kernel task effectively improves the cache hit rate, thereby effectively improving chip computing efficiency.
  • If all subtasks in the interval task have been scheduled, the interval affinity resources occupied by the interval task are released, and the released resources are visible to other tasks.
  • That is, once an interval task completes scheduling, the resources it occupied are released and become visible to other kernel tasks.
  • For example, after job0-job3 are successfully scheduled to their corresponding affinity resources and their calculations are completed on those resources, the affinity resources IPU cluster1, IPU cluster2 and IPU cluster3 of job0-job3 are released and become visible to other kernel tasks.
  • Figure 3 is a schematic flowchart of another task scheduling method provided by Embodiment 1 of the present disclosure.
  • On the basis of the above embodiment, occupying interval affinity resources for the interval task from the resources currently visible to the task to be scheduled includes:
  • Step 201: Obtain the resource mask corresponding to each subtask in the interval task, where the resource mask corresponding to a subtask represents the affinity resource corresponding to that subtask;
  • Step 202: Use the resources represented by the mask, among the resources currently visible to the task to be scheduled, as the interval affinity resources, and occupy the interval affinity resources for the interval task.
  • Each Kernel task has its own corresponding affinity resource.
  • The IPU clusters included in the affinity resources corresponding to each kernel task can be represented by a resource mask.
  • The resource mask can be represented in binary. For example, for the kernel task to which job0 belongs, each IPU cluster is represented by 1 if it is an affinity resource of job0, and by 0 if it is not.
  • the resource mask used to represent the affinity resources of job0 is 0000000000001110
  • The digits of the resource mask represent, from right to left, whether IPU cluster0-IPU cluster15 are affinity resources of the kernel task. This example uses resource masks to represent the affinity resources corresponding to subtasks, so that each subtask can be accurately scheduled to its corresponding affinity resources.
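  • A small sketch of this encoding follows; it is illustrative only, and the helper names are assumptions.

```python
# Hypothetical helpers for the 16-bit IPU-cluster resource mask described
# above: bit i, counting from the right, stands for IPU cluster i.
def clusters_to_mask(clusters: set[int]) -> int:
    mask = 0
    for c in clusters:
        mask |= 1 << c
    return mask

def mask_to_clusters(mask: int) -> set[int]:
    return {i for i in range(16) if mask & (1 << i)}

# job0's affinity resources are IPU cluster1, cluster2 and cluster3:
mask = clusters_to_mask({1, 2, 3})
assert format(mask, "016b") == "0000000000001110"  # as in the example above
assert mask_to_clusters(mask) == {1, 2, 3}
```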
  • Figure 4 is a schematic flowchart of yet another task scheduling method provided by Embodiment 1 of the present disclosure; the method includes:
  • Step 301: Obtain the number of consecutive schedulings by parsing the task scheduling instruction and record it in the first register;
  • Step 302: Update the data in the second register based on the first update condition, where the data in the second register represents the number of completed consecutive schedulings. The first update condition includes: if the current number of successful consecutive schedulings reaches the number recorded in the first register, or after the step of splitting the task to be scheduled into multiple subtasks is executed, or a termination-scheduling instruction for the task to be scheduled is received, or the maximum degree of parallelism is currently reached, the second register is cleared; otherwise, the second register is incremented by 1 after each subtask completes scheduling;
  • Step 303: Update the data in the third register based on the second update condition, where the data in the third register represents the interval affinity resources that the interval task currently needs to occupy. The second update condition includes: if the current number of successful consecutive schedulings reaches the number recorded in the first register, or after the step of splitting the task to be scheduled into multiple subtasks is executed, or a termination-scheduling instruction for the task to be scheduled is received, or the maximum degree of parallelism is currently reached, the data in the third register is cleared; otherwise, the affinity resources of a subtask are deleted from the third register after that subtask completes scheduling.
  • Continuous scheduling stops when the conditions for stopping it are met. Before continuous scheduling begins, the continuous-scheduling parameter of the kernel task issued by the software must first be parsed; once parsed, it is recorded in the first register of the chip for subsequent reference. Besides the first register, the chip also contains a second register and a third register. The main task of the second register is to maintain a table indicating the current number of consecutive schedulings; during continuous scheduling this table is updated according to the first update condition. Besides the clearing conditions, the first update condition adds 1 to the count in the table each time a subtask is successfully scheduled.
  • The clearing conditions include: condition 1, the current number of successful consecutive schedulings reaches the number recorded in the first register; condition 2, the step of splitting the task to be scheduled into multiple subtasks has just been executed; condition 3, a termination-scheduling instruction for the task to be scheduled is received; condition 4, the maximum degree of parallelism is currently reached.
  • Condition 1 means that the number of consecutive schedulings has reached the continuous-scheduling parameter recorded in the first register, so all subtasks in the current continuous-scheduling interval have been scheduled successfully; the table maintained in the second register is therefore cleared to zero so that the next round of consecutive schedulings can be recorded.
  • Condition 2 means that the task to be scheduled has just been split into multiple subtasks and continuous scheduling is about to begin, which amounts to clearing the register at initialization.
  • Condition 3 means that, during the current continuous scheduling, there is a subtask that is no longer to be executed; the current consecutive-scheduling count is cleared so that the next round of consecutive schedulings can be recorded.
  • In condition 4, the maximum degree of parallelism refers to the maximum number of resources that one kernel may occupy.
  • the main task of the third register is to record the resources currently occupied by tasks to be scheduled and generate a dynamic table.
  • the dynamic table is updated based on the second update condition.
  • In addition to the clearing conditions, the second update condition specifies that when each subtask in the continuous-scheduling interval is successfully scheduled, that subtask's affinity resources are deleted from the third register.
  • For example, the affinity resources corresponding to the subtasks in the interval job0-job3 are IPU cluster1, IPU cluster2 and IPU cluster3.
  • Once these resources are occupied, the dynamic table in the third register deletes IPU cluster1, IPU cluster2 and IPU cluster3, and they become invisible to other tasks to be scheduled.
  • After a subtask finishes executing, the affinity resources it occupied become idle, so the next time the dynamic table is updated, the idle affinity resources are shown again. For example, after job3 is successfully scheduled and job0 has completed its execution, the IPU core in IPU cluster1 that job0 occupied is shown in the dynamic table.
  • This example maintains a table representing the current number of consecutive schedulings and a dynamic table of the resources the current subtask may occupy, intuitively displaying the current consecutive-scheduling count and resource occupancy and avoiding resource waste and conflicts.
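  • A software model of this three-register bookkeeping is sketched below. The hardware details are not given in the text, so the class and field names are assumptions.

```python
# Illustrative model of the three registers: reg1 holds the parsed
# consecutive-scheduling count, reg2 counts subtasks scheduled so far in
# the current interval, reg3 holds the interval affinity resources (as a
# resource mask) that still need to be occupied.
class ContinuousSchedulingState:
    def __init__(self, consecutive_count: int, interval_mask: int):
        self.reg1 = consecutive_count
        self.reg2 = 0
        self.reg3 = interval_mask

    def on_subtask_scheduled(self, subtask_mask: int) -> None:
        self.reg2 += 1               # first update condition: +1 per scheduled subtask
        self.reg3 &= ~subtask_mask   # second update condition: drop the subtask's resources
        if self.reg2 >= self.reg1:   # clearing condition 1: the interval is finished
            self.clear()

    def clear(self) -> None:
        # Also invoked on a fresh task split, on a termination-scheduling
        # instruction, or when the maximum degree of parallelism is
        # reached (clearing conditions 2-4).
        self.reg2 = 0
        self.reg3 = 0
```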
  • continuously scheduling each sub-task in the interval task to the affinity resource corresponding to the sub-task includes:
  • the table maintained in the second register is a table representing the current number of consecutive scheduling times.
  • If the count recorded in the table has not reached the parameter recorded in the first register, the interval still has the right to continue continuous scheduling.
  • In that case, a subtask to be scheduled is selected from the subtasks that have not yet been scheduled, and the current subtask to be scheduled is scheduled to its corresponding affinity resource, until all subtasks in the current continuous-scheduling interval have been scheduled.
  • This example accurately selects the subtasks to be scheduled and the currently schedulable affinity resources by referring to the table indicating the current number of consecutive scheduling times and the dynamic table of resources that can be occupied by the current subtask, to avoid resource waste and conflicts.
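  • Building on the hypothetical ContinuousSchedulingState above, the continuous-scheduling loop for one interval might look as follows; the dispatch print is a stand-in for the real scheduling of a subtask onto its affinity resource.

```python
# Continuously schedule the subtasks of one interval: keep dispatching
# while the second register has not reached the count in the first.
def schedule_interval(state: ContinuousSchedulingState,
                      pending: list[tuple[str, int]]) -> None:
    while pending and state.reg2 < state.reg1:
        name, affinity_mask = pending.pop(0)   # pick an unscheduled subtask
        print(f"dispatch {name} -> {affinity_mask:016b}")
        state.on_subtask_scheduled(affinity_mask)

state = ContinuousSchedulingState(consecutive_count=4,
                                  interval_mask=0b0000000000001110)
schedule_interval(state, [("job0", 0b0000000000000010),
                          ("job1", 0b0000000000000100)])
```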
  • In this scheme, multiple subtasks are divided into intervals, the affinity resources of all subtasks in an interval are occupied and made invisible to other kernel tasks, and each subtask in the interval is continuously scheduled until all subtasks in the interval are scheduled.
  • Affinity resources are selected accurately for subtask scheduling. Because all subtasks in a continuous-scheduling interval come from the same kernel task, the data between subtasks are strongly correlated, which effectively improves the cache hit rate and reduces the number of data switches in the cache, thereby effectively improving chip computing efficiency.
  • During continuous scheduling, the priorities of the kernel tasks must first be considered, because kernel tasks are divided into different priority levels: high-level kernel tasks occupy affinity resources before low-level kernel tasks, and among kernel tasks of the same level, kernel tasks with large granularity occupy affinity resources before kernel tasks with small granularity.
  • the tasks issued by the software have priorities; and determining the tasks to be scheduled from the tasks issued by the software includes:
  • according to the currently available resources and in order of priority from high to low, at least one task is selected from the tasks corresponding to each priority as a task to be scheduled, until the currently available resources no longer include the affinity resources corresponding to any task; the affinity resources corresponding to different tasks to be scheduled are different, and the available resources include resources other than the affinity resources corresponding to all current tasks to be scheduled.
  • That is, a kernel task with high priority occupies affinity resources and performs the corresponding operations first; only after all high-priority kernel tasks have been scheduled successfully can the low-priority kernel tasks be scheduled.
  • For kernel tasks of different priorities that need to be scheduled at the same time, the tasks to be scheduled under each priority are determined in order of priority from high to low, according to the currently allocatable affinity resources recorded in the dynamic table of the third register. When the affinity resources of tasks to be scheduled at different priorities conflict, the tasks with the higher priority are scheduled first; when they do not conflict, tasks of different priorities can be scheduled at the same time.
  • A central module, called the top layer of the task splitter (TS), receives the requirements of the kernel tasks.
  • The TS top layer generates the kernel-task affinity-resource allocation corresponding to each priority. Specifically, a bitwise OR is performed over the resource masks of the affinity resources of all kernel tasks under the same priority, which records the overall affinity resources occupied by the kernel tasks under each priority.
  • For example, the high priority includes kernel1, kernel2, kernel3, kernel4 and kernel5, whose resource masks are 1110000000000000, 0110000000000000, 0001100000000000, 0000010000000000 and 0000000110000000 respectively.
  • The bitwise OR over the resource masks of all high-priority kernel tasks yields 1111110110000000, meaning that the affinity resources required by all high-priority kernel tasks are cluster10-cluster15, cluster8 and cluster7.
  • the low priority includes kernel6, kernel7 and kernel8.
  • the resource masks are: 0000000011000000, 0000001100100000 and 0000000000011100.
  • The bitwise OR over the resource masks of all low-priority kernel tasks yields 0000001111111100, so the affinity resources required by all low-priority kernel tasks are cluster2-cluster9. The only affinity resources on which the high-priority and low-priority kernel tasks conflict are cluster8 and cluster7, which are therefore occupied by the high-priority kernel tasks first; however, kernel6 can still be completed by calling cluster6, and kernel7 by calling cluster5 and cluster9. Hence the overall affinity resources of the high-priority and low-priority kernel tasks do not conflict, and they can be scheduled at the same time.
  • kernel6, kernel7 and kernel8 therefore do not have to wait for kernel1, kernel2, kernel3, kernel4 and kernel5 to be scheduled before they are executed.
  • This example uses the TS top layer to effectively allocate affinity resources among the kernel tasks under each priority while ensuring that high-priority kernel tasks are scheduled first, maximizing the utilization of computing resources and improving the chip's computing efficiency.
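  • The mask arithmetic of this example can be reproduced directly; the snippet below is illustrative and simply repeats the numbers given above.

```python
# Resource masks of the high-priority kernels (kernel1-kernel5) and the
# low-priority kernels (kernel6-kernel8), as listed in the example above.
high = [0b1110000000000000, 0b0110000000000000, 0b0001100000000000,
        0b0000010000000000, 0b0000000110000000]
low = [0b0000000011000000, 0b0000001100100000, 0b0000000000011100]

def combined_mask(masks: list[int]) -> int:
    out = 0
    for m in masks:
        out |= m   # bitwise OR over all kernels at one priority
    return out

high_all, low_all = combined_mask(high), combined_mask(low)
print(format(high_all, "016b"))            # 1111110110000000: cluster7-8, 10-15
print(format(low_all, "016b"))             # 0000001111111100: cluster2-9
print(format(high_all & low_all, "016b"))  # overlap is cluster7 and cluster8
```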
  • selecting at least one task from tasks corresponding to each priority level as the task to be scheduled includes:
  • if there are different tasks with overlapping affinity resources among the tasks corresponding to a priority, the task with the highest current weight among those tasks is used as the task to be scheduled, and the weights of the other tasks among them are increased.
  • each kernel task has its own weight attribute.
  • The function of the weight attribute is that, among tasks of the same level, a kernel task with a higher weight is scheduled first. For example, suppose kernel1, kernel2, kernel3 and kernel4 are at the same level and all have only cluster1 as their affinity resource. Because kernel1 has the highest initial weight, kernel1 is scheduled successfully first, while kernel2, kernel3 and kernel4 fail to be scheduled because of the affinity-resource conflict; their weights are therefore increased to raise their probability of being scheduled successfully next time.
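  • A toy version of this arbitration is sketched below; the function name and the bump size are assumptions, chosen only to illustrate the rule that the losers' weights grow.

```python
# Arbitrate among same-priority kernels whose affinity resources overlap:
# the kernel with the highest weight wins, and every loser's weight is
# increased so it wins sooner on a later pass.
def arbitrate(weights: dict[str, int], bump: int = 1) -> str:
    winner = max(weights, key=weights.get)
    for name in weights:
        if name != winner:
            weights[name] += bump
    return winner

weights = {"kernel1": 4, "kernel2": 3, "kernel3": 2, "kernel4": 1}
print(arbitrate(weights))  # kernel1 is scheduled first
print(weights)             # kernel2-kernel4 now have higher weights
```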
  • This embodiment uses the priorities of kernel tasks and the weight attributes of kernel tasks at the same level to realize the orderly execution of kernel tasks and avoid disorder in their execution.
  • Figure 5 is a schematic structural diagram of a task scheduling device provided by Embodiment 3 of the present disclosure; the device includes:
  • the determination module 51 is used to determine the tasks to be scheduled from the tasks issued by the software.
  • the tasks issued by the software are set with a number of consecutive scheduling times;
  • the processing module 52 is used to split the task to be scheduled into multiple subtasks, and to divide the subtasks according to the number of consecutive schedulings of the task to be scheduled to obtain at least one interval task of the task to be scheduled;
  • the processing module 52 is also configured to, for each interval task of the task to be scheduled, occupy interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously schedule each subtask in the interval task to the affinity resource corresponding to that subtask until all subtasks under the interval task are scheduled.
  • Kernel tasks have a relatively large granularity and can be split into multiple subtasks.
  • When the software delivers a task to the chip at kernel-task granularity, it also delivers the continuous-scheduling attribute corresponding to that kernel task, i.e., the number of times the kernel task can be continuously scheduled, which is written into the kernel as a parameter.
  • the continuous scheduling properties described above can be set flexibly through software.
  • the determination module 51 will confirm the task that needs to be executed at the current moment among a large number of kernel tasks that need to be executed, that is, the current task to be scheduled.
  • the processing module 52 controls the continuous scheduling interval by parsing the continuous scheduling parameters of the current to-be-scheduled task, thereby allowing the software to flexibly use the continuous scheduling attributes.
  • Continuous scheduling refers to the uninterrupted scheduling of the subtasks within the same continuous-scheduling interval until all subtasks in that interval are scheduled.
  • For example, if the current task to be scheduled is a kernel task that can be split into 10 subtasks, job0-job9, and the continuous-scheduling attribute specified by the software is 4, then every 4 subtasks form an interval: job0-job3 is one interval, job4-job7 is another, and the remaining job8 and job9 form a third. Subtasks in the same interval are scheduled continuously without interruption.
  • In addition to requiring the relevant computing data, the chip also needs to occupy certain computing resources to complete each scheduling task.
  • the computing resources can be understood as computing units.
  • the chip needs to complete each scheduling task within a certain computing unit through the computing data corresponding to the scheduling task.
  • One IPU core is a computing unit.
  • Each kernel task is split into multiple subtasks.
  • Subtasks are of two types: block tasks and UnionX tasks. Because the task types of block tasks and UnionX tasks differ, the number of computing units they need to occupy differs: one IPU core is required to complete a block task, and X IPU clusters are required to complete a UnionX task.
  • the task types of all subtasks under the same kernel task are the same.
  • The kernel tasks issued by the software, and the subtasks split from them, are completed by certain specific computing units; the computing units that can complete all the calculations of a given task are that task's affinity resources. Besides the corresponding computing data, each task also requires computing units to be completed, and the required computing units must be the affinity resources corresponding to that task.
  • The affinity resources corresponding to each subtask in an interval are invisible to other kernel tasks, and tasks cannot occupy computing units that are not visible to them; the affinity resources corresponding to the subtasks in an interval provide computing units only for the subtasks in that interval. The affinity resources corresponding to all subtasks under the same kernel task are consistent.
  • For example, if the affinity resources required by the jobs under the kernel to which job0 belongs are IPU cluster1, IPU cluster2 and IPU cluster3, then when job0 is successfully scheduled, it occupies IPU cluster1, IPU cluster2 and IPU cluster3, which become invisible to other kernel tasks.
  • The affinity resources corresponding to a task can be much larger than the resources actually required to complete it. For example, completing job0 only needs one computing unit in IPU cluster1, that is, one IPU core; the remaining three IPU cores in IPU cluster1 then serve as affinity resources for the other subtasks in the same interval to choose from.
  • In this example, the processing module divides the subtasks into intervals, makes the affinity resources of all subtasks in an interval invisible to other kernel tasks, and continuously schedules each subtask in the interval until all subtasks in the interval are scheduled. Because the processing data of subtasks from the same kernel task are highly correlated (instruction addresses, data addresses, parameters and other information repeat heavily), having computing units execute subtasks from the same kernel task effectively improves the cache hit rate and reduces the number of data switches in the cache, thereby effectively improving chip computing efficiency.
  • the processing module 52 is also configured to release the interval affinity resources occupied by the interval task if all subtasks in the interval task have been scheduled, where the released resources are visible to other tasks.
  • That is, once an interval task completes scheduling, the resources it occupied are released and become visible to other kernel tasks.
  • For example, after job0-job3 are successfully scheduled to their corresponding affinity resources and their calculations are completed on those resources, the affinity resources IPU cluster1, IPU cluster2 and IPU cluster3 of job0-job3 are released and become visible to other kernel tasks.
  • the processing module 52 is specifically configured to obtain the resource mask corresponding to the sub-task in the interval task, where the resource mask corresponding to the sub-task represents the affinity resource corresponding to the sub-task;
  • the processing module 52 is specifically also configured to use the resource represented by the mask among the resources currently visible to the task to be scheduled as the interval affinity resource, and occupy the interval affinity resource for the interval task.
  • Each Kernel task has its own corresponding affinity resource.
  • The IPU clusters included in the affinity resources corresponding to each kernel task can be represented by a resource mask.
  • The resource mask can be represented in binary. For example, for the kernel task to which job0 belongs, each IPU cluster is represented by 1 if it is an affinity resource of job0, and by 0 if it is not.
  • the resource mask used to represent the affinity resources of job0 is 0000000000001110
  • The digits of the resource mask represent, from right to left, whether IPU cluster0-IPU cluster15 are affinity resources of the kernel task. This example uses resource masks to represent the affinity resources corresponding to subtasks, so that each subtask can be accurately scheduled to its corresponding affinity resources.
  • the processing module 52 is specifically configured to obtain the number of consecutive scheduling times by parsing the task scheduling instruction and record it in the first register;
  • the processing module 52 is specifically also configured to update the data in the second register based on the first update condition.
  • the data in the second register represents the number of completed consecutive schedulings; the first update condition includes: if the current number of successful consecutive schedulings reaches the number recorded in the first register, or after the step of splitting the task to be scheduled into multiple subtasks is executed, or a termination-scheduling instruction for the task to be scheduled is received, or the maximum degree of parallelism is currently reached, the second register is cleared; otherwise, the second register is incremented by 1 after each subtask completes scheduling;
  • the processing module 52 is specifically also configured to update the data in the third register based on the second update condition.
  • the data in the third register represents the interval affinity resources that the interval task currently needs to occupy; the second update condition includes: if the current number of successful consecutive schedulings reaches the number recorded in the first register, or after the step of splitting the task to be scheduled into multiple subtasks is executed, or a termination-scheduling instruction for the task to be scheduled is received, or the maximum degree of parallelism is currently reached, the data in the third register is cleared; otherwise, the affinity resources of a subtask are deleted from the third register after that subtask completes scheduling.
  • Continuous scheduling stops when the conditions for stopping it are met. Before continuous scheduling begins, the continuous-scheduling parameter of the kernel task issued by the software must first be parsed; once parsed, it is recorded in the first register of the chip for subsequent reference. Besides the first register, the chip also contains a second register and a third register. The main task of the second register is to maintain a table indicating the current number of consecutive schedulings; during continuous scheduling this table is updated according to the first update condition. Besides the clearing conditions, the first update condition adds 1 to the count in the table each time a subtask is successfully scheduled.
  • The clearing conditions include: condition 1, the current number of successful consecutive schedulings reaches the number recorded in the first register; condition 2, the step of splitting the task to be scheduled into multiple subtasks has just been executed; condition 3, a termination-scheduling instruction for the task to be scheduled is received; condition 4, the maximum degree of parallelism is currently reached.
  • Condition 1 means that the number of consecutive schedulings has reached the continuous-scheduling parameter recorded in the first register, so all subtasks in the current continuous-scheduling interval have been scheduled successfully; the table maintained in the second register is therefore cleared to zero so that the next round of consecutive schedulings can be recorded.
  • Condition 2 means that the task to be scheduled has just been split into multiple subtasks and continuous scheduling is about to begin, which amounts to clearing the register at initialization.
  • Condition 3 means that, during the current continuous scheduling, there is a subtask that is no longer to be executed; the current consecutive-scheduling count is cleared so that the next round of consecutive schedulings can be recorded.
  • In condition 4, the maximum degree of parallelism refers to the maximum number of resources that one kernel may occupy.
  • the main task of the third register is to record the resources currently occupied by tasks to be scheduled and generate a dynamic table.
  • the dynamic table is updated based on the second update condition.
  • In addition to the clearing conditions, the second update condition specifies that when each subtask in the continuous-scheduling interval is successfully scheduled, that subtask's affinity resources are deleted from the third register.
  • For example, the affinity resources corresponding to the subtasks in the interval job0-job3 are IPU cluster1, IPU cluster2 and IPU cluster3.
  • Once these resources are occupied, the dynamic table in the third register deletes IPU cluster1, IPU cluster2 and IPU cluster3, and they become invisible to other tasks to be scheduled.
  • After a subtask finishes executing, the affinity resources it occupied become idle, so the next time the dynamic table is updated, the idle affinity resources are shown again. For example, after job3 is successfully scheduled and job0 has completed its execution, the IPU core in IPU cluster1 that job0 occupied is shown in the dynamic table.
  • This example maintains a table representing the current number of consecutive schedulings and a dynamic table of the resources the current subtask may occupy, intuitively displaying the current consecutive-scheduling count and resource occupancy and avoiding resource waste and conflicts.
  • the processing module 52 is specifically configured to determine the unscheduled subtasks in the interval task if the data in the second register does not reach the number of consecutive scheduling times in the first register;
  • the processing module 52 is specifically also configured to select a subtask to be scheduled from the unscheduled subtasks, and schedule the subtask to be scheduled to the affinity resource corresponding to the subtask until the interval task All subtasks in are scheduled.
  • the table maintained in the second register is a table representing the current number of consecutive scheduling times.
  • If the count recorded in the table has not reached the parameter recorded in the first register, the interval still has the right to continue continuous scheduling.
  • In that case, a subtask to be scheduled is selected from the subtasks that have not yet been scheduled, and the current subtask to be scheduled is scheduled to its corresponding affinity resource, until all subtasks in the current continuous-scheduling interval have been scheduled.
  • This example accurately selects the subtasks to be scheduled and the currently schedulable affinity resources by referring to the table indicating the current number of consecutive scheduling times and the dynamic table of resources that can be occupied by the current subtask, to avoid resource waste and conflicts.
  • the processing module is specifically configured to select, according to the currently available resources and in order of priority from high to low, at least one task from the tasks corresponding to each priority as a task to be scheduled, until the currently available resources no longer include affinity resources corresponding to any task; the affinity resources corresponding to different tasks to be scheduled are different, and the available resources include resources other than the affinity resources corresponding to all current tasks to be scheduled.
  • That is, a kernel task with high priority occupies affinity resources and performs the corresponding operations first; only after all high-priority kernel tasks have been scheduled successfully can the low-priority kernel tasks be scheduled.
  • For kernel tasks of different priorities that need to be scheduled at the same time, the tasks to be scheduled under each priority are determined in order of priority from high to low, according to the currently allocatable affinity resources recorded in the dynamic table of the third register. When the affinity resources of tasks to be scheduled at different priorities conflict, the tasks with the higher priority are scheduled first; when they do not conflict, tasks of different priorities can be scheduled at the same time.
  • A central module, called the top layer of the task splitter (TS), receives the requirements of the kernel tasks.
  • The TS top layer generates the kernel-task affinity-resource allocation corresponding to each priority. Specifically, a bitwise OR is performed over the resource masks of the affinity resources of all kernel tasks under the same priority, which records the overall affinity resources occupied by the kernel tasks under each priority.
  • For example, the high priority includes kernel1, kernel2, kernel3, kernel4 and kernel5, whose resource masks are 1110000000000000, 0110000000000000, 0001100000000000, 0000010000000000 and 0000000110000000 respectively.
  • The bitwise OR over the resource masks of all high-priority kernel tasks yields 1111110110000000, meaning that the affinity resources required by all high-priority kernel tasks are cluster10-cluster15, cluster8 and cluster7.
  • the low priority includes kernel6, kernel7 and kernel8.
  • the resource masks are: 0000000011000000, 0000001100100000 and 0000000000011100.
  • The bitwise OR over the resource masks of all low-priority kernel tasks yields 0000001111111100, so the affinity resources required by all low-priority kernel tasks are cluster2-cluster9. The only affinity resources on which the high-priority and low-priority kernel tasks conflict are cluster8 and cluster7, which are therefore occupied by the high-priority kernel tasks first; however, kernel6 can still be completed by calling cluster6, and kernel7 by calling cluster5 and cluster9. Hence the overall affinity resources of the high-priority and low-priority kernel tasks do not conflict, and they can be scheduled at the same time; kernel6, kernel7 and kernel8 do not have to wait for kernel1, kernel2, kernel3, kernel4 and kernel5 to be scheduled before they are executed.
  • This example uses the TS top layer to effectively allocate affinity resources between kernel tasks under each priority, while ensuring that high-priority kernel tasks are scheduled first, achieving the maximum utilization of computing resources and improving the chip's performance. Computational efficiency.
  • the processing module is further configured to, if there are different tasks with overlapping affinity resources among the tasks corresponding to a priority, use the task with the highest current weight among those tasks as the task to be scheduled, and increase the weights of the other tasks among them.
  • each kernel task has its own weight attribute.
  • The function of the weight attribute is that, among tasks of the same level, a kernel task with a higher weight is scheduled first. For example, suppose kernel1, kernel2, kernel3 and kernel4 are at the same level and all have only cluster1 as their affinity resource. Because kernel1 has the highest initial weight, kernel1 is scheduled successfully first, while kernel2, kernel3 and kernel4 fail to be scheduled because of the affinity-resource conflict; their weights are therefore increased to raise their probability of being scheduled successfully next time.
  • This embodiment uses the priorities of kernel tasks and the weight attributes of kernel tasks at the same level to realize the orderly execution of kernel tasks and avoid disorder in their execution.
  • Figure 6 is a device block diagram of a central control unit according to an exemplary embodiment.
  • The device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • Device 800 may include one or more of the following components: processing component 802 , memory 804 , power supply component 806 , multimedia component 808 , audio component 810 , input/output (I/O) interface 812 , sensor component 814 , and communications component 816 .
  • Processing component 802 generally controls the overall operations of device 800, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or part of the steps of the method described above.
  • processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components.
  • processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.
  • Memory 804 is configured to store various types of data to support operations at device 800 . Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, etc.
  • Memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • Power supply component 806 provides power to the various components of device 800.
  • Power supply components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 800 .
  • Multimedia component 808 includes a screen that provides an output interface between the device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • multimedia component 808 includes a front-facing camera and/or a rear-facing camera.
  • the front camera and/or the rear camera may receive external multimedia data.
  • Each front-facing camera and rear-facing camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 810 is configured to output and/or input audio signals.
  • audio component 810 includes a microphone (MIC) configured to receive external audio signals when device 800 is in operating modes, such as call mode, recording mode, and speech recognition mode. The received audio signal may be further stored in memory 804 or sent via communication component 816 .
  • audio component 810 also includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • Sensor component 814 includes one or more sensors that provide various aspects of status assessment for device 800 .
  • the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components (for example, the display and keypad of the device 800); the sensor component 814 can also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and temperature changes of the device 800.
  • Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 816 is configured to facilitate wired or wireless communication between apparatus 800 and other devices.
  • Device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • apparatus 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
  • a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions executable by the processor 820 of the apparatus 800 to complete the above method, is also provided.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • FIG. 7 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present disclosure. As shown in the figure, the electronic device includes:
  • the electronic device includes a processor 291 and a memory 292; it may also include a communication interface 293 and a bus 294. The processor 291, the memory 292, and the communication interface 293 can communicate with each other through the bus 294. The communication interface 293 may be used for information transmission.
  • the processor 291 can call logic instructions in the memory 292 to execute the methods of the above embodiments.
  • the above-mentioned logic instructions in the memory 292 can be implemented in the form of software functional units and, when sold or used as an independent product, can be stored in a computer-readable storage medium.
  • the memory 292 can be used to store software programs, computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure.
  • the processor 291 executes software programs, instructions and modules stored in the memory 292 to execute functional applications and data processing, that is, to implement the methods in the above method embodiments.
  • the memory 292 may include a stored program area and a stored data area, where the stored program area may store an operating system and an application program required for at least one function; the stored data area may store data created according to the use of the terminal device, etc.
  • the memory 292 may include high-speed random access memory and may also include non-volatile memory.
  • Embodiments of the present disclosure provide a non-transitory computer-readable storage medium.
  • Computer-executable instructions are stored in the computer-readable storage medium; when executed by a processor, the computer-executable instructions are used to implement the methods described in the previous embodiments.
  • Embodiments of the present disclosure provide a computer program product, which includes a computer program.
  • when the computer program is executed by a processor, the method described in the foregoing embodiments is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System (AREA)

Abstract

The present disclosure provides a task scheduling method, apparatus, device, and medium. The method includes: after determining a task to be scheduled, splitting the task to be scheduled into multiple sub-tasks, and dividing the multiple sub-tasks according to a continuous scheduling count to obtain at least one interval task of the task to be scheduled; for each interval task of the task to be scheduled, occupying interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously scheduling each sub-task in the interval task to the affinity resources corresponding to the sub-task, until all sub-tasks under the interval task have been scheduled. By dividing the multiple sub-tasks into intervals and continuously scheduling every sub-task in an interval until all sub-tasks under the interval have been scheduled, this solution improves the locality of the data used to complete all sub-tasks under the interval and reduces the number of data switches in the cache, thereby improving the computing efficiency of the chip.

Description

Task scheduling method, apparatus, device, and medium
The present disclosure claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on September 13, 2022, with application number 202211110283.6 and entitled "Task scheduling method, apparatus, device, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of chip computing technology, and in particular to a task scheduling method, apparatus, device, and medium.
Background
When an artificial intelligence (AI) chip performs computation, it needs to access a large amount of operation data; the chip's own data storage capacity is limited, and the latency of accessing random access memory (RAM) to obtain computation-related data is high.
At present, AI computing chips use a cache to buffer data: the data required by the current computing task is cached, which reduces the latency for the computing units in the chip to access RAM.
However, when the computing tasks being executed are weakly correlated and the cache has to hold different data, the number of cache accesses to RAM and the number of data switches in the cache increase, reducing the computing efficiency of the AI chip.
Summary
The present disclosure provides a task scheduling method, apparatus, device, and medium to improve chip computing efficiency.
In one aspect, the present disclosure provides a task scheduling method, including:
determining a task to be scheduled from tasks issued by software, where each task issued by the software is assigned a continuous scheduling count;
splitting the task to be scheduled into multiple sub-tasks, and dividing the multiple sub-tasks according to the continuous scheduling count of the task to be scheduled to obtain at least one interval task of the task to be scheduled;
for each interval task of the task to be scheduled, occupying interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously scheduling each sub-task in the interval task to the affinity resources corresponding to the sub-task, until all sub-tasks under the interval task have been scheduled; where the interval affinity resources include the set of affinity resources corresponding to all sub-tasks in the interval task, and the occupied resources are not visible to other tasks.
In an embodiment, the method further includes:
if all sub-tasks in the interval task have been scheduled, releasing the interval affinity resources occupied by the interval task, where the released resources are visible to other tasks.
In an embodiment, occupying interval affinity resources for the interval task from the resources currently visible to the task to be scheduled includes:
obtaining the resource mask corresponding to a sub-task in the interval task, where the resource mask corresponding to the sub-task represents the affinity resources corresponding to the sub-task;
taking the resources represented by the mask among the resources currently visible to the task to be scheduled as the interval affinity resources, and occupying the interval affinity resources for the interval task.
In an embodiment, the method further includes:
obtaining the continuous scheduling count by parsing the task scheduling instruction and recording it in a first register;
updating the data in a second register based on a first update condition, where the data in the second register represents the number of continuous schedules currently performed; the first update condition includes: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the second register to zero; otherwise, incrementing the second register by 1 each time the scheduling of a sub-task is completed;
updating the data in a third register based on a second update condition, where the data in the third register represents the interval affinity resources currently to be occupied for the interval task; the second update condition includes: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the data in the third register; otherwise, deleting the affinity resources of a sub-task from the third register each time the scheduling of that sub-task is completed.
In an embodiment, continuously scheduling each sub-task in the interval task to the affinity resources corresponding to the sub-task includes:
if the data in the second register has not reached the continuous scheduling count in the first register, determining the sub-tasks in the interval task whose scheduling has not been completed;
selecting a sub-task to be scheduled from the sub-tasks whose scheduling has not been completed, and scheduling the sub-task to be scheduled to the affinity resources corresponding to the sub-task, until all sub-tasks in the interval task have been scheduled.
In an embodiment, the tasks issued by the software have priorities; determining the task to be scheduled from the tasks issued by the software includes:
according to the currently available resources and in descending order of priority, selecting at least one task from the tasks corresponding to each priority as the task to be scheduled, until the currently available resources no longer include the affinity resources corresponding to any task; where different tasks to be scheduled correspond to different affinity resources, and the available resources include the resources other than the affinity resources corresponding to all current tasks to be scheduled.
In an embodiment, selecting at least one task from the tasks corresponding to each priority as the task to be scheduled includes:
if there are different tasks with overlapping affinity resources among the tasks corresponding to a priority, taking the task with the highest weight among the current different tasks as the task to be scheduled, and increasing the weights of the other tasks among the different tasks.
In another aspect, the present disclosure provides a task scheduling apparatus, including:
a determining module, configured to determine a task to be scheduled from tasks issued by software, where each task issued by the software is assigned a continuous scheduling count;
a processing module, configured to split the task to be scheduled into multiple sub-tasks, and divide the multiple sub-tasks according to the continuous scheduling count of the task to be scheduled to obtain at least one interval task of the task to be scheduled;
the processing module is further configured to, for each interval task of the task to be scheduled, occupy interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously schedule each sub-task in the interval task to the affinity resources corresponding to the sub-task, until all sub-tasks under the interval task have been scheduled; where the interval affinity resources include the set of affinity resources corresponding to all sub-tasks in the interval task, and the occupied resources are not visible to other tasks.
In an embodiment, the processing module is further configured to, if all sub-tasks in the interval task have been scheduled, release the interval affinity resources occupied by the interval task, where the released resources are visible to other tasks.
In an embodiment, the processing module is specifically configured to obtain the resource mask corresponding to a sub-task in the interval task, where the resource mask corresponding to the sub-task represents the affinity resources corresponding to the sub-task;
the processing module is further specifically configured to take the resources represented by the mask among the resources currently visible to the task to be scheduled as the interval affinity resources, and occupy the interval affinity resources for the interval task.
In an embodiment, the processing module is further specifically configured to obtain the continuous scheduling count by parsing the task scheduling instruction and record it in a first register;
the processing module is further specifically configured to update the data in a second register based on a first update condition, where the data in the second register represents the number of continuous schedules currently performed; the first update condition includes: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the second register to zero; otherwise, incrementing the second register by 1 each time the scheduling of a sub-task is completed;
the processing module is further specifically configured to update the data in a third register based on a second update condition, where the data in the third register represents the interval affinity resources currently to be occupied for the interval task; the second update condition includes: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the data in the third register; otherwise, deleting the affinity resources of a sub-task from the third register each time the scheduling of that sub-task is completed.
In an embodiment, the processing module is further specifically configured to, if the data in the second register has not reached the continuous scheduling count in the first register, determine the sub-tasks in the interval task whose scheduling has not been completed;
the processing module is further specifically configured to select a sub-task to be scheduled from the sub-tasks whose scheduling has not been completed, and schedule the sub-task to be scheduled to the affinity resources corresponding to the sub-task, until all sub-tasks in the interval task have been scheduled.
In an embodiment, the processing module is further specifically configured to, according to the currently available resources and in descending order of priority, select at least one task from the tasks corresponding to each priority as the task to be scheduled, until the currently available resources no longer include the affinity resources corresponding to any task; where different tasks to be scheduled correspond to different affinity resources, and the available resources include the resources other than the affinity resources corresponding to all current tasks to be scheduled.
In an embodiment, the processing module is further specifically configured to, if there are different tasks with overlapping affinity resources among the tasks corresponding to a priority, take the task with the highest weight among the current different tasks as the task to be scheduled, and increase the weights of the other tasks among the different tasks.
In yet another aspect, the present disclosure provides an electronic device, including: a processor, and a memory communicatively connected to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the method described above.
In yet another aspect, the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method described above.
In the task scheduling method, apparatus, device, and medium provided by the present disclosure, a task to be scheduled is determined from the tasks issued by software and split into multiple sub-tasks; the multiple sub-tasks are divided into intervals according to the parsed continuous scheduling count of the task to be scheduled; and for each interval task of the task to be scheduled, interval affinity resources are occupied for the interval task from the resources currently visible to the task to be scheduled, and each sub-task in the interval task is continuously scheduled to the affinity resources corresponding to the sub-task until all sub-tasks under the interval task have been scheduled. Because the data needed to execute sub-tasks of the same task is strongly correlated, continuously scheduling the sub-tasks of the same interval effectively reduces the cache's accesses to RAM and the number of data switches in the cache, thereby effectively improving chip computing efficiency.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the embodiments of the present disclosure.
Fig. 1 shows an exemplary operation-data access process;
Fig. 2 is a schematic flowchart of a task scheduling method provided in Embodiment 1 of the present disclosure;
Fig. 3 is a schematic flowchart of another task scheduling method provided in Embodiment 1 of the present disclosure;
Fig. 4 is a schematic flowchart of yet another task scheduling method provided in Embodiment 1 of the present disclosure;
Fig. 5 is a schematic structural diagram of a task scheduling apparatus provided in Embodiment 3 of the present disclosure;
Fig. 6 is a device block diagram of a central control unit according to an exemplary embodiment;
Fig. 7 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present disclosure.
The above drawings show explicit embodiments of the present disclosure, which are described in more detail below. These drawings and textual descriptions are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the present disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Exemplary embodiments are described in detail here, examples of which are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
It should be noted that the brief explanations of terms in the present disclosure are only intended to facilitate understanding of the implementations described below, not to limit the implementations of the present disclosure. Unless otherwise stated, these terms should be understood according to their ordinary and usual meanings.
Fig. 1 shows an exemplary operation-data access process. The chip's own data storage is very limited; most operation data is stored in RAM, yet the latency of accessing RAM is high. When the chip performs computation, completing a task requires a large number of computing units, each of which needs the corresponding operation data to complete the relevant computation; if the computing units in the chip fetched resources directly from RAM, the chip's computing efficiency would be very low. Therefore, a cache is added: the cache fetches in advance from RAM the operation data needed by the task currently executed by the chip and buffers it, so that the computing units on the chip fetch the needed operation data directly from the cache. This reduces the latency of the computing units fetching data directly from RAM and improves the chip's computing efficiency.
The technical solutions of the present disclosure are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. In the description of the present disclosure, unless otherwise explicitly specified and limited, each term should be understood broadly in the art. The embodiments of the present disclosure are described below with reference to the accompanying drawings.
Embodiment 1
Fig. 2 is a schematic flowchart of the task scheduling method provided in Embodiment 1 of the present disclosure. As shown in Fig. 2, the method includes:
Step 101: determine a task to be scheduled from tasks issued by software, where each task issued by the software is assigned a continuous scheduling count;
Step 102: split the task to be scheduled into multiple sub-tasks, and divide the multiple sub-tasks according to the continuous scheduling count of the task to be scheduled to obtain at least one interval task of the task to be scheduled;
Step 103: for each interval task of the task to be scheduled, occupy interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously schedule each sub-task in the interval task to the affinity resources corresponding to the sub-task, until all sub-tasks under the interval task have been scheduled; where the interval affinity resources include the set of affinity resources corresponding to all sub-tasks in the interval task, and the occupied resources are not visible to other tasks.
In a scenario example: while software runs, it generates a large number of tasks that need to be executed and issues them to the chip. A task issued by the software is called a kernel task; its granularity is relatively large, and it can be split into multiple sub-tasks. When the software issues a task to the chip at kernel-task granularity, it also issues the continuous scheduling attribute corresponding to the kernel task, i.e., the number of times the kernel task can be scheduled continuously, written into the kernel as a parameter; this attribute can be set flexibly by the software. After receiving the tasks, the chip determines, among the many kernel tasks to be executed, the task that needs to be executed at the current moment, i.e., the current task to be scheduled, and controls the continuous scheduling interval by parsing the continuous scheduling parameter of the current task to be scheduled, letting the software use the continuous scheduling attribute flexibly. Continuous scheduling means scheduling the sub-tasks in the same continuous scheduling interval without interruption until all sub-tasks in that interval have been scheduled. For example, if the current task to be scheduled is a kernel task that can be split into 10 sub-tasks, job0-job9, and the continuous scheduling attribute specified by the software is 4, the sub-tasks are divided into intervals of 4: job0-job3 form one interval, job4-job7 form another, and the remaining job8 and job9 form a third; the sub-tasks in the same interval are scheduled continuously without interruption.
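For illustration only (a minimal sketch assuming the 10-job example above; the function name is hypothetical), the interval division amounts to chunking the sub-task list by the continuous scheduling count:

```python
# Chunk a kernel's sub-tasks into continuous-scheduling intervals.

def split_into_intervals(jobs, continuous_count):
    """Divide sub-tasks into intervals of at most `continuous_count` jobs;
    each interval is scheduled back-to-back without interruption."""
    return [jobs[i:i + continuous_count]
            for i in range(0, len(jobs), continuous_count)]

jobs = [f"job{i}" for i in range(10)]          # job0 .. job9
intervals = split_into_intervals(jobs, 4)
print(intervals)
# [['job0','job1','job2','job3'], ['job4','job5','job6','job7'], ['job8','job9']]
```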
Besides the relevant operation data, the chip needs to occupy certain computing resources to complete each scheduled task; the computing resources can be understood as computing units, and the chip completes each scheduled task within certain computing units using the operation data corresponding to the task. In one chip there are 16 IPU clusters in total, and each IPU cluster contains 4 IPU cores; one IPU core is one computing unit. Each kernel task is split into multiple sub-tasks of one of two types, block tasks and UnionX tasks; because the two types differ, the number of computing units they occupy differs: completing one block task occupies one IPU core, while completing one UnionX task occupies X IPU clusters. All sub-tasks under the same kernel task have the same task type.
Not all computing units in the chip are suited to completing a given task; the kernel tasks issued by the software, and the sub-tasks split from them, are each completed by certain specific computing units. All computing units that can complete a task are the affinity resources corresponding to that task. Besides the corresponding operation data, each task also needs computing units to be completed, and those computing units must be the affinity resources corresponding to the task.
To complete the continuous scheduling of all sub-tasks in an interval, once the first sub-task in the interval has been scheduled successfully, all affinity resources corresponding to every sub-task in the interval become invisible to other kernel tasks; the sub-tasks of other kernel tasks may no longer occupy the now-invisible computing units, and those affinity resources provide computing units only for the sub-tasks in the interval. The affinity resources corresponding to all sub-tasks under the same kernel task are identical. For example, once the first task in the interval job0-job3, namely job0, is scheduled successfully, and assuming the job tasks under the kernel to which job0 belongs need the affinity resources IPU cluster1, IPU cluster2, and IPU cluster3, then after job0 is scheduled successfully, IPU cluster1, IPU cluster2, and IPU cluster3 are occupied and become invisible to other kernel tasks. The affinity resources corresponding to a task are usually much larger than the resources actually needed to complete it; for example, completing job0 needs only one computing unit in IPU cluster1, that is, only one IPU core in IPU cluster1, so the remaining 3 IPU cores in IPU cluster1 serve as affinity resources available to the other sub-tasks in the same interval.
In this example, multiple sub-tasks are divided into intervals, the affinity resources of all sub-tasks in an interval are occupied and made invisible to other kernel tasks, and each sub-task in the interval is continuously scheduled until all sub-tasks under the interval have been scheduled. Because processing data from the same kernel task is strongly correlated (for example, instruction addresses, data addresses, and parameters repeat extensively), when the computing units execute sub-tasks from the same kernel task, the cache hit rate is effectively improved, which effectively improves chip computing efficiency.
In one example, if all sub-tasks in the interval task have been scheduled, the interval affinity resources occupied by the interval task are released, and the released resources are visible to other tasks.
After every sub-task in the interval has been successfully scheduled onto affinity resources and its computation completed, the occupied resources are released and become visible to other kernel tasks. For example, after job0-job3 have all been successfully scheduled onto the corresponding affinity resources and the computations of job0-job3 completed on the computing resources, the affinity resources IPU cluster1, IPU cluster2, and IPU cluster3 of job0-job3 are released and become visible to other kernel tasks.
In one example, Fig. 3 is a schematic flowchart of another task scheduling method provided in Embodiment 1 of the present disclosure. In step 101, occupying interval affinity resources for the interval task from the resources currently visible to the task to be scheduled includes:
Step 201: obtain the resource mask corresponding to a sub-task in the interval task, where the resource mask corresponding to the sub-task represents the affinity resources corresponding to the sub-task;
Step 202: take the resources represented by the mask among the resources currently visible to the task to be scheduled as the interval affinity resources, and occupy the interval affinity resources for the interval task.
There are 16 IPU clusters in total on one chip, and each kernel task has its own corresponding affinity resources; the IPU clusters included in the affinity resources corresponding to each kernel task are represented by a resource mask, which can be expressed as a binary number. For example, for the kernel task to which job0 belongs, each IPU cluster is represented by 1 if it is an affinity resource of job0 and by 0 if it is not. If the affinity resources of job0 are IPU cluster1, IPU cluster2, and IPU cluster3, then only IPU cluster1, IPU cluster2, and IPU cluster3 are 1 and all other IPU clusters are 0, so the resource mask representing the affinity resources of job0 is 0000000000001110; the digits of the resource mask, from right to left, represent the results for IPU cluster0 through IPU cluster15. By using resource masks to represent the affinity resources corresponding to sub-tasks, this example can schedule each sub-task accurately onto its corresponding affinity resources.
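A small illustration of this mask convention (a sketch only; the helper names are hypothetical), with bit i of the mask standing for IPU cluster i:

```python
# Encode/decode the 16-bit affinity resource mask described above.

NUM_CLUSTERS = 16

def mask_from_clusters(clusters):
    """Set bit i for each IPU cluster i that is an affinity resource."""
    m = 0
    for c in clusters:
        m |= 1 << c
    return m

def clusters_from_mask(mask):
    """Recover the cluster indices represented by a mask."""
    return [i for i in range(NUM_CLUSTERS) if mask >> i & 1]

m = mask_from_clusters([1, 2, 3])
print(format(m, "016b"))        # 0000000000001110, as in the job0 example
print(clusters_from_mask(m))    # [1, 2, 3]
```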
In one example, Fig. 4 is a schematic flowchart of yet another task scheduling method provided in Embodiment 1 of the present disclosure, including:
Step 301: obtain the continuous scheduling count by parsing the task scheduling instruction and record it in a first register;
Step 302: update the data in a second register based on a first update condition, where the data in the second register represents the number of continuous schedules currently performed; the first update condition includes: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clear the second register to zero; otherwise, increment the second register by 1 each time the scheduling of a sub-task is completed;
Step 303: update the data in a third register based on a second update condition, where the data in the third register represents the interval affinity resources currently to be occupied for the interval task; the second update condition includes: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clear the data in the third register; otherwise, delete the affinity resources of a sub-task from the third register each time the scheduling of that sub-task is completed.
According to the scenario example, during continuous scheduling, scheduling stops when the stop condition for continuous scheduling is met. Before that, the continuous scheduling parameter of the kernel task issued by the software must first be parsed out and recorded in the first register in the chip for later reference. Besides the first register, the chip also has a second register and a third register. The main task of the second register is to maintain a table representing the current continuous scheduling count; during continuous scheduling, the table is updated based on the first update condition, which, besides the clear-to-zero conditions, also includes incrementing the count in the table by 1 each time a sub-task is scheduled successfully. The clear-to-zero conditions include: Condition 1: the number of successful continuous schedules reaches the continuous scheduling count in the first register; Condition 2: the step of splitting the task to be scheduled into multiple sub-tasks has just been performed; Condition 3: a terminate-scheduling instruction for the task to be scheduled is received; Condition 4: the maximum parallelism is currently reached.
Condition 1 means the number of continuous schedules has reached the continuous scheduling parameter recorded in the first register: all sub-tasks in the current continuous scheduling interval have been scheduled successfully, so the table maintained in the second register is cleared to facilitate recording the count of the next continuous scheduling. Condition 2 means the task to be scheduled has just been split into multiple sub-tasks and continuous scheduling is about to begin, which amounts to initialization to zero. Condition 3 means that during the current continuous scheduling there are sub-tasks that no longer need to be executed, so the current continuous scheduling count is cleared to facilitate recording the count of the next continuous scheduling. For Condition 4, the maximum parallelism is the maximum number of resources a kernel may occupy; if a kernel is only allowed to occupy 4 clusters, then 4 is its maximum parallelism. When the sub-tasks currently executing simultaneously reach the corresponding kernel's maximum parallelism, the current continuous scheduling count is cleared to facilitate recording the count of the next continuous scheduling.
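The register behavior just described can be sketched as follows (a simplified software model for illustration only; the class and method names are hypothetical, and the hardware registers are modeled as a plain integer and a set):

```python
# Model of the three registers described above (names are illustrative).

class ContinuousScheduler:
    def __init__(self, continuous_count, interval_affinity):
        self.reg1 = continuous_count          # first register: configured count
        self.reg2 = 0                         # second register: schedules so far
        self.reg3 = set(interval_affinity)    # third register: resources still to occupy

    def on_subtask_scheduled(self, subtask_affinity):
        """Called after each successful sub-task schedule."""
        self.reg2 += 1
        self.reg3 -= set(subtask_affinity)    # drop this sub-task's affinity resources
        if self.reg2 >= self.reg1:            # Condition 1: interval finished
            self._clear()

    def on_split(self):            # Condition 2: task just split into sub-tasks
        self._clear()

    def on_terminate(self):        # Condition 3: terminate-scheduling instruction
        self._clear()

    def on_max_parallelism(self):  # Condition 4: kernel's max parallelism reached
        self._clear()

    def _clear(self):
        self.reg2 = 0
        self.reg3.clear()
```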
The main task of the third register is to record the resources that the current task to be scheduled may occupy and to generate a dynamic table; during continuous scheduling, the dynamic table is updated based on the second update condition, which, besides the clear conditions, specifies that after each sub-task within the continuous scheduling interval is scheduled successfully, that sub-task's affinity resources are deleted from the third register. Specifically, suppose the affinity resources corresponding to the sub-tasks in the interval job0-job3 are IPU cluster1, IPU cluster2, and IPU cluster3; after job0 is scheduled successfully, the dynamic table in the third register deletes IPU cluster1, IPU cluster2, and IPU cluster3, and they become invisible to other tasks to be scheduled.
It should be noted that if executing job0 occupies one IPU core in IPU cluster1, then after job0 is scheduled successfully, the other sub-tasks in the same interval as job0 will no longer occupy the IPU core occupied by job0; but once job0 has finished executing and no longer needs to occupy computing resources, the occupied IPU core can be released to the other sub-tasks in the same interval.
In one example, if a previously scheduled sub-task completes before all sub-tasks in the interval have been scheduled, the affinity resources occupied by executing that sub-task become idle, so after the next update of the dynamic table, the idle affinity resources are shown. For example, if job0 has already completed by the time job3 is scheduled successfully, the IPU core in IPU cluster1 that was occupied by job0 is shown in the dynamic table at that time.
By maintaining the table representing the current continuous scheduling count and the dynamic table of resources the current sub-tasks may occupy, this example presents the current continuous scheduling count and resource occupancy intuitively, avoiding resource waste and conflicts.
In one example, continuously scheduling each sub-task in the interval task to the affinity resources corresponding to the sub-task includes:
if the data in the second register has not reached the continuous scheduling count in the first register, determining the sub-tasks in the interval task whose scheduling has not been completed;
selecting a sub-task to be scheduled from the sub-tasks whose scheduling has not been completed, and scheduling it to the affinity resources corresponding to the sub-task, until all sub-tasks in the interval task have been scheduled.
When the continuously scheduled tasks are executing normally, the table maintained in the second register represents the current continuous scheduling count. When the count recorded in the table has not reached the parameter recorded in the first register, there are still unscheduled sub-tasks in the current continuous scheduling interval; a sub-task to be scheduled is then selected from the unscheduled sub-tasks and scheduled onto the corresponding affinity resources, until all sub-tasks in the current continuous scheduling interval have been scheduled. By consulting the table representing the current continuous scheduling count and the dynamic table of resources the current sub-tasks may occupy, this example accurately selects the sub-task to be scheduled and the currently schedulable affinity resources, avoiding resource waste and conflicts.
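As a rough software-level sketch of this loop (assuming the register naming from the earlier example; none of these names come from the original text):

```python
# Continuous scheduling of one interval, consulting the second register
# against the first (a simplified, illustrative sketch).

def schedule_interval(interval, dispatch, reg1):
    """Schedule every sub-task of one interval back-to-back.
    `interval` holds (job, affinity) pairs; `reg1` is the configured
    continuous scheduling count; `reg2` mirrors the second register."""
    reg2 = 0
    pending = list(interval)
    while pending and reg2 < reg1:
        job, affinity = pending.pop(0)   # pick an unscheduled sub-task
        dispatch(job, affinity)          # place it on its affinity resources
        reg2 += 1                        # second register += 1
    # reg2 == reg1 or pending is empty: the interval is done, registers clear
```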
In this embodiment, multiple sub-tasks are divided into intervals, the affinity resources of all sub-tasks in an interval are occupied and made invisible to other kernel tasks, and each sub-task in the interval is continuously scheduled until all sub-tasks under the interval have been scheduled; the table representing the current continuous scheduling count and the dynamic table of occupiable resources are consulted to select affinity resources accurately for sub-task scheduling. Because all sub-tasks in a continuous scheduling interval come from the same kernel task and their data is strongly correlated, the cache hit rate is effectively improved and the number of data switches in the cache is reduced, effectively improving chip computing efficiency.
Embodiment 2
In the continuous scheduling process, the priorities of the kernel tasks are handled first: kernel tasks are divided into different priority levels, a higher-level kernel task occupies affinity resources before a lower-level kernel task, and among kernel tasks of the same level, a kernel task of larger granularity occupies affinity resources before one of smaller granularity.
In one example, the tasks issued by the software have priorities; determining the task to be scheduled from the tasks issued by the software includes:
according to the currently available resources and in descending order of priority, selecting at least one task from the tasks corresponding to each priority as the task to be scheduled, until the currently available resources no longer include the affinity resources corresponding to any task; where different tasks to be scheduled correspond to different affinity resources, and the available resources include the resources other than the affinity resources corresponding to all current tasks to be scheduled. A sketch of this selection loop is given below.
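Purely as an illustration (a minimal sketch under assumed data structures, not part of the original disclosure), the high-to-low greedy selection described above might look like this, with `available` being the mask of currently visible resources:

```python
# Greedy selection of tasks to schedule, from high priority to low.

def select_tasks(tasks_by_priority, available):
    """`tasks_by_priority` maps a priority (higher = more urgent) to a list
    of (name, affinity_mask) pairs; returns the chosen task names. A task
    is chosen only if its whole affinity mask is still available, and the
    chosen masks are removed so different selected tasks never overlap."""
    chosen = []
    for prio in sorted(tasks_by_priority, reverse=True):
        for name, mask in tasks_by_priority[prio]:
            if mask and (mask & available) == mask:
                chosen.append(name)
                available &= ~mask   # occupied resources become invisible
    return chosen
```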
In a scenario example, when multiple kernel tasks need to be scheduled at the same time, because the available computing resources are limited, a higher-priority kernel task may occupy affinity resources first to perform the corresponding computation; only after all kernel tasks of the higher priority have been scheduled successfully may task scheduling be performed for the kernel tasks of a lower priority. When kernel tasks of different priorities need to be scheduled at the same time, the tasks to be scheduled under each priority are determined in descending order of priority according to the currently allocatable affinity resources recorded in the dynamic table in the third register, in preparation for scheduling. When the affinity resources of tasks to be scheduled under different priorities conflict, the higher-priority task is scheduled first; when they do not conflict, tasks to be scheduled under different priorities may be scheduled simultaneously. Optionally, the affinity resources that kernel tasks of different levels need to occupy can all be transmitted to the hub module of the task scheduling system, called the task splitter (TS) top layer; after receiving the affinity resources the kernel tasks need to occupy, the TS top layer generates, by priority, the kernel-task affinity-resource allocation for each priority. Specifically, a bitwise OR of the resource masks of the affinity resources of all kernel tasks under the same priority is computed, making it convenient to record the overall affinity resources the kernel tasks under each priority need to occupy. When the affinity resources occupied by kernel tasks under different priorities do not conflict, kernel tasks even of different priorities can be scheduled at the same time. For example, suppose kernel tasks of two priorities all need to be scheduled. The high priority includes kernel1, kernel2, kernel3, kernel4, and kernel5, with resource masks 1110000000000000, 0110000000000000, 0001100000000000, 0000010000000000, and 0000000110000000 respectively; the bitwise OR of the resource masks of all high-priority kernel tasks gives 1111110110000000, meaning the affinity resources needed by all high-priority kernel tasks are cluster10-cluster15, cluster8, and cluster7. The low priority includes kernel6, kernel7, and kernel8, with resource masks 0000000011000000, 0000001100100000, and 0000000000011100 respectively; the bitwise OR of the resource masks of all low-priority kernel tasks gives 0000001111111100, so the affinity resources needed by all low-priority kernel tasks are cluster2-cluster9. It can be seen that the only affinity resources in conflict between the high-priority and low-priority kernel tasks are cluster8 and cluster7, so cluster8 and cluster7 are occupied first by the high-priority kernel tasks; but kernel6 can be completed by calling cluster6, and kernel7 by calling cluster5 and cluster9, so at this time the affinity resources of the high-priority and low-priority kernel tasks do not conflict overall and they can be scheduled at the same time: kernel6, kernel7, and kernel8 need not wait for kernel1, kernel2, kernel3, kernel4, and kernel5 to finish being scheduled before being scheduled. In this example, through the TS top layer's effective allocation of affinity resources among the kernel tasks under each priority, and while ensuring that high-priority kernel tasks are scheduled first, maximum utilization of the computing resources is achieved and the chip's computing efficiency is improved.
In one example, selecting at least one task from the tasks corresponding to each priority as the task to be scheduled includes:
if there are different tasks with overlapping affinity resources among the tasks corresponding to a priority, taking the task with the highest weight among the current different tasks as the task to be scheduled, and increasing the weights of the other tasks among the different tasks.
In a scenario example, each kernel task carries a weight attribute; the function of the weight attribute is that, at the same priority level, a kernel task with a higher weight is scheduled first. For example, kernel1, kernel2, kernel3, and kernel4 at the same level all have only cluster1 as their affinity resource; because kernel1 has the highest initial weight, kernel1 is scheduled successfully first, while kernel2, kernel3, and kernel4, being at the same level as kernel1, fail to be scheduled because of the affinity-resource conflict, so the weights of kernel2, kernel3, and kernel4 are increased at this point to raise the probability of their being scheduled successfully next time.
By using the priorities of kernel tasks and the weight attributes of kernel tasks at the same level, this embodiment achieves orderly execution of kernel tasks and avoids disorder in kernel-task execution.
Embodiment 3
Fig. 5 is a schematic structural diagram of a task scheduling apparatus provided in Embodiment 3 of the present disclosure, including:
a determining module 51, configured to determine a task to be scheduled from tasks issued by software, where each task issued by the software is assigned a continuous scheduling count;
a processing module 52, configured to split the task to be scheduled into multiple sub-tasks, and divide the multiple sub-tasks according to the continuous scheduling count of the task to be scheduled to obtain at least one interval task of the task to be scheduled;
the processing module 52 is further configured to, for each interval task of the task to be scheduled, occupy interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously schedule each sub-task in the interval task to the affinity resources corresponding to the sub-task, until all sub-tasks under the interval task have been scheduled; where the interval affinity resources include the set of affinity resources corresponding to all sub-tasks in the interval task, and the occupied resources are not visible to other tasks.
In a scenario example: while software runs, it generates a large number of tasks that need to be executed and issues them to the chip. A task issued by the software is called a kernel task; its granularity is relatively large, and it can be split into multiple sub-tasks. When the software issues a task to the chip at kernel-task granularity, it also issues the continuous scheduling attribute corresponding to the kernel task, i.e., the number of times the kernel task can be scheduled continuously, written into the kernel as a parameter; this attribute can be set flexibly by the software. After the chip receives the tasks, the determining module 51 determines, among the many kernel tasks to be executed, the task that needs to be executed at the current moment, i.e., the current task to be scheduled. The processing module 52 controls the continuous scheduling interval by parsing the continuous scheduling parameter of the current task to be scheduled, letting the software use the continuous scheduling attribute flexibly; continuous scheduling means scheduling the sub-tasks in the same continuous scheduling interval without interruption until all sub-tasks in that interval have been scheduled. For example, if the current task to be scheduled is a kernel task that can be split into 10 sub-tasks, job0-job9, and the continuous scheduling attribute specified by the software is 4, the sub-tasks are divided into intervals of 4: job0-job3 form one interval, job4-job7 form another, and the remaining job8 and job9 form a third; the sub-tasks in the same interval are scheduled continuously without interruption.
Besides the relevant operation data, the chip needs to occupy certain computing resources to complete each scheduled task; the computing resources can be understood as computing units, and the chip completes each scheduled task within certain computing units using the operation data corresponding to the task. In one chip there are 16 IPU clusters in total, and each IPU cluster contains 4 IPU cores; one IPU core is one computing unit. Each kernel task is split into multiple sub-tasks of one of two types, block tasks and UnionX tasks; because the two types differ, the number of computing units they occupy differs: completing one block task occupies one IPU core, while completing one UnionX task occupies X IPU clusters. All sub-tasks under the same kernel task have the same task type.
Not all computing units in the chip are suited to completing a given task; the kernel tasks issued by the software, and the sub-tasks split from them, are each completed by certain specific computing units. All computing units that can complete a task are the affinity resources corresponding to that task. Besides the corresponding operation data, each task also needs computing units to be completed, and those computing units must be the affinity resources corresponding to the task.
To complete the continuous scheduling of all sub-tasks in an interval, once the first sub-task in the interval has been scheduled successfully, all affinity resources corresponding to every sub-task in the interval become invisible to other kernel tasks; the sub-tasks of other kernel tasks may no longer occupy the now-invisible computing units, and those affinity resources provide computing units only for the sub-tasks in the interval. The affinity resources corresponding to all sub-tasks under the same kernel task are identical. For example, once the first task in the interval job0-job3, namely job0, is scheduled successfully, and assuming the job tasks under the kernel to which job0 belongs need the affinity resources IPU cluster1, IPU cluster2, and IPU cluster3, then after job0 is scheduled successfully, IPU cluster1, IPU cluster2, and IPU cluster3 are occupied and become invisible to other kernel tasks. The affinity resources corresponding to a task are usually much larger than the resources actually needed to complete it; for example, completing job0 needs only one computing unit in IPU cluster1, that is, only one IPU core in IPU cluster1, so the remaining 3 IPU cores in IPU cluster1 serve as affinity resources available to the other sub-tasks in the same interval.
In this example, the processing module divides multiple sub-tasks into intervals, makes the affinity resources of all sub-tasks in an interval invisible to other kernel tasks by occupying them, and continuously schedules each sub-task in the interval until all sub-tasks under the interval have been scheduled. Because processing data from the same kernel task is strongly correlated (for example, instruction addresses, data addresses, and parameters repeat extensively), when the computing units execute sub-tasks from the same kernel task, the cache hit rate is effectively improved and the number of data switches in the cache is reduced, effectively improving chip computing efficiency.
In one example, the processing module 52 is further configured to, if all sub-tasks in the interval task have been scheduled, release the interval affinity resources occupied by the interval task, where the released resources are visible to other tasks.
After every sub-task in the interval has been successfully scheduled onto affinity resources and its computation completed, the occupied resources are released and become visible to other kernel tasks. For example, after job0-job3 have all been successfully scheduled onto the corresponding affinity resources and the computations of job0-job3 completed on the computing resources, the affinity resources IPU cluster1, IPU cluster2, and IPU cluster3 of job0-job3 are released and become visible to other kernel tasks.
In one example, the processing module 52 is specifically configured to obtain the resource mask corresponding to a sub-task in the interval task, where the resource mask corresponding to the sub-task represents the affinity resources corresponding to the sub-task;
the processing module 52 is further specifically configured to take the resources represented by the mask among the resources currently visible to the task to be scheduled as the interval affinity resources, and occupy the interval affinity resources for the interval task.
There are 16 IPU clusters in total on one chip, and each kernel task has its own corresponding affinity resources; the IPU clusters included in the affinity resources corresponding to each kernel task are represented by a resource mask, which can be expressed as a binary number. For example, for the kernel task to which job0 belongs, each IPU cluster is represented by 1 if it is an affinity resource of job0 and by 0 if it is not. If the affinity resources of job0 are IPU cluster1, IPU cluster2, and IPU cluster3, then only IPU cluster1, IPU cluster2, and IPU cluster3 are 1 and all other IPU clusters are 0, so the resource mask representing the affinity resources of job0 is 0000000000001110; the digits of the resource mask, from right to left, represent the results for IPU cluster0 through IPU cluster15. By using resource masks to represent the affinity resources corresponding to sub-tasks, this example can schedule each sub-task accurately onto its corresponding affinity resources.
In one example, the processing module 52 is further specifically configured to obtain the continuous scheduling count by parsing the task scheduling instruction and record it in a first register;
the processing module 52 is further specifically configured to update the data in a second register based on a first update condition, where the data in the second register represents the number of continuous schedules currently performed; the first update condition includes: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the second register to zero; otherwise, incrementing the second register by 1 each time the scheduling of a sub-task is completed;
the processing module 52 is further specifically configured to update the data in a third register based on a second update condition, where the data in the third register represents the interval affinity resources currently to be occupied for the interval task; the second update condition includes: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the data in the third register; otherwise, deleting the affinity resources of a sub-task from the third register each time the scheduling of that sub-task is completed.
According to the scenario example, during continuous scheduling, scheduling stops when the stop condition for continuous scheduling is met. Before that, the continuous scheduling parameter of the kernel task issued by the software must first be parsed out and recorded in the first register in the chip for later reference. Besides the first register, the chip also has a second register and a third register. The main task of the second register is to maintain a table representing the current continuous scheduling count; during continuous scheduling, the table is updated based on the first update condition, which, besides the clear-to-zero conditions, also includes incrementing the count in the table by 1 each time a sub-task is scheduled successfully. The clear-to-zero conditions include: Condition 1: the number of successful continuous schedules reaches the continuous scheduling count in the first register; Condition 2: the step of splitting the task to be scheduled into multiple sub-tasks has just been performed; Condition 3: a terminate-scheduling instruction for the task to be scheduled is received; Condition 4: the maximum parallelism is currently reached.
Condition 1 means the number of continuous schedules has reached the continuous scheduling parameter recorded in the first register: all sub-tasks in the current continuous scheduling interval have been scheduled successfully, so the table maintained in the second register is cleared to facilitate recording the count of the next continuous scheduling. Condition 2 means the task to be scheduled has just been split into multiple sub-tasks and continuous scheduling is about to begin, which amounts to initialization to zero. Condition 3 means that during the current continuous scheduling there are sub-tasks that no longer need to be executed, so the current continuous scheduling count is cleared to facilitate recording the count of the next continuous scheduling. For Condition 4, the maximum parallelism is the maximum number of resources a kernel may occupy; if a kernel is only allowed to occupy 4 clusters, then 4 is its maximum parallelism. When the sub-tasks currently executing simultaneously reach the corresponding kernel's maximum parallelism, the current continuous scheduling count is cleared to facilitate recording the count of the next continuous scheduling.
The main task of the third register is to record the resources that the current task to be scheduled may occupy and to generate a dynamic table; during continuous scheduling, the dynamic table is updated based on the second update condition, which, besides the clear conditions, specifies that after each sub-task within the continuous scheduling interval is scheduled successfully, that sub-task's affinity resources are deleted from the third register. Specifically, suppose the affinity resources corresponding to the sub-tasks in the interval job0-job3 are IPU cluster1, IPU cluster2, and IPU cluster3; after job0 is scheduled successfully, the dynamic table in the third register deletes IPU cluster1, IPU cluster2, and IPU cluster3, and they become invisible to other tasks to be scheduled.
It should be noted that if executing job0 occupies one IPU core in IPU cluster1, then after job0 is scheduled successfully, the other sub-tasks in the same interval as job0 will no longer occupy the IPU core occupied by job0; but once job0 has finished executing and no longer needs to occupy computing resources, the occupied IPU core can be released to the other sub-tasks in the same interval.
In one example, if a previously scheduled sub-task completes before all sub-tasks in the interval have been scheduled, the affinity resources occupied by executing that sub-task become idle, so after the next update of the dynamic table, the idle affinity resources are shown. For example, if job0 has already completed by the time job3 is scheduled successfully, the IPU core in IPU cluster1 that was occupied by job0 is shown in the dynamic table at that time.
By maintaining the table representing the current continuous scheduling count and the dynamic table of resources the current sub-tasks may occupy, this example presents the current continuous scheduling count and resource occupancy intuitively, avoiding resource waste and conflicts.
In one example, the processing module 52 is further specifically configured to, if the data in the second register has not reached the continuous scheduling count in the first register, determine the sub-tasks in the interval task whose scheduling has not been completed;
the processing module 52 is further specifically configured to select a sub-task to be scheduled from the sub-tasks whose scheduling has not been completed, and schedule it to the affinity resources corresponding to the sub-task, until all sub-tasks in the interval task have been scheduled.
When the continuously scheduled tasks are executing normally, the table maintained in the second register represents the current continuous scheduling count. When the count recorded in the table has not reached the parameter recorded in the first register, there are still unscheduled sub-tasks in the current continuous scheduling interval; a sub-task to be scheduled is then selected from the unscheduled sub-tasks and scheduled onto the corresponding affinity resources, until all sub-tasks in the current continuous scheduling interval have been scheduled. By consulting the table representing the current continuous scheduling count and the dynamic table of resources the current sub-tasks may occupy, this example accurately selects the sub-task to be scheduled and the currently schedulable affinity resources, avoiding resource waste and conflicts.
In one example, the processing module is further specifically configured to, according to the currently available resources and in descending order of priority, select at least one task from the tasks corresponding to each priority as the task to be scheduled, until the currently available resources no longer include the affinity resources corresponding to any task; where different tasks to be scheduled correspond to different affinity resources, and the available resources include the resources other than the affinity resources corresponding to all current tasks to be scheduled.
In a scenario example, when multiple kernel tasks need to be scheduled at the same time, because the available computing resources are limited, a higher-priority kernel task may occupy affinity resources first to perform the corresponding computation; only after all kernel tasks of the higher priority have been scheduled successfully may task scheduling be performed for the kernel tasks of a lower priority. When kernel tasks of different priorities need to be scheduled at the same time, the tasks to be scheduled under each priority are determined in descending order of priority according to the currently allocatable affinity resources recorded in the dynamic table in the third register, in preparation for scheduling. When the affinity resources of tasks to be scheduled under different priorities conflict, the higher-priority task is scheduled first; when they do not conflict, tasks to be scheduled under different priorities may be scheduled simultaneously. Optionally, the affinity resources that kernel tasks of different levels need to occupy can all be transmitted to the hub module of the task scheduling system, called the task splitter (TS) top layer; after receiving the affinity resources the kernel tasks need to occupy, the TS top layer generates, by priority, the kernel-task affinity-resource allocation for each priority. Specifically, a bitwise OR of the resource masks of the affinity resources of all kernel tasks under the same priority is computed, making it convenient to record the overall affinity resources the kernel tasks under each priority need to occupy. When the affinity resources occupied by kernel tasks under different priorities do not conflict, kernel tasks even of different priorities can be scheduled at the same time. For example, suppose kernel tasks of two priorities all need to be scheduled. The high priority includes kernel1, kernel2, kernel3, kernel4, and kernel5, with resource masks 1110000000000000, 0110000000000000, 0001100000000000, 0000010000000000, and 0000000110000000 respectively; the bitwise OR of the resource masks of all high-priority kernel tasks gives 1111110110000000, meaning the affinity resources needed by all high-priority kernel tasks are cluster10-cluster15, cluster8, and cluster7. The low priority includes kernel6, kernel7, and kernel8, with resource masks 0000000011000000, 0000001100100000, and 0000000000011100 respectively; the bitwise OR of the resource masks of all low-priority kernel tasks gives 0000001111111100, so the affinity resources needed by all low-priority kernel tasks are cluster2-cluster9. It can be seen that the only affinity resources in conflict between the high-priority and low-priority kernel tasks are cluster8 and cluster7, so cluster8 and cluster7 are occupied first by the high-priority kernel tasks; but kernel6 can be completed by calling cluster6, and kernel7 by calling cluster5 and cluster9, so at this time the affinity resources of the high-priority and low-priority kernel tasks do not conflict overall and they can be scheduled at the same time: kernel6, kernel7, and kernel8 need not wait for kernel1, kernel2, kernel3, kernel4, and kernel5 to finish being scheduled before being scheduled.
In this example, through the TS top layer's effective allocation of affinity resources among the kernel tasks under each priority, and while ensuring that high-priority kernel tasks are scheduled first, maximum utilization of the computing resources is achieved and the chip's computing efficiency is improved.
In one example, the processing module is further specifically configured to, if there are different tasks with overlapping affinity resources among the tasks corresponding to a priority, take the task with the highest weight among the current different tasks as the task to be scheduled, and increase the weights of the other tasks among the different tasks.
In a scenario example, each kernel task carries a weight attribute; the function of the weight attribute is that, at the same priority level, a kernel task with a higher weight is scheduled first. For example, kernel1, kernel2, kernel3, and kernel4 at the same level all have only cluster1 as their affinity resource; because kernel1 has the highest initial weight, kernel1 is scheduled successfully first, while kernel2, kernel3, and kernel4, being at the same level as kernel1, fail to be scheduled because of the affinity-resource conflict, so the weights of kernel2, kernel3, and kernel4 are increased at this point to raise the probability of their being scheduled successfully next time.
By using the priorities of kernel tasks and the weight attributes of kernel tasks at the same level, this embodiment achieves orderly execution of kernel tasks and avoids disorder in kernel-task execution.
Embodiment 4
Fig. 6 is a device block diagram of a central control unit according to an exemplary embodiment. The device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operations of the device 800, such as operations associated with display, phone calls, data communication, camera operation, and recording operation. The processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the method described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components; for example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel; the touch sensor may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera, which may receive external multimedia data when the device 800 is in an operating mode such as shooting mode or video mode. Each front-facing and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the device 800 is in an operating mode such as call mode, recording mode, or speech recognition mode; the received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules such as a keyboard, a click wheel, or buttons, which may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors that provide status assessments of various aspects for the device 800. For example, the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components (for example, the display and keypad of the device 800); it can also detect a change in position of the device 800 or of one of its components, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and temperature changes of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions executable by the processor 820 of the device 800 to complete the above method, is also provided. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Embodiment 5
Fig. 7 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present disclosure. As shown in the figure, the electronic device includes:
a processor 291; the electronic device further includes a memory 292, and may further include a communication interface 293 and a bus 294. The processor 291, the memory 292, and the communication interface 293 may communicate with one another through the bus 294. The communication interface 293 may be used for information transmission. The processor 291 may call the logic instructions in the memory 292 to execute the methods of the above embodiments.
In addition, the above logic instructions in the memory 292 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium.
As a computer-readable storage medium, the memory 292 may be used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 291 executes functional applications and data processing by running the software programs, instructions, and modules stored in the memory 292, i.e., implements the methods in the above method embodiments.
The memory 292 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data created according to the use of the terminal device, etc. In addition, the memory 292 may include high-speed random access memory, and may also include non-volatile memory.
An embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer-executable instructions which, when executed by a processor, are used to implement the methods described in the foregoing embodiments.
An embodiment of the present disclosure provides a computer program product including a computer program which, when executed by a processor, implements the methods described in the foregoing embodiments.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

  • 1. A task scheduling method, characterized by comprising:
    determining a task to be scheduled from tasks issued by software, wherein each task issued by the software is assigned a continuous scheduling count;
    splitting the task to be scheduled into multiple sub-tasks, and dividing the multiple sub-tasks according to the continuous scheduling count of the task to be scheduled to obtain at least one interval task of the task to be scheduled;
    for each interval task of the task to be scheduled, occupying interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously scheduling each sub-task in the interval task to the affinity resources corresponding to the sub-task, until all sub-tasks under the interval task have been scheduled; wherein the interval affinity resources comprise the set of affinity resources corresponding to all sub-tasks in the interval task, and the occupied resources are not visible to other tasks.
  • 2. The method according to claim 1, characterized in that the method further comprises:
    if all sub-tasks in the interval task have been scheduled, releasing the interval affinity resources occupied by the interval task, wherein the released resources are visible to other tasks.
  • 3. The method according to claim 1 or 2, characterized in that occupying interval affinity resources for the interval task from the resources currently visible to the task to be scheduled comprises:
    obtaining the resource mask corresponding to a sub-task in the interval task, wherein the resource mask corresponding to the sub-task represents the affinity resources corresponding to the sub-task;
    taking the resources represented by the mask among the resources currently visible to the task to be scheduled as the interval affinity resources, and occupying the interval affinity resources for the interval task.
  • 4. The method according to any one of claims 1-3, characterized in that the method further comprises:
    obtaining the continuous scheduling count by parsing the task scheduling instruction and recording it in a first register;
    updating the data in a second register based on a first update condition, wherein the data in the second register represents the number of continuous schedules currently performed; the first update condition comprises: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the second register to zero; otherwise, incrementing the second register by 1 each time the scheduling of a sub-task is completed;
    updating the data in a third register based on a second update condition, wherein the data in the third register represents the interval affinity resources currently to be occupied for the interval task; the second update condition comprises: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the data in the third register; otherwise, deleting the affinity resources of a sub-task from the third register each time the scheduling of that sub-task is completed.
  • 5. The method according to any one of claims 1-4, characterized in that continuously scheduling each sub-task in the interval task to the affinity resources corresponding to the sub-task comprises:
    if the data in the second register has not reached the continuous scheduling count in the first register, determining the sub-tasks in the interval task whose scheduling has not been completed;
    selecting a sub-task to be scheduled from the sub-tasks whose scheduling has not been completed, and scheduling the sub-task to be scheduled to the affinity resources corresponding to the sub-task, until all sub-tasks in the interval task have been scheduled.
  • 6. The method according to any one of claims 1-5, characterized in that the tasks issued by the software have priorities; determining the task to be scheduled from the tasks issued by the software comprises:
    according to the currently available resources and in descending order of priority, selecting at least one task from the tasks corresponding to each priority as the task to be scheduled, until the currently available resources do not include the affinity resources corresponding to any task; wherein different tasks to be scheduled correspond to different affinity resources, and the available resources comprise the resources other than the affinity resources corresponding to all current tasks to be scheduled.
  • 7. The method according to claim 6, characterized in that selecting at least one task from the tasks corresponding to each priority as the task to be scheduled comprises:
    if there are different tasks with overlapping affinity resources among the tasks corresponding to the priority, taking the task with the highest weight among the current different tasks as the task to be scheduled, and increasing the weights of the other tasks among the different tasks.
  • 8. A task scheduling apparatus, characterized by comprising:
    a determining module, configured to determine a task to be scheduled from tasks issued by software, wherein each task issued by the software is assigned a continuous scheduling count;
    a processing module, configured to split the task to be scheduled into multiple sub-tasks, and divide the multiple sub-tasks according to the continuous scheduling count of the task to be scheduled to obtain at least one interval task of the task to be scheduled;
    the processing module being further configured to, for each interval task of the task to be scheduled, occupy interval affinity resources for the interval task from the resources currently visible to the task to be scheduled, and continuously schedule each sub-task in the interval task to the affinity resources corresponding to the sub-task, until all sub-tasks under the interval task have been scheduled; wherein the interval affinity resources comprise the set of affinity resources corresponding to all sub-tasks in the interval task, and the occupied resources are not visible to other tasks.
  • 9. The apparatus according to claim 8, characterized in that
    the processing module is further configured to, if all sub-tasks in the interval task have been scheduled, release the interval affinity resources occupied by the interval task, wherein the released resources are visible to other tasks.
  • 10. The apparatus according to claim 8 or 9, characterized in that
    the processing module is specifically configured to obtain the resource mask corresponding to a sub-task in the interval task, wherein the resource mask corresponding to the sub-task represents the affinity resources corresponding to the sub-task;
    the processing module is further specifically configured to take the resources represented by the mask among the resources currently visible to the task to be scheduled as the interval affinity resources, and occupy the interval affinity resources for the interval task.
  • 11. The apparatus according to any one of claims 8-10, characterized in that
    the processing module is further specifically configured to obtain the continuous scheduling count by parsing the task scheduling instruction and record it in a first register;
    the processing module is further specifically configured to update the data in a second register based on a first update condition, wherein the data in the second register represents the number of continuous schedules currently performed; the first update condition comprises: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the second register to zero; otherwise, incrementing the second register by 1 each time the scheduling of a sub-task is completed;
    the processing module is further specifically configured to update the data in a third register based on a second update condition, wherein the data in the third register represents the interval affinity resources currently to be occupied for the interval task; the second update condition comprises: if the number of successful continuous schedules reaches the continuous scheduling count in the first register, or after the step of splitting the task to be scheduled into multiple sub-tasks is performed, or a terminate-scheduling instruction for the task to be scheduled is received, or the maximum parallelism is currently reached, clearing the data in the third register; otherwise, deleting the affinity resources of a sub-task from the third register each time the scheduling of that sub-task is completed.
  • 12. The apparatus according to any one of claims 8-11, characterized in that
    the processing module is further specifically configured to, if the data in the second register has not reached the continuous scheduling count in the first register, determine the sub-tasks in the interval task whose scheduling has not been completed;
    the processing module is further specifically configured to select a sub-task to be scheduled from the sub-tasks whose scheduling has not been completed, and schedule the sub-task to be scheduled to the affinity resources corresponding to the sub-task, until all sub-tasks in the interval task have been scheduled.
  • 13. The apparatus according to any one of claims 8-12, characterized in that
    the processing module is further specifically configured to, according to the currently available resources and in descending order of priority, select at least one task from the tasks corresponding to each priority as the task to be scheduled, until the currently available resources do not include the affinity resources corresponding to any task; wherein different tasks to be scheduled correspond to different affinity resources, and the available resources comprise the resources other than the affinity resources corresponding to all current tasks to be scheduled.
  • 14. The apparatus according to claim 13, characterized in that
    the processing module is further specifically configured to, if there are different tasks with overlapping affinity resources among the tasks corresponding to the priority, take the task with the highest weight among the current different tasks as the task to be scheduled, and increase the weights of the other tasks among the different tasks.
  • 15. An electronic device, characterized by comprising: a processor, and a memory communicatively connected to the processor;
    the memory stores computer-executable instructions;
    the processor executes the computer-executable instructions stored in the memory to implement the method according to any one of claims 1-7.
  • 16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, are used to implement the method according to any one of claims 1-7.
PCT/CN2023/104517 2022-09-13 2023-06-30 Task scheduling method, apparatus, device, and medium WO2024055708A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211110283.6 2022-09-13
CN202211110283.6A CN117742901A (zh) Task scheduling method, apparatus, device, and medium

Publications (1)

Publication Number Publication Date
WO2024055708A1 true WO2024055708A1 (zh) 2024-03-21

Family

ID=90274163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/104517 WO2024055708A1 (zh) Task scheduling method, apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN117742901A (zh)
WO (1) WO2024055708A1 (zh)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5761512A (en) * 1995-12-27 1998-06-02 International Business Machines Corporation Automatic client-server complier
US20090300623A1 (en) * 2007-08-17 2009-12-03 Nikhil Bansal Methods and systems for assigning non-continual jobs to candidate processing nodes in a stream-oriented computer system
US20100153963A1 (en) * 2008-12-12 2010-06-17 Subbarao Kakarlamudi Workload management in a parallel database system
CN105260244A (zh) 2015-10-30 2016-01-20 北京奇艺世纪科技有限公司 Method and device for task scheduling in a distributed system
US20180115496A1 (en) * 2016-10-21 2018-04-26 Advanced Micro Devices, Inc. Mechanisms to improve data locality for distributed gpus
US20210216557A1 (en) * 2020-01-13 2021-07-15 EMC IP Holding Company LLC Continuous query scheduling and splitting in a cluster-based data storage system
WO2022067531A1 (zh) 2020-09-29 2022-04-07 深圳大学 Computing-resource-aware task scheduling method
CN114924858A (zh) 2022-05-27 2022-08-19 中国银行股份有限公司 Task scheduling method and apparatus, storage medium, and electronic device
CN114741207A (zh) 2022-06-10 2022-07-12 之江实验室 GPU resource scheduling method and system based on multi-dimensional combined parallelism


Also Published As

Publication number Publication date
CN117742901A (zh) 2024-03-22

Similar Documents

Publication Publication Date Title
US9104476B2 (en) Opportunistic multitasking of VOIP applications
CN110300328B (zh) Video playback control method and apparatus, and readable storage medium
CN109254849B (zh) Application program running method and apparatus
US8856798B2 (en) Mobile computing device activity manager
CN109343902A (zh) Audio processing component running method and apparatus, terminal, and storage medium
CN111597042A (zh) Service thread running method and apparatus, storage medium, and electronic device
JP7100154B2 (ja) Processor core scheduling method and apparatus, terminal, and storage medium
CN115576645B (zh) Virtual processor scheduling method and apparatus, storage medium, and electronic device
CN115237613B (zh) Multi-party secure computation task scheduling method and apparatus, and readable storage medium
CN110413383B (zh) Event processing method and apparatus, terminal, and storage medium
KR20240010042A (ko) Data access method and apparatus, and non-transitory computer-readable storage medium
WO2024055708A1 (zh) Task scheduling method, apparatus, device, and medium
CN117112031A (zh) Instruction issuing method and apparatus, electronic device, and storage medium
CN116578422A (zh) Resource allocation method and electronic device
CN110730300A (zh) Camera control method and apparatus, storage medium, and terminal
CN113268325A (zh) Task scheduling method and apparatus, and storage medium
CN113360254A (zh) Task scheduling method and system
CN113495787A (zh) Resource allocation method and apparatus, storage medium, and electronic device
CN117827709B (zh) Direct memory access implementation method and apparatus, device, and storage medium
CN114253699A (zh) Resource usage control method and apparatus, and storage medium
CN117389750A (zh) Task acceleration method and apparatus, readable storage medium, and chip
CN116974956A (zh) Table entry replacement method and apparatus, circuit, processor, device, and medium
JP2024521963A (ja) Data access method and apparatus, and non-transitory computer-readable storage medium
CN117806782A (zh) Thread scheduling method and apparatus, electronic device, and storage medium
CN113918350A (zh) Garbage collection method and apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23864445

Country of ref document: EP

Kind code of ref document: A1