WO2024041401A1 - Method and apparatus for processing task, and device and storage medium - Google Patents

Method and apparatus for processing task, and device and storage medium Download PDF

Info

Publication number
WO2024041401A1
WO2024041401A1 (PCT/CN2023/112749)
Authority
WO
WIPO (PCT)
Prior art keywords
task
priority
determined
core
tasks
Prior art date
Application number
PCT/CN2023/112749
Other languages
French (fr)
Chinese (zh)
Inventor
孙东旭 (Sun Dongxu)
朱科潜 (Zhu Keqian)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024041401A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt

Definitions

  • Embodiments of this application mainly relate to the computer field. More specifically, embodiments of the present application relate to methods, apparatus, devices and storage media for processing tasks.
  • Embodiments of the present application provide a solution for processing tasks.
  • a method of processing a task includes: determining a first task to be executed by a first logical core among physical cores in a processing resource; determining whether the first task is of a predetermined priority; if it is determined that the first task is of the predetermined priority, determining whether a second logical core in the physical core executes a second task of the predetermined priority; and if it is determined that the second logical core does not execute a second task of the predetermined priority, allocating a dedicated task including a null instruction to the second logical core.
  • this method can speed up the execution of high-priority tasks in the processor, realize the suppression of lower-priority tasks by high-priority tasks on a single physical core, and eliminate the interference of different priority tasks on the logical core, improving the efficiency of the processor.
  • the execution of high-priority tasks within a single physical core improves the processing efficiency of high-priority tasks and increases the resource utilization of the processor.
  • determining the first task includes: obtaining a task ready queue for the first logical core, where the task ready queue is an ordered queue based on the priority of the task; and obtaining the first task from the task ready queue.
  • obtaining the first task includes: selecting a ready task from the head of the task ready queue according to priority; determining whether the first logical core is executing a current task; if it is determined that the first logical core is executing the current task, comparing the priority of the ready task with the priority of the current task; and if it is determined that the priority of the ready task is higher than the priority of the current task, determining the ready task as the first task to replace execution of the current task. In this way, it can be quickly determined whether tasks in the logical core need to be replaced, which increases the probability that high-priority tasks will be processed quickly.
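The selection logic described above can be sketched as follows. This is an illustrative model only, not the claimed implementation; the task and priority names are assumptions, with a smaller number standing for a higher priority:

```python
from dataclasses import dataclass
from typing import Optional

HIGH, LOW = 0, 1  # assumed encoding: smaller value = higher priority

@dataclass
class Task:
    name: str
    priority: int

def pick_next(ready_queue: list, current: Optional[Task]) -> Task:
    """Select the task the first logical core should run next.

    The ready queue is assumed to be ordered so that the highest-priority
    ready task sits at the head; `current` is the task the core is
    currently executing, if any.
    """
    ready = ready_queue[0]                 # head of the priority-ordered queue
    if current is None:
        return ready                       # idle core: run the ready task
    if ready.priority < current.priority:
        return ready                       # higher-priority ready task preempts
    return current                         # otherwise keep executing the current task
```

Equal priorities would fall through to the existing time-slice scheduling, which the sketch leaves out.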
  • the method further includes: obtaining an allocation task assigned to the first logical core and a corresponding priority for the allocation task; and adding the allocation task to the task ready queue based on the corresponding priority.
  • the predetermined priority is a first predetermined priority
  • the method further includes: if it is determined that the first task is of a second predetermined priority, causing the first logical core to execute the first task, where the second predetermined priority is lower than the first predetermined priority. In this way, high-priority tasks can be quickly executed and processing efficiency is improved.
  • determining whether the second logical core in the physical core executes a second task of a predetermined priority includes: if it is determined that the first task is of the predetermined priority, determining whether the second logical core is executing a second task; and if it is determined that the second logical core is executing the second task, determining whether the priority of the second task is the predetermined priority. In this way, the efficiency of determining the task priority of the other logical core in the physical core can be improved.
  • the physical core is a first physical core
  • the processing resource further includes a second physical core
  • the predetermined priority is a first predetermined priority
  • the method further includes: obtaining a baseline value of a performance parameter related to the first task; and adjusting, based on the baseline value, the shared resources allocable to a third task on the second physical core, the third task having a second predetermined priority lower than the first predetermined priority.
  • obtaining the baseline value includes: suppressing the execution of the third task by the second physical core; and determining the baseline value based on the suppression of the third task. In this way, baseline performance indicators can be determined quickly and accurately.
  • suppressing the execution of the third task includes: limiting the upper limit of shared resources allocated to the third task; or suspending the execution of the third task. In this way, the baseline performance of high-priority tasks can be obtained quickly and accurately.
  • suppressing the execution of the third task includes: suppressing the execution of the third task by the second physical core multiple times at predetermined time intervals; and determining the baseline value includes: determining multiple baseline values based on the multiple suppressions of the third task.
  • adjusting the shared resources that can be allocated to the third task on the second physical core includes: determining the sensitivity of the first task to the shared resources based on the baseline value of the performance parameter; and adjusting, based on the sensitivity, the shared resources allocated to the third task. In this way, it can be accurately determined whether high-priority tasks are resource-sensitive, so that while high-priority tasks are ensured, the execution of low-priority tasks is also ensured.
  • the performance parameters include at least one of the following: cache accesses per thousand instructions, cache miss rate, and memory bandwidth, wherein determining the sensitivity includes at least one of the following: if it is determined that the cache accesses per thousand instructions are lower than a threshold cache access amount or the cache miss rate is greater than a threshold cache miss rate, determining that the first task is not cache-sensitive; if it is determined that the cache accesses per thousand instructions are higher than or equal to the threshold cache access amount and the cache miss rate is less than or equal to the threshold cache miss rate, determining that the first task is cache-sensitive; if it is determined that the memory bandwidth is less than a threshold bandwidth, determining that the first task is not sensitive to memory bandwidth; and if it is determined that the memory bandwidth is greater than or equal to the threshold bandwidth, determining that the first task is sensitive to memory bandwidth. In this way, it is possible to quickly determine which resource a high-priority task is sensitive to, thereby providing accurate information for resource configuration and improving resource allocation efficiency and accuracy.
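The threshold rules above can be condensed into a small sketch. The function name and threshold parameters are hypothetical; only the comparison logic comes from the text:

```python
def classify_sensitivity(apki: float, cmr: float, mbw: float,
                         apki_thresh: float, cmr_thresh: float,
                         mbw_thresh: float) -> dict:
    """Classify a high-priority task's sensitivity to shared resources.

    A task is cache-sensitive only if it both accesses the cache often
    enough (APKI >= threshold) and hits often enough (CMR <= threshold);
    it is bandwidth-sensitive if its memory bandwidth meets the threshold.
    """
    cache_sensitive = apki >= apki_thresh and cmr <= cmr_thresh
    bandwidth_sensitive = mbw >= mbw_thresh
    return {"cache": cache_sensitive, "memory_bandwidth": bandwidth_sensitive}
```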
  • adjusting the shared resources includes: if it is determined that the first task is not sensitive to the shared resources, increasing the upper limit of the shared resources used for the third task; and if it is determined that the first task is sensitive to the shared resources, dynamically adjusting the upper limit of the shared resources used for the third task. In this way, the configuration of shared resources can be accurately adjusted.
  • dynamically adjusting the upper limit of shared resources includes: obtaining an actual value of a performance parameter related to the first task; and dynamically adjusting the upper limit of shared resources based on the baseline value and the actual value. In this way, the configuration of shared resources can be accurately adjusted.
  • dynamically adjusting the upper limit of shared resources based on the baseline value and the actual value includes: if it is determined that the difference between the baseline value and the actual value exceeds the first threshold, reducing the shared resources allocated to the third task; and if it is determined that the difference between the baseline value and the actual value is lower than a second threshold, increasing the shared resources allocated to the third task, wherein the second threshold is lower than the first threshold.
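The two-threshold rule above gives the adjustment loop a hysteresis band so the limit does not oscillate. A minimal sketch of that rule follows; the function name, step size, and bounds are illustrative assumptions:

```python
def adjust_limit(baseline: float, actual: float, limit: float,
                 hi_thresh: float, lo_thresh: float,
                 step: float, floor: float, cap: float) -> float:
    """Adjust the shared-resource upper limit for a low-priority task
    based on how far the high-priority task's actual performance has
    fallen below its baseline (lo_thresh < hi_thresh).
    """
    degradation = baseline - actual
    if degradation > hi_thresh:
        return max(floor, limit - step)   # HP task suffering: squeeze the LP task
    if degradation < lo_thresh:
        return min(cap, limit + step)     # headroom available: give the LP task more
    return limit                          # inside the hysteresis band: hold steady
```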
  • appropriate resources can be accurately configured for high-priority tasks and low-priority tasks. While ensuring the processing efficiency of high-priority tasks, it also ensures the execution of low-priority tasks and improves resource utilization.
  • the shared resources include at least one of last-level cache LLC and memory bandwidth. In this way, it can be accurately determined which shared resources to adjust.
  • an apparatus for processing a task includes: a task determination unit configured to determine a first task to be executed by a first logical core among physical cores in the processing resource; a priority determination unit configured to determine whether the first task is of a predetermined priority; an execution determination unit configured to determine, if it is determined that the first task is of the predetermined priority, whether the second logical core in the physical core executes a second task of the predetermined priority; and an allocation unit configured to allocate a dedicated task including a null instruction to the second logical core if it is determined that the second logical core does not execute a second task of the predetermined priority.
  • an electronic device comprising: at least one computing unit; and at least one memory coupled to the at least one computing unit and storing instructions for execution by the at least one computing unit, which, when executed by the at least one computing unit, cause the device to perform the method according to the first aspect of the present application.
  • a computer-readable storage medium is also provided, on which a computer program is stored.
  • the program is executed by a processor, the method according to the first aspect of the present application is implemented.
  • a computer program product including computer-executable instructions, wherein when the computer-executable instructions are executed by a processor, the method according to the first aspect of the present application is implemented.
  • the device of the second aspect, the electronic device of the third aspect, the computer storage medium of the fourth aspect, or the computer program product of the fifth aspect provided above are used to execute the method provided by the first aspect. Therefore, the explanations or descriptions regarding the first aspect also apply to the second, third, fourth and fifth aspects.
  • for the beneficial effects that can be achieved in the second, third, fourth and fifth aspects, reference may be made to the beneficial effects of the corresponding method, which will not be described again here.
  • Figure 1 illustrates a schematic diagram of an example environment in which various embodiments of the present application can be implemented
  • Figure 2 shows a schematic flow diagram for processing tasks according to some embodiments of the present application
  • Figure 3 shows a schematic diagram of a system for controlling inter-core shared resources according to some embodiments of the present application
  • FIG. 4 illustrates a schematic flow diagram of a process for determining resource sensitivity according to some embodiments of the present application
  • Figure 5 shows a schematic flowchart of a process for dynamically allocating resources according to some embodiments of the present application
  • Figure 6 shows a schematic diagram of an implementation example of a computing device according to some embodiments of the present application.
  • Figure 7 shows a block diagram of an apparatus according to some embodiments of the present application.
  • FIG. 8 illustrates a block diagram of a computing device capable of implementing various embodiments of the present application.
  • a traditional solution is to use a container co-location solution, such as a full-scenario offline co-location solution, which designs a new offline scheduler class based on the Linux system.
  • This scheduler class has a lower priority in the scheduling queue than the default scheduler class and a higher priority than the IDLE scheduler class.
  • designing a new scheduling class increases system complexity and operation and maintenance costs, and cannot reuse features such as load balancing in the current system.
  • Another traditional solution is to classify tasks into delay-sensitive and best-effort types.
  • This solution needs to read data related to user-side processing efficiency in real time, and provide as many resources as possible for best-effort tasks on the premise of meeting the processing efficiency of delay-sensitive applications.
  • these solutions require certain prior knowledge of the task, such as application throughput and cache hit rate under different cache capacities (which depend on specific hardware performance and architecture).
  • since these solutions require prior knowledge of the task, they are limited to specific tasks and cannot be adjusted for the various tasks run by the server.
  • the computing device first determines the first task to be performed by the first logical core among the physical cores in the processing resource, and then determines whether the first task is high priority. If the computing device determines that the first task is a high-priority task, it determines whether a second logical core in the physical core executes a high-priority second task. If it is determined that the second logical core does not execute a high-priority second task, a dedicated task including a null instruction is allocated to the second logical core, thereby allowing high-priority tasks to exclusively occupy the resources within the physical core.
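The decision flow just described can be sketched as follows, assuming two-way SMT and a hypothetical priority encoding. This is an illustration of the described behavior, not the patented scheduler code itself:

```python
from typing import Optional

# assumed encoding: lower number = higher priority; the null-instruction
# task sits between high and low priority, as described later in the text
HIGH, NULL_TASK, LOW = 0, 1, 2

def on_schedule(first_task_prio: int, sibling_prio: Optional[int]) -> str:
    """Decide what to do with the sibling logical core when a task is
    scheduled on the first logical core of the same physical core.
    sibling_prio is None when the sibling core is idle.
    """
    if first_task_prio != HIGH:
        return "run_normally"            # low-priority task: no isolation needed
    if sibling_prio == HIGH:
        return "leave_sibling_alone"     # both cores run high-priority work
    # sibling is idle or running a low-priority task: pin a dedicated
    # null-instruction task there so low-priority work cannot share the core
    return "allocate_null_task_to_sibling"
```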
  • the shared resources occupied by low-priority tasks running in other physical cores within the same on-chip resource can also be adjusted to further improve the execution of high-priority tasks.
  • embodiments of the present application can speed up the execution of high-priority tasks in the processor, improve the processing efficiency of high-priority tasks, and improve the resource utilization of the processor.
  • Figure 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present application can be implemented. As shown in FIG. 1 , environment 100 includes computing device 101 .
  • Computing devices 101 include, but are not limited to, personal computers, servers, handheld or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players, etc.), multi-processor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, etc.
  • Computing device 101 is used to handle various tasks from users.
  • the tasks described herein are tasks processed by computing devices, which can be virtual machines, containers, or a set of processes or threads.
  • users can assign a priority to the service, also called a hard priority.
  • tasks are divided into high-priority tasks and low-priority tasks, such as the high-priority task 102 and the low-priority task 103 shown in FIG. 1 .
  • the priorities of high priority tasks 102 and low priority tasks 103 are pre-specified.
  • a task assigned a task priority has a task label indicating its priority level. For example, each task is assigned a field to store the task label.
  • computing device 101 prioritizes tasks based on their type.
  • the priority of a task is specified by the user.
  • a resource-exclusive virtual machine and a resource-sharing virtual machine are mixedly deployed.
  • a high-priority label is added to the resource-exclusive virtual machine
  • a low-priority label is added to the resource-sharing virtual machine.
  • users can classify tasks into high-priority tasks and low-priority tasks based on whether they are delay-sensitive or not.
  • Computing device 101 also has on-chip resources 107 .
  • On-chip resources 107 refer to the resources set on the chip, including at least physical cores 108 and 109, as well as the last level cache LLC and memory bandwidth MBW, that is, LLC/MBW 114.
  • on-chip resource 107 is a CPU.
  • Figure 1 shows that the on-chip resources include two physical cores 108 and 109, which is only an example and not a specific limitation of the present disclosure.
  • Those skilled in the art can set the number of physical cores included in the on-chip resource 107 as needed.
  • on-chip resources 107 may include one physical core or more than two physical cores.
  • the physical core 108 includes a logical core 110 and a logical core 111
  • the physical core 109 includes a logical core 112 and a logical core 113 .
  • the logical core in Figure 1 is obtained by performing hyper-threading operations on the physical core.
  • the physical core shown in FIG. 1 including two logical cores is only an example and is not a specific limitation of the present disclosure.
  • the number of logical cores in a physical core can be set by those skilled in the art as needed. For example, one physical core includes one or more than two logical cores, and the number of logical cores in two physical cores can be the same or different.
  • the computing device 101 configures a task ready queue for each logical core for storing tasks to be executed by each logical core.
  • when the computing device 101 receives a task, it allocates the newly received task to the task ready queue of a logical core according to the load balancing of the logical cores and/or the user's configuration of the task.
  • Computing device 101 also includes a CPU scheduling optimization module 104 for scheduling and optimizing execution of high priority tasks.
  • a logical core processes the tasks assigned to its task ready queue in order, based on the time slice assigned to each task and the processor time slice, for example using the CPU time-slice round-robin scheduling method or the completely fair scheduling method.
  • the CPU scheduling optimization module 104 is used to further adjust the ordering of the tasks, including their priorities, in the ready queue and the execution of the tasks.
  • the CPU scheduling optimization module 104 includes a single core suppression module 105 and a logical core isolation module 106 .
  • the single-core suppression module 105 is used to sort the tasks in the ready queue by priority. High-priority tasks are arranged at the front of the task ready queue, while low-priority tasks are arranged at the rear. If two tasks have the same priority, they are queued according to the existing time-slice scheduling method. Then, through the single-core suppression module 105, tasks with high priority are sent to the logical core for processing first.
  • the logical core isolation module 106 in the CPU scheduling optimization module will further determine whether other logical cores on the same physical core are executing high-priority tasks. For example, if the logical core 110 in the physical core 108 is running a high-priority task, the logical core isolation module 106 will determine whether the logical core 111 on the same physical core is running a high-priority task. Shown in FIG. 1 is an example in which the physical core includes two logical cores. If the physical core includes more than two logical cores, the logical core isolation module 106 determines whether one or more other logical cores are running high-priority tasks.
  • the logical core isolation module 106 will launch an empty instruction task on the logical core 111.
  • the priority of this empty instruction task is higher than low priority and lower than high priority. For example, if another logical core is running a low-priority task, the low-priority task is replaced with the empty instruction task. If the other logical core is not processing any task, the empty instruction task is launched directly. After an empty instruction task is launched on another logical core, low-priority tasks cannot occupy that logical core, thereby ensuring the execution of high-priority tasks. If the other logical cores are also running high-priority tasks, no adjustments are made to them.
  • the logical core only executes the low-priority task and does not perform task control operations on other logical cores.
  • if a logical core is running a high-priority task, low-priority tasks running on the other logical core will be replaced by empty instruction tasks.
  • computing device 101 optionally also includes an on-chip shared resource manager 115 .
  • the on-chip shared resource manager 115 is used to obtain the performance index of high-priority tasks executed on one physical core, and then adjust, based on the performance index, the occupation of inter-core shared resources by low-priority tasks on other physical cores.
  • the on-chip shared resource manager 115 includes a collector 116 for obtaining performance indicators on high-priority tasks.
  • the on-chip shared resource manager 115 also includes a classifier 117 for determining whether high-priority tasks are sensitive to shared resources. The resource controller 118 then adjusts the occupation of shared resources by low-priority tasks on other physical cores based on whether the high-priority tasks on one physical core are sensitive to the shared resources. For example, if the logical core 110 of the physical core 108 runs a high-priority task, in addition to controlling the logical core 111 in the physical core 108 not to execute low-priority tasks, the on-chip shared resource manager 115 also controls the shared resources allocated to low-priority tasks on the physical core 109.
  • the execution of high-priority tasks in the processor can be accelerated, the processing efficiency of high-priority tasks is improved, and the resource utilization of the processor is improved.
  • A schematic diagram of an example environment 100 in which embodiments of the present application can be implemented is described above in conjunction with FIG. 1.
  • A flowchart of a method 200 for processing tasks according to an embodiment of the present disclosure is described below with reference to FIG. 2.
  • Method 200 may be performed at the computing device 101 in FIG. 1 or any other suitable computing device.
  • a first task to be performed by a first logical core among the physical cores in the processing resource is determined. For example, the computing device 101 determines the task to be processed by the logical core 110 within the physical core 108 and assigns it to the logical core for execution.
  • the computing device 101 may obtain a task ready queue corresponding to the first logical core, which is an ordered queue based on the priority of the task. Then, the computing device 101 obtains the first task to be executed by the first logical core from the task ready queue.
  • when acquiring the first task for execution on the first logical core, the computing device 101 selects the ready task from the head of the task ready queue according to priority. Since the task ready queue is a sorted queue, the head of the queue stores the ready tasks with higher priority. The computing device 101 then also determines whether the first logical core is currently executing a task. If it is determined that the first logical core is not executing any task, the first logical core executes the ready task. If the first logical core is executing a current task, the priority of the ready task needs to be compared with the priority of the current task. If it is determined that the priority of the ready task is higher than the priority of the current task, the ready task is determined as the first task to replace execution of the current task.
  • the current task will continue to be executed without executing the low-priority ready task. If it is determined that the priority of the ready task is equal to the priority of the current task, the CPU time slice round-robin scheduling method or the completely fair scheduling method is used for scheduling.
  • the above process is generally performed after a new task is allocated to the task ready queue of the first logical core and a sorting is performed. Through the above method, it can be quickly determined whether tasks in the logical core need to be replaced, which increases the probability that high-priority tasks will be processed quickly.
  • the task to be run will be obtained from the head of the task ready queue when the logical core has processed the CPU time slice of the current task.
  • when receiving a new task, the computing device 101 obtains the assigned task allocated to the first logical core and the corresponding priority for the assigned task. The computing device then adds the assigned task to the task ready queue based on the corresponding priority. In one example, when the computing device 101 receives a task assigned to the logical core 110, it reorders the tasks in the task ready queue from high to low priority. In another example, the newly assigned task is inserted into the queue based on its priority. If the priorities of two tasks are the same, their order is determined according to the commonly used time-slice ordering method. Through the above method, the position of a newly assigned task in the priority queue can be quickly determined, and high-priority tasks can be processed in a timely manner, thereby improving the processing efficiency of high-priority tasks.
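The priority-ordered insertion described above can be sketched with the standard-library bisect module. The tuple layout and names are illustrative assumptions; a monotonically increasing sequence number stands in for time-slice ordering among equal-priority tasks:

```python
import bisect

def enqueue(ready_queue: list, priority: int, seq: int, name: str) -> None:
    """Insert a newly assigned task into a logical core's ready queue,
    keeping the queue ordered by priority (lower number = higher
    priority, an assumed encoding). Sorting on (priority, seq) places
    equal-priority tasks after existing ones, approximating the
    time-slice ordering among peers.
    """
    bisect.insort(ready_queue, (priority, seq, name))
```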
  • it is then determined whether the first task is of a predetermined priority. Because task priorities differ, resources within the same physical core are managed differently. Accordingly, the computing device 101 determines the priority of each task as it is scheduled for processing on a logical core.
  • the predetermined priority is a first predetermined priority, i.e., a high priority. If it is determined that the first task is not of the predetermined priority, that is, not of the first predetermined (high) priority, the computing device 101 determines that the first task is of the second predetermined priority, and then causes the first logical core to execute the first task, where the second predetermined priority is lower than the first predetermined priority, that is, it is a low priority. In this case, the tasks of other logical cores are not adjusted. In this way, the first task can be quickly executed and processing efficiency is improved.
  • the computing device 101 determines whether the second logical core in the physical core executes a second task of the predetermined priority. If it is determined at block 203 that the second logical core is executing a second task of a predetermined priority, it indicates that the second logical core is also processing a high-priority task. Therefore, the tasks performed by the second logical core may not be adjusted.
  • the computing device 101 determines whether the second logical core is executing the second task. If the second task is not executed, the second logical core can be directly caused to execute a task including a null instruction.
  • the null instruction task can also be called a dedicated task dedicated to scheduling control. If it is determined that the second logical core is executing the second task, it is determined whether the priority of the second task is a predetermined priority. If it is determined to be a predetermined priority, no further operation is performed. In this way, the efficiency of determining the priority level of the second task can be improved.
  • the second logical core is assigned a dedicated task including a null instruction. At this time, an empty instruction task is launched in the second logical core. Therefore, the high-priority tasks running in the first logical core occupy more core resources for execution.
  • embodiments of the present application can speed up the execution of high-priority tasks in the processor, realize the suppression of lower-priority tasks by high-priority tasks on a single physical core, and eliminate the interference between tasks of different priorities on the logical cores, thereby improving the execution of high-priority tasks within a single physical core and the processing efficiency of high-priority tasks.
  • A schematic flowchart for processing tasks according to some embodiments of the present application is described above in conjunction with FIG. 2.
  • An example process for handling inter-core shared resources among multiple physical cores to further speed up the execution of high-priority tasks is described below.
  • the processing resources may include multiple physical cores.
  • the computing device 101 first obtains the baseline value of the performance parameter related to the first task.
  • Shared resources allocable to a third task on the second physical core are then adjusted based on the baseline value, the third task having a second predetermined priority lower than the first predetermined priority.
  • This process is further described below in conjunction with Figure 3.
  • FIG. 3 depicts a schematic diagram of a system for controlling inter-core shared resources according to some embodiments of the present application.
  • the system 300 is used to control the allocation of shared resources between cores, which may be the on-chip shared resource manager 115 in Figure 1 .
  • the system 300 includes a data collector 301, a resource sensitivity classifier 302 and a resource controller 303.
  • the data collector 301 is used to collect microarchitecture performance indicators of high-priority tasks, including but not limited to basic indicators such as the number of instructions executed per unit time, instruction cycles, cache misses, cache accesses, memory bandwidth, and pipeline back-end memory constraints. Based on these indicators, complex indicators can be further calculated, such as the number of cache misses per thousand instructions (Cache Misses Per Kilo Instructions, MPKI), the cache accesses per thousand instructions, for example last level cache accesses per thousand instructions (LLC Accesses Per Kilo Instructions, APKI), the cache miss rate (CMR), and instructions per cycle (IPC). The data collector also obtains the memory bandwidth MBW allocated to high-priority tasks. For example, if a high-priority task runs on the first physical core among multiple physical cores, the data collector 301 collects the performance indicators corresponding to that high-priority task.
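As a sketch, the derived indicators named above can be computed from the raw counters as follows (the function and parameter names are illustrative, not from the application):

```python
def derived_metrics(instructions, cycles, cache_misses, cache_accesses):
    """Compute the complex indicators from raw microarchitecture counters.

    MPKI/APKI are per-thousand-instruction rates, CMR is the miss-to-access
    ratio, and IPC is instructions per cycle.
    """
    return {
        "mpki": cache_misses / instructions * 1000,    # cache misses per kilo instructions
        "apki": cache_accesses / instructions * 1000,  # cache accesses per kilo instructions
        "cmr": cache_misses / cache_accesses,          # cache miss rate
        "ipc": instructions / cycles,                  # instructions per cycle
    }
```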
  • the baseline performance of a high-priority task refers to the performance index value of executing the high-priority task after suppressing low-priority tasks on other physical cores.
  • the computing device suppresses the execution of one or more low-priority tasks by one or more other physical cores.
  • the computing device determines the baseline value based on the suppression of low-priority tasks. If low-priority tasks on other physical cores are not suppressed, the normal operating performance index value of the high-priority task will be obtained. In this way, baseline performance indicators can be determined quickly and accurately.
  • the first physical core 108 in Figure 1 runs a high-priority task; the execution of the low-priority third task on the second physical core 109 is suppressed, and then the baseline performance of the high-priority task is obtained. Additionally, when suppressing the execution of the low-priority third task on the second physical core 109, the execution of the third task by the second physical core 109 may be suppressed multiple times at predetermined time intervals. Multiple baseline values are then determined based on the multiple suppressions of the third task. The baseline value of an indicator can then be determined by averaging the multiple values of the same indicator over the multiple suppressions, or by performing other suitable processing.
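As an illustration of the averaging option mentioned above, baselines collected over multiple suppression windows could be combined as follows (the names are hypothetical, and averaging is only one of the suitable processing choices):

```python
from statistics import mean

def baseline_values(windows):
    """Combine per-window indicator samples into one baseline per indicator.

    `windows` is a list of dicts, one per suppression of the third task,
    mapping an indicator name to the value measured during that window.
    """
    indicators = windows[0].keys()
    return {name: mean(w[name] for w in windows) for name in indicators}
```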
  • Low-priority tasks on other cores are periodically suppressed as shown in block 304 in Figure 3 .
  • the suppression process includes two methods. One, as shown in block 305, is to limit the upper limit of shared resources allocated to low-priority tasks; with this kind of suppression, the trigger frequency can be set to hundreds of milliseconds. The other, as shown in block 306, is to suspend the execution of low-priority tasks. This method can accurately obtain the baseline performance of high-priority tasks without interference from low-priority tasks, but it is less friendly to low-priority tasks, so the trigger frequency can be set to the level of seconds.
  • System 300 also includes a resource sensitivity classifier 302 that determines whether a task is sensitive or insensitive to shared resources based on the obtained baseline values of the performance parameters. For example, it determines whether the high-priority task running on the first physical core 108 in FIG. 1 is sensitive to shared resources.
  • computing device 101 determines the sensitivity of the first task to the shared resources based on the baseline values of the performance parameters. As shown in block 307, whether the first task is sensitive to the shared resources is determined based on the baseline values of the performance parameters.
  • the resource controller 303 then adjusts the shared resources that can be allocated to low-priority tasks on other cores based on the sensitivity.
  • if it is determined that the first task is not sensitive to shared resources, the upper limit of shared resources for low-priority tasks on other cores is statically allocated, for example, by adding resources for the low-priority tasks on other cores. Since the first task is not sensitive to shared resources, more shared resources can be allocated to low-priority tasks; at this time, the amount of resources allocated to low-priority tasks on other cores is increased according to a predetermined strategy. If it is determined that the first task is sensitive to shared resources, at block 309, the upper limit of shared resources for low-priority tasks on other cores is dynamically allocated.
  • the shared resources include at least one of the last level cache LLC and memory bandwidth. In this way, it can be determined exactly which shared resources to adjust. The process of dynamically configuring shared resources will be described in conjunction with Figure 5.
  • the computing device obtains performance parameters for a high priority first task.
  • the performance parameters include at least one of the following: cache accesses per thousand instructions APKI, cache miss rate CMR, and memory bandwidth MBW.
  • the APKI and CMR parameters are determined. If it is determined that the cache accesses per thousand instructions APKI are lower than the threshold cache access amount A, or the cache miss rate CMR is greater than the threshold cache miss rate B, then it is determined at block 404 that the high-priority first task is not sensitive to cache resources such as the LLC. If it is determined that the cache accesses per thousand instructions APKI are higher than or equal to the threshold cache access amount A and the cache miss rate CMR is less than or equal to the threshold cache miss rate B, it is determined at block 405 that the high-priority first task is sensitive to the cache resource LLC.
  • the memory bandwidth MBW is compared with the threshold C. If it is determined that the memory bandwidth MBW is less than the threshold C, then at block 406 it is determined that the high-priority first task is not sensitive to the memory bandwidth. If it is determined that the memory bandwidth MBW is greater than or equal to the threshold C, then at block 407 it is determined that the high priority first task is sensitive to the memory bandwidth MBW.
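The two classification rules in blocks 404–407 can be sketched as one function; the thresholds A, B and C are platform-specific tuning parameters, and all names here are illustrative:

```python
def classify_sensitivity(apki, cmr, mbw, thr_a, thr_b, thr_c):
    """Decide whether a high-priority task is sensitive to the LLC and to
    memory bandwidth, following the threshold rules described above."""
    llc_sensitive = apki >= thr_a and cmr <= thr_b   # blocks 405 / 404
    mbw_sensitive = mbw >= thr_c                     # blocks 407 / 406
    return {"llc": llc_sensitive, "mbw": mbw_sensitive}
```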
  • in some embodiments, a setting of insensitivity to a type of resource takes effect only after the condition is established multiple times in succession, in order to prevent the ping-pong phenomenon, in which the classification changes back and forth because each test yields a different result, causing the resource allocation to change repeatedly.
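One way to realize the "multiple times in succession" rule is a small gate that commits a classification change only after it has been observed n consecutive times (a sketch under assumed semantics; the application does not prescribe this exact mechanism):

```python
class ConsecutiveGate:
    """Commit a sensitivity-flag change only after n consecutive observations.

    This prevents the ping-pong effect: a single divergent measurement does
    not flip the classification (and thus the resource allocation).
    """

    def __init__(self, n: int, initial: bool):
        self.n = n
        self.state = initial       # currently effective classification
        self._candidate = initial  # value observed in the current streak
        self._streak = 0

    def update(self, observed: bool) -> bool:
        if observed == self.state:
            self._streak = 0       # agreement: drop any pending change
            self._candidate = self.state
        else:
            if observed == self._candidate:
                self._streak += 1
            else:
                self._candidate = observed
                self._streak = 1
            if self._streak >= self.n:
                self.state = observed
                self._streak = 0
        return self.state
```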
  • the computing device obtains the baseline performance of the high-priority first task running on the first core and the actual value of the performance parameter of the first task when low-priority tasks are not suppressed. The upper limit of shared resources is then dynamically adjusted based on the baseline value and the actual value. In this way, the configuration of shared resources can be accurately adjusted. In some embodiments, if it is determined that the difference between the baseline value and the actual value exceeds a first threshold, the computing device reduces the shared resources allocated to the low-priority third task running on other cores.
  • if it is determined that the difference between the baseline value and the actual value is lower than a second threshold, the shared resources allocated to the third task are increased, wherein the second threshold is lower than the first threshold.
  • Figure 5 shows a schematic flowchart of a process for dynamically allocating resources according to some embodiments of the present application.
  • the initial state is set and certain shared resources are allocated to low-priority tasks, for example, the LLC is set to 2-way (way) and the memory bandwidth to 10%.
  • the computing device compares the actual value of the collected indicator for the high-priority first task with the baseline performance, for example, compares the actual value of the performance indicator APKI or MBW with the baseline value.
  • if the collected indicator is worse than the baseline performance, for example, the difference between the two exceeds the threshold E, then at block 508 the resources allocated to low-priority tasks are reduced, for example halved, or reduced by a certain percentage.
  • block 505 is optionally included. The above comparison may be performed multiple times in succession, and operation 508 is performed only when the difference exceeds the threshold E a threshold number of times among the multiple comparisons.
  • at block 503, if the actual value of the collected indicator is close to the baseline performance, for example, the difference between the two values is less than the threshold F, that is, the performance degradation is less than the threshold F, then the resources of the low-priority task are increased at block 509.
  • block 506 is optionally included.
  • the above comparison may be performed multiple times in succession, and operation 509 is performed only if the difference is less than the threshold F a threshold number of times among the multiple comparisons.
  • situations that do not satisfy the above two conditions are determined as other situations.
  • the original configuration remains unchanged.
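The decision at blocks 503/508/509 can be sketched as one adjustment step. The halving policy and the threshold comparisons follow the text, while the +1 increment, the way-count units, and the bounds are assumptions for illustration:

```python
def adjust_limit(baseline, actual, limit, thr_e, thr_f, min_limit=1, max_limit=16):
    """One iteration of the dynamic adjustment of low-priority shared resources.

    `limit` is the current upper limit (e.g. LLC ways) for low-priority tasks;
    degradation is measured as baseline minus actual performance of the
    high-priority task.
    """
    degradation = baseline - actual
    if degradation > thr_e:                 # clearly worse than baseline
        return max(min_limit, limit // 2)   # e.g. halve low-priority resources
    if degradation < thr_f:                 # close to baseline performance
        return min(max_limit, limit + 1)    # give resources back (assumed +1 step)
    return limit                            # other situations: keep configuration
```

In practice, this step would be wrapped with the consecutive-comparison guards of blocks 505 and 506 before an adjustment actually takes effect.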
  • the CPU bandwidth of the low-priority task can be limited (that is, an upper limit is placed on its CPU resources). If the performance degradation is still serious after the limit is applied, the cluster center can be requested to migrate the low-priority tasks.
  • the computing device includes a task layer 601, a software layer 605, and a hardware layer 613.
  • the task layer 601 is used to obtain tasks and configure the tasks into high-priority tasks 602 and low-priority tasks 604 .
  • the priority of the task is implemented through priority tag 603.
  • the software layer 605 includes a CPU scheduling optimization module 606, which is used to manage resources within a physical core, such as prioritizing high-priority tasks and isolating logical cores.
  • the CPU scheduling optimization module 606 includes a single core suppression module 608 and a logical core isolation module 609 to implement the above functions.
  • the software layer 605 also includes an on-chip shared resource management module 607, which is used to manage inter-core shared resources on the chip.
  • the on-chip shared resource management module 607 includes a data collection module 610, a resource sensitivity classifier 611 and a resource controller 612, for adjusting shared resources between cores.
  • the hardware layer includes on-chip resources 614 and Resource Director Technology (RDT)/Memory System Resource Partitioning and Monitoring (MPAM) to implement task execution.
  • On-chip resource 614 may be a CPU.
  • the on-chip resources 614 also include a performance monitoring unit 616.
  • the performance monitoring unit 616 and RDT/MPAM are used to provide performance parameters to the data acquisition module 610.
  • FIG. 7 further shows a block diagram of an apparatus 700 for processing a task according to an embodiment of the present application.
  • the apparatus 700 may include a plurality of modules for performing corresponding steps in the process 200 discussed in FIG. 2 .
  • the apparatus 700 includes: a task determining unit 701 configured to determine a first task to be executed by a first logical core among physical cores in the processing resource; a priority determining unit 702 configured to determine whether the first task is of a predetermined priority; an execution determining unit 703 configured to determine, if it is determined that the first task is of the predetermined priority, whether a second logical core in the physical core executes a second task of the predetermined priority; and an allocating unit 704 configured to allocate a dedicated task including a null instruction to the second logical core if it is determined that the second logical core does not execute a second task of the predetermined priority.
  • the task determining unit 701 includes: a ready queue determining unit configured to obtain a task ready queue for the first logical core, where the task ready queue is an ordered queue based on the priority of the tasks; and a first task acquisition unit configured to obtain the first task from the task ready queue.
  • the first task acquisition unit includes: a selection unit configured to select a ready task from the head of the task ready queue according to priority; and an execution determination unit configured to determine whether the first logical core is executing a current task.
  • the priority comparison unit is configured to compare the priority of the ready task with the priority of the current task if it is determined that the first logical core is executing the current task.
  • the replacement unit is configured to determine the ready task as the first task for replacing execution of the current task if it is determined that the priority of the ready task is higher than the priority of the current task.
  • the apparatus 700 further includes: a task acquisition unit configured to acquire an allocated task assigned to the first logical core and a corresponding priority for the allocated task; and an adding unit configured to add the allocated task to the task ready queue based on the corresponding priority.
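A priority-ordered task ready queue with preemption, as described for the task determining unit, might look like the following; the heap implementation and the FIFO tie-breaking among equal priorities are illustrative choices, not from the application:

```python
import heapq
from itertools import count

class ReadyQueue:
    """Priority-ordered task ready queue for one logical core.

    Lower number = higher priority; the monotonically increasing counter
    keeps FIFO order among tasks of equal priority.
    """

    def __init__(self):
        self._heap = []
        self._seq = count()

    def add(self, task, priority):
        """Add an allocated task at the position given by its priority."""
        heapq.heappush(self._heap, (priority, next(self._seq), task))

    def pick(self, current_priority=None):
        """Return the head task if the core is idle, or if the head's priority
        is strictly higher than the currently running task's; otherwise None."""
        if not self._heap:
            return None
        priority, _, task = self._heap[0]
        if current_priority is None or priority < current_priority:
            heapq.heappop(self._heap)
            return task
        return None
```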
  • the apparatus 700 further includes: a first task execution unit configured to cause the first logical core to execute the first task if it is determined that the first task is of a second predetermined priority, the second predetermined priority being lower than the first predetermined priority.
  • the execution determination unit includes: a second task execution determining unit configured to determine whether the second logical core is executing the second task if it is determined that the first task is of the predetermined priority; and a second task priority determining unit configured to determine whether the priority of the second task is the predetermined priority if it is determined that the second logical core is executing the second task.
  • the apparatus 700 further includes: a baseline value acquisition unit configured to obtain a baseline value of a performance parameter related to the first task; and a shared resource adjustment unit configured to adjust, based on the baseline value, the shared resources that can be allocated to a third task on the second physical core, the third task having a second predetermined priority lower than the first predetermined priority.
  • the baseline value acquisition unit includes: a suppression unit configured to suppress the execution of the third task by the second physical core; and a baseline value determining unit configured to determine the baseline value based on the suppression of the third task.
  • the suppressing unit includes a limiting unit configured to limit an upper limit of shared resources allocated to the third task; or a suspending unit configured to suspend execution of the third task.
  • the suppressing unit includes suppressing the execution of the third task by the second physical core multiple times at predetermined time intervals; and the baseline value determining unit includes determining multiple baseline values based on the multiple suppressions of the third task.
  • the shared resource adjustment unit includes: a sensitivity determining unit configured to determine the sensitivity of the first task to the shared resource based on a baseline value of the performance parameter.
  • the allocable resource adjustment unit is configured to adjust shared resources that can be allocated to the third task based on sensitivity.
  • the performance parameter includes at least one of the following: cache accesses per thousand instructions, cache miss rate, and memory bandwidth.
  • the sensitivity determination unit is configured to perform at least one of the following: if it is determined that the cache accesses per thousand instructions are lower than the threshold cache access amount or the cache miss rate is greater than the threshold cache miss rate, determine that the first task is not cache-sensitive; if it is determined that the cache accesses per thousand instructions are higher than or equal to the threshold cache access amount and the cache miss rate is less than or equal to the threshold cache miss rate, determine that the first task is cache-sensitive; if it is determined that the memory bandwidth is less than the threshold bandwidth, determine that the first task is not sensitive to memory bandwidth; and if it is determined that the memory bandwidth is greater than or equal to the threshold bandwidth, determine that the first task is sensitive to memory bandwidth.
  • the allocable resource adjustment unit includes: a first increasing unit configured to increase the upper limit of shared resources for the third task if it is determined that the first task is not sensitive to the shared resources; and a dynamic adjustment unit configured to dynamically adjust the upper limit of shared resources for the third task if it is determined that the first task is sensitive to the shared resources.
  • the dynamic adjustment unit includes: an actual value acquisition unit configured to acquire an actual value of the performance parameter related to the first task; and an upper limit adjustment unit configured to dynamically adjust the upper limit of shared resources based on the baseline value and the actual value.
  • the upper limit adjustment unit includes: a reduction unit configured to reduce the shared resources allocated to the third task if it is determined that the difference between the baseline value and the actual value exceeds a first threshold; and a second increasing unit configured to increase the shared resources allocated to the third task if it is determined that the difference between the baseline value and the actual value is lower than a second threshold, wherein the second threshold is lower than the first threshold.
  • the shared resource includes at least one of last level cache LLC and memory bandwidth.
  • FIG. 8 illustrates a schematic block diagram of an example device 800 that may be used to implement embodiments of the present disclosure.
  • computing devices 101 and 601 may be implemented by example device 800.
  • the device 800 includes a central processing unit (CPU) 801 that can perform various appropriate actions and processes in accordance with computer program instructions stored in a read-only memory (ROM) 802 or computer program instructions loaded from a storage unit 808 into a random access memory (RAM) 803.
  • in the RAM 803, various programs and data required for the operation of the device 800 can also be stored.
  • CPU 801, ROM 802 and RAM 803 are connected to each other via bus 804.
  • An input/output (I/O) interface 805 is also connected to bus 804.
  • the I/O interface 805 includes: an input unit 806, such as a keyboard, a mouse, etc.; an output unit 807, such as various types of displays, speakers, etc.; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, etc.
  • the communication unit 809 allows the device 800 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
  • processes 200, 400, and 500 may be performed by processing unit 801.
  • processes 200, 400, and 500 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 808.
  • part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809.
  • when a computer program is loaded into RAM 803 and executed by CPU 801, one or more actions of processes 200, 400, and 500 described above may be performed.
  • the application may be a method, device, system, chip and/or computer program product.
  • the chip may include a processing unit and a communication interface, and the processing unit may process program instructions received from the communication interface.
  • a computer program product may include a computer-readable storage medium having thereon computer-readable program instructions for performing various aspects of the present application.
  • Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above.
  • a non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanical encoding devices such as punched cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the above.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through electrical wires.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
  • Computer program instructions for performing the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or instructions in one or more programming languages.
  • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • in some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can execute the computer-readable program instructions to implement various aspects of the present application.
  • These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium. These instructions cause a computer, programmable data processing apparatus, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other equipment implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions that comprises one or more executable instructions for implementing the specified logical function(s).
  • In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.


Abstract

The present application relates to the technical field of task management. Provided are a method and apparatus for processing a task, and a device, a storage medium and a program product. The method comprises: determining a first task to be executed by a first logical core in a physical core in a processing resource, and determining whether the first task has a predetermined priority; if it is determined that the first task has the predetermined priority, determining whether a second logical core in the physical core executes a second task of the predetermined priority; and if it is determined that the second logical core does not execute the second task of the predetermined priority, assigning, to the second logical core, a dedicated task which comprises a no-operation instruction. The embodiments of the present application can accelerate the execution of a high-priority task in a processor, thereby improving the processing efficiency of the high-priority task and increasing the resource utilization rate of the processor.

Description

用于处理任务的方法、装置、设备和存储介质Methods, devices, equipment and storage media for processing tasks
相关申请的交叉引用Cross-references to related applications
本申请要求申请号为202211021692.9,题为“用于处理任务的方法、装置、设备和存储介质”、申请日为2022年8月24日的中国发明专利申请的优先权,通过引用的方式将该申请整体并入本文。This application claims priority to the Chinese invention patent application with application number 202211021692.9, entitled "Methods, devices, equipment and storage media for processing tasks" and the filing date is August 24, 2022, which is incorporated by reference. The application is incorporated herein in its entirety.
技术领域Technical field
本申请的实施例主要涉及计算机领域。更具体地,本申请的实施例涉及用于处理任务的方法、装置、设备和存储介质。Embodiments of this application mainly relate to the computer field. More specifically, embodiments of the present application relate to methods, apparatus, devices and storage media for processing tasks.
背景技术Background technique
随着计算机技术和通信技术的快速进步,人们越来越多的依靠网络和计算机处理各种任务。因此,数据量有了爆发性的增长。为了对这些数据进行管理,出现了越来越多的数据中心。这些数据中心使用配置的服务器,结合网络基础设施来传递、加速、展示、计算、存储用户或客户的各种数据。With the rapid advancement of computer technology and communication technology, people increasingly rely on networks and computers to handle various tasks. Therefore, the amount of data has grown explosively. In order to manage this data, more and more data centers have appeared. These data centers use configured servers combined with network infrastructure to transmit, accelerate, display, calculate, and store various data of users or customers.
数据中心的发展经历了多个阶段,从开始的实现数据存储阶段发展到数据处理阶段。现在随着云技术的发展,又进入了云数据中心发展阶段。随着数据中心的快速发展,数据中心中布置的服务器的数量越来越多。然而,在使用这些服务器服务于客户的过程中,还存在许多需要解决的问题。The development of data centers has gone through multiple stages, from the initial stage of data storage to the stage of data processing. Now with the development of cloud technology, it has entered the development stage of cloud data center. With the rapid development of data centers, the number of servers deployed in data centers is increasing. However, there are still many problems that need to be solved in the process of using these servers to serve customers.
发明内容Contents of the invention
本申请的实施例提供了一种用于处理任务的方案。Embodiments of the present application provide a solution for processing tasks.
根据本申请的第一方面,提供了一种处理任务的方法。该方法包括:确定要由处理资源中的物理核中的第一逻辑核执行的第一任务;确定第一任务是否是预定优先级;如果确定第一任务是预定优先级,确定物理核中的第二逻辑核是否执行预定优先级的第二任务;以及如果确定第二逻辑核未执行预定优先级的第二任务,向第二逻辑核分配包括空指令的专用任务。According to a first aspect of the application, a method of processing a task is provided. The method includes: determining a first task to be executed by a first logical core among physical cores in a processing resource; determining whether the first task is a predetermined priority; if it is determined that the first task is a predetermined priority, determining whether the second logical core executes the second task of the predetermined priority; and if it is determined that the second logical core does not execute the second task of the predetermined priority, allocate a dedicated task including a null instruction to the second logical core.
通过该方式,能够加快高优先级任务在处理器内的执行,实现高优先级任务对较低优先级任务在单物理核上的压制以及消除不同优先级任务在逻辑核上的干扰,提高了高优先级任务在单个物理核内的执行,改进了高优先级任务的处理效率,增加了处理器的资源利用率。Through this method, it can speed up the execution of high-priority tasks in the processor, realize the suppression of lower-priority tasks by high-priority tasks on a single physical core, and eliminate the interference of different priority tasks on the logical core, improving the efficiency of the processor. The execution of high-priority tasks within a single physical core improves the processing efficiency of high-priority tasks and increases the resource utilization of the processor.
In some embodiments, determining the first task includes: obtaining a task ready queue for the first logical core, the task ready queue being an ordered queue based on task priority; and obtaining the first task from the task ready queue. In this way, the tasks to be processed by the logical core can be obtained quickly and accurately, which reduces the time to obtain high-priority tasks and improves processing efficiency.
In some embodiments, obtaining the first task includes: selecting a ready task from the head of the task ready queue according to priority; determining whether the first logical core is executing a current task; if it is determined that the first logical core is executing a current task, comparing the priority of the ready task with the priority of the current task; and if it is determined that the priority of the ready task is higher than the priority of the current task, determining the ready task as the first task to replace execution of the current task. In this way, whether a task on the logical core needs to be replaced can be determined quickly, which increases the probability that high-priority tasks are processed promptly.
In some embodiments, the method further includes: obtaining an allocated task assigned to the first logical core and a corresponding priority for the allocated task; and adding the allocated task to the task ready queue based on the corresponding priority. In this way, the position of a newly allocated task in the priority queue can be determined quickly, high-priority tasks can be processed in a timely manner, and the processing efficiency of high-priority tasks is improved.
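As an illustrative, non-limiting sketch, the priority-ordered task ready queue described above may be modeled as follows. The class and method names are assumptions for illustration and do not appear in the application; a smaller number denotes a higher priority.

```python
import bisect

class ReadyQueue:
    """Per-logical-core task ready queue, ordered by priority.

    Illustrative model only: a smaller number denotes a higher
    priority, and tasks of equal priority keep their arrival order.
    """

    def __init__(self):
        self._keys = []    # priority keys, ascending (head = highest priority)
        self._tasks = []   # (priority, task) pairs, parallel to _keys

    def add(self, task, priority):
        # bisect_right keeps equal-priority tasks in FIFO order,
        # matching the time-slice scheduling used for ties
        idx = bisect.bisect_right(self._keys, priority)
        self._keys.insert(idx, priority)
        self._tasks.insert(idx, (priority, task))

    def peek(self):
        # The head of the queue is the highest-priority ready task
        return self._tasks[0] if self._tasks else None
```

For example, after adding a low-priority task and then a high-priority task, `peek()` returns the high-priority task, so it is selected first when the logical core next picks a task.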
In some embodiments, the predetermined priority is a first predetermined priority, and the method further includes: if it is determined that the first task has a second predetermined priority, causing the first logical core to execute the first task, the second predetermined priority being lower than the first predetermined priority. In this way, high-priority tasks can be executed quickly and processing efficiency is improved.
In some embodiments, determining whether the second logical core of the physical core is executing a second task of the predetermined priority includes: if it is determined that the first task has the predetermined priority, determining whether the second logical core is executing a second task; and if it is determined that the second logical core is executing a second task, determining whether the priority of the second task is the predetermined priority. In this way, the efficiency of determining the task priority of the other logical core in the physical core can be improved.
In some embodiments, the physical core is a first physical core, the processing resource further includes a second physical core, and the predetermined priority is a first predetermined priority. The method further includes: obtaining a baseline value of a performance parameter related to the first task; and adjusting, based on the baseline value, the shared resources allocable to a third task on the second physical core, the third task having a second predetermined priority lower than the first predetermined priority. In this way, more inter-core shared resources can be allocated to high-priority tasks first, which speeds up the execution of high-priority tasks and improves their processing efficiency.
In some embodiments, obtaining the baseline value includes: suppressing execution of the third task by the second physical core; and determining the baseline value based on the suppression of the third task. In this way, the baseline performance indicator can be determined quickly and accurately.
In some embodiments, suppressing execution of the third task includes: limiting the upper bound of the shared resources allocated to the third task; or pausing execution of the third task. In this way, the baseline performance of a high-priority task can be obtained quickly and accurately.
In some embodiments, suppressing execution of the third task includes: suppressing execution of the third task by the second physical core multiple times at predetermined time intervals; and determining the baseline value includes: determining multiple baseline values based on the multiple suppressions of the third task. Performing these operations multiple times avoids task oscillation, or ping-pong effects.
In some embodiments, adjusting the shared resources allocable to the third task on the second physical core includes: determining the sensitivity of the first task to the shared resources based on the baseline value of the performance parameter; and adjusting, based on the sensitivity, the shared resources allocable to the third task. In this way, whether the high-priority task is resource-sensitive can be determined accurately, so that the execution of low-priority tasks is also ensured while the high-priority task is guaranteed.
In some embodiments, the performance parameter includes at least one of: cache accesses per thousand instructions, cache miss rate, and memory bandwidth, and determining the sensitivity includes at least one of the following: if it is determined that the cache accesses per thousand instructions are below a threshold cache access amount or the cache miss rate is greater than a threshold cache miss rate, determining that the first task is not cache-sensitive; if it is determined that the cache accesses per thousand instructions are greater than or equal to the threshold cache access amount and the cache miss rate is less than or equal to the threshold cache miss rate, determining that the first task is cache-sensitive; if it is determined that the memory bandwidth is less than a threshold bandwidth, determining that the first task is not memory-bandwidth-sensitive; and if it is determined that the memory bandwidth is greater than or equal to the threshold bandwidth, determining that the first task is memory-bandwidth-sensitive. In this way, which resources a high-priority task is sensitive to can be determined quickly, providing accurate information for resource configuration and improving the efficiency and accuracy of resource allocation.
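The threshold rules above may be sketched as follows. This is an illustrative model only; the function name, parameter names, and the threshold values used in the test are assumptions rather than values taken from the application.

```python
def classify_sensitivity(capki, miss_rate, mem_bw,
                         capki_threshold, miss_threshold, bw_threshold):
    """Classify a high-priority task's sensitivity to shared resources.

    Illustrative sketch of the threshold rules described above; the
    names and thresholds are assumptions. `capki` is the task's cache
    accesses per thousand instructions.
    """
    # Cache-sensitive only when the task accesses the cache often
    # enough AND most of those accesses actually hit
    cache_sensitive = capki >= capki_threshold and miss_rate <= miss_threshold
    # Bandwidth-sensitive when the measured memory bandwidth reaches
    # the threshold bandwidth
    bandwidth_sensitive = mem_bw >= bw_threshold
    return {"cache": cache_sensitive, "memory_bandwidth": bandwidth_sensitive}
```

A task that touches the cache rarely, or mostly misses, gains little from extra cache capacity, which is why both conditions must hold for cache sensitivity.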
In some embodiments, adjusting the shared resources includes: if it is determined that the first task is not sensitive to the shared resources, increasing the upper bound of the shared resources for the third task; and if it is determined that the first task is sensitive to the shared resources, dynamically adjusting the upper bound of the shared resources for the third task. In this way, the configuration of the shared resources can be adjusted accurately.
In some embodiments, dynamically adjusting the upper bound of the shared resources includes: obtaining an actual value of the performance parameter related to the first task; and dynamically adjusting the upper bound of the shared resources based on the baseline value and the actual value. In this way, the configuration of the shared resources can be adjusted accurately.
In some embodiments, dynamically adjusting the upper bound of the shared resources based on the baseline value and the actual value includes: if it is determined that the difference between the baseline value and the actual value exceeds a first threshold, reducing the shared resources allocated to the third task; and if it is determined that the difference between the baseline value and the actual value is below a second threshold, increasing the shared resources allocated to the third task, the second threshold being lower than the first threshold. In this way, appropriate resources can be configured accurately for both high-priority and low-priority tasks; the processing efficiency of high-priority tasks is guaranteed while the execution of low-priority tasks is also ensured, improving resource utilization.
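A minimal sketch of this two-threshold adjustment follows, assuming a performance parameter where a higher value is better (so that baseline minus actual measures degradation); all names and the fixed adjustment step are illustrative assumptions. Because the second threshold is lower than the first, the two thresholds form a hysteresis band that prevents the limit from oscillating.

```python
def adjust_limit(current_limit, baseline, actual,
                 first_threshold, second_threshold, step=1):
    """Adjust the shared-resource upper limit of a low-priority task.

    Illustrative sketch only; the names and the fixed step size are
    assumptions. Assumes second_threshold < first_threshold, so the
    two thresholds form a hysteresis band that avoids oscillation.
    """
    degradation = baseline - actual
    if degradation > first_threshold:
        # High-priority task suffers: squeeze the low-priority task
        return max(0, current_limit - step)
    if degradation < second_threshold:
        # High-priority task is close to baseline: give resources back
        return current_limit + step
    return current_limit  # within the band: leave the limit unchanged
```

With the band between the two thresholds, small fluctuations around the baseline leave the allocation untouched instead of triggering a reduce/increase ping-pong.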
In some embodiments, the shared resources include at least one of a last level cache (LLC) and memory bandwidth. In this way, which shared resources are to be adjusted can be determined accurately.
According to a second aspect of the present application, an apparatus for processing a task is provided. The apparatus includes: a task determination unit configured to determine a first task to be executed by a first logical core of a physical core in a processing resource; a priority determination unit configured to determine whether the first task has a predetermined priority; an execution determination unit configured to, if it is determined that the first task has the predetermined priority, determine whether a second logical core of the physical core is executing a second task of the predetermined priority; and an allocation unit configured to, if it is determined that the second logical core is not executing a second task of the predetermined priority, allocate a dedicated task comprising null instructions to the second logical core.
According to a third aspect of the present application, an electronic device is provided, including: at least one computing unit; and at least one memory coupled to the at least one computing unit and storing instructions for execution by the at least one computing unit, the instructions, when executed by the at least one computing unit, causing the device to perform the method according to the first aspect of the present application.
According to a fourth aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, the program, when executed by a processor, implementing the method according to the first aspect of the present application.
According to a fifth aspect of the present application, a computer program product is provided, including computer-executable instructions, the computer-executable instructions, when executed by a processor, implementing the method according to the first aspect of the present application.
It can be understood that the apparatus of the second aspect, the electronic device of the third aspect, the computer storage medium of the fourth aspect, and the computer program product of the fifth aspect provided above are all used to perform the method provided by the first aspect. Therefore, the explanations regarding the first aspect also apply to the second, third, fourth, and fifth aspects. In addition, for the beneficial effects achievable by the second, third, fourth, and fifth aspects, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present application will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Figure 1 shows a schematic diagram of an example environment in which various embodiments of the present application can be implemented;

Figure 2 shows a schematic flowchart for processing tasks according to some embodiments of the present application;

Figure 3 shows a schematic diagram of a system for controlling inter-core shared resources according to some embodiments of the present application;

Figure 4 shows a schematic flowchart of a process for determining resource sensitivity according to some embodiments of the present application;

Figure 5 shows a schematic flowchart of a process for dynamically allocating resources according to some embodiments of the present application;

Figure 6 shows a schematic diagram of an implementation example of a computing device according to some embodiments of the present application;

Figure 7 shows a block diagram of an apparatus according to some embodiments of the present application; and

Figure 8 shows a block diagram of a computing device capable of implementing various embodiments of the present application.
Detailed Description of Embodiments
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present application are shown in the drawings, it should be understood that the present application may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present application will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present application are for illustrative purposes only and are not intended to limit the scope of protection of the present application.
In the description of the embodiments of the present application, the term "including" and similar expressions should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and so on may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As mentioned above, as data centers continue to expand, the number of servers also grows rapidly. However, server resource utilization, such as central processing unit (CPU) utilization, remains low, which means a huge waste of resources. To improve server resource utilization, a common practice is to colocate tasks of different priorities (virtual machines, containers, etc.). For example, colocation can increase server CPU utilization from 15% to 30%, thereby cutting CPU costs by 50%.
However, when tasks of different priorities are colocated, the CPU among the on-chip resources is time-shared, and lower-priority tasks inevitably interfere with high-priority tasks. Existing host operating system (or hypervisor) task scheduling algorithms cannot guarantee that high-priority tasks fully preempt lower-priority tasks. In addition, when tasks of different priorities run on different logical cores of the same physical core, there is interference at resources such as the arithmetic logic units and the first-level and second-level caches, which hinders the rapid execution of high-priority tasks. Furthermore, the last level cache (LLC) and memory bandwidth (MBW) are inter-core shared resources; when high-priority and lower-priority tasks run on different physical cores, the lower-priority tasks may seize too many inter-core shared resources, impairing the processing efficiency of high-priority tasks.
To solve the above problems, one traditional solution is container colocation, for example a full-scenario online/offline colocation solution, which designs a brand-new offline scheduler class based on the Linux system; in the scheduling queue, this scheduler class has a priority lower than the default scheduler class and higher than the IDLE scheduler class. However, designing a new scheduling class increases system complexity and operation and maintenance costs, and cannot reuse features such as load balancing in the current system.
Another traditional solution is to classify tasks into latency-sensitive and best-effort types. This solution needs to read user-side performance data in real time and, on the premise of meeting the performance requirements of latency-sensitive applications, provide as many resources as possible to best-effort tasks. However, such solutions require certain prior knowledge of the tasks, for example the application's throughput and cache hit rate under different cache capacities (which depend on specific hardware performance and architecture). Because prior knowledge of the tasks is needed, these solutions are limited to specific tasks and cannot adapt to the various tasks run by a server.
To solve at least some of the above and other potential problems, in embodiments of the present application, a computing device first determines a first task to be executed by a first logical core of a physical core in a processing resource, and then determines whether the first task is high-priority. If the computing device determines that the first task is a high-priority task, it determines whether a second logical core of the physical core is executing a high-priority second task. If it determines that the second logical core is not executing a high-priority second task, it allocates a dedicated task comprising null instructions to the second logical core, so that the high-priority task exclusively occupies the resources within the physical core. Further, when a high-priority task runs on one physical core, the shared resources occupied by low-priority tasks running on other physical cores within the same on-chip resource can also be adjusted to further improve the execution of the high-priority task. In this way, embodiments of the present application can speed up the execution of high-priority tasks in the processor, improve the processing efficiency of high-priority tasks, and improve the resource utilization of the processor.
Figure 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present application can be implemented. As shown in Figure 1, the environment 100 includes a computing device 101.
The computing device 101 includes, but is not limited to, a personal computer, a server, a handheld or laptop device, a mobile device (such as a mobile phone, a personal digital assistant (PDA), or a media player), a multiprocessor system, consumer electronics, a minicomputer, a mainframe computer, a distributed computing environment including any of the above systems or devices, and so on.
The computing device 101 is used to process various tasks from users. A task described herein is a task processed by the computing device; it may be a virtual machine, a container, a group of processes, or a group of threads, among others. To provide high-quality service for some important tasks, a user can assign a priority, also called a hard priority, to a service. For example, tasks are divided into high-priority tasks and low-priority tasks, such as the high-priority task 102 and the low-priority task 103 shown in Figure 1. The priorities of the high-priority task 102 and the low-priority task 103 are pre-specified. A task assigned a priority carries a task label indicating its priority level; for example, each task is allocated a field for storing the task label.
In one example, the computing device 101 determines a task's priority according to its type. In another example, the priority of a task is specified by the user. For example, in a public cloud scenario, resource-exclusive virtual machines and resource-shared virtual machines are colocated; a high-priority label is added to the resource-exclusive virtual machines, and a low-priority label is added to the resource-shared virtual machines. In a private cloud scenario, users can classify tasks into high-priority and low-priority tasks according to requirements such as latency sensitivity. The above examples are only for describing the present disclosure and are not specific limitations on the present disclosure.
The computing device 101 also has on-chip resources 107. The on-chip resources 107 are resources provided on a chip, including at least the physical cores 108 and 109, as well as the last level cache LLC and the memory bandwidth MBW, that is, LLC/MBW 114. For example, the on-chip resource 107 is a CPU. Figure 1 shows the on-chip resources including two physical cores 108 and 109, which is only an example and not a specific limitation of the present disclosure. Those skilled in the art can set the number of physical cores included in the on-chip resources 107 as needed; for example, the on-chip resources 107 may include one physical core or more than two physical cores.
As shown in Figure 1, the physical core 108 includes logical cores 110 and 111, and the physical core 109 includes logical cores 112 and 113. The logical cores in Figure 1 are obtained by hyper-threading the physical cores. That each physical core shown in Figure 1 includes two logical cores is only an example and not a specific limitation of the present disclosure. The number of logical cores in a physical core can be set by those skilled in the art as needed; for example, a physical core may include one or more than two logical cores, and the numbers of logical cores in two physical cores may be the same or different.
The computing device 101 configures a task ready queue for each logical core, to store the tasks to be executed by that logical core. When receiving a task, the computing device 101 allocates the newly received task to the task ready queue of one logical core according to load balancing across the logical cores and/or the user's configuration of the task.
The computing device 101 also includes a CPU scheduling optimization module 104 for scheduling and optimizing the execution of high-priority tasks. Normally, the tasks allocated to a logical core's task ready queue are processed in order based on the time slices assigned to the tasks and the processor time slices, for example using CPU time-slice round-robin scheduling or completely fair scheduling. Since the tasks received by the computing device 101 also have priorities, the CPU scheduling optimization module 104 further adjusts the ordering of the prioritized tasks within the ready queue and the execution of the tasks.
The CPU scheduling optimization module 104 includes a single-core suppression module 105 and a logical-core isolation module 106. The single-core suppression module 105 sorts the prioritized tasks within the ready queue: high-priority tasks are placed at the front of the task ready queue, while low-priority tasks are placed at the rear. If two tasks have the same priority, they are queued according to the existing time-slice scheduling method. The single-core suppression module 105 then sends high-priority tasks to the logical core for processing first.
If a logical core is processing a high-priority task, the logical-core isolation module 106 in the CPU scheduling optimization module further determines whether the other logical cores on the same physical core are executing high-priority tasks. For example, if the logical core 110 in the physical core 108 is running a high-priority task, the logical-core isolation module 106 determines whether the logical core 111 on the same physical core is running a high-priority task. Figure 1 shows an example in which a physical core includes two logical cores; if a physical core includes more than two logical cores, the logical-core isolation module 106 determines whether one or more of the other logical cores are running high-priority tasks.
If another logical core, such as the logical core 111, is not running a high-priority task, the logical-core isolation module 106 launches a null-instruction task on the logical core 111. The priority of this null-instruction task is higher than the low priority and lower than the high priority. For example, if the other logical core is running a low-priority task, the low-priority task is replaced with the null-instruction task; if the other logical core is not processing any task, the null-instruction task is launched directly. Once the null-instruction task is running on the other logical core, low-priority tasks cannot occupy that logical core, thereby guaranteeing the execution of the high-priority task. If the other logical core is also running a high-priority task, no adjustment is made to it.
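The decision made for a sibling logical core may be sketched as follows; the function name and the numeric priority encoding are assumptions for illustration (a lower number means a higher priority, with the null-instruction task sitting between the two levels).

```python
# Numeric priority encoding is an assumption for illustration:
# a lower number means a higher priority.
HIGH_PRIORITY = 0
NOP_PRIORITY = 1    # the null-instruction task sits between the two levels
LOW_PRIORITY = 2

def sibling_action(sibling_priority):
    """Choose the action for the sibling logical core when this
    logical core starts a high-priority task.

    Illustrative sketch of the isolation decision: keep the sibling's
    task only when it is also high-priority; otherwise occupy the
    sibling with the null-instruction task (sibling_priority is None
    when the sibling is idle).
    """
    if sibling_priority == HIGH_PRIORITY:
        return "keep"
    # Sibling is idle or runs a low-priority task: the nop task's
    # priority (between high and low) lets it displace low-priority work
    return "run_nop_task"
```

Because the null-instruction task outranks low-priority tasks but never outranks high-priority ones, it occupies the sibling core without ever displacing high-priority work.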
In addition, if the task to be processed by a logical core is a low-priority task, the logical core simply executes that low-priority task and performs no task-control operation on the other logical cores. Likewise, if another logical core is running a high-priority task, a low-priority task running on this logical core is replaced by the null-instruction task.
If the on-chip resources include multiple physical cores, there is also competition for the resources shared among them, for example the last level cache LLC and memory bandwidth MBW resources shared by the physical cores. Alternatively or additionally, the computing device 101 optionally also includes an on-chip shared resource manager 115. The on-chip shared resource manager 115 obtains the performance indicators of a high-priority task executed on one physical core, and then, based on those indicators, adjusts the occupation of the inter-core shared resources by the low-priority tasks on the other physical cores.
The on-chip shared resource manager 115 includes a collector 116 for obtaining performance indicators about high-priority tasks. The on-chip shared resource manager 115 also includes a classifier 117 for determining whether a high-priority task is sensitive to the shared resources. A resource controller 118 then adjusts the occupation of the shared resources by low-priority tasks on the other physical cores, based on whether the high-priority task on one physical core is sensitive to the shared resources. For example, if the logical core 110 of the physical core 108 runs a high-priority task, in addition to keeping the logical core 111 within the physical core 108 from executing low-priority tasks, the on-chip shared resource manager 115 also controls the shared resources allocated to low-priority tasks in the physical core 109.
In this way, execution of high-priority tasks within the processor is accelerated, the processing efficiency of high-priority tasks is improved, and the resource utilization of the processor is improved.
A schematic diagram of an example environment 100 in which embodiments of the present application can be implemented has been described above in conjunction with FIG. 1. A flowchart of a method 200 for processing tasks according to embodiments of the present disclosure is described below with reference to FIG. 2. Method 200 may be performed at computing device 101 in FIG. 1 or at any other suitable computing device.
At block 201, a first task to be executed by a first logical core among the physical cores of a processing resource is determined. For example, computing device 101 determines the task to be processed by logical core 110 within physical core 108 and assigns it to that logical core for execution.
In some embodiments, computing device 101 may obtain a task ready queue corresponding to the first logical core, the task ready queue being an ordered queue sorted by task priority. Computing device 101 then obtains, from this task ready queue, the first task to be executed by the first logical core. In this way, the task to be processed by the logical core can be obtained quickly and accurately, reducing the time needed to fetch high-priority tasks and improving processing efficiency.
In some embodiments, when obtaining the first task for execution on the first logical core, computing device 101 selects a ready task from the head of the task ready queue according to priority. Since the task ready queue is sorted, its head stores the highest-priority ready task. Computing device 101 then determines whether the first logical core is currently executing a task. If the first logical core is not executing any task, the first logical core executes the ready task. If the first logical core is executing a current task, the priority of the ready task is compared with that of the current task. If the ready task's priority is higher than the current task's priority, the ready task is determined as the first task and replaces the execution of the current task. If the ready task's priority is lower than the current task's priority, the current task continues to execute and the lower-priority ready task is not run. If the two priorities are equal, scheduling is performed using CPU time-slice round-robin scheduling or completely fair scheduling. This process is generally performed after a new task has been assigned to the first logical core's task ready queue and the queue has been re-sorted. In this way, whether the task on the logical core needs to be replaced can be determined quickly, increasing the probability that high-priority tasks are processed promptly. In some embodiments, if no new task has been assigned to the logical core, then when the CPU time slice of the current task expires, the task to run next is fetched from the head of the task ready queue. The above examples merely describe the present disclosure and do not specifically limit it.
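The selection and preemption logic described above can be sketched as follows. This is a simplified illustration only, not part of the claimed embodiment: the `Task` structure, `pick_next` helper, and task names are hypothetical, the queue is assumed pre-sorted by descending priority, and the equal-priority case (time-slice round-robin or completely fair scheduling in the text) is reduced to keeping the current task.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int  # higher value = higher priority

def pick_next(ready_queue, current):
    """Return the task the logical core should run next.

    ready_queue is assumed sorted by descending priority, so its head
    holds the highest-priority ready task.
    """
    if not ready_queue:
        return current
    if current is None:
        return ready_queue.pop(0)   # idle core: run the head task
    if ready_queue[0].priority > current.priority:
        return ready_queue.pop(0)   # preempt: replace the current task
    # Equal or lower priority: keep running the current task (the text
    # uses time-slice round-robin / CFS for the equal-priority case).
    return current

# Example: a higher-priority ready task preempts the current task.
queue = [Task("hi", 2), Task("lo", 0)]
running = Task("mid", 1)
nxt = pick_next(queue, running)
```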
In some embodiments, upon receiving a new task, computing device 101 obtains the assigned task allocated to the first logical core and the corresponding priority of that assigned task, and then adds the assigned task to the task ready queue based on that priority. In one example, upon receiving a task assigned to logical core 110, computing device 101 re-sorts the tasks in the task ready queue from highest to lowest priority. In another example, the newly assigned task is inserted into the queue according to its priority. If two tasks have the same priority, their order is determined by the usual time-slice ordering method. In this way, the position of a newly assigned task in the priority queue can be determined quickly, allowing high-priority tasks to be processed in a timely manner and improving their processing efficiency.
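The priority-ordered insertion described above can be sketched minimally. This is an illustration under assumptions, not the claimed implementation: the `ReadyQueue` class and its sequence-number tie-break (standing in for the time-slice ordering mentioned in the text) are hypothetical.

```python
import bisect

class ReadyQueue:
    """Priority-ordered ready queue; ties keep arrival order."""
    def __init__(self):
        self._items = []  # sorted list of ((-priority, seq), task_name)
        self._seq = 0

    def add(self, task_name, priority):
        # Negating the priority makes higher-priority tasks sort toward
        # the head; the sequence number keeps FIFO order among equals.
        bisect.insort(self._items, ((-priority, self._seq), task_name))
        self._seq += 1

    def head(self):
        return self._items[0][1] if self._items else None

q = ReadyQueue()
q.add("batch-job", 0)    # low priority
q.add("rt-service", 9)   # high priority arrives later...
```

After both insertions, `q.head()` yields `"rt-service"`: the later-arriving high-priority task sorts to the head without a full re-sort of the queue.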
At block 202, it is determined whether the first task has a predetermined priority. Because resources within the same physical core are managed differently depending on task priority, computing device 101 determines the priority of a task once the task to be processed on the logical core has been determined.
In some embodiments, the predetermined priority is a first predetermined priority, i.e., a high priority. If it is determined that the first task does not have the predetermined priority, i.e., is not of the first predetermined (high) priority, computing device 101 determines that the first task has a second predetermined priority, and then causes the first logical core to execute the first task, the second predetermined priority being lower than the first predetermined priority, i.e., a low priority. In this case, the tasks on other logical cores are not adjusted. In this way, the first task can be executed quickly, improving processing efficiency.
If it is determined at block 202 that the first task has the predetermined priority, then at block 203 computing device 101 further determines whether a second logical core in the physical core is executing a second task of the predetermined priority. If it is determined at block 203 that the second logical core is executing a second task of the predetermined priority, this indicates that the second logical core is also processing a high-priority task, so the task executed by the second logical core need not be adjusted.
In some embodiments, if it is determined that the first task has the predetermined priority, i.e., the first task is a high-priority task, computing device 101 determines whether the second logical core is executing a second task. If no second task is being executed, the second logical core can directly be made to execute a task consisting of null instructions; this null-instruction task may also be called a dedicated task used for scheduling control. If it is determined that the second logical core is executing a second task, it is determined whether the priority of the second task is the predetermined priority. If so, no further operation is performed. In this way, the efficiency of determining the second task's priority level is improved.
If it is determined that the second logical core is not executing a second task of the predetermined priority, then at block 204 a dedicated task consisting of null instructions is assigned to the second logical core. At this point, the null-instruction task is launched on the second logical core, allowing the high-priority task running on the first logical core to occupy more in-core resources during execution.
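The decision flow of blocks 201-204 can be sketched as follows. This is a schematic illustration only: the `HIGH`/`LOW` markers, the `NOP_TASK` placeholder, and the `schedule_first_task` helper are hypothetical names introduced here, with `None` standing for an idle sibling logical core.

```python
HIGH, LOW = 1, 0
NOP_TASK = "nop-dedicated-task"  # dedicated task made of null instructions

def schedule_first_task(first_task_priority, sibling_task_priority):
    """Sketch of blocks 201-204: decide what the sibling logical core runs.

    Returns the task to place on the second logical core, or None if the
    second logical core is left untouched.
    """
    if first_task_priority != HIGH:
        return None                 # block 202: low priority, no control op
    if sibling_task_priority == HIGH:
        return None                 # block 203: sibling already high priority
    return NOP_TASK                 # block 204: pin a null-instruction task
```

Used this way, the sibling core is disturbed only in the single case that matters: a high-priority first task sharing the physical core with an idle or low-priority sibling.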
Through this method, embodiments of the present application can accelerate the execution of high-priority tasks within the processor, enable high-priority tasks to suppress lower-priority tasks on a single physical core, and eliminate interference between tasks of different priorities on the logical cores, improving the execution of high-priority tasks within a single physical core and their processing efficiency.
A schematic flowchart for processing tasks according to some embodiments of the present application has been described above in conjunction with FIG. 2. An example process for managing inter-core shared resources among multiple physical cores to further accelerate high-priority task execution is described below. The processing resource may include multiple physical cores; after a high-priority task runs on a first physical core, the low-priority applications on one or more other physical cores also need to be adjusted based on the performance of the high-priority task. In this process, computing device 101 first obtains a baseline value of a performance parameter related to the first task, and then, based on the baseline value, adjusts the shared resources that can be allocated to a third task on a second physical core, the third task having a second predetermined priority lower than the first predetermined priority. In this way, more inter-core shared resources can be allocated to high-priority tasks first, accelerating their execution and improving processing efficiency. This process is further described below in conjunction with FIG. 3.
FIG. 3 depicts a schematic diagram of a system for controlling inter-core shared resources according to some embodiments of the present application. System 300 is used to control the allocation of inter-core shared resources and may be the on-chip shared resource manager 115 in FIG. 1. System 300 includes a data collector 301, a resource sensitivity classifier 302, and a resource controller 303.
Data collector 301 collects microarchitectural performance metrics of high-priority tasks, including but not limited to basic metrics such as the number of instructions executed per unit time, instruction cycles, cache misses, cache accesses, memory bandwidth, and pipeline back-end memory stalls. From these, more complex metrics can be computed, such as cache misses per kilo instructions (MPKI), cache accesses per kilo instructions, e.g., last-level cache accesses per kilo instructions (APKI), cache miss rate (CMR), and instructions per cycle (IPC). The data collector also obtains the memory bandwidth (MBW) allocated to the high-priority task. For example, if a high-priority task runs on a first physical core among multiple physical cores, data collector 301 collects the performance metrics corresponding to that high-priority task.
To further understand the performance of a high-priority task, its baseline performance needs to be obtained. The baseline performance of a high-priority task refers to its performance metric values when it executes while low-priority tasks on other physical cores are suppressed. In one example, the computing device suppresses the execution of one or more low-priority tasks on one or more other physical cores, and then determines the baseline values based on that suppression. If low-priority tasks on other physical cores are not suppressed, the normal-operation performance metric values of the high-priority task are obtained instead. In this way, the baseline performance metrics can be determined quickly and accurately.
For example, the first physical core 108 in FIG. 1 runs a high-priority task; execution of the low-priority third task on the second physical core 109 is suppressed, and the baseline performance of the high-priority workload is then obtained. Additionally, when suppressing execution of the low-priority third task on the second physical core 109, execution of the third task may be suppressed multiple times at predetermined time intervals. Multiple baseline samples are then determined from these multiple suppressions, and the baseline value for a given metric can be determined by averaging the multiple values collected for that metric across the suppressions, or by other suitable processing.
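The averaging step above can be sketched minimally. This is illustrative only: `baseline_from_samples` is a hypothetical helper name, and a plain arithmetic mean is merely one of the "suitable processing" options the text leaves open.

```python
from statistics import mean

def baseline_from_samples(samples):
    """Combine the metric values collected across repeated suppression
    windows (one value per window) into a single baseline value."""
    if not samples:
        raise ValueError("need at least one suppressed sample")
    return mean(samples)
```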
As shown in block 304 of FIG. 3, low-priority tasks on other cores are suppressed periodically. This suppression can be done in two ways. One, shown in block 305, is to cap the shared resources allocated to low-priority tasks; with this form of suppression, the trigger frequency can be set on the order of hundreds of milliseconds. The other, shown in block 306, is to pause the execution of low-priority tasks. This approach accurately captures the baseline performance of high-priority tasks free of interference from low-priority tasks, but is unfriendly to the low-priority tasks, so its trigger frequency may be set on the order of seconds.
System 300 also includes resource sensitivity classifier 302, which determines whether, and to what degree, a task is sensitive to shared resources based on the obtained baseline values of the performance parameters, for example whether the high-priority task running on the first physical core 108 in FIG. 1 is sensitive to shared resources. In one example, computing device 101 determines the first task's sensitivity to shared resources based on the baseline values of the performance parameters, as shown in block 307. Resource controller 303 then adjusts, based on this sensitivity, the shared resources that can be allocated to low-priority tasks on other cores. In this way, whether a high-priority task is resource-sensitive can be determined accurately, ensuring the processing efficiency of high-priority tasks while still guaranteeing the execution of low-priority tasks. The process of determining resource sensitivity is further described below in conjunction with FIG. 4.
If, after the determination at block 307, the first task is found not to be sensitive to shared resources, then at block 308 the shared resources for low-priority tasks on other cores are allocated statically, for example by raising the cap on shared resources for low-priority tasks on other cores. Since the first task is insensitive to shared resources, more shared resources can be allocated to low-priority tasks; the amount allocated to low-priority tasks on other cores is increased according to a predetermined policy. If the first task is determined to be sensitive to shared resources, then at block 309 the cap on shared resources for low-priority tasks on other cores is allocated dynamically. Dynamic allocation uses both the baseline performance and the normal performance of the high-priority first task to configure the shared resources. The shared resources include at least one of the last-level cache (LLC) and memory bandwidth. In this way, exactly which shared resources to adjust can be determined. The process of dynamically configuring shared resources will be described in conjunction with FIG. 5.
A schematic diagram of the system for controlling inter-core shared resources has been described above in conjunction with FIG. 3. The process of determining resource sensitivity is further described below in conjunction with FIG. 4. FIG. 4 shows a schematic flowchart of a process for determining resource sensitivity according to some embodiments of the present application.
At block 401, the computing device obtains performance parameters for the high-priority first task. The performance parameters include at least one of: cache accesses per kilo instructions (APKI), cache miss rate (CMR), and memory bandwidth (MBW).
At block 402, the APKI and CMR parameters are evaluated. If the cache accesses per kilo instructions APKI is below a threshold cache access amount A, or the cache miss rate CMR is above a threshold cache miss rate B, then at block 404 the high-priority first task is determined to be insensitive to cache resources, e.g., insensitive to the LLC. If APKI is greater than or equal to threshold A and CMR is less than or equal to threshold B, then at block 405 the high-priority first task is determined to be sensitive to the LLC cache resource.
At block 403, the memory bandwidth MBW is compared with a threshold C. If MBW is less than C, then at block 406 the high-priority first task is determined to be insensitive to memory bandwidth. If MBW is greater than or equal to C, then at block 407 the high-priority first task is determined to be sensitive to memory bandwidth. Alternatively or additionally, marking a task as insensitive to a class of resources takes effect only after the condition holds for several consecutive evaluations, which prevents ping-pong behavior, i.e., resource allocation flapping back and forth because successive evaluations produce different results. In this way, which resources a high-priority task is sensitive to can be determined quickly, providing accurate information for resource configuration and improving the efficiency and accuracy of resource allocation.
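The classification rules of blocks 402-407 can be sketched as follows. This is a simplified illustration: `classify_sensitivity` is a hypothetical helper, the thresholds A, B, C are tuning parameters whose values the text does not specify, and the consecutive-evaluation debouncing described above is omitted for brevity.

```python
def classify_sensitivity(apki, cmr, mbw, thr_a, thr_b, thr_c):
    """Blocks 402-407: classify a high-priority task's resource sensitivity.

    LLC-sensitive only if the task accesses the cache often enough
    (APKI >= A) AND most of those accesses hit (CMR <= B).
    MBW-sensitive if its memory-bandwidth use is at least C.
    Returns (llc_sensitive, mbw_sensitive).
    """
    llc_sensitive = apki >= thr_a and cmr <= thr_b
    mbw_sensitive = mbw >= thr_c
    return llc_sensitive, mbw_sensitive

# A task with many cache accesses, a low miss rate, and modest bandwidth
# use comes out LLC-sensitive but MBW-insensitive:
llc, mbw = classify_sensitivity(apki=30.0, cmr=0.05, mbw=2.0,
                                thr_a=10.0, thr_b=0.2, thr_c=5.0)
```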
A schematic flowchart of the process for determining resource sensitivity according to some embodiments of the present application has been described above in conjunction with FIG. 4. The computing device obtains the baseline performance of the high-priority first task running on the first core and the actual values of its performance parameters when no suppression is applied. Based on the baseline and actual values, it then dynamically adjusts the cap on shared resources, allowing the shared-resource configuration to be adjusted accurately. In some embodiments, if the difference between the baseline value and the actual value exceeds a first threshold, the computing device reduces the shared resources allocated to the low-priority third task running on other cores. If the difference is below a second threshold, where the second threshold is lower than the first threshold, the shared resources allocated to the third task are increased. In this way, appropriate resources can be configured accurately for both high-priority and low-priority tasks, guaranteeing the processing efficiency of high-priority tasks while also ensuring the execution of low-priority tasks and improving resource utilization. The process for dynamically allocating resources is further described below in conjunction with FIG. 5.
FIG. 5 shows a schematic flowchart of a process for dynamically allocating resources according to some embodiments of the present application. When a low-priority task starts running, an initial state is set and certain shared resources are allocated to it, for example 2 ways of the LLC and 10% of the memory bandwidth. These examples merely describe the present disclosure and do not specifically limit it.
At block 501, the computing device compares the collected actual metric values for the high-priority first task with its baseline performance, for example comparing the actual and baseline values of APKI or MBW. At block 502, if a collected metric is worse than the baseline, for example the difference between the two exceeds a threshold E, then at block 508 the resources allocated to low-priority tasks are reduced, for example halved or reduced by some proportion. Alternatively or additionally, block 505 may be included: the comparison can be repeated several times in a row, and operation 508 is performed only if the difference exceeds threshold E in a threshold number of those comparisons.
At block 503, if the collected actual metric value is close to the baseline performance, for example the difference between the two values is less than a threshold F, i.e., the performance degradation is less than F, then at block 509 the resources of the low-priority tasks are increased. Alternatively or additionally, block 506 may be included: the comparison can be repeated several times in a row, and operation 509 is performed only if the difference is less than threshold F in a threshold number of those comparisons. At block 504, cases that satisfy neither of the above conditions are classified as other cases; for those, at block 507 the original configuration is kept unchanged.
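The periodic adjustment loop of blocks 501-509 can be sketched as follows. This is an illustrative sketch under assumptions: the helper name, the halve-on-shrink and plus-one-on-grow step sizes, and the vote-window mechanism (standing in for blocks 505/506) are all hypothetical choices, and the metric is assumed to be one where larger values are better (e.g., IPC).

```python
def adjust_low_prio_resources(actual, baseline, limit, thr_e, thr_f,
                              history, votes=3):
    """One control period of blocks 501-509.

    `limit` is the current cap (e.g., LLC ways) for low-priority tasks.
    Degradation is baseline - actual. A change is applied only after
    `votes` consecutive agreeing samples; the vote window is reset once
    a change is applied. Returns (new_limit, history).
    """
    degradation = baseline - actual
    if degradation > thr_e:
        verdict = "shrink"   # block 502/508: worse than baseline
    elif degradation < thr_f:
        verdict = "grow"     # block 503/509: close to baseline
    else:
        verdict = "keep"     # block 504/507: other cases
    history.append(verdict)
    if len(history) >= votes and all(v == verdict for v in history[-votes:]):
        history.clear()      # reset the vote window after acting
        if verdict == "shrink":
            return max(1, limit // 2), history  # e.g., halve the allocation
        if verdict == "grow":
            return limit + 1, history
    return limit, history

# Three consecutive periods of severe degradation halve the cap once:
limit, hist = 8, []
for _ in range(3):
    limit, hist = adjust_low_prio_resources(actual=1.0, baseline=2.0,
                                            limit=limit, thr_e=0.5,
                                            thr_f=0.1, history=hist)
```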
In addition, during dynamic adjustment of resource allocation, if the resources of low-priority tasks have already been restricted to their initial values yet the performance degradation of the high-priority task still exceeds the threshold (an unacceptable degree of degradation) over several consecutive evaluations, the CPU bandwidth of the low-priority tasks can be limited (i.e., an upper bound placed on their CPU resources). If performance degradation remains severe after this limit, migration of the low-priority tasks can be requested from the cluster center.
In this way, appropriate resources can be configured accurately for both high-priority and low-priority tasks, guaranteeing the processing efficiency of high-priority tasks while also ensuring the execution of low-priority tasks and improving resource utilization.
A schematic diagram of an implementation example of the computing device of the present invention is further described below with reference to FIG. 6. Computing device 601 includes a task layer 601, a software layer 605 and a hardware layer 613. The task layer 601 is used to obtain tasks and configure them as high-priority tasks 602 and low-priority tasks 604. Task priority is implemented through priority tags 603.
High-priority or low-priority tasks are executed on the hardware layer 613 via the software layer 605. The software layer 605 includes a CPU scheduling optimization module 606, which manages resources within a physical core, for example prioritizing high-priority tasks and isolating logical cores. The CPU scheduling optimization module 606 includes a single-core suppression module 608 and a logical-core isolation module 609 to implement these functions. The software layer 605 also includes an on-chip shared resource management module 607 for managing inter-core shared resources on the chip. The on-chip shared resource management module 607 includes a data collection module 610, a resource sensitivity classifier 611 and a resource controller 612 for regulating shared resources between cores.
The hardware layer includes on-chip resources 614 and Resource Director Technology (RDT)/Memory System Resource Partitioning and Monitoring (MPAM) for implementing task execution. The on-chip resources 614 may be a CPU. In addition, the on-chip resources 614 include a performance monitoring unit 616. The performance monitoring unit 616 and RDT/MPAM provide performance parameters to the data collection module 610.
FIG. 7 further shows a block diagram of an apparatus 700 for processing tasks according to an embodiment of the present application. The apparatus 700 may include multiple modules for performing the corresponding steps of the process 200 discussed in FIG. 2. As shown in FIG. 7, the apparatus 700 includes: a task determining unit 701 configured to determine a first task to be executed by a first logical core among the physical cores of a processing resource; a priority determining unit 702 configured to determine whether the first task has a predetermined priority; an execution determining unit 703 configured to, if the first task is determined to have the predetermined priority, determine whether a second logical core in the physical core is executing a second task of the predetermined priority; and an allocating unit 704 configured to, if the second logical core is determined not to be executing a second task of the predetermined priority, allocate a dedicated task consisting of null instructions to the second logical core.
In some embodiments, the task determining unit 701 includes: a ready queue determining unit configured to obtain a task ready queue for the first logical core, the task ready queue being an ordered queue based on task priority; and a first task obtaining unit configured to obtain the first task from the task ready queue.
In some embodiments, the first task obtaining unit includes: a selecting unit configured to select a ready task from the head of the task ready queue according to priority; an execution determining unit configured to determine whether the first logical core is executing a current task; a priority comparing unit configured to, if the first logical core is determined to be executing a current task, compare the priority of the ready task with the priority of the current task; and a replacing unit configured to, if the priority of the ready task is determined to be higher than that of the current task, determine the ready task as the first task to replace execution of the current task.
In some embodiments, the apparatus 700 further includes: a task obtaining unit configured to obtain an allocated task assigned to the first logical core and a corresponding priority for the allocated task; and an adding unit configured to add the allocated task to the task ready queue based on the corresponding priority.
In some embodiments, where the predetermined priority is a first predetermined priority, the apparatus 700 further includes: a first-task executing unit configured to, if it is determined that the first task is of a second predetermined priority, cause the first logical core to execute the first task, the second predetermined priority being lower than the first predetermined priority.
In some embodiments, the execution determining unit includes: a second-task execution determining unit configured to, if it is determined that the first task is of the predetermined priority, determine whether the second logical core is executing a second task; and a second-task priority determining unit configured to, if it is determined that the second logical core is executing the second task, determine whether the priority of the second task is the predetermined priority.
In some embodiments, where the physical core is a first physical core, the processing resource further includes a second physical core, and the predetermined priority is a first predetermined priority, the apparatus 700 further includes: a baseline-value obtaining unit configured to obtain a baseline value of a performance parameter related to the first task; and a shared-resource adjusting unit configured to adjust, based on the baseline value, shared resources allocable to a third task on the second physical core, the third task having a second predetermined priority lower than the first predetermined priority.
In some embodiments, the baseline-value obtaining unit includes: a suppressing unit configured to suppress execution of the third task by the second physical core; and a baseline-value determining unit configured to determine the baseline value based on the suppression of the third task.
In some embodiments, the suppressing unit includes a limiting unit configured to limit an upper bound of the shared resources allocated to the third task, or a suspending unit configured to suspend execution of the third task.
In some embodiments, the suppressing unit is configured to suppress execution of the third task by the second physical core multiple times at predetermined time intervals, and the baseline-value determining unit is configured to determine multiple baseline values based on the multiple suppressions of the third task.
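The repeated-suppression idea above can be sketched as a small measurement loop. All three callbacks (`measure`, `suppress`, `resume`) are hypothetical hooks into the platform (for example, performance counters and an RDT-style resource controller), not APIs defined by the application:

```python
import time

def collect_baselines(measure, suppress, resume, rounds=3, interval_s=0.01):
    """Periodically throttle (or pause) the lower-priority third task on the
    second physical core, sample the first task's performance parameter while
    interference is suppressed, then resume the third task."""
    baselines = []
    for _ in range(rounds):
        suppress()                    # cap shared resources or pause the third task
        baselines.append(measure())   # value observed with interference suppressed
        resume()                      # let the third task run again
        time.sleep(interval_s)        # predetermined interval between suppressions
    return baselines
```

Collecting several baseline values rather than one lets later adjustments compare against a value that reflects normal variation in the workload.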
In some embodiments, the shared-resource adjusting unit includes: a sensitivity determining unit configured to determine, based on the baseline value of the performance parameter, the sensitivity of the first task to shared resources; and an allocable-resource adjusting unit configured to adjust, based on the sensitivity, the shared resources allocable to the third task.
In some embodiments, the performance parameter includes at least one of: cache accesses per thousand instructions, cache miss rate, and memory bandwidth, and the sensitivity determining unit is configured to perform at least one of the following: if it is determined that the cache accesses per thousand instructions are below a threshold cache access amount, or that the cache miss rate is above a threshold cache miss rate, determine that the first task is not cache-sensitive; if it is determined that the cache accesses per thousand instructions are at or above the threshold cache access amount and that the cache miss rate is at or below the threshold cache miss rate, determine that the first task is cache-sensitive; if it is determined that the memory bandwidth is below a threshold bandwidth, determine that the first task is not sensitive to memory bandwidth; and if it is determined that the memory bandwidth is at or above the threshold bandwidth, determine that the first task is sensitive to memory bandwidth.
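The threshold rules above amount to a small classifier. The sketch below follows those rules; the numeric default thresholds are placeholders chosen for illustration, not values taken from the application:

```python
def classify_sensitivity(accesses_per_ki, miss_rate, mem_bw,
                         thr_accesses=50.0, thr_miss=0.4, thr_bw=1e9):
    """Apply the embodiment's rules: cache-sensitive only when cache accesses
    per thousand instructions are high enough AND the miss rate is low enough;
    bandwidth-sensitive when memory bandwidth meets the threshold."""
    cache_sensitive = (accesses_per_ki >= thr_accesses and miss_rate <= thr_miss)
    bw_sensitive = mem_bw >= thr_bw
    return {"cache": cache_sensitive, "memory_bandwidth": bw_sensitive}
```

A task with many cache accesses but a high miss rate is classified as not cache-sensitive: its working set does not fit the cache, so shielding the cache would not help it.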
In some embodiments, the allocable-resource adjusting unit includes: a first increasing unit configured to, if it is determined that the first task is not sensitive to the shared resources, increase the upper bound of the shared resources for the third task; and a dynamic adjusting unit configured to, if it is determined that the first task is sensitive to the shared resources, dynamically adjust the upper bound of the shared resources for the third task.
In some embodiments, the dynamic adjusting unit includes: an actual-value obtaining unit configured to obtain an actual value of the performance parameter related to the first task; and an upper-bound adjusting unit configured to dynamically adjust the upper bound of the shared resources based on the baseline value and the actual value.
In some embodiments, the upper-bound adjusting unit includes: a decreasing unit configured to, if it is determined that the difference between the baseline value and the actual value exceeds a first threshold, decrease the shared resources allocated to the third task; and a second increasing unit configured to, if it is determined that the difference between the baseline value and the actual value is below a second threshold, increase the shared resources allocated to the third task, the second threshold being lower than the first threshold.
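The two-threshold controller above can be sketched in a few lines. This assumes a larger value of the performance parameter is better (so baseline minus actual measures degradation) and uses a fixed adjustment `step`; both are assumptions of the sketch:

```python
def adjust_cap(cap, baseline, actual, thr_high, thr_low, step):
    """Two-threshold adjustment of the third task's shared-resource cap:
    shrink it when the first task has degraded too far from its baseline,
    grow it when the first task is comfortably close to baseline, and leave
    it alone in between (avoiding oscillation)."""
    diff = baseline - actual       # degradation of the first task
    if diff > thr_high:            # first task suffering: take resources back
        cap = max(0, cap - step)
    elif diff < thr_low:           # headroom available: give resources to the third task
        cap = cap + step
    return cap                     # diff in [thr_low, thr_high]: no change
```

Keeping the second threshold below the first creates a dead band, so the cap is not adjusted on every small fluctuation.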
In some embodiments, the shared resources include at least one of a last-level cache (LLC) and memory bandwidth.
FIG. 8 shows a schematic block diagram of an example device 800 that may be used to implement embodiments of the present application. For example, the computing devices 101 and 601 according to embodiments of the present application may be implemented by the example device 800. As shown, the device 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 802 or loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 may also store various programs and data required for the operation of the device 800. The CPU 801, the ROM 802 and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Multiple components of the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or a mouse; an output unit 807, such as various types of displays and speakers; a storage unit 808, such as a magnetic disk or an optical disc; and a communication unit 809, such as a network card, a modem, or a wireless communication transceiver. The communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The processes described above, such as the processes 200, 400 and 500, may be performed by the processing unit 801. For example, in some embodiments, the processes 200, 400 and 500 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the CPU 801, one or more actions of the processes 200, 400 and 500 described above may be performed.
The present application may be a method, an apparatus, a system, a chip and/or a computer program product. The chip may include a processing unit and a communication interface, and the processing unit may process program instructions received from the communication interface. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for performing various aspects of the present application.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction-executing device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or another freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or another transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from the computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, for example the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out operations of the present application may be assembly instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the scenario involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA), may be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions so as to implement various aspects of the present application.
Aspects of the present application are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or acts, or by a combination of special-purpose hardware and computer instructions.
The embodiments of the present application have been described above. The foregoing description is illustrative rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

  1. A method for processing a task, wherein the method comprises:
    determining a first task to be executed by a first logical core among physical cores in a processing resource;
    determining whether the first task is of a predetermined priority;
    if it is determined that the first task is of the predetermined priority, determining whether a second logical core in the physical core is executing a second task of the predetermined priority; and
    if it is determined that the second logical core is not executing a second task of the predetermined priority, allocating to the second logical core a dedicated task comprising null instructions.
  2. The method according to claim 1, wherein determining the first task comprises:
    obtaining a task ready queue for the first logical core, the task ready queue being an ordered queue based on task priority; and
    obtaining the first task from the task ready queue.
  3. The method according to claim 2, wherein obtaining the first task comprises:
    selecting a ready task from the head of the task ready queue according to priority;
    determining whether the first logical core is executing a current task;
    if it is determined that the first logical core is executing the current task, comparing the priority of the ready task with the priority of the current task; and
    if it is determined that the priority of the ready task is higher than the priority of the current task, determining the ready task as the first task to replace execution of the current task.
  4. The method according to claim 2, wherein the method further comprises:
    obtaining an allocated task assigned to the first logical core and a corresponding priority for the allocated task; and
    adding the allocated task to the task ready queue based on the corresponding priority.
  5. The method according to claim 1, wherein the predetermined priority is a first predetermined priority, and the method further comprises:
    if it is determined that the first task is of a second predetermined priority, causing the first logical core to execute the first task, the second predetermined priority being lower than the first predetermined priority.
  6. The method according to claim 1, wherein determining whether the second logical core in the physical core is executing a second task of the predetermined priority comprises:
    if it is determined that the first task is of the predetermined priority, determining whether the second logical core is executing the second task; and
    if it is determined that the second logical core is executing the second task, determining whether the priority of the second task is the predetermined priority.
  7. The method according to claim 1, wherein the physical core is a first physical core, the processing resource further comprises a second physical core, the predetermined priority is a first predetermined priority, and the method further comprises:
    obtaining a baseline value of a performance parameter related to the first task; and
    adjusting, based on the baseline value, shared resources allocable to a third task on the second physical core, the third task having a second predetermined priority lower than the first predetermined priority.
  8. The method according to claim 7, wherein obtaining the baseline value comprises:
    suppressing execution of the third task by the second physical core; and determining the baseline value based on the suppression of the third task.
  9. The method according to claim 8, wherein suppressing execution of the third task comprises:
    limiting an upper bound of the shared resources allocated to the third task; or
    suspending execution of the third task.
  10. The method according to claim 8, wherein:
    suppressing execution of the third task comprises suppressing execution of the third task by the second physical core multiple times at predetermined time intervals; and
    determining the baseline value comprises determining multiple baseline values based on the multiple suppressions of the third task.
  11. The method according to claim 7, wherein adjusting the shared resources allocable to the third task on the second physical core comprises:
    determining, based on the baseline value of the performance parameter, the sensitivity of the first task to shared resources; and
    adjusting, based on the sensitivity, the shared resources allocable to the third task.
  12. The method according to claim 11, wherein the performance parameter comprises at least one of: cache accesses per thousand instructions, cache miss rate, and memory bandwidth, and wherein determining the sensitivity comprises at least one of the following:
    if it is determined that the cache accesses per thousand instructions are below a threshold cache access amount or that the cache miss rate is above a threshold cache miss rate, determining that the first task is not cache-sensitive;
    if it is determined that the cache accesses per thousand instructions are at or above the threshold cache access amount and that the cache miss rate is at or below the threshold cache miss rate, determining that the first task is cache-sensitive;
    if it is determined that the memory bandwidth is below a threshold bandwidth, determining that the first task is not sensitive to memory bandwidth; and
    if it is determined that the memory bandwidth is at or above the threshold bandwidth, determining that the first task is sensitive to memory bandwidth.
  13. The method according to claim 11, wherein adjusting the shared resources comprises:
    if it is determined that the first task is not sensitive to the shared resources, increasing the upper bound of the shared resources for the third task; and
    if it is determined that the first task is sensitive to the shared resources, dynamically adjusting the upper bound of the shared resources for the third task.
  14. The method according to claim 13, wherein dynamically adjusting the upper bound of the shared resources comprises:
    obtaining an actual value of the performance parameter related to the first task; and
    dynamically adjusting the upper bound of the shared resources based on the baseline value and the actual value.
  15. The method according to claim 14, wherein dynamically adjusting the upper bound of the shared resources based on the baseline value and the actual value comprises:
    if it is determined that the difference between the baseline value and the actual value exceeds a first threshold, decreasing the shared resources allocated to the third task; and
    if it is determined that the difference between the baseline value and the actual value is below a second threshold, increasing the shared resources allocated to the third task, wherein the second threshold is lower than the first threshold.
  16. The method according to claim 7, wherein the shared resources comprise at least one of a last-level cache (LLC) and memory bandwidth.
  17. An apparatus for processing a task, wherein the apparatus comprises:
    a task determining unit configured to determine a first task to be executed by a first logical core among physical cores in a processing resource;
    a priority determining unit configured to determine whether the first task is of a predetermined priority;
    an execution determining unit configured to, if it is determined that the first task is of the predetermined priority, determine whether a second logical core in the physical core is executing a second task of the predetermined priority; and
    an allocating unit configured to, if it is determined that the second logical core is not executing a second task of the predetermined priority, allocate to the second logical core a dedicated task comprising null instructions.
  18. An electronic device, comprising:
    at least one computing unit; and
    at least one memory coupled to the at least one computing unit and storing instructions for execution by the at least one computing unit, the instructions, when executed by the at least one computing unit, causing the device to perform the method according to any one of claims 1-16.
  19. A computer-readable storage medium having a computer program stored thereon, the program, when executed by a processor, implementing the method according to any one of claims 1-16.
  20. A computer program product comprising computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, implement the method according to any one of claims 1-16.
PCT/CN2023/112749 2022-08-24 2023-08-11 Method and apparatus for processing task, and device and storage medium WO2024041401A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211021692.9 2022-08-24
CN202211021692.9A CN117667324A (en) 2022-08-24 2022-08-24 Method, apparatus, device and storage medium for processing tasks

Publications (1)

Publication Number Publication Date
WO2024041401A1 true WO2024041401A1 (en) 2024-02-29

Family

ID=90012524

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/112749 WO2024041401A1 (en) 2022-08-24 2023-08-11 Method and apparatus for processing task, and device and storage medium

Country Status (2)

Country Link
CN (1) CN117667324A (en)
WO (1) WO2024041401A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210055958A1 (en) * 2019-08-22 2021-02-25 Intel Corporation Technology For Dynamically Grouping Threads For Energy Efficiency
CN112698920A (en) * 2021-01-08 2021-04-23 北京三快在线科技有限公司 Container task scheduling method and device, electronic equipment and computer readable medium
CN112749002A (en) * 2019-10-29 2021-05-04 北京京东尚科信息技术有限公司 Method and device for dynamically managing cluster resources
CN112783659A (en) * 2021-02-01 2021-05-11 北京百度网讯科技有限公司 Resource allocation method and device, computer equipment and storage medium
US20210334133A1 (en) * 2020-04-27 2021-10-28 International Business Machines Corporation Adjusting a dispatch ratio for multiple queues
CN114138428A (en) * 2021-10-18 2022-03-04 阿里巴巴(中国)有限公司 SLO (service level objective) guarantee method, apparatus, node and storage medium for multi-priority tasks


Also Published As

Publication number Publication date
CN117667324A (en) 2024-03-08

Similar Documents

Publication Publication Date Title
US10530846B2 (en) Scheduling packets to destination virtual machines based on identified deep flow
US6986137B1 (en) Method, system and program products for managing logical processors of a computing environment
US6587938B1 (en) Method, system and program products for managing central processing unit resources of a computing environment
US6651125B2 (en) Processing channel subsystem pending I/O work queues based on priorities
US7051188B1 (en) Dynamically redistributing shareable resources of a computing environment to manage the workload of that environment
US8510747B2 (en) Method and device for implementing load balance of data center resources
US6519660B1 (en) Method, system and program products for determining I/O configuration entropy
CA2382017C (en) Workload management in a computing environment
US7007276B1 (en) Method, system and program products for managing groups of partitions of a computing environment
AU2013206117B2 (en) Hierarchical allocation of network bandwidth for quality of service
WO2016078178A1 (en) Virtual cpu scheduling method
JP5370946B2 (en) Resource management method and computer system
US20200174844A1 (en) System and method for resource partitioning in distributed computing
WO2017010922A1 (en) Allocation of cloud computing resources
CN111247515A (en) Apparatus and method for providing a performance-based packet scheduler
US20230055813A1 (en) Performing resynchronization jobs in a distributed storage system based on a parallelism policy
US7568052B1 (en) Method, system and program products for managing I/O configurations of a computing environment
US9547576B2 (en) Multi-core processor system and control method
US20190044832A1 (en) Technologies for optimized quality of service acceleration
WO2024041401A1 (en) Method and apparatus for processing task, and device and storage medium
Ru et al. Providing fairer resource allocation for multi-tenant cloud-based systems
Komarasamy et al. Deadline constrained adaptive multilevel scheduling system in cloud environment
CN104899098B (en) A kind of vCPU dispatching method based on shared I/O virtualized environment
CN116893893B (en) Virtual machine scheduling method and device, electronic equipment and storage medium
US11729119B2 (en) Dynamic queue management of network traffic

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856491

Country of ref document: EP

Kind code of ref document: A1