CN117667324A - Method, apparatus, device and storage medium for processing tasks - Google Patents

Method, apparatus, device and storage medium for processing tasks

Info

Publication number
CN117667324A
CN117667324A
Authority
CN
China
Prior art keywords
task
priority
determined
core
predetermined priority
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211021692.9A
Other languages
Chinese (zh)
Inventor
孙东旭
朱科潜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202211021692.9A priority Critical patent/CN117667324A/en
Priority to PCT/CN2023/112749 priority patent/WO2024041401A1/en
Publication of CN117667324A publication Critical patent/CN117667324A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application provides a method, an apparatus, a device, a storage medium, and a program product for processing tasks, and relates to the technical field of task management. The method includes determining a first task to be performed by a first logical core of a physical core in a processing resource, and determining whether the first task is of a predetermined priority. If it is determined that the first task is of the predetermined priority, it is determined whether a second logical core of the physical core is executing a second task of the predetermined priority. If it is determined that the second logical core is not executing such a second task, a dedicated task including a null instruction is assigned to the second logical core. Embodiments of the application can accelerate the execution of high-priority tasks in a processor, improve the processing efficiency of high-priority tasks, and increase the resource utilization of the processor.

Description

Method, apparatus, device and storage medium for processing tasks
Technical Field
Embodiments of the present application relate generally to the field of computers. More particularly, embodiments of the present application relate to methods, apparatuses, devices, and storage media for processing tasks.
Background
With the rapid advancement of computer and communication technology, people increasingly rely on networks and computers to handle various tasks, and the amount of data has grown explosively. To manage this data, more and more data centers have been built. These data centers use configured servers, together with a network infrastructure, to deliver, accelerate, present, compute, and store various data for users or customers.
The development of data centers has gone through multiple stages, from an initial data storage stage to a data processing stage and, with the development of cloud technology, into the cloud data center stage. As data centers grow rapidly, the number of servers deployed in them keeps increasing. However, many problems remain to be solved in using these servers to serve clients.
Disclosure of Invention
Embodiments of the present application provide a solution for processing tasks.
According to a first aspect of the present application, a method of processing tasks is provided. The method comprises: determining a first task to be performed by a first logical core of a physical core in a processing resource; determining whether the first task is of a predetermined priority; if it is determined that the first task is of the predetermined priority, determining whether a second logical core of the physical core is executing a second task of the predetermined priority; and if it is determined that the second logical core is not executing such a second task, assigning a dedicated task including a null instruction to the second logical core.
In this way, the execution of high-priority tasks in the processor can be accelerated: on a single physical core, preemption of lower-priority tasks by high-priority tasks is achieved and interference between tasks of different priorities on the logical cores is eliminated, so that the execution of high-priority tasks within a single physical core is accelerated, the processing efficiency of high-priority tasks is improved, and the resource utilization of the processor is increased.
In some embodiments, determining the first task comprises: obtaining a task ready queue for the first logical core, the task ready queue being an ordered queue based on task priority; and obtaining the first task from the task ready queue. In this way, the task to be processed by the logical core can be obtained quickly and accurately, reducing the time needed to acquire high-priority tasks and improving processing efficiency.
In some embodiments, obtaining the first task comprises: selecting a ready task from the head of the task ready queue according to priority; determining whether the first logical core is executing a current task; if the first logical core is executing a current task, comparing the priority of the ready task with the priority of the current task; and if the priority of the ready task is higher than the priority of the current task, determining the ready task as the first task, to replace execution of the current task. In this way, whether the task on the logical core needs to be replaced can be determined quickly, increasing the probability that high-priority tasks are processed promptly.
In some embodiments, the method further comprises: obtaining a task assigned to the first logical core and a corresponding priority for the assigned task; and adding the assigned task to the task ready queue based on the corresponding priority. In this way, the position of a newly assigned task in the priority queue can be determined quickly, so that high-priority tasks are processed in time and their processing efficiency is improved.
In some embodiments, the predetermined priority is a first predetermined priority, and the method further comprises: if it is determined that the first task is of a second predetermined priority, causing the first logical core to execute the first task, the second predetermined priority being lower than the first predetermined priority. In this way, the task can be executed promptly, improving processing efficiency.
In some embodiments, determining whether the second logical core of the physical core is executing a second task of the predetermined priority comprises: if it is determined that the first task is of the predetermined priority, determining whether the second logical core is executing the second task; and if it is determined that the second logical core is executing the second task, determining whether the priority of the second task is the predetermined priority. In this way, the efficiency of determining the task priority of the other logical core in the physical core can be improved.
In some embodiments, the physical core is a first physical core, the processing resource further comprises a second physical core, and the predetermined priority is a first predetermined priority. The method further comprises: obtaining a baseline value of a performance parameter associated with the first task; and adjusting, based on the baseline value, the shared resources allocable to a third task on the second physical core, the third task having a second predetermined priority lower than the first predetermined priority. In this way, more inter-core shared resources can be allocated to the high-priority task, accelerating its execution and improving its processing efficiency.
In some embodiments, obtaining the baseline value comprises: suppressing execution of the third task on the second physical core; and determining the baseline value based on the suppression of the third task. In this way, the baseline performance indicator can be determined quickly and accurately.
In some embodiments, suppressing execution of the third task includes: limiting an upper bound on the shared resources allocated to the third task; or suspending execution of the third task. In this way, the baseline performance of the high-priority task can be acquired quickly and accurately.
In some embodiments, suppressing execution of the third task includes: suppressing execution of the third task on the second physical core a plurality of times at predetermined time intervals; and determining the baseline value comprises: determining a plurality of baseline values based on the plurality of suppressions of the third task. Performing the operation multiple times avoids oscillation, or the ping-pong phenomenon, in task resource allocation.
In some embodiments, adjusting the shared resources allocable to the third task on the second physical core includes: determining a sensitivity of the first task to the shared resources based on the baseline value of the performance parameter; and adjusting the shared resources allocable to the third task based on the sensitivity. In this way, whether the high-priority task is sensitive to a resource can be determined accurately, ensuring execution of the low-priority task while guaranteeing the high-priority task.
In some embodiments, the performance parameter includes at least one of: the cache accesses per kilo-instructions, the cache miss rate, and the memory bandwidth, and determining the sensitivity includes at least one of: if the cache accesses per kilo-instructions are below a threshold cache access amount or the cache miss rate is greater than a threshold cache miss rate, determining that the first task is insensitive to the cache; if the cache accesses per kilo-instructions are greater than or equal to the threshold cache access amount and the cache miss rate is less than or equal to the threshold cache miss rate, determining that the first task is sensitive to the cache; if the memory bandwidth is less than a threshold bandwidth, determining that the first task is insensitive to the memory bandwidth; and if the memory bandwidth is greater than or equal to the threshold bandwidth, determining that the first task is sensitive to the memory bandwidth. In this way, it can be determined quickly which resources the high-priority task is sensitive to, providing accurate information for resource configuration and improving the efficiency and accuracy of resource allocation.
In some embodiments, adjusting the shared resource includes: if it is determined that the first task is insensitive to the shared resource, increasing an upper bound of the shared resource for the third task; and dynamically adjusting an upper bound of the shared resource for the third task if the first task is determined to be sensitive to the shared resource. By the method, the configuration of the shared resource can be accurately adjusted.
In some embodiments, dynamically adjusting the upper limit of the shared resource includes: acquiring an actual value of a performance parameter related to the first task; and dynamically adjusting an upper bound of the shared resource based on the baseline value and the actual value. By the method, the configuration of the shared resource can be accurately adjusted.
In some embodiments, dynamically adjusting the upper limit of the shared resource based on the baseline value and the actual value comprises: if it is determined that the difference between the baseline value and the actual value exceeds the first threshold, reducing the shared resource allocated to the third task; and if it is determined that the difference between the baseline value and the actual value is below a second threshold, increasing the shared resource allocated to the third task, wherein the second threshold is below the first threshold. By the method, proper resources can be accurately configured for the high-priority tasks and the low-priority tasks, the processing efficiency of the high-priority tasks is ensured, the execution of the low-priority tasks is ensured, and the resource utilization rate is improved.
In some embodiments, the shared resources include at least one of a last level cache LLC and a memory bandwidth. In this way, it can be accurately determined which shared resources to adjust.
According to a second aspect of the present application, an apparatus for processing tasks is provided. The device comprises: a task determination unit configured to determine a first task to be performed by a first logical core of the physical cores in the processing resource; a priority determining unit configured to determine whether the first task is a predetermined priority; an execution determining unit configured to determine whether a second logical core of the physical cores executes a second task of a predetermined priority if it is determined that the first task is of the predetermined priority; and an allocation unit configured to allocate a dedicated task including a null instruction to the second logical core if it is determined that the second logical core does not execute the second task of the predetermined priority.
According to a third aspect of the present application, there is also provided an electronic device comprising: at least one computing unit; at least one memory coupled to the at least one computing unit and storing instructions for execution by the at least one computing unit, which when executed by the at least one computing unit, cause the apparatus to perform the method according to the first aspect of the present application.
According to a fourth aspect of the present application there is also provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method according to the first aspect of the present application.
According to a fifth aspect of the present application there is also provided a computer program product comprising computer executable instructions which when executed by a processor implement the method according to the first aspect of the present application.
It will be appreciated that the apparatus of the second aspect, the electronic device of the third aspect, the computer storage medium of the fourth aspect, and the computer program product of the fifth aspect, as provided above, are for performing the method provided by the first aspect. Accordingly, the explanations regarding the first aspect apply equally to the second, third, fourth, and fifth aspects. The advantages achieved by the second, third, fourth, and fifth aspects may be understood with reference to the corresponding method and are not repeated here.
Drawings
The above and other features, advantages and aspects of embodiments of the present application will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, wherein like or similar reference numerals designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic diagram of an example environment in which various embodiments of the present application may be implemented;
FIG. 2 illustrates a schematic flow diagram for processing tasks according to some embodiments of the present application;
FIG. 3 illustrates a system diagram for controlling inter-core shared resources according to some embodiments of the present application;
FIG. 4 illustrates a schematic flow diagram of a process for determining resource sensitivity according to some embodiments of the present application;
FIG. 5 illustrates a schematic flow diagram of a process for dynamically allocating resources according to some embodiments of the present application;
FIG. 6 illustrates a schematic diagram of an implementation example of a computing device, according to some embodiments of the present application;
FIG. 7 illustrates a block diagram of an apparatus according to some embodiments of the present application; and
FIG. 8 illustrates a block diagram of a computing device capable of implementing various embodiments of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it is to be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present application will be understood more thoroughly and completely. It should be understood that the drawings and examples of the present application are for illustrative purposes only and are not intended to limit the scope of the present application.
In the description of the embodiments of the present application, the term "comprising" and similar terms should be understood as open-ended, i.e., "including, but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As described above, with the continuous expansion of data centers, the number of servers is also rapidly increasing. However, server resource utilization, such as Central Processing Unit (CPU) utilization, remains low, which means a huge waste of resources. To improve server resource utilization, a common practice is to mix and deploy tasks (virtual machines, containers, etc.) with different priorities. For example, mixed deployment raises server CPU utilization from 15% to 30%, saving 50% of CPU cost.
However, when tasks with different priorities are mixed and deployed, the CPUs in the on-chip resources are time-division multiplexed, and lower-priority tasks inevitably interfere with high-priority tasks. The task scheduling algorithm of an existing host operating system (or hypervisor) cannot guarantee that a high-priority task fully preempts a lower-priority task. In addition, when tasks of different priorities run on different logical cores of the same physical core, they interfere in resources such as the arithmetic logic unit, the first-level cache, and the second-level cache, affecting the fast execution of high-priority tasks. Further, the Last Level Cache (LLC) and the Memory Bandwidth (MBW) are inter-core shared resources; when a high-priority task and a lower-priority task run on different physical cores, the lower-priority task may preempt these inter-core shared resources, degrading the processing efficiency of the high-priority task.
To solve the above problem, one conventional solution is a container colocation solution, for example a full-scenario offline colocation solution, which designs a completely new offline scheduler class based on the Linux system, where the scheduler class's priority in the scheduling queue is lower than the default scheduler class and higher than the IDLE scheduler class. However, designing a new scheduling class increases system complexity and operation and maintenance costs, and cannot reuse the load-balancing features of the current system.
Another conventional approach is to classify tasks into latency-sensitive and best-effort types. This scheme reads user-side performance data in real time and provides as many resources as possible to best-effort tasks on the premise that the processing efficiency of latency-sensitive applications is met. However, such schemes require a priori knowledge of the tasks, such as an application's throughput and cache hit rate at different cache capacities (which depend on specific hardware performance and architecture). Because a priori knowledge is required, these schemes are limited to specific tasks and cannot be tuned for the various tasks a server runs.
To address at least some of the above problems and other potential problems, in embodiments of the present application, a computing device first determines a first task to be performed by a first logical core of a physical core in a processing resource, and then determines whether the first task is of high priority. If the computing device determines that the first task is a high-priority task, it determines whether a second logical core of the physical core is executing a high-priority second task. If it is determined that the second logical core is not executing a high-priority second task, a dedicated task including a null instruction is assigned to the second logical core, so that the high-priority task monopolizes the resources within the physical core. Further, when a high-priority task runs on one physical core, the shared resources occupied by low-priority tasks running on other physical cores in the same on-chip resource can be regulated to further promote the execution of the high-priority task. In this manner, embodiments of the present application can accelerate the execution of high-priority tasks in the processor, improve the processing efficiency of high-priority tasks, and increase the resource utilization of the processor.
FIG. 1 illustrates a schematic diagram of an example environment 100 in which various embodiments of the present application can be implemented. As shown in FIG. 1, environment 100 includes a computing device 101.
Computing device 101 includes, but is not limited to, personal computers, servers, hand-held or laptop devices, mobile devices such as mobile phones, Personal Digital Assistants (PDAs), and media players, multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The computing device 101 is used to process various tasks from a user. The tasks described herein are tasks handled by a computing device, which may be virtual machines, containers, sets of processes, or sets of threads, among others. To provide high-quality service for some important tasks, the user may assign a priority, also called a hard priority, to these tasks. For example, tasks are divided into high-priority tasks and low-priority tasks, such as high-priority task 102 and low-priority task 103 shown in FIG. 1. The priorities of the high-priority task 102 and the low-priority task 103 are specified in advance. A task assigned a priority may be tagged with a task label indicating its priority level; for example, each task is assigned a field for storing the task label.
In one example, the computing device 101 determines the priority of a task depending on its type. In another example, the priority of a task is specified by the user. For example, in a public cloud scenario, resource-exclusive virtual machines and resource-sharing virtual machines are deployed together, a high-priority label is added to the resource-exclusive virtual machines, and a low-priority label is added to the resource-sharing virtual machines. In a private cloud scenario, a user may classify tasks into high-priority and low-priority tasks according to requirements such as latency sensitivity. The above examples are merely for describing the present disclosure and are not intended as specific limitations thereof.
Computing device 101 also has on-chip resources 107. The on-chip resources 107 are resources provided on the chip, including at least physical cores 108 and 109, a last level cache (LLC), and memory bandwidth (MBW), i.e., LLC/MBW 114. For example, on-chip resource 107 is a CPU. The on-chip resources are shown in FIG. 1 as including two physical cores 108 and 109; this is merely an example and not a specific limitation of the present disclosure. One skilled in the art may set the number of physical cores contained in the on-chip resources 107 as desired; for example, the on-chip resources 107 may include one physical core or more than two physical cores.
As shown in FIG. 1, physical core 108 includes logical cores 110 and 111, and physical core 109 includes logical cores 112 and 113. The logical cores in FIG. 1 are obtained by hyper-threading the physical cores. A physical core including two logical cores, as shown in FIG. 1, is merely an example and not a specific limitation of the present disclosure. The number of logical cores in a physical core may be set as desired; for example, a physical core may include one or more than two logical cores, and the numbers of logical cores in two physical cores may be the same or different.
The computing device 101 is configured with a task ready queue for each logical core for storing tasks performed by each logical core. Upon receiving a task, computing device 101 may assign the newly received task to a task ready queue of one of the logical cores, depending on the load balancing of the logical cores and/or the user's configuration of the task.
The computing device 101 also includes a CPU schedule optimization module 104 for scheduling and optimizing the execution of high-priority tasks. Tasks in the task ready queue assigned to a logical core are typically processed in order based on the time slices assigned to the tasks and the processor time slices, for example using CPU time-slice round-robin scheduling or completely fair scheduling. Since the tasks received by the computing device 101 also carry priorities, the CPU schedule optimization module 104 further adjusts the priority-aware ordering of tasks within the ready queue and the execution of the tasks.
The CPU schedule optimization module 104 includes a single-core throttle module 105 and a logical core isolation module 106. The single-core throttle module 105 is configured to order tasks within the ready queue by priority, with high-priority tasks queued at the front of the task ready queue and low-priority tasks queued at the back. If two tasks have the same priority, they are queued according to the existing time-slice scheduling scheme. High-priority tasks are then sent to the logical cores for processing by the single-core throttle module 105.
If a logical core is processing a high-priority task, the logical core isolation module 106 in the CPU schedule optimization module may further determine whether the other logical cores on the same physical core are executing high-priority tasks. For example, if logical core 110 in physical core 108 is running a high-priority task, the logical core isolation module 106 may determine whether logical core 111 on the same physical core is running a high-priority task. FIG. 1 illustrates an example in which a physical core includes two logical cores; if the physical core includes more than two logical cores, the logical core isolation module 106 may determine whether one or more of the other logical cores are running high-priority tasks.
If the other logical cores, such as logical core 111, are not running high-priority tasks, the logical core isolation module 106 may pull up a null-instruction task on logical core 111. The null-instruction task has a priority higher than low priority and lower than high priority. For example, if another logical core is running a low-priority task, the low-priority task is replaced with the null-instruction task; if the other logical core is not processing any task, the null-instruction task is pulled up directly. Once the null-instruction task is pulled up on the other logical cores, low-priority tasks cannot occupy them, ensuring the execution of the high-priority task. If the other logical cores are also running high-priority tasks, no adjustment is made to them.
In addition, if the task to be processed by a logical core is a low-priority task, the logical core simply executes it without performing any task control operation on the other logical cores. Conversely, if another logical core is running a high-priority task, the low-priority task running on this logical core may be replaced with a null-instruction task.
If there are multiple physical cores in the on-chip resources, there is also contention between them for shared resources, such as the last level cache LLC and memory bandwidth MBW common to multiple physical cores. Alternatively or additionally, the computing device 101 optionally includes an on-chip shared resource manager 115. The on-chip shared resource manager 115 is configured to obtain a performance index of a high-priority task executing on one physical core, and then, based on that performance index, adjust the occupation of inter-core shared resources by low-priority tasks on other physical cores.
The on-chip shared resource manager 115 includes a collector 116 for obtaining performance metrics for high priority tasks. The on-chip shared resource manager 115 also includes a classifier 117 for determining whether the high priority task is sensitive to the shared resource. The resource controller 118 then adjusts the occupancy of the shared resource by low priority tasks on one physical core based on whether the high priority tasks on the other physical core are sensitive to the shared resource. For example, if the logical core 110 of the physical core 108 runs a high priority task, in addition to controlling the logical core 111 within the physical core 108 not to execute a low priority task, the shared resources allocated to the low priority task in the physical core 109 may be controlled by the on-chip shared resource manager 115.
By the method, the execution of the high-priority task in the processor can be quickened, the processing efficiency of the high-priority task is improved, and the resource utilization rate of the processor is improved.
A schematic diagram of an example environment 100 in which embodiments of the present application can be implemented is described above in connection with FIG. 1. A flowchart of a method 200 for processing tasks according to an embodiment of the disclosure is described below in connection with FIG. 2. Method 200 may be performed at computing device 101 in FIG. 1 or at any other suitable computing device.
At block 201, a first task to be performed by a first logical core of a physical core in a processing resource is determined. For example, computing device 101 determines the task allocated to logical core 110 within physical core 108 for execution.
In some embodiments, the computing device 101 may obtain the task ready queue corresponding to the first logical core, the task ready queue being an ordered queue based on task priority. The computing device 101 then retrieves from the task ready queue the first task to be performed by the first logical core. In this way, the task to be processed by the logical core can be obtained quickly and accurately, reducing the time needed to acquire high-priority tasks and improving processing efficiency.
In some embodiments, when retrieving the first task for execution on the first logical core, the computing device 101 picks a ready task from the head of the task ready queue according to priority. Since the task ready queue is an ordered queue, the head of the queue stores the ready task with the highest priority. The computing device 101 then also determines whether the first logical core is currently executing a task. If the first logical core is not executing a task, the ready task is executed by the first logical core. If the first logical core is executing a current task, the priority of the ready task is compared with that of the current task. If the priority of the ready task is higher, the ready task is determined as the first task, to replace execution of the current task. If the priority of the ready task is lower, execution of the current task continues and the lower-priority ready task is not executed. If the priorities are equal, scheduling proceeds by CPU time-slice round-robin scheduling or completely fair scheduling. The above process is typically performed after a new task has been assigned to the task ready queue of the first logical core and a round of ordering has been performed. In this way, whether the task on the logical core needs to be replaced can be determined quickly, increasing the probability that high-priority tasks are processed promptly. In some embodiments, if a logical core has not been assigned a new task, the task to run is fetched from the head of the task ready queue when the logical core finishes the CPU time slice of the current task. The above examples are merely for describing the present disclosure and are not intended as specific limitations thereof.
In some embodiments, upon receiving a new task, computing device 101 obtains the task assigned to the first logical core and its corresponding priority. The computing device then adds the assigned task to the task ready queue based on that priority. In one example, upon receiving the tasks assigned to logical core 110, the computing device 101 reorders the tasks in the task ready queue from high to low priority. In another example, the newly assigned task is inserted into the queue according to its priority. If two tasks have the same priority, their order is determined by the commonly used time-slice ordering. In this way, the position of a newly assigned task in the priority queue can be determined quickly, so that high-priority tasks are processed in time and their processing efficiency is improved.
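The following self-contained C sketch illustrates the behavior described above; it is illustrative only, not the application's implementation, and all names and the two-level priority scheme are assumptions. Tasks are inserted into a per-logical-core ready queue in priority order (equal priorities keep arrival order, standing in for time-slice ordering), and a ready task replaces the current task only when its priority is strictly higher.

/* Minimal sketch of a priority-ordered task ready queue (assumed names). */
#include <stdio.h>
#include <stdlib.h>

enum prio { PRIO_LOW = 0, PRIO_HIGH = 1 };   /* second/first predetermined priority */

struct task {
    const char *name;
    enum prio prio;
    struct task *next;
};

struct ready_queue {
    struct task *head;                        /* highest priority at the head */
};

/* Insert by priority; equal priorities keep FIFO (time-slice) order. */
static void enqueue(struct ready_queue *q, struct task *t)
{
    struct task **p = &q->head;
    while (*p && (*p)->prio >= t->prio)
        p = &(*p)->next;
    t->next = *p;
    *p = t;
}

/* Pick the queue head; replace `current` only if strictly higher priority. */
static struct task *pick_next(struct ready_queue *q, struct task *current)
{
    struct task *ready = q->head;
    if (!ready)
        return current;
    if (current && ready->prio <= current->prio)
        return current;                       /* keep running the current task */
    q->head = ready->next;                    /* ready task becomes the first task */
    if (current)
        enqueue(q, current);                  /* preempted task goes back in order */
    return ready;
}

int main(void)
{
    struct ready_queue q = { 0 };
    struct task low  = { "low-prio batch",  PRIO_LOW,  NULL };
    struct task high = { "high-prio serve", PRIO_HIGH, NULL };

    enqueue(&q, &low);
    struct task *current = pick_next(&q, NULL);   /* low task starts running */
    enqueue(&q, &high);                           /* high-priority task arrives */
    current = pick_next(&q, current);             /* high task preempts */
    printf("running: %s\n", current->name);       /* -> high-prio serve */
    return 0;
}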
At block 202, it is determined whether the first task is of a predetermined priority. Resources within the same physical core are managed differently depending on task priority, so the computing device 101 determines a task's priority when determining the task to be processed by the logical core.
In some embodiments, the predetermined priority is a first predetermined priority, i.e., high priority. If it is determined that the first task is not of the predetermined priority, i.e., not of the first (high) predetermined priority, the computing device 101 determines that the first task is of a second predetermined priority lower than the first, i.e., low priority, and then causes the first logical core to execute the first task. In this case, the tasks of the other logical cores are not adjusted. In this way, the first task can be executed promptly, improving processing efficiency.
If it is determined at block 202 that the first task is of the predetermined priority, at block 203 the computing device 101 may further determine whether a second logical core of the physical core is executing a second task of the predetermined priority. If it is determined at block 203 that the second logical core is executing such a second task, the second logical core is also processing a high-priority task, so the task executed by the second logical core need not be adjusted.
In some embodiments, if it is determined that the first task is of the predetermined priority, i.e., the first task is a high-priority task, the computing device 101 determines whether the second logical core is executing a second task. If no second task is being executed, the second logical core may directly be caused to execute a task consisting of null instructions, which may also be referred to as a dedicated task for scheduling control. If the second logical core is executing a second task, it is determined whether the priority of the second task is the predetermined priority; if so, no further operation is performed. In this way, the efficiency of determining the priority level of the second task can be improved.
If it is determined that the second logical core is not executing a second task of the predetermined priority, at block 204 a dedicated task including a null instruction is assigned to the second logical core. At this point, the null-instruction task is pulled up on the second logical core, so that the high-priority task running on the first logical core can occupy more intra-core resources for execution.
In this way, the execution of high-priority tasks in the processor can be accelerated: preemption of lower-priority tasks by high-priority tasks on a single physical core is achieved, interference between tasks of different priorities on the logical cores is eliminated, the execution of high-priority tasks within a single physical core is accelerated, and the processing efficiency of high-priority tasks is improved.
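A compact C sketch of the decision in blocks 201-204 follows, under illustrative assumptions: two logical cores per physical core, two priority levels, and hypothetical helper names. The dedicated task is modelled as a stub standing in for a loop of null instructions.

/* Sketch of the per-physical-core decision in method 200 (assumed names). */
#include <stdbool.h>
#include <stdio.h>

enum prio { PRIO_LOW, PRIO_HIGH };

struct logical_core {
    int id;
    bool busy;
    enum prio running_prio;
};

struct physical_core {
    struct logical_core lc[2];                /* first and second logical cores */
};

/* Hypothetical stand-in for pulling up the null-instruction dedicated task. */
static void run_dedicated_null_task(struct logical_core *lc)
{
    printf("logical core %d: dedicated null-instruction task pulled up\n", lc->id);
    /* conceptually a loop of nops at a priority above low and below high */
}

/* Blocks 201-204: the first logical core is about to execute `first_prio`. */
static void schedule_first_task(struct physical_core *pc, enum prio first_prio)
{
    struct logical_core *first = &pc->lc[0], *second = &pc->lc[1];

    first->busy = true;
    first->running_prio = first_prio;
    if (first_prio != PRIO_HIGH)              /* block 202: low priority, */
        return;                               /* just execute, no adjustment */
    /* block 203: is the sibling already running a high-priority task? */
    if (!(second->busy && second->running_prio == PRIO_HIGH))
        run_dedicated_null_task(second);      /* block 204 */
}

int main(void)
{
    struct physical_core pc = { .lc = { { .id = 0 }, { .id = 1 } } };
    pc.lc[1].busy = true;                     /* sibling runs a low-prio task */
    pc.lc[1].running_prio = PRIO_LOW;
    schedule_first_task(&pc, PRIO_HIGH);      /* null task replaces it */
    return 0;
}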
A schematic flow diagram for processing tasks according to some embodiments of the present application was described above in connection with FIG. 2. An example process for managing inter-core resources among multiple physical cores, to further expedite execution of high-priority tasks, is described below. The processing resource may include multiple physical cores, and after a high-priority task is run on a first physical core, low-priority applications on one or more other physical cores may also need to be adjusted based on the performance of the high-priority task. In this process, computing device 101 first obtains a baseline value of the performance parameter associated with the first task, and then adjusts, based on the baseline value, the shared resources allocable to a third task on a second physical core, the third task having a second predetermined priority lower than the first predetermined priority. In this way, more inter-core shared resources can be allocated to the high-priority task, accelerating its execution and improving its processing efficiency. This process is further described below in conjunction with FIG. 3.
FIG. 3 depicts a system diagram for controlling inter-core shared resources according to some embodiments of the present application. The system 300, which may be the on-chip shared resource manager 115 of FIG. 1, is used to control the allocation of inter-core shared resources. The system 300 includes a data collector 301, a resource sensitivity classifier 302, and a resource controller 303.
The data collector 301 is configured to collect performance metrics of the high-priority task's microarchitecture, including but not limited to basic metrics such as the number of instructions executed per unit time, instruction cycles, number of cache misses, number of cache accesses, memory bandwidth, and the memory-bound share at the back end of the pipeline. From these metrics, further composite metrics can be calculated, such as cache misses per kilo-instructions (Cache Misses Per Kilo Instructions, MPKI), cache accesses per kilo-instructions, e.g., last-level cache accesses per kilo-instructions (LLC Accesses Per Kilo Instructions, APKI), the cache miss rate (Cache Miss Rate, CMR), and instructions per cycle (Instructions Per Cycle, IPC). The data collector also acquires the memory bandwidth MBW allocated to the high-priority task. For example, when a high-priority task executes on a first physical core of the plurality of physical cores, the data collector 301 collects the performance indexes corresponding to that high-priority task.
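As a small illustration, the composite metrics can be derived from raw counters as follows; the counter sources and struct fields are assumptions, and a real collector such as data collector 301 would read them from a performance monitoring unit.

/* Composite metrics derived from raw counters (illustrative sketch). */
#include <stdio.h>

struct raw_counters {
    double instructions;    /* instructions retired in the sampling window */
    double cycles;          /* CPU cycles in the window */
    double cache_misses;    /* last-level cache misses */
    double cache_accesses;  /* last-level cache accesses */
};

static double mpki(const struct raw_counters *c)
{ return c->cache_misses * 1000.0 / c->instructions; }     /* misses per kilo-instr. */

static double apki(const struct raw_counters *c)
{ return c->cache_accesses * 1000.0 / c->instructions; }   /* accesses per kilo-instr. */

static double cmr(const struct raw_counters *c)
{ return c->cache_misses / c->cache_accesses; }            /* cache miss rate */

static double ipc(const struct raw_counters *c)
{ return c->instructions / c->cycles; }                    /* instructions per cycle */

int main(void)
{
    struct raw_counters c = { 2e9, 1.6e9, 4e6, 5e7 };      /* example window */
    printf("MPKI=%.2f APKI=%.2f CMR=%.3f IPC=%.2f\n",
           mpki(&c), apki(&c), cmr(&c), ipc(&c));
    return 0;
}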
To further understand the performance of a high-priority task, its baseline performance must be obtained. The baseline performance of a high-priority task refers to the performance index values of the high-priority task when it executes while the low-priority tasks on the other physical cores are suppressed. In one example, the computing device suppresses execution of one or more low-priority tasks on one or more other physical cores, and then determines the baseline value based on that suppression. If the low-priority tasks on the other physical cores are not suppressed, the values obtained are the normal-operation performance index values of the high-priority task. In this way, the baseline performance indicator can be determined quickly and accurately.
For example, the high-priority task runs on the first physical core 108 in FIG. 1; execution of the low-priority third task on the second physical core 109 is suppressed, and the baseline performance of the high-priority traffic is then obtained. Additionally, execution of the third task on the second physical core 109 may be suppressed multiple times at predetermined time intervals. A plurality of baseline values is then determined based on the multiple suppressions of the third task, and the baseline value for an index may be determined by averaging the values of that index across the multiple suppressions or by other suitable processing.
Low-priority tasks on the other cores are periodically suppressed, as indicated by block 304 in FIG. 3. The suppression process includes two modes. One is to limit the upper bound of the shared resources allocated to the low-priority tasks, as indicated at block 305; in this mode, the trigger frequency can be set on the order of hundreds of milliseconds. The other is to pause execution of the low-priority task, as indicated at block 306; this mode accurately captures the baseline performance of the high-priority task free of low-priority interference, but is unfriendly to the low-priority task, so the trigger frequency can be set on the order of seconds.
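The following sketch illustrates block 304 under stated assumptions: the suppression and sampling helpers are hypothetical stubs (real control would go through, e.g., RDT/MPAM or the scheduler), the low-priority task is either capped (block 305) or paused (block 306) at each trigger, and the samples taken under suppression are averaged into a baseline.

/* Periodic-suppression baseline measurement (illustrative stubs only). */
#include <stdio.h>

enum suppress_mode { CAP_SHARED_RESOURCES, PAUSE_TASK };

/* Hypothetical stubs standing in for real resource/scheduler control. */
static void suppress_low_prio(enum suppress_mode m)
{ printf("suppress: %s\n", m == CAP_SHARED_RESOURCES ? "cap LLC/MBW" : "pause task"); }
static void release_low_prio(void) { printf("release suppression\n"); }

/* Stub sampler: would read the high-priority task's metric while suppressed. */
static double sample_high_prio_metric(int i)
{ static const double fake[] = { 1.91, 1.88, 1.93 }; return fake[i % 3]; }

/* Suppress `rounds` times at a fixed interval and average the samples. */
static double measure_baseline(enum suppress_mode m, int rounds)
{
    double sum = 0.0;
    for (int i = 0; i < rounds; i++) {
        suppress_low_prio(m);               /* block 305 or 306 */
        sum += sample_high_prio_metric(i);  /* baseline sample under suppression */
        release_low_prio();
        /* a real loop would sleep here: ~100 ms between cap triggers,
           ~1 s between pause triggers, per the frequencies above */
    }
    return sum / rounds;                    /* averaging damps oscillation */
}

int main(void)
{
    printf("baseline IPC ~ %.2f\n", measure_baseline(CAP_SHARED_RESOURCES, 3));
    return 0;
}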
The system 300 also includes a resource sensitivity classifier 302, which determines whether a task is sensitive or insensitive to a shared resource based on the obtained baseline value of the performance parameter, e.g., whether the high-priority task running on first physical core 108 in FIG. 1 is sensitive to the shared resources. In one example, the computing device 101 determines the sensitivity of the first task to the shared resources based on the baseline value of the performance parameter, as shown in block 307. The resource controller 303 then adjusts the shared resources allocable to low-priority tasks on the other cores based on that sensitivity. In this way, whether the high-priority task is sensitive to a resource can be determined accurately, ensuring the processing efficiency of the high-priority task while preserving execution of the low-priority tasks. The process of determining resource sensitivity is further described below in conjunction with FIG. 4.
After the determination at block 307, if it is determined that the first task is insensitive to the shared resources, then at block 308 the shared resources for low-priority tasks on the other cores are statically allocated, e.g., the upper bound of shared resources for those low-priority tasks is increased. Since the first task is insensitive to the shared resources, more shared resources can be allocated to the low-priority tasks; the amount of resources allocated to low-priority tasks on the other cores is then increased according to a predetermined policy. If it is determined that the first task is sensitive to the shared resources, then at block 309 the upper bound of shared resources for low-priority tasks on the other cores is dynamically allocated. Dynamic allocation requires configuring the shared resources using both the baseline performance and the normal performance of the high-priority first task. The shared resources include at least one of the last level cache LLC and the memory bandwidth; in this way, it can be determined accurately which shared resources to adjust. The process of dynamically configuring shared resources will be described in connection with FIG. 5.
A system diagram for controlling inter-core shared resources was described above in connection with FIG. 3. The process of determining resource sensitivity is further described below in conjunction with FIG. 4, which illustrates a schematic flow diagram of a process for determining resource sensitivity according to some embodiments of the present application.
At block 401, the computing device obtains performance parameters for the high-priority first task. The performance parameters include at least one of: cache accesses per kilo-instructions (APKI), cache miss rate (CMR), and memory bandwidth (MBW).
At block 402, the APKI and CMR parameters are evaluated. If the cache accesses per kilo-instructions APKI are less than the threshold cache access amount A, or the cache miss rate CMR is greater than the threshold cache miss rate B, then at block 404 it is determined that the high-priority first task is insensitive to cache resources such as the LLC. If the APKI is greater than or equal to the threshold A and the CMR is less than or equal to the threshold B, then at block 405 it is determined that the high-priority first task is sensitive to the cache resource LLC.
At block 403, the memory bandwidth MBW is compared to a threshold C. If the MBW is less than the threshold C, then at block 406 it is determined that the high-priority first task is insensitive to memory bandwidth; if the MBW is greater than or equal to the threshold C, then at block 407 it is determined that the task is sensitive to memory bandwidth. Alternatively or additionally, a verdict that the task is insensitive to a class of resources takes effect only when the condition holds several times in succession, preventing the ping-pong phenomenon, i.e., resource allocation changing back and forth because successive detections yield different results. In this way, it can be determined quickly which resources the high-priority task is sensitive to, providing accurate information for resource configuration and improving the efficiency and accuracy of resource allocation.
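The FIG. 4 flow can be transcribed almost directly into code. In the C sketch below, the thresholds A, B, and C, the sample values, and the consecutive-confirmation count are configuration assumptions, not values from the application.

/* Resource sensitivity classification per FIG. 4 (assumed thresholds). */
#include <stdbool.h>
#include <stdio.h>

struct sensitivity { bool cache_sensitive; bool mbw_sensitive; };

static struct sensitivity classify(double apki, double cmr, double mbw,
                                   double A /* threshold cache accesses */,
                                   double B /* threshold cache miss rate */,
                                   double C /* threshold bandwidth */)
{
    struct sensitivity s;
    /* blocks 402/404/405: cache (LLC) sensitivity */
    s.cache_sensitive = (apki >= A && cmr <= B);
    /* blocks 403/406/407: memory bandwidth sensitivity */
    s.mbw_sensitive = (mbw >= C);
    return s;
}

int main(void)
{
    /* Confirm an "insensitive" verdict only after 3 identical consecutive
       results, matching the ping-pong avoidance described above. */
    int insensitive_streak = 0;
    for (int i = 0; i < 3; i++) {
        struct sensitivity s = classify(5.0, 0.04, 3.0,   /* sampled APKI/CMR/MBW */
                                        10.0, 0.20, 8.0); /* thresholds A/B/C */
        insensitive_streak = s.cache_sensitive ? 0 : insensitive_streak + 1;
        printf("cache %s, mbw %s\n",
               s.cache_sensitive ? "sensitive" : "insensitive",
               s.mbw_sensitive ? "sensitive" : "insensitive");
    }
    if (insensitive_streak >= 3)
        printf("verdict confirmed: treat task as cache-insensitive\n");
    return 0;
}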
A schematic flow diagram of a process for determining resource sensitivity according to some embodiments of the present application was described above in connection with FIG. 4. The computing device obtains the baseline performance of the high-priority first task running on the first core, as well as the actual values of the first task's performance parameters while low-priority tasks are not suppressed. The upper limit of the shared resources is then dynamically adjusted based on the baseline and actual values; in this way, the configuration of the shared resources can be adjusted accurately. In some embodiments, if the difference between the baseline value and the actual value exceeds a first threshold, the computing device reduces the shared resources allocated to the low-priority third task running on the other cores. If the difference is below a second threshold, which is lower than the first threshold, the shared resources allocated to the third task are increased. In this way, appropriate resources can be configured accurately for both high-priority and low-priority tasks, guaranteeing the processing efficiency of the high-priority tasks while preserving the execution of the low-priority tasks and improving resource utilization. The process for dynamically allocating resources is further described below in conjunction with FIG. 5.
FIG. 5 illustrates a schematic flow diagram of a process for dynamically allocating resources according to some embodiments of the present application. An initial state is set when a low-priority task starts running and is allocated certain shared resources, e.g., the LLC is set to two ways and the memory bandwidth to 10%. The above examples are merely for describing the present disclosure and are not intended as specific limitations thereof.
At block 501, the computing device compares the actual values of the collected indexes for the high-priority first task with the baseline performance, e.g., compares the actual value of the performance index APKI or MBW with the baseline value. If the collected index is worse than the baseline performance, e.g., the difference between the two exceeds a threshold E (block 502), the resources allocated to the low-priority task are reduced at block 508, e.g., halved or reduced by a proportion. Alternatively or additionally, block 505 may be included: the comparison is performed multiple times in succession, and operation 508 is performed only if the difference exceeds threshold E a threshold number of times.
If the actual value of the collected index is relatively close to the baseline performance, e.g., the difference between the two values is less than a threshold F, i.e., the performance degradation is less than F (block 503), the resources of the low-priority task are scaled up at block 509. Alternatively or additionally, block 506 may be included: the comparison is performed multiple times in succession, and operation 509 is performed only if the difference is below threshold F a threshold number of times. At block 504, the case where neither condition is satisfied is treated as the remaining case, in which the original configuration is maintained at block 507.
In addition, when dynamically adjusting resource allocation, if the resources of the low-priority task have already been limited to their initial values but the performance degradation of the high-priority task still exceeds a threshold (an unacceptable degree of degradation) for a plurality of consecutive checks, the CPU bandwidth of the low-priority task is then limited (i.e., an upper limit is placed on its CPU resources); if the performance degradation remains severe after this limitation, migration of the low-priority task can be requested from the cluster center.
By the method, proper resources can be accurately configured for the high-priority tasks and the low-priority tasks, the processing efficiency of the high-priority tasks is ensured, the execution of the low-priority tasks is ensured, and the resource utilization rate is improved.
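An illustrative controller for the FIG. 5 loop is sketched below in C. The thresholds E and F, the consecutive-check count, and the halving/increment steps are assumptions, and "ways" abstracts an LLC allocation such as an RDT/MPAM way mask.

/* Hysteresis controller for dynamic shared-resource allocation (sketch). */
#include <stdio.h>

struct ctl {
    double E;            /* degrade threshold: reduce low-prio resources */
    double F;            /* near-baseline threshold: increase them, F < E */
    int    need;         /* consecutive confirmations required */
    int    worse_cnt, close_cnt;
    int    llc_ways;     /* current LLC ways granted to the low-prio task */
};

/* One periodic step: compare actual metric against baseline (block 501). */
static void step(struct ctl *c, double baseline, double actual)
{
    double degradation = baseline - actual;          /* larger = worse */

    if (degradation > c->E) {                        /* blocks 502/505/508 */
        c->close_cnt = 0;
        if (++c->worse_cnt >= c->need && c->llc_ways > 1) {
            c->llc_ways /= 2;                        /* e.g. halve the allocation */
            c->worse_cnt = 0;
        }
    } else if (degradation < c->F) {                 /* blocks 503/506/509 */
        c->worse_cnt = 0;
        if (++c->close_cnt >= c->need) {
            c->llc_ways += 1;                        /* grow low-prio share */
            c->close_cnt = 0;
        }
    } else {                                         /* blocks 504/507 */
        c->worse_cnt = c->close_cnt = 0;             /* keep the configuration */
    }
    printf("degradation=%.2f -> low-prio LLC ways=%d\n", degradation, c->llc_ways);
}

int main(void)
{
    struct ctl c = { .E = 0.30, .F = 0.05, .need = 2, .llc_ways = 2 };
    double baseline = 1.90;                          /* e.g. baseline IPC */
    step(&c, baseline, 1.40);                        /* two bad samples ... */
    step(&c, baseline, 1.45);                        /* ... shrink allocation */
    step(&c, baseline, 1.88);                        /* two near-baseline ... */
    step(&c, baseline, 1.89);                        /* ... grow it back */
    return 0;
}

The two thresholds deliberately differ (F below E) so that allocation changes only on clear evidence, mirroring the anti-ping-pong behavior of blocks 505 and 506.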
A schematic diagram of an example implementation of a computing device of the present application is further described below in conjunction with FIG. 6. The computing device includes a task layer 601, a software layer 605, and a hardware layer 613. The task layer 601 is configured to acquire tasks and classify them into high-priority tasks 602 and low-priority tasks 604. Task priority is indicated by a priority label 603.
The high-priority or low-priority tasks are executed on the hardware layer 613 via the software layer 605. The software layer 605 includes a CPU schedule optimization module 606 for managing resources within one physical core, such as prioritizing high-priority tasks and performing logical core isolation; the CPU schedule optimization module 606 includes a single-core throttle module 608 and a logical core isolation module 609 to achieve these functions. The software layer 605 also includes an on-chip shared resource management module 607 for managing on-chip inter-core shared resources; it includes a data acquisition module 610, a resource sensitivity classifier 611, and a resource controller 612 for implementing throttling of inter-core shared resources.
The hardware layer includes on-chip resources 614 and Resource Director Technology (RDT)/Memory System Resource Partitioning and Monitoring (MPAM) for carrying out task execution. The on-chip resources 614 may be a CPU and also include a performance monitoring unit 616. The performance monitoring unit 616 and RDT/MPAM provide performance parameters to the data acquisition module 610.
Fig. 7 further illustrates a block diagram of an apparatus 700 for processing tasks, according to an embodiment of the present application, the apparatus 700 may include a plurality of modules for performing corresponding steps in the process 200 as discussed in fig. 2. As shown in fig. 7, the apparatus 700 includes a task determination unit 701 configured to determine a first task to be performed by a first logical core of physical cores in a processing resource; a priority determination unit 702 configured to determine whether the first task is a predetermined priority; an execution determining unit 703 configured to determine whether a second logical core of the physical cores executes a second task of a predetermined priority if it is determined that the first task is of the predetermined priority; and an allocation unit 704 configured to allocate a dedicated task including a null instruction to the second logical core if it is determined that the second logical core does not execute the second task of the predetermined priority.
In some embodiments, wherein the task determination unit 701 comprises: a ready queue determining unit configured to acquire a task ready queue for the first logical core, the task ready queue being an ordered queue based on priorities of the tasks; a first task obtaining unit configured to obtain a first task from the task ready queue.
In some embodiments, the first task obtaining unit comprises: a selecting unit configured to select a ready task from a head of the task ready queue according to priority; an execution determining unit configured to determine whether the first logical core is executing a current task; a priority comparison unit configured to compare the priority of the ready task with the priority of the current task if it is determined that the first logical core is executing the current task; and a replacement unit configured to determine the ready task as the first task to replace execution of the current task if it is determined that the priority of the ready task is higher than the priority of the current task.
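As an illustration of the ready-queue and preemption logic these units describe, here is a minimal sketch built on a binary heap. The Task type and the numeric convention that a lower value means a higher priority are assumptions of the example.

```python
import heapq
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    name: str
    priority: int  # lower value = more urgent, e.g. 0 = high priority

class ReadyQueue:
    """Priority-ordered task ready queue (a sketch, not the patented code)."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker preserves FIFO order within a priority

    def add(self, task: Task) -> None:
        heapq.heappush(self._heap, (task.priority, self._seq, task))
        self._seq += 1

    def pick(self, current: Optional[Task]) -> Optional[Task]:
        """Select the head of the queue if it outranks the currently
        running task; otherwise keep the current task running."""
        if not self._heap:
            return current
        head = self._heap[0][2]
        if current is None or head.priority < current.priority:
            heapq.heappop(self._heap)
            if current is not None:
                self.add(current)  # the preempted task re-enters the queue
            return head
        return current
```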
In some embodiments, the apparatus 700 further comprises: a task acquisition unit configured to acquire an allocation task allocated to the first logical core and a corresponding priority for the allocation task; and an adding unit configured to add the allocation task to the task ready queue based on the corresponding priority.
In some embodiments, the predetermined priority is a first predetermined priority, and the apparatus 700 further comprises: a first task execution unit configured to cause the first logical core to execute the first task if it is determined that the first task is of a second predetermined priority, the second predetermined priority being lower than the first predetermined priority.
In some embodiments, the execution determining unit includes: a second task execution determination unit configured to determine whether the second logical core is executing the second task if it is determined that the first task is a predetermined priority; and a second task priority determination unit configured to determine whether the priority of the second task is a predetermined priority if it is determined that the second logic core is executing the second task.
In some embodiments, the physical core is a first physical core, the processing resource further comprises a second physical core, the predetermined priority is a first predetermined priority, and the apparatus 700 further comprises: a baseline value acquisition unit configured to acquire a baseline value of a performance parameter related to the first task; and a shared resource adjustment unit configured to adjust, based on the baseline value, a shared resource allocable to a third task on the second physical core, the third task having a second predetermined priority lower than the first predetermined priority.
In some embodiments, the baseline value acquisition unit comprises: a suppression unit configured to suppress execution of the third task by the second physical core; and a baseline value determination unit configured to determine the baseline value based on the suppression of the third task.
In some embodiments, the suppression unit comprises: a limiting unit configured to limit an upper limit of the shared resources allocated to the third task; or a suspension unit configured to suspend the execution of the third task.
In some embodiments, the suppression unit is configured to suppress execution of the third task by the second physical core a plurality of times at predetermined time intervals, and the baseline value determination unit is configured to determine a plurality of baseline values based on the plurality of suppressions of the third task.
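A rough sketch of this periodic-suppression sampling loop follows. The read_performance, suppress, and resume callbacks are assumed hooks into the data acquisition module and the resource controller; the sample count and timing constants are placeholders.

```python
import time

def measure_baselines(read_performance, suppress, resume,
                      samples: int = 3, window_s: float = 1.0,
                      interval_s: float = 60.0) -> list:
    """Sample baseline values of the high-priority task's performance
    parameter while the low-priority task is suppressed, repeating at a
    fixed interval (hypothetical API)."""
    baselines = []
    for _ in range(samples):
        suppress()                 # cap or pause the third (low-priority) task
        time.sleep(window_s)       # let counters reflect the quiet period
        baselines.append(read_performance())
        resume()                   # restore the low-priority task's resources
        time.sleep(interval_s)     # wait for the next sampling point
    return baselines
```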
In some embodiments, the shared resource adjustment unit comprises: a sensitivity determination unit configured to determine a sensitivity of the first task to the shared resource based on the baseline value of the performance parameter; and an allocable resource adjusting unit configured to adjust the shared resource allocable to the third task based on the sensitivity.
In some embodiments, the performance parameter includes at least one of: a cache access amount per thousand instructions, a cache miss rate, and a memory bandwidth, and the sensitivity determination unit is configured to perform at least one of the following: determining that the first task is insensitive to the cache if the cache access amount per thousand instructions is lower than a threshold cache access amount or the cache miss rate is greater than a threshold cache miss rate; determining that the first task is sensitive to the cache if the cache access amount per thousand instructions is greater than or equal to the threshold cache access amount and the cache miss rate is less than or equal to the threshold cache miss rate; determining that the first task is insensitive to the memory bandwidth if the memory bandwidth is less than a threshold bandwidth; and determining that the first task is sensitive to the memory bandwidth if the memory bandwidth is greater than or equal to the threshold bandwidth.
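This classification rule maps directly onto a small decision function. The sketch below assumes the counters have already been sampled (e.g. from the performance monitoring unit); every threshold value is a placeholder rather than a figure from this application.

```python
def classify_sensitivity(accesses_per_kilo_instr: float, miss_rate: float,
                         mem_bandwidth: float,
                         access_threshold: float = 10.0,
                         miss_threshold: float = 0.5,
                         bandwidth_threshold: float = 1.0) -> dict:
    """Apply the classification rules above: a task is cache sensitive only
    when it accesses the cache often enough AND those accesses mostly hit;
    it is bandwidth sensitive when its memory traffic is high."""
    cache_sensitive = (accesses_per_kilo_instr >= access_threshold
                       and miss_rate <= miss_threshold)
    bandwidth_sensitive = mem_bandwidth >= bandwidth_threshold
    return {"cache": cache_sensitive, "memory_bandwidth": bandwidth_sensitive}
```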
In some embodiments, the allocable resource adjustment unit comprises: a first increasing unit configured to increase an upper limit of the shared resource for the third task if it is determined that the first task is insensitive to the shared resource; and a dynamic adjustment unit configured to dynamically adjust an upper limit of the shared resource for the third task if it is determined that the first task is sensitive to the shared resource.
In some embodiments, wherein the dynamic adjustment unit comprises: an actual value acquisition unit configured to acquire an actual value of a performance parameter related to the first task; and an upper limit adjustment unit configured to dynamically adjust an upper limit of the shared resource based on the baseline value and the actual value.
In some embodiments, the upper limit adjustment unit includes: a reduction unit configured to reduce the shared resource allocated to the third task if it is determined that the difference between the baseline value and the actual value exceeds the first threshold value; and a second increasing unit configured to increase the shared resource allocated to the third task if it is determined that the difference between the baseline value and the actual value is below a second threshold, wherein the second threshold is lower than the first threshold.
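For illustration, a minimal sketch of this two-threshold adjustment follows; the gap between the two thresholds forms a dead band that prevents the allocation from oscillating. The function name, units, and constants are assumptions for the example.

```python
def adjust_upper_limit(current: int, baseline: float, actual: float,
                       first_threshold: float = 0.10,
                       second_threshold: float = 0.02,
                       step: int = 1, floor: int = 1, ceiling: int = 10) -> int:
    """Two-threshold controller for the low-priority task's share of a
    shared resource (e.g. LLC ways): shrink the share when high-priority
    degradation exceeds the first threshold, grow it when degradation
    falls below the second, and otherwise hold steady."""
    degradation = (baseline - actual) / baseline
    if degradation > first_threshold:
        return max(floor, current - step)
    if degradation < second_threshold:
        return min(ceiling, current + step)
    return current
```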
In some embodiments, wherein the shared resource comprises at least one of a last level cache LLC and a memory bandwidth.
Fig. 8 shows a schematic block diagram of an example device 800 that may be used to implement embodiments of the present disclosure. For example, computing devices 101 and 601 according to embodiments of the present application may be implemented by example device 800. As shown, the device 800 includes a Central Processing Unit (CPU) 801 that can perform various suitable actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 802 or loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The various processes described above, such as processes 200, 400, and 500, may be performed by the processing unit 801. For example, in some embodiments, processes 200, 400, and 500 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the CPU 801, one or more of the acts of processes 200, 400, and 500 described above may be performed.
The present application may be a method, apparatus, system, chip and/or computer program product. The chip may include a processing unit and a communication interface, and the processing unit may process program instructions received from the communication interface. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing the various aspects of the present application.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object oriented programming languages such as Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present application are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGA), or programmable logic arrays (PLA), with state information of computer readable program instructions, the electronic circuitry being capable of executing the computer readable program instructions.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments of the present application have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

1. A method of processing tasks, the method comprising:
determining a first task to be performed by a first logical core of the physical cores in the processing resource;
determining whether the first task is a predetermined priority;
if it is determined that the first task is the predetermined priority, determining whether a second logical core of the physical cores performs a second task of the predetermined priority; and
if it is determined that the second logical core does not execute the second task of the predetermined priority, allocating a dedicated task comprising a null instruction to the second logical core.
2. The method of claim 1, wherein determining the first task comprises:
acquiring a task ready queue for the first logical core, wherein the task ready queue is an ordered queue based on the priority of tasks; and
obtaining the first task from the task ready queue.
3. The method of claim 2, wherein obtaining the first task comprises:
selecting a ready task from a head of the task ready queue according to priority;
determining whether the first logical core is executing a current task;
if it is determined that the first logical core is executing the current task, comparing the priority of the ready task with the priority of the current task; and
if it is determined that the priority of the ready task is higher than the priority of the current task, determining the ready task as the first task to replace execution of the current task.
4. The method of claim 2, further comprising:
acquiring an allocation task allocated to the first logical core and a corresponding priority for the allocation task; and
adding the allocation task to the task ready queue based on the corresponding priority.
5. The method of claim 1, wherein the predetermined priority is a first predetermined priority, the method further comprising:
if it is determined that the first task is a second predetermined priority, causing the first logical core to execute the first task, the second predetermined priority being lower than the first predetermined priority.
6. The method of claim 1, wherein determining whether a second logical core of the physical cores performs the second task of the predetermined priority comprises:
if it is determined that the first task is the predetermined priority, determining whether the second logical core is executing the second task; and
if it is determined that the second logical core is executing the second task, determining whether the priority of the second task is the predetermined priority.
7. The method of claim 1, wherein the physical core is a first physical core, the processing resource further comprises a second physical core, the predetermined priority is a first predetermined priority, the method further comprising:
obtaining a baseline value of a performance parameter related to the first task; and
adjusting, based on the baseline value, a shared resource allocable to a third task on the second physical core, the third task having a second predetermined priority lower than the first predetermined priority.
8. The method of claim 7, wherein obtaining the baseline value comprises:
suppressing execution of the third task by the second physical core; and
determining the baseline value based on the suppression of the third task.
9. The method of claim 8, wherein suppressing execution of the third task comprises:
limiting an upper limit of the shared resources allocated to the third task; or
suspending the execution of the third task.
10. The method of claim 8, wherein:
suppressing execution of the third task comprises: suppressing, by the second physical core, execution of the third task a plurality of times at predetermined time intervals; and
determining the baseline value comprises: determining a plurality of baseline values based on the plurality of suppressions of the third task.
11. The method of claim 7, wherein adjusting shared resources allocable to a third task on a second physical core comprises:
determining a sensitivity of the first task to a shared resource based on the baseline value of the performance parameter; and
based on the sensitivity, a shared resource that can be allocated to the third task is adjusted.
12. The method of claim 11, wherein the performance parameter comprises at least one of: a cache access amount per thousand instructions, a cache miss rate, and a memory bandwidth, and wherein determining the sensitivity comprises at least one of:
if it is determined that the cache access amount per thousand instructions is lower than a threshold cache access amount or the cache miss rate is greater than a threshold cache miss rate, determining that the first task is insensitive to the cache;
if it is determined that the cache access amount per thousand instructions is greater than or equal to the threshold cache access amount and the cache miss rate is less than or equal to the threshold cache miss rate, determining that the first task is sensitive to the cache;
if it is determined that the memory bandwidth is less than a threshold bandwidth, determining that the first task is insensitive to the memory bandwidth; and
if it is determined that the memory bandwidth is greater than or equal to the threshold bandwidth, determining that the first task is sensitive to the memory bandwidth.
13. The method of claim 11, wherein adjusting the shared resource comprises:
if it is determined that the first task is insensitive to the shared resource, increasing an upper limit of the shared resource for the third task; and
if it is determined that the first task is sensitive to the shared resource, dynamically adjusting the upper limit of the shared resource for the third task.
14. The method of claim 13, wherein dynamically adjusting the upper limit of the shared resource comprises:
acquiring an actual value of the performance parameter related to the first task; and
dynamically adjusting the upper limit of the shared resource based on the baseline value and the actual value.
15. The method of claim 14, wherein dynamically adjusting the upper limit of the shared resource based on the baseline value and the actual value comprises:
reducing the shared resource allocated to the third task if it is determined that the difference between the baseline value and the actual value exceeds a first threshold; and
if it is determined that the difference between the baseline value and the actual value is below a second threshold, increasing the shared resource allocated to the third task, wherein the second threshold is lower than the first threshold.
16. The method of claim 7, wherein the shared resource comprises at least one of a last level cache LLC and a memory bandwidth.
17. An apparatus for processing tasks, the apparatus comprising:
a task determination unit configured to determine a first task to be performed by a first logical core of the physical cores in the processing resource;
a priority determining unit configured to determine whether the first task is a predetermined priority;
an execution determining unit configured to determine whether a second logical core of the physical cores executes a second task of the predetermined priority if it is determined that the first task is the predetermined priority; and
an allocation unit configured to allocate a dedicated task including a null instruction to the second logical core if it is determined that the second logical core does not execute the second task of the predetermined priority.
18. An electronic device, comprising:
at least one computing unit;
at least one memory coupled to the at least one computing unit and storing instructions for execution by the at least one computing unit, wherein the instructions, when executed by the at least one computing unit, cause the electronic device to perform the method of any one of claims 1-16.
19. A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method according to any of claims 1-16.
20. A computer program product comprising computer executable instructions which when executed by a processor implement the method according to any of claims 1-16.