WO2023232127A1 - Task scheduling method, apparatus and system, and related device - Google Patents


Info

Publication number
WO2023232127A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
cpu
target cpu
target
scheduling
Application number
PCT/CN2023/097910
Other languages
English (en)
French (fr)
Inventor
吉文克
王俊捷
冯犇
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority claimed from CN202210901085.5A (external priority, patent CN117215732A)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023232127A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt

Definitions

  • the present application relates to the field of data processing technology, and in particular, to a task scheduling method, device, system and related equipment.
  • multi-processor core architecture is widely used in computing devices such as terminals and servers.
  • a cost estimation model is usually used to estimate in advance the cost incurred by each processor core in executing a task, including the input/output (IO) cost (i.e., the cost incurred by reading and writing data) and the CPU cost (i.e., the cost incurred by the CPU performing operations); the task is then scheduled to the processor core with the smallest cost, so that the multi-processor cores achieve high performance.
  • IO: input/output
  • CPU cost: the cost incurred by the CPU performing operations
  • a task scheduling method, a task scheduling device, a task scheduling system, a computing device, a chip, a computer readable storage medium and a computer program product are provided to improve the rationality of task scheduling and thereby improve task processing efficiency.
  • embodiments of the present application provide a task scheduling method, which is applied to a heterogeneous hardware platform including multiple CPUs.
  • the multiple CPUs can adopt a large/small-core architecture, that is, some of the CPUs include large cores and the others include small cores. During task scheduling, a task to be processed is obtained, and the comprehensive cost incurred by a target CPU among the multiple CPUs in processing the task is predicted.
  • the comprehensive cost is obtained from the IO cost, the CPU cost, and an additional cost incurred by the target CPU in processing the task.
  • the additional cost is determined according to the demand characteristics of the task or the running status of the target CPU; when the comprehensive cost incurred by the target CPU in processing the task satisfies a scheduling condition, the task is scheduled to the target CPU.
  • in addition to the IO cost and the CPU cost, the additional cost of the CPU processing the task is evaluated based on the task's requirements or the CPU's running status, so that the additional cost can be used to correct deviations in task scheduling and achieve optimal scheduling. This prevents CPUs in the heterogeneous hardware platform from being assigned too many tasks, which would lower the processing efficiency of some tasks, and thereby makes task scheduling more rational.
  • when the comprehensive cost incurred by the target CPU does not satisfy the scheduling condition, the task can be scheduled to another CPU in the heterogeneous hardware platform, for example, a CPU whose comprehensive cost meets the scheduling condition.
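A minimal Python sketch of the comprehensive cost and the fallback just described. It assumes, for illustration only, that the three components are simply added (the text says only that the comprehensive cost is obtained from them) and that the scheduling condition is "smallest comprehensive cost", which is the condition the text states for the processor-core variant. `CostEstimate` and `pick_cpu` are hypothetical names, not part of the patent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CostEstimate:
    """Predicted costs for one CPU processing one task (hypothetical structure)."""
    cpu_id: str
    io_cost: float
    cpu_cost: float
    additional_cost: float

    @property
    def comprehensive(self) -> float:
        # Assumption: the comprehensive cost is the plain sum of the three parts.
        return self.io_cost + self.cpu_cost + self.additional_cost


def pick_cpu(target_id: str, estimates: list[CostEstimate]) -> str:
    """Keep the task on the target CPU if it meets the assumed condition
    (smallest comprehensive cost); otherwise fall back to the cheapest CPU."""
    by_id = {e.cpu_id: e for e in estimates}
    target = by_id[target_id]  # KeyError if the target CPU has no estimate
    best = min(estimates, key=lambda e: e.comprehensive)
    if target.comprehensive <= best.comprehensive:
        return target_id
    return best.cpu_id
```

With `CPU1` costing 0.2 + 0.3 + 0.4 and `CPU2` costing 0.3 + 0.3 + 0.1, `pick_cpu("CPU1", ...)` redirects the task to `CPU2`, because the additional cost makes `CPU1` the more expensive choice.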
  • when the target CPU is a CPU including a large core, the priority of scheduling the task to the target CPU may be determined from the comprehensive cost incurred by the target CPU in processing the task, and the task is scheduled to the target CPU when that priority is the first priority; otherwise, the task can be scheduled to other CPUs. In this way, a task whose comprehensive cost corresponds to a higher priority is scheduled to the large core, whose high computing performance is then used to process the task, improving task processing efficiency.
  • when the target CPU is a CPU including a small core, the priority of scheduling the task to the target CPU may likewise be determined from the comprehensive cost incurred by the target CPU in processing the task.
  • when that priority is the second priority, the task is scheduled to the target CPU; the second priority is lower than the first priority.
  • when that priority is not the second priority, the task can be scheduled to other CPUs. In this way, a task whose comprehensive cost corresponds to a lower priority is scheduled to a small core, whose computing power is used to process the task, reducing the current load on the large cores and making task scheduling more rational.
  • when the target CPU is a CPU including a large core and the task is a multi-threaded task, the priority of scheduling the task to the target CPU may be determined from the comprehensive cost incurred by the target CPU in processing the task, and when that priority is the third priority, the task is scheduled to both the target CPU and CPUs including small cores.
  • the third priority is lower than the first priority and higher than the second priority.
  • a CPU including a large core and a CPU including a small core can be used to process the multi-threaded task in parallel, which not only improves the rationality of task scheduling, but also improves task processing efficiency as much as possible.
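The three priority rules above can be sketched as a small dispatch function. The text does not say how a comprehensive cost maps to a priority, so the thresholds `first_max`/`third_max` and the "lower cost means higher priority" direction below are purely hypothetical; only the ordering first > third > second and the large-core/small-core/multi-threaded cases come from the text.

```python
FIRST, THIRD, SECOND = "first", "third", "second"  # ordering: first > third > second


def priority_from_cost(cost: float, first_max: float = 0.3, third_max: float = 0.6) -> str:
    # Hypothetical mapping: the lower the comprehensive cost, the higher the priority.
    if cost <= first_max:
        return FIRST
    if cost <= third_max:
        return THIRD
    return SECOND


def dispatch(target_is_large: bool, cost: float, multithreaded: bool) -> str:
    """Return where the task goes: 'target', 'target+small', or 'other'."""
    priority = priority_from_cost(cost)
    if target_is_large and priority == FIRST:
        return "target"        # use the large core's computing power
    if target_is_large and priority == THIRD and multithreaded:
        return "target+small"  # run the threads on large and small cores in parallel
    if not target_is_large and priority == SECOND:
        return "target"        # offload the task to the small core
    return "other"             # try other CPUs instead
```

Keeping the cost-to-priority mapping in its own function makes it easy to swap in whatever mapping an actual implementation uses without touching the dispatch rules.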
  • specifically, the comprehensive cost incurred by the target CPU among the multiple CPUs in processing the task may be predicted when the available resources of the CPUs including large cores are less than a threshold. Predicting the comprehensive cost when large-core resources are limited improves the rationality of task scheduling and prevents too many tasks from being assigned to the CPUs including large cores, which would hurt the processing efficiency of some tasks.
  • otherwise, when the available resources of the CPUs including large cores are not less than the threshold, the task can be scheduled directly to a CPU including a large core, so that the high computing performance of the large core improves task processing efficiency.
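A minimal sketch of this resource gate, assuming for illustration that large-core availability is expressed as a free-capacity ratio. `schedule_with_resource_gate`, `large_core_free_ratio`, and the default `"CPU1"` (the large-core CPU in Figure 1) are hypothetical; the cost-based scheduler is passed in as a callable.

```python
from typing import Callable


def schedule_with_resource_gate(
    task: str,
    large_core_free_ratio: float,
    threshold: float,
    cost_based_scheduler: Callable[[str], str],
    large_core_cpu: str = "CPU1",
) -> str:
    """Only pay for cost prediction when large-core resources are scarce."""
    if large_core_free_ratio < threshold:
        # Large cores are constrained: use comprehensive-cost scheduling.
        return cost_based_scheduler(task)
    # Enough large-core capacity: schedule directly to the large-core CPU.
    return large_core_cpu
```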
  • the additional cost incurred by the target CPU in processing the task is obtained from any one or more of the following: the number of requests the task makes to the target CPU, the number of times the target CPU is requested per unit time, the operating frequency of the target CPU, and the number of processor cores included in the target CPU.
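The text lists the inputs to the additional cost but not how they are combined. The sketch below is one hypothetical combination, chosen only to show plausible monotonic behavior: more requests raise the cost, while a higher frequency or more cores lower it. The coefficient `k` and the ratio form are assumptions.

```python
def additional_cost(
    task_requests: int,
    requests_per_unit_time: float,
    freq_ghz: float,
    num_cores: int,
    k: float = 0.01,
) -> float:
    """Hypothetical additional cost: request pressure divided by CPU capacity."""
    if freq_ghz <= 0 or num_cores <= 0:
        raise ValueError("frequency and core count must be positive")
    pressure = task_requests + requests_per_unit_time  # demand placed on the target CPU
    capacity = freq_ghz * num_cores                    # rough processing capacity
    return k * pressure / capacity
```

For example, `additional_cost(10, 90, 2.0, 4)` is 0.01 × 100 / 8 = 0.125; doubling the core count halves it.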
  • a configuration interface can also be presented, and in response to user operations on the configuration interface, the task scheduling mode of the heterogeneous hardware platform can be configured.
  • the task scheduling mode includes an online mode and an offline mode. In the online mode, tasks are scheduled according to the comprehensive cost incurred by the CPUs in processing them; in the offline mode, tasks are scheduled to one or more fixed CPUs. This allows tasks to be scheduled flexibly and rationally in different time periods or scenarios.
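A minimal sketch of the two scheduling modes. `SchedulingMode` and `choose_cpus` are hypothetical names; the online branch assumes "smallest comprehensive cost" as the scheduling condition, and the value a configuration interface would submit is modeled as the enum's string value.

```python
from __future__ import annotations

from enum import Enum


class SchedulingMode(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


def choose_cpus(mode: SchedulingMode, costs: dict[str, float], fixed_cpus: list[str]) -> list[str]:
    """Pick CPUs for a task according to the configured scheduling mode."""
    if mode is SchedulingMode.OFFLINE:
        # Offline mode: always use the fixed CPU set chosen in advance.
        return list(fixed_cpus)
    # Online mode: use the CPU with the smallest predicted comprehensive cost.
    return [min(costs, key=costs.__getitem__)]
```

A value received from the configuration interface, such as `"offline"`, converts directly with `SchedulingMode("offline")`.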
  • embodiments of the present application provide a task scheduling method, which is applied to a heterogeneous hardware platform including multiple processor cores.
  • the multiple processor cores can adopt a large/small-core architecture, that is, some of the processor cores are large cores and the others are small cores. During task scheduling, a task to be processed is obtained, and the comprehensive cost incurred by a target processor core among the multiple processor cores in processing the task is predicted.
  • the comprehensive cost is obtained from the IO cost, the computing cost, and the additional cost incurred by the target processor core in processing the task.
  • the additional cost is determined according to the demand characteristics of the task or the running status of the target processor core; when the comprehensive cost incurred by the target processor core in processing the task satisfies the scheduling condition, the task is scheduled to the target processor core.
  • in addition to the IO cost and the computing cost, the additional cost of the processor core processing the task is evaluated based on the task's requirements or the processor core's running status, so that the additional cost can be used to correct task scheduling and achieve optimal scheduling. This prevents processor cores in the heterogeneous hardware platform from being assigned too many tasks, which would lower the processing efficiency of some tasks, and thereby makes task scheduling more rational.
  • when performing task scheduling, if the comprehensive cost incurred by the target processor core is the smallest among the comprehensive costs incurred by the multiple processor cores in processing the task, the task is scheduled to the target processor core; otherwise, the task is scheduled to the other processor core with the smallest comprehensive cost.
  • this effectively improves the rationality of task scheduling and prevents some processor cores from being assigned too many tasks, which would lower the processing efficiency of some tasks.
  • embodiments of the present application provide a task scheduling device, which is applied to a heterogeneous hardware platform.
  • the heterogeneous hardware platform includes multiple CPUs, some of which include large cores while the others include small cores. The apparatus includes: an acquisition module, configured to obtain a task to be processed; a prediction module, configured to predict the comprehensive cost incurred by a target CPU among the multiple CPUs in processing the task; and a scheduling module, configured to schedule the task to the target CPU when that comprehensive cost satisfies the scheduling condition.
  • the comprehensive cost is obtained from the input/output (IO) cost, the CPU cost, and the additional cost incurred by the target CPU in processing the task, and the additional cost is determined according to the demand characteristics of the task or the running status of the target CPU.
  • when the target CPU includes a large core, the scheduling module is specifically configured to: determine, from the comprehensive cost incurred by the target CPU in processing the task, the priority of scheduling the task to the target CPU; and schedule the task to the target CPU when that priority is the first priority.
  • when the target CPU includes a small core, the scheduling module is specifically configured to: determine, from the comprehensive cost incurred by the target CPU in processing the task, the priority of scheduling the task to the target CPU; and schedule the task to the target CPU when that priority is the second priority, the second priority being lower than the first priority.
  • when the target CPU includes a large core and the task is a multi-threaded task, the scheduling module is specifically configured to: determine, from the comprehensive cost incurred by the target CPU in processing the task, the priority of scheduling the task to the target CPU; and, when that priority is the third priority, schedule the task to both the target CPU and the CPUs including small cores.
  • the third priority is lower than the first priority and higher than the second priority.
  • the prediction module is specifically configured to predict the comprehensive cost incurred by the target CPU among the multiple CPUs in processing the task when the available resources of the CPUs including large cores are less than a threshold.
  • the additional cost is obtained from any one or more of: the number of times the task requests the target CPU, the number of times the target CPU is requested per unit time, the operating frequency of the target CPU, and the number of processor cores included in the target CPU.
  • the device further includes: a presentation module, used to present the configuration interface; a configuration module, used to configure the task scheduling mode of the heterogeneous hardware platform in response to the user's operation on the configuration interface.
  • the task scheduling mode is an online mode or an offline mode.
  • for the technical effects of the third aspect and its embodiments, refer to those of the first aspect and its corresponding embodiments; details are not repeated here.
  • embodiments of the present application also provide a task scheduling device, which is applied to a heterogeneous hardware platform.
  • the heterogeneous hardware platform includes multiple processor cores, and the multiple processor cores include large cores and small cores.
  • the apparatus includes: an acquisition module, configured to acquire a task to be processed; a prediction module, configured to predict the comprehensive cost incurred by a target processor core among the multiple processor cores in processing the task, where the comprehensive cost is obtained from the input/output (IO) cost, the computing cost, and the additional cost incurred by the target processor core in processing the task, and the additional cost is determined according to the demand characteristics of the task or the running status of the target processor core; and a scheduling module, configured to schedule the task to the target processor core when that comprehensive cost satisfies the scheduling condition.
  • the scheduling module is specifically configured to schedule the task to the target processor core when the comprehensive cost generated by the target processor core processing the task is the smallest among the comprehensive costs generated by multiple processor cores processing the task separately.
  • the task scheduling device provided in the fourth aspect corresponds to the task scheduling method provided in the second aspect
  • for the technical effects of the fourth aspect and its embodiments, refer to those of the second aspect and its corresponding embodiments; details are not repeated here.
  • embodiments of the present application provide a computing device, including a processor and a memory. The memory is configured to store instructions, and when the computing device runs, the processor executes the instructions stored in the memory, so that the computing device performs the task scheduling method described in the first aspect or any implementation of the first aspect, or the task scheduling method described in the second aspect or any implementation of the second aspect.
  • the memory can be integrated into the processor or independent of the processor.
  • the computing device may further include a bus, through which the processor is connected to the memory.
  • the memory may include read-only memory and random access memory.
  • embodiments of the present application provide a computing device, including a processor and a memory. The memory is configured to store instructions, and when the computing device runs, the processor executes the instructions stored in the memory, so that the computing device performs the task scheduling method described in the second aspect or any implementation of the second aspect.
  • the memory can be integrated into the processor or independent of the processor.
  • the computing device may further include a bus, through which the processor is connected to the memory.
  • the memory may include read-only memory and random access memory.
  • embodiments of the present application provide a chip, including a power supply circuit and a processing circuit.
  • the power supply circuit is used to supply power to the processing circuit.
  • the processing circuit is configured to obtain a task to be processed and predict the comprehensive cost incurred by a target CPU among the multiple CPUs of a heterogeneous hardware platform in processing the task, where the comprehensive cost is obtained from the input/output (IO) cost, the CPU cost, and the additional cost incurred by the target CPU in processing the task.
  • the additional cost is determined according to the demand characteristics of the task or the running status of the target CPU; when the comprehensive cost incurred by the target CPU in processing the task satisfies the scheduling condition, the processing circuit schedules the task to the target CPU.
  • embodiments of the present application provide a chip, including a power supply circuit and a processing circuit.
  • the power supply circuit is used to supply power to the processing circuit.
  • the processing circuit is configured to obtain a task to be processed and predict the comprehensive cost incurred by a target processor core among multiple processor cores in processing the task, where the multiple processor cores include large cores and small cores.
  • the comprehensive cost is obtained from the input/output (IO) cost, the computing cost, and the additional cost incurred by the target processor core in processing the task.
  • the additional cost is determined according to the demand characteristics of the task or the running status of the target processor core; when the comprehensive cost incurred by the target processor core in processing the task satisfies the scheduling condition, the task is scheduled to the target processor core.
  • embodiments of the present application further provide a task scheduling system.
  • the task scheduling system includes an upper-layer application configured to generate tasks to be processed, and the task scheduling apparatus described in any implementation of the third aspect or the fourth aspect.
  • embodiments of the present application further provide a computer-readable storage medium in which a program or instructions are stored.
  • when the program or instructions are run on a computer, the computer performs the task scheduling method described in the first aspect or any implementation of the first aspect, or the task scheduling method described in the second aspect or any implementation of the second aspect.
  • embodiments of the present application further provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the task scheduling method described in the first aspect or any implementation of the first aspect, or the task scheduling method described in the second aspect or any implementation of the second aspect.
  • Figure 1 is a schematic structural diagram of an exemplary task scheduling system provided by an embodiment of the present application.
  • Figure 2 is a schematic flowchart of a task scheduling method provided by an embodiment of the present application.
  • Figure 3 is a schematic flowchart of another task scheduling method provided by an embodiment of the present application.
  • Figure 4 is a schematic structural diagram of another task scheduling system provided by an embodiment of the present application.
  • Figure 5 is a schematic system structure diagram of an exemplary distributed database application scenario provided by the embodiment of the present application.
  • Figure 6 is a schematic structural diagram of an exemplary distributed database provided by an embodiment of the present application.
  • Figure 7 is a schematic flowchart of yet another task scheduling method provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a task scheduling device provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of another task scheduling device provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of the hardware structure of a computing device provided by an embodiment of the present application.
  • the task scheduling system 100 may include an upper-layer application 101, a task scheduling device 102, and a heterogeneous hardware platform 103, which can exchange data with one another.
  • the upper-layer application 101 includes one or more applications 1011, which are used to generate tasks required to be processed by the heterogeneous hardware platform 103, such as data reading and writing tasks, data calculation tasks, etc.
  • the upper-layer application 101 also includes a main thread 1012 and a thread pool controller 1013.
  • the main thread 1012 can issue an instruction to create a thread pool to the thread pool controller 1013, so that the thread pool controller 1013 creates a thread pool 1014 based on this instruction and further creates multiple threads in the thread pool 1014.
  • the main thread 1012 can receive the task issued by the application 1011, and use the thread pool controller 1013 to allocate one or more threads from the thread pool 1014 to the task.
  • when one thread is assigned to the task, the task is called a single-threaded task; when multiple threads are assigned to the task, it is called a multi-threaded task.
  • the thread pool controller 1013 can provide the single-threaded task or the multi-threaded task to the task scheduling device 102 for task scheduling, that is, the task is scheduled to be processed on one or more processor cores in the heterogeneous hardware platform 103.
  • the task scheduling device 102 uses the cost estimation model to predict the cost incurred by the processor cores in the heterogeneous hardware platform 103 for executing the task.
  • the cost may be, for example, the sum of the IO cost and the CPU cost incurred by the processor core in executing the task. The task scheduling device 102 then uses the cost incurred by each processor core in processing the task to schedule the task to the processor core with the lowest cost for processing.
  • the heterogeneous hardware platform 103 includes multiple CPUs that adopt a large/small-core architecture: some CPUs in the heterogeneous hardware platform 103 include large cores (processor cores with stronger performance), and the remaining CPUs include small cores (processor cores with weaker performance). As shown in Figure 1, CPU1 in the heterogeneous hardware platform 103 includes four large cores, and CPU2 includes four small cores.
  • when the task scheduling device 102 uses the cost estimation model to predict task costs, one or a few processor cores may always have the smallest cost for executing the task, so the task scheduling device 102 may frequently schedule tasks to those processor cores.
  • this can easily overload those processor cores and cause task blocking, so that some services on them are queued for processing and task processing efficiency is low.
  • meanwhile, the remaining processor cores in the heterogeneous hardware platform 103 may stay idle for long periods because they are not assigned tasks, wasting resources. As a result, the task scheduling performed by the task scheduling device 102 is prone to being unreasonable.
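The imbalance described above can be reproduced with a toy greedy simulation. The per-core IO + CPU costs are hypothetical integer units, and the additional cost is modeled, as an assumption standing in for the running-status term, as a weight times the number of tasks already placed on a core. With a zero weight every task lands on the statically cheapest core; with a nonzero weight the tasks spread out.

```python
def simulate(base_costs: list[int], num_tasks: int, load_weight: int) -> list[int]:
    """Greedy min-cost scheduling; returns how many tasks each core receives."""
    load = [0] * len(base_costs)
    for _ in range(num_tasks):
        # Cost = static IO + CPU estimate plus a load-dependent additional cost.
        # min() returns the first (lowest-index) core on ties.
        core = min(range(len(base_costs)),
                   key=lambda i: base_costs[i] + load_weight * load[i])
        load[core] += 1
    return load


base = [30, 35, 40, 45]       # hypothetical IO + CPU cost per core, arbitrary units
print(simulate(base, 8, 0))   # [8, 0, 0, 0]: every task goes to core 0
print(simulate(base, 8, 10))  # [3, 2, 2, 1]: the additional cost spreads the load
```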
  • in this embodiment, the task scheduling device 102 predicts the cost incurred by each of multiple CPUs in the heterogeneous hardware platform 103 in processing the task.
  • the cost is calculated from the IO cost, the CPU cost, and the additional cost incurred by the CPU in processing the task.
  • the additional cost evaluates the extra overhead incurred by the CPU in executing the task and is determined by the demand characteristics of the task (such as the number of instructions the task requires the CPU to execute, or the size of the memory used to process the task) or by the operating status of the CPU (such as its operating frequency). When the cost incurred by a target CPU among the multiple CPUs in processing the task satisfies the scheduling condition, the task scheduling device 102 schedules the task to the target CPU, which then processes the task.
  • when the task scheduling device 102 performs task scheduling, in addition to the IO cost and the CPU cost, it evaluates the additional cost of the CPU processing the task according to the requirements of the task or the running status of the CPU. The task scheduling device 102 can therefore use the additional cost to correct task scheduling and achieve optimal scheduling, which prevents CPUs in the heterogeneous hardware platform 103 from being assigned too many tasks (and the resulting low processing efficiency of some tasks), thereby making task scheduling more rational.
  • the task scheduling device 102 can be implemented through software, for example, through at least one of a virtual machine, a container, and a computing engine.
  • the task scheduling device 102 can be implemented by a physical device including a processor, where the processor can be a CPU, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a system on chip (SoC), a software-defined infrastructure (SDI) chip, an artificial intelligence (AI) chip, or any combination thereof.
  • the task scheduling device 102 can be implemented through software + hardware.
  • the task scheduling device 102 can collect relevant information of the heterogeneous hardware platform 103 during runtime through hardware, and implement task scheduling through software.
  • the task scheduling device 102 can be deployed independently of the upper-layer application 101 and the heterogeneous hardware platform 103, as shown in Figure 1 .
  • the task scheduling device 102 can be integrated and deployed in the upper-layer application 101, or integrated and deployed in the heterogeneous hardware platform 103, which is not limited in this embodiment.
  • the heterogeneous hardware platform 103 may include more CPUs, and different CPUs may be located in the same computing device or in different computing devices, which is not limited in this embodiment.
  • the task scheduling system 100 shown in FIG. 1 can be applied to database application scenarios.
  • the upper-layer application 101 can generate data processing tasks for data in the database, and the task scheduling device 102 can schedule these tasks to CPUs in the heterogeneous hardware platform 103 for execution, so that the CPUs read the data in the database, perform the corresponding operations on it, and feed the results back to the upper-layer application 101.
  • the task scheduling system 100 can also be applied in other applicable scenarios, which is not limited in this application.
  • Figure 2 is a schematic flowchart of a task scheduling method according to an embodiment of the present application.
  • This method can be applied to the task scheduling system 100 shown in FIG. 1 .
  • this method can also be applied to other applicable task scheduling systems.
  • the following is an example of application to the task scheduling system 100 shown in Figure 1.
  • the method may specifically include:
  • S201 The upper-layer application 101 generates a task to be processed.
  • the upper-layer application 101 includes one or more applications that, during operation, can request the heterogeneous hardware platform 103 to provide corresponding data processing services, such as data reading and writing services or business processing services.
  • the upper-layer application 101 can generate a task to be processed.
  • the task can be a single-threaded task, that is, execute one thread to process the task; or the task can be a multi-threaded task, that is, execute multiple threads to implement the task.
  • multiple threads can be executed concurrently to speed up the processing of the task.
  • the upper-layer application 101 can provide the task to the task scheduling device 102 so that the task scheduling device 102 schedules the task to the CPU in the heterogeneous hardware platform 103 for processing.
  • the task scheduling device 102 predicts the comprehensive cost generated by the target CPU among the multiple CPUs included in the heterogeneous hardware platform 103 when processing the task.
  • the comprehensive cost is obtained from the IO cost, the CPU cost, and the additional cost incurred by the target CPU in processing the task, where the additional cost is determined based on the demand characteristics of the task or the running status of the target CPU.
  • the task scheduling device 102 can predict the comprehensive cost generated by the CPU processing the task.
  • the following description takes as an example the task scheduling device 102 predicting the comprehensive cost incurred by one CPU (hereinafter the target CPU) in processing the task.
  • the comprehensive costs incurred by the remaining CPUs in processing the task are predicted in a similar way and can be understood by reference.
  • the task scheduling device 102 can perform performance analysis on the acquired task to obtain relevant information about it, such as the task type, the number of instructions per clock cycle required by the task, the number of clock cycles, the total number of instructions the task needs to execute, and the number of data pages the task needs to read and write.
  • task types can include compute-intensive (core bound) and memory-intensive (memory bound). Core bound indicates that the task relies more on the CPU than on memory, that is, most of the operations the CPU performs for the task are arithmetic operations; the degree can be characterized by a numerical value (for example, between 0 and 1).
  • the task scheduling device 102 can also cache the relevant information of the task, so that when it later receives the same task, it can read that information directly from the cache instead of performing performance analysis on the task again, which speeds up task processing.
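A minimal sketch of this profile cache. `TaskProfile`, `ProfileCache`, and the idea of identifying "the same task" by a string key (for example, normalized query text) are assumptions for illustration; the profiler is any callable that performs the performance analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical subset of the information produced by performance analysis."""
    memory_bound: float      # M, dependence on memory, in (0, 1)
    data_pages: int          # P, data pages to read/write
    total_instructions: int  # instructions the task must execute


class ProfileCache:
    """Profile each distinct task once and serve repeats from the cache."""

    def __init__(self, profiler: Callable[[str], TaskProfile]) -> None:
        self._profiler = profiler
        self._cache: Dict[str, TaskProfile] = {}
        self.analyses = 0  # how many times performance analysis actually ran

    def get(self, task_key: str) -> TaskProfile:
        if task_key not in self._cache:
            self.analyses += 1
            self._cache[task_key] = self._profiler(task_key)
        return self._cache[task_key]
```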
  • the task scheduling device 102 can predict the IO cost, CPU cost and additional cost generated by the target CPU processing the task based on the relevant information of the task.
  • the IO cost refers to the cost incurred by the CPU to perform data read and write operations when processing the task.
  • the task scheduling device 102 can calculate the IO cost through formula (1).
  • $\text{IO cost} = k_1 \times M \times P \times T \quad (1)$
  • k1 is a normalization coefficient, which is used to limit the calculated IO cost to the value range of 0 to 1, and its value can be a positive number less than 1.
  • M represents memory bound, which is used to indicate the dependence of the task on memory. Its value range can be (0,1).
  • P is the number of data pages that the CPU needs to access when processing the task, a positive integer greater than or equal to 1. Because the amount of data in each data page is usually fixed, the product of the number of data pages and the amount of data in a single data page can be taken as the amount of data the task reads and writes.
  • T is the conversion overhead, which is a positive number greater than 0.
  • the task scheduling device 102 can be pre-configured with conversion overheads corresponding to different storage media, so that after obtaining the task to be processed, the conversion overhead can be determined based on the storage medium where the data that the task needs to access is located.
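Formula (1), together with a pre-configured table of conversion overheads, can be sketched as below. The storage-medium names and overhead values in `CONVERSION_OVERHEAD` are hypothetical placeholders, not values from the source.

```python
# Hypothetical conversion overheads T per storage medium; the values are illustrative.
CONVERSION_OVERHEAD = {"dram": 1.0, "ssd": 4.0, "hdd": 20.0}


def io_cost(k1: float, memory_bound: float, pages: int, medium: str) -> float:
    """Formula (1): IO cost = k1 * M * P * T."""
    if not 0 < k1 < 1:
        raise ValueError("k1 must be a positive number less than 1")
    if not 0 < memory_bound < 1:
        raise ValueError("M must lie in (0, 1)")
    if pages < 1:
        raise ValueError("P must be a positive integer")
    t = CONVERSION_OVERHEAD[medium]  # T depends on where the accessed data lives
    return k1 * memory_bound * pages * t


# Same task (M = 0.5, P = 10), data on SSD versus HDD.
on_ssd = io_cost(0.01, 0.5, 10, "ssd")
on_hdd = io_cost(0.01, 0.5, 10, "hdd")
```

Whether the result actually stays within 0 to 1 depends on choosing k1 to match the largest expected P and T.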
  • The CPU cost refers to the cost incurred by the CPU in performing data operations when processing the task. For example, in database application scenarios, processing operators in query tasks such as data filtering, projection, and predicate pushdown, as well as data addressing and data calculation, all require CPU computing resources. The task scheduling device 102 can calculate the CPU cost, for example, through formula (2).
  • $\text{CPU cost} = k_2 \times C \times I \quad (2)$
  • k2 is a normalization coefficient, which is used to limit the calculated CPU cost to the value range of 0 to 1, and its value can be a positive number less than 1.
  • C represents core bound, which is used to indicate the dependence of the task on the CPU. Its value range can be (0,1). I indicates the total number of instructions that need to be executed for the task, which is a positive integer greater than or equal to 1.
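Formula (2) translates directly into code. In the sketch below the value of k2 is a hypothetical choice that keeps the example results inside 0 to 1.

```python
def cpu_cost(k2: float, core_bound: float, instructions: int) -> float:
    """Formula (2): CPU cost = k2 * C * I."""
    if not 0 < k2 < 1:
        raise ValueError("k2 must be a positive number less than 1")
    if not 0 < core_bound < 1:
        raise ValueError("C must lie in (0, 1)")
    if instructions < 1:
        raise ValueError("I must be a positive integer")
    return k2 * core_bound * instructions


# A computing-intensive task (C = 0.8) versus a memory-intensive one (C = 0.2),
# both executing 500,000 instructions, with a hypothetical k2 of 1e-6.
compute_heavy = cpu_cost(1e-6, 0.8, 500_000)
memory_heavy = cpu_cost(1e-6, 0.2, 500_000)
```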
  • The additional cost evaluates the extra overhead incurred when the CPU executes the task. It helps keep task scheduling from becoming skewed, that is, from assigning too many tasks to one or a few CPUs.
  • the task scheduling device 102 can calculate the additional cost according to the demand characteristics of the task or the running status of the CPU.
  • The demand characteristics of the task can be, for example, the number of times the task requests the CPU (that is, the number of instructions the CPU needs to execute for it). The running status of the CPU can include, for example, the number of instructions executed by the CPU in the past unit time, the working frequency of the CPU, and the number of processor cores in the CPU that can be used to process tasks.
  • k3 is a normalization coefficient, used to limit the calculated additional cost to the value range of 0 to 1, and its value can be a positive number less than 1.
  • Commands′ represents the number of times the task requests the CPU to execute instructions, which is a positive integer greater than or equal to 1.
  • Commands represents the number of instructions executed by the CPU in the past unit time, which can be obtained by periodically sampling data on the heterogeneous hardware platform 103. It is an integer greater than or equal to 0.
  • freq represents the working frequency of the CPU, which can be calculated based on the clock cycle of the CPU. It is a positive integer greater than or equal to 1.
  • Cores represents the number of processor cores in the CPU that can be used to process tasks. It is a positive integer greater than or equal to 1.
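The parameters of the additional cost are defined above, but the formula that combines them (formula (3)) does not appear in this text. The sketch below therefore uses an **assumed** form, not the patent's formula. It encodes only the direction each parameter plausibly pushes the cost: more requested instructions (Commands′) and a busier CPU (Commands) raise it, while a higher frequency (freq) and more usable cores (Cores) lower it. Treating freq as MHz is also an assumption.

```python
def additional_cost(k3: float, commands_req: int, commands_recent: int,
                    freq: int, cores: int) -> float:
    """HYPOTHETICAL stand-in for formula (3), whose exact form is not given here.

    Assumed shape: the cost rises with the instructions the task requests
    (Commands') and with the CPU's recent load (Commands), and falls as the
    working frequency (freq) and the number of usable cores (Cores) grow.
    """
    if commands_req < 1 or commands_recent < 0 or freq < 1 or cores < 1:
        raise ValueError("parameter outside the ranges defined in the text")
    capacity = freq * cores  # rough throughput the CPU can offer
    return k3 * (commands_req + commands_recent) / capacity


# Two CPUs with the same frequency (MHz, an assumption) and core count;
# the busy CPU executed many more instructions in the last sampling window.
idle_cpu = additional_cost(0.5, 2_000, 1_000, 2_000, 8)
busy_cpu = additional_cost(0.5, 2_000, 30_000, 2_000, 8)
```

Because the busy CPU receives the higher additional cost, new tasks drift toward less-loaded CPUs, which is the anti-skew effect the text describes.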
  • The task scheduling device 102 may also adjust the above example formulas, or use other formulas, to calculate the comprehensive cost incurred by the target CPU processing the task.
  • For example, the demand characteristic of the task may also be the size of the memory area the task requires, in which case the task scheduling device 102 calculates the additional cost based on the amount of memory the task needs.
  • Alternatively, the task scheduling device 102 can calculate the additional cost based on any one or more of the following parameters: the number of times the task requests the CPU, the number of times the CPU is requested per unit time, the operating frequency of the CPU, and the number of processor cores included in the CPU.
  • the task scheduling device 102 may perform a weighted summation of the IO cost, the CPU cost, and the additional cost to calculate the comprehensive cost, etc., which is not limited in this embodiment.
  • The above formulas (1) to (4), which the task scheduling device 102 uses to calculate the cost of the target CPU processing a task, can be encapsulated into a cost estimation model. After obtaining a task to be processed, the task scheduling device 102 inputs it into the cost estimation model, which infers the comprehensive cost generated by each CPU in the heterogeneous hardware platform 103 processing the task.
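A cost estimation model of this kind might be packaged as sketched below. Formulas (1) and (2) follow the text. The additional cost uses an assumed form, because formula (3) is not reproduced here. The comprehensive cost is a weighted sum whose default weights of 1 give the plain sum; formula (4) itself is not shown in the text. `Task`, `Cpu`, and `CostEstimationModel` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Task:
    memory_bound: float  # M
    core_bound: float    # C
    pages: int           # P
    instructions: int    # I, also used here as Commands' (requests to the CPU)


@dataclass(frozen=True)
class Cpu:
    conversion_overhead: float  # T of the medium holding the task's data
    recent_instructions: int    # Commands, from periodic sampling
    freq: int                   # working frequency
    cores: int                  # processor cores usable for tasks


class CostEstimationModel:
    """Bundles the per-CPU cost formulas behind a single estimate() call."""

    def __init__(self, k1: float, k2: float, k3: float,
                 weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> None:
        self.k1, self.k2, self.k3 = k1, k2, k3
        self.weights = weights  # (1, 1, 1) gives the plain sum

    def _parts(self, task: Task, cpu: Cpu) -> Tuple[float, float, float]:
        io = self.k1 * task.memory_bound * task.pages * cpu.conversion_overhead
        compute = self.k2 * task.core_bound * task.instructions
        # Hypothetical additional-cost form; formula (3) itself is not reproduced here.
        extra = self.k3 * (task.instructions + cpu.recent_instructions) / (cpu.freq * cpu.cores)
        return io, compute, extra

    def estimate(self, task: Task, cpus: Dict[str, Cpu]) -> Dict[str, float]:
        """Comprehensive cost of every CPU processing `task` (weighted sum)."""
        w_io, w_cpu, w_extra = self.weights
        costs = {}
        for name, cpu in cpus.items():
            io, compute, extra = self._parts(task, cpu)
            costs[name] = w_io * io + w_cpu * compute + w_extra * extra
        return costs


model = CostEstimationModel(k1=0.01, k2=1e-6, k3=1e-3)
task = Task(memory_bound=0.3, core_bound=0.7, pages=8, instructions=200_000)
cpus = {
    "cpu1": Cpu(conversion_overhead=1.0, recent_instructions=0, freq=2_000, cores=8),
    "cpu2": Cpu(conversion_overhead=1.0, recent_instructions=5_000_000, freq=2_000, cores=8),
}
costs = model.estimate(task, cpus)
```

Keeping the weights configurable covers both the plain sum and the weighted-sum variant mentioned in the text without changing the call site.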
  • The task scheduling device 102 can calculate the currently available resources of each CPU in the heterogeneous hardware platform 103 and filter out, from the multiple CPUs of the platform, those whose currently available resources do not satisfy the task's requirements.
  • The task scheduling device 102 can then calculate the comprehensive cost of processing the task only for the CPUs that remain after filtering, and perform task scheduling accordingly. This reduces the number of CPUs for which the task scheduling device 102 must compute a processing cost, and thus the amount of computation required for task scheduling.
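The filter-then-estimate step can be sketched as below, under the simplifying assumption that "available resources" and "task requirement" are each a single number (for example free memory). `filter_then_estimate` is a hypothetical name, and `estimate` stands in for the cost estimation model.

```python
from typing import Callable, Dict, List, Mapping


def filter_then_estimate(required: float,
                         available: Mapping[str, float],
                         estimate: Callable[[str], float]) -> Dict[str, float]:
    """Estimate costs only for CPUs whose available resources meet the task's need."""
    candidates = [cpu for cpu, free in available.items() if free >= required]
    return {cpu: estimate(cpu) for cpu in candidates}


# Usage: cpu2 lacks resources, so its cost is never computed.
calls: List[str] = []


def fake_estimate(cpu: str) -> float:
    calls.append(cpu)
    return {"cpu1": 0.4, "cpu3": 0.6}[cpu]


costs = filter_then_estimate(required=2.0,
                             available={"cpu1": 4.0, "cpu2": 1.0, "cpu3": 2.0},
                             estimate=fake_estimate)
```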
  • the task scheduling device 102 can compare whether the comprehensive cost generated by the target CPU processing the task is less than the cost threshold, and if the comprehensive cost generated by the target CPU processing the task is less than the cost threshold, the task scheduling device 102 The task can be directly scheduled to the target CPU so that the task can be processed by the target CPU. If the comprehensive cost generated by the target CPU processing the task is greater than or equal to the cost threshold, the task scheduling device 102 can determine the CPU used to execute the task from the remaining CPUs of the heterogeneous hardware platform 103 . Since the CPU usually includes multiple processor cores, when scheduling a task to the target CPU, the task scheduling device 102 may further specify one or more processor cores in the target CPU for executing the task.
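The cost-threshold check can be sketched as follows. The text only says that another CPU is chosen from the remaining ones when the target CPU's cost reaches the threshold; picking the cheapest remaining CPU is an assumption made for this sketch.

```python
from typing import Dict, Optional


def schedule_with_threshold(target: str, costs: Dict[str, float],
                            cost_threshold: float) -> Optional[str]:
    """Keep the target CPU if its cost is below the threshold, else pick another."""
    if costs[target] < cost_threshold:
        return target
    remaining = {cpu: cost for cpu, cost in costs.items() if cpu != target}
    if not remaining:
        return None  # no other CPU to fall back to
    # Assumption: among the remaining CPUs, choose the cheapest one.
    return min(remaining, key=remaining.get)
```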
  • the task scheduling device 102 determines the target CPU for executing the task based on the comprehensive cost corresponding to each CPU.
  • the task scheduling device 102 obtains a threshold value according to the comprehensive cost corresponding to each CPU, and selects the target CPU from the CPUs whose comprehensive cost is less than the threshold value. The threshold value can be adjusted.
  • the task scheduling device 102 sorts the comprehensive costs corresponding to each CPU in ascending order, so that the task scheduling device 102 can determine whether the target CPU is the CPU with the smallest or relatively small comprehensive cost.
  • If so, the task scheduling device 102 can directly schedule the task to the target CPU and can further specify one or more processor cores in the target CPU for executing the task. If not, the task scheduling device 102 can schedule the task to another CPU with the smallest or a relatively small comprehensive cost, so that that CPU handles the task.
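Both selection variants above can be sketched briefly. Two choices are assumptions: the default threshold is the mean cost (the text only says the threshold is adjustable), and "relatively small" is modeled as ranking among the `top_k` cheapest CPUs.

```python
from statistics import mean
from typing import Dict, List, Optional


def below_threshold(costs: Dict[str, float], threshold: Optional[float] = None) -> List[str]:
    """CPUs whose comprehensive cost is under the (adjustable) threshold."""
    if threshold is None:
        threshold = mean(costs.values())  # hypothetical default: the mean cost
    return sorted(cpu for cpu, cost in costs.items() if cost < threshold)


def choose_cpu(target: str, costs: Dict[str, float], top_k: int = 1) -> str:
    """Keep `target` if it ranks among the top_k cheapest CPUs, else use the cheapest."""
    ranked = sorted(costs, key=costs.get)  # ascending comprehensive cost
    return target if target in ranked[:top_k] else ranked[0]


costs = {"cpu1": 0.2, "cpu2": 0.3, "cpu3": 0.9}
```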
  • In some scenarios, the target CPU is a CPU including large cores, such as CPU1 in Figure 1, and has limited available resources, for example available resources below a threshold.
  • In this case, the task scheduling device 102 can determine whether to schedule the task to the target CPU based on both the cost incurred by a CPU including large cores in processing the task and the cost incurred by a CPU including small cores in processing the task.
  • For example, the task scheduling device 102 may receive N tasks within a preset time period. For each of the N tasks, it can calculate the ratio between the comprehensive cost generated by a CPU including large cores processing the task and the comprehensive cost generated by a CPU including small cores processing the task, and then determine the target CPU for executing each task according to that task's comprehensive cost ratio.
  • The task scheduling device 102 can determine a threshold based on the comprehensive cost ratios, schedule the one or more of the N tasks whose ratio is less than the threshold to the target CPU including large cores, and schedule the one or more tasks whose ratio is greater than the threshold to a CPU including small cores. A task whose ratio equals the threshold can be scheduled at random to either a CPU including large cores or a CPU including small cores.
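The ratio-threshold rule can be sketched as below. Using the median ratio as the default threshold is a hypothetical choice, since the text does not say how the threshold is derived. `"big"` and `"little"` are placeholder labels for a CPU including large cores and a CPU including small cores.

```python
import random
import statistics
from typing import Dict, Optional, Tuple


def place_by_ratio(costs: Dict[str, Tuple[float, float]],
                   threshold: Optional[float] = None,
                   rng: Optional[random.Random] = None) -> Dict[str, str]:
    """costs maps task -> (large-core cost, small-core cost); returns task -> "big"/"little"."""
    rng = rng or random.Random()
    ratios = {task: big / little for task, (big, little) in costs.items()}
    if threshold is None:
        threshold = statistics.median(ratios.values())  # hypothetical choice of threshold
    placement = {}
    for task, ratio in ratios.items():
        if ratio < threshold:
            placement[task] = "big"      # large cores are relatively cheap for this task
        elif ratio > threshold:
            placement[task] = "little"   # small cores are relatively cheap
        else:
            placement[task] = rng.choice(["big", "little"])  # tie: pick at random
    return placement


placement = place_by_ratio({"t1": (1.0, 4.0), "t2": (2.0, 2.0), "t3": (3.0, 1.0)},
                           rng=random.Random(0))
```

A small ratio means the large cores are comparatively cheap for that task, which is why tasks below the threshold go to the large-core CPU.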
  • Alternatively, the task scheduling device 102 can determine, from the comprehensive cost each CPU incurs for a task, the priority with which the task is scheduled to each CPU:
  • when the priority of scheduling the task to a CPU including large cores is the first priority, the task scheduling device 102 schedules the task to a CPU including large cores;
  • when the priority of scheduling the task to a CPU including small cores is the second priority, the task scheduling device 102 schedules the task to a CPU including small cores;
  • when the priority of scheduling the task to a CPU including large cores is the third priority, the task scheduling device 102 may randomly schedule a single-threaded task to either a CPU including large cores or a CPU including small cores, and may schedule a multi-threaded task to both a CPU including large cores and a CPU including small cores.
  • The first priority is higher than the second priority, and the second priority is higher than the third priority.
  • For example, the task scheduling device 102 can sort the cost ratios of the N tasks in ascending order. When scheduling each task, it first determines whether the task is among the first i-1 tasks in that order, that is, whether it falls in the range [1, i-1] (meaning the priority of scheduling it to a CPU including large cores is the first priority). If so, the task scheduling device 102 can schedule the task to a CPU including large cores, such as the target CPU or another CPU, and can further specify one or more large cores on that CPU to handle the task.
  • If not, the task scheduling device 102 further determines whether the task is among the (i+1)-th to N-th tasks, that is, whether it falls in the range [i+1, N] (meaning the priority of scheduling it to a CPU including small cores is the second priority). If so, the task scheduling device 102 can schedule the task to a CPU including small cores and can further specify one or more small cores on that CPU to handle the task. If not, the task is the i-th task in the order (meaning the priority of scheduling it to a CPU including large cores is the third priority), and the task scheduling device 102 can randomly schedule it to either a CPU including large cores or a CPU including small cores.
  • When a task is multi-threaded, the task scheduling device 102 can schedule it to one or more CPUs so that multiple processor cores execute its threads concurrently. For example, a multi-threaded task in the range [1, i-1] can be processed concurrently by multiple large cores on the target CPU, and one in the range [i+1, N] can be processed concurrently by multiple small cores. When the multi-threaded task is the i-th task in the order, the task scheduling device 102 can schedule it to a CPU including large cores and a CPU including small cores at the same time, so that some large cores and some small cores execute it concurrently. This simplifies the scheduling of each task and, while improving scheduling efficiency, also reduces the resource consumption required for task scheduling.
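The rank-based placement can be sketched as below. The split position `i` is passed in as a parameter because the text does not say how it is chosen. `"big"` and `"little"` are placeholder labels, and a task placed on both labels models a multi-threaded task whose threads are split across large and small cores.

```python
import random
from typing import Dict, List, Optional, Set


def rank_and_place(ratios: Dict[str, float], i: int,
                   multithreaded: Set[str],
                   rng: Optional[random.Random] = None) -> Dict[str, List[str]]:
    """Place N tasks by their position in ascending cost-ratio order.

    Positions 1..i-1 go to large cores (first priority), positions i+1..N to
    small cores (second priority); the i-th task (third priority) goes to a
    random core type if single-threaded, or to both types if multi-threaded.
    """
    if not 1 <= i <= len(ratios):
        raise ValueError("i must lie in [1, N]")
    rng = rng or random.Random()
    order = sorted(ratios, key=ratios.get)
    placement: Dict[str, List[str]] = {}
    for pos, task in enumerate(order, start=1):
        if pos <= i - 1:
            placement[task] = ["big"]
        elif pos >= i + 1:
            placement[task] = ["little"]
        elif task in multithreaded:
            placement[task] = ["big", "little"]  # split threads across core types
        else:
            placement[task] = [rng.choice(["big", "little"])]
    return placement


plan = rank_and_place({"a": 0.5, "b": 2.0, "c": 1.0, "d": 3.0}, i=2, multithreaded={"c"})
```

Sorting once and comparing positions avoids per-task threshold arithmetic, which matches the text's point that ranking simplifies the scheduling of each task.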
  • The task scheduling device 102 may also use other implementations to determine, according to the cost incurred by the target CPU in processing the task, whether to schedule the task to the target CPU; this is not limited in this embodiment.
  • the heterogeneous hardware platform 103 uses the target CPU to process the tasks provided by the task scheduling device 102.
  • the heterogeneous hardware platform 103 can use the processor core in the target CPU to process the task.
  • When the task is a single-threaded task, the heterogeneous hardware platform 103 can use one processor core in the target CPU to execute it. When the task is a multi-threaded task, the heterogeneous hardware platform 103 can use multiple processor cores in the target CPU to execute it concurrently, improving task execution efficiency.
  • The heterogeneous hardware platform 103 can also return the execution result of the task to the upper-layer application 101 through the task scheduling device 102, such as the data access result of a query task or the data calculation result of a calculation task.
  • the task scheduling device 102 mainly schedules the task to a CPU including a large core or a CPU including a small core in the heterogeneous hardware platform 103 based on the predicted comprehensive cost generated by each CPU processing task.
  • In some implementations, the task scheduling device 102 can first determine whether the large cores in the heterogeneous hardware platform 103 have sufficient available resources, for example by periodically performing performance sampling on the heterogeneous hardware platform 103 to determine the available resources of the target CPU including large cores.
  • If the available resources of the large cores are sufficient, the task scheduling device 102 can directly schedule the task to the CPU where the large cores are located.
  • If the available resources of the large cores are limited (for example, less than a threshold), the task scheduling device 102 can determine the comprehensive cost of each CPU processing the task by executing steps S202 to S204 above, and then decide whether to schedule the task to a CPU including large cores or to a CPU including small cores, improving the rationality of task scheduling.
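The two-stage decision, with a fast path when large cores have headroom and cost-based scheduling otherwise, can be sketched as follows. Representing the sampled large-core availability as one number and choosing the cheapest CPU in the fallback are simplifying assumptions. `schedule_query` is a hypothetical name.

```python
from typing import Callable, Dict, List


def schedule_query(task: str, big_cpu: str, big_core_free: float, threshold: float,
                   estimate_costs: Callable[[str], Dict[str, float]]) -> str:
    """Fast path to the large-core CPU when it has headroom; cost-based otherwise."""
    if big_core_free > threshold:
        return big_cpu                      # enough large-core resources: schedule directly
    costs = estimate_costs(task)            # otherwise fall back to comprehensive costs
    return min(costs, key=costs.get)


calls: List[str] = []


def estimate(task: str) -> Dict[str, float]:
    calls.append(task)
    return {"cpu1": 0.9, "cpu2": 0.4}


fast = schedule_query("q1", "cpu1", big_core_free=0.8, threshold=0.5, estimate_costs=estimate)
slow = schedule_query("q2", "cpu1", big_core_free=0.1, threshold=0.5, estimate_costs=estimate)
```

The fast path skips cost estimation entirely, which is where the savings come from when large cores are idle.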
  • In this way, the task scheduling device 102 can dynamically allocate hardware resources in the heterogeneous hardware platform 103 to the various unknown tasks it receives. Moreover, because of the additional cost, the hardware resources that the task scheduling device 102 allocates to the same task may differ at different times, which keeps task scheduling reasonable. In practical applications, the task scheduling device may have two working modes: an online mode and an offline mode.
  • In the online mode, the task scheduling device 102 provides an online service that dynamically allocates hardware resources to various unknown tasks based on the above process.
  • In the offline mode, the tasks that the heterogeneous hardware platform 103 needs to process are usually fixed, such as fixed batch tasks (known tasks), so fixed CPU resources of the heterogeneous hardware platform 103 can be used to handle them. When scheduling these tasks, the task scheduling device 102 can directly schedule them to pre-designated CPUs.
  • the task scheduling device 102 can determine whether to use the online mode or the offline mode for task scheduling based on the configuration of the user (such as an administrator, etc.).
  • the task scheduling system 100 can provide a configuration interface as shown in FIG. 3 to the outside, so that the user can specify on the configuration interface that the task scheduling device 102 currently uses an online mode or an offline mode for task scheduling.
  • Alternatively, the user can specify on the configuration interface the time periods during which the task scheduling device 102 uses the online mode and those during which it uses the offline mode. According to the user's configuration, the task scheduling device 102 then executes task scheduling in the specified mode during each time period.
  • The user can also configure, on the configuration interface, the parameters used to calculate the comprehensive cost generated by a CPU processing a task. For example, as shown in Figure 3, the user can configure the normalization coefficients k (such as k1, k2, and k3 above), the conversion overhead corresponding to each storage medium, the amount of data in each data page, and so on, so that the task scheduling device 102 performs cost calculation based on the parameters the user configured.
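The settings gathered on such a configuration interface, and the online/offline dispatch they drive, might look like the sketch below. Every field name and default value in `SchedulerConfig` is hypothetical. Two further behaviors are assumptions: periods must not cross midnight, and a task with no pre-designated CPU still takes the online path in offline mode.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Callable, Dict, List, Tuple


@dataclass
class SchedulerConfig:
    """Settings a user might enter on the configuration interface (hypothetical fields)."""
    online_periods: List[Tuple[time, time]] = field(default_factory=list)
    k1: float = 0.01
    k2: float = 1e-6
    k3: float = 1e-3
    conversion_overhead: Dict[str, float] = field(default_factory=lambda: {"dram": 1.0, "ssd": 4.0})
    page_size_bytes: int = 8192

    def mode_at(self, now: time) -> str:
        """Online inside any configured period, offline otherwise (no midnight wrap)."""
        for start, end in self.online_periods:
            if start <= now < end:
                return "online"
        return "offline"


def dispatch(cfg: SchedulerConfig, now: time, task: str,
             pinned: Dict[str, str], online_schedule: Callable[[str], str]) -> str:
    """Offline mode sends known batch tasks to pre-designated CPUs."""
    if cfg.mode_at(now) == "offline" and task in pinned:
        return pinned[task]
    return online_schedule(task)   # assumption: unknown tasks still use the online path


cfg = SchedulerConfig(online_periods=[(time(8, 0), time(20, 0))])
pinned = {"nightly_batch": "cpu2"}
day = dispatch(cfg, time(9, 30), "nightly_batch", pinned, lambda t: "cpu1")
night = dispatch(cfg, time(23, 0), "nightly_batch", pinned, lambda t: "cpu1")
```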
  • In other possible implementations, the task scheduling device 102 can also schedule tasks at processor-core granularity, that is, determine which processor core a task is scheduled to.
  • Referring to FIG. 4, another task scheduling method provided by an embodiment of the present application is illustrated. This method can also be applied to the task scheduling system 100 shown in FIG. 1. As shown in FIG. 4, the method may specifically include:
  • the task scheduling device 102 predicts the comprehensive cost generated by the target processor core among the multiple processor cores included in the heterogeneous hardware platform 103 when processing the task.
  • The comprehensive cost is obtained from the IO cost, computing cost, and additional cost generated by the target processor core processing the task, where the additional cost is determined based on the demand characteristics of the task or the running status of the target processor core.
  • Specifically, the task scheduling device 102 can separately predict the IO cost, computing cost, and additional cost generated by each processor core in the heterogeneous hardware platform 103 processing the task, where the additional cost can be determined based on the demand characteristics of the task or the running status of that processor core.
  • the task scheduling device 102 calculates the IO cost generated by each processor core processing the task, which can be calculated according to the following formula (5).
  • $\text{IO cost} = k_1 \times M \times p \times T \quad (5)$
  • k1 is a normalization coefficient, which is used to limit the calculated IO cost to the value range of 0 to 1, and its value can be a positive number less than 1.
  • M represents memory bound, which is used to indicate the dependence of the task on memory. Its value range can be (0,1).
  • p is the number of data pages that the processor core needs to access when processing the task, which is a positive integer greater than or equal to 1.
  • T is the conversion overhead, which is a positive number greater than 0 and can be determined according to the type of storage medium that the processor core needs to read. Different types of storage media correspond to different sizes of conversion overhead.
  • k2 is a normalization coefficient, which is used to limit the calculated calculation cost to the value range of 0 to 1, and its value can be a positive number less than 1.
  • C represents core bound, which is used to indicate the dependence of the task on the processor core. Its value range can be (0,1). I indicates the total number of instructions that need to be executed for the task, which is a positive integer greater than or equal to 1.
  • k3 is a normalization coefficient, used to limit the calculated additional cost to the value range of 0 to 1, and its value can be a positive number less than 1.
  • Commands′ represents the number of times the task requests the processor core to execute instructions, which is a positive integer greater than or equal to 1.
  • Commands represents the number of instructions executed by the processor core in the past unit time, which is an integer greater than or equal to 0.
  • freq represents the operating frequency of the processor core, which is a positive integer greater than or equal to 1.
  • The above formulas (5) to (8), which the task scheduling device 102 uses to calculate the cost of the target processor core processing a task, can be encapsulated into a cost estimation model. After obtaining a task to be processed, the task scheduling device 102 inputs it into the cost estimation model, which infers the comprehensive cost generated by each processor core in the heterogeneous hardware platform 103 processing the task.
  • the task scheduling device 102 can compare whether the comprehensive cost generated by the target processor core processing the task is less than the cost threshold, and if the comprehensive cost generated by the target processor core processing the task is less than the cost threshold, Then the task scheduling device 102 can directly schedule the task to the target processor core so that the target processor core processes the task. If the comprehensive cost generated by the target processor core processing the task is greater than or equal to the cost threshold, the task scheduling device 102 can determine the processor core used to execute the task from the remaining processor cores of the heterogeneous hardware platform 103 .
  • Alternatively, the task scheduling device 102 determines the target processor core for executing the task based on the comprehensive cost corresponding to each processor core.
  • For example, the task scheduling device 102 obtains a threshold according to the comprehensive cost corresponding to each processor core, and selects the target processor core from the processor cores whose comprehensive cost is less than the threshold. The threshold can be adjusted.
  • As another example, the task scheduling device 102 can sort the comprehensive costs of the processor cores in ascending order and determine whether the target processor core has the smallest or a relatively small comprehensive cost. If so, the task scheduling device 102 can directly schedule the task to the target processor core; if not, it can schedule the task to another processor core with the smallest or a relatively small comprehensive cost, so that that processor core handles the task.
  • the task scheduling device 102 may also use other implementation methods to determine whether to schedule the task to the target processor core, which is not limited in this embodiment.
  • the heterogeneous hardware platform 103 uses the target processor core to process the task provided by the task scheduling device 102.
  • the heterogeneous hardware platform 103 can use one processor core (ie, the target processor core) to perform tasks.
  • the task may be a single-threaded task, so that the target processor core can run a single thread assigned to the task to process the task.
  • the task can be a multi-threaded task, so that the target processor core can sequentially run each thread assigned to the task to process the task.
  • the task scheduling device 102 mainly schedules the task to the target processor core in the heterogeneous hardware platform 103 based on the predicted comprehensive cost incurred by each processor core in processing the task.
  • In practice, some processor cores in the heterogeneous hardware platform 103 may have difficulty processing new tasks because their currently available resources are insufficient. Therefore, in a further possible implementation, the task scheduling device 102 can monitor the currently available resources of each processor core in the heterogeneous hardware platform 103 and filter out, from the multiple processor cores of the platform, those whose currently available resources do not meet the requirements of the task.
  • The task scheduling device 102 can then calculate the comprehensive cost of processing the task only for the processor cores that remain after filtering, and perform task scheduling accordingly. This reduces the number of processor cores for which the task scheduling device 102 must compute a comprehensive cost, and thus the amount of computation required for task scheduling.
  • The heterogeneous hardware platform 103 can also return the execution result of the task to the upper-layer application 101 through the task scheduling device 102, such as the data access result of a query task or the data calculation result of a calculation task.
  • the task scheduling system 100 shown in FIG. 1 can be applied in a variety of application scenarios, for example, it can be used in the database application scenario shown in FIG. 5 .
  • the database application scenario shown in Figure 5 can be a centralized database application scenario, or a distributed database application scenario, etc.
  • In this scenario, the upper-layer application 101 is a database application, specifically a client that supports interaction with the user, such as the client 130 or the client 131. Based on the user's add, delete, modify, and query operations on data in the database 110, the client 130 or the client 131 generates corresponding data processing tasks (such as query tasks) and requests the task scheduling device 102 to schedule them to the corresponding CPU (or processor core) in the heterogeneous hardware platform 103 for execution.
  • the heterogeneous hardware platform 103 can use the allocated CPU to run the database engine 120 to access the database 110 and perform corresponding data reading, writing, calculation and other operations.
  • the process of task scheduling performed by the task scheduling device 102 will be exemplified below in conjunction with a distributed database application scenario.
  • Take the distributed database shown in Figure 6 as an example. It includes a database engine 120 and multiple Region Server (RS) nodes.
  • Figure 6 takes the RS node 101 and the RS node 102 as an example for illustration.
  • the database engine 120 divides the data stored and managed by the distributed database to obtain multiple partitions. Each partition includes one or more pieces of data, and data belonging to different partitions are usually different.
  • For example, when each piece of data is stored and managed in the distributed database, part of the data can serve as the primary key of that piece of data; the primary key uniquely identifies the piece of data in the distributed database. The database engine 120 can therefore divide the possible value range of the primary key into intervals, each of which corresponds to one partition.
  • the database engine 120 is also used to allocate partitions to the RS node 101 and the RS node 102.
  • the partitions assigned to each RS node can be maintained through the management table created by the database engine 120.
  • RS node 101 and RS node 102 are respectively used to perform data read and write services belonging to different partitions.
  • For example, RS node 101 performs data read and write services for partitions 1 to N, and RS node 102 performs data read and write services for partition N+1.
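Range partitioning by primary key, plus a management table that maps partitions to RS nodes, can be sketched with Python's `bisect` module. The sketch makes three assumptions: primary keys are strings compared lexicographically, each split key is the inclusive lower bound of the next partition, and a plain dictionary stands in for the management table. `PartitionTable` is a hypothetical name.

```python
import bisect
from typing import Dict, List


class PartitionTable:
    """Range partitions over the primary key plus a partition -> RS node map."""

    def __init__(self, split_keys: List[str], owners: Dict[int, str]) -> None:
        # split_keys must be sorted; partition j+2 starts at split_keys[j] (inclusive).
        if split_keys != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = split_keys
        self.owners = owners  # stands in for the management table

    def partition_of(self, primary_key: str) -> int:
        """Partitions are numbered from 1, matching the text."""
        return bisect.bisect_right(self.split_keys, primary_key) + 1

    def node_for(self, primary_key: str) -> str:
        return self.owners[self.partition_of(primary_key)]


# N = 2: partitions 1 and 2 on RS node 101, partition N+1 = 3 on RS node 102.
table = PartitionTable(split_keys=["g", "p"],
                       owners={1: "RS node 101", 2: "RS node 101", 3: "RS node 102"})
```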
  • the RS node will temporarily store the data sent by the client 130 in the memory, so the amount of data temporarily stored in the memory will continue to increase.
  • the RS node can persistently store the data in the memory to the file system.
  • it can be written to the distributed file system in the form of a file (not shown in Figure 6).
  • the file system may be a distributed file system (DFS), Hadoop distributed file system (HDFS), etc., which is not limited in this embodiment.
  • the file system is used to store data in the form of files.
  • the files here can include one or more partitions, or they can only contain part of the data of one partition.
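The buffer-then-persist behavior of an RS node can be sketched as below. The text does not state what triggers persistence, so flushing once the in-memory buffer reaches a fixed number of entries is an assumption. The `files` list stands in for files on DFS or HDFS; `RsNodeBuffer` is a hypothetical name.

```python
from typing import Dict, List, Tuple


class RsNodeBuffer:
    """Buffers writes in memory and persists them as immutable sorted 'files'."""

    def __init__(self, flush_threshold: int) -> None:
        self.flush_threshold = flush_threshold  # hypothetical trigger: entry count
        self.memory: Dict[str, str] = {}
        self.files: List[List[Tuple[str, str]]] = []  # stands in for files on DFS/HDFS

    def put(self, key: str, value: str) -> None:
        self.memory[key] = value
        if len(self.memory) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self.memory:
            self.files.append(sorted(self.memory.items()))  # one file per flush
            self.memory = {}


buf = RsNodeBuffer(flush_threshold=2)
buf.put("k2", "v2")
buf.put("k1", "v1")   # reaches the threshold, so the buffer is flushed
buf.put("k3", "v3")   # stays in memory until the next flush
```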
  • Referring to FIG. 7, which is a schematic flowchart of another task scheduling method provided by an embodiment of the present application, the method may specifically include:
  • S701 When the client 130 starts, it creates a thread pool, which includes one or more threads.
  • the client 130 can also initialize some threads when starting, such as creating log writing threads, checkpoint threads, etc.
  • The log writing thread can be used to save the data processing records of the database to the log and write them into the corresponding storage area, such as a persistent storage area.
  • the checkpoint thread is used to control the coordination and synchronization between the data files, control files and redo log files in the database, so that when there is an exception in the database, the data can be restored to the correct data before the checkpoint.
  • S702 The client 130 generates a query task.
  • the client 130 can generate a corresponding query task according to the SQL statement input by the user.
  • user operations on the database include query (select), update (update), delete (delete), add (insert), or other operations.
  • this embodiment takes the user's request to query data in the database as an example for illustrative description.
  • the client 130 can perform syntax analysis and semantic analysis on the SQL statement input by the user.
  • syntax analysis refers to using the syntax rules of the SQL language to verify whether there are grammatical errors in the SQL statement
  • semantic analysis refers to analyzing whether the semantics of the SQL statement is legal.
  • the client 130 can obtain the initial plan tree, which indicates the execution plan for accessing the data.
  • The client 130 may then use one or more optimizers to optimize the initial plan tree and improve data access efficiency.
  • Based on the multiple operators included in the optimized plan tree, such as the scan, aggregate, sort, and filter operators, the client 130 determines the threads that execute each operator in the plan tree.
  • the query task indicates multiple threads that execute the operators in the plan tree.
  • the client 130 may also use other devices in the task scheduling system to implement the above process of generating query tasks, which is not limited in this embodiment.
  • S703 The client 130 sends the query task to the task scheduling device 102 to request the task scheduling device 102 to allocate corresponding hardware resources to the query task.
  • S704 The task scheduling device 102 determines whether the available resources of the large cores in the heterogeneous hardware platform 103 are greater than a threshold. If yes, step S705 is executed; if not, step S706 is executed.
  • the task scheduling device 102 can continuously monitor the resource usage of each CPU in the heterogeneous hardware platform 103, so as to determine whether to schedule the task directly according to the available resources in the heterogeneous hardware platform 103, or according to the comprehensive cost generated by processing the task.
  • S705 The task scheduling device 102 schedules the query task to a CPU including large cores in the heterogeneous hardware platform 103, and then step S709 is executed.
  • the task scheduling device 102 can directly schedule the query task to the CPU including the large core for processing, thereby utilizing the high computing power of the large core to improve task processing efficiency.
  • S706 The task scheduling device 102 predicts the comprehensive cost generated by each CPU in the heterogeneous hardware platform 103 when processing the query task.
  • the comprehensive cost generated by each CPU processing the query task is the sum of the IO cost, the CPU cost, and the additional cost generated by that CPU processing the query task, where the additional cost is determined based on the demand characteristics of the task and the running status of the CPU.
  • the task scheduling device 102 can perform task scheduling according to the comprehensive cost generated by each CPU processing the query task, thereby improving the rationality of task scheduling.
  • the task scheduling device 102 can perform performance analysis on the query task to determine information such as the task type of the query task, the number of instructions per clock cycle required by the task, the clock cycles, the total number of instructions the task needs to execute, and the number of data pages the task needs to read and write. Then, based on this information, the task scheduling device 102 can calculate the IO cost, CPU cost, and additional cost generated by each CPU in the heterogeneous hardware platform 103 processing the query task by using formulas (1) to (3) in the embodiment shown in FIG. 2, and further calculate, by using formula (4), the comprehensive cost incurred by each CPU in processing the query task. For details, refer to the foregoing related descriptions; they are not repeated in this embodiment.
  • the task scheduling device 102 determines the target CPU with the smallest cost for processing the query task based on the cost incurred by each CPU in processing the query task.
  • the task scheduling device 102 schedules the query task to the target CPU in the heterogeneous hardware platform 103.
  • for details on how the task scheduling device schedules the query task to the target CPU according to the comprehensive cost generated by each CPU processing the query task, refer to the descriptions of steps S202 and S203 in the embodiment shown in FIG. 2; they are not repeated here.
  • S709 The heterogeneous hardware platform 103 processes the query task and generates a query result.
  • the heterogeneous hardware platform 103 can use a CPU including a large core or the above-mentioned target CPU to execute the query task and generate a query result, which includes the data that the user needs to query.
  • the heterogeneous hardware platform 103 can run the database engine 120 on the CPU including large cores or on the above target CPU and, according to the query task, read data from the memory of RS node 101 or RS node 102 in the distributed database (and perform the corresponding data operations) to obtain the query result.
  • the database engine 120 can further access the file system connected to the distributed database to read data from the file system (and perform the corresponding data operations) to obtain the query result.
  • the heterogeneous hardware platform 103 returns the query result corresponding to the query task to the task scheduling device 102.
  • the task scheduling device 102 feeds back the query result to the database application, so that the database application provides the query result to the user.
  • the task scheduling method provided by the present application is described in detail above with reference to FIGS. 1 to 7 .
  • the task scheduling device and computing device provided by the present application will be described below with reference to FIGS. 8, 9 and 10.
  • embodiments of the present application also provide a task scheduling device.
  • Referring to FIG. 8, which is a schematic diagram of a task scheduling device provided by an embodiment of the present application.
  • the task scheduling device 800 shown in Figure 8 is applied to a heterogeneous hardware platform.
  • the heterogeneous hardware platform includes multiple CPUs; some of the multiple CPUs include large cores, and some of the multiple CPUs include small cores.
  • task scheduling device 800 includes:
  • the prediction module 802 is configured to predict the comprehensive cost generated by a target CPU among the multiple CPUs in processing the task.
  • the comprehensive cost is obtained from the input/output (IO) cost, the CPU cost, and the additional cost generated by the target CPU processing the task.
  • the additional cost is determined based on the demand characteristics of the task or the operating status of the target CPU;
  • the scheduling module 803 is configured to schedule the task to the target CPU when the comprehensive cost generated by the target CPU processing the task satisfies the scheduling condition.
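  • in a possible implementation, the target CPU includes large cores, and the scheduling module 803 is configured to determine, according to the comprehensive cost generated by the target CPU processing the task, the priority of scheduling the task to the target CPU, and to schedule the task to the target CPU when that priority is the first priority.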
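  • in a possible implementation, the target CPU includes small cores, and the scheduling module 803 is configured to determine, according to the comprehensive cost generated by the target CPU processing the task, the priority of scheduling the task to the target CPU, and to schedule the task to the target CPU when that priority is the second priority, where the second priority is lower than the first priority.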
  • in a possible implementation, the target CPU includes large cores, the task is a multi-threaded task, and the scheduling module 803 is configured to determine, according to the comprehensive cost generated by the target CPU processing the task, the priority of scheduling the task to the target CPU, and to schedule the task to both the target CPU and a CPU including small cores when that priority is the third priority, where the third priority is lower than the first priority and higher than the second priority.
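  • the prediction module 802 is configured to predict the comprehensive cost generated by the target CPU among the multiple CPUs processing the task when the available resources of the CPUs including large cores are less than a threshold.
  • the additional cost is obtained from any one or more of: the number of times the task requests the target CPU, the number of times the target CPU is requested per unit time, the operating frequency of the target CPU, and the number of processor cores included in the target CPU.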
  • the task scheduling device 800 further includes:
  • the configuration module 805 is configured to configure the task scheduling mode of the heterogeneous hardware platform in response to the user's operation on the configuration interface.
  • the task scheduling mode includes an online mode or an offline mode.
  • the task scheduling device 800 provided in this embodiment corresponds to the task scheduling method in the embodiment shown in FIG. 2. Therefore, for the functions of each module in this embodiment and their technical effects, refer to the relevant descriptions in the embodiment shown in FIG. 2; details are not repeated here.
  • embodiments of the present application also provide a task scheduling device.
  • Referring to FIG. 9, which is a schematic diagram of another task scheduling device provided by an embodiment of the present application.
  • the task scheduling device 900 shown in Figure 9 is applied to a heterogeneous hardware platform.
  • the heterogeneous hardware platform includes multiple processor cores.
  • the multiple processor cores include large cores and small cores.
  • the task scheduling device 900 includes:
  • the acquisition module 901 is configured to obtain a task to be processed;
  • the prediction module 902 is used to predict the comprehensive cost generated by the target processor core among the plurality of processor cores when processing the task.
  • the comprehensive cost is obtained from the input/output (IO) cost, the computation cost, and the additional cost generated by the target processor core processing the task;
  • the additional cost is determined based on the demand characteristics of the task or the operating status of the target processor core;
  • the scheduling module 903 is configured to schedule the task to the target processor core when the comprehensive cost generated by the target processor core processing the task meets the scheduling condition.
  • the scheduling module 903 is configured to schedule the task to the target processor core when the comprehensive cost generated by the target processor core processing the task is the smallest among the comprehensive costs generated by the multiple processor cores processing the task.
  • the task scheduling device 900 provided in this embodiment corresponds to the task scheduling method in the embodiment shown in FIG. 4. Therefore, for the functions of each module in this embodiment and their technical effects, refer to the relevant descriptions in the embodiment shown in FIG. 4; details are not repeated here.
  • the computing device 1000 may include a communication interface 1010 and a processor 1020.
  • the computing device 1000 may also include a memory 1030.
  • the memory 1030 may be disposed inside the computing device 1000 or outside the computing device 1000.
  • each action performed by the task scheduling device in the embodiments shown in FIG. 2 and FIG. 4 can be implemented by the processor 1020.
  • the processor 1020 can obtain a task scheduling request through the communication interface 1010 and implement any of the methods executed in FIG. 2 and FIG. 4.
  • each step of the methods executed in FIG. 2 and FIG. 4 can be completed by an integrated logic circuit of hardware in the processor 1020 or by instructions in the form of software.
  • the program code executed by the processor 1020 to implement the above method may be stored in the memory 1030.
  • the memory 1030 and the processor 1020 are connected, such as coupling connection, etc.
  • Some features of the embodiments of the present application may be implemented/supported by the processor 1020 executing program instructions or software codes in the memory 1030.
  • the software components loaded on the memory 1030 can be summarized functionally or logically, for example, the prediction module 802 and the scheduling module 803 shown in Figure 8, or the prediction module 902 and the scheduling module 903 shown in Figure 9.
  • the function of the acquisition module 801 shown in FIG. 8 or the acquisition module 901 shown in FIG. 9 can be implemented by the communication interface 1010.
  • Any communication interface involved in the embodiments of this application may be a circuit, bus, transceiver, or any other device that can be used for information exchange.
  • the communication interface 1010 in the computing device 1000 is used to exchange information with other devices; for example, the other device may be a device connected to the computing device 1000.
  • the processor involved in the embodiments of this application may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute each method, step, and logical block diagram disclosed in the embodiments of this application.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application can be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
  • the coupling in the embodiments of this application is an indirect coupling or communication connection between apparatuses, units, or modules, which may be electrical, mechanical, or in other forms, and is used for information exchange between the apparatuses, units, or modules.
  • the processor may operate in conjunction with the memory.
  • the memory can be a non-volatile memory, such as a hard disk or a solid state drive, or a volatile memory, such as a random access memory.
  • the memory may also be, but is not limited to, any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the embodiments of the present application do not limit the specific connection medium between the above communication interface, processor and memory.
  • the memory, processor and communication interface can be connected through a bus.
  • the bus can be divided into address bus, data bus, control bus, etc.
  • embodiments of the present application also provide a chip, including a power supply circuit and a processing circuit.
  • the power supply circuit is used to supply power to the processing circuit.
  • the processing circuit is used to: obtain a task to be processed; and predict the comprehensive cost generated by a target CPU among multiple CPUs included in a heterogeneous hardware platform in processing the task, where some of the multiple CPUs include large cores and some of the multiple CPUs include small cores.
  • the comprehensive cost is obtained from the input/output (IO) cost, the CPU cost, and the additional cost generated by the target CPU processing the task, and the additional cost is determined according to the demand characteristics of the task or the operating status of the target CPU; when the comprehensive cost generated by the target CPU processing the task satisfies the scheduling condition, the processing circuit schedules the task to the target CPU.
  • the power supply circuit includes, but is not limited to, at least one of the following: a power supply subsystem, a power management chip, a power consumption management processor, or a power consumption management control circuit.
  • embodiments of the present application also provide a computer storage medium, which stores a software program.
  • when read and executed by one or more processors, the software program can implement the method executed by the task scheduling device 102 provided in any one or more of the above embodiments.
  • the computer storage medium may include: U disk, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk and other various media that can store program codes.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more processes of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of a flowchart and/or one or more blocks of a block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

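A task scheduling method is provided. The method is applied to a heterogeneous hardware platform including multiple CPUs that adopt a big.LITTLE architecture. During task scheduling, a task to be processed is obtained, and the comprehensive cost generated by a target CPU among the multiple CPUs in processing the task is predicted. The comprehensive cost is obtained from the IO cost, the CPU cost, and an additional cost generated by the target CPU processing the task, where the additional cost is determined according to the demand characteristics of the task or the running status of the target CPU. When the comprehensive cost generated by the target CPU processing the task satisfies a scheduling condition, the task is scheduled to the target CPU. In this way, the additional cost can be used to correct deviations in task scheduling and achieve optimal scheduling. This prevents CPUs in the heterogeneous hardware platform from being assigned so many tasks that some tasks are processed inefficiently, and thereby makes task scheduling more reasonable. In addition, this application further discloses a corresponding task scheduling device, system, and related equipment.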

Description

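Task scheduling method, device, system and related equipment
This application claims priority to Chinese Patent Application No. 202210625934.9, filed with the Chinese Patent Office on June 2, 2022 and entitled "Data Processing Method", and to Chinese Patent Application No. 202210901085.5, filed on July 28, 2022 and entitled "Task Scheduling Method, Device, System and Related Equipment". Both applications are incorporated herein by reference in their entireties.
Technical Field
This application relates to the field of data processing technologies, and in particular, to a task scheduling method, device, system, and related equipment.
Background
Currently, multi-processor-core architectures are widely used in computing devices such as terminals and servers. When tasks are processed on a multi-core architecture, a cost estimation model is typically used to estimate in advance the cost each processor core would incur in executing a task. This cost includes the input/output (IO) cost, which is the cost of reading and writing data, and the CPU cost, which is the cost of the CPU performing computation. The task is then scheduled to the processor core with the smallest cost, so that the multiple processor cores achieve high overall performance.
However, in a big.LITTLE architecture that includes both large cores and small cores, scheduling tasks according to the costs estimated by the cost estimation model can easily lead to unreasonable task scheduling. For example, some processor cores are assigned so many tasks that the tasks become blocked, while other processor cores receive no tasks and sit idle, which lowers task processing efficiency.
Summary
A task scheduling method, a task scheduling device, a task scheduling system, a computing device, a chip, a computer-readable storage medium, and a computer program product are provided to make task scheduling more reasonable and thereby improve task processing efficiency.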
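According to a first aspect, an embodiment of this application provides a task scheduling method. The method is applied to a heterogeneous hardware platform including multiple CPUs. The multiple CPUs may adopt a big.LITTLE architecture: some of the multiple CPUs include large cores, and some include small cores. During task scheduling, a task to be processed is obtained, and the comprehensive cost generated by a target CPU among the multiple CPUs in processing the task is predicted. The comprehensive cost is obtained from the IO cost, the CPU cost, and the additional cost generated by the target CPU processing the task, where the additional cost is determined according to the demand characteristics of the task or the running status of the target CPU. When the comprehensive cost generated by the target CPU processing the task satisfies a scheduling condition, the task is scheduled to the target CPU.
Task scheduling considers the IO cost and the CPU cost. It also considers an additional cost for the CPU processing the task, evaluated according to the task's demands or the CPU's running status. The additional cost can therefore be used to correct deviations in task scheduling and achieve optimal scheduling. This prevents CPUs in the heterogeneous hardware platform from being assigned so many tasks that some tasks are processed inefficiently, and thereby makes task scheduling more reasonable.
Further, when the comprehensive cost generated by the target CPU processing the task does not satisfy the scheduling condition, the task may be scheduled to another CPU in the heterogeneous hardware platform, for example, to another CPU whose comprehensive cost satisfies the scheduling condition.
In a possible implementation, the target CPU is a CPU including large cores. During task scheduling, the priority of scheduling the task to the target CPU may be determined according to the comprehensive cost generated by the target CPU processing the task. When that priority is the first priority, the task is scheduled to the target CPU. Correspondingly, when the priority is not the first priority, the task may be scheduled to another CPU. In this way, the task can be scheduled to a CPU whose comprehensive cost corresponds to a higher priority, so that the high computing power of the large cores is used to process the task, which improves task processing efficiency.
In a possible implementation, the target CPU is a CPU including small cores. During task scheduling, the priority of scheduling the task to the target CPU may be determined according to the comprehensive cost generated by the target CPU processing the task. When that priority is the second priority, the task is scheduled to the target CPU, where the second priority is lower than the first priority. Correspondingly, when the priority is not the second priority, the task may be scheduled to another CPU. In this way, the task can be scheduled to a CPU whose comprehensive cost corresponds to a lower priority, so that the computing power of the small cores is used to process the task. This reduces the current load on the large cores and makes task scheduling more reasonable.
In a possible implementation, the target CPU is a CPU including large cores. When a multi-threaded task is scheduled, the priority of scheduling the task to the target CPU may be determined according to the comprehensive cost generated by the target CPU processing the task. When that priority is the third priority, the task is scheduled to both the target CPU and a CPU including small cores, where the third priority is lower than the first priority and higher than the second priority. In this way, the CPU including large cores and the CPU including small cores can process the multi-threaded task in parallel. This makes task scheduling more reasonable while keeping task processing efficiency as high as possible.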
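In a possible implementation, the comprehensive cost generated by the target CPU among the multiple CPUs in processing the task may be predicted when the available resources of the CPUs including large cores are less than a threshold. When large-core resources are limited, predicting the comprehensive cost of the CPUs processing the task makes task scheduling more reasonable. It also prevents the CPUs including large cores from being assigned so many tasks that the processing efficiency of some tasks suffers.
Optionally, when the available resources of the CPUs including large cores are greater than or equal to the threshold, the task may be scheduled directly to a CPU including large cores, so that the high computing power of the large cores improves task processing efficiency.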
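In a possible implementation, the additional cost generated by the target CPU processing the task is obtained from any one or more of: the number of times the task requests the target CPU, the number of times the target CPU is requested per unit time, the operating frequency of the target CPU, and the number of processor cores included in the target CPU.
In a possible implementation, a configuration interface may also be presented before task scheduling. In response to a user's operation on the configuration interface, the task scheduling mode of the heterogeneous hardware platform is configured, where the task scheduling mode includes an online mode or an offline mode. In the online mode, tasks may be scheduled according to the comprehensive cost generated by the CPUs processing them. In the offline mode, tasks may be scheduled to one or more fixed CPUs. In this way, tasks can be scheduled flexibly and reasonably in different time periods or scenarios.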
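According to a second aspect, an embodiment of this application provides a task scheduling method. The method is applied to a heterogeneous hardware platform including multiple processor cores. The multiple processor cores may adopt a big.LITTLE architecture: some of the multiple processor cores are large cores, and some are small cores. During task scheduling, a task to be processed is obtained, and the comprehensive cost generated by a target processor core among the multiple processor cores in processing the task is predicted. The comprehensive cost is obtained from the IO cost, the computation cost, and the additional cost generated by the target processor core processing the task, where the additional cost is determined according to the demand characteristics of the task or the running status of the target processor core. When the comprehensive cost generated by the target processor core processing the task satisfies a scheduling condition, the task is scheduled to the target processor core.
Task scheduling considers the IO cost and the computation cost. It also considers an additional cost for the processor core processing the task, evaluated according to the task's demands or the processor core's running status. The additional cost can therefore be used to correct deviations in task scheduling and achieve optimal scheduling. This prevents processor cores in the heterogeneous hardware platform from being assigned so many tasks that some tasks are processed inefficiently, and thereby makes task scheduling more reasonable.
In a possible implementation, during task scheduling, the task is scheduled to the target processor core when the comprehensive cost generated by the target processor core processing the task is the smallest among the comprehensive costs generated by the multiple processor cores processing the task. Otherwise, the task is scheduled to the other processor core with the smallest comprehensive cost. Using the processor core with the smallest comprehensive cost to process the task effectively makes task scheduling more reasonable. It also prevents some processor cores from being assigned so many tasks that some tasks are processed inefficiently.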
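According to a third aspect, an embodiment of this application provides a task scheduling device applied to a heterogeneous hardware platform. The heterogeneous hardware platform includes multiple CPUs, some of which include large cores and some of which include small cores. The device includes:
- an acquisition module, configured to obtain a task to be processed;
- a prediction module, configured to predict the comprehensive cost generated by a target CPU among the multiple CPUs in processing the task, where the comprehensive cost is obtained from the input/output (IO) cost, the CPU cost, and the additional cost generated by the target CPU processing the task, and the additional cost is determined according to the demand characteristics of the task or the running status of the target CPU; and
- a scheduling module, configured to schedule the task to the target CPU when the comprehensive cost generated by the target CPU processing the task satisfies a scheduling condition.
In a possible implementation, the target CPU includes large cores, and the scheduling module is specifically configured to: determine, according to the comprehensive cost generated by the target CPU processing the task, the priority of scheduling the task to the target CPU; and when that priority is the first priority, schedule the task to the target CPU.
In a possible implementation, the target CPU includes small cores, and the scheduling module is specifically configured to: determine, according to the comprehensive cost generated by the target CPU processing the task, the priority of scheduling the task to the target CPU; and when that priority is the second priority, schedule the task to the target CPU, where the second priority is lower than the first priority.
In a possible implementation, the target CPU includes large cores, the task is a multi-threaded task, and the scheduling module is specifically configured to: determine, according to the comprehensive cost generated by the target CPU processing the task, the priority of scheduling the task to the target CPU; and when that priority is the third priority, schedule the task to both the target CPU and a CPU including small cores, where the third priority is lower than the first priority and higher than the second priority.
In a possible implementation, the prediction module is specifically configured to predict the comprehensive cost generated by the target CPU among the multiple CPUs in processing the task when the available resources of the CPUs including large cores are less than a threshold.
In a possible implementation, the additional cost is obtained from any one or more of: the number of times the task requests the target CPU, the number of times the target CPU is requested per unit time, the operating frequency of the target CPU, and the number of processor cores included in the target CPU.
In a possible implementation, the device further includes: a presentation module, configured to present a configuration interface; and a configuration module, configured to configure the task scheduling mode of the heterogeneous hardware platform in response to a user's operation on the configuration interface, where the task scheduling mode includes an online mode or an offline mode.
The task scheduling device provided in the third aspect corresponds to the task scheduling method provided in the first aspect. For the technical effects of the third aspect and its implementations, refer to the technical effects of the first aspect and its corresponding implementations. Details are not repeated here.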
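According to a fourth aspect, an embodiment of this application further provides a task scheduling device applied to a heterogeneous hardware platform. The heterogeneous hardware platform includes multiple processor cores, including large cores and small cores. The device includes:
- an acquisition module, configured to obtain a task to be processed;
- a prediction module, configured to predict the comprehensive cost generated by a target processor core among the multiple processor cores in processing the task, where the comprehensive cost is obtained from the input/output (IO) cost, the computation cost, and the additional cost generated by the target processor core processing the task, and the additional cost is determined according to the demand characteristics of the task or the running status of the target processor core; and
- a scheduling module, configured to schedule the task to the target processor core when the comprehensive cost generated by the target processor core processing the task satisfies a scheduling condition.
In a possible implementation, the scheduling module is specifically configured to schedule the task to the target processor core when the comprehensive cost generated by the target processor core processing the task is the smallest among the comprehensive costs generated by the multiple processor cores processing the task.
The task scheduling device provided in the fourth aspect corresponds to the task scheduling method provided in the second aspect. For the technical effects of the fourth aspect and its implementations, refer to the technical effects of the second aspect and its corresponding implementations. Details are not repeated here.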
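According to a fifth aspect, an embodiment of this application provides a computing device, including a processor and a memory. The memory is configured to store instructions. When the computing device runs, the processor executes the instructions stored in the memory, so that the computing device performs the task scheduling method described in the first aspect or any implementation of the first aspect, or the task scheduling method described in the second aspect or any implementation of the second aspect. It should be noted that the memory may be integrated into the processor or may be independent of the processor. The computing device may further include a bus, through which the processor is connected to the memory. The memory may include a read-only memory and a random access memory.
According to a sixth aspect, an embodiment of this application provides a computing device, including a processor and a memory. The memory is configured to store instructions. When the computing device runs, the processor executes the instructions stored in the memory, so that the computing device performs the task scheduling method described in the second aspect or any implementation of the second aspect. It should be noted that the memory may be integrated into the processor or may be independent of the processor. The computing device may further include a bus, through which the processor is connected to the memory. The memory may include a read-only memory and a random access memory.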
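According to a seventh aspect, an embodiment of this application provides a chip, including a power supply circuit and a processing circuit. The power supply circuit is configured to supply power to the processing circuit. The processing circuit is configured to:
- obtain a task to be processed;
- predict the comprehensive cost generated by a target CPU among multiple CPUs included in a heterogeneous hardware platform in processing the task, where some of the multiple CPUs include large cores and some include small cores, the comprehensive cost is obtained from the input/output (IO) cost, the CPU cost, and the additional cost generated by the target CPU processing the task, and the additional cost is determined according to the demand characteristics of the task or the running status of the target CPU; and
- schedule the task to the target CPU when the comprehensive cost generated by the target CPU processing the task satisfies a scheduling condition.
According to an eighth aspect, an embodiment of this application provides a chip, including a power supply circuit and a processing circuit. The power supply circuit is configured to supply power to the processing circuit. The processing circuit is configured to:
- obtain a task to be processed;
- predict the comprehensive cost generated by a target processor core among multiple processor cores in processing the task, where the multiple processor cores include large cores and small cores, the comprehensive cost is obtained from the input/output (IO) cost, the computation cost, and the additional cost generated by the target processor core processing the task, and the additional cost is determined according to the demand characteristics of the task or the running status of the target processor core; and
- schedule the task to the target processor core when the comprehensive cost generated by the target processor core processing the task satisfies a scheduling condition.
According to a ninth aspect, an embodiment of this application further provides a task scheduling system. The task scheduling system includes an upper-layer application configured to generate tasks to be processed, the task scheduling device described in any implementation of the third aspect or the fourth aspect, and the heterogeneous hardware platform described in any implementation of the first aspect or the second aspect.
According to a tenth aspect, an embodiment of this application further provides a computer-readable storage medium storing a program or instructions. When the program or instructions run on a computer, the task scheduling method described in the first aspect or any implementation of the first aspect is performed, or the task scheduling method described in the second aspect or any implementation of the second aspect is performed.
According to an eleventh aspect, an embodiment of this application further provides a computer program product including instructions. When the computer program product runs on a computer, the computer is caused to perform the task scheduling method described in the first aspect or any implementation of the first aspect, or the task scheduling method described in the second aspect or any implementation of the second aspect.
In addition, for the technical effects of any implementation of the third to eleventh aspects, refer to the technical effects of the different implementations of the first aspect and the second aspect. Details are not repeated here.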
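Brief Description of Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings used in describing the embodiments. The accompanying drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a schematic structural diagram of an exemplary task scheduling system according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a task scheduling method according to an embodiment of this application;
FIG. 3 is a schematic flowchart of another task scheduling method according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of another task scheduling system according to an embodiment of this application;
FIG. 5 is a schematic diagram of the system structure of an exemplary distributed database application scenario according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of an exemplary distributed database according to an embodiment of this application;
FIG. 7 is a schematic flowchart of yet another task scheduling method according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a task scheduling device according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of another task scheduling device according to an embodiment of this application;
FIG. 10 is a schematic diagram of the hardware structure of a computing device according to an embodiment of this application.
Detailed Description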
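Referring to FIG. 1, which is a schematic structural diagram of an exemplary task scheduling system. As shown in FIG. 1, the task scheduling system 100 may include an upper-layer application 101, a task scheduling device 102, and a heterogeneous hardware platform 103. The upper-layer application 101, the task scheduling device 102, and the heterogeneous hardware platform 103 can exchange data with one another.
The upper-layer application 101 includes one or more applications 1011, which generate the tasks that the heterogeneous hardware platform 103 needs to process, such as data read/write tasks and data computation tasks. The upper-layer application 101 further includes a main thread 1012 and a thread pool controller 1013. When the upper-layer application 101 starts, the main thread 1012 may send the thread pool controller 1013 an instruction to create a thread pool. Based on the instruction, the thread pool controller 1013 creates a thread pool 1014 and further creates multiple threads in it. After the upper-layer application 101 starts running, the main thread 1012 may receive a task delivered by an application 1011 and use the thread pool controller 1013 to allocate one or more threads from the thread pool 1014 to the task. When one thread is allocated to the task, the task may be called a single-threaded task; when multiple threads are allocated, the task may be called a multi-threaded task. The thread pool controller 1013 may then provide the single-threaded or multi-threaded task to the task scheduling device 102 for task scheduling, that is, for scheduling the task to one or more processor cores in the heterogeneous hardware platform 103 for processing.
The task scheduling device 102 uses a cost estimation model to predict the cost of a processor core in the heterogeneous hardware platform 103 executing a task. This cost may be, for example, the sum of the IO cost and the CPU cost of the processor core executing the task. The task scheduling device 102 then schedules the task to the processor core with the smallest cost, according to the costs of the processor cores processing the task.
The heterogeneous hardware platform 103 includes multiple CPUs that adopt a big.LITTLE architecture. Some CPUs in the heterogeneous hardware platform 103 include large cores, which are processor cores with higher performance. The remaining CPUs include small cores, which are processor cores with lower performance. As shown in FIG. 1, CPU1 in the heterogeneous hardware platform 103 includes four large cores, and CPU2 in the heterogeneous hardware platform 103 includes four small cores.
In actual application scenarios, when the task scheduling device 102 uses the cost estimation model to predict task costs, one or more processor cores may consistently have the smallest cost for executing tasks. The task scheduling device 102 may therefore schedule tasks to those processor cores frequently. Too many tasks then easily accumulate on those cores and become blocked, and some tasks on them wait in a queue for processing, which lowers task processing efficiency. Meanwhile, the remaining processor cores in the heterogeneous hardware platform 103 may sit idle for long periods because they are not assigned any tasks, which wastes resources. As a result, the task scheduling device 102 is prone to unreasonable task scheduling.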
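Based on this, an embodiment of this application provides a task scheduling method that makes task scheduling more reasonable and thereby improves task processing efficiency. Specifically, for a task to be processed, the task scheduling device 102 predicts the cost generated by each of the multiple CPUs in the heterogeneous hardware platform 103 in processing the task. This cost is calculated from the IO cost, the CPU cost, and the additional cost generated by the CPU processing the task. The additional cost evaluates the extra cost of the CPU executing the task. It is determined according to the demand characteristics of the task (for example, the number of instructions the task requires the CPU to execute, or the size of the memory used when processing the task) or the running status of the CPU (for example, its operating frequency). When the cost generated by a target CPU among the multiple CPUs processing the task satisfies a scheduling condition, the task scheduling device 102 schedules the task to the target CPU, which then processes the task.
The task scheduling device 102 considers the IO cost and the CPU cost. It also considers an additional cost for the CPU processing the task, evaluated according to the task's demands or the CPU's running status. The task scheduling device 102 can therefore use the additional cost to correct deviations in task scheduling and achieve optimal scheduling. This prevents CPUs in the heterogeneous hardware platform 103 from being assigned so many tasks that some tasks are processed inefficiently, and thereby makes task scheduling more reasonable.
For example, the task scheduling device 102 may be implemented in software, for example, by at least one of a virtual machine, a container, or a computing engine. Alternatively, the task scheduling device 102 may be implemented by a physical device including a processor. The processor may be any one or any combination of: a CPU, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a system on chip (SoC), a software-defined infrastructure (SDI) chip, or an artificial intelligence (AI) chip. Alternatively, the task scheduling device 102 may be implemented by a combination of software and hardware. For example, it may use hardware to collect runtime information about the heterogeneous hardware platform 103 and use software to perform task scheduling.
In actual deployment, the task scheduling device 102 may be deployed independently of the upper-layer application 101 and the heterogeneous hardware platform 103, as shown in FIG. 1. Alternatively, the task scheduling device 102 may be integrated into the upper-layer application 101 or into the heterogeneous hardware platform 103. This is not limited in this embodiment.
It should be noted that the system architecture shown in FIG. 1 is merely an example and is not intended to limit the implementation to this example. For instance, in other possible system architectures, the heterogeneous hardware platform 103 may include more CPUs, and different CPUs may be located on the same computing device or on different computing devices. This is not limited in this embodiment.
In addition, the task scheduling system 100 shown in FIG. 1 is applicable to database scenarios. For example, the upper-layer application may generate a data processing task for data in a database, and the task scheduling device 102 may schedule the task to a CPU in the heterogeneous hardware platform 103 for execution. That CPU reads the data in the database, performs the corresponding operations on it, and returns the results to the upper-layer application 101. In actual application, the task scheduling system 100 may also be applied in other applicable scenarios. This is not limited in this application.
To make the foregoing objectives, features, and advantages of this application clearer and easier to understand, various non-limiting implementations of the embodiments of this application are described below by way of example with reference to the accompanying drawings. The described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained based on the embodiments of this application and the foregoing content fall within the protection scope of this application.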
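FIG. 2 is a schematic flowchart of a task scheduling method according to an embodiment of this application. The method may be applied to the task scheduling system 100 shown in FIG. 1, and in practice it may also be applied to other applicable task scheduling systems. For ease of understanding and description, the following uses the task scheduling system 100 shown in FIG. 1 as an example. The method may specifically include the following steps.
S201: The upper-layer application 101 generates a task to be processed.
In this embodiment, the upper-layer application 101 includes one or more applications. While running, these applications may request the heterogeneous hardware platform 103 to provide corresponding data processing services, such as reading and writing data or performing computations on service data. In a specific implementation, the upper-layer application 101 may generate a task to be processed. The task may be a single-threaded task, which is processed by executing one thread. Alternatively, it may be a multi-threaded task, which is processed by executing multiple threads; in actual scenarios, these threads may run concurrently to speed up processing of the task.
After generating the task to be processed, the upper-layer application 101 may provide it to the task scheduling device 102, so that the task scheduling device 102 schedules the task to a CPU in the heterogeneous hardware platform 103 for processing.
S202: The task scheduling device 102 predicts the comprehensive cost generated by a target CPU among the multiple CPUs included in the heterogeneous hardware platform 103 in processing the task. The comprehensive cost is obtained from the IO cost, the CPU cost, and the additional cost generated by the target CPU processing the task, and the additional cost is determined according to the demand characteristics of the task or the running status of the target CPU.
In actual application, the task scheduling device 102 may predict, for each CPU in the heterogeneous hardware platform 103, the comprehensive cost generated by that CPU processing the task. For ease of understanding, the following takes as an example the prediction for one CPU, hereinafter called the target CPU. The process for each of the remaining CPUs is similar and can be understood by analogy.
In a specific implementation, the task scheduling device 102 may perform performance analysis on the obtained task to obtain related information about the task. This information includes the task type, the number of instructions per clock cycle required by the task, the clock cycles, the total number of instructions the task needs to execute, and the number of data pages the task needs to read and write. The task type may be core bound (compute-intensive) or memory bound (memory-intensive). Core bound indicates that the task depends more on the CPU than on memory, that is, most operations the CPU performs when executing the task are computation operations. It can be represented by a numerical value, for example, a value between 0 and 1. Memory bound indicates that the task depends more on memory than on the CPU, that is, most operations the CPU performs when executing the task are IO operations that read and write data. It can likewise be represented by a numerical value, for example, a value between 0 and 1.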
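Further, the task scheduling device 102 may cache the related information about the task. When it receives the same task again, it can read the related information directly from the cache instead of analyzing the task's performance again, which speeds up task processing.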
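The sketch below shows this caching step in Python. It assumes that a task can be identified by a signature string, such as its normalized SQL text. The `TaskProfile` fields and the `analyze` callback are hypothetical stand-ins for the real performance-analysis step, so treat this as an illustration rather than the implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TaskProfile:
    """Related information from performance analysis (field names are illustrative)."""
    memory_bound: float  # M in formula (1), assumed to lie in (0, 1)
    core_bound: float    # C in formula (2), assumed to lie in (0, 1)
    instructions: int    # I in formula (2): total instructions to execute
    data_pages: int      # P in formula (1): data pages to read and write


class ProfileCache:
    """Caches profiles by task signature so that a repeated task is analyzed only once."""

    def __init__(self, analyze: Callable[[str], TaskProfile]) -> None:
        self._analyze = analyze                # hypothetical profiler supplied by the caller
        self._cache: Dict[str, TaskProfile] = {}
        self.misses = 0                        # number of times real analysis was run

    def get(self, task_signature: str) -> TaskProfile:
        profile = self._cache.get(task_signature)
        if profile is None:                    # cache miss: analyze once and remember
            self.misses += 1
            profile = self._analyze(task_signature)
            self._cache[task_signature] = profile
        return profile


# Usage with a stand-in profiler (real profiling would sample hardware counters).
def fake_analyze(signature: str) -> TaskProfile:
    return TaskProfile(memory_bound=0.7, core_bound=0.3,
                       instructions=1000 * len(signature), data_pages=4)


cache = ProfileCache(fake_analyze)
first = cache.get("SELECT * FROM t")
second = cache.get("SELECT * FROM t")   # served from the cache, no second analysis
```

A plain dictionary is used here instead of `functools.lru_cache` so that the miss counter stays visible. A production cache would also need an eviction policy, which the text does not describe.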
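Then, the task scheduling device 102 may predict the IO cost, CPU cost, and additional cost generated by the target CPU processing the task, according to the related information about the task.
The IO cost is the cost generated by the CPU performing data read and write operations while processing the task. For example, the task scheduling device 102 may calculate the IO cost by using formula (1):
IO cost = k1 * M * P * T    Formula (1)
Here, k1 is a normalization coefficient that limits the calculated IO cost to the range 0 to 1, and its value may be a positive number less than 1. M represents memory bound and indicates how much the task depends on memory; its value range may be (0, 1). P is the number of data pages the CPU needs to access when processing the task, and it is a positive integer greater than or equal to 1. Because the amount of data per data page is usually fixed, the product of the number of data pages and the amount of data per page can serve as the amount of data the task needs to read and write. T is the conversion overhead, a positive number greater than 0. The CPU's overhead for reading different storage media usually differs; for example, reading a hard disk typically costs more than reading memory. Different conversion overheads can therefore be set according to the storage medium where the data pages are located. In actual application, the task scheduling device 102 may be preconfigured with the conversion overheads of different storage media. After obtaining a task to be processed, it can then determine the conversion overhead according to the storage medium of the data the task needs to access.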
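Formula (1) can be sketched in Python as follows. The storage-medium table `CONVERSION_OVERHEAD` and its values are placeholders assumed for illustration. The text only states that a disk read usually costs more than a memory read, so the function name and the input checks are also this sketch's own choices.

```python
# Hypothetical conversion overheads T per storage medium. The text only states that
# reading a hard disk usually costs more than reading memory; the numbers are placeholders.
CONVERSION_OVERHEAD = {"memory": 1.0, "disk": 20.0}


def io_cost(k1: float, memory_bound: float, pages: int, medium: str) -> float:
    """Formula (1): IO cost = k1 * M * P * T, with T looked up by storage medium."""
    if not 0.0 < memory_bound < 1.0:
        raise ValueError("M (memory bound) is expected to lie in (0, 1)")
    if pages < 1:
        raise ValueError("P (data pages) must be a positive integer")
    t = CONVERSION_OVERHEAD[medium]   # raises KeyError for an unconfigured medium
    # k1 is meant to keep the result within [0, 1]; this sketch does not clamp it.
    return k1 * memory_bound * pages * t


in_memory = io_cost(k1=0.001, memory_bound=0.5, pages=10, medium="memory")  # about 0.005
on_disk = io_cost(k1=0.001, memory_bound=0.5, pages=10, medium="disk")      # about 0.1
```

The same task therefore receives a higher IO cost when its pages reside on disk than when they reside in memory.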
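The CPU cost is the cost generated by the CPU performing data computation operations while processing the task. In a database scenario, for example, executing operators in a query task consumes CPU computing resources; such operators include data filtering, projection, and predicate pushdown. Data addressing and data computation consume CPU computing resources as well. For example, the task scheduling device 102 may calculate the CPU cost by using formula (2):
CPU cost = k2 * C * I    Formula (2)
Here, k2 is a normalization coefficient that limits the calculated CPU cost to the range 0 to 1, and its value may be a positive number less than 1. C represents core bound and indicates how much the task depends on the CPU; its value range may be (0, 1). I is the total number of instructions the task needs to execute, and it is a positive integer greater than or equal to 1.
The additional cost evaluates the extra cost of the CPU executing the task. It can be used to avoid, as far as possible, skew in task scheduling, that is, to avoid assigning too many tasks to one or some CPUs. In this embodiment, the task scheduling device 102 may calculate the additional cost according to the demand characteristics of the task or the running status of the CPU. A demand characteristic of the task may be, for example, the number of times the task requests the CPU, that is, the number of instructions the CPU needs to execute. The running status of the CPU may include, for example, the number of instructions the CPU executed in the past unit of time, the operating frequency of the CPU, and the number of processor cores in the CPU that are available for processing tasks. Taking calculation from both the demand characteristics of the task and the running status of the CPU as an example, the task scheduling device 102 may calculate the additional cost by using formula (3):
Additional cost = k3 * (Commands′ + Commands) / (freq * cores)    Formula (3)
Here, k3 is a normalization coefficient that limits the calculated additional cost to the range 0 to 1, and its value may be a positive number less than 1. Commands′ represents the number of times the task requests the CPU to execute instructions, and it is a positive integer greater than or equal to 1. Commands represents the number of instructions the CPU executed in the past unit of time. It can be obtained by periodically sampling data from the heterogeneous hardware platform 103, and it is an integer greater than or equal to 0. freq represents the operating frequency of the CPU, which can be calculated from the clock cycle of the CPU, and it is a positive integer greater than or equal to 1. cores represents the number of processor cores in the CPU that can be used to process tasks, and it is a positive integer greater than or equal to 1.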
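Formula (3) is sketched below, together with an illustration of why it counteracts skew. The numbers are invented for illustration, and treating `freq` as MHz is an assumption; the text gives no unit.

```python
def additional_cost(k3: float, task_commands: int, recent_commands: int,
                    freq: int, cores: int) -> float:
    """Formula (3): k3 * (Commands' + Commands) / (freq * cores).

    task_commands   -- Commands': times the task requests the CPU (>= 1)
    recent_commands -- Commands: instructions the CPU executed in the last unit of time (>= 0)
    freq            -- operating frequency of the CPU (>= 1; assumed to be MHz here)
    cores           -- processor cores usable for tasks (>= 1)
    """
    if task_commands < 1 or recent_commands < 0 or freq < 1 or cores < 1:
        raise ValueError("argument outside the range described for formula (3)")
    return k3 * (task_commands + recent_commands) / (freq * cores)


# Illustrative numbers only: a large-core CPU when idle and when busy, and an idle small-core CPU.
idle_big = additional_cost(k3=0.1, task_commands=1000, recent_commands=0, freq=2600, cores=4)
busy_big = additional_cost(k3=0.1, task_commands=1000, recent_commands=20000, freq=2600, cores=4)
idle_small = additional_cost(k3=0.1, task_commands=1000, recent_commands=0, freq=1800, cores=4)
```

When idle, the faster large-core CPU has the lower extra cost. After it has absorbed many recent requests, its extra cost rises above that of the idle small-core CPU, which is the skew-correcting effect described above.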
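Finally, the task scheduling device 102 may calculate the comprehensive cost generated by the target CPU processing the task from the predicted IO cost, CPU cost, and additional cost. For example, the task scheduling device 102 may calculate the comprehensive cost W by using formula (4):
W = IO cost + CPU cost + additional cost    Formula (4)
It should be noted that formulas (1) to (4) above are merely examples of how the costs may be calculated. In actual application, the task scheduling device 102 may adjust these example formulas or use other formulas to calculate the comprehensive cost of the target CPU processing the task. For example, in other possible implementations, a demand characteristic of the task may be the size of the memory region the task requires, in which case the task scheduling device 102 calculates the additional cost according to the memory space needed to execute the task. Alternatively, the task scheduling device 102 may calculate the additional cost from any one or more of: the number of times the task requests the CPU, the number of times the CPU is requested per unit time, the operating frequency of the CPU, and the number of processor cores included in the CPU. Alternatively, the task scheduling device 102 may calculate the comprehensive cost as a weighted sum of the IO cost, the CPU cost, and the additional cost. This is not limited in this embodiment.
As an example, formulas (1) to (4), which the task scheduling device 102 uses to calculate the cost of the target CPU processing a task, may be encapsulated into a cost estimation model. After obtaining a task to be processed, the task scheduling device 102 can input the task into the cost estimation model. The model then infers the comprehensive cost generated by each CPU in the heterogeneous hardware platform 103 in processing the task.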
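The sketch below shows one way such a cost estimation model could be packaged, as a plain Python class that evaluates formulas (1) to (4) for every candidate CPU. `CostModel`, `TaskInfo`, and `CpuInfo` are hypothetical names. Note that with formulas (1) and (2) exactly as written, the IO and CPU terms do not depend on which CPU is evaluated, so in this sketch the CPUs differ only through the additional cost.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TaskInfo:
    memory_bound: float  # M
    pages: int           # P
    conversion: float    # T, chosen from the storage medium of the task's data
    core_bound: float    # C
    instructions: int    # I
    commands: int        # Commands': times the task requests the CPU


@dataclass(frozen=True)
class CpuInfo:
    name: str
    freq: int             # operating frequency
    cores: int            # cores usable for tasks
    recent_commands: int  # Commands: instructions executed in the last unit of time


class CostModel:
    """Wraps formulas (1) to (4); k1, k2, k3 are the normalization coefficients."""

    def __init__(self, k1: float, k2: float, k3: float) -> None:
        self.k1, self.k2, self.k3 = k1, k2, k3

    def comprehensive_cost(self, task: TaskInfo, cpu: CpuInfo) -> float:
        io = self.k1 * task.memory_bound * task.pages * task.conversion   # formula (1)
        compute = self.k2 * task.core_bound * task.instructions           # formula (2)
        extra = (self.k3 * (task.commands + cpu.recent_commands)
                 / (cpu.freq * cpu.cores))                                # formula (3)
        return io + compute + extra                                       # formula (4)

    def costs(self, task: TaskInfo, cpus: Iterable[CpuInfo]) -> Dict[str, float]:
        """Comprehensive cost W of every candidate CPU for one task."""
        return {cpu.name: self.comprehensive_cost(task, cpu) for cpu in cpus}


model = CostModel(k1=0.01, k2=1e-6, k3=1e-3)
task = TaskInfo(memory_bound=0.5, pages=10, conversion=1.0,
                core_bound=0.5, instructions=100_000, commands=1000)
w = model.costs(task, [CpuInfo("CPU1", freq=1000, cores=4, recent_commands=3000),
                       CpuInfo("CPU2", freq=1000, cores=4, recent_commands=0)])
# w is about {"CPU1": 0.101, "CPU2": 0.10025}
```

A weighted-sum variant, as mentioned above, would replace the final `return` with per-term weights.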
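In actual scenarios, some CPUs in the heterogeneous hardware platform 103 may be unable to process new tasks, for example, because their currently available resources are insufficient to support a new task. Therefore, in a further possible implementation, the task scheduling device 102 may monitor the currently available resources of each CPU in the heterogeneous hardware platform 103. It can then filter out, from the multiple CPUs included in the heterogeneous hardware platform 103, the CPUs whose currently available resources do not meet the task's requirements. Correspondingly, the task scheduling device 102 may calculate comprehensive costs only for the remaining CPUs and schedule the task accordingly. This reduces the number of CPUs for which the task scheduling device 102 must calculate costs, and therefore reduces the computation that task scheduling consumes.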
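A minimal sketch of this pre-filtering step follows. It assumes that "available resources" means idle cores and free memory. The text does not specify the metrics, so `CpuAvailability`, `TaskDemand`, and their fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CpuAvailability:
    name: str
    free_cores: int      # cores currently idle (assumed metric)
    free_memory_mb: int  # memory currently available to new work (assumed metric)


@dataclass(frozen=True)
class TaskDemand:
    min_cores: int       # cores the task needs at minimum (assumed metric)
    memory_mb: int       # memory the task is expected to use (assumed metric)


def eligible_cpus(cpus: List[CpuAvailability], demand: TaskDemand) -> List[str]:
    """Keep only CPUs whose current available resources meet the task's demand,
    so that comprehensive costs are computed for fewer CPUs."""
    return [c.name for c in cpus
            if c.free_cores >= demand.min_cores and c.free_memory_mb >= demand.memory_mb]


snapshot = [CpuAvailability("CPU1", free_cores=0, free_memory_mb=4096),
            CpuAvailability("CPU2", free_cores=2, free_memory_mb=512),
            CpuAvailability("CPU3", free_cores=4, free_memory_mb=128)]
candidates = eligible_cpus(snapshot, TaskDemand(min_cores=1, memory_mb=256))  # ["CPU2"]
```

Only the surviving candidates would then be passed to the cost estimation model.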
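S203: When the comprehensive cost generated by the target CPU processing the task satisfies a scheduling condition, the task scheduling device 102 schedules the task to the target CPU.
This embodiment provides the following implementation examples of determining whether to schedule the task to the target CPU.
In a first implementation example, the task scheduling device 102 may check whether the comprehensive cost generated by the target CPU processing the task is less than a cost threshold. If it is, the task scheduling device 102 may schedule the task directly to the target CPU, which then processes the task. If the comprehensive cost is greater than or equal to the cost threshold, the task scheduling device 102 may select the CPU that will execute the task from the remaining CPUs of the heterogeneous hardware platform 103. Because a CPU usually includes multiple processor cores, the task scheduling device 102 may further specify which one or more processor cores in the target CPU execute the task when scheduling it there.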
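The first example can be sketched as a threshold check with a fallback. The text does not say how the fallback CPU is chosen from the remaining CPUs. Picking the cheapest remaining CPU that is also below the threshold is an assumption of this sketch, as is returning `None` when nothing qualifies.

```python
from typing import Dict, Optional


def schedule_with_cost_threshold(costs: Dict[str, float], target: str,
                                 cost_threshold: float) -> Optional[str]:
    """First example of S203: keep the target CPU if its comprehensive cost is below
    the threshold; otherwise choose among the remaining CPUs. The fallback rule
    (cheapest remaining CPU that is also below the threshold) is an assumption."""
    if costs[target] < cost_threshold:
        return target
    others = {name: w for name, w in costs.items()
              if name != target and w < cost_threshold}
    if not others:
        return None  # no CPU qualifies; a real scheduler might queue or retry
    return min(others, key=others.get)


costs = {"CPU1": 0.30, "CPU2": 0.60, "CPU3": 0.45}
```

For example, with a threshold of 0.5, `CPU2` is rejected as the target and the task falls back to `CPU1`, the cheapest qualifying CPU.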
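In a second implementation example, after predicting the comprehensive cost generated by each CPU in the heterogeneous hardware platform 103 in processing the task, the task scheduling device 102 determines the target CPU that will execute the task according to the comprehensive costs of the CPUs. In one implementation, the task scheduling device 102 derives a threshold from the comprehensive costs of the CPUs and selects the target CPU from the CPUs whose comprehensive cost is less than the threshold; the threshold is adjustable. In another implementation, the task scheduling device 102 sorts the comprehensive costs of the CPUs in ascending order and determines whether the target CPU has the smallest, or a relatively small, comprehensive cost. If so, the task scheduling device 102 may schedule the task directly to the target CPU and may further specify which one or more processor cores in the target CPU execute the task. If not, the task scheduling device 102 may schedule the task to another CPU with the smallest or a relatively small comprehensive cost, which then processes the task.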
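Both variants of the second example are sketched below. The text says only that the threshold is derived from the costs and is adjustable. Using `scale * mean` as that rule, and `top_k` as the meaning of "relatively small", are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List


def below_derived_threshold(costs: Dict[str, float], scale: float = 1.0) -> List[str]:
    """Variant 1: derive an adjustable threshold from all costs (scale * mean is an
    assumed rule) and return the CPUs below it, cheapest first."""
    threshold = scale * mean(costs.values())
    return sorted((name for name, w in costs.items() if w < threshold), key=costs.get)


def keep_target_if_ranked(costs: Dict[str, float], target: str, top_k: int = 1) -> str:
    """Variant 2: sort costs in ascending order; keep the target CPU if it is among
    the top_k cheapest, otherwise fall back to the cheapest CPU."""
    ranked = sorted(costs, key=costs.get)
    return target if target in ranked[:top_k] else ranked[0]


costs = {"CPU1": 0.2, "CPU2": 0.4, "CPU3": 0.9}
```

With these costs, the mean-based threshold is 0.5, so `CPU1` and `CPU2` qualify. `CPU2` is kept as the target only when `top_k` is at least 2.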
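In a third implementation example, the target CPU is a CPU including large cores, such as CPU1 in FIG. 1, and it has few available resources, for example, available resources less than a threshold. With large-core resources limited in this way, the task scheduling device 102 may decide whether to schedule the task to the target CPU according to two costs: the cost generated by the CPU including large cores processing the task, and the cost generated by the CPU including small cores processing the task.
In a specific implementation, the task scheduling device 102 may receive N tasks within a preset period. For each of the N tasks, the task scheduling device 102 may calculate the ratio of the comprehensive cost generated by the CPU including large cores processing the task to the comprehensive cost generated by the CPU including small cores processing the task. It then determines the target CPU for executing each task according to these comprehensive cost ratios.
As one implementation example, the task scheduling device 102 may determine a threshold from the comprehensive cost ratios. It schedules the one or more of the N tasks whose ratio is less than the threshold to the target CPU including large cores, and the one or more tasks whose ratio is greater than the threshold to a CPU including small cores. A task whose ratio equals the threshold may be scheduled at random to either a CPU including large cores or a CPU including small cores.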
作为另一种实现,任务调度装置102可以根据各个CPU处于该任务所产生的综合代价,确定将该任务调度至各个CPU的优先级,并且,当确定该任务调度至包括大核的CPU的优先级为第一优先级时,任务调度装置102将该任务调度至包括大核的CPU。当确定该任务调度至包括小核的CPU的优先级为第二优先级时,任务调度装置102将该任务调度至包括小核的CPU。当该任务调度至包括大核的CPU的优先级为第三优先级时,若该任务为单线程任务,则任务调度装置102可以随机将其调度至包括大核的CPU或者包括小核的CPU,而若该任务为多线程任务,则任务调度装置102可以将该任务调度至包括大核的CPU以及包括小核的CPU。在本实施例中,第一优先级高于第二优先级,第二优先级高于第三优先级。
举例来说,任务调度装置102在得到各个任务分别对应的代价比值后,可以对N个任务分别对应的代价比值按照从小到大的顺序进行排序。如此,在对每个任务进行调度时,任务调度装置102可以先判断该任务是否属于排序靠前的i-1个任务(即判断该任务调度至包括大核的CPU的优先级是否为第一优先级),也即判断任务是否属于[1,i-1]的范围内,如果是,则任务调度装置102可以将该任务调度至包括大核的CPU,如目标CPU或者其他CPU,并可以进一步指定该CPU上的处理该任务的一个或者多个大核。如果不是,则任务调度装置可以进一步判断该任务是否属于排序为第i+1至第N个任务的范围(即判断该任务调度至包括小核的CPU的优先级是否为第二优先级),即判断任务是否属于[i+1,N]的范围内,如果是,则任务调度装置102可以将该任务调度至包括小核的CPU,并可以进一步指定由该CPU上的一个或者多个小核处理该任务。如果不是,则任务调度装置可以确定该任务为排序第i个任务(也即确定该任务调度至包括大核的CPU的优先级为第三优先级),并可以随机将该任务调度至包括大核的CPU,或者将该任务调度至包括小核的CPU。
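Further, when the task to be processed is a multithreaded task, the task scheduling apparatus 102 may schedule the task to one or more CPUs, so that multiple processor cores on those CPUs execute the multithreaded task concurrently. For example, if the multithreaded task falls within the range [1, i-1], multiple big cores on the target CPU may process it concurrently; if it falls within the range [i+1, N], multiple small cores may process it concurrently; and if it is the i-th task in the sorted order, the task scheduling apparatus 102 may schedule it to both a CPU including big cores and a CPU including small cores, so that some big cores and some small cores execute it concurrently. This simplifies how the task scheduling apparatus 102 schedules each task, improving scheduling efficiency while reducing the resources consumed by task scheduling.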
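The following sketch shows one way to map the threads of a multithreaded task onto cores once its tier is known. The round-robin placement and the roughly even split for the i-th task are assumptions; the text only says that several cores run the task concurrently. `place_threads` and the integer core ids are hypothetical.

```python
from typing import List


def place_threads(tier: str, n_threads: int, big_cores: List[int],
                  small_cores: List[int]) -> List[int]:
    """Return the core id assigned to each thread of one task.

    tier is "big" (range [1, i-1]), "small" (range [i+1, N]) or "both"
    (the i-th task). Round-robin placement and the even split for "both"
    are assumptions.
    """
    if tier == "both":
        # Give roughly half the threads to big cores and the rest to small cores.
        n_big = (n_threads + 1) // 2
        return ([big_cores[k % len(big_cores)] for k in range(n_big)] +
                [small_cores[k % len(small_cores)] for k in range(n_threads - n_big)])
    if tier == "big":
        pool = big_cores
    elif tier == "small":
        pool = small_cores
    else:
        raise ValueError(f"unknown tier: {tier}")
    # Spread the threads over the pool so they can run concurrently.
    return [pool[k % len(pool)] for k in range(n_threads)]
```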
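Of course, the foregoing manners of determining whether to schedule the task to the target CPU are merely examples. In actual application, the task scheduling apparatus 102 may use other manners to determine, according to the cost generated by the target CPU in processing the task, whether to schedule the task to the target CPU. This is not limited in this embodiment.
S204: The heterogeneous hardware platform 103 uses the target CPU to process the task provided by the task scheduling apparatus 102.
After the task scheduling apparatus 102 schedules the task to the target CPU, the heterogeneous hardware platform 103 can process the task using processor cores in the target CPU. When the task is a single-threaded task, the heterogeneous hardware platform 103 may execute it using one processor core in the target CPU; when the task is a multithreaded task, the heterogeneous hardware platform 103 may execute it concurrently using multiple processor cores in the target CPU, to improve task execution efficiency.
Further, after completing the task, the heterogeneous hardware platform 103 may return the execution result of the task to the upper-layer application 101 through the task scheduling apparatus 102, for example, the data access result of a query task or the computation result of a computing task.
This embodiment mainly describes how the task scheduling apparatus 102 schedules the task, according to the predicted comprehensive cost of each CPU, to a CPU including big cores or a CPU including small cores in the heterogeneous hardware platform 103. In actual application, because the big cores in the heterogeneous hardware platform 103 perform far better than the small cores, big cores are usually preferred for processing tasks. Therefore, in a possible implementation, before scheduling a task, the task scheduling apparatus 102 may first determine whether the big cores in the heterogeneous hardware platform 103 have sufficient available resources, for example, by periodically sampling the performance of the heterogeneous hardware platform 103 to determine the available resources of the target CPU including big cores. When the big cores have sufficient available resources (for example, more than a threshold), the task scheduling apparatus 102 may directly schedule the task to the CPU where the big cores are located. When the available resources of the big cores are limited (for example, less than the threshold), the task scheduling apparatus 102 may perform steps S202 to S204 described above and determine, according to the predicted comprehensive cost of each CPU processing the task, whether to schedule the task to a CPU including big cores or to a CPU including small cores, thereby making task scheduling more reasonable.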
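The availability check that comes before cost-based scheduling can be sketched as follows. `dispatch`, the single `big_free` figure (a stand-in for the sampled available resources), and the `predict_costs` callable (a stand-in for the cost model of steps S202 to S204) are hypothetical. Sending a value exactly equal to the threshold down the cost-based path is also an assumption, because the text only covers "greater than" and "less than".

```python
from typing import Callable, Dict


def dispatch(task: str, big_free: float, threshold: float, big_cpu: str,
             predict_costs: Callable[[str], Dict[str, float]]) -> str:
    """Gate cost-based scheduling on big-core availability.

    big_free is the sampled available-resource figure for the big-core CPU;
    predict_costs stands in for the cost model of steps S202 to S204.
    """
    if big_free > threshold:
        # Big cores have headroom: schedule there without predicting costs.
        return big_cpu
    # Big cores are constrained: fall back to the cheapest CPU by predicted cost.
    costs = predict_costs(task)
    return min(costs, key=costs.get)
```

The cost model runs only on the constrained path, so when big cores are idle the scheduler pays nothing for cost prediction.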
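In the embodiment shown in FIG. 2, the task scheduling apparatus 102 can dynamically allocate hardware resources in the heterogeneous hardware platform 103 to the various unknown tasks it receives. Because of the additional cost, the hardware resources that the task scheduling apparatus 102 allocates to the same task at different moments may differ, which makes task scheduling more reasonable. In actual application, the task scheduling apparatus may have two working modes: an online mode and an offline mode.
In the online mode, the task scheduling apparatus 102 uses the foregoing process to provide an online service that dynamically allocates hardware resources to various unknown tasks. In the offline mode, the tasks that the heterogeneous hardware platform 103 needs to process are usually fixed, for example, fixed batch tasks (also called known tasks), so fixed CPU resources in the heterogeneous hardware platform 103 can be used to process them. In this case, the task scheduling apparatus 102 may directly schedule these tasks to pre-designated CPUs.
In a possible implementation, the task scheduling apparatus 102 may determine, based on a configuration made by a user (such as an administrator), whether to use the online mode or the offline mode for task scheduling. For example, the task scheduling system 100 may provide the configuration interface shown in FIG. 3, on which the user can specify whether the task scheduling apparatus 102 currently uses the online mode or the offline mode. Alternatively, the user may specify on the configuration interface the time periods in which the task scheduling apparatus 102 uses the online mode and those in which it uses the offline mode, so that the task scheduling apparatus 102 performs task scheduling in the specified mode in each time period according to the user's configuration.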
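Per-period mode configuration could be resolved at run time as sketched below. The window format `(start, end, mode)`, the `mode_at` helper, and defaulting to online mode outside any configured window are all assumptions; FIG. 3 is not reproduced here.

```python
from datetime import time
from typing import List, Tuple

# (start, end, mode) entries as a user might enter them on the configuration interface.
Window = Tuple[time, time, str]


def mode_at(now: time, windows: List[Window], default: str = "online") -> str:
    """Return the scheduling mode configured for the given time of day."""
    for start, end, mode in windows:
        if start <= end:
            if start <= now < end:
                return mode
        elif now >= start or now < end:  # window wraps past midnight
            return mode
    return default
```

For example, a nightly batch window `(time(22, 0), time(2, 0), "offline")` would make tasks at 23:30 go to their pre-designated CPUs, while tasks at 03:00 fall back to online, cost-based scheduling.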
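Further, for the online mode, the user may also configure, on the configuration interface, the parameters used to calculate the comprehensive cost generated by a CPU in processing a task. For example, as shown in FIG. 3, the user may configure parameters such as the normalization coefficients k (such as k1, k2, and k3 described above), the conversion overhead corresponding to each type of storage medium, and the amount of data in each data page, so that the task scheduling apparatus 102 calculates costs based on the user-configured parameters.
The embodiment shown in FIG. 2 mainly describes a specific implementation of task scheduling at CPU granularity. In other possible embodiments, the task scheduling apparatus 102 may instead schedule at processor-core granularity, that is, determine the processor core to which the task is scheduled. The following describes, with reference to FIG. 4, another task scheduling method provided in the embodiments of this application. This method can still be applied to the task scheduling system 100 shown in FIG. 1. As shown in FIG. 4, the method may specifically include the following steps.
S401: The upper-layer application 101 generates a task to be processed.
S402: The task scheduling apparatus 102 predicts the comprehensive cost generated by a target processor core, among the multiple processor cores included in the heterogeneous hardware platform 103, in processing the task. The comprehensive cost is obtained from the IO cost, the computing cost, and the additional cost generated by the target processor core in processing the task, where the additional cost is determined according to the demand characteristics of the task or the running state of the target processor core.
Different from the embodiment shown in FIG. 2, the task scheduling apparatus 102 may separately predict the IO cost, computing cost, and additional cost generated by each processor core in the heterogeneous hardware platform 103 in processing the task, where the additional cost may be determined according to the demand characteristics of the task or the running state of the processor core.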
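As an example, the task scheduling apparatus 102 may calculate the IO cost generated by each processor core in processing the task according to the following formula (5):
IO cost = k1 * M * p * T    Formula (5)
Here, k1 is a normalization coefficient that limits the calculated IO cost to the range 0 to 1; its value may be a positive number less than 1. M represents the memory bound, which indicates how strongly the task depends on memory; its value range may be (0, 1). p is the number of data pages the processor core needs to access when processing the task, and is a positive integer greater than or equal to 1. T is the conversion overhead, a positive number greater than 0, which may be determined according to the type of storage medium the processor core needs to read; different types of storage media correspond to different conversion overheads.
The task scheduling apparatus 102 may calculate the computing cost generated by each processor core in processing the task according to the following formula (6):
Computing cost = k2 * C * I    Formula (6)
Here, k2 is a normalization coefficient that limits the calculated computing cost to the range 0 to 1; its value may be a positive number less than 1. C represents the core bound, which indicates how strongly the task depends on the processor core; its value range may be (0, 1). I is the total number of instructions the task needs to execute, and is a positive integer greater than or equal to 1.
The task scheduling apparatus 102 may calculate the additional cost generated by each processor core in processing the task according to the following formula (7):
Additional cost = k3 * (Commands′ + Commands) / freq    Formula (7)
Here, k3 is a normalization coefficient that limits the calculated additional cost to the range 0 to 1; its value may be a positive number less than 1. Commands′ represents the number of times the task requests the processor core to execute instructions, and is a positive integer greater than or equal to 1. Commands represents the number of instructions executed by the processor core in the past unit of time, and is an integer greater than or equal to 0. freq represents the operating frequency of the processor core, and is a positive integer greater than or equal to 1.
Finally, the task scheduling apparatus 102 may calculate, from the predicted IO cost, computing cost, and additional cost, the comprehensive cost generated by the target processor core in processing the task. For example, the task scheduling apparatus 102 may calculate the comprehensive cost w based on the following formula (8):
w = IO cost + computing cost + additional cost    Formula (8)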
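Formulas (5) to (8) translate directly into code. The sketch below mirrors the symbols defined above; the dataclass and function names are hypothetical, and the numbers in the usage note are made-up illustrative values, not measurements.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TaskProfile:
    memory_bound: float  # M, in (0, 1)
    core_bound: float    # C, in (0, 1)
    pages: int           # p, data pages to access (>= 1)
    instructions: int    # I, total instructions to execute (>= 1)
    requests: int        # Commands', times the task asks the core to execute (>= 1)


@dataclass
class CoreState:
    recent_commands: int  # Commands, instructions executed in the past unit of time (>= 0)
    freq: int             # operating frequency (>= 1)


def comprehensive_cost(task: TaskProfile, core: CoreState, T: float,
                       k1: float, k2: float, k3: float) -> Tuple[float, float, float, float]:
    io = k1 * task.memory_bound * task.pages * T                     # formula (5)
    compute = k2 * task.core_bound * task.instructions               # formula (6)
    extra = k3 * (task.requests + core.recent_commands) / core.freq  # formula (7)
    return io, compute, extra, io + compute + extra                  # formula (8)
```

With M = 0.5, p = 10, T = 2, k1 = 0.01, C = 0.4, I = 1000, k2 = 0.001, Commands′ = 50, Commands = 150, freq = 100, and k3 = 0.05, the three terms are 0.1, 0.4, and 0.1, so w = 0.6. Because Commands is a property of the core's recent load, the same task can score differently on the same core at different moments.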
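As an example, formulas (5) to (8), which the task scheduling apparatus 102 uses to calculate the cost of the target processor core processing the task, may be encapsulated into a cost estimation model. After obtaining a task to be processed, the task scheduling apparatus 102 can input the task into the cost estimation model, which then infers the comprehensive cost generated by each processor core in the heterogeneous hardware platform 103 in processing the task.
S403: When the comprehensive cost generated by the target processor core in processing the task meets a scheduling condition, the task scheduling apparatus 102 schedules the task to the target processor core.
As one implementation example, the task scheduling apparatus 102 may compare the comprehensive cost generated by the target processor core in processing the task with a cost threshold. If the comprehensive cost is less than the cost threshold, the task scheduling apparatus 102 may directly schedule the task to the target processor core, so that the target processor core processes the task. If the comprehensive cost is greater than or equal to the cost threshold, the task scheduling apparatus 102 may determine, from the remaining processor cores of the heterogeneous hardware platform 103, a processor core for executing the task.
As another implementation example, after predicting the comprehensive cost generated by each processor core in the heterogeneous hardware platform 103 in processing the task, the task scheduling apparatus 102 determines the target processor core for executing the task according to the comprehensive costs of the processor cores. In one implementation, the task scheduling apparatus 102 derives a threshold from the comprehensive costs of the processor cores and selects the target processor core from the processor cores whose comprehensive cost is less than the threshold; the threshold is adjustable. In another implementation, the task scheduling apparatus 102 may sort the comprehensive costs of the processor cores in ascending order and determine whether the target processor core has the smallest or a relatively small comprehensive cost. If so, the task scheduling apparatus 102 may directly schedule the task to the target processor core; if not, the task scheduling apparatus 102 may schedule the task to another processor core with the smallest or a relatively small comprehensive cost, so that that processor core processes the task.
In actual application, the task scheduling apparatus 102 may also use other manners to determine whether to schedule the task to the target processor core. This is not limited in this embodiment.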
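S404: The heterogeneous hardware platform 103 uses the target processor core to process the task provided by the task scheduling apparatus 102.
In this embodiment, the heterogeneous hardware platform 103 may execute the task using one processor core, namely the target processor core. The task may be a single-threaded task, in which case the target processor core runs the single thread allocated to the task; or it may be a multithreaded task, in which case the target processor core runs the threads allocated to the task one after another.
This embodiment mainly describes how the task scheduling apparatus 102 schedules the task to the target processor core in the heterogeneous hardware platform 103 according to the predicted comprehensive cost of each processor core. In actual application scenarios, some processor cores in the heterogeneous hardware platform 103 may be unable to take on a new task, for example, because their currently available resources are insufficient to process it. Therefore, in a further possible implementation, the task scheduling apparatus 102 may monitor the currently available resources of each processor core in the heterogeneous hardware platform 103 and filter out the processor cores whose currently available resources do not meet the requirements of the task. The task scheduling apparatus 102 then calculates the comprehensive cost only for the processor cores that remain after filtering and schedules the task on that basis. This reduces the number of processor cores for which the task scheduling apparatus 102 must calculate a comprehensive cost, and therefore the amount of computation it spends on task scheduling.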
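Filtering before scoring can be sketched as follows. Representing "available resources" and "task requirements" as a single number each is a simplifying assumption, and `schedule_on_cores` and `cost_of` are hypithetical names; `cost_of` stands in for the cost estimation model built from formulas (5) to (8). Picking the minimum after filtering follows the "smallest comprehensive cost" condition described for S403.

```python
from typing import Callable, Dict, Optional


def schedule_on_cores(free: Dict[int, float], demand: float,
                      cost_of: Callable[[int], float]) -> Optional[int]:
    """Filter out cores that cannot host the task, then pick the cheapest.

    free maps a core id to its currently available resources (hypothetical
    single number); demand is the task's requirement in the same unit.
    """
    eligible = [core for core, avail in free.items() if avail >= demand]
    if not eligible:
        return None
    # Costs are computed only for the cores that survived filtering.
    costs = {core: cost_of(core) for core in eligible}
    return min(costs, key=costs.get)
```

Core 1 in a setup like `{0: 0.6, 1: 0.1, 2: 0.4}` with demand 0.3 is never scored, which is where the saving in computation comes from.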
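Further, after completing the task, the heterogeneous hardware platform 103 may return the execution result of the task to the upper-layer application 101 through the task scheduling apparatus 102, for example, the data access result of a query task or the computation result of a computing task.
In actual application, the task scheduling system 100 shown in FIG. 1 can be applied to multiple application scenarios, for example, the database application scenario shown in FIG. 5, which may be a centralized database scenario or a distributed database scenario. In the database scenario shown in FIG. 5, the upper-layer application 101 is a database application, specifically a client that interacts with users, such as a client 130 or a client 131. Based on a user's insert, delete, update, or query operation on data in a database 110, the client 130 or the client 131 may generate a corresponding data processing task (such as a query task) and request the task scheduling apparatus 102 to schedule the data processing task to a corresponding CPU (or processor core) in the heterogeneous hardware platform 103 for execution. When executing the data processing task, the heterogeneous hardware platform 103 may use the allocated CPU to run a database engine 120, which accesses the database 110 and performs the corresponding data read, write, and computation operations.
For ease of understanding, the following describes, by way of example, the task scheduling process performed by the task scheduling apparatus 102 in a distributed database scenario. Referring to the distributed database shown in FIG. 6, the distributed database includes the database engine 120 and multiple region server (Region Server, RS) nodes; FIG. 6 uses an RS node 101 and an RS node 102 as an example. The database engine 120 divides the data stored and managed by the distributed database into multiple regions. Each region includes one or more pieces of data, and data belonging to different regions usually differs. As an example of region division, when the distributed database stores and manages each piece of data, part of the content of that piece of data may be used as its primary key, which uniquely identifies the piece of data in the distributed database. The database engine 120 can then divide the possible value range of the primary key into intervals, with each interval corresponding to one region.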
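Interval-based region lookup by primary key can be sketched with Python's `bisect` module. This assumes string keys compared in lexicographic order and regions described by their sorted lower bounds; `region_for_key` and the split points are hypothetical, and the management table in the usage note is an illustrative mapping, not the engine's actual data structure.

```python
import bisect
from typing import List


def region_for_key(key: str, split_points: List[str]) -> int:
    """Return the 0-based region index that owns a primary key.

    split_points are the sorted lower bounds of regions 1..M-1; region 0
    covers every key below split_points[0].
    """
    # bisect_right puts a key equal to a split point into the region it starts.
    return bisect.bisect_right(split_points, key)
```

With `split_points = ["g", "p"]`, the key `"kiwi"` falls in region 1, and a hypothetical management table such as `{0: "RS node 101", 1: "RS node 101", 2: "RS node 102"}` would then route it to RS node 101.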
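In addition, the database engine 120 is configured to allocate regions to the RS node 101 and the RS node 102, and the regions allocated to each RS node can be maintained through a management table created by the database engine 120. The RS node 101 and the RS node 102 perform data read and write services for different regions; for example, in FIG. 6, the RS node 101 performs data read and write services for region 1 to region N, and the RS node 102 performs data read and write services for region N+1 to region M.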
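Generally, an RS node temporarily stores data sent by the client 130 in memory, so the amount of data buffered in memory keeps growing. When the amount of data in memory reaches a threshold, the RS node may persist the in-memory data to a file system, specifically by writing it as files into a distributed file system (not shown in FIG. 6). As examples, the file system may be a distributed file system (DFS), a Hadoop distributed file system (HDFS), or the like; this is not limited in this embodiment. The file system stores data in the form of files, where a file may include one or more regions or only part of the data of one region.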
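The buffer-then-persist behaviour can be sketched as a small class. Measuring size as key length plus value length and passing a `flush` callback that stands in for writing a file to DFS or HDFS are assumptions; `RegionBuffer` is a hypothetical name, not an API of any real RS implementation.

```python
from typing import Callable, List, Tuple


class RegionBuffer:
    """In-memory write buffer of an RS node that flushes at a size threshold.

    The flush callback stands in for writing a file to DFS/HDFS; its
    signature is hypothetical.
    """

    def __init__(self, threshold_bytes: int,
                 flush: Callable[[List[Tuple[str, bytes]]], None]) -> None:
        self.threshold_bytes = threshold_bytes
        self.flush = flush
        self.rows: List[Tuple[str, bytes]] = []
        self.size = 0

    def put(self, key: str, value: bytes) -> None:
        # Buffer the write and track an approximate size (key + value length).
        self.rows.append((key, value))
        self.size += len(key) + len(value)
        if self.size >= self.threshold_bytes:
            # Persist everything buffered so far, then start an empty buffer.
            self.flush(list(self.rows))
            self.rows = []
            self.size = 0
```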
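Based on the distributed database shown in FIG. 5 and FIG. 6, refer to FIG. 7, which is a schematic flowchart of another task scheduling method according to an embodiment of this application. The method may specifically include the following steps.
S701: When starting, the client 130 creates a thread pool that includes one or more threads.
In actual application, the client 130 may also initialize some threads when starting, such as a log writer thread and a checkpoint thread. The log writer thread may be used to record the data processing operations performed on the database in a log and write the log to a corresponding storage area, such as a persistent storage area. The checkpoint thread coordinates and synchronizes the data files, control files, and redo log files of the database, so that if the database encounters an exception, the data can be restored to the correct state as of the checkpoint.
S702: The client 130 generates a query task.
In this embodiment, the client 130 may provide an interaction interface on which the user can enter an SQL statement, for example, "select * from table_name where column=value", which queries the rows in the table "table_name" whose "column" equals "value". The client 130 may generate a corresponding query task based on the SQL statement entered by the user.
Generally, user operations on a database include select, update, delete, and insert, and may also include other operations. For ease of understanding, this embodiment uses a request to query data in the database as an example.
As an implementation example of generating the query task, the client 130 may perform syntax analysis and semantic analysis on the SQL statement entered by the user. Syntax analysis checks, using the grammar rules of the SQL language, whether the SQL statement contains syntax errors; semantic analysis checks whether the semantics of the SQL statement are valid. Once both the syntax and the semantics of the SQL statement are valid, the client 130 obtains an initial plan tree, which indicates an execution plan for accessing the data. The client 130 may then optimize the initial plan tree using one or more optimizers to improve data access efficiency. Next, according to the operators included in the optimized plan tree, such as scan, aggregate, sort, and filter operators, the client 130 determines the threads that execute the operators in the plan tree, with different threads responsible for different operators, and generates a query task that indicates the multiple threads executing the operators in the plan tree. In actual application, the client 130 may also generate the query task with the help of other devices in the task scheduling system; this is not limited in this embodiment.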
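Assigning one thread per operator of an optimized plan tree can be sketched as follows. The `PlanNode` type, the `assign_threads` helper, and numbering threads bottom-up (children before parents, so producers get lower ids than their consumers) are assumptions made for illustration; the text only says that different threads execute different operators.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlanNode:
    op: str  # e.g. "scan", "filter", "aggregate", "sort"
    children: List["PlanNode"] = field(default_factory=list)


def assign_threads(root: PlanNode) -> Dict[int, str]:
    """Give each operator of an optimized plan tree its own thread id.

    Operators are visited bottom-up (children before parents), so data
    producers get lower ids than their consumers; this ordering is an
    assumption made for readability.
    """
    order: List[str] = []

    def visit(node: PlanNode) -> None:
        for child in node.children:
            visit(child)
        order.append(node.op)

    visit(root)
    return {thread_id: op for thread_id, op in enumerate(order)}
```

For a plan `sort <- aggregate <- filter <- scan`, this yields threads 0 to 3 for scan, filter, aggregate, and sort, and the resulting thread list is what the query task would carry to the scheduler.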
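S703: The client 130 sends the query task to the task scheduling apparatus 102 to request the task scheduling apparatus 102 to allocate corresponding hardware resources for the query task.
S704: The task scheduling apparatus 102 determines whether the available resources of the big cores in the heterogeneous hardware platform 103 are greater than a threshold. If yes, step S705 is performed; if no, step S706 is performed.
In this embodiment, the task scheduling apparatus 102 may continuously monitor the resource usage of each CPU in the heterogeneous hardware platform 103, so as to decide, according to the available resources in the heterogeneous hardware platform 103, whether to schedule the task directly or according to the comprehensive cost of processing it.
S705: The task scheduling apparatus 102 schedules the query task to a CPU including big cores in the heterogeneous hardware platform 103, and then performs step S709.
When the big cores have sufficient available resources, the task scheduling apparatus 102 may directly schedule the query task to a CPU including big cores for processing, so that the high computing power of the big cores improves task processing efficiency.
S706: The task scheduling apparatus 102 predicts the comprehensive cost generated by each CPU in the heterogeneous hardware platform 103 in processing the query task. The comprehensive cost of each CPU is the sum of the IO cost, the CPU cost, and the additional cost generated by that CPU in processing the query task, where the additional cost is determined according to the demand characteristics of the task and the running state of the CPU.
In this embodiment, when the available resources of the big cores are limited, the task scheduling apparatus 102 may schedule the task according to the comprehensive cost of each CPU processing the query task, which makes task scheduling more reasonable.
In a specific implementation, the task scheduling apparatus 102 may perform performance analysis on the query task to determine information such as the task type, the number of instructions per clock cycle required by the task, the clock cycle, the total number of instructions the task needs to execute, and the number of data pages the task needs to read and write. The task scheduling apparatus 102 may then use this information to calculate the IO cost, CPU cost, and additional cost generated by each CPU in the heterogeneous hardware platform 103 in processing the query task. Specifically, the task scheduling apparatus 102 may calculate these three costs based on formulas (1) to (3) in the embodiment shown in FIG. 2, and further calculate the comprehensive cost of each CPU processing the query task according to formula (4). For details, refer to the related descriptions above; they are not repeated in this embodiment.
S707: The task scheduling apparatus 102 determines, according to the costs generated by the CPUs in processing the query task, the target CPU with the smallest cost for processing the query task.
S708: The task scheduling apparatus 102 schedules the query task to the target CPU in the heterogeneous hardware platform 103.
In this embodiment, for the specific implementation in which the task scheduling apparatus schedules the query task to the target CPU according to the comprehensive cost of each CPU processing the query task, refer to the related descriptions of steps S202 and S203 in the embodiment shown in FIG. 2; details are not repeated here.
S709: The heterogeneous hardware platform 103 processes the query task and generates a query result.
The heterogeneous hardware platform 103 may execute the query task using the CPU including big cores or the target CPU described above, and generate a query result that includes the data the user requested.
In a specific implementation, the heterogeneous hardware platform 103 may run the database engine 120 on the CPU including big cores or on the target CPU, and, according to the query task, read data from the memory of the RS node 101 or the RS node 102 in the distributed database (and perform the corresponding data operations) to obtain the query result. When the memory of the RS node does not contain the data the user needs, the database engine 120 may further access the file system connected to the distributed database, read the data from the file system (and perform the corresponding data operations), and obtain the query result.
S710: The heterogeneous hardware platform 103 returns the query result corresponding to the query task to the task scheduling apparatus 102.
S711: The task scheduling apparatus 102 feeds the query result back to the database application, so that the database application provides the query result to the user.
In this way, the user's data query requirements on the distributed database are met, and the foregoing process also schedules tasks reasonably, which improves the efficiency of returning query results to the user.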
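The foregoing describes in detail the task scheduling method provided in this application with reference to FIG. 1 to FIG. 7. The following describes the task scheduling apparatus and the computing device provided in this application with reference to FIG. 8, FIG. 9, and FIG. 10.
Based on the same inventive concept as the foregoing method, an embodiment of this application further provides a task scheduling apparatus. FIG. 8 is a schematic diagram of a task scheduling apparatus according to an embodiment of this application. The task scheduling apparatus 800 shown in FIG. 8 is applied to a heterogeneous hardware platform. The heterogeneous hardware platform includes multiple CPUs, some of which include big cores and some of which include small cores.
As shown in FIG. 8, the task scheduling apparatus 800 includes:
an obtaining module 801, configured to obtain a task to be processed;
a prediction module 802, configured to predict a comprehensive cost generated by a target CPU among the multiple CPUs in processing the task, where the comprehensive cost is obtained from an input/output (IO) cost, a CPU cost, and an additional cost generated by the target CPU in processing the task, and the additional cost is determined according to demand characteristics of the task or a running state of the target CPU; and
a scheduling module 803, configured to schedule the task to the target CPU when the comprehensive cost generated by the target CPU in processing the task meets a scheduling condition.
In a possible implementation, the target CPU includes big cores, and the scheduling module 803 is configured to:
determine, according to the comprehensive cost generated by the target CPU in processing the task, a priority of scheduling the task to the target CPU; and
schedule the task to the target CPU when the priority of scheduling the task to the target CPU is a first priority.
In a possible implementation, the target CPU includes small cores, and the scheduling module 803 is configured to:
determine, according to the comprehensive cost generated by the target CPU in processing the task, a priority of scheduling the task to the target CPU; and
schedule the task to the target CPU when the priority of scheduling the task to the target CPU is a second priority, where the second priority is lower than the first priority.
In a possible implementation, the target CPU includes big cores, the task is a multithreaded task, and the scheduling module 803 is configured to:
determine, according to the comprehensive cost generated by the target CPU in processing the task, a priority of scheduling the task to the target CPU; and
schedule the task to the target CPU and a CPU including small cores when the priority of scheduling the task to the target CPU is a third priority, where the third priority is lower than the first priority and higher than the second priority.
In a possible implementation, the prediction module 802 is configured to:
predict the comprehensive cost generated by the target CPU among the multiple CPUs in processing the task when the available resources of a CPU including big cores are less than a threshold.
In a possible implementation, the additional cost is obtained from any one or more of the following: the number of requests made by the task to the target CPU, the number of times the target CPU is requested per unit time, the operating frequency of the target CPU, and the number of processor cores included in the target CPU.
In a possible implementation, the task scheduling apparatus 800 further includes:
a presentation module 804, configured to present a configuration interface; and
a configuration module 805, configured to configure a task scheduling mode of the heterogeneous hardware platform in response to an operation performed by a user on the configuration interface, where the task scheduling mode includes an online mode or an offline mode.
The task scheduling apparatus 800 provided in this embodiment corresponds to the task scheduling method in the embodiment shown in FIG. 2. Therefore, for the functions of the modules in this embodiment and their technical effects, refer to the related descriptions in the embodiment shown in FIG. 2; details are not repeated here.
In addition, based on the same inventive concept as the foregoing method, an embodiment of this application further provides another task scheduling apparatus. FIG. 9 is a schematic diagram of a task scheduling apparatus according to an embodiment of this application. The task scheduling apparatus 900 shown in FIG. 9 is applied to a heterogeneous hardware platform that includes multiple processor cores, including big cores and small cores. The task scheduling apparatus 900 includes:
an obtaining module 901, configured to obtain a task to be processed;
a prediction module 902, configured to predict a comprehensive cost generated by a target processor core among the multiple processor cores in processing the task, where the comprehensive cost is obtained from an input/output (IO) cost, a computing cost, and an additional cost generated by the target processor core in processing the task, and the additional cost is determined according to demand characteristics of the task or a running state of the target processor core; and
a scheduling module 903, configured to schedule the task to the target processor core when the comprehensive cost generated by the target processor core in processing the task meets a scheduling condition.
In a possible implementation, the scheduling module 903 is configured to schedule the task to the target processor core when the comprehensive cost generated by the target processor core in processing the task is the smallest among the comprehensive costs generated by the multiple processor cores in processing the task.
The task scheduling apparatus 900 provided in this embodiment corresponds to the task scheduling method in the embodiment shown in FIG. 4. Therefore, for the functions of the modules in this embodiment and their technical effects, refer to the related descriptions in the embodiment shown in FIG. 4; details are not repeated here.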
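In addition, an embodiment of this application further provides a computing device. As shown in FIG. 10, the computing device 1000 may include a communication interface 1010 and a processor 1020. Optionally, the computing device 1000 may further include a memory 1030, which may be disposed inside or outside the computing device 1000. For example, all actions performed by the task scheduling apparatus in the embodiments shown in FIG. 2 and FIG. 4 may be implemented by the processor 1020. The processor 1020 may obtain a task scheduling request through the communication interface 1010 and is configured to implement any method performed in FIG. 2 and FIG. 4. During implementation, the steps of the processing flow may be completed by integrated logic circuits of hardware in the processor 1020 or by instructions in the form of software, to perform the methods in FIG. 2 and FIG. 4. For brevity, details are not described here again. Program code executed by the processor 1020 to implement the foregoing methods may be stored in the memory 1030. The memory 1030 is connected to the processor 1020, for example, through coupling.
Some features of the embodiments of this application may be completed or supported by the processor 1020 executing program instructions or software code in the memory 1030. The software components loaded in the memory 1030 may be summarized functionally or logically, for example, as the prediction module 802 and the scheduling module 803 shown in FIG. 8, or the prediction module 902 and the scheduling module 903 shown in FIG. 9. The functions of the obtaining module 801 shown in FIG. 8 or the obtaining module 901 shown in FIG. 9 may be implemented by the communication interface 1010.
Any communication interface in the embodiments of this application may be a circuit, a bus, a transceiver, or any other apparatus that can be used for information exchange. For example, through the communication interface 1010, the computing device 1000 may exchange information with another apparatus, such as a device connected to the computing device 1000.
The processor in the embodiments of this application may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Coupling in the embodiments of this application is an indirect coupling or communication connection between apparatuses, units, or modules; it may be electrical, mechanical, or in another form, and is used for information exchange between the apparatuses, units, or modules.
The processor may operate in cooperation with the memory. The memory may be a non-volatile memory, such as a hard disk drive or a solid-state drive, or a volatile memory, such as a random access memory. The memory may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The specific connection medium among the communication interface, the processor, and the memory is not limited in the embodiments of this application. For example, the memory, the processor, and the communication interface may be connected through a bus, which may be classified into an address bus, a data bus, a control bus, and the like.
Based on the foregoing embodiments, an embodiment of this application further provides a chip, including a power supply circuit and a processing circuit. The power supply circuit is configured to supply power to the processing circuit. The processing circuit is configured to obtain a task to be processed and predict a comprehensive cost generated by a target CPU, among multiple CPUs included in a heterogeneous hardware platform, in processing the task, where some of the multiple CPUs include big cores and some include small cores, the comprehensive cost is obtained from an input/output (IO) cost, a CPU cost, and an additional cost generated by the target CPU in processing the task, and the additional cost is determined according to demand characteristics of the task or a running state of the target CPU. When the comprehensive cost generated by the target CPU in processing the task meets a scheduling condition, the processing circuit schedules the task to the target CPU.
For example, the power supply circuit includes but is not limited to at least one of the following: a power supply subsystem, a power management chip, a power consumption management processor, or a power consumption management control circuit.
Based on the foregoing embodiments, an embodiment of this application further provides a computer storage medium storing a software program. When read and executed by one or more computing devices, the software program can implement the method performed by the task scheduling apparatus 102 provided in any one or more of the foregoing embodiments. The computer storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
A person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of a hardware-only embodiment, a software-only embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, and optical memory) that contain computer-usable program code.
This application is described with reference to the flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this application. It should be understood that computer program instructions may be used to implement each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a dedicated computer, an embedded processor, or another programmable device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable task scheduling device produce an apparatus for implementing the functions specified in one or more processes in the flowcharts and/or one or more blocks in the block diagrams.
These computer program instructions may alternatively be stored in a computer-readable memory that can direct a computer or another programmable device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more processes in the flowcharts and/or one or more blocks in the block diagrams.
These computer program instructions may alternatively be loaded onto a computer or another programmable device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device provide steps for implementing the functions specified in one or more processes in the flowcharts and/or one or more blocks in the block diagrams.
In the specification, claims, and accompanying drawings of this application, the terms "first", "second", and the like are intended to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable in appropriate circumstances; this is merely the manner used in the embodiments of this application to distinguish between objects having the same attribute.
Obviously, a person skilled in the art can make various modifications and variations to the embodiments of this application without departing from their scope. This application is intended to cover these modifications and variations provided that they fall within the scope of the claims of this application and their equivalent technologies.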

Claims (22)

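  1. A task scheduling method, wherein the method is applied to a heterogeneous hardware platform, the heterogeneous hardware platform comprises multiple CPUs, some of the multiple CPUs comprise big cores, some of the multiple CPUs comprise small cores, and the method comprises:
    obtaining a task to be processed;
    predicting a comprehensive cost generated by a target CPU among the multiple CPUs in processing the task, wherein the comprehensive cost is obtained from an input/output (IO) cost, a CPU cost, and an additional cost generated by the target CPU in processing the task, and the additional cost is determined according to demand characteristics of the task or a running state of the target CPU; and
    when the comprehensive cost generated by the target CPU in processing the task meets a scheduling condition, scheduling the task to the target CPU.
  2. The method according to claim 1, wherein the target CPU comprises big cores, and the scheduling the task to the target CPU when the comprehensive cost generated by the target CPU in processing the task meets a scheduling condition comprises:
    determining, according to the comprehensive cost generated by the target CPU in processing the task, a priority of scheduling the task to the target CPU; and
    when the priority of scheduling the task to the target CPU is a first priority, scheduling the task to the target CPU.
  3. The method according to claim 1, wherein the target CPU comprises small cores, and the scheduling the task to the target CPU when the comprehensive cost generated by the target CPU in processing the task meets a scheduling condition comprises:
    determining, according to the comprehensive cost generated by the target CPU in processing the task, a priority of scheduling the task to the target CPU; and
    when the priority of scheduling the task to the target CPU is a second priority, scheduling the task to the target CPU, wherein the second priority is lower than a first priority.
  4. The method according to claim 1, wherein the target CPU comprises big cores, the task is a multithreaded task, and the scheduling the task to the target CPU when the comprehensive cost generated by the target CPU in processing the task meets a scheduling condition comprises:
    determining, according to the comprehensive cost generated by the target CPU in processing the task, a priority of scheduling the task to the target CPU; and
    when the priority of scheduling the task to the target CPU is a third priority, scheduling the task to the target CPU and a CPU comprising small cores, wherein the third priority is lower than a first priority and higher than a second priority.
  5. The method according to any one of claims 1 to 4, wherein the predicting a comprehensive cost generated by a target CPU among the multiple CPUs in processing the task comprises:
    when available resources of a CPU comprising big cores are less than a threshold, predicting the comprehensive cost generated by the target CPU among the multiple CPUs in processing the task.
  6. The method according to any one of claims 1 to 5, wherein the additional cost is obtained from any one or more of: a number of requests made by the task to the target CPU, a number of times the target CPU is requested per unit time, an operating frequency of the target CPU, and a number of processor cores comprised in the target CPU.
  7. The method according to any one of claims 1 to 6, wherein the method further comprises:
    presenting a configuration interface; and
    configuring a task scheduling mode of the heterogeneous hardware platform in response to an operation performed by a user on the configuration interface, wherein the task scheduling mode comprises an online mode or an offline mode.
  8. A task scheduling method, wherein the method is applied to a heterogeneous hardware platform, the heterogeneous hardware platform comprises multiple processor cores, the multiple processor cores comprise big cores and small cores, and the method comprises:
    obtaining a task to be processed;
    predicting a comprehensive cost generated by a target processor core among the multiple processor cores in processing the task, wherein the comprehensive cost is obtained from an input/output (IO) cost, a computing cost, and an additional cost generated by the target processor core in processing the task, and the additional cost is determined according to demand characteristics of the task or a running state of the target processor core; and
    when the comprehensive cost generated by the target processor core in processing the task meets a scheduling condition, scheduling the task to the target processor core.
  9. The method according to claim 8, wherein the scheduling the task to the target processor core when the comprehensive cost generated by the target processor core in processing the task meets a scheduling condition comprises:
    when the comprehensive cost generated by the target processor core in processing the task is the smallest among the comprehensive costs generated by the multiple processor cores in processing the task, scheduling the task to the target processor core.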
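  10. A task scheduling apparatus, wherein the apparatus is applied to a heterogeneous hardware platform, the heterogeneous hardware platform comprises multiple CPUs, some of the multiple CPUs comprise big cores, some of the multiple CPUs comprise small cores, and the apparatus comprises:
    an obtaining module, configured to obtain a task to be processed;
    a prediction module, configured to predict a comprehensive cost generated by a target CPU among the multiple CPUs in processing the task, wherein the comprehensive cost is obtained from an input/output (IO) cost, a CPU cost, and an additional cost generated by the target CPU in processing the task, and the additional cost is determined according to demand characteristics of the task or a running state of the target CPU; and
    a scheduling module, configured to schedule the task to the target CPU when the comprehensive cost generated by the target CPU in processing the task meets a scheduling condition.
  11. The apparatus according to claim 10, wherein the target CPU comprises big cores, and the scheduling module is configured to:
    determine, according to the comprehensive cost generated by the target CPU in processing the task, a priority of scheduling the task to the target CPU; and
    schedule the task to the target CPU when the priority of scheduling the task to the target CPU is a first priority.
  12. The apparatus according to claim 10, wherein the target CPU comprises small cores, and the scheduling module is configured to:
    determine, according to the comprehensive cost generated by the target CPU in processing the task, a priority of scheduling the task to the target CPU; and
    schedule the task to the target CPU when the priority of scheduling the task to the target CPU is a second priority, wherein the second priority is lower than a first priority.
  13. The apparatus according to claim 10, wherein the target CPU comprises big cores, the task is a multithreaded task, and the scheduling module is configured to:
    determine, according to the comprehensive cost generated by the target CPU in processing the task, a priority of scheduling the task to the target CPU; and
    schedule the task to the target CPU and a CPU comprising small cores when the priority of scheduling the task to the target CPU is a third priority, wherein the third priority is lower than a first priority and higher than a second priority.
  14. The apparatus according to any one of claims 10 to 13, wherein the prediction module is configured to predict the comprehensive cost generated by the target CPU among the multiple CPUs in processing the task when available resources of a CPU comprising big cores are less than a threshold.
  15. The apparatus according to any one of claims 10 to 14, wherein the additional cost is obtained from any one or more of: a number of requests made by the task to the target CPU, a number of times the target CPU is requested per unit time, an operating frequency of the target CPU, and a number of processor cores comprised in the target CPU.
  16. The apparatus according to any one of claims 10 to 15, wherein the apparatus further comprises:
    a presentation module, configured to present a configuration interface; and
    a configuration module, configured to configure a task scheduling mode of the heterogeneous hardware platform in response to an operation performed by a user on the configuration interface, wherein the task scheduling mode comprises an online mode or an offline mode.
  17. A task scheduling apparatus, wherein the apparatus is applied to a heterogeneous hardware platform, the heterogeneous hardware platform comprises multiple processor cores, the multiple processor cores comprise big cores and small cores, and the apparatus comprises:
    an obtaining module, configured to obtain a task to be processed;
    a prediction module, configured to predict a comprehensive cost generated by a target processor core among the multiple processor cores in processing the task, wherein the comprehensive cost is obtained from an input/output (IO) cost, a computing cost, and an additional cost generated by the target processor core in processing the task, and the additional cost is determined according to demand characteristics of the task or a running state of the target processor core; and
    a scheduling module, configured to schedule the task to the target processor core when the comprehensive cost generated by the target processor core in processing the task meets a scheduling condition.
  18. The apparatus according to claim 17, wherein the scheduling module is configured to schedule the task to the target processor core when the comprehensive cost generated by the target processor core in processing the task is the smallest among the comprehensive costs generated by the multiple processor cores in processing the task.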
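  19. A computing device, wherein the computing device comprises a processor and a memory; and
    the processor is configured to execute instructions stored in the memory, so that the computing device performs the method according to any one of claims 1 to 7, or performs the method according to claim 8 or 9.
  20. A task scheduling system, wherein the task scheduling system comprises an upper-layer application, the task scheduling apparatus according to any one of claims 10 to 18, and the heterogeneous hardware platform according to any one of claims 1 to 9;
    wherein the upper-layer application is configured to generate a task to be processed.
  21. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and when the instructions are run on at least one computing device, the at least one computing device is enabled to perform the method according to any one of claims 1 to 9.
  22. A computer program product including instructions, wherein when the computer program product is run on at least one computing device, the at least one computing device is enabled to perform the method according to any one of claims 1 to 9.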
PCT/CN2023/097910 2022-06-02 2023-06-02 Task scheduling method, apparatus, and system, and related device WO2023232127A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210625934 2022-06-02
CN202210625934.9 2022-06-02
CN202210901085.5A CN117215732A (zh) 2022-06-02 2022-07-28 Task scheduling method, apparatus, and system, and related device
CN202210901085.5 2022-07-28

Publications (1)

Publication Number Publication Date
WO2023232127A1 true WO2023232127A1 (zh) 2023-12-07

Family

ID=89025695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/097910 WO2023232127A1 (zh) 2022-06-02 2023-06-02 Task scheduling method, apparatus, and system, and related device

Country Status (1)

Country Link
WO (1) WO2023232127A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100262966A1 (en) * 2009-04-14 2010-10-14 International Business Machines Corporation Multiprocessor computing device
US20150040136A1 (en) * 2013-08-01 2015-02-05 Texas Instruments, Incorporated System constraints-aware scheduler for heterogeneous computing architecture
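CN104778083A (zh) * 2015-03-27 2015-07-15 Huawei Technologies Co., Ltd. Method and apparatus for task scheduling on a heterogeneous multi-core reconfigurable computing platform
CN105117281A (zh) * 2015-08-24 2015-12-02 Harbin Engineering University Task scheduling method based on task request signals and processor core execution cost values
CN107797853A (zh) * 2016-09-07 2018-03-13 Shenzhen ZTE Microelectronics Technology Co., Ltd. Task scheduling method and apparatus, and multi-core processor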

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23815304

Country of ref document: EP

Kind code of ref document: A1