CN113568721A - Task scheduling method and related equipment - Google Patents
- Publication number
- CN113568721A (application CN202010360908.9A)
- Authority
- CN
- China
- Prior art keywords
- task
- resource
- cluster
- tasks
- resources
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Abstract
The application provides a task scheduling method and related equipment. The method comprises the following steps: a task scheduler receives a first task; the task scheduler sends the first task to a cluster according to a first resource proportion required to execute the task, the cluster having a target node whose remaining resources are in a second resource proportion, so that the cluster can execute the first task by using the target node, where the first resource proportion and the second resource proportion belong to the same resource proportion range and are the same or different. The method can improve the cluster resource utilization rate and the node resource utilization rate.
Description
Technical Field
The invention relates to the technical field of cloud computing, in particular to a task scheduling method and related equipment.
Background
With the advent of the digital age, the demand for network services has grown increasingly complex and diverse, and service providers need to provide various types of resources to meet this growing demand.
The platform as a service (PaaS) cloud service platform can integrate various existing service capabilities: downwards, it measures basic service capabilities according to service capability requirements and calls hardware resources through the application programming interface (API) provided by infrastructure as a service (IaaS); upwards, it provides service scheduling center services, monitors the various resources of the platform in real time, and opens these resources to users through APIs. Because the PaaS cloud service platform has the advantages of low cost, high availability, easy maintenance, and large load capacity, and can be deployed quickly to realize services and reduce costs, deploying services on a PaaS cloud service platform has become a common trend and consensus in the industry.
At present, a batch processing system (e.g., the Volcano batch scheduling system) is mostly adopted for scheduling services on a PaaS cloud service platform (e.g., services on a Kubernetes container orchestration cluster). When a job requires multiple types of resources, such as central processing unit (CPU) resources, graphics processing unit (GPU) resources, and memory resources, the existing scheduling method causes resource imbalance on a single node, leaves the resource utilization rate of the whole cluster low, and can even cause scheduling failure.
Therefore, how to avoid resource imbalance on a single node and improve the resource utilization rate of the cluster and of each node is a problem to be solved urgently at present.
Disclosure of Invention
The embodiment of the invention discloses a task scheduling method and related equipment, which can avoid resource imbalance on a single node and improve the utilization rate of cluster resources.
In a first aspect, the present application provides a task scheduling method, where the method includes: a task scheduler receives a first task; the task scheduler sends the first task to a cluster according to a first resource proportion required to execute the first task, the cluster having a target node whose remaining resources are in a second resource proportion, so that the cluster can execute the first task by using the target node, where the first resource proportion and the second resource proportion belong to the same resource proportion range, and the first resource proportion and the second resource proportion are the same or different.
In the scheme provided by the application, the task scheduler analyzes the received first task to determine the first resource proportion required to execute it, and then executes the task by using the target node in the cluster whose remaining resources are in the second resource proportion. This avoids having the task scheduler issue tasks simply in the order in which they were received, so the resource utilization rate of the cluster can be improved.
With reference to the first aspect, in a possible implementation manner of the first aspect, a task scheduler binds the first task with a resource object pod, where the pod is deployed on the target node; the cluster sends the first task to a pod bound with the task.
In the scheme provided by the application, the task scheduler binds the first task with the pod, and directly allocates the first task to the node where the pod is deployed when allocating the task, so that the task can be smoothly executed.
With reference to the first aspect, in a possible implementation manner of the first aspect, the task scheduler determines the pod bound to the first task according to the first resource proportion and the resource priority.
In the scheme provided by the application, the task scheduler determines the pod bound with the first task through the first resource proportion and the resource priority required by the task execution, and ensures that the task with higher priority can be preferentially allocated with the pod, thereby ensuring that the first task can be smoothly executed.
With reference to the first aspect, in a possible implementation manner of the first aspect, the resource priority is a priority order over at least two of the following resources: rare resources, central processing unit (CPU), memory, and disk.
In the scheme provided by the application, rare resources have the highest priority because of their higher manufacturing cost, while memory and disk, being of lower value, have a lower priority than rare resources and the CPU.
With reference to the first aspect, in a possible implementation manner of the first aspect, the rare resource includes one or more of the following: a graphics processing unit (GPU), an embedded neural network processing unit (NPU), and a field programmable gate array (FPGA).
With reference to the first aspect, in a possible implementation manner of the first aspect, when the resource required for executing the first task includes a rare resource, the task scheduler sends the first task to a cluster having the rare resource, so that the cluster executes the first task by using the target node having the rare resource.
In the scheme provided by the application, the task scheduler identifies the resources required by the first task, so that the first task is guaranteed to be sent to a target node capable of providing the corresponding resources and can be executed smoothly.
With reference to the first aspect, in a possible implementation manner of the first aspect, a task scheduler receives a plurality of tasks including the first task; and the task scheduler divides the tasks into a plurality of task queues according to the resource proportion required by executing each task in the tasks, wherein each task queue in the task queues corresponds to one resource proportion.
In the scheme provided by the application, the task scheduler divides the task queues according to the resource proportion required by each task, so that the received tasks can be divided into the task queues with different resource proportions, and the utilization rate of cluster resources can be further improved.
With reference to the first aspect, in a possible implementation manner of the first aspect, the task scheduler screens a plurality of tasks from the plurality of task queues according to a resource proportion, so as to send the screened plurality of tasks to the cluster and execute the screened plurality of tasks by using the target node by the cluster, a difference between remaining resources of the target node and resources required by the screened plurality of tasks is less than or equal to a threshold, and the screened plurality of tasks include the first task.
In the scheme provided by the application, the task scheduler obtains a plurality of tasks through screening to ensure that the difference value between the residual resources of the target node and the resources required by the plurality of screened tasks is less than or equal to a threshold value, so that the cluster resource utilization rate and the node resource utilization rate are improved.
With reference to the first aspect, in a possible implementation manner of the first aspect, when some resources of the target node are completely occupied and at least one type of resource remains, the task scheduler replaces a task occupying a large amount of resources with a task from the same task queue that requires the same resource ratio but occupies a smaller amount of resources.
In the scheme provided by the application, when the task scheduler allocates the tasks, if the resources of the target node are found to be not fully utilized, the task replacement strategy is used, and the tasks with the same proportion and smaller resource occupation amount are used for replacing the tasks with the larger resource occupation amount, so that the resource utilization rate of the target node can be improved, and further the resource utilization rate of the cluster is improved.
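A minimal sketch of this replacement strategy, under illustrative assumptions not taken from the patent: tasks are carried as (name, demand) pairs, "size" is measured as total demand, and the same resource ratio is guaranteed by drawing the replacement from the same queue.

```python
def replace_task(scheduled, pending, left):
    """If at least one resource type on the node is exhausted while
    another still has capacity, swap the scheduled task with the largest
    total demand for a smaller pending task from the same queue (hence
    the same resource ratio), freeing room on the exhausted resource."""
    exhausted = any(v == 0 for v in left.values())
    idle = any(v > 0 for v in left.values())
    if not (exhausted and idle) or not scheduled:
        return scheduled, left  # node already fully used, or nothing to swap
    big = max(scheduled, key=lambda t: sum(t[1].values()))
    smaller = [t for t in pending if sum(t[1].values()) < sum(big[1].values())]
    if not smaller:
        return scheduled, left
    small = min(smaller, key=lambda t: sum(t[1].values()))
    new_left = {r: left[r] + big[1].get(r, 0) - small[1].get(r, 0) for r in left}
    return [small if t == big else t for t in scheduled], new_left

# e.g. a 6 CPU / 4 GB task consumes all CPUs while memory idles;
# swapping in a same-ratio 3 CPU / 2 GB task frees 3 CPUs and 2 GB
scheduled, left = replace_task(
    [("D", {"cpu": 6, "mem": 4})],
    [("A", {"cpu": 3, "mem": 2})],
    {"cpu": 0, "mem": 2},
)
print(scheduled, left)
```

The freed capacity can then be offered to further pending tasks, which is how the replacement raises the target node's resource utilization.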
In a second aspect, the present application provides a method for task scheduling, including: a task scheduler receives a plurality of tasks; the task scheduler divides the tasks into a plurality of task queues according to a resource proportion required by executing each task in the tasks, wherein each task queue of the task queues corresponds to one resource proportion; and the task scheduler screens at least one task from the task queues according to the resource proportion of the task queues, and the difference value between the residual resource of the target node and the total resource required for executing the at least one task is smaller than or equal to a threshold value, so that the at least one task is sent to the cluster and the cluster executes the at least one task by using the target node.
In the scheme provided by the application, the task scheduler divides the received tasks according to the resource proportion required by the tasks to obtain a plurality of task queues, and then screens out the tasks of which the difference value between the required resources and the residual resources of the target nodes is less than or equal to the threshold value, so that the cluster resources and the node resources can be fully utilized, and the utilization rate of the cluster resources and the utilization rate of the node resources can be improved.
In a third aspect, the present application provides a task scheduling apparatus, including: a receiving unit configured to receive a first task; and a processing unit configured to send the first task to a cluster according to a first resource proportion required to execute the first task, the cluster having a target node whose remaining resources are in a second resource proportion, so that the cluster can execute the first task by using the target node, where the first resource proportion and the second resource proportion belong to the same resource proportion range, and the first resource proportion and the second resource proportion are the same or different.
With reference to the third aspect, in a possible implementation manner of the third aspect, the processing unit is further configured to bind the first task with a resource object pod, where the pod is deployed on the target node; the cluster sends the first task to a pod bound with the task.
With reference to the third aspect, in a possible implementation manner of the third aspect, the processing unit is further configured to determine a pod bound to the first task according to the first resource proportion and the resource priority.
With reference to the third aspect, in a possible implementation manner of the third aspect, the resource priority is a priority order over at least two of the following resources: rare resources, central processing unit (CPU), memory, and disk.
With reference to the third aspect, in a possible implementation manner of the third aspect, the rare resource includes one or more of the following: a graphics processing unit (GPU), an embedded neural network processing unit (NPU), and a field programmable gate array (FPGA).
With reference to the third aspect, in a possible implementation manner of the third aspect, the processing unit is further configured to, when a resource required for executing the first task includes a rare resource, send the first task to a cluster having the rare resource, so that the cluster executes the first task by using the target node having the rare resource.
With reference to the third aspect, in a possible implementation manner of the third aspect, the receiving unit is further configured to receive a plurality of tasks including the first task; the processing unit is further configured to divide the plurality of tasks into a plurality of task queues according to a resource proportion required for executing each of the plurality of tasks, where each of the plurality of task queues corresponds to one resource proportion.
With reference to the third aspect, in a possible implementation manner of the third aspect, the processing unit is further configured to screen a plurality of tasks from the plurality of task queues according to a resource proportion, so as to send the screened plurality of tasks to the cluster and execute the screened plurality of tasks by using the target node by the cluster, where a difference between remaining resources of the target node and resources required by the screened plurality of tasks is smaller than or equal to a threshold, and the screened plurality of tasks includes the first task.
In a fourth aspect, the present application provides a task scheduling apparatus, including: a receiving unit for receiving a plurality of tasks; the processing unit is used for dividing the tasks into a plurality of task queues according to the resource proportion required by executing each task in the tasks, and each task queue in the task queues corresponds to one resource proportion; and screening at least one task from the task queues according to the resource proportion of the task queues, wherein the difference value between the residual resource of the target node and the total resource required for executing the at least one task is smaller than or equal to a threshold value, so that the at least one task is sent to the cluster and the cluster executes the at least one task by using the target node.
In a fifth aspect, the present application provides a computing device comprising a processor and a memory, the processor executing computer instructions stored by the memory to cause the computing device to perform the first aspect and the method in combination with any one implementation manner of the first aspect.
In a sixth aspect, the present application provides a computing device comprising a processor and a memory, the processor executing computer instructions stored by the memory to cause the computing device to perform the second aspect and the method in combination with any one implementation manner of the second aspect.
In a seventh aspect, the present application provides a computer storage medium storing computer instructions for implementing the first aspect and a method combining any one of the implementation manners of the first aspect.
In an eighth aspect, the present application provides a computer storage medium storing computer instructions for implementing the second aspect and a method combining any one of the implementation manners of the second aspect.
In a ninth aspect, the present application provides a computer program product comprising computer instructions for implementing the first aspect and a method incorporating any one of the implementation manners of the first aspect.
In a tenth aspect, the present application provides a computer program product comprising computer instructions for implementing the second aspect described above and a method incorporating any one of the implementations of the second aspect described above.
Drawings
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic diagram of resource allocation provided in an embodiment of the present application;
FIG. 3 is a diagram of a system architecture provided by an embodiment of the present application;
fig. 4 is a flowchart illustrating a task scheduling method according to an embodiment of the present application;
FIG. 5 is a diagram illustrating a screening scheduling task according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a task alternative provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of a task scheduling apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
First, a part of words and related technologies referred to in the present application will be explained with reference to the accompanying drawings so as to be easily understood by those skilled in the art.
A job (job) refers to a collection of program instances that need to be executed to complete a particular computing service, typically corresponding to a set of processes, containers, or other runtime entities on one or more computers.
Task refers to an individual instance in a collection of program instances within a job, typically corresponding to a process, container, or other runtime entity on a computer.
The resource (resource) is the sum of the computing, storage, network, etc. resources provided by the cloud that are needed to perform the task.
Yet Another Resource Negotiator (YARN) is the next-generation Hadoop resource manager, a universal resource management system that provides unified resource management and scheduling for upper-layer applications. Its basic idea is to split the two main functions of the JobTracker (resource management and job scheduling/monitoring) into separate components: a global resource manager (ResourceManager) and a per-application manager (ApplicationMaster).
Dominant Resource Fairness (DRF) is a general max-min fair allocation policy for multiple resource types. Its main idea is that in a multi-resource environment, a user's resource allocation is determined by the user's dominant share: the largest share that any one of the resource types already allocated to the user occupies of the cluster total. DRF has four main properties: sharing incentives, strategy-proofness, Pareto efficiency, and envy-freeness. DRF provides sharing incentives by ensuring that the system's resources are statically and evenly divided among users, so no user can obtain more resources than others; users cannot obtain a larger resource guarantee by misreporting their resource needs (strategy-proofness); DRF allocates all available resources, and no user's allocation can be increased without decreasing another's (Pareto efficiency); and no user would prefer another user's allocation to their own (envy-freeness).
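The dominant share is easy to state concretely. The following minimal sketch (illustrative, not from the patent; resource names are assumptions) computes it for the per-task demands used in the running example below:

```python
def dominant_share(allocated, capacity):
    """Dominant share: the largest fraction of the cluster total that
    any single resource type allocated to the user occupies."""
    return max(allocated[r] / capacity[r] for r in allocated)

capacity = {"cpu": 9, "mem": 18}
# one task of user A (1 CPU, 4 GB): memory dominates, share 4/18 = 2/9
print(dominant_share({"cpu": 1, "mem": 4}, capacity))
# one task of user B (3 CPU, 1 GB): CPU dominates, share 3/9 = 1/3
print(dominant_share({"cpu": 3, "mem": 1}, capacity))
```

DRF then always serves the user whose dominant share is currently smallest.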
Fig. 1 shows a possible application scenario of an embodiment of the present application. In this scenario, resource manager 120 is configured to receive jobs (also referred to as applications) submitted by a plurality of client nodes 110, such as jobs submitted by client node 1110 and client node 1120. An application client is deployed in each client node, e.g. application client 1111 in client node 1110, and is used to submit jobs to resource manager 120. Resource manager 120 is responsible for resource management and allocation across all nodes in cluster 130, which includes a plurality of nodes such as node 1310 and node 1320. Resource manager 120 includes scheduler 121 and job manager 122. Scheduler 121 performs resource allocation according to the resource requirements of the jobs submitted by the client nodes; the resource allocation unit is represented by a resource object (pod), which can also be represented by a resource container, i.e. a dynamic resource allocation unit that encapsulates resources such as memory, CPU, disk, and network together to limit the amount of resources used by each task. Job manager 122 is responsible for managing all jobs in the system, including job submission, negotiating resources with scheduler 121 to start the node manager in each node, and selecting the corresponding node to issue a task. Each node is deployed with a corresponding node manager and one or more containers; for example, node manager 1311 and container 1312 are deployed in node 1310. Node manager 1311 is responsible for the resource and task management of node 1310, regularly reports the node's resource usage and the running state of container 1312 to resource manager 120, and receives and processes various requests from job manager 122, such as container start and stop.
When allocating resources to issue a task, resource manager 120 supports multiple scheduling policies, and generally selects the DRF policy for tasks that request multiple types of resources. DRF computes, for each user, the share that each requested resource type occupies of the cluster total, takes the resource with the largest share as that user's dominant resource, and then allocates resources to the user whose dominant resource currently occupies the smallest share of the cluster. It then recomputes each user's dominant share and repeats the allocation process until the cluster resources are exhausted. By repeatedly maximizing the allocation to the user whose dominant share is smallest, DRF achieves cluster load balancing and thereby improves cluster resource utilization. For example, suppose the total available resources of the cluster are 9 CPUs and 18 gigabytes (GB) of memory, each task of user A requests 1 CPU and 4 GB of memory, and each task of user B requests 3 CPUs and 1 GB of memory. For user A, CPU resources occupy 1/9 of the cluster and memory resources occupy 2/9, so the dominant resource is memory; for user B, CPU resources occupy 1/3 and memory resources occupy 1/18, so the dominant resource is CPU.
When the allocation is actually performed, neither user A nor user B has any resources at the beginning, so both dominant shares are 0 and either user can be chosen first; suppose user B is allocated first. After the first allocation, user A still has no resources, so user A's dominant share of the cluster total is 0, while user B has been allocated 3 CPUs, so user B's dominant share is 1/3. Since 0 is less than 1/3, the second allocation goes to user A; afterwards user A's dominant share is 2/9 and user B's is still 1/3. Since 2/9 is less than 1/3, resources are again allocated to user A; after the third allocation user A's dominant share rises to 4/9 and user B's is still 1/3. Since 1/3 is smaller than 4/9, the fourth allocation goes to user B, raising user B's dominant share to 2/3, while user A's remains 4/9. Since 4/9 is less than 2/3, resources are allocated to user A once more, raising user A's dominant share to 2/3, equal to user B's. At this point the CPUs are fully allocated and allocation stops: user A has been allocated three times and user B twice. The specific allocation result is shown in fig. 2: user A ends up with 3 CPUs and 12 GB of memory, and user B with 6 CPUs and 2 GB of memory.
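The DRF loop just walked through can be simulated in a few lines. This is an illustrative sketch (data structures are assumptions, and the tie at the start is broken by dictionary order rather than by arbitrarily picking user B first; the final allocation is the same either way):

```python
from fractions import Fraction

def drf_allocate(capacity, demands):
    """Repeatedly grant one task's worth of resources to the user whose
    dominant share (largest fraction of any resource type allocated so
    far) is smallest, until no user's per-task demand still fits."""
    alloc = {u: {r: 0 for r in capacity} for u in demands}
    used = {r: 0 for r in capacity}
    tasks = {u: 0 for u in demands}

    def dominant_share(u):
        return max(Fraction(alloc[u][r], capacity[r]) for r in capacity)

    while True:
        # users whose next task still fits into the remaining resources
        fitting = [u for u in demands
                   if all(used[r] + demands[u][r] <= capacity[r] for r in capacity)]
        if not fitting:
            return alloc, tasks
        u = min(fitting, key=dominant_share)
        for r in capacity:
            alloc[u][r] += demands[u][r]
            used[r] += demands[u][r]
        tasks[u] += 1

capacity = {"cpu": 9, "mem": 18}
demands = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
alloc, tasks = drf_allocate(capacity, demands)
print(tasks)  # {'A': 3, 'B': 2}
print(alloc)  # A: 3 CPUs / 12 GB, B: 6 CPUs / 2 GB
```

Running it reproduces the allocation in fig. 2: three tasks for user A, two for user B.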
It should be noted that, in a cluster, for reasons of cost and demand, rare resources such as GPUs, embedded neural network processing units (NPUs), and field programmable gate arrays (FPGAs) are usually provided by only a few nodes, are sparsely distributed in the cluster, and are far fewer in number, generally 1 to 2 orders of magnitude fewer than common resources such as CPUs and memory. Therefore, when processing a task that requires multiple resource types including rare resources, the disparity between the amounts of rare and common resources causes the DRF policy to compute an excessively large dominant share for the user demanding rare resources. By the time that user's turn to be allocated arrives, the common resources on the node may already be occupied, so the task cannot be executed: task scheduling fails, the rare resources on the node are wasted, and the utilization rate of the cluster resources decreases.
In order to solve the above problems, the present application provides a task scheduling method and related devices, which can classify tasks according to the proportion of resources required by the tasks before task scheduling, and screen the tasks to be scheduled according to the total amount and proportion of available resources of a cluster, so as to improve the utilization rate of the cluster resources and avoid wasting the resources.
As shown in fig. 3, the user set 310 includes a plurality of users, for example user 311, user 312, and user 313. Each user submits a number of jobs to the task scheduler 320, which converts each received job into tasks to be executed and divides each task into the corresponding task queue, for example task queue 321, task queue 322, or task queue 323, according to the proportion of resources the task requires. Task scheduler 320 requests from resource manager 330 the state of the currently available resources of cluster 340. Resource manager 330 manages the resources of the nodes in cluster 340, for example node 341, node 342, and node 343; node resources include CPU, memory, GPU, disk, and so on. According to the resource usage reported by each node in cluster 340, resource manager 330 returns the total amount and proportion of each type of currently available resource to task scheduler 320. Task scheduler 320 then selects the tasks to be scheduled in this round from each task queue according to the returned totals and proportions and issues them to cluster 340. Each node in cluster 340 executes the tasks issued by task scheduler 320 using the pod corresponding to each task and returns the execution result to task scheduler 320, which feeds the result back to the users in user set 310.
The technical solution of the embodiments of the present application can be applied to any scenario in which tasks with multiple types of resource requirements need to be scheduled within a resource management system (such as YARN).
Referring to fig. 4 in conjunction with the application scenarios shown in fig. 1 and fig. 3, fig. 4 is a schematic flowchart of a task scheduling method according to an embodiment of the present application. As shown in fig. 4, the method includes, but is not limited to, the following steps:
S410: the task scheduler receives a task.
Specifically, the task scheduler may be the task scheduler 320 shown in fig. 3. The task scheduler receives jobs submitted by the users; each job includes a task, and each task requires at least one type of resource.
Optionally, the task scheduler receives a plurality of tasks, and divides the plurality of tasks into a plurality of task queues according to a resource proportion required for executing each task of the plurality of tasks, where each task queue of the plurality of task queues corresponds to one resource proportion.
Specifically, before scheduling, the task scheduler does not arrange all received tasks into a single queue in order of arrival. Instead, it classifies them by the proportion of resources each task requires and places each task into the task queue with the corresponding resource proportion.
Illustratively, the task scheduler receives four tasks: task A, task B, task C, and task D. Executing task A requires 3 CPUs and 2 GB of memory; task B requires 2 CPUs, 2 GB of memory, and 1 GPU; task C requires 4 CPUs, 3 GB of memory, and 2 NPUs; task D requires 6 CPUs and 4 GB of memory. The resource proportion required by task A is thus 3:2, by task B 2:2:1, by task C 4:3:2, and by task D 3:2 (6:4 reduced). Because tasks A and D require the same resource proportion, the task scheduler places them in the same task queue. Tasks B and C require different proportions, so the scheduler places each into the task queue with the matching proportion; if no queue with the corresponding resource proportion exists yet, it creates one and places the task into the newly created queue.
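The queue-division step above can be sketched as follows. This is an illustrative sketch only; the dictionary-based task representation and function names are assumptions, not taken from the patent:

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def normalized_ratio(demand):
    """Reduce a demand vector such as {cpu: 6, mem: 4} to its smallest
    integer ratio (3:2), so proportional demands share one queue key."""
    g = reduce(gcd, [v for v in demand.values() if v > 0])
    return tuple(sorted((r, v // g) for r, v in demand.items() if v > 0))

def enqueue_by_ratio(tasks):
    """Divide tasks into queues keyed by their required resource ratio."""
    queues = defaultdict(list)
    for name, demand in tasks.items():
        queues[normalized_ratio(demand)].append(name)
    return dict(queues)

# The four tasks from the example above
tasks = {
    "A": {"cpu": 3, "mem": 2},
    "B": {"cpu": 2, "mem": 2, "gpu": 1},
    "C": {"cpu": 4, "mem": 3, "npu": 2},
    "D": {"cpu": 6, "mem": 4},
}
queues = enqueue_by_ratio(tasks)
# A and D (both 3:2) share one queue; B and C each get their own
```

With this grouping, a new ratio automatically creates a new queue, matching the "create a queue if none exists" behavior described above.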
Optionally, after dividing a task into a queue, the task scheduler sets the priority of each task in the queue according to attributes such as the task's life cycle, the user level corresponding to the task, and the amount of resources the task requires. The priority may of course be set according to actual needs, which this application does not limit. Once priorities are set, the task scheduler selects tasks from the queue for execution in priority order when scheduling.
Illustratively, the task scheduler sets a task's priority according to the user level corresponding to the task and the task's life cycle. Each priority is represented by a four-bit binary number: the first two bits encode the user level and the last two bits encode the life cycle, with the user-level bits compared first. A higher user level means a higher priority, and for equal user levels a shorter life cycle ranks higher. For example, a task with priority 0b0101 outranks a task with priority 0b0110 (same user level, longer life cycle) but is outranked by a task with priority 0b1001 (same life cycle, higher user level). In addition, among tasks of the same priority, those requiring more resources are placed first; that is, when scheduling tasks from a queue, the task scheduler preferentially selects the task with the same priority but the larger resource demand.
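A minimal sketch of this priority ordering, assuming illustrative field names for the user level, life cycle, and resource demand (none of these names come from the patent):

```python
def priority_key(task):
    """Sort key matching the rule above: higher user level first,
    then shorter life cycle, then larger resource demand."""
    return (-task["user_level"], task["life_cycle"], -task["demand"])

tasks = [
    {"name": "t1", "user_level": 1, "life_cycle": 1, "demand": 4},
    {"name": "t2", "user_level": 1, "life_cycle": 2, "demand": 4},
    {"name": "t3", "user_level": 2, "life_cycle": 1, "demand": 4},
    {"name": "t4", "user_level": 1, "life_cycle": 1, "demand": 8},
]
ordered = [t["name"] for t in sorted(tasks, key=priority_key)]
# → ['t3', 't4', 't1', 't2']: t3 wins on user level;
#   among equal priorities, t4's larger demand beats t1
```

Negating the fields that rank "larger is better" lets a single ascending sort express all three rules at once.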
S420: and the task scheduler selects a task set to be scheduled.
Specifically, the task scheduler obtains the available-resource state of the current cluster. Available resources include resources released after the tasks scheduled in the previous scheduling period complete (occupied cluster resources can be released by deleting the pod that finished the task) and resources reclaimed after tasks expire. With the current available-resource state, the scheduler selects the task set for this round from the task queues according to the total amount and proportion of each type of resource in the cluster, combined with task priority, so that cluster resources can be fully utilized. The difference between the remaining resources of the cluster and the resources required by all tasks in the selected task set is less than or equal to a threshold; the threshold may be set as needed and is not limited in this application.
For example, as shown in fig. 5, assume the current cluster has 3 available CPUs, 5 GB of available memory, and 1 available GPU, a ratio of 3:5:1, and there are three task queues: task queue 1, task queue 2, and task queue 3. Each task in task queue 1 requires CPU, memory, and GPU in the ratio 1:2:0; each task in task queue 2 in the ratio 1:1:0; and each task in task queue 3 in the ratio 1:2:1. According to the selection rule, the CPU, memory, and GPU allocated to the selected tasks should together match the available-resource ratio 3:5:1. Suppose a tasks are selected by priority from task queue 1 (each needing 1 CPU and 2 GB of memory), b tasks from task queue 2 (each needing 1 CPU and 1 GB of memory), and c tasks from task queue 3 (each needing 1 CPU, 2 GB of memory, and 1 GPU). The combined demand is then (a+b+c):(2a+b+2c):(0+0+c), and setting this equal to 3:5:1 gives a:b:c = 1:1:1. That is, each time a task is selected from task queue 1, one task should simultaneously be selected from each of task queue 2 and task queue 3, so that the cluster resources are fully utilized.
It should be noted that, when there are more resource types and task queues, the ratio of the number of tasks to select from each queue may be solved as a system of linear equations. Optionally, when multiple solutions match the current cluster's per-type resource ratio, the best-matching solution may be screened according to the number of tasks in each queue and the total amount of available cluster resources. For example, if one task queue holds a large backlog of tasks, a solution that selects more tasks from that queue may be chosen as the best match.
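This selection step can be sketched with a small brute-force search standing in for the linear solve described above (the per-task queue demands come from the fig. 5 example; the function name and bound are illustrative assumptions):

```python
from itertools import product

def select_counts(queue_demands, available, max_count=10):
    """Search for per-queue task counts whose combined demand exactly
    matches the cluster's available resources. A brute-force sketch;
    a real scheduler would solve the linear system directly."""
    n_res = len(available)
    for counts in product(range(max_count + 1), repeat=len(queue_demands)):
        if sum(counts) == 0:
            continue
        used = [sum(c * d[i] for c, d in zip(counts, queue_demands))
                for i in range(n_res)]
        if used == list(available):
            return counts
    return None

# Queues from the example: per-task (CPU, mem GB, GPU) demands
queues = [(1, 2, 0), (1, 1, 0), (1, 2, 1)]
counts = select_counts(queues, available=(3, 5, 1))
# → (1, 1, 1): one task from each queue uses exactly 3 CPUs, 5 GB, 1 GPU
```

When several count vectors satisfy the constraint, the screening rule above (prefer queues with larger backlogs) would pick among them.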
Optionally, the task scheduler obtains from the resource manager the remaining resources of each node in the current cluster. For each node it can obtain the remaining amount and proportion of each type of resource, determine the task queue whose resource proportion matches the node's, and then screen tasks to be scheduled from that queue in combination with priority. The difference between the node's remaining resources and the resources required by the screened tasks is less than or equal to a threshold, which may be set as needed, so that the screened tasks fully occupy, or occupy as much as possible of, the node's remaining resources.
Illustratively, the cluster contains three nodes: node 1, node 2, and node 3. The remaining resources of node 1 are 3 CPUs, 9 GB of memory, 3 GPUs, and 0 NPUs, a ratio of 1:3:1:0; of node 2, 6 CPUs, 12 GB of memory, 0 GPUs, and 0 NPUs, a ratio of 1:2:0:0; and of node 3, 3 CPUs, 3 GB of memory, 0 GPUs, and 1 NPU, a ratio of 3:3:0:1. When selecting the task set to be scheduled, the task scheduler screens tasks from the task queue with resource ratio 1:3:1:0 so that the total resources the screened tasks require equal, or are slightly less than, node 1's remaining resources, and sends those tasks to node 1 so that its resources are fully utilized. Similarly, it selects tasks from the queue with ratio 1:2:0:0 and sends them to node 2, and from the queue with ratio 3:3:0:1 and sends them to node 3.
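The node-to-queue matching above can be sketched as follows, reducing both the node's remaining resources and each queue's per-task ratio to their smallest integer form before comparing (function names are illustrative assumptions):

```python
from functools import reduce
from math import gcd

def reduced(vec):
    """Reduce a resource vector to its smallest integer ratio."""
    g = reduce(gcd, [v for v in vec if v] or [1])
    return tuple(v // g for v in vec)

def queue_for_node(node_remaining, queue_ratios):
    """Return the index of the task queue whose per-task resource
    ratio matches the node's remaining-resource ratio, else None."""
    target = reduced(node_remaining)
    for i, ratio in enumerate(queue_ratios):
        if reduced(ratio) == target:
            return i
    return None

# Queue ratios and node remainders (CPU, mem GB, GPU, NPU) from the example
queue_ratios = [(1, 3, 1, 0), (1, 2, 0, 0), (3, 3, 0, 1)]
nodes = [(3, 9, 3, 0), (6, 12, 0, 0), (3, 3, 0, 1)]
matches = [queue_for_node(n, queue_ratios) for n in nodes]
# → [0, 1, 2]: each node is paired with the queue sharing its ratio
```

Node 1's remainder (3, 9, 3, 0) reduces to (1, 3, 1, 0), which is exactly the first queue's ratio, and likewise for the other two nodes.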
Therefore, by selecting from the task queues a task set to be scheduled that matches the proportion of each type of available resource in the current cluster, the task scheduler ensures that cluster resources are fully utilized and improves their utilization rate.
S430: and the task scheduler sends the tasks in the task set to be scheduled to the cluster.
Specifically, the task scheduler binds each task in the task set to be scheduled to a resource object (pod). The pod is deployed on a node in the cluster, and the task scheduler sends the task to the node where its pod is deployed so that the node can execute the task.
Optionally, the task scheduler determines the pod bound to a task according to the resource proportion the task requires and the resource priority. The resource priority may be set according to actual needs, which this application does not limit; for example, it may follow the rule that rare resources have higher priority than the CPU, the CPU higher than memory, and memory higher than disk. Rare resources may include special resources such as GPUs, NPUs, and FPGAs. When issuing tasks, the task scheduler therefore preferentially sends tasks that require rare resources to the nodes where their bound pods are deployed, so that those nodes can execute them first.
Illustratively, the task set to be scheduled selected by the task scheduler includes task 1, task 2, and task 3. The resources required by task 1 include a GPU, a CPU, and memory; by task 2, a CPU and memory; and by task 3, memory and disk. When sending the tasks, the task scheduler therefore issues task 1 to its node first so that task 1 can preferentially occupy the node's resources, then sends task 2, and finally task 3.
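A sketch of this dispatch ordering, ranking each task by the rarest resource it needs (the rank table and field names are illustrative assumptions, not values from the patent):

```python
# Illustrative resource priority: rare accelerators first, disk last
RESOURCE_RANK = {"gpu": 0, "npu": 0, "fpga": 0, "cpu": 1, "mem": 2, "disk": 3}

def dispatch_order(tasks):
    """Order tasks so those needing the rarest resource are sent first."""
    return sorted(tasks,
                  key=lambda t: min(RESOURCE_RANK[r] for r in t["needs"]))

tasks = [
    {"name": "task3", "needs": ["mem", "disk"]},
    {"name": "task1", "needs": ["gpu", "cpu", "mem"]},
    {"name": "task2", "needs": ["cpu", "mem"]},
]
order = [t["name"] for t in dispatch_order(tasks)]
# → ['task1', 'task2', 'task3']: the GPU task is issued first
```

Taking the minimum rank over a task's resource set means any task touching a rare accelerator jumps ahead of CPU-only or memory-only tasks.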
It is easy to understand that the task scheduler sends the tasks needing rare resources preferentially according to the set priority, so that the rare resources can be fully utilized, and the cost is saved.
It should be noted that each type of resource on each node in the cluster is limited, and a situation may arise in which some resource types are completely occupied while others remain. Tasks issued to a node in that situation need to be readjusted so that the node's resources can be fully utilized.
Optionally, when a node's resources cannot be fully utilized, the task scheduler replaces a task with a larger resource demand by a task with the same required resource proportion but a smaller demand, and repeats this replacement until the node's resources are completely occupied or the smallest-granularity task has been substituted.
Illustratively, as shown in fig. 6, a node has 4 CPUs and 8 GB of memory remaining, and the task set to be scheduled contains four tasks: task A1, task A2, task A3, and task B1. Executing task A1 requires 4 CPUs and 4 GB of memory; task A2, 2 CPUs and 2 GB; task A3, 1 CPU and 1 GB; and task B1, 1 CPU and 5 GB. Tasks A1, A2, and A3 require the same resource proportion and belong to the same task queue. If the task scheduler issued task A1 to the node, 4 GB of memory could not be used and the node's resource utilization would drop. The scheduler therefore replaces task A1 with task A2, which belongs to the same queue but occupies fewer resources. After the replacement, the node's remaining available resources are 2 CPUs and 6 GB of memory, exactly the resources that task A3 and task B1 together require, so the scheduler issues task A3 and task B1 to the node as well. The node's resources are thus fully occupied, no resources sit idle, and the node's resource utilization is maximized.
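The outcome of this replacement rule can be sketched as a search for a task subset that exactly fills the node's remaining capacity (a brute-force stand-in for the iterative swap itself; the tuple representation and function name are illustrative assumptions):

```python
from itertools import combinations

def exact_fill(capacity, tasks):
    """Find a subset of candidate tasks whose total (CPU, mem) demand
    exactly occupies the node's remaining capacity. Sketches the result
    of the replacement rule, not the step-by-step swapping."""
    names = list(tasks)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            total = tuple(sum(tasks[n][i] for n in combo) for i in range(2))
            if total == capacity:
                return list(combo)
    return None

# Node with 4 CPUs and 8 GB free; candidates from the example above
tasks = {"A1": (4, 4), "A2": (2, 2), "A3": (1, 1), "B1": (1, 5)}
chosen = exact_fill((4, 8), tasks)
# → ['A2', 'A3', 'B1']: dropping A1 for the smaller A2 lets B1 fit exactly
```

No single task and no subset containing A1 sums to (4, 8), so the search lands on the same combination the replacement rule produces.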
It can be seen that, during scheduling, by replacing a task already issued to a node with a task from the same queue that occupies fewer resources, the task scheduler avoids leaving node resources idle, makes full use of them, and effectively improves the node's resource utilization, thereby further improving the utilization of cluster resources.
The method of the embodiments of the present application is described in detail above. To better implement the above aspects of the embodiments, related devices for implementing them in a matching manner are provided below.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a task scheduling device according to an embodiment of the present application. As shown in fig. 7, the task scheduler 700 includes a receiving unit 710 and a processing unit 720. Wherein,
a receiving unit 710, configured to receive a first task.
Specifically, the receiving unit 710 is configured to perform the foregoing step S410, and optionally perform an optional method of the foregoing steps.
A processing unit 720, configured to send the first task to a cluster according to a first resource proportion required for executing the first task, where the cluster has a target node with remaining resources in a second resource proportion, so that the cluster executes the first task by using the target node, where the first resource proportion and the second resource proportion belong to the same resource proportion range, and the first resource proportion and the second resource proportion are the same or different.
Specifically, the processing unit 720 is configured to perform the foregoing steps S420 and S430, and optionally perform methods optional in the foregoing steps.
In a possible implementation manner, the processing unit 720 is further configured to bind the first task with a resource object pod, where the pod is deployed on the target node, and to send the first task to the pod in the cluster that is bound with the task.
In a possible implementation manner, the processing unit 720 is further configured to determine a pod bound to the first task according to the first resource proportion and the resource priority.
In one possible implementation, the resource priority is a priority order of at least two of the following resources: rare resources, a central processing unit (CPU), memory, and disk.
In one possible implementation, the scarce resource includes one or more of: the system comprises a graphic processor GPU, an embedded neural network processor NPU and a field programmable gate array FPGA.
In a possible implementation manner, the processing unit 720 is further configured to, when the resource required for executing the first task includes a rare resource, send the first task to a cluster having the rare resource, so that the cluster executes the first task by using the target node having the rare resource.
In a possible implementation manner, the receiving unit 710 is further configured to receive a plurality of tasks including the first task; the processing unit 720 is further configured to divide the plurality of tasks into a plurality of task queues according to a resource proportion required for executing each task of the plurality of tasks, where each task queue of the plurality of task queues corresponds to one resource proportion.
In a possible implementation manner, the processing unit 720 is further configured to screen a plurality of tasks from the plurality of task queues according to a resource proportion, so as to send the screened plurality of tasks to the cluster and the cluster executes the screened plurality of tasks by using the target node, where a difference between the remaining resources of the target node and the resources required by the screened plurality of tasks is less than or equal to a threshold, and the screened plurality of tasks includes the first task.
It should be noted that the structure of the task scheduling device, and the process by which it divides received tasks according to the required resource proportion to improve cluster resource utilization, are merely examples and should not be construed as limiting; the units of the task scheduling device may be added, removed, or combined as needed. In addition, for brevity, the operations and/or functions of each unit of the task scheduling device, which implement the corresponding flow of the method described in fig. 4, are not described again here.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computing device according to an embodiment of the present application. As shown in fig. 8, the computing device 800 includes: a processor 810, a communication interface 820, and a memory 830, the processor 810, the communication interface 820, and the memory 830 being interconnected by an internal bus 840.
The computing device 800 may be the resource manager 120 of fig. 1, in which the scheduler 121 and job manager 122 are deployed, or the task scheduler 320 of fig. 3. The functions performed by the resource manager of FIG. 1 and the task scheduler 320 of FIG. 3 are actually performed by the processor 810 of the computing device.
The processor 810 may be formed of one or more general-purpose processors, such as a Central Processing Unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
The bus 840 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 840 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 8, but this does not mean there is only one bus or only one type of bus.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, may implement part or all of the steps of any one of the method embodiments described above, and implement the functions of any one of the functional units described above in fig. 7.
Embodiments of the present application also provide a computer program product, which when run on a computer or a processor, causes the computer or the processor to perform one or more steps of any of the methods described above. The respective constituent elements of the above-mentioned apparatus may be stored in the computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It should also be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
Claims (24)
1. A method for task scheduling, comprising:
a task scheduler receives a first task;
and the task scheduler sends the first task to a cluster according to a first resource proportion required for executing the first task, the cluster having a target node with remaining resources in a second resource proportion, so that the cluster executes the first task by using the target node, wherein the first resource proportion and the second resource proportion belong to the same resource proportion range, and the first resource proportion and the second resource proportion are the same or different.
2. The method of claim 1, wherein the method comprises:
the task scheduler binds the first task with a resource object pod, the pod being deployed on the target node;
the task scheduler sends the first task to the pod in the cluster that is bound with the task.
3. The method of claim 2, wherein the task scheduler binding the first task with a pod, comprising:
and the task scheduler determines the pod bound with the first task according to the first resource proportion and the resource priority.
4. The method according to any of claims 1 to 3, wherein the resource priority is a priority order of at least two of the following resources:
rare resources, a Central Processing Unit (CPU), a memory and a disk.
5. The method of any of claims 1 to 4, wherein the rare resource comprises one or more of:
the system comprises a graphic processor GPU, an embedded neural network processor NPU and a field programmable gate array FPGA.
6. The method of any of claims 1 to 5, wherein the task scheduler sending the first task to the cluster according to the first resource proportion required for executing the first task comprises:
when the resources required for executing the first task comprise rare resources, sending the first task to a cluster with the rare resources so that the cluster executes the first task by utilizing the target node with the rare resources.
7. The method of any one of claims 1 to 6,
the task scheduler receives a first task, comprising: the task scheduler receiving a plurality of tasks including the first task;
the method comprises the following steps: and the task scheduler divides the tasks into a plurality of task queues according to the resource proportion required by executing each task in the tasks, wherein each task queue in the task queues corresponds to one resource proportion.
8. The method of claim 7, wherein the method comprises:
the task scheduler screens a plurality of tasks from the task queues according to a resource proportion so as to send the screened tasks to the cluster and the cluster executes the screened tasks by utilizing the target node, wherein a difference value between the residual resource of the target node and the resource required by the screened tasks is smaller than or equal to a threshold value, and the screened tasks comprise the first task.
9. A method for task scheduling, comprising:
the task scheduler receives a plurality of tasks;
the task scheduler divides the tasks into a plurality of task queues according to a resource proportion required by executing each task in the tasks, wherein each task queue of the task queues corresponds to one resource proportion;
and the task scheduler screens at least one task from the task queues according to the resource proportion of the task queues, and the difference value between the residual resource of the target node and the total resource required for executing the at least one task is smaller than or equal to a threshold value, so that the at least one task is sent to the cluster and the cluster executes the at least one task by using the target node.
10. A task scheduling apparatus, comprising:
a receiving unit configured to receive a first task;
and the processing unit is configured to send the first task to a cluster according to a first resource proportion required for executing the first task, where the cluster has a target node of a remaining resource with a second resource proportion, so that the cluster executes the first task by using the target node, where the first resource proportion and the second resource proportion belong to the same resource proportion range, and the first resource proportion and the second resource proportion are the same or different.
11. The apparatus of claim 10,
the processing unit is further configured to bind the first task with a resource object pod, where the pod is deployed on the target node, and to send the first task to the pod in the cluster that is bound with the task.
12. The apparatus of claim 11,
the processing unit is further configured to determine a pod bound to the first task according to the first resource proportion and the resource priority.
13. The apparatus according to any of claims 10 to 12, wherein the priority of the resources is a priority order of at least two of the following resources:
rare resources, a Central Processing Unit (CPU), a memory and a disk.
14. The apparatus of any of claims 10 to 13, wherein the rare resources comprise one or more of:
the system comprises a graphic processor GPU, an embedded neural network processor NPU and a field programmable gate array FPGA.
15. The apparatus of any one of claims 10 to 14,
the processing unit is further configured to, when the resource required for executing the first task includes a rare resource, send the first task to a cluster having the rare resource, so that the cluster executes the first task by using the target node having the rare resource.
16. The apparatus of any one of claims 10 to 15,
the receiving unit is further configured to receive a plurality of tasks including the first task;
the processing unit is further configured to divide the plurality of tasks into a plurality of task queues according to a resource proportion required for executing each of the plurality of tasks, where each of the plurality of task queues corresponds to one resource proportion.
17. The apparatus of claim 16,
the processing unit is further configured to screen a plurality of tasks from the plurality of task queues according to a resource proportion, so that the screened plurality of tasks are sent to the cluster and the cluster executes the screened plurality of tasks by using the target node, a difference between the remaining resources of the target node and the resources required by the screened plurality of tasks is smaller than or equal to a threshold, and the screened plurality of tasks include the first task.
18. A task scheduling apparatus, comprising:
a receiving unit for receiving a plurality of tasks;
the processing unit is used for dividing the tasks into a plurality of task queues according to the resource proportion required by executing each task in the tasks, and each task queue in the task queues corresponds to one resource proportion; and screening at least one task from the task queues according to the resource proportion of the task queues, wherein the difference value between the residual resource of the target node and the total resource required for executing the at least one task is smaller than or equal to a threshold value, so that the at least one task is sent to the cluster and the cluster executes the at least one task by using the target node.
19. A computing device, comprising a processor and a memory, the processor executing computer instructions stored by the memory to cause the computing device to perform the method of any of claims 1 to 8.
20. A computing device, comprising a processor and a memory, the processor executing computer instructions stored by the memory to cause the computing device to perform the method of claim 9.
21. A computer storage medium having computer instructions stored thereon for implementing the method of any one of claims 1 to 8.
22. A computer storage medium having computer instructions stored thereon for implementing the method of claim 9.
23. A computer program product comprising computer instructions for implementing the method of any one of claims 1 to 8.
24. A computer program product comprising computer instructions for implementing the method of claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010360908.9A CN113568721A (en) | 2020-04-29 | 2020-04-29 | Task scheduling method and related equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113568721A true CN113568721A (en) | 2021-10-29 |
Family
ID=78158681
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010360908.9A Pending CN113568721A (en) | 2020-04-29 | 2020-04-29 | Task scheduling method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113568721A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114398173A (en) * | 2021-12-21 | 2022-04-26 | 北京达佳互联信息技术有限公司 | Resource allocation method and device and electronic equipment |
CN114185689A (en) * | 2022-02-14 | 2022-03-15 | 四川大学 | Medical artificial intelligence and high-performance computing resource scheduling system and scheduling method |
CN114185689B (en) * | 2022-02-14 | 2022-04-26 | 四川大学 | Medical artificial intelligence and high-performance computing resource scheduling system and scheduling method |
CN114661448A (en) * | 2022-05-17 | 2022-06-24 | 中电云数智科技有限公司 | Kubernetes resource scheduling method and scheduling component |
CN115190017A (en) * | 2022-07-01 | 2022-10-14 | 深圳致星科技有限公司 | Resource adjusting method and device based on privacy computing platform |
CN115168057A (en) * | 2022-09-02 | 2022-10-11 | 浙江大华技术股份有限公司 | Resource scheduling method and device based on k8s cluster |
WO2024165027A1 (en) * | 2023-02-10 | 2024-08-15 | 华为云计算技术有限公司 | Job scheduling method and apparatus |
CN116468124A (en) * | 2023-04-27 | 2023-07-21 | 深圳量旋科技有限公司 | Quantum task scheduling method and related device |
CN116468124B (en) * | 2023-04-27 | 2024-10-29 | 深圳量旋科技有限公司 | Quantum task scheduling method and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113568721A (en) | Task scheduling method and related equipment | |
EP3522500B1 (en) | Method and system for multi-tenant resource distribution | |
CN108234581B (en) | Resource scheduling method and server | |
CN107688492B (en) | Resource control method and device and cluster resource management system | |
Wang et al. | Dominant resource fairness in cloud computing systems with heterogeneous servers | |
EP3553657A1 (en) | Method and device for allocating distributed system task | |
CN107851039B (en) | System and method for resource management | |
US8261281B2 (en) | Optimizing allocation of resources on partitions of a data processing system | |
KR101644800B1 (en) | Computing system and method | |
CN109564528B (en) | System and method for computing resource allocation in distributed computing | |
US20180246765A1 (en) | System and method for scheduling jobs in distributed datacenters | |
CN111352736A (en) | Method and device for scheduling big data resources, server and storage medium | |
CN105718316A (en) | Job scheduling method and apparatus | |
US10949368B2 (en) | Input/output command rebalancing in a virtualized computer system | |
JP5616523B2 (en) | Information processing system | |
CN112181613A (en) | Heterogeneous resource distributed computing platform batch task scheduling method and storage medium | |
US20230037293A1 (en) | Systems and methods of hybrid centralized distributive scheduling on shared physical hosts | |
CN117971491A (en) | In-process resource control method, device, equipment and storage medium | |
Ramezani et al. | Task scheduling in cloud environments: A survey of population‐based evolutionary algorithms | |
US11088964B1 (en) | Service level based priority scheduler for multi-tenancy computing systems | |
CN109086128B (en) | Task scheduling method and device | |
CN114675973A (en) | Resource management method, device, storage medium, and program product | |
Ru et al. | Providing fairer resource allocation for multi-tenant cloud-based systems | |
Call et al. | Workload-aware placement strategies to leverage disaggregated resources in the datacenter | |
US11048554B1 (en) | Correlated volume placement in a distributed block storage service |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
2022-02-08 | TA01 | Transfer of patent application right | Applicant changed from Huawei Technologies Co., Ltd. (Bantian Huawei headquarters office building, Longgang District, Shenzhen 518129, Guangdong) to Huawei Cloud Computing Technologies Co., Ltd. (Huawei cloud data center, Jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang 550025, Guizhou)
| SE01 | Entry into force of request for substantive examination | |