CN112965797A - Combined priority scheduling method for complex tasks under Kubernetes environment - Google Patents

Combined priority scheduling method for complex tasks under Kubernetes environment

Info

Publication number
CN112965797A
CN112965797A (application CN202110244427.6A)
Authority
CN
China
Prior art keywords
task
priority
tasks
pod
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110244427.6A
Other languages
Chinese (zh)
Other versions
CN112965797B (en)
Inventor
陈静
杜甜甜
李娜
郭莹
肖恭翼
王筠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Shandong Computer Science Center National Super Computing Center in Jinan
Priority to CN202110244427.6A
Publication of CN112965797A
Application granted
Publication of CN112965797B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/48 Indexing scheme relating to G06F9/48
    • G06F 2209/484 Precedence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/50 Indexing scheme relating to G06F9/50
    • G06F 2209/5021 Priority

Abstract

The combined priority scheduling method for complex tasks under the Kubernetes environment is specifically realized through the following steps: a) calculating the actual parallelism of each group of tasks; b) acquiring the task criticality; c) acquiring the user priority; d) acquiring the user dynamic priority; e) calculating the task urgency; f) normalizing the parallelism and the urgency; g) obtaining the priority value; h) pod ordering and scheduling. Because task parallelism is considered when the priority is set, the combined priority scheduling method avoids task-execution failures caused by parallel tasks being unable to obtain resources when other tasks occupy node resources in advance. Secondly, because the task urgency is considered when the priority is set, an urgent task is guaranteed to be able to preempt the resources occupied by non-urgent tasks when node resources are insufficient, so that the urgent task is executed successfully.

Description

Combined priority scheduling method for complex tasks under Kubernetes environment
Technical Field
The invention relates to a combined priority scheduling method, in particular to a combined priority scheduling method for complex tasks under a Kubernetes environment.
Background
Artificial intelligence, as the most promising technology of the new period, has been applied and developed in many fields. Strictly speaking, not all artificial intelligence computing is currently performed on cloud platforms, but cloud computing is still the basic computing platform of artificial intelligence and a convenient way to integrate artificial intelligence capabilities into millions of applications. Cloud computing is a service related to information technology, software and the Internet that provides dynamic and easily extensible resources over the Internet; generally these resources are virtualized, and a cloud refers to such a shared pool of computing resources. Artificial intelligence not only enriches cloud computing services but also makes them better suited to business scenarios, further freeing manpower. Machine learning is a method for realizing artificial intelligence and is the focus of artificial intelligence technology. For large-scale data and computing tasks, machine learning usually requires thousands of iterations, so the demand for cloud computing resources is very large and the time cost of training and optimizing a model is also high. In order to complete machine learning tasks quickly with limited resources, cloud computing resources need to be scheduled and allocated reasonably and effectively.
Kubernetes is an open-source container cluster management platform that is very popular in the field of cloud computing and has very complete cluster management capabilities. A pod is the smallest unit that can be created and deployed in Kubernetes and contains one or more containers. In Kubernetes a task is mapped to one or more pods; because tasks have an order of precedence, the pods also need to be prioritized. Kubernetes divides pods into three QoS (quality of service) classes: Guaranteed, with the highest priority; BestEffort, with the lowest priority; and Burstable, with a priority between the first two. In addition to the QoS classes, Kubernetes also allows users to customize the priority of a pod. A priority definition, in which a value attribute is assigned, has to be submitted in Kubernetes; after the priority is defined, a pod can declare that it uses it.
In the default Kubernetes priority definition the value has to be assigned by the user; when complex tasks are faced, a number of influencing factors have to be considered, and how to assign an appropriate priority according to these factors becomes the key problem.
Disclosure of Invention
In order to overcome the above technical deficiencies, the invention provides a combined priority scheduling method for complex tasks in a Kubernetes environment.
According to the combined priority scheduling method for complex tasks under the Kubernetes environment, the tasks to be scheduled through the Kubernetes resource management platform are task1, task2, …, taskn, n tasks in total; the n tasks are divided into q groups, 1 ≤ q ≤ n; the i-th group contains h_i tasks, with i ≤ q and h_i ≤ n, i.e. the parallelism of the i-th group of tasks is h_i; the h_i tasks in group i are respectively denoted task_i1, task_i2, …, task_ih_i.
The combined priority scheduling method for the complex tasks under the Kubernetes environment is characterized by being specifically realized through the following steps:
a) calculating the actual parallelism of each group of tasks; let the number of worker nodes contained in the hardware resources be m and the number of CPU cores used for task computation on each worker node be c, so that the maximum task concurrency supported by the hardware resources is m×c; the smaller of the task parallelism h_i of each group and the maximum task concurrency m×c supported by the hardware resources is taken, so the actual parallelism P_i of the i-th group of tasks is calculated by formula (1):
P_i = min(h_i, m×c) (1)
until the actual parallelism of every task group has been obtained;
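For illustration only, a minimal Python sketch of step a) is given below; the function name actual_parallelism and the variable names are ours and are not part of the patent:

```python
def actual_parallelism(group_sizes, m, c):
    """Formula (1): P_i = min(h_i, m x c).

    group_sizes -- list with h_i, the number of tasks in each group
    m           -- number of worker nodes
    c           -- CPU cores used for task computation on each node
    """
    max_concurrency = m * c  # maximum task concurrency the hardware supports
    return [min(h, max_concurrency) for h in group_sizes]

# Example matching FIG. 2: two groups of 3 and 5 tasks on 3 nodes with 2 cores each
print(actual_parallelism([3, 5], m=3, c=2))  # -> [3, 5]
```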
b) acquiring the task criticality; a high key coefficient H is assigned to the key tasks among all the tasks task1, task2, …, taskn, and a low key coefficient W is assigned to the remaining tasks, where H > W; for the h_i tasks task_i1, task_i2, …, task_ih_i in the i-th group, the task criticality k_ij of the j-th task task_ij in the i-th group is obtained with the selection function (2):
k_ij = choice(H, W) (2)
where i ≤ q, j ≤ h_i, H ∈ N*, W ∈ N*;
c) acquiring the user priority; a user priority U is assigned to every task; for the h_i tasks task_i1, task_i2, …, task_ih_i in the i-th group, the assigned user priorities are Pr_i1, Pr_i2, …, Pr_ih_i in turn, and the user priority of the j-th task task_ij in the i-th group is obtained with formula (3):
U_ij = Pr_ij (3)
where i ≤ q, j ≤ h_i, Pr_ij ∈ N*;
d) acquiring the user dynamic priority; the user dynamic priority D is determined by the idle time L of a task, and a task with a smaller idle time has a higher dynamic priority; for the h_i tasks task_i1, task_i2, …, task_ih_i in the i-th group, the dynamic priority D_ij of the j-th task task_ij in the i-th group is obtained with formula (4):
D_ij = ⌈100/L_ij⌉ (4)
where ⌈ ⌉ is the round-up (ceiling) function and L_ij is the idle time of the j-th task task_ij in the i-th group, with value range 1 ≤ L_ij ≤ 50;
e) calculating the task urgency; the task urgency J_ij of the j-th task task_ij in the i-th group is calculated according to formula (5):
J_ij = k_ij + U_ij + D_ij (5)
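Steps b) to e) can be sketched in Python as follows; this is only an illustration, the names are ours, and formula (4) is assumed to read D_ij = ⌈100/L_ij⌉, the form that reproduces the dynamic-priority values of Table 1 in the detailed description:

```python
import math

H, W = 10, 5  # high / low key coefficients used in the embodiment (H > W)

def task_criticality(is_key_task):
    """Formula (2): choice(H, W) -- a key task gets H, any other task gets W."""
    return H if is_key_task else W

def dynamic_priority(idle_time):
    """Formula (4), assumed to be D = ceil(100 / L); a smaller idle time gives
    a higher dynamic priority. The idle time L lies in the range 1 <= L <= 50."""
    return math.ceil(100 / idle_time)

def task_urgency(is_key_task, user_priority, idle_time):
    """Formula (5): J = k + U + D."""
    return task_criticality(is_key_task) + user_priority + dynamic_priority(idle_time)

# task7 of the embodiment: key task, user priority 4, idle time 2 -> D = 50, J = 64
print(dynamic_priority(2), task_urgency(True, 4, 2))
```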
f) normalization of the parallelism and the urgency; let the value range of the task parallelism be [P_min, P_max] and the value range of the urgency be [J_min, J_max]; the actual parallelism P_i of the i-th group of tasks is normalized with formula (6):
P_i-normal = (P_i - P_min)/(P_max - P_min) (6)
and the urgency J_ij of the j-th task task_ij in the i-th group is normalized with formula (7):
J_ij-normal = (J_ij - J_min)/(J_max - J_min) (7)
g) obtaining the priority value; a task can be mapped to a single pod or to several pods, each pod in the group executing one subtask; the priority of the task is mapped to the priority of the single pod or of the pod group in Kubernetes, and the priority value is magnified by an appropriate factor; the priority V_ij corresponding to the j-th task task_ij in the i-th group is obtained with formula (8):
V_ij = k' × (P_i-normal + J_ij-normal) (8)
where k' is the magnification factor, P_i-normal is the normalized actual parallelism of the i-th group of tasks, and J_ij-normal is the normalized urgency of the j-th task task_ij in the i-th group;
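A minimal Python sketch of steps f) and g), using the value ranges and the magnification factor of the embodiment in the detailed description (P_min = 1, P_max = 9, J_min = 10, J_max = 70, k' = 10^6); the function names and default arguments are ours:

```python
def normalize(x, lo, hi):
    """Formulas (6) and (7): min-max normalization onto [0, 1]."""
    return (x - lo) / (hi - lo)

def priority_value(parallelism, urgency, p_range=(1, 9), j_range=(10, 70), k_prime=1_000_000):
    """Formula (8): V = k' * (P_normal + J_normal)."""
    p_normal = normalize(parallelism, *p_range)
    j_normal = normalize(urgency, *j_range)
    return k_prime * (p_normal + j_normal)

# task7 of the embodiment: group parallelism P_2 = 5, urgency J = 64 -> V = 1.4e6
print(priority_value(5, 64))
```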
h) pod ordering and scheduling; the single pod or pod group corresponding to the j-th task task_ij in the i-th group is ordered according to the priority V_ij of the corresponding task, with larger priorities ranked in front and smaller priorities ranked behind, and the single pod or pod group ranked in front is scheduled first.
In the combined priority scheduling method for complex tasks under the Kubernetes environment, in step h), for a pod group, each pod in the pod group corresponds to one subtask, and the priority setting is realized by the following steps:
h-1), firstly, establishing a directed acyclic graph between the pods in the group according to the dependency relationship of the subtasks;
h-2), in the directed acyclic graph, starting from any vertex with in-degree 0, search along the directed edges for a vertex with out-degree 0, and push the pod corresponding to that vertex onto a stack; then perform step h-3);
h-3), return to the vertex of the previous level; if, excluding the vertices already on the stack, its out-degree is 0, push the pod corresponding to that vertex onto the stack; if, excluding the vertices already on the stack, its out-degree is not 0, search along the directed edges that do not lead to vertices already on the stack for the next vertex with out-degree 0 and push the corresponding pod onto the stack; repeat this step until the pods corresponding to all vertices of the directed acyclic graph have been pushed onto the stack;
h-4), when all the vertices are on the stack, perform pop operations to obtain the priority order of the pods in the pod group: by the last-in-first-out principle of the stack, a pod that is popped earlier has a higher priority than a pod that is popped later.
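Steps h-1) to h-4) amount to a depth-first traversal of the dependency graph in which a vertex is pushed onto the stack only once all of its successors are already on the stack, and the pop order then gives the pod priorities from high to low. A hedged Python sketch follows; the function name and the graph encoding are ours, and the example edges are reconstructed from the FIG. 4 walkthrough later in the description:

```python
def intra_group_priority(pods, edges):
    """Order the pods of one group by their dependencies (steps h-1 to h-4).

    pods  -- all pods of the group
    edges -- dict mapping a pod to the pods that directly depend on it
             (the directed acyclic graph built from the sub-task dependencies)
    Returns the pods from highest to lowest priority (the stack pop order).
    """
    stack, on_stack = [], set()

    def visit(v):
        # push v only after every vertex reachable from v is already on the stack
        for w in edges.get(v, []):
            if w not in on_stack:
                visit(w)
        stack.append(v)
        on_stack.add(v)

    targets = {w for ws in edges.values() for w in ws}
    for v in pods:              # start from the vertices with in-degree 0
        if v not in targets and v not in on_stack:
            visit(v)

    return list(reversed(stack))  # pop order = priority order, high to low

# Edges reconstructed from the FIG. 4 example: 1->2, 2->3, 2->5, 3->4, 5->4
print(intra_group_priority([1, 2, 3, 4, 5], {1: [2], 2: [3, 5], 3: [4], 5: [4]}))
# -> [1, 2, 5, 3, 4], matching pod1 -> pod2 -> pod5 -> pod3 -> pod4
```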
In the combined priority scheduling method for complex tasks under the Kubernetes environment of the invention, in step h), if the priority values of two tasks are equal, they are ordered according to the following rules:
h-1-1), ordering by key coefficient: for two tasks with equal priority values, the key coefficients are compared first; if the key coefficients differ, the single pod or pod group corresponding to the task with the higher key coefficient is placed in front and the single pod or pod group corresponding to the task with the lower key coefficient is placed behind; if the key coefficients are equal, step h-1-2) is executed;
h-1-2), ordering by user priority: for two tasks with equal priority values and key coefficients, the user priorities are compared; if the user priorities differ, the single pod or pod group corresponding to the task with the higher user priority is placed in front and the single pod or pod group corresponding to the task with the lower user priority is placed behind; if the user priorities are equal, step h-1-3) is executed;
h-1-3), ordering by dynamic priority: for two tasks with equal priority values, key coefficients and user priorities, the dynamic priorities are compared; if the dynamic priorities differ, the single pod or pod group corresponding to the task with the higher dynamic priority is placed in front and the single pod or pod group corresponding to the task with the lower dynamic priority is placed behind; if the dynamic priorities are equal, the two tasks are ordered randomly.
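The main ordering of step h) together with the tie-breaking rules h-1-1) to h-1-3) can be expressed as a single composite sort key; the following Python sketch and the sample tasks are ours, purely for illustration:

```python
def scheduling_order(tasks):
    """Sort tasks (dicts with V, k, U, D) into scheduling order.

    Primary key: priority value V, descending (step h); ties are broken by the
    key coefficient k, then the user priority U, then the dynamic priority D,
    each descending (steps h-1-1 to h-1-3)."""
    return sorted(tasks, key=lambda t: (-t["V"], -t["k"], -t["U"], -t["D"]))

tasks = [
    {"name": "task_a", "V": 1.4e6, "k": 10, "U": 4, "D": 50},
    {"name": "task_b", "V": 1.4e6, "k": 10, "U": 5, "D": 40},  # tie on V and k
]
print([t["name"] for t in scheduling_order(tasks)])  # -> ['task_b', 'task_a']
```

The fully tied case of h-1-3), where two tasks are ordered randomly, is left out of this sketch.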
In the combined priority scheduling method for complex tasks under the Kubernetes environment of the invention, the magnification factor k' in step g) is 1000000.
The invention has the following beneficial effects: in the combined priority scheduling method for complex tasks in a Kubernetes environment, when complex tasks such as machine learning are faced, task parallelism is taken into account when setting the priority, which avoids task-execution failures caused by parallel tasks being unable to obtain resources because other tasks have occupied node resources in advance. Secondly, the task urgency is taken into account when setting the priority, which ensures that an urgent task can preempt the resources occupied by non-urgent tasks when node resources are insufficient, so that the urgent task is executed successfully. The priority setting method considers these two points comprehensively, and the task execution success rate can be effectively improved when node resources are scheduled for complex tasks. In addition, for group scheduling oriented to machine learning tasks, a further layer of priority setting solves the problem of dependency relationships among the pods within a group.
Drawings
FIG. 1 is a diagram of task scheduling mapping process in Kubernetes in the present invention;
FIG. 2 is an overall structure diagram of the task of the present invention;
FIG. 3 is a task parallelism diagram of the present invention, wherein the group A tasks comprise task1 through task3 and the group B tasks comprise task4 through task8;
FIG. 4 is a pod intra-group dependency directed acyclic graph in accordance with the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
Task parallelism: used to evaluate the number of tasks executed in parallel at a given time. Whether multiple tasks specified by the user can be executed concurrently depends on the number of worker nodes and the number of CPU cores used for task computation on each worker node. Let the number of worker nodes currently executing be m and the number of CPU cores used for task computation on each worker node be c; the maximum task concurrency supported by the hardware resources is then m×c, a positive integer. Let the parallelism of the tasks be h; it depends on the number of subtasks of each task among the serial tasks, and its value range is the positive integers. The smaller of the task parallelism h and the maximum task concurrency m×c supported by the hardware resources is taken as the actual parallelism P of the tasks.
Task urgency: the task urgency of each task is J; it is the combination of a fixed priority F and a dynamic priority D, where the fixed priority F is determined by the task criticality k and the user priority U, and the dynamic priority D is determined by the idle time L of the task, a smaller idle time giving a higher dynamic priority. A high key coefficient H is assigned to the key task set and a low key coefficient W to the remaining tasks; the key coefficients are positive integers with H > W. Each task in a batch of tasks is assigned a unique user priority U, a positive integer; the user priorities of a group of tasks can be assigned incrementally starting from 1.
As shown in FIG. 1, a diagram of the task scheduling mapping process in Kubernetes in the present invention is given; a pod is the smallest unit that can be created and deployed in Kubernetes and contains one or more containers, and a task in Kubernetes is mapped to a pod or a pod group.
As shown in FIG. 2, an overall structure diagram of the tasks in the present invention is given, where task1, task2 and task3 form one group (denoted the group 1 tasks) with parallelism 3, and task4, task5, task6, task7 and task8 form another group (denoted the group 2 tasks) with parallelism 5. As in FIG. 2, there are 3 nodes, each with 2 cores.
The actual parallelism P_1 of the 1st group of tasks is calculated with formula (1):
P_1 = min(h_1, m×c) = min(3, 3×2) = 3
Similarly, the actual parallelism P_2 of the 2nd group of tasks is obtained with formula (1):
P_2 = min(h_2, m×c) = min(5, 3×2) = 5
As shown in Table 1, for the 3 parallel tasks task1, task2 and task3 of the 1st group, the user priorities are 1, 2 and 3 respectively; task1 and task3 are set as the key task set and configured with the high key coefficient 10, task2 is set as a non-key task and configured with the low key coefficient 5, and the idle times obtained by the system are 6, 3 and 2 respectively. For the 5 parallel tasks task4, task5, task6, task7 and task8 of the 2nd group, the user priorities are 1, 2, 3, 4 and 5 respectively; task4 and task5 are configured with the low key coefficient 5 as non-key tasks, task6, task7 and task8 are configured with the high key coefficient 10 as key tasks, and the idle times obtained by the system are 4, 5, 3, 2 and 2 respectively.
TABLE 1

Task    User priority U    Key coefficient k    Idle time L    Dynamic priority D    Priority V
task1   1                  H=10                 6              17                    0.483333×10^6
task2   2                  W=5                  3              34                    0.766667×10^6
task3   3                  H=10                 2              50                    1.0×10^6
task4   1                  W=5                  4              25                    0.85×10^6
task5   2                  W=5                  5              20                    0.816667×10^6
task6   3                  H=10                 3              34                    1.116667×10^6
task7   4                  H=10                 2              50                    1.4×10^6
task8   5                  H=10                 2              50                    1.416667×10^6
The dynamic priorities D_11, D_12, D_13 of the tasks task1, task2 and task3 in the 1st group can be calculated with formula (4) as 17, 34 and 50, and the dynamic priorities D_21, D_22, D_23, D_24, D_25 of the 5 tasks task4, task5, task6, task7 and task8 in the 2nd group as 25, 20, 34, 50 and 50 respectively.
Then the task urgencies J_11, J_12, J_13 of the 3 tasks task1, task2 and task3 in group 1 can be calculated according to formula (5) as 24, 41 and 55 respectively, and the task urgencies J_21, J_22, J_23, J_24, J_25 of the 5 tasks task4, task5, task6, task7 and task8 in group 2 as 31, 27, 47, 64 and 65 respectively.
The parallelism and the urgency are then normalized, with the parallelism value range given by P_min = 1 and P_max = 9 and the urgency value range given by J_min = 10 and J_max = 70. The normalized parallelism P_1-normal of task1 to task3 obtained with formula (6) is 0.25, and the normalized parallelism P_2-normal of task4 to task8 is 0.5.
The normalized urgencies J_11-normal, J_12-normal, J_13-normal of task1 to task3 obtained with formula (7) are 0.233333, 0.516667 and 0.750 respectively, and the normalized urgencies J_21-normal, J_22-normal, J_23-normal, J_24-normal, J_25-normal of task4 to task8 are 0.350, 0.316667, 0.616667, 0.90 and 0.916667 respectively.
The priorities V_11, V_12, V_13, V_21, V_22, V_23, V_24, V_25 of task1 to task8 obtained with formula (8) are 0.483333×10^6, 0.766667×10^6, 1.0×10^6, 0.85×10^6, 0.816667×10^6, 1.116667×10^6, 1.4×10^6 and 1.416667×10^6 respectively, and ordering the tasks by priority value gives: task8, task7, task6, task3, task4, task5, task2, task1.
To this end, the 8 tasks are mapped to 8 pods on Kubernetes, and the 8 pods will be scheduled in this order, in turn, to worker nodes that meet the resource requirements.
When facing a complex task, such as a machine learning task, each of the above 8 tasks may be mapped to multiple pods, each pod corresponding to one subtask of the task; that is, one task can be mapped to a pod group containing multiple pods. Suppose that task8, which has the highest priority, needs to run 5 pods when executing, i.e. pod8 is a pod group consisting of 5 pods. FIG. 4 shows the dependency-relationship directed acyclic graph within the pod group in the present invention.
Next, the priorities of the 5 pods within the pod8 group are considered separately. There are dependency relationships among the 5 pods in the group, and some pods are preconditions of other pods. As shown in FIG. 4, the default pod sequence is:
pod1→pod2→pod3→pod4→pod5
Its topological sequence is computed from the directed graph. First, vertex 1, whose in-degree is 0, is selected as the starting point, and a vertex with out-degree 0 is searched for along any directed edge; for example, vertex 4 is found along vertices 1, 2, 3, 4 and pushed onto the stack. Returning to the previous-level vertex 3, its out-degree excluding the directed edge pointing to vertex 4 is 0, so vertex 3 is pushed. Returning to the previous-level vertex 2 of vertex 3, its out-degree is not 0 even though vertex 3 is already on the stack, so vertex 5 is reached along the directed edge; since vertex 4 is already on the stack, the out-degree of vertex 5 is 0 and vertex 5 is pushed. Returning again to vertex 2, whose out-degree is now 0, vertex 2 is pushed. Returning to the previous-level vertex 1, whose out-degree is now 0, vertex 1 is pushed. All vertices are thus pushed in the order 4, 3, 5, 2, 1, and by the last-in-first-out principle of the stack the pop order is 1, 2, 5, 3, 4, which is a topological sequence of the directed graph. That is, the priority order is:
pod1→pod2→pod5→pod3→pod4
Therefore the pods within the group are given priorities from high to low in this order. Five pod priorities a, b, c, d and e are first custom-defined, and each pod declares which priority it uses by specifying the corresponding priority name (priorityClassName) in its yaml file.
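For completeness, the same declaration can be sketched with the official Kubernetes Python client instead of a raw yaml file; this is only an illustrative sketch that assumes a reachable cluster and an existing kubeconfig, and the class name "a" and the value 1000000 are taken from the example above rather than prescribed by the patent:

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a working kubeconfig

# Define the priority class "a" (the highest of a, b, c, d, e in the example)
pc = client.V1PriorityClass(
    metadata=client.V1ObjectMeta(name="a"),
    value=1000000,                       # a larger value means a higher priority
    global_default=False,
    description="highest intra-group pod priority",
)
client.SchedulingV1Api().create_priority_class(body=pc)

# A pod declares the priority by name, equivalent to priorityClassName in yaml
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="pod1"),
    spec=client.V1PodSpec(
        containers=[client.V1Container(name="worker", image="busybox")],
        priority_class_name="a",
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```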
This intra-group priority does not take part in the setting of the parallelism-and-urgency priority; it is only applied to the priority ordering of pods that have dependencies within a pod group, after the priority scheduling among pod groups has been completed.
Because task parallelism and task urgency are both considered when setting the priority, the priority setting is more detailed, more standardized and more reasonable, and the task execution success rate can be effectively improved when the resource requirements of parallel tasks and urgent tasks are met. In addition, for group scheduling oriented to machine learning tasks, a further layer of priority setting solves the problem of dependency relationships among the pods within a group.

Claims (4)

1. A combined priority scheduling method for complex tasks under a Kubernetes environment, characterized in that the tasks to be scheduled through a Kubernetes resource management platform are task1, task2, …, taskn, n tasks in total; the n tasks are divided into q groups, 1 ≤ q ≤ n; the i-th group contains h_i tasks, with i ≤ q and h_i ≤ n, i.e. the parallelism of the i-th group of tasks is h_i; the h_i tasks in group i are respectively denoted task_i1, task_i2, …, task_ih_i.
The combined priority scheduling method for the complex tasks under the Kubernetes environment is characterized by being specifically realized through the following steps:
a) calculating the actual parallelism of each group of tasks; let the number of worker nodes contained in the hardware resources be m and the number of CPU cores used for task computation on each worker node be c, so that the maximum task concurrency supported by the hardware resources is m×c; the smaller of the task parallelism h_i of each group and the maximum task concurrency m×c supported by the hardware resources is taken, so the actual parallelism P_i of the i-th group of tasks is calculated by formula (1):
P_i = min(h_i, m×c) (1)
until the actual parallelism of every task group has been obtained;
b) acquiring the task criticality; a high key coefficient H is assigned to the key tasks among all the tasks task1, task2, …, taskn, and a low key coefficient W is assigned to the remaining tasks, where H > W; for the h_i tasks task_i1, task_i2, …, task_ih_i in the i-th group, the task criticality k_ij of the j-th task task_ij in the i-th group is obtained with the selection function (2):
k_ij = choice(H, W) (2)
where i ≤ q, j ≤ h_i, H ∈ N*, W ∈ N*;
c) acquiring the user priority; a user priority U is assigned to every task; for the h_i tasks task_i1, task_i2, …, task_ih_i in the i-th group, the assigned user priorities are Pr_i1, Pr_i2, …, Pr_ih_i in turn, and the user priority of the j-th task task_ij in the i-th group is obtained with formula (3):
U_ij = Pr_ij (3)
where i ≤ q, j ≤ h_i, Pr_ij ∈ N*;
d) acquiring the user dynamic priority; the user dynamic priority D is determined by the idle time L of a task, and a task with a smaller idle time has a higher dynamic priority; for the h_i tasks task_i1, task_i2, …, task_ih_i in the i-th group, the dynamic priority D_ij of the j-th task task_ij in the i-th group is obtained with formula (4):
D_ij = ⌈100/L_ij⌉ (4)
where ⌈ ⌉ is the round-up (ceiling) function and L_ij is the idle time of the j-th task task_ij in the i-th group, with value range 1 ≤ L_ij ≤ 50;
e) calculating the task urgency; the task urgency J_ij of the j-th task task_ij in the i-th group is calculated according to formula (5):
J_ij = k_ij + U_ij + D_ij (5)
f) normalization of the parallelism and the urgency; let the value range of the task parallelism be [P_min, P_max] and the value range of the urgency be [J_min, J_max]; the actual parallelism P_i of the i-th group of tasks is normalized with formula (6):
P_i-normal = (P_i - P_min)/(P_max - P_min) (6)
and the urgency J_ij of the j-th task task_ij in the i-th group is normalized with formula (7):
J_ij-normal = (J_ij - J_min)/(J_max - J_min) (7)
g) obtaining the priority value; a task can be mapped to a single pod or to several pods, each pod in the group executing one subtask; the priority of the task is mapped to the priority of the single pod or of the pod group in Kubernetes, and the priority value is magnified by an appropriate factor; the priority V_ij corresponding to the j-th task task_ij in the i-th group is obtained with formula (8):
V_ij = k' × (P_i-normal + J_ij-normal) (8)
where k' is the magnification factor, P_i-normal is the normalized actual parallelism of the i-th group of tasks, and J_ij-normal is the normalized urgency of the j-th task task_ij in the i-th group;
h) pod ordering and scheduling; the single pod or pod group corresponding to the j-th task task_ij in the i-th group is ordered according to the priority V_ij of the corresponding task, with larger priorities ranked in front and smaller priorities ranked behind, and the single pod or pod group ranked in front is scheduled first.
2. The Kubernetes environment combined priority scheduling method for complex tasks according to claim 1, wherein in step h), for a pod group, each pod in the pod group corresponds to one subtask, and the priority setting is realized by the following steps:
h-1), firstly, establishing a directed acyclic graph between the pods in the group according to the dependency relationship of the subtasks;
h-2), in the directed acyclic graph, starting from any vertex with in-degree 0, search along the directed edges for a vertex with out-degree 0, and push the pod corresponding to that vertex onto a stack; then perform step h-3);
h-3), return to the vertex of the previous level; if, excluding the vertices already on the stack, its out-degree is 0, push the pod corresponding to that vertex onto the stack; if, excluding the vertices already on the stack, its out-degree is not 0, search along the directed edges that do not lead to vertices already on the stack for the next vertex with out-degree 0 and push the corresponding pod onto the stack; repeat this step until the pods corresponding to all vertices of the directed acyclic graph have been pushed onto the stack;
h-4), when all the vertices are on the stack, perform pop operations to obtain the priority order of the pods in the pod group: by the last-in-first-out principle of the stack, a pod that is popped earlier has a higher priority than a pod that is popped later.
3. The Kubernetes environment combined priority scheduling method for complex tasks according to claim 1 or 2, characterized in that in step h), if there are two tasks with equal priority values, the tasks are ordered according to the following rules:
h-1-1), ordering by key coefficient: for two tasks with equal priority values, the key coefficients are compared first; if the key coefficients differ, the single pod or pod group corresponding to the task with the higher key coefficient is placed in front and the single pod or pod group corresponding to the task with the lower key coefficient is placed behind; if the key coefficients are equal, step h-1-2) is executed;
h-1-2), ordering by user priority: for two tasks with equal priority values and key coefficients, the user priorities are compared; if the user priorities differ, the single pod or pod group corresponding to the task with the higher user priority is placed in front and the single pod or pod group corresponding to the task with the lower user priority is placed behind; if the user priorities are equal, step h-1-3) is executed;
h-1-3), ordering by dynamic priority: for two tasks with equal priority values, key coefficients and user priorities, the dynamic priorities are compared; if the dynamic priorities differ, the single pod or pod group corresponding to the task with the higher dynamic priority is placed in front and the single pod or pod group corresponding to the task with the lower dynamic priority is placed behind; if the dynamic priorities are equal, the two tasks are ordered randomly.
4. The combined priority scheduling method for complex tasks under a Kubernetes environment according to claim 1 or 2, characterized in that the magnification factor k' stated in step g) is 1000000.
CN202110244427.6A 2021-03-05 2021-03-05 Combined priority scheduling method for complex tasks under Kubernetes environment Active CN112965797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110244427.6A CN112965797B (en) 2021-03-05 2021-03-05 Combined priority scheduling method for complex tasks under Kubernetes environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110244427.6A CN112965797B (en) 2021-03-05 2021-03-05 Combined priority scheduling method for complex tasks under Kubernetes environment

Publications (2)

Publication Number Publication Date
CN112965797A true CN112965797A (en) 2021-06-15
CN112965797B CN112965797B (en) 2022-02-22

Family

ID=76276619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110244427.6A Active CN112965797B (en) 2021-03-05 2021-03-05 Combined priority scheduling method for complex tasks under Kubernetes environment

Country Status (1)

Country Link
CN (1) CN112965797B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040190057A1 (en) * 2003-03-27 2004-09-30 Canon Kabushiki Kaisha Image forming system, method and program of controlling image forming system, and storage medium
CN109271233A (en) * 2018-07-25 2019-01-25 上海数耕智能科技有限公司 The implementation method of Hadoop cluster is set up based on Kubernetes
CN111367644A (en) * 2020-03-17 2020-07-03 中国科学技术大学 Task scheduling method and device for heterogeneous fusion system
CN111858069A (en) * 2020-08-03 2020-10-30 网易(杭州)网络有限公司 Cluster resource scheduling method and device and electronic equipment
CN111930525A (en) * 2020-10-10 2020-11-13 北京世纪好未来教育科技有限公司 GPU resource use method, electronic device and computer readable medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUO MENGYING et al.: "I/O Performance Analysis of Hadoop Applications Based on a Virtualization Platform", Journal of Computer Research and Development *
MA XILIN: "Research on Resource Scheduling Strategy of Kubernetes Container Clusters", China Master's Theses Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN112965797B (en) 2022-02-22

Similar Documents

Publication Publication Date Title
CN107045456B (en) Resource allocation method and resource manager
Rahman et al. A dynamic critical path algorithm for scheduling scientific workflow applications on global grids
US8869159B2 (en) Scheduling MapReduce jobs in the presence of priority classes
US20130339972A1 (en) Determining an allocation of resources to a program having concurrent jobs
US9304817B2 (en) Method and apparatus for a user-driven priority based job scheduling in a data processing platform
CN114610474B (en) Multi-strategy job scheduling method and system under heterogeneous supercomputing environment
CN111381950A (en) Task scheduling method and system based on multiple copies for edge computing environment
US20220012089A1 (en) System for computational resource prediction and subsequent workload provisioning
Chakravarthi et al. TOPSIS inspired budget and deadline aware multi-workflow scheduling for cloud computing
CN112114973A (en) Data processing method and device
CN115134371A (en) Scheduling method, system, equipment and medium containing edge network computing resources
Muthusamy et al. Cluster-based task scheduling using K-means clustering for load balancing in cloud datacenters
CN106934537A (en) The sub- time limit based on the scheduling of reverse operation stream obtains optimization method
CN115586961A (en) AI platform computing resource task scheduling method, device and medium
Keerthika et al. An efficient grid scheduling algorithm with fault tolerance and user satisfaction
CN112965797B (en) Combined priority scheduling method for complex tasks under Kubernetes environment
CN115098240B (en) Multiprocessor application scheduling method and system and storage medium
CN116010051A (en) Federal learning multitasking scheduling method and device
Li et al. On scheduling of high-throughput scientific workflows under budget constraints in multi-cloud environments
US20220214864A1 (en) Efficient deployment of machine learning and deep learning model's pipeline for serving service level agreement
CN116069500A (en) Model training task processing method and device, electronic equipment and readable medium
CN115756803A (en) Task scheduling method, device, equipment and medium for heterogeneous computing system
CN110008002B (en) Job scheduling method, device, terminal and medium based on stable distribution probability
Bazoobandi et al. Solving task scheduling problem in multi-processors with genetic algorithm and task duplication
CN114020469A (en) Edge node-based multi-task learning method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant