CN117573278A - Pod scheduling method, device and storage medium - Google Patents

Pod scheduling method, device and storage medium Download PDF

Info

Publication number
CN117573278A
CN117573278A CN202311528208.6A
Authority
CN
China
Prior art keywords
node
pod
resource
cluster
resource usage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311528208.6A
Other languages
Chinese (zh)
Inventor
焦海燕
李家兴
王琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Unicom Cloud Data Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Unicom Cloud Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd, Unicom Digital Technology Co Ltd, Unicom Cloud Data Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202311528208.6A priority Critical patent/CN117573278A/en
Publication of CN117573278A publication Critical patent/CN117573278A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application provides a Pod scheduling method, a Pod scheduling device and a storage medium, relates to the field of communications technologies, and can determine a node adapted to a cluster pod. The method comprises the following steps: acquiring resource information of the pod, where the resource information comprises any one of a CPU resource, a memory resource and a graphics card resource; calculating, based on the resource information, the average consumption corresponding to the resource information; and, when the average consumption is greater than a first threshold, scheduling the pod from an initial node to the target node with the highest score in the node cluster, where the initial node is the node to which the pod was last bound.

Description

Pod scheduling method, device and storage medium
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a Pod scheduling method, device, and storage medium.
Background
In the related art, as a cluster pod runs, the resources it actually consumes change dynamically, so the resource utilization of the cluster pod is very likely to become unbalanced in time and space, causing a waste of resources and cost; and if the cluster pod is simply rescheduled, it is very likely to be bound to an unsuitable node, still wasting resources. Therefore, how to determine a node adapted to the cluster pod is an urgent problem to be solved.
Disclosure of Invention
The application provides a Pod scheduling method, a Pod scheduling device and a storage medium, which can determine a node adapted to a cluster Pod.
In order to achieve the above purpose, the present application adopts the following technical scheme:
In a first aspect, the present application provides a Pod scheduling method, including: acquiring resource information of the pod, where the resource information comprises any one of a CPU resource, a memory resource and a graphics card resource; calculating, based on the resource information, the average consumption corresponding to the resource information; and, when the average consumption is greater than a first threshold, scheduling the pod from an initial node to the target node with the highest score in the node cluster, where the initial node is the node to which the pod was last bound.
With reference to the first aspect, in one possible implementation, the node cluster includes a first node cluster; the resource usage of each node in the first node cluster satisfies a preset multiple of the resource usage required by the pod; and the target node includes a first target node corresponding to the first node cluster. Scheduling the pod from the initial node to the target node with the highest score in the node cluster when the average consumption is greater than the first threshold further includes: obtaining the maximum resource usage of the pod at the initial node; determining the first node cluster when the resource usage of each of a plurality of nodes in a second node cluster is less than the maximum resource usage, where the second node cluster is the node cluster in which the initial node is located; and scheduling the pod from the initial node to the first target node with the highest score in the first node cluster.
With reference to the first aspect, in a possible implementation, the target node further includes a second target node corresponding to the second node cluster. The method further includes: when the resource usage of the plurality of nodes in the second node cluster is greater than the maximum resource usage, acquiring, for each node in the second node cluster, the resource utilization of the pod at the node and the resource utilization of the node, and determining the predicted utilization of the pod at the node, where the resource utilization of a node is equal to the sum of the resource utilizations of the plurality of pods on the node; determining the score of the node based on the predicted utilization and a utilization threshold; and scheduling the pod from the initial node to the second target node with the highest score in the second node cluster.
With reference to the first aspect, in one possible implementation manner, the average consumption is determined based on a sum of resource occupancy of the plurality of containers of the pod and the preset time period.
In a second aspect, the present application provides a Pod scheduling apparatus, including: a processing unit and an obtaining unit. The obtaining unit is configured to obtain resource information of the pod, where the resource information comprises any one of a CPU resource, a memory resource and a graphics card resource. The processing unit is configured to calculate, based on the resource information, the average consumption corresponding to the resource information. The processing unit is further configured to schedule the pod from an initial node to the target node with the highest score in the node cluster when the average consumption is greater than a first threshold, where the initial node is the node to which the pod was last bound.
With reference to the second aspect, in one possible implementation, the node cluster includes a first node cluster; the resource usage of each node in the first node cluster satisfies a preset multiple of the resource usage required by the pod; and the target node includes a first target node corresponding to the first node cluster. The obtaining unit is further configured to obtain the maximum resource usage of the pod at the initial node. The processing unit is further configured to determine the first node cluster when the resource usage of each of the plurality of nodes in the second node cluster is less than the maximum resource usage, where the second node cluster is the node cluster in which the initial node is located. The processing unit is further configured to schedule the pod from the initial node to the first target node with the highest score in the first node cluster.
With reference to the second aspect, in a possible implementation, the target node further includes a second target node corresponding to the second node cluster. The processing unit is specifically configured to: when the resource usage of the plurality of nodes in the second node cluster is greater than the maximum resource usage, acquire, for each node in the second node cluster, the resource utilization of the pod at the node and the resource utilization of the node, and determine the predicted utilization of the pod at the node, where the resource utilization of a node is equal to the sum of the resource utilizations of the plurality of pods on the node; determine the score of the node based on the predicted utilization and a utilization threshold; and schedule the pod from the initial node to the second target node with the highest score in the second node cluster.
With reference to the second aspect, in one possible implementation manner, the average consumption is determined based on a sum of resource occupancy of the plurality of containers of the pod and the preset time period.
In a third aspect, the present application provides a Pod scheduling apparatus, including: a processor and a communication interface; the communication interface is coupled to a processor for running a computer program or instructions to implement the Pod scheduling method as described in any one of the possible implementations of the first aspect and the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium having instructions stored therein which, when run on a terminal, cause the terminal to perform a Pod scheduling method as described in any one of the possible implementations of the first aspect and the first aspect.
In this application, the names of the Pod scheduling apparatuses described above do not limit the devices or functional modules themselves; in actual implementations, these devices or functional modules may appear under other names. As long as the function of each device or functional module is similar to that described in the present application, it falls within the scope of the claims of the present application and their equivalents.
These and other aspects of the present application will be more readily apparent from the following description.
Based on the above technical solutions, in the Pod scheduling method provided in the embodiments of the present application, the communication system may calculate, from the acquired resource information of the Pod, the average consumption corresponding to that resource information: when the resource information is a CPU resource, the average consumption is the consumption of the CPU resource, and likewise for a memory resource or a graphics card resource. The communication system then determines whether the average consumption is greater than a first threshold. If it is, the resource utilization of the Pod has become unbalanced in time and space, and the communication system schedules the Pod from the node to which it was last bound to the target node with the highest score in the node cluster, thereby scheduling the Pod appropriately.
Drawings
Fig. 1 is a schematic structural diagram of a communication system provided in the present application;
fig. 2 is a schematic structural diagram of a Pod scheduling device provided in the present application;
fig. 3 is a flowchart of a Pod scheduling method provided in the present application;
FIG. 4 is a flow chart of another Pod scheduling method provided in the present application;
fig. 5 is an interaction schematic diagram of a Pod scheduling method provided in the present application;
Fig. 6 is a schematic structural diagram of another Pod scheduling device provided in the present application.
Detailed Description
The following description of the embodiments of the present disclosure will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by one of ordinary skill in the art based on the embodiments provided by the present disclosure are within the scope of the present disclosure.
Throughout the specification and claims, unless the context requires otherwise, the word "comprise" and its other forms, such as the third-person singular "comprises" and the present participle "comprising", are to be construed in an open, inclusive sense, i.e. as "including, but not limited to". In the description of the specification, the terms "one embodiment", "some embodiments", "exemplary embodiments", "example", "specific example", "some examples", etc. are intended to indicate that a specific feature or characteristic related to the embodiment or example is included in at least one embodiment or example of the present disclosure. The schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present disclosure, unless otherwise indicated, the meaning of "a plurality" is two or more.
In describing some embodiments, expressions of "coupled" and "connected" and their derivatives may be used. For example, the term "connected" may be used in describing some embodiments to indicate that two or more elements are in direct physical or electrical contact with each other. As another example, the term "coupled" may be used in describing some embodiments to indicate that two or more elements are in direct physical or electrical contact. However, the term "coupled" or "communicatively coupled (communicatively coupled)" may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments disclosed herein are not necessarily limited to the disclosure herein.
At least one of "A, B and C" has the same meaning as at least one of "A, B or C," both include the following combinations of A, B and C: a alone, B alone, C alone, a combination of a and B, a combination of a and C, a combination of B and C, and a combination of A, B and C.
"A and/or B" includes the following three combinations: only a, only B, and combinations of a and B.
As used herein, the term "if" is optionally interpreted to mean "when", "upon", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a stated condition or event] is detected" is optionally interpreted to mean "upon determining", "in response to determining", "upon detecting [the stated condition or event]", or "in response to detecting [the stated condition or event]", depending on the context.
The use of "adapted" or "configured to" herein is meant to be an open and inclusive language that does not exclude devices adapted or configured to perform additional tasks or steps.
In addition, the use of "based on" is meant to be open and inclusive, in that a process, step, calculation, or other action that is "based on" one or more stated conditions or values may, in practice, be based on additional conditions or values beyond those stated.
Currently, the Internet of Things is a network that connects various physical devices and sensors through the Internet; its core idea is to make all kinds of devices intelligent, automated, and remotely controllable via the Internet. The development of Internet of Things devices refers to the process of developing, designing, and implementing such devices, including hardware design, communication technology, and software development. Internet of Things devices are developed to achieve interconnection and intercommunication among devices, so that traditional devices and instruments can be made intelligent and remotely controlled through Internet of Things technology, enabling more efficient, convenient, and intelligent applications.
The following explains the terms related to the embodiments of the present application, so as to facilitate the understanding of the reader.
Target Load Packing is a best-fit variant of the bin-packing algorithm. It scores nodes mainly by their actual resource utilization, driving all utilized nodes toward about x% utilization, and switches to a least-fit strategy once all nodes reach x%.
The Load Variation Risk Balancing algorithm is a node-sorting plugin that sorts nodes according to the mean and standard deviation of node resource utilization, so as to balance load and avoid risks caused by load variation.
The two algorithms described above perform well for pods with stable resource utilization, but both models can lead to low utilization and wasted cluster resources, because the actual resource utilization of active nodes is not fully considered in scheduling decisions. In addition, it is difficult for users to predict the correct resource requests for their pods when defining them; for pods with unstable resource consumption, particularly unstable memory consumption, an out-of-memory (OOM) condition is likely when memory consumption rises sharply, so the pod is forcibly evicted.
Within Kubernetes, automatic scheduling according to the resource requirements of a Pod is performed by a scheduler (Scheduler), one of the core components of a Kubernetes cluster, which is responsible for assigning Pods to suitable nodes according to a series of policies and rules.
The workflow by which the Kubernetes scheduler automatically schedules a Pod to an appropriate node according to its resource requirements is as follows:
1. the scheduler obtains node information.
2. The scheduler obtains the resource requirements of the Pod.
3. Pre-selection and preferred-selection stage: the scheduler selects the most appropriate node through two phases, pre-selection (filtering) and preferred selection (scoring).
4. Selecting the optimal node: after the preferred-selection phase is completed, the scheduler selects the node with the highest score as the scheduling target.
5. Updating the scheduling information: after the scheduler selects the optimal node, it updates the scheduling information of the Pod and binds the Pod to the target node.
6. Notifying the scheduling result: the scheduler informs the Kubernetes control plane of the scheduling result so that it can update the cluster state and the configuration of the relevant components.
Through this workflow, the Kubernetes scheduler can automatically schedule a Pod to a matching node according to the Pod's resource requirements, the node's resource capacity, and other policy rules, achieving effective resource utilization, load balancing, and high availability, and ensuring that applications in the cluster are properly allocated and executed. A minimal sketch of this filter-and-score flow is given below.
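The following Go sketch is an illustration only, not the actual kube-scheduler implementation; the Node type, its fields, and the scoring function are assumptions introduced for this example. It filters nodes by a Pod's memory request and then picks the highest-scoring feasible node.

package main

import "fmt"

// Node is a simplified, hypothetical stand-in for a cluster node; only the
// fields needed for this illustration are included.
type Node struct {
	Name           string
	AllocatableMem int64 // memory still available on the node, in bytes
}

// filterNodes keeps only the nodes that can satisfy the Pod's memory request
// (the pre-selection / filtering phase described above).
func filterNodes(nodes []Node, podMemRequest int64) []Node {
	var feasible []Node
	for _, n := range nodes {
		if n.AllocatableMem >= podMemRequest {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// pickBest applies a scoring function to the feasible nodes and returns the
// highest-scoring one (the preferred-selection / scoring phase).
func pickBest(nodes []Node, score func(Node) int) (Node, bool) {
	var best Node
	bestScore, found := -1, false
	for _, n := range nodes {
		if s := score(n); s > bestScore {
			best, bestScore, found = n, s, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{{"node-a", 2 << 30}, {"node-b", 8 << 30}}
	feasible := filterNodes(nodes, 1<<30) // the Pod requests 1 GiB
	// Here the score is simply the free memory in MiB; the real scheduler
	// combines the results of many scoring plugins.
	if target, ok := pickBest(feasible, func(n Node) int { return int(n.AllocatableMem >> 20) }); ok {
		fmt.Println("bind pod to", target.Name) // step 5 above: the binding is then written back
	}
}

In the real scheduler the binding is reported back through the API server, as described in steps 5 and 6 above.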
In the related art, the optimization strategies are mostly tied to node resources, and a new scoring algorithm can be introduced as a component to handle unbalanced resource allocation in the initial stage of pod allocation. However, as cluster pods run, the resources actually consumed change dynamically, so the resource utilization of the cluster pods is very likely to become unbalanced in time and space, wasting resources and cost; and if the cluster pods are simply rescheduled, they are very likely to be bound to unsuitable nodes, still wasting resources. Therefore, how to determine a node adapted to the cluster pod is an urgent problem to be solved.
In order to solve the problems in the prior art, an embodiment of the present application provides a Pod scheduling method. The communication system may calculate, from the acquired resource information of the Pod, the average consumption corresponding to that resource information: when the resource information is a CPU resource, the average consumption is the consumption of the CPU resource, and likewise for a memory resource or a graphics card resource. The communication system then determines whether the average consumption is greater than a first threshold. If it is, the resource utilization of the Pod has become unbalanced in time and space, and the communication system schedules the Pod from the node to which it was last bound to the target node with the highest score in the node cluster, thereby scheduling the Pod appropriately.
In addition, the embodiments of the present application combine Trimaran (a k8s scheduling plugin based on actual load), the descheduler rescheduling plugin, and the cluster-autoscaler plugin, and optimize the rescheduling (pod eviction) policy and the scheduling policy based on Prometheus monitoring metrics. This addresses the limitation that Kubernetes scheduling is a one-time decision, makes the resource utilization of the cluster more balanced, and satisfies the scheduling of pods with high memory demands and unbalanced resource usage.
As shown in fig. 1, a schematic structural diagram of a communication system 100 according to an embodiment of the present application is provided. The system includes a cluster component (kubernetes master) 101 and a resource management center worker (kubelet) 102. The cluster component kubernetes master 101 is communicatively coupled to the resource management center worker kubelet 102.
The cluster component kubernetes master includes an API server (kube-apiserver), a scheduler, a scoring plugin (descheduler), and a storage database (etcd).
The API server (kube-apiserver) is the core component of the master: it provides the unified access entry of the k8s cluster and coordinates the other components, exposing its interfaces through a RESTful API; all operations that add, delete, modify, or watch object resources are submitted to the API server for processing and are then persisted in the storage database etcd.
The scheduler may select a node for the newly created pod according to the scheduling algorithm.
The scoring plugin descheduler can score nodes and schedule a Pod to a target node when the resource utilization of the Pod is unbalanced.
The storage database etcd may be a distributed key-value storage database for holding cluster state data.
The resource management center worker kubelet 102 includes a monitoring server (prometheus-server) and Pods.
The monitoring server prometheus-server is used to acquire the utilization of resources such as CPU, memory, and GPU, and is communicatively coupled to the scoring plugin descheduler.
A pod is the most basic deployable and schedulable unit of the Kubernetes system. A pod usually contains only one container, and occasionally two (for example, when two services are closely related, access each other frequently, and can share storage), in which the corresponding application runs.
The technical solutions of the embodiments of the present application may be applied to any communication system supporting communication. The communication system may be a 3GPP wireless communication system, for example a fourth generation (4G) mobile communication system such as a long term evolution (LTE) system or an evolved LTE (eLTE) system, a worldwide interoperability for microwave access (WiMAX) communication system, a fifth generation (5G) mobile communication system such as new radio (NR), or a future communication system such as a sixth generation (6G) mobile communication system; it may also be a non-3GPP communication system, without limitation.
It should be noted that the communication system described in the embodiments of the present application is intended to describe the technical solutions of the embodiments more clearly and does not constitute a limitation on the technical solutions provided herein. Those skilled in the art will appreciate that, as communication systems evolve and other communication systems appear, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
In an example, fig. 2 is a schematic structural diagram of a Pod scheduling device provided in an embodiment of the present application. The Pod scheduling device comprises at least one processor 201, a communication line 202, and at least one communication interface 204, and may further comprise a memory 203. The processor 201, the memory 203, and the communication interface 204 may be connected through a communication line 202.
The processor 201 may be a central processing unit (central processing unit, CPU), an application specific integrated circuit (application specific integrated circuit, ASIC), or one or more integrated circuits configured to implement embodiments of the present application, such as: one or more digital signal processors (digital signal processor, DSP), or one or more field programmable gate arrays (field programmable gate array, FPGA).
Communication line 202 may include a path for communicating information between the above-described components.
The communication interface 204, for communicating with other devices or communication networks, may use any transceiver-like device, such as ethernet, radio access network (radio access network, RAN), wireless local area network (wireless local area networks, WLAN), etc.
The memory 203 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In one possible design, the memory 203 may exist separately from the processor 201, that is, the memory 203 may be external to the processor 201 and connected to it through the communication line 202, and is used to store execution instructions or application program code whose execution is controlled by the processor 201 to implement the Pod scheduling method provided in the embodiments described below. In yet another possible design, the memory 203 may be integrated with the processor 201, i.e. the memory 203 may be an internal memory of the processor 201; for example, the memory 203 may be a cache used to temporarily store some data and instruction information.
As one implementation, processor 201 may include one or more CPUs, such as CPU0 and CPU1 in fig. 2. As another implementation, the Pod scheduler 200 may include multiple processors, such as the processor 201 and the processor 207 in fig. 2. As yet another implementation, the Pod scheduling apparatus 200 may further include an output device 205 and an input device 206.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the network node is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described system, module and network node may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
As shown in fig. 3, a flowchart of a Pod scheduling method provided in an embodiment of the present application is shown. The Pod scheduling method may be applied to the communication system shown in fig. 1 and may be implemented by the following steps.
S301, acquiring resource information of the pod.
The resource information comprises any one of a CPU resource, a memory resource and a graphics card resource.
In one example, the API server in the communication system watches the Pod and obtains the resource information of the Pod over the last 15 minutes from the monitoring service Prometheus.
It should be noted that the communication system may acquire the state of the Pod through the container runtime, update the state to the API server, and finally write the state to the storage database etcd.
The state of a pod includes the following:
1. Pending: the Pod has been created but has not yet been assigned to a node and started. This may be because node resources are insufficient or because the Pod has not been scheduled to an appropriate node due to scheduling problems.
2. Running: the Pod has been successfully scheduled onto a node and is running.
3. Succeeded: the containers in the Pod have successfully completed their tasks and have exited.
4. Failed: a container in the Pod has failed or terminated abnormally, possibly because the container's application exited with an error or the container could not be started normally.
5. Unknown: Kubernetes cannot acquire the current state of the Pod; this typically occurs when a connection problem with the cluster API server or another problem prevents state information from being obtained.
6. Terminating: the Pod has been marked for termination and is being shut down gradually.
7. ContainerCreating: the Pod has been dispatched to a node, but its containers have not yet been fully created and started; this typically occurs during the container image download or startup phase.
8. CrashLoopBackOff: a container has crashed and Kubernetes is attempting to restart it, but it crashes again immediately after starting, possibly due to a problem with the application inside the container.
S302, calculating and obtaining the average consumption corresponding to the resource information based on the resource information.
As one possible implementation, when the resource information is a CPU resource, the scoring plugin descheduler in the communication system may determine the average consumption based on the sum of the CPU occupancy of the plurality of containers under the pod and the preset time period.
As an example, the CPU utilization of a Pod can be obtained with the following query:
query = sum by (pod_name) (rate(container_cpu_usage_seconds_total{pod_name=~"$pod_name"}[1m]))
container_cpu_usage_seconds_total: the cumulative CPU time consumed by the container;
rate(): the average per-second rate of increase of the CPU time;
[1m]: values are read at the defined 1-minute interval;
sum by (pod_name): sums all values that share the same pod_name;
namespace="$namespace", pod_name=~"$pod_name", container_name!="": filter conditions that select the namespace, the pod name, and non-empty container names. For example, the CPU occupies 75% in total, and 75%/15 = 0.05.
As yet another possible implementation, when the resource information is a memory resource (mem), the scoring plugin descheduler in the communication system may determine the average consumption based on the sum of the memory occupancy of the plurality of containers under the pod and the preset time period.
As yet another example, the memory occupancy of a Pod can be obtained with the following query (result in MiB):
query = sum by (pod_name) (container_memory_working_set_bytes{pod_name=~"$pod_name"}) / 1048576
sum by (pod_name): sums all values that share the same pod_name;
container_memory_working_set_bytes: the working-set memory currently occupied by the container. For example, the memory occupies 50% in total, and 50%/15 ≈ 0.03.
As yet another possible implementation, when the resource information is a graphics card resource (GPU), the scoring plugin descheduler in the communication system may determine the average consumption based on the sum of the GPU occupancy of the plurality of containers under the pod and the preset time period.
As yet another example, the GPU occupies 60% in total, and 60%/15 = 0.04.
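To make the arithmetic in the examples above concrete, the following Go sketch (an illustration, not an implementation of the descheduler plugin; the function name and the threshold value 0.02 are taken from the examples in this description) computes the average consumption as occupancy divided by the length of the sampling window and flags resources whose average exceeds the first threshold.

package main

import "fmt"

// averageConsumption mirrors the arithmetic above: the summed occupancy of a
// pod's containers over the sampling window, divided by the window length in
// minutes (e.g. 75% / 15 = 0.05 for CPU).
func averageConsumption(totalOccupancy, windowMinutes float64) float64 {
	return totalOccupancy / windowMinutes
}

func main() {
	const firstThreshold = 0.02 // illustrative value, taken from the memory example below

	averages := map[string]float64{
		"cpu": averageConsumption(0.75, 15), // 0.05
		"mem": averageConsumption(0.50, 15), // ~0.033
		"gpu": averageConsumption(0.60, 15), // 0.04
	}
	for name, avg := range averages {
		if avg > firstThreshold {
			// in the method of fig. 3 this would trigger rescheduling of the pod (S303)
			fmt.Printf("%s average consumption %.3f exceeds the first threshold\n", name, avg)
		}
	}
}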
S303, under the condition that the average consumption is greater than a first threshold, scheduling the pod from the initial node to the target node with the highest score in the node cluster.
The initial node is the node to which the pod was previously bound. The node cluster includes a first node cluster. The resource usage of each node in the first node cluster satisfies a preset multiple of the resource usage required by the pod. The target node includes a first target node corresponding to the first node cluster. The second node cluster is the node cluster in which the initial node is located. The maximum resource usage is the peak of a resource since the pod started running, i.e. the maximum in the consumption history of a given resource (CPU, mem, GPU) since the pod started running on the initial node.
In one possible implementation, the communication system obtains the maximum resource usage of the pod at the initial node, determines the first node cluster when the resource usage of each of the nodes in the second node cluster is less than the maximum resource usage, and schedules the pod from the initial node to the first target node with the highest score in the first node cluster.
Continuing the example in S302, once the memory (mem) occupancy of the pod has been calculated, the scheduling equalizer in the communication system determines that the mem occupancy of the pod is greater than the first threshold 0.02 and evicts the pod from the initial node. The API server in the communication system may then obtain the maximum mem usage of the pod on the initial node and determine whether the resource usage of the nodes in the second node cluster can satisfy that maximum mem usage. If no node among them can satisfy the maximum mem usage of the pod, the scoring plugin descheduler in the communication system may score every node in the second node cluster as 0, and the pod remains in a waiting-to-be-scheduled state.
Furthermore, the cluster-autoscaler plugin of the communication system can scale out a number of nodes whose memory is twice the maximum memory usage of the pod to form the first node cluster, and the scoring plugin descheduler of the communication system can determine the highest-scoring target node in the first node cluster according to the Target Load Packing algorithm or the Load Variation Risk Balancing algorithm.
Notably, when using the descheduler to evict Pods, the following principles are followed (a sketch applying them is given after this list): 1. Critical Pods (annotated with scheduler.alpha.kubernetes.io/critical-pod) are never evicted.
2. Pods that do not belong to an RC, RS, Deployment or Job (i.e. static, mirror, or standalone Pods) are never evicted, because such Pods would not be recreated.
3. Pods associated with a DaemonSet are never evicted.
4. Pods with local storage are never evicted.
5. Pods with the BestEffort QoS class are evicted before Burstable and Guaranteed Pods.
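A minimal Go sketch of these eviction principles follows. It is an illustration only, not the descheduler's actual code; the PodInfo type and its fields are simplified assumptions, whereas the real plugin inspects the full Pod object, its annotations, and its owner references.

package main

import "fmt"

// PodInfo is a simplified, hypothetical view of a Pod used only for this sketch.
type PodInfo struct {
	Name            string
	CriticalPodAnno bool   // critical-pod annotation present
	OwnerKind       string // "ReplicaSet", "Deployment", "Job", "DaemonSet", "" for standalone/static Pods
	HasLocalStorage bool
	QoSClass        string // "BestEffort", "Burstable", "Guaranteed"
}

// evictable applies principles 1-4 above: critical Pods, Pods without a
// recreating controller, DaemonSet Pods and Pods with local storage are never evicted.
func evictable(p PodInfo) bool {
	if p.CriticalPodAnno || p.HasLocalStorage {
		return false
	}
	switch p.OwnerKind {
	case "ReplicationController", "ReplicaSet", "Deployment", "Job":
		return true // these Pods will be recreated by their controller
	default:
		return false // standalone, static, mirror or DaemonSet Pods are never evicted
	}
}

// evictionOrder reflects principle 5: BestEffort Pods are evicted before
// Burstable and Guaranteed Pods (a lower value means evicted earlier).
func evictionOrder(qos string) int {
	switch qos {
	case "BestEffort":
		return 0
	case "Burstable":
		return 1
	default: // Guaranteed
		return 2
	}
}

func main() {
	p := PodInfo{Name: "demo", OwnerKind: "Deployment", QoSClass: "BestEffort"}
	fmt.Println(evictable(p), evictionOrder(p.QoSClass)) // true 0
}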
Based on the technical solution of fig. 3, in the Pod scheduling method provided in the embodiments of the present application, the communication system may calculate, from the acquired resource information of the Pod, the average consumption corresponding to that resource information: when the resource information is a CPU resource, the average consumption is the consumption of the CPU resource, and likewise for a memory resource or a graphics card resource. The communication system then determines whether the average consumption is greater than a first threshold. If it is, the resource utilization of the Pod has become unbalanced in time and space, and the communication system schedules the Pod from the node to which it was last bound to the target node with the highest score in the node cluster, thereby scheduling the Pod appropriately.
In a possible implementation, in conjunction with fig. 3, as shown in fig. 4, in S303, when the resource usage of the nodes in the second node cluster is greater than the maximum resource usage, the communication system may determine the target node within the second node cluster. Specifically, this can be implemented through the following steps S401 to S404.
S401, acquiring, for each node in the second node cluster, the resource utilization of the pod at the node and the resource utilization of the node.
It can be understood that, since the resource usage of the nodes in the second node cluster can satisfy the maximum mem usage of the pod, the communication system only needs to find the target node within the second node cluster and does not need to scale out a first node cluster.
In one possible implementation, the communication system obtains the memory (mem) utilization of the node to be scored over a preset time period, and the mem utilization of the pod at that node.
As an example, the communication system obtains the mem utilization of the node to be scored over the last 15 minutes, denoted node_mem.
Furthermore, according to the limit configuration of the pod, the mem utilization of the pod at the node to be scored is calculated from the pod's initially preset mem usage and the capacity of the node to be scored, i.e. the pod's initially preset mem usage / the capacity of the node to be scored = the mem utilization of the pod at the node, denoted pod_mem.
S402, determining the predicted utilization of the pod at the node based on the resource utilization of the pod and the resource utilization of the node.
Wherein the resource utilization of a node is equal to the sum of the resource utilizations of the plurality of pods on the node.
As one possible implementation, the predicted utilization of the pod at the node may be target_mem = node_mem + pod_mem, where target_mem is the predicted utilization of the pod at the node.
Continuing the example in S401, assume there are three nodes X, Y, and Z, each with four units of memory, of which 1 unit is in use on node X, 2 units on node Y, and 3 units on node Z. For simplicity, assume the pod to be scheduled has 0 mem requests and overhead.
The predicted utilization at each node is then:
target_mem_X = (1/4)*100 + 0 = 25
target_mem_Y = (2/4)*100 + 0 = 50
target_mem_Z = (3/4)*100 + 0 = 75.
S403, determining the score of the node based on the predicted utilization and the utilization threshold.
The utilization threshold may be denoted cluster_cpu (a percentage); the same scoring rule applies regardless of whether the resource being scored is CPU, memory, or GPU.
In one possible implementation, when the predicted utilization is less than or equal to the utilization threshold, the communication system may take the sum of a first ratio and the utilization threshold as the score of the node. The first ratio is the product of a first difference and the predicted utilization, divided by the utilization threshold; the first difference is the difference between a first parameter (100) and the utilization threshold.
Continuing the example in S402 and writing the predicted utilization as target_cpu and the threshold as cluster_cpu = 50%, when target_cpu <= cluster_cpu, the value of (100 - cluster_cpu) * target_cpu / cluster_cpu + cluster_cpu is determined as the score of the node to be scored. For example, the score of node X is (100-50)*25/50+50 = 75, and the score of node Y is (100-50)*50/50+50 = 100.
In yet another possible implementation, when the predicted utilization is greater than the utilization threshold and less than or equal to 100%, the communication system may take the product of a second parameter (the utilization threshold) and a second ratio as the score of the node, where the second ratio is the difference between the first parameter (100) and the predicted utilization, divided by the difference between the first parameter and the utilization threshold.
Continuing the example in S402, when cluster_cpu < target_cpu <= 100%, the value of cluster_cpu * (100 - target_cpu) / (100 - cluster_cpu) is determined as the score of the node to be scored. For example, the score of node Z is 50*(100-75)/(100-50) = 25.
In a further possible implementation, when the predicted utilization is greater than 100%, a preset value is determined as the score of the node.
Continuing the example in S402, when target_cpu > 100%, the preset value 0 is determined as the score of the node to be scored. A sketch of this piecewise scoring rule is given below.
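To make the three cases concrete, the following Go sketch (an illustration of the piecewise rule described in S403, not the plugin's actual source; the score function name and signature are assumptions) reproduces the scores of nodes X, Y and Z from the example in S402.

package main

import "fmt"

// score implements the piecewise rule described in S403. threshold is the
// utilization threshold (e.g. cluster_cpu = 50) and predicted is the pod's
// predicted utilization on the node, both expressed as percentages.
func score(predicted, threshold float64) float64 {
	switch {
	case predicted <= threshold:
		return (100-threshold)*predicted/threshold + threshold
	case predicted <= 100:
		return threshold * (100 - predicted) / (100 - threshold)
	default:
		return 0 // predicted utilization above 100%: the node cannot hold the pod
	}
}

func main() {
	const threshold = 50.0
	nodes := []struct {
		name      string
		predicted float64
	}{{"X", 25}, {"Y", 50}, {"Z", 75}}
	for _, n := range nodes {
		fmt.Printf("node %s score: %.0f\n", n.name, score(n.predicted, threshold))
	}
	// Prints: node X score: 75, node Y score: 100, node Z score: 25 — so the
	// pod is scheduled to node Y, as in S404 below.
}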
S404, dispatching the pod from the initial node to a second target node with the highest score in the second node cluster.
In connection with the example in S403, the communication system may schedule Pod to node Y.
Based on the technical solution of fig. 4, in the Pod scheduling method provided in the embodiments of the present application, the communication system may, using the resource utilization of the pod at a node and the resource utilization of the node, schedule the pod to the node close to the utilization threshold, i.e. select the optimal node.
The method and the device for scheduling Pod according to the embodiments of the present application may divide functional modules or functional units according to the method examples described above, for example, each functional module or functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware, or in software functional modules or functional units. The division of the modules or units in the embodiments of the present application is merely a logic function division, and other division manners may be implemented in practice.
Fig. 5 is an interaction diagram of a Pod scheduling method according to an embodiment of the present application. The method includes: the API server (REST API) creates a Pod in response to a user operation; when the API server receives the data information, it may write the data information to the storage database etcd and return the storage result.
Since the scheduler can monitor resource changes through the API server watch API, when a new Pod appears that is not yet bound to any node, the scheduler performs scheduling, selects a suitable node, and binds the new Pod to an initial node; the API server then sends the binding message to the storage database etcd for storage. This is the first scheduling and uses the default k8s scheduler. When the kubelet on the initial node detects via the API server watch API that a new Pod has been scheduled to it, it may pass the relevant data for that Pod to the container runtime, such as Docker or another container runtime.
The API server monitors the state of the Pod on the initial node in real time and can obtain the resource information of the Pod over the last 15 minutes from the monitoring service Prometheus. The scoring plugin descheduler can calculate, based on the resource information, the average consumption corresponding to that information and, when the average consumption is greater than the first threshold, schedule the pod from the initial node to the target node with the highest score in the node cluster, i.e. bind the pod to the target node and return a binding message to the API server; the API server can then send the binding message and the pod's resource information to the storage database etcd for storage.
When the kubelet on the target node detects via the API server watch API that a new Pod has been scheduled to it, it may pass the relevant data for that Pod to the container runtime, such as Docker or another container runtime. The Pod's state is obtained and updated in real time through the container runtime, reported to the API server, and written by the API server to the storage database etcd.
Fig. 6 is a schematic structural diagram of a Pod scheduling device according to an embodiment of the present application. The device includes a processing unit 601 and an obtaining unit 602. The obtaining unit 602 is configured to obtain resource information of the pod, where the resource information includes any one of a CPU resource, a memory resource and a graphics card resource. The processing unit 601 is configured to calculate, based on the resource information, the average consumption corresponding to the resource information. The processing unit 601 is further configured to schedule the pod from an initial node to the target node with the highest score in the node cluster when the average consumption is greater than a first threshold; the initial node is the node to which the pod was last bound.
Optionally, the node cluster includes a first node cluster; the resource usage of each node in the first node cluster satisfies a preset multiple of the resource usage required by the pod; and the target node includes a first target node corresponding to the first node cluster. The obtaining unit 602 is further configured to obtain the maximum resource usage of the pod at the initial node. The processing unit 601 is further configured to determine the first node cluster when the resource usage of each of the nodes in the second node cluster is less than the maximum resource usage, where the second node cluster is the node cluster in which the initial node is located. The processing unit 601 is further configured to schedule the pod from the initial node to the first target node with the highest score in the first node cluster.
Optionally, the target node further includes a second target node corresponding to the second node cluster. The processing unit 601 is specifically configured to: when the resource usage of the nodes in the second node cluster is greater than the maximum resource usage, acquire, for each node in the second node cluster, the resource utilization of the pod at the node and the resource utilization of the node, and determine the predicted utilization of the pod at the node, where the resource utilization of a node is equal to the sum of the resource utilizations of the pods on the node; determine the score of the node based on the predicted utilization and a utilization threshold; and schedule the pod from the initial node to the second target node with the highest score in the second node cluster.
Optionally, the average consumption is determined based on a sum of resource occupancy of the plurality of containers of the pod and the preset time period.
When implemented in hardware, the acquisition unit 602 in the embodiments of the present application may be integrated on a communication interface, and the processing unit 601 may be integrated on a processor. The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A Pod scheduling method, the method comprising:
acquiring resource information of the pod; the resource information comprises any one of a CPU resource, a memory resource and a graphics card resource;
calculating average consumption corresponding to the resource information based on the resource information;
scheduling the pod from an initial node to a highest-score target node in a node cluster under the condition that the average consumption is greater than a first threshold; the initial node is the node bound by the pod last time.
2. The method of claim 1, wherein the cluster of nodes comprises a first cluster of nodes; the resource usage of each node in the first node cluster meets the preset multiple of the resource usage required by the pod; the target node comprises a first target node corresponding to the first node cluster;
and in the case that the average consumption is greater than a first threshold, scheduling the pod from the initial node to a target node with the highest score in the node cluster, and further comprising:
obtaining the maximum resource usage of the pod at the initial node;
determining the first node cluster under the condition that the resource usage amount of a plurality of nodes in the second node cluster is smaller than the maximum resource usage amount; the second node cluster is a node cluster where the initial node is located;
and scheduling the pod from the initial node to the first target node with the highest score in the first node cluster.
3. The method of claim 2, wherein the target node further comprises a second target node corresponding to a second one of the node clusters;
the method further comprises the steps of:
under the condition that the resource usage amount of a plurality of nodes in a second node cluster is larger than the maximum resource usage amount, acquiring the resource usage rate of the pod at the node and the resource usage rate of the node aiming at each node in the second node cluster, and determining the expected utilization rate of the pod at the node; the resource utilization of one of the nodes is equal to the sum of the resource utilization of a plurality of pod on the node;
determining a score for the node based on the projected utilization and a utilization threshold;
and dispatching the pod from the initial node to the second target node with the highest score in the second node cluster.
4. A method according to any one of claims 1-3, wherein the average consumption is determined based on a sum of resource occupancy of a plurality of containers of the pod and a preset period of time.
5. A Pod scheduling apparatus, the apparatus comprising: a processing unit and an acquisition unit;
the acquisition unit is used for acquiring the resource information of the pod; the resource information comprises any one of a CPU resource, a memory resource and a graphics card resource;
the processing unit is used for calculating and obtaining average consumption corresponding to the resource information based on the resource information;
the processing unit is further configured to schedule the pod from an initial node to a target node with a highest score in a node cluster if the average consumption is greater than a first threshold; the initial node is the node bound by the pod last time.
6. The apparatus of claim 5, wherein the cluster of nodes comprises a first cluster of nodes; the resource usage of each node in the first node cluster meets the preset multiple of the resource usage required by the pod; the target node comprises a first target node corresponding to the first node cluster;
the obtaining unit is further configured to obtain a maximum resource usage amount of the pod at the initial node;
the processing unit is further configured to determine the first node cluster when resource usage amounts of a plurality of nodes in the second node cluster are all less than the maximum resource usage amount; the second node cluster is a node cluster where the initial node is located;
The processing unit is further configured to schedule the pod from the initial node to the first target node with the highest score in the first node cluster.
7. The apparatus of claim 6, wherein the target node further comprises a second target node corresponding to a second one of the node clusters;
the processing unit is specifically configured to:
under the condition that the resource usage amount of a plurality of nodes in a second node cluster is larger than the maximum resource usage amount, acquiring the resource usage rate of the pod at the node and the resource usage rate of the node aiming at each node in the second node cluster, and determining the expected utilization rate of the pod at the node; the resource utilization of one of the nodes is equal to the sum of the resource utilization of a plurality of pod on the node;
determining a score for the node based on the projected utilization and a utilization threshold;
and dispatching the pod from the initial node to the second target node with the highest score in the second node cluster.
8. The apparatus of any of claims 5-7, wherein the average consumption is determined based on a sum of resource occupancy of a plurality of containers of the pod and a preset period of time.
9. A Pod scheduling apparatus, comprising: a processor and a communication interface; the communication interface is coupled to the processor for running a computer program or instructions to implement the Pod scheduling method as claimed in any of claims 1-4.
10. A computer readable storage medium having instructions stored therein, wherein when executed by a computer, the computer performs the Pod scheduling method of any of the preceding claims 1-4.
CN202311528208.6A 2023-11-16 2023-11-16 Pod scheduling method, device and storage medium Pending CN117573278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311528208.6A CN117573278A (en) 2023-11-16 2023-11-16 Pod scheduling method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311528208.6A CN117573278A (en) 2023-11-16 2023-11-16 Pod scheduling method, device and storage medium

Publications (1)

Publication Number Publication Date
CN117573278A true CN117573278A (en) 2024-02-20

Family

ID=89885576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311528208.6A Pending CN117573278A (en) 2023-11-16 2023-11-16 Pod scheduling method, device and storage medium

Country Status (1)

Country Link
CN (1) CN117573278A (en)

Similar Documents

Publication Publication Date Title
CN111796908B (en) System and method for automatic elastic expansion and contraction of resources and cloud platform
US9319281B2 (en) Resource management method, resource management device, and program product
CN111104227B (en) Resource control method and device of K8s platform and related components
CN103019853A (en) Method and device for dispatching job task
CN110673928B (en) Thread binding method, thread binding device, storage medium and server
CN111338773A (en) Distributed timed task scheduling method, scheduling system and server cluster
WO2024120205A1 (en) Method and apparatus for optimizing application performance, electronic device, and storage medium
JPWO2007072544A1 (en) Information processing apparatus, computer, resource allocation method, and resource allocation program
CN112269641A (en) Scheduling method, scheduling device, electronic equipment and storage medium
CN113037794A (en) Computing resource allocation scheduling method, device and system
CN113032102B (en) Resource rescheduling method, device, equipment and medium
CN114327881A (en) Task scheduling method and device
CN111796933B (en) Resource scheduling method, device, storage medium and electronic equipment
CN112214288B (en) Pod scheduling method, device, equipment and medium based on Kubernetes cluster
CN107766156B (en) Task processing method and device
CN111857992B (en) Method and device for allocating linear resources in Radosgw module
CN117369941A (en) Pod scheduling method and system
CN112698929A (en) Information acquisition method and device
CN115964176B (en) Cloud computing cluster scheduling method, electronic equipment and storage medium
CN116157778A (en) System and method for hybrid centralized and distributed scheduling on shared physical hosts
CN113535346A (en) Method, device and equipment for adjusting number of threads and computer storage medium
CN110308914B (en) Upgrade processing method, device, equipment, system and computer readable storage medium
CN117573278A (en) Pod scheduling method, device and storage medium
CN116643858A (en) Service priority pod-based rescheduling method, device, equipment and medium
CN109962941B (en) Communication method, device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination