CN114020407A

CN114020407A - Container management cluster container group scheduling optimization method, device and equipment

Info

Publication number: CN114020407A
Application number: CN202111265634.6A
Authority: CN
Inventors: 李瑞寒
Original assignee: Jinan Inspur Data Technology Co Ltd
Current assignee: Jinan Inspur Data Technology Co Ltd
Priority date: 2021-10-28
Filing date: 2021-10-28
Publication date: 2022-02-08

Abstract

The invention provides a method, a device and equipment for optimizing container group scheduling in a container management cluster. The method comprises the following steps: monitoring the system resources of nodes and acquiring the current amount of each node's system resources; when an acquired current amount is smaller than the corresponding set eviction threshold, sending an instruction to evict container groups from the node; after the instruction is received, acquiring the request quantity of the container groups on each node; and performing optimized scheduling on the container groups on the nodes according to their request quantities and the set resource limit values. Evicting workloads from a node frees resources for other Pods or system tasks, which greatly helps maintain node stability when computing resources such as node disk, RAM or CPU are insufficient.

Description

Container management cluster container group scheduling optimization method, device and equipment

Technical Field

The invention relates to the technical field of container management cluster resource optimization, and in particular to a container management cluster container group scheduling optimization method, device and equipment.

Background

Kubernetes is an open-source container cluster management system. Built on Docker technology, it provides containerized applications with a complete set of functions such as deployment and operation, resource scheduling, service discovery and dynamic scaling, which makes large-scale container clusters easier to manage. In Kubernetes, the Pod (container group) is the smallest deployment unit. Each Pod consists of one or more containers, namely a "root container" plus one or more tightly coupled service containers. The Kubernetes scheduling policy decides which node of the cluster each of a large number of Pods is placed on.

The Kubernetes scheduler schedules Pods onto worker nodes according to specific algorithms and policies. By default, it meets most requirements, for example scheduling a Pod onto a node with sufficient resources, or spreading Pods across different nodes to balance resource usage across the cluster. However, in some specific situations the default Kubernetes scheduling algorithm does not reach an ideal scheduling state.

In actual production environments, especially when there are many service Pods, Pods are often found in the Evicted state. This usually happens because inadequate scheduling has left nodes short of resources.

Disclosure of Invention

To make reasonable use of cluster node resources and to schedule and distribute Pods reasonably, the invention provides a container management cluster container group scheduling optimization method, device and equipment.

The technical scheme of the invention is as follows:

In a first aspect, the technical solution of the present invention provides a container management cluster container group scheduling optimization method, including the following steps:

monitoring system resources of the nodes and acquiring the current amount of the system resources of the nodes;

when an obtained current amount is smaller than the corresponding set eviction threshold, sending an instruction to evict container groups from the node;

after receiving the instruction, acquiring the request quantity of the container groups on each node;

performing optimized scheduling on the container groups on the nodes according to the request quantities of the container groups and the set resource limit values.

Preferably, the step of monitoring the system resources of the node and obtaining the current amount of the system resources of the node comprises:

acquiring the available memory of each node;

calculating the memory availability of each node;

acquiring the remaining space of the root directory and calculating the root directory availability;

acquiring the remaining capacity of the storage space for container runtime files and calculating the file storage space availability.
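Steps S231 to S234 come down to ratios of free capacity to total capacity. The Python sketch below assumes the raw readings have already been collected into a hypothetical `NodeStats` record. Its field names are illustrative and are not a Kubernetes API. The sketch only shows how the current amounts could be derived from those readings.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical raw readings for one node, in bytes."""
    mem_capacity: int
    mem_available: int
    rootfs_capacity: int
    rootfs_free: int
    filestore_capacity: int  # storage space for container runtime files
    filestore_free: int


def availability(free: int, capacity: int) -> float:
    """Fraction of capacity still free, clamped to [0, 1]; 0.0 if capacity is unknown."""
    if capacity <= 0:
        return 0.0
    return max(0.0, min(1.0, free / capacity))


def current_amounts(stats: NodeStats) -> dict:
    """Return the current amount of each monitored resource (steps S231 to S234)."""
    return {
        # S231: available node memory, kept as an absolute amount
        "memory_available": float(stats.mem_available),
        # S232: memory availability of the node
        "memory_availability": availability(stats.mem_available, stats.mem_capacity),
        # S233: root directory availability
        "rootfs_availability": availability(stats.rootfs_free, stats.rootfs_capacity),
        # S234: file storage space availability
        "filestore_availability": availability(stats.filestore_free, stats.filestore_capacity),
    }
```

For example, a node with 8 GiB of memory of which 2 GiB is available reports a memory availability of 0.25.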

Evicting workloads from a node frees resources for other Pods or system tasks. This greatly helps maintain node stability when the computing resources of cluster nodes, such as disk, RAM or CPU (central processing unit), are insufficient.

Preferably, the step of monitoring the system resources of the node and obtaining the current amount of the system resources of the node is preceded by:

setting an eviction threshold of a node system resource;

the eviction threshold of the node memory is a first threshold;

the eviction threshold of the node memory availability is a second threshold;

the eviction threshold of the root directory availability is a third threshold;

the eviction threshold for availability of file storage space is a fourth threshold.

Whether to reclaim resources is decided by an eviction signal and an eviction threshold. The eviction signal is the current capacity of a system resource, such as memory or storage. The eviction threshold is the minimum amount of that resource to be maintained. Eviction is triggered when the signal falls below the threshold.
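The four thresholds can be held in one small structure. This sketch uses a hypothetical `EvictionThresholds` class whose defaults are the values given later in this embodiment. It renders them as a kubelet `evictionHard` map, assuming the following mapping:

- first threshold to `memory.available`
- second threshold to `nodefs.inodesFree`
- third threshold (root directory) to `nodefs.available`
- fourth threshold (file storage) to `imagefs.available`

Upstream Kubernetes documents a default of 100Mi for `memory.available`. The 300Mi here follows the text, so check the kubelet version in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvictionThresholds:
    """First to fourth eviction thresholds; defaults follow this embodiment."""
    first_memory_mi: int = 300        # available node memory, MiB
    second_inodes_pct: float = 5.0    # second threshold (inodesFree), percent
    third_rootfs_pct: float = 10.0    # root directory availability, percent
    fourth_imagefs_pct: float = 15.0  # file storage space availability, percent

    def to_eviction_hard(self) -> dict:
        """Render as a kubelet evictionHard map: signal name -> threshold quantity."""
        return {
            "memory.available": f"{self.first_memory_mi}Mi",
            "nodefs.inodesFree": f"{self.second_inodes_pct:g}%",
            "nodefs.available": f"{self.third_rootfs_pct:g}%",
            "imagefs.available": f"{self.fourth_imagefs_pct:g}%",
        }
```

A node with higher stability requirements could raise its memory threshold, for example with `EvictionThresholds(first_memory_mi=500)`.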

Preferably, the step of sending an instruction to evict container groups from the node when an obtained current amount is smaller than the corresponding set eviction threshold comprises:

when the available node memory is smaller than the first threshold, sending an instruction to evict container groups from the node;

when the node memory availability is smaller than the second threshold, sending an instruction to evict container groups from the node;

when the root directory availability is smaller than the third threshold, sending an instruction to evict container groups from the node;

when the file storage space availability is smaller than the fourth threshold, sending an instruction to evict container groups from the node.
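The four conditions amount to checking each current amount against its own threshold and sending the eviction instruction if any check fails. This is a minimal sketch. It assumes that current amounts and thresholds are plain dictionaries keyed by the same hypothetical resource names, with memory in MiB and availabilities as fractions.

```python
def triggered(current: dict, thresholds: dict) -> list:
    """Names of resources whose current amount is below their eviction threshold.

    Resources missing from `current` are ignored rather than treated as a breach.
    """
    return sorted(
        name for name, threshold in thresholds.items()
        if name in current and current[name] < threshold
    )


def needs_eviction(current: dict, thresholds: dict) -> bool:
    """True if any one of the four conditions holds, i.e. an eviction instruction is sent."""
    return bool(triggered(current, thresholds))


THRESHOLDS = {
    "memory_available_mi": 300,      # first threshold
    "memory_availability": 0.05,     # second threshold
    "rootfs_availability": 0.10,     # third threshold
    "filestore_availability": 0.15,  # fourth threshold
}
```

Consider a node with 1024 MiB free but only 8% of its root directory available. It triggers on `rootfs_availability` alone, and that single breach is enough to send the instruction.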

Preferably, the step of monitoring the system resources of the node and obtaining the current amount of the system resources of the node further comprises:

prioritizing the container groups according to their configuration principle.

Preferably, the step of prioritizing the container groups according to the configuration principle of the container groups comprises:

acquiring the resource limit values (limits) and request quantities (requests) set for CPU and RAM in the containers;

classifying container groups whose set resource limit values equal their request quantities as high priority;

classifying container groups whose resource limit values and request quantities are set but unequal as secondary priority;

classifying container groups with neither resource limit values nor request quantities set as low priority.
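The three-way division can be written as a pure function over the containers of a container group. This sketch makes two assumptions:

- Quantities are already normalized to integers (millicores for CPU, MiB for RAM), so they can be compared directly.
- It follows the rule stated here exactly. Upstream Kubernetes would default a missing request to its limit; this sketch does not.

```python
RESOURCES = ("cpu", "memory")


def priority_class(containers: list) -> str:
    """Classify a container group from its containers' requests and limits.

    Each container is a dict such as {"requests": {"cpu": 500}, "limits": {"cpu": 500}}.
    """
    any_set = False
    all_equal = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for res in RESOURCES:
            req, lim = requests.get(res), limits.get(res)
            if req is not None or lim is not None:
                any_set = True
            # High priority needs both values present and equal for every resource.
            if req is None or lim is None or req != lim:
                all_equal = False
    if not any_set:
        return "BestEffort"  # low priority: nothing set
    if all_equal:
        return "Guaranteed"  # high priority: limits == requests
    return "Burstable"       # secondary priority: set but unequal
```

A group that mixes a fully specified container with an unconfigured one falls into the secondary class. This matches how Kubernetes treats such Pods.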

Configuring node parameter thresholds in this way enables Pod resource scheduling and thereby ensures node stability. The scheduling is carried out according to Pod priority.

Preferably, the step of performing optimized scheduling on the container group on the node according to the requested quantity of the container group and the set resource limit value includes:

if the request quantity exceeds the set resource limit value, stopping or limiting the container group;

if the number of requests does not exceed the set resource limit, the low priority container set is preferentially evicted.

Pods are scheduled according to their requests. The aim is to ensure that every container and Pod receives the RAM and CPU it requested. A Pod that exceeds its resource request may be terminated or throttled when another Pod or a system task needs the constrained resource. In some cases, Pods that consume less than their request may also be killed.
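The optimized-scheduling rule can be sketched as follows. Each container group is described by a hypothetical dictionary holding three things: its priority class, its request quantity, and its set resource limit value (`None` when unset, as for BestEffort groups). The sketch returns two lists. The first holds groups whose request exceeds the limit, which are suspended or limited. The second orders the remaining groups for eviction, lowest priority first.

```python
PRIORITY_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}


def exceeds_limit(group: dict) -> bool:
    """A group without a set limit can never exceed it."""
    return group["limit"] is not None and group["requested"] > group["limit"]


def plan(groups: list) -> tuple:
    """Return (groups to suspend or limit, remaining groups in eviction order)."""
    to_limit = [g["name"] for g in groups if exceeds_limit(g)]
    candidates = [g for g in groups if not exceeds_limit(g)]
    # sorted() is stable, so groups of equal priority keep their input order.
    ordered = sorted(candidates, key=lambda g: PRIORITY_RANK[g["qos"]])
    return to_limit, [g["name"] for g in ordered]
```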

In a second aspect, a technical solution of the present invention provides a container group scheduling optimization apparatus for a container management cluster, including a monitoring module, a judgment processing module, a request obtaining module, and an optimization adjusting module;

the monitoring module is used for monitoring the system resources of the nodes and acquiring the current amount of the system resources of the nodes;

the judgment processing module is used for sending an instruction to evict container groups from the node when an obtained current amount is judged to be smaller than the corresponding set eviction threshold;

the request obtaining module is used for acquiring the request quantity of the container groups on each node after receiving the instruction;

the optimization adjusting module is used for performing optimized scheduling on the container groups on the nodes according to the request quantities of the container groups and the set resource limit values.

Preferably, the monitoring module comprises an information acquisition unit and a calculation unit;

the information acquisition unit is used for acquiring the available memory of each node, the remaining space of the root directory, and the remaining capacity of the storage space for container runtime files;

the calculation unit is used for calculating the memory availability, root directory availability and file storage space availability of each node from the information acquired by the information acquisition unit.

Preferably, the apparatus further comprises a threshold setting unit, configured to set an eviction threshold of the node system resource;

the eviction threshold of the node memory is a first threshold;

the eviction threshold of the node memory availability is a second threshold;

the eviction threshold of the root directory availability is a third threshold;

the eviction threshold of the file storage space availability is a fourth threshold.

Preferably, the judgment processing module is specifically configured to send an instruction to evict container groups from the node in any of the following cases: the available node memory is smaller than the first threshold; the node memory availability is smaller than the second threshold; the root directory availability is smaller than the third threshold; or the file storage space availability is smaller than the fourth threshold.

Preferably, the apparatus further comprises a preprocessing module for prioritizing the container groups according to a configuration principle of the container groups.

Preferably, the preprocessing module comprises a parameter acquisition unit and a priority dividing unit;

the parameter acquisition unit is used for acquiring the resource limit values and request quantities set for CPU and RAM in the containers;

the priority dividing unit is used for classifying container groups into three levels: those whose set resource limit values equal their request quantities as high priority; those whose resource limit values and request quantities are set but unequal as secondary priority; and those with neither set as low priority.

Preferably, the optimization adjusting module is configured to suspend or limit a container group if its request quantity exceeds the set resource limit value, and to preferentially evict low-priority container groups if the request quantity does not exceed the set resource limit value.

In a third aspect, the present invention further provides a computer device including a processor and a memory that communicate with each other over a bus. The memory stores program instructions executable by the processor, and the processor invokes these instructions to execute the container management cluster container group scheduling optimization method according to the first aspect.

According to the above technical solution, the invention has the following advantage: evicting workloads from a node frees resources for other Pods or system tasks. This greatly helps maintain node stability when computing resources such as node disk, RAM or CPU are insufficient.

In addition, the invention has reliable design principle, simple structure and very wide application prospect.

Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.

Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. A person of ordinary skill in the art can obviously derive other drawings from these drawings without creative effort.

Fig. 1 is a schematic flow diagram of a method of one embodiment of the invention.

Fig. 2 is a schematic block diagram of an apparatus of one embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, an embodiment of the present invention provides a container management cluster container group scheduling optimization method, including the following steps:

S11: monitoring the system resources of the nodes and acquiring the current amount of the system resources of the nodes;

S12: when an obtained current amount is smaller than the corresponding set eviction threshold, sending an instruction to evict container groups from the node;

S13: after receiving the instruction, acquiring the request quantity of the container groups on each node;

S14: performing optimized scheduling on the container groups on the nodes according to the request quantities of the container groups and the set resource limit values.

Another embodiment of the invention provides a container management cluster container group scheduling optimization method, comprising the following steps:

S21: setting eviction thresholds for node system resources. The eviction threshold of the node memory is set as a first threshold, that of the node memory availability as a second threshold, that of the root directory availability as a third threshold, and that of the file storage space availability as a fourth threshold.

S22: carrying out priority division on the container groups according to the configuration principle of the container groups;

In this step, the resource limit values and request quantities set for CPU and RAM in the containers are acquired first. Container groups are then classified into three levels:

- container groups whose set resource limit values equal their request quantities are high priority (Guaranteed);
- container groups whose resource limit values and request quantities are set but unequal are secondary priority (Burstable);
- container groups with neither set are low priority (BestEffort).

It should be noted that:

1) High priority covers Pods that must not be rescheduled arbitrarily and is defined as the Guaranteed class. The configuration principle is to set a resource limit (limits) and a request (requests) for CPU and RAM in each container, with equal values.
2) Secondary priority covers Pods that may be rescheduled when necessary and is defined as the Burstable class. The configuration principle is to set limits and requests for CPU and RAM in the container, with unequal values.
3) Low priority covers temporary services or Pods that can be scheduled freely and is defined as the BestEffort class. The configuration principle is to set no requests or limits for CPU and RAM in the container.

S23: monitoring system resources of the nodes and acquiring the current amount of the system resources of the nodes;

Specifically, this step includes:

S231: acquiring the available memory of each node;

S232: calculating the memory availability of each node;

S233: acquiring the remaining space of the root directory and calculating the root directory availability;

S234: acquiring the remaining capacity of the storage space for container runtime files and calculating the file storage space availability.

S24: when an obtained current amount is smaller than the corresponding set eviction threshold, sending an instruction to evict container groups from the node;

In this step, an instruction to evict container groups from the node is sent in any of the following cases: the available node memory is smaller than the first threshold; the node memory availability is smaller than the second threshold; the root directory availability is smaller than the third threshold; or the file storage space availability is smaller than the fourth threshold.

S25: after receiving the instruction, acquiring the request quantity of the container group on each node;

S26: performing optimized scheduling on the container groups on the nodes according to the request quantities of the container groups and the set resource limit values. In this step, if the request quantity exceeds the set resource limit value, the container group is suspended or limited. If it does not, low-priority container groups are evicted first.

If no Pod exceeds its request, Pod priority is checked, and Pods are evicted one by one starting from the lowest priority. The priority order is Guaranteed > Burstable > BestEffort.

BestEffort and Burstable Pods whose usage of the constrained resource exceeds their request are evicted first. If there are several such Pods, kubelet orders them by priority. Guaranteed and Burstable Pods whose usage is below their request are evicted last. kubelet may still evict such Pods, and in that case it evicts the Guaranteed or Burstable Pod with the lowest priority.
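The ranking described above can be expressed as a sort key. This sketch follows the text in using the QoS class as the priority. Upstream kubelet documents its order as: (1) whether usage exceeds requests, (2) the Pod's priority, which is its PriorityClass value, and (3) usage relative to requests. The `QOS_RANK` mapping is therefore an assumption. `PodUsage` is a hypothetical record of one Pod's use of the starved resource.

```python
from dataclasses import dataclass

# Assumed priority order from the text: Guaranteed > Burstable > BestEffort.
QOS_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}


@dataclass
class PodUsage:
    """Hypothetical snapshot of one Pod's use of the starved resource."""
    name: str
    qos: str
    usage: int    # current usage, e.g. MiB of memory
    request: int  # configured request, 0 when none is set


def eviction_order(pods: list) -> list:
    """Return Pod names in eviction order (first element is evicted first)."""
    def key(p: PodUsage):
        over_request = p.usage > p.request
        return (
            not over_request,        # Pods above their request come first
            QOS_RANK[p.qos],         # then lower priority first
            -(p.usage - p.request),  # then the largest usage relative to request first
        )
    return [p.name for p in sorted(pods, key=key)]
```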

In addition, kubelet is the primary node component in Kubernetes. When computing resources of cluster nodes, such as disk, RAM (random access memory) or CPU (central processing unit), are insufficient, kubelet greatly helps maintain node stability. kubelet may evict workloads from a node to free resources for other Pods or system tasks. The method of this embodiment is executed according to kubelet's mechanism and the scheduling principles of this embodiment, as follows:


With multiple worker nodes, the kubelet parameters of the worker nodes must be configured reasonably so that workloads are scheduled reasonably. The kubelet parameters can be modified in the kubelet binary so that they take effect on all worker nodes once deployment is complete. Per-node customization is achieved through each node's kubelet configuration file.
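Per-node customization amounts to overlaying a node's own eviction settings on the cluster-wide defaults. This is a minimal sketch that assumes both are kubelet `evictionHard`-style maps. It accepts only the four signals discussed in this embodiment. The function name and the simplified quantity check are illustrative and are not kubelet's own validation.

```python
import re

SIGNALS = {"memory.available", "nodefs.available", "nodefs.inodesFree", "imagefs.available"}
# Accepts values such as "300Mi", "1Gi", "10%" or "12.5%" (a simplified check).
QUANTITY = re.compile(r"^\d+(\.\d+)?(%|Ki|Mi|Gi)?$")


def effective_eviction_hard(cluster_defaults: dict, node_overrides: dict) -> dict:
    """Merge node-specific thresholds over the cluster-wide ones without mutating either."""
    merged = dict(cluster_defaults)
    for signal, value in node_overrides.items():
        if signal not in SIGNALS:
            raise ValueError(f"unsupported eviction signal: {signal}")
        if not QUANTITY.match(value):
            raise ValueError(f"malformed threshold for {signal}: {value}")
        merged[signal] = value
    return merged
```

A log-heavy node could, for instance, override only `nodefs.available` and inherit the other three values.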

kubelet determines when to reclaim resources through eviction signals and eviction thresholds. An eviction signal is the current capacity of a system resource, such as memory or storage. An eviction threshold is the minimum amount of that resource that kubelet maintains. kubelet supports the following eviction signals:

Available node memory: the default eviction threshold is 300Mi, so kubelet starts evicting Pods when available memory falls below 300Mi. Nodes that are important to the service and have higher stability requirements can raise this threshold moderately.

nodefs.available covers the file system used for volumes, daemon logs and the like, which generally means the root directory of the worker node. By default, kubelet starts reclaiming node resources when nodefs.available < 10%, i.e. the third threshold is 10%. Service nodes with limited disk space or large log volumes can raise the threshold moderately.

inodesFree reflects the free index nodes (inodes) of the worker node's file system. By default, kubelet starts evicting workloads when inodesFree < 5%, i.e. the second threshold is 5%. Worker nodes that consume many inodes can raise the threshold moderately.

imagefs.available covers the optional file system that the container runtime uses to store container images and writable container layers, typically /var/lib/docker. By default, kubelet starts evicting workloads when imagefs.available < 15%, i.e. the fourth threshold is 15%. The threshold can be adjusted appropriately if the worker node runs many containerized services. In addition, workloads need priority-based scheduling configuration according to the importance of each service and its scheduling priority.

Once a worker node evicts service Pods because of resource constraints, rescheduling can follow the configured priorities, which avoids problems caused by the default scheduling strategy. For example, some service components should stay on the same node as far as possible for communication reasons. If the scheduling mechanism moves one or more of them to other nodes, communication between the components becomes less efficient. Configuring reasonable scheduling priorities for workloads therefore lets a worker node evict Pods according to preset priorities when eviction starts, avoiding unnecessary Pod evictions.

Workloads are sorted by priority, and resource limits are configured from the highest priority down to the lowest:

- highest-priority services: set both limits and requests, with equal values;
- secondary-priority workloads: set limits and requests with unequal values;
- workloads that the node may schedule freely: set no limits or requests.
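The top-down configuration can be sketched as a helper that produces the `resources` section of a container for each priority tier. The tier names, the helper itself and the default factor of 2 between limits and requests in the secondary tier are illustrative assumptions. Only the `requests`/`limits` keys and quantity strings such as `500m` and `256Mi` follow the Kubernetes container spec.

```python
def resources_for_tier(tier: str, cpu_millicores: int, memory_mi: int, burst_factor: int = 2) -> dict:
    """Build a container 'resources' mapping for one priority tier."""
    requests = {"cpu": f"{cpu_millicores}m", "memory": f"{memory_mi}Mi"}
    if tier == "highest":
        # limits equal requests -> Guaranteed
        return {"requests": requests, "limits": dict(requests)}
    if tier == "secondary":
        # limits must differ from requests -> Burstable
        if burst_factor <= 1:
            raise ValueError("burst_factor must be > 1 so limits exceed requests")
        limits = {
            "cpu": f"{cpu_millicores * burst_factor}m",
            "memory": f"{memory_mi * burst_factor}Mi",
        }
        return {"requests": requests, "limits": limits}
    if tier == "arbitrary":
        # no requests or limits -> BestEffort
        return {}
    raise ValueError(f"unknown tier: {tier}")
```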

In addition, limits and requests for a workload must reflect the actual resource consumption of the service application and should be neither too large nor too small. Scheduling is based on the request value. A service that consumes more than its request is preferentially throttled or stopped by kubelet. A service that consumes far less than its request has actual resource utilization that is too low, and it may also become a target for being killed.
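The sizing advice can be turned into a simple check that compares observed usage of one resource with the configured request. The 50% lower bound is a hypothetical tuning value, not a figure from the text or from Kubernetes. It only flags requests that are much larger than actual consumption.

```python
def request_fit(usage: float, request: float, low_ratio: float = 0.5) -> str:
    """Classify a workload's request against its observed usage of one resource."""
    if request <= 0:
        return "unset"      # no request configured
    if usage > request:
        return "too_small"  # usage above request: first in line to be throttled or stopped
    if usage < low_ratio * request:
        return "too_large"  # utilization far below request (hypothetical 50% cut-off)
    return "ok"
```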


As shown in fig. 2, the technical solution of the present invention provides a container group scheduling optimization apparatus for a container management cluster, which includes a monitoring module, a judgment processing module, a request obtaining module, and an optimization adjusting module;

kubelet, the primary node component in Kubernetes, helps maintain node stability when computing resources of cluster nodes, such as disk, RAM or CPU, are insufficient. It does so by evicting workloads from nodes to free resources for other Pods or system tasks. The apparatus implements this process and may be part of kubelet.

In some embodiments, the monitoring module includes an information acquisition unit and a calculation unit;

In some embodiments, the apparatus further comprises a threshold setting unit for setting an eviction threshold of a node system resource;

the eviction threshold of the node memory is a first threshold;

the eviction threshold of the node memory availability is a second threshold;

the eviction threshold of the root directory availability is a third threshold;

the eviction threshold of the file storage space availability is a fourth threshold.

In practice, the default eviction threshold for available node memory, i.e. the first threshold, is 300Mi, and kubelet starts evicting Pods when available memory falls below 300Mi. Nodes that are important to the service and have higher stability requirements can therefore raise this threshold moderately;

by default, the third threshold for root directory availability is 10%, and node resources are reclaimed if root directory availability falls below 10%. Service nodes with limited disk directory space or large log volumes can raise the threshold moderately;

by default, the second threshold for inodesFree is 5%, and kubelet starts evicting workloads if inodesFree < 5%. Worker nodes with high index node (inode) consumption can raise the threshold moderately;

by default, the fourth threshold for file storage space availability is 15%, and kubelet starts evicting workloads if file storage space availability falls below 15%. The threshold can be adjusted appropriately if the worker node runs many containerized services.

In some embodiments, the judgment processing module is specifically configured to send an instruction to evict container groups from the node in any of the following cases: the available node memory is smaller than the first threshold; the node memory availability is smaller than the second threshold; the root directory availability is smaller than the third threshold; or the file storage space availability is smaller than the fourth threshold.

In some embodiments, the apparatus further comprises a pre-processing module for prioritizing the groups of containers according to a configuration principle of the groups of containers.

In some embodiments, the preprocessing module includes a parameter obtaining unit and a priority dividing unit;

Resource limits are configured for workloads from top to bottom according to their priority, that is, according to which components should avoid eviction as far as possible and which may be evicted:

- highest-priority services: set both limits and requests, with equal values;
- secondary-priority workloads: set limits and requests with unequal values;
- workloads that the node may schedule freely: set no limits or requests.

In some embodiments, the optimization adjustment module is configured to abort or limit the group of containers if the number of requests exceeds a set resource limit; if the number of requests does not exceed the set resource limit, the low priority container set is preferentially evicted.

The computer device provided by the embodiment of the invention may include a processor, a communication interface, a memory and a bus. The processor, the communication interface and the memory communicate with each other through the bus. The bus may be used for information transfer between the electronic device and the sensor. The processor may call logic instructions in the memory to perform the following method:

S21: setting eviction thresholds for node system resources;
S22: prioritizing the container groups according to their configuration principle;
S23: monitoring the system resources of the nodes and acquiring the current amount of the system resources of the nodes;
S24: when an obtained current amount is smaller than the corresponding set eviction threshold, sending an instruction to evict container groups from the node;
S25: after receiving the instruction, acquiring the request quantity of the container groups on each node;
S26: performing optimized scheduling on the container groups on the nodes according to the request quantities of the container groups and the set resource limit values.

In some specific embodiments, the program instructions executed by the processor may specifically implement the following steps:

S231: acquiring the available memory of each node;
S232: calculating the memory availability of each node;
S233: acquiring the remaining space of the root directory and calculating the root directory availability;
S234: acquiring the remaining capacity of the storage space for container runtime files and calculating the file storage space availability.

In some specific embodiments, the program instructions executed by the processor may specifically send an instruction to evict container groups from the node in any of the following cases: the available node memory is smaller than the first threshold; the node memory availability is smaller than the second threshold; the root directory availability is smaller than the third threshold; or the file storage space availability is smaller than the fourth threshold.

In addition, when implemented as software functional units and sold or used as a stand-alone product, the logic instructions in the memory may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art may make various equivalent modifications or substitutions to the embodiments of the present invention without departing from the spirit and scope of the present invention; such modifications or substitutions, as well as any changes or substitutions readily conceivable by a person skilled in the art within the technical scope disclosed herein, shall fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

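1. A container management cluster container group scheduling optimization method, characterized by comprising the following steps:

monitoring system resources of the nodes and obtaining the current amount of the system resources of the nodes;

when the obtained current amount is smaller than the set corresponding eviction threshold, sending an instruction to evict the node container group;

after receiving the instruction, acquiring the request quantity of the container group on each node;

performing optimized scheduling on the container groups on the nodes according to the request quantity of the container groups and the set resource limit value.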

2. The method of claim 1, wherein the step of monitoring system resources of the nodes and obtaining the current amount of system resources of the nodes comprises:

acquiring the available memory of each node;

calculating the memory availability of each node;

3. The method of claim 2, wherein the step of monitoring system resources of the nodes and obtaining the current amount of system resources of the nodes is preceded by the step of:

setting an eviction threshold of a node system resource;

the eviction threshold of the node memory is a first threshold;

the eviction threshold of the node memory availability is a second threshold;

the eviction threshold of the root directory availability is a third threshold;

4. The method according to claim 3, wherein when the obtained current amount is smaller than the set corresponding eviction threshold, sending an instruction to evict the node container group comprises:

5. The method of claim 4, wherein the step of monitoring the system resources of the nodes and obtaining the current amount of the system resources of the nodes is preceded by the step of:

prioritizing the container groups according to their configuration rules;

6. The container group scheduling optimization method of claim 5, wherein the step of prioritizing the container groups according to their configuration rules comprises:

7. The method according to claim 6, wherein the step of performing optimized scheduling on the container group on the node according to the requested quantity and the set resource limit of the container group comprises:

8. A container group scheduling optimization device for a container management cluster is characterized by comprising a monitoring module, a judgment processing module, a request acquisition module and an optimization adjustment module;

9. The container management cluster container group scheduling optimization apparatus according to claim 8, wherein the monitoring module includes an information obtaining unit and a calculating unit;

10. A computer device comprising a processor and a memory, wherein the processor and the memory communicate with each other via a bus; the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the container management cluster container group scheduling optimization method according to any one of claims 1 to 7.