CN114064199A

CN114064199A - Cluster capacity management method and system

Info

Publication number: CN114064199A
Application number: CN202111149685.2A
Authority: CN
Inventors: 张勇
Original assignee: Inspur Software Technology Co Ltd
Current assignee: Inspur Software Technology Co Ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2022-02-18

Abstract

The invention provides a cluster capacity management method and system, the method comprising the following steps: acquiring a pod scheduling state of a Kubernetes cluster; determining a preset high-resource-consumption pod, and pressurizing the Kubernetes cluster based on the preset high-resource-consumption pod; if it is determined that a plurality of pending pods exist within a first preset period range, acquiring a new virtual machine node from a node resource pool; and loading the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within a preset running time. The invention realizes dynamic capacity expansion and capacity reduction of cluster nodes, effectively handles the service tide phenomenon, and accurately identifies the node resource occupation state of the cluster according to the state and number of pending pods.

Description

Cluster capacity management method and system

Technical Field

The present invention relates to the field of virtualization technologies, and in particular, to a cluster capacity management method and system.

Background

Kubernetes, K8s for short, is an open-source system for managing containerized applications on multiple hosts in a cloud platform. It aims to make deploying containerized applications simple and efficient, and provides mechanisms for application deployment, planning, updating and maintenance.

Kubernetes can realize functions such as automatic deployment, automatic capacity expansion and reduction, and maintenance of a container cluster. Kubernetes can rapidly deploy applications, rapidly scale applications, seamlessly roll out new application functions, save resources and optimize the use of hardware resources. Kubernetes functions include: multiple pods working cooperatively; mounting storage systems; application health checks; replication of application instances; pod auto-scaling; service registration and discovery; load balancing; rolling updates; resource monitoring; log access; application debugging; and authentication and authorization.

Disclosure of Invention

The invention provides a cluster capacity management method and a cluster capacity management system, which are used for overcoming the defects of the prior art when a Kubernetes cluster dynamically expands and reduces its capacity.

In a first aspect, the present invention provides a cluster capacity management method, including:

acquiring a pod scheduling state of a Kubernetes cluster;

determining a preset high-resource-consumption pod, and pressurizing the Kubernetes cluster based on the preset high-resource-consumption pod;

if it is determined that a plurality of pending pods exist within a first preset period range, acquiring a new virtual machine node from a node resource pool;

and loading the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within a preset running time.

In one embodiment, before acquiring the pod scheduling state of the Kubernetes cluster, the method further includes:

acquiring an OpenStack environment, creating a plurality of virtual machines in the OpenStack environment, and creating the Kubernetes cluster based on the virtual machines;

and establishing a node resource pool in the OpenStack environment, wherein the node resource pool comprises a plurality of empty virtual machine nodes.

In one embodiment, acquiring the pod scheduling state of the Kubernetes cluster specifically includes:

acquiring the IP address of the Kubernetes cluster through a monitoring process, acquiring the pod scheduling state based on the IP address, and periodically determining whether the Kubernetes cluster has pending pods whose scheduling has failed.

In one embodiment, after acquiring a new virtual machine node from the node resource pool if it is determined that a plurality of pending pods exist within the first preset period range, the method further includes:

creating a new idle virtual machine node, and adding the new idle virtual machine node to the node resource pool.

In one embodiment, after loading the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within the preset running time, the method further includes:

deleting the preset high-resource-consumption pod used for pressurizing from the Kubernetes cluster;

if it is determined that the system resource utilization rates within a second preset period range are all lower than a preset threshold, acquiring the node with the lowest resource utilization rate;

and completing a node capacity reduction operation of the Kubernetes cluster based on the node with the lowest resource utilization rate.

In one embodiment, completing the node capacity reduction operation of the Kubernetes cluster based on the node with the lowest resource utilization rate includes:

isolating the node with the lowest resource utilization rate, and stopping the scheduling of pods to that node;

migrating the existing pods on the node with the lowest resource utilization rate to other nodes until the number of existing pods on that node becomes 0, and then deleting that node.

In a second aspect, the present invention further provides a cluster capacity management system, including:

an obtaining module, configured to obtain a pod scheduling state of a Kubernetes cluster;

a pressurization module, configured to determine a preset high-resource-consumption pod and pressurize the Kubernetes cluster based on the preset high-resource-consumption pod;

a newly-added module, configured to acquire a new virtual machine node from a node resource pool if it is determined that a plurality of pending pods exist within a first preset period range;

and a capacity expansion module, configured to load the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within a preset running time.

In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the cluster capacity management method according to any one of the above.

In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the cluster capacity management method as described in any of the above.

In a fifth aspect, the present invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the cluster capacity management method according to any of the above.

According to the cluster capacity management method and system provided by the invention, dynamic capacity expansion and capacity reduction of cluster nodes are realized, the service tide phenomenon is effectively handled, and the node resource occupation state of the cluster is accurately identified according to the state and number of pending pods.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a cluster capacity management method provided by the present invention;

FIG. 2 is a schematic diagram of the capacity expansion flow of the cluster capacity management method provided by the present invention;

FIG. 3 is a schematic diagram of the capacity reduction flow of the cluster capacity management method provided by the present invention;

FIG. 4 is a schematic structural diagram of a cluster capacity management system provided by the present invention;

FIG. 5 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In view of the problems in the prior art, the invention provides a method for managing the capacity of a Kubernetes cluster. The method automatically analyzes the current resource state of the cluster, creates a node resource pool in advance, takes nodes from the node resource pool and adds them to the Kubernetes cluster, and at the same time replenishes the number of nodes in the node resource pool. When the resource utilization rate of the cluster falls as the service tide recedes, whether nodes need to be removed is determined according to whether the average resource occupancy rate of the system is lower than a certain threshold; the node to be isolated is then determined, all pods on that node are evicted, and the node is deleted from the cluster.

FIG. 1 is a schematic flow diagram of a cluster capacity management method provided by the present invention. As shown in FIG. 1, the method includes:

S1, acquiring a pod scheduling state of a Kubernetes cluster;

S2, determining a preset high-resource-consumption pod, and pressurizing the Kubernetes cluster based on the preset high-resource-consumption pod;

S3, if it is determined that a plurality of pending pods exist within a first preset period range, acquiring a new virtual machine node from a node resource pool;

S4, loading the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within a preset running time.

It can be understood that the present invention is directed to the dynamic use of resources when a Kubernetes environment is deployed and operated in production: when no capacity planning was performed at the outset, or when the service exhibits a tidal load pattern, nodes are dynamically added during busy hours. During idle hours, node resources are dynamically contracted, so that system resources follow the service load and the resource utilization rate is higher.

Specifically, when virtual machines are created in an OpenStack environment and a Kubernetes cluster is deployed on them, a fixed number of nodes is initially created and added to the Kubernetes cluster. When a large number of services are suddenly deployed in the cluster, consumption of resources such as CPU, memory and disk rises rapidly, and a number of pods may suddenly appear that cannot be bound to any suitable node, or the node resources in the cluster may be exhausted. To handle this, a node resource pool is created in the OpenStack environment, and a plurality of nodes are created and added to the node resource pool. The OpenStack physical machine starts a monitoring script, which cyclically queries whether the cluster contains pods that remain unschedulable. If unschedulable pods are found in several consecutive query periods, the cluster node resources are considered insufficient, and the monitoring script starts the process of adding nodes to the cluster, one node at a time. After a node joins the cluster and reaches the ready state, the Kubernetes scheduler starts to schedule pods onto the new node. When all pending pods have been bound to nodes, the cluster resources are sufficient; meanwhile, a new idle node is created and added to the node resource pool. If pending pods that cannot be scheduled still remain, the monitor takes another idle node out of the node resource pool, joins it to the Kubernetes cluster, and replenishes the number of idle nodes in the node resource pool. Once all pending pods are bound to appropriate nodes, the cluster capacity expansion process is finished.
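As a non-limiting illustration, the expansion loop described above can be simulated in a few lines of Python. This is a minimal sketch under simplifying assumptions that are not part of the original disclosure: every node is assumed to hold a fixed number of pods, the node resource pool is assumed to be replenished immediately so that it never runs empty, and the names SimCluster and expand are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SimCluster:
    """Toy cluster model (hypothetical): each node fits `node_capacity` pods."""
    node_capacity: int
    nodes: int
    pods: int

    @property
    def pending(self) -> int:
        # Pods that do not fit on the current nodes stay pending.
        return max(0, self.pods - self.nodes * self.node_capacity)


def expand(cluster: SimCluster, periods: int = 3, max_ticks: int = 100):
    """Add one pooled node each time pending pods persist for `periods` polls."""
    streak, added, log = 0, 0, []
    for tick in range(max_ticks):
        if cluster.pending == 0:
            break                    # all pods bound: expansion is finished
        streak += 1
        if streak == periods:
            cluster.nodes += 1       # node taken from the pool joins and is ready
            added += 1               # the pool is assumed to be topped up again
            streak = 0
        log.append((tick, cluster.nodes, cluster.pending))
    return added, log


cluster = SimCluster(node_capacity=10, nodes=3, pods=47)
added, log = expand(cluster)
print(f"nodes added: {added}, nodes now: {cluster.nodes}, pending: {cluster.pending}")
```

With 47 pods on three 10-pod nodes, the sketch adds one node after three polls, still sees pending pods, and adds a second node after three more polls, which matches the one-node-at-a-time behaviour described above.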

The monitor continues to periodically query whether pending pods exist, and it also monitors the CPU, memory and disk utilization of each node of the cluster. When the application is idle, system resource utilization falls. To save energy and reduce consumption, the node contraction process is triggered when the average resource utilization of the nodes is lower than a certain threshold: the node with the lowest resource occupancy rate is selected first and set to an unschedulable state, the existing pods on that node are evicted to other nodes, and when the number of pods on the node is 0, the node is deleted from the cluster.

Here, a Pod is the smallest and simplest basic unit created or deployed by Kubernetes, and one Pod represents one process running on the cluster.

The invention realizes dynamic capacity expansion and capacity reduction through cluster nodes, effectively deals with the phenomenon of service tide, and accurately identifies the node resource occupation state of the cluster according to the state and the quantity of the pending pod.

Based on the above embodiment, before acquiring the pod scheduling state of the Kubernetes cluster, the method further includes:

Specifically, the invention first creates an OpenStack environment, and OpenStack is deployed in a high-availability configuration.

Three virtual machines are created in OpenStack, and a domain name, a network and the like are configured for each virtual machine. A Kubernetes cluster is created with the three virtual machines as nodes in a high-availability configuration, in which the master and worker roles are co-located on the same nodes; that is, the management nodes and the computing nodes are deployed together to save resources.

A node resource pool is created in the OpenStack environment, and 3 empty nodes are created and added to the node resource pool.

The invention ensures that the resources of the system can be dynamically adjusted and the high-reliability running state is ensured by deploying the cluster and the node resource pool in the openstack environment in advance.

Based on any of the above embodiments, acquiring the pod scheduling state of the Kubernetes cluster specifically includes:

Specifically, a monitoring process is started on the OpenStack host. The monitoring process acquires the IP address of the Kubernetes cluster, queries the pod scheduling state of the Kubernetes cluster, and periodically checks whether the cluster has pending pods whose scheduling has failed.

Then, a large number of high-resource-consumption pods are created to pressurize the cluster, and the monitoring script queries whether the cluster has a large number of pending pods.

The invention monitors the number of pending pods in the cluster to obtain the running state of the system, thereby judging whether a subsequent capacity expansion or capacity reduction operation is needed.
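The pending-pod query described above might look like the following Python sketch. It is illustrative only and assumes that the monitoring process already holds a pod list in the JSON shape returned by kubectl get pods --all-namespaces -o json; the function name unschedulable_pending_pods and the sample data are hypothetical. A pod is counted only when its phase is Pending and its PodScheduled condition is False with reason Unschedulable, so that pods which are pending for other reasons (for example, still pulling an image) are not mistaken for a resource shortage.

```python
def unschedulable_pending_pods(pod_list: dict) -> list[str]:
    """Return 'namespace/name' for pods that are Pending because scheduling failed."""
    names = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        for cond in status.get("conditions", []):
            if (cond.get("type") == "PodScheduled"
                    and cond.get("status") == "False"
                    and cond.get("reason") == "Unschedulable"):
                meta = pod["metadata"]
                names.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
                break
    return names


# Hand-written sample data (hypothetical), shaped like a Kubernetes PodList.
SAMPLE = {
    "items": [
        {"metadata": {"name": "web-1", "namespace": "default"},
         "status": {"phase": "Running",
                    "conditions": [{"type": "PodScheduled", "status": "True"}]}},
        {"metadata": {"name": "stress-1", "namespace": "default"},
         "status": {"phase": "Pending",
                    "conditions": [{"type": "PodScheduled", "status": "False",
                                    "reason": "Unschedulable"}]}},
        {"metadata": {"name": "init-1", "namespace": "default"},
         "status": {"phase": "Pending",
                    "conditions": [{"type": "PodScheduled", "status": "True"}]}},
    ]
}

print(unschedulable_pending_pods(SAMPLE))  # ['default/stress-1']
```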

Based on any of the above embodiments, after acquiring a new virtual machine node from the node resource pool if it is determined that a plurality of pending pods exist within the first preset period range, the method further includes:

Specifically, in the present invention, if pending pods are found within the preset period range, for example in all three consecutive acquisition periods, the monitoring script selects a node from the node resource pool and adds it to the Kubernetes cluster. The node initiates the node registration process, and once its state becomes ready, the Kubernetes scheduler schedules the pending pods onto the newly added node.

At the same time, openstack server create is called to create a new idle node instance, and the node instance is added to the idle node resource pool.

Through management of the node resource pool, the invention adds nodes to the Kubernetes cluster quickly and keeps the node resources of the node resource pool stable.
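One possible way to express the three-consecutive-period trigger and the pool replenishment described above is sketched below in Python. The class names PendingStreak and NodePool, the node names, and the flavor, image and network values passed to openstack server create are all hypothetical placeholders; the sketch only builds the command line and does not execute it.

```python
from collections import deque
from dataclasses import dataclass, field


class PendingStreak:
    """Fires once pending pods have been seen in `periods` consecutive polls."""

    def __init__(self, periods: int = 3):
        self.periods = periods
        self.streak = 0

    def observe(self, pending_count: int) -> bool:
        self.streak = self.streak + 1 if pending_count > 0 else 0
        if self.streak >= self.periods:
            self.streak = 0          # start counting again after a scale-out
            return True
        return False


@dataclass
class NodePool:
    """Idle virtual machine nodes kept ready for scale-out."""
    idle: deque = field(default_factory=deque)
    created: int = 0

    def take(self) -> str:
        return self.idle.popleft()

    def replenish(self) -> list[str]:
        """Register a fresh idle node and return the CLI call that would create it."""
        self.created += 1
        name = f"pool-node-{self.created}"
        self.idle.append(name)
        # Flavor, image and network names below are placeholders, not from the source.
        return ["openstack", "server", "create",
                "--flavor", "k8s.worker", "--image", "k8s-node-image",
                "--network", "k8s-net", "--wait", name]


pool = NodePool(deque(["pool-node-a", "pool-node-b", "pool-node-c"]))
trigger = PendingStreak(periods=3)
joined = []
for pending in [4, 4, 0, 5, 5, 5, 2]:    # pending-pod counts from successive polls
    if trigger.observe(pending):
        joined.append(pool.take())       # this node would be joined to the cluster
        pool.replenish()                 # keep the pool size stable

print(joined, list(pool.idle))
```

Resetting the streak after each scale-out gives the newly joined node several polling periods to absorb pending pods before another node is taken from the pool; this is a design choice of the sketch rather than something stated in the description.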

Based on any of the above embodiments, after loading the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within the preset running time, the method further includes:

Completing the node capacity reduction operation of the Kubernetes cluster based on the node with the lowest resource utilization rate specifically includes:

Specifically, after the cluster has been expanded and the system has run for a period of time, pods are scheduled normally onto the new node, and no pending pod exists in the cluster.

When the pressurizing pods are deleted, system resource utilization slowly decreases. When the monitor detects that cluster resource utilization has fallen below the threshold, the cluster node capacity reduction process is started.

Similarly, if the resource utilization rates in three consecutive acquisition periods are all lower than the threshold, the node with the lowest resource utilization rate is selected from the cluster nodes, and the capacity reduction operation begins.

The kubectl cordon command (isolation) is used to stop new pods from being scheduled onto the node. The existing pods on the node are then migrated to other nodes, and when the number of pods on the node falls to 0, the node is deleted from the cluster, completing the node capacity reduction process.

When the service load is high, the invention automatically expands the nodes, and when the service load falls, it contracts the nodes, thereby ensuring efficient use of resources.
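The capacity reduction trigger and the isolation, migration and deletion sequence described above could be sketched as follows. The function names, the 0.30 threshold and the node name are illustrative assumptions; the description names only the cordon command explicitly, so the use of kubectl drain and kubectl delete node for the migration and deletion stages is an assumption of this sketch. The commands are built as argument lists and are not executed here.

```python
def should_scale_in(samples: list[float], threshold: float, periods: int = 3) -> bool:
    """True when the last `periods` cluster utilization samples are all below threshold."""
    recent = samples[-periods:]
    return len(recent) == periods and all(u < threshold for u in recent)


def scale_in_commands(node: str) -> list[list[str]]:
    """kubectl invocations for isolating, emptying and removing one node."""
    return [
        # Mark the node unschedulable so no new pods land on it.
        ["kubectl", "cordon", node],
        # Evict existing pods; these flags are commonly needed on real clusters.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        # Remove the emptied node from the cluster.
        ["kubectl", "delete", "node", node],
    ]


history = [0.62, 0.28, 0.25, 0.21]       # hypothetical utilization samples
if should_scale_in(history, threshold=0.30):
    for cmd in scale_in_commands("node-2"):
        print(" ".join(cmd))
```

Requiring a full window of samples before returning True keeps a freshly started monitor from shrinking the cluster on its first low reading.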

The following describes the cluster management scheme through specific embodiments of capacity expansion and capacity reduction. FIG. 2 is a schematic diagram of the capacity expansion flow of the cluster capacity management method provided by the present invention, including:

Step one, preparing a physical machine environment and deploying a multi-node high-availability OpenStack environment.

Step two, creating virtual machines in OpenStack for deploying the Kubernetes cluster, the specification of the virtual machines being consistent with that of the system.

Step three, creating virtual machine nodes in OpenStack for the node resource pool, the specification of the virtual machines being consistent with that of the system.

Step four, selecting the initial nodes and deploying the Kubernetes cluster.

Step five, starting a monitoring service on the OpenStack host physical node to periodically detect the pod states of the Kubernetes cluster and the resource utilization rate of the Kubernetes cluster.

Step six, preparing a pod template, configuring request resources and limit resources, deploying Guaranteed pods as the pressurizing pods, deploying a large number of such pods to the cluster, and observing the scheduling state of the pods.

Step seven, the cluster monitor periodically queries the scheduling state of the pods and checks whether pending pods exist.

Step eight, if a large number of unscheduled pending pods are found in 3 consecutive periods, the cluster resources are insufficient.

Step nine, starting the cluster node capacity expansion flow, selecting a virtual machine node from the node resource pool, and adding the virtual machine node to the Kubernetes cluster.

Step ten, after the node has registered normally with the cluster and its state is found to be ready, the scheduler starts to schedule the pending pods onto the newly added node.

Step eleven, when the cluster monitor finds that no pending pod exists in the cluster, the capacity expansion is successful.

Step twelve, OpenStack creates a new node and adds it to the node resource pool, keeping the number of nodes in the node resource pool stable.

Step thirteen, the cluster capacity expansion process finishes, and the monitor continues to periodically monitor the cluster state.
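Step six of the flow above calls for a pod template whose request and limit resources make the pressurizing pods Guaranteed. A minimal Python sketch of such a template is given below; the image reference, pod names and resource sizes are hypothetical, and the is_guaranteed helper compares quantity strings literally, which is a simplification (Kubernetes itself would treat "1000m" and "1" CPU as equal).

```python
import json


def stress_pod(name: str, cpu: str = "2", memory: str = "4Gi") -> dict:
    """Build a Pod manifest whose requests equal its limits (Guaranteed QoS)."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "stress"}},
        "spec": {
            "containers": [{
                "name": "stress",
                "image": "example.invalid/stress:latest",  # placeholder image
                "resources": {"requests": dict(resources),
                              "limits": dict(resources)},
            }]
        },
    }


def is_guaranteed(pod: dict) -> bool:
    """Every container must set cpu and memory limits, with requests equal to them."""
    for container in pod["spec"]["containers"]:
        res = container.get("resources", {})
        req, lim = res.get("requests", {}), res.get("limits", {})
        for key in ("cpu", "memory"):
            # A missing request defaults to the limit, so only a mismatch fails.
            if key not in lim or req.get(key, lim[key]) != lim[key]:
                return False
    return True


manifests = [stress_pod(f"stress-{i}") for i in range(3)]
print(json.dumps(manifests[0], indent=2))
print(all(is_guaranteed(m) for m in manifests))
```

Because Guaranteed pods reserve their full request on a node, deploying many of them exhausts schedulable capacity predictably, which is what makes them suitable for driving the cluster into the pending state that the monitor looks for.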

FIG. 3 is a schematic diagram of the capacity reduction flow of the cluster capacity management method provided by the present invention, including:

Step one, preparing physical nodes, creating an OpenStack environment, separating the control nodes from the computing nodes, and deploying the control nodes in a high-availability form.

Step two, creating a specified number of virtual machine nodes for deploying the Kubernetes cluster, the node specification being consistent with that of the system.

Step three, creating a specified number of virtual machine nodes to be added to the node resource pool, the specification of the virtual machine nodes being consistent with that of the system.

Step four, starting and deploying the Kubernetes cluster.

Step five, the OpenStack main node starts the monitoring service and periodically queries the cluster resource state, which includes resource utilization indexes such as CPU, memory and disk.

Step six, deploying a large number of pods with high resource consumption to pressurize the cluster and start the expansion flow; node addition to the cluster completes, no pending pod remains, and the nodes participate normally in resource scheduling.

Step seven, deleting the high-resource-consumption pods after a period of time.

Step eight, the cluster resource utilization rate slowly decreases.

Step nine, the cluster monitor periodically queries the resource utilization rate and finds that the overall resource utilization rate of the cluster is lower than the cluster contraction threshold.

Step ten, if the cluster resource utilization rate is found to be lower than the cluster contraction threshold in 3 consecutive periods, starting the cluster node contraction process.

Step eleven, selecting the node with the lowest resource utilization rate from the Kubernetes cluster and marking it with cordon, so that the node is unschedulable and no new pod can be scheduled onto it.

Step twelve, draining the node, that is, migrating the existing pods on the node to other nodes to complete pod eviction.

Step thirteen, when the number of pods on the node falls to 0, deleting the node from the Kubernetes cluster.

Step fourteen, finishing the cluster node capacity reduction process.
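Steps eleven to thirteen above can be illustrated with a small in-memory simulation. The sketch is hypothetical throughout: drain_lowest and the node dictionary layout are invented for illustration, and evicted pods are placed on the node that currently holds the fewest pods, whereas in a real cluster the Kubernetes scheduler decides where evicted pods are recreated.

```python
def drain_lowest(nodes: dict[str, dict]) -> str:
    """Cordon, empty and remove the least-utilized node; returns its name.

    `nodes` maps a node name to {"util": float, "pods": [pod names]} and is
    modified in place.
    """
    victim = min(nodes, key=lambda n: nodes[n]["util"])
    targets = [n for n in nodes if n != victim]
    if not targets:
        raise ValueError("cannot remove the only node in the cluster")
    nodes[victim]["schedulable"] = False              # step eleven: cordon
    while nodes[victim]["pods"]:                      # step twelve: evict pods
        pod = nodes[victim]["pods"].pop()
        dest = min(targets, key=lambda n: len(nodes[n]["pods"]))
        nodes[dest]["pods"].append(pod)
    del nodes[victim]                                 # step thirteen: pod count is 0
    return victim


cluster_nodes = {
    "node-1": {"util": 0.40, "pods": ["a", "b", "c"]},
    "node-2": {"util": 0.10, "pods": ["d", "e"]},
    "node-3": {"util": 0.30, "pods": ["f"]},
}
removed = drain_lowest(cluster_nodes)
print(removed, {name: info["pods"] for name, info in cluster_nodes.items()})
```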

The cluster capacity management system provided by the present invention is described below, and the cluster capacity management system described below and the cluster capacity management method described above may be referred to correspondingly.

FIG. 4 is a schematic structural diagram of a cluster capacity management system provided by the present invention. As shown in FIG. 4, the system includes: an obtaining module 41, a pressurization module 42, a newly-added module 43, and a capacity expansion module 44, wherein:

the obtaining module 41 is configured to obtain a pod scheduling state of a Kubernetes cluster; the pressurization module 42 is configured to determine a preset high-resource-consumption pod, and pressurize the Kubernetes cluster based on the preset high-resource-consumption pod; the newly-added module 43 is configured to acquire a new virtual machine node from a node resource pool if it is determined that a plurality of pending pods exist within a first preset period range; the capacity expansion module 44 is configured to load the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within a preset running time.

FIG. 5 illustrates a physical structure diagram of an electronic device. As shown in FIG. 5, the electronic device may include: a processor 510, a communications interface 520, a memory 530 and a communication bus 540, wherein the processor 510, the communications interface 520 and the memory 530 communicate with each other via the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a cluster capacity management method comprising: acquiring a pod scheduling state of a Kubernetes cluster; determining a preset high-resource-consumption pod, and pressurizing the Kubernetes cluster based on the preset high-resource-consumption pod; if it is determined that a plurality of pending pods exist within a first preset period range, acquiring a new virtual machine node from a node resource pool; and loading the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within a preset running time.

Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, a computer is capable of executing the cluster capacity management method provided by the above method embodiments, the method including: acquiring a pod scheduling state of a Kubernetes cluster; determining a preset high-resource-consumption pod, and pressurizing the Kubernetes cluster based on the preset high-resource-consumption pod; if it is determined that a plurality of pending pods exist within a first preset period range, acquiring a new virtual machine node from a node resource pool; and loading the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within a preset running time.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the cluster capacity management method provided by the above method embodiments, the method comprising: acquiring a pod scheduling state of a Kubernetes cluster; determining a preset high-resource-consumption pod, and pressurizing the Kubernetes cluster based on the preset high-resource-consumption pod; if it is determined that a plurality of pending pods exist within a first preset period range, acquiring a new virtual machine node from a node resource pool; and loading the plurality of pending pods onto the new virtual machine node until no pending pod exists in the Kubernetes cluster within a preset running time.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for cluster capacity management, comprising:

acquiring a pod scheduling state of a Kubernetes cluster;

2. The method of claim 1, wherein obtaining the pod scheduling state of the Kubernetes cluster further comprises:

3. The method according to claim 1 or 2, wherein obtaining the pod scheduling state of the Kubernetes cluster specifically includes:

4. The cluster capacity management method according to claim 1, wherein if it is determined that there are a plurality of pending pods within the preset period range, acquiring a new virtual machine node from the node resource pool, and then further comprising:

5. The method according to claim 1, wherein the loading the plurality of pending pods to the new virtual machine node is performed until no pending pod exists in the Kubernetes cluster within a preset running time, and then the method further comprises:

6. The method according to claim 5, wherein the completing the node capacity reduction operation of the Kubernetes cluster based on the node with the lowest resource utilization rate specifically includes:

7. A cluster capacity management system, comprising:

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the cluster capacity management method according to any of claims 1 to 6 are implemented when the processor executes the program.

9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when being executed by a processor, is adapted to carry out the steps of the cluster capacity management method according to any of the claims 1 to 6.

10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the cluster capacity management method according to any of claims 1 to 6 when executed by a processor.