CN113051075A

CN113051075A - Kubernetes intelligent capacity expansion method and device

Info

Publication number: CN113051075A
Application number: CN202110305822.0A
Authority: CN
Inventors: 马兵兵; 侯汉祎; 刘田龙
Original assignee: Fiberhome Telecommunication Technologies Co Ltd
Current assignee: Fiberhome Telecommunication Technologies Co Ltd
Priority date: 2021-03-23
Filing date: 2021-03-23
Publication date: 2021-06-29
Anticipated expiration: 2041-03-23
Also published as: CN113051075B

Abstract

The invention relates to the technical field of cloud computing container platforms and provides a Kubernetes intelligent capacity expansion method and device. The method comprises: obtaining parameter information of each pod and/or operation indexes of each Node in a Kubernetes cluster; generating an actual ratio factor according to the parameter information of each pod, and judging whether capacity expansion is needed by comparing the actual ratio factor with a ratio factor threshold; generating an actual score factor according to the operation indexes of each Node, and judging whether capacity reduction is needed by comparing the actual score factor with a score factor threshold; and, when a Node needs capacity reduction, safely evicting the pods on that Node and scheduling them to other nodes before finally performing the capacity reduction operation on the Node. Automatic expansion and reduction of Node nodes is thus driven by the capacity expansion and reduction service's real-time analysis of the whole cluster, which makes scaling of the Kubernetes cluster more accurate and efficient.

Description

Kubernetes intelligent capacity expansion method and device

[ technical field ]

The invention relates to the technical field of cloud computing container platforms, in particular to a Kubernetes intelligent capacity expansion method and device.

[ background of the invention ]

Kubernetes, abbreviated k8s, is a container cluster management system that Google open-sourced in 2014. It is an open-source version of Borg, a system Google had used internally for many years, and because it was already highly mature when it was released it attracted wide attention in the industry and is rapidly becoming the mainstream container orchestration tool. As a complete platform for supporting distributed systems, it provides containerized applications with a full set of functions such as deployment and operation, resource scheduling, service discovery and dynamic scaling, which makes large-scale container clusters easier to manage.

For cluster management, Kubernetes divides the machines in a cluster into a Master Node and working Node nodes. A group of processes related to cluster management runs on the Master Node, while each Node serves as a working node that runs the actual applications; the smallest unit that Kubernetes manages on a Node is the pod. When the services deployed in a cluster grow exponentially, the Node nodes in the Kubernetes cluster must be scaled out horizontally, and when the services in the cluster decrease, Node nodes must be removed through a capacity reduction operation. Life cycle management of Node nodes is therefore an important part of overall cluster management. It imposes a heavy workload on operation and maintenance personnel, increases the difficulty of operation and maintenance, and poses potential challenges to the stability of the cluster system.

In view of the above, overcoming the drawbacks of the prior art is an urgent problem in the art.

[ summary of the invention ]

The technical problem to be solved by the invention is as follows:

when the prior art performs a capacity expansion operation on a Kubernetes cluster, it only monitors the load of the cluster's resources as a whole and adds nodes directly, so the process performs poorly and occupies more cluster resources; when it performs a capacity reduction operation, it directly deletes Node nodes with low resource utilization based only on a comparison of the resource utilization of the Node nodes, without safely evicting the pods on the Node and without safely deleting the Node with the lowest comprehensive performance index. At present, capacity expansion and reduction of a Kubernetes cluster is also completed through manual intervention, so efficiency is low and operation and maintenance complexity is high.

The invention achieves the above purpose by the following technical scheme:

In a first aspect, the present invention provides a Kubernetes intelligent capacity expansion method, including:

acquiring parameter information of each pod and/or operation indexes of each Node in a Kubernetes cluster;

generating an actual ratio factor according to the parameter information of each pod, and judging whether capacity expansion is needed or not by comparing the actual ratio factor with a ratio factor threshold;

generating an actual score factor according to the operation index of each Node, and judging whether the capacity reduction is needed or not by comparing the actual score factor with a score factor threshold;

when the Node nodes need to be subjected to capacity reduction, the pod on the Node nodes needing capacity reduction is safely evicted and dispatched to other nodes, and finally the Node nodes needing capacity reduction are subjected to capacity reduction operation.

Preferably, the acquiring parameter information of each pod and/or operation index of each Node in the Kubernetes cluster specifically includes:

deploying proxy service on each Node in a Kubernetes cluster, wherein the proxy service is used for monitoring the operation index of each Node;

and deploying a capacity expansion service on a Master Node in the Kubernetes cluster, wherein the capacity expansion service interacts with the API Server and the proxy service and is respectively used for acquiring parameter information of each pod and operation indexes of each Node in the Kubernetes cluster.

Preferably, the generating an actual ratio factor according to the parameter information of each pod specifically includes:

the parameter information comprises state information and resource occupation amount;

when pods to be created appear in the Kubernetes cluster and their state information remains in the pod-to-be-created state throughout a first preset time, calculating the sum of the resource occupation amounts of the pods to be created, so as to generate the actual ratio factor.

Preferably, the determining whether capacity expansion is required by comparing the actual ratio factor with a ratio factor threshold specifically includes:

when the actual ratio factor is smaller than the ratio factor threshold, not triggering the capacity expansion operation, first reallocating the pods among the current Node nodes, and then deploying the pod to be created to a Node with surplus resources;

and when the actual ratio factor is larger than or equal to the ratio factor threshold, triggering expansion operation, firstly adding a new Node in the Kubernetes cluster, and then deploying the pod to be created into the new Node.

Preferably, the adding of the new Node in the Kubernetes cluster specifically includes:

and calling a provider interface of a cloud platform where the Kubernetes cluster is located to add a new Node.

Preferably, the state information further includes: pod created and running, pod terminated normally, and pod failed abnormally.

Preferably, the operation index includes one or more of a total amount of the node CPUs, a total amount of the node memories, a total amount of the node disks, a remaining amount of the node disks, and a load rate of the nodes.

Preferably, when a Node with a Node load rate continuously exceeding the Node load rate threshold value within a second preset time appears in the Kubernetes cluster, the capacity expansion operation is triggered.

Preferably, the step of comparing the actual score factor with the score factor threshold value to determine whether the reduction is required is specifically as follows:

when the actual score factor is larger than or equal to the score factor threshold value, the capacity reduction operation is triggered;

and when the actual score factor is smaller than the score factor threshold value, not triggering the capacity reduction operation.

In a second aspect, the present invention further provides a Kubernetes intelligent capacity expansion device, which includes at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the Kubernetes intelligent capacity expansion method according to the first aspect.

Compared with the prior art, the invention has the beneficial effects that:

the invention deploys a Node-exporter proxy service on the Node nodes of the Kubernetes cluster to monitor the operation indexes of each Node, and obtains the operation indexes of each Node and the parameter information of each pod through an intelligent capacity expansion and reduction service deployed on the Master Node. The service comprehensively evaluates each Node from its operation indexes to generate an actual score factor, and decides autonomously whether a capacity reduction operation is needed according to whether the actual score factor reaches the score factor threshold. When capacity reduction is needed, the pods on the Node are safely evicted so that they run normally on other nodes and the services they contain are not interrupted. The service also calculates an actual ratio factor from the parameter information of the pods and decides autonomously whether a capacity expansion operation is needed according to whether the actual ratio factor reaches the ratio factor threshold. Automatic expansion and reduction of Node nodes is thus completed through real-time analysis of the whole cluster by the capacity expansion and reduction service, making scaling of the Kubernetes cluster more accurate, safe and efficient.

[ description of the drawings ]

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Fig. 1 is a flowchart of a Kubernetes intelligent capacity expansion method according to an embodiment of the present invention;

fig. 2 is an architecture diagram of Kubernetes intelligent scalability according to an embodiment of the present invention;

fig. 3 is a flowchart of a Kubernetes intelligent capacity expansion method according to an embodiment of the present invention;

fig. 4 is a flowchart of a Kubernetes intelligent capacity expansion method according to an embodiment of the present invention;

fig. 5 is an overall flowchart of a Kubernetes intelligent capacity expansion method according to an embodiment of the present invention;

fig. 6 is an architecture diagram of a Kubernetes intelligent scalable device according to an embodiment of the present invention.

[ detailed description ]

Kubernetes divides the machines in a cluster into a Master Node and working Node nodes. The Master Node is the management node of the Kubernetes cluster: it hosts the ETCD storage service (an optional service), runs the API Server process, the Controller Manager service process and the Scheduler service process, and is associated with every Node in the Kubernetes cluster. The API Server process is the entry point for controlling the Kubernetes cluster, the ETCD storage service stores the parameter information of each pod, and the API Server interacts directly with ETCD.

A Node is a working node in a Kubernetes cluster and hosts the pods allocated to it; each Node may carry multiple pods. Each Node runs kubelet, Kube-proxy and pods, where each pod consists of several associated containers.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are for convenience only to describe the present invention without requiring the present invention to be necessarily constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.

In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Example 1:

The prior art has the following problems. When a capacity expansion operation is performed on a Kubernetes cluster, only the load of the cluster's resources as a whole is monitored and nodes are added directly, so the process performs poorly and occupies more cluster resources. When a capacity reduction operation is performed, Node nodes with low resource utilization are deleted directly based only on a comparison of the resource utilization of the Node nodes, without safely evicting the pods on the Node and without safely deleting the Node with the lowest comprehensive performance index. At present, capacity expansion and reduction of a Kubernetes cluster is also completed through manual intervention, so efficiency is low and operation and maintenance complexity is high.

To solve these problems, the embodiment of the invention provides a Kubernetes intelligent capacity expansion method, as shown in FIG. 1, comprising the following steps:

step S10, acquiring parameter information of each pod and/or operation index of each Node in the Kubernetes cluster;

To obtain the parameter information of each pod and/or the operation indexes of each Node in the Kubernetes cluster, the embodiment of the invention deploys a capacity expansion and reduction service and a proxy service on the Kubernetes cluster, as shown in FIG. 2. The proxy service may be the Node-exporter proxy service, which is deployed on each Node as a DaemonSet and collects the operation indexes of that Node. Node-exporter is provided and maintained by the official Prometheus project, is not bundled with the Prometheus installation, and collects operation indexes at the server level. The capacity expansion and reduction service is deployed on the Master Node as a Deployment and interacts with the API Server and the Node-exporter proxy service: it obtains the parameter information of each pod through the API Server and the operation indexes of each Node from the Node-exporter proxy service. The parameter information of each pod includes state information, resource occupation amount and other information. The state information of a pod is one of: pod to be created, pod created and running, pod terminated normally, and pod failed abnormally. Because the state information of the pods is mainly stored in the ETCD storage service, and the ETCD storage service interacts directly with the API Server, the capacity expansion and reduction service can interact directly with the API Server to indirectly obtain the pod state information stored in ETCD. The operation indexes of each Node include the total amount of Node CPU, the usage amount of Node CPU, the total amount of Node memory, the usage amount of Node memory, the total amount of Node disk, the remaining amount of Node disk, the Node load rate and the like.
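As an illustration of how the capacity expansion and reduction service might turn Node-exporter output into operation indexes, the following minimal Python sketch parses a small sample in the Prometheus text exposition format. The metric names `node_load1`, `node_memory_MemTotal_bytes` and `node_memory_MemAvailable_bytes` are ones Node-exporter commonly exposes on Linux; the sample values, the `NodeIndex` class and the helper names are hypothetical, and the embodiment does not specify how the service reads the metrics.

```python
from dataclasses import dataclass

# Hypothetical sample of Node-exporter output (all values are made up).
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_MemTotal_bytes 1.6e+10
node_memory_MemAvailable_bytes 4e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""


@dataclass
class NodeIndex:
    """A few of the operation indexes named in the embodiment."""
    load1: float
    memory_total: float
    memory_used: float


def parse_metrics(text: str) -> dict:
    """Parse unlabeled samples; comment lines and labeled series are skipped."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


def to_node_index(samples: dict) -> NodeIndex:
    """Derive memory usage as total minus available (an assumed definition)."""
    total = samples["node_memory_MemTotal_bytes"]
    available = samples["node_memory_MemAvailable_bytes"]
    return NodeIndex(samples["node_load1"], total, total - available)


index = to_node_index(parse_metrics(SAMPLE))
print(index)
```

A real service would fetch this text over HTTP from each Node and would also need to handle labeled series such as per-CPU and per-filesystem metrics, which this sketch deliberately ignores.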

Step S20, generating an actual ratio factor according to the parameter information of each pod, and judging whether capacity expansion is needed or not by comparing the actual ratio factor with a ratio factor threshold;

This embodiment provides an example from an actual scenario: after acquiring the parameter information of each pod from the API Server in real time, the capacity expansion and reduction service automatically analyzes information such as the state information and resource occupation amount of the pods to obtain the actual ratio factor.

Step S201: the capacity expansion and reduction service finds that n pods to be created have appeared in the Kubernetes cluster and that the state information of these n pods has remained in the pod-to-be-created state throughout a first preset time. Step S202: the sum of the resource occupation amounts of the pods to be created is calculated to generate the actual ratio factor. The resource occupation amounts of the n pods to be created are denoted pod1, pod2, pod3, …, podn. This embodiment takes a first preset time of 2 minutes as an example: when the state information of the n pods to be created has remained in the pod-to-be-created state for 2 minutes, the sum of pod1, pod2, pod3, …, podn (i.e., the actual ratio factor) is calculated. Step S203: whether the actual ratio factor is greater than or equal to the ratio factor threshold is analyzed to determine whether capacity expansion is required; the ratio factor threshold is denoted ratio in this embodiment. The first preset time and the ratio factor threshold can be set according to requirements.
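The selection in step S201 can be sketched as a simple filter. This is a minimal illustration, assuming each pod record carries the time at which it entered its current state; the `PodInfo` class, its field names and the state strings are hypothetical and are not taken from the Kubernetes API.

```python
from dataclasses import dataclass

FIRST_PRESET_SECONDS = 120  # the embodiment's 2-minute example


@dataclass
class PodInfo:
    name: str
    state: str          # e.g. "to_be_created", "running", "terminated", "failed"
    state_since: float  # time (seconds) at which the pod entered `state`


def stuck_to_be_created(pods, now, preset=FIRST_PRESET_SECONDS):
    """Return pods that have stayed in the to-be-created state for at least `preset` seconds."""
    return [
        p for p in pods
        if p.state == "to_be_created" and now - p.state_since >= preset
    ]


pods = [
    PodInfo("a", "to_be_created", 800.0),  # waiting for 200 s
    PodInfo("b", "to_be_created", 950.0),  # waiting for only 50 s
    PodInfo("c", "running", 0.0),
]
selected = stuck_to_be_created(pods, now=1000.0)
print([p.name for p in selected])
```

Only the pods returned by this filter would contribute their resource occupation amounts to the actual ratio factor.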

The actual ratio factor is calculated as:

ratio_actual = pod1 + pod2 + pod3 + … + podn (1)

In equation (1), ratio_actual represents the actual ratio factor.

If ratio_actual < ratio, the capacity expansion operation is not triggered;

if ratio_actual ≥ ratio, the capacity expansion operation is triggered.

Assume that at a certain time the capacity expansion and reduction service finds that 4 pods to be created have appeared in the Kubernetes cluster and that their state information has remained in the pod-to-be-created state for 2 minutes, and that the resource occupation amounts of the 4 pods are pod1 = 1U2Gi, pod2 = 1U3Gi, pod3 = 2U3Gi and pod4 = 2U6Gi, where U is the CPU unit of measure and Gi is the memory unit of measure.

Then ratio_actual = 1U2Gi + 1U3Gi + 2U3Gi + 2U6Gi = 6U14Gi.

If ratio = 5U1Gi, then ratio_actual > ratio because 6U14Gi > 5U1Gi, and in step S205 the capacity expansion and reduction service triggers a capacity expansion operation.

If ratio = 5U26Gi, then ratio_actual > ratio because 6U14Gi > 5U26Gi, and the capacity expansion and reduction service again triggers a capacity expansion operation. (When ratio_actual and ratio are compared, CPU is the primary indicator: the value with the larger CPU amount is the larger one, and memory is compared only when the CPU amounts are equal.)

If ratio = 6U26Gi, then ratio_actual < ratio because 6U14Gi < 6U26Gi. In step S204 the capacity expansion and reduction service does not trigger a capacity expansion operation. Instead, it first analyzes the current resource situation of each Node and judges whether appropriately reallocating the pods among the current Node nodes can free up a Node with enough surplus resources to host the pod to be created (i.e., whether the resources can be reallocated). If so, the pod to be created is deployed to that Node after the reallocation, which maximizes resource utilization.
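The worked example above can be checked with a short Python sketch. It models a resource amount as a (CPU, memory) pair and relies on tuple ordering, which compares the first element before the second and therefore matches the CPU-first rule described in the text. The `Resources` type and the function names are illustrative assumptions.

```python
from typing import NamedTuple


class Resources(NamedTuple):
    cpu: int     # CPU units ("U")
    mem_gi: int  # memory in Gi


def total(requests):
    """Equation (1): sum the resource occupation amounts of the pods to be created."""
    return Resources(
        sum(r.cpu for r in requests),
        sum(r.mem_gi for r in requests),
    )


def needs_expansion(ratio_actual, ratio):
    """Tuples compare element by element, so CPU decides and memory breaks ties."""
    return ratio_actual >= ratio


pods_to_create = [Resources(1, 2), Resources(1, 3), Resources(2, 3), Resources(2, 6)]
ratio_actual = total(pods_to_create)

for threshold in (Resources(5, 1), Resources(5, 26), Resources(6, 26)):
    print(threshold, needs_expansion(ratio_actual, threshold))
```

With these inputs the sum is 6U14Gi, expansion is triggered for the thresholds 5U1Gi and 5U26Gi, and it is not triggered for 6U26Gi.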

The specific reallocation procedure is as follows. The resource situation of each Node is analyzed. If the analysis shows that appropriately reallocating the pods among the Node nodes can free up a Node with enough surplus resources to host the pod to be created, the Node management interface of the Kubernetes cluster is called through the API Server. Taking a first Node as the Node that can be freed up in this way, the first Node is first set to a mode in which pods cannot be deployed to it; next, some suitable pods on the first Node are evicted and scheduled to other Node nodes with resources (a second Node, for example), so that the first Node has surplus resources; finally, the Node management interface of Kubernetes is called through the API Server again to set the first Node back to a mode in which pods can be deployed, so that the pod to be created can be deployed to the first Node.

If the analysis shows that no Node with enough surplus resources can be freed up by reallocating the pods among the current Node nodes, the pod to be created remains in the pod-to-be-created state. When the sum of the resource occupation amounts of the pods that remain in the pod-to-be-created state (i.e., the actual ratio factor) becomes greater than or equal to the ratio factor threshold, the capacity expansion and reduction service triggers the capacity expansion operation, that is, step S205 is executed.

Step S30, generating an actual score factor according to the operation indexes of each Node, and judging whether capacity reduction is needed by comparing the actual score factor with a score factor threshold;

This embodiment provides an example from an actual scenario. After the capacity expansion and reduction service obtains the operation indexes of each Node in real time from the Node-exporter proxy service, in step S301 it generates an actual score factor from the operation indexes of each Node, which include the total amount of Node CPU, the total amount of Node memory, the total amount of Node disk, the remaining amount of Node disk, the Node load rate and the like. In step S302 it analyzes whether the actual score factor is greater than or equal to the score factor threshold to determine whether capacity reduction is required. In this embodiment the actual score factor is denoted score_actual and the score factor threshold is denoted score; the score factor threshold can be set according to requirements.

The calculation formula of the actual score factor is as follows:

In formula (2), cpu_t represents the total amount of CPU of the node; cpu_u represents the CPU usage amount of the node; memory_t represents the total amount of node memory; memory_u represents the usage amount of node memory; disk_t represents the total amount of node disk; disk_r represents the remaining amount of node disk; LoadPressure represents the node load rate. A, B, C and D are the weights of the CPU, memory, disk and Node load rate indexes, respectively, with A + B + C + D = 100; the weights can be set as needed. The Node load rate mainly reflects, for the server of the current Node over a period of time, the ratio of the number of tasks currently running plus the number of tasks in an uninterruptible state to the maximum number of tasks the system can process.

If score_actual ≥ score, the capacity reduction operation is triggered;

if score_actual < score, the capacity reduction operation is not triggered.

Suppose that there are currently two Node nodes in a Kubernetes cluster, a first Node and a second Node, and that A, B, C and D are 30, 20 and 20, respectively. At a certain time, cpu_t, cpu_u, memory_t, memory_u, disk_t, disk_r and LoadPressure of the first Node are 16 cores, 2 cores, 16000M, 4000M, 40G, 10G and 60%, respectively; those of the second Node are 16 cores, 14 cores, 16000M, 14000M, 50G, 40G and 70%, respectively.

The actual score factor of the first Node is:

the actual score factor of the second Node is:

If score = 50, then because the actual score factor of the first Node satisfies score_actual > score, in step S303 the capacity expansion and reduction service triggers a capacity reduction operation. It first safely evicts the pods on the first Node through the pod eviction policy provided by the Kubernetes cluster, so that they are scheduled to other Node nodes with surplus resources, and finally performs the capacity reduction operation on the first Node. The main purpose of safe eviction is to ensure that the services contained in the pods on the first Node are not interrupted and continue to run normally on other nodes.

If score = 50, then because the actual score factor of the second Node satisfies score_actual < score, in step S304 the capacity expansion and reduction service does not trigger a capacity reduction operation, i.e., no capacity reduction is performed on the second Node.
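Formula (2) and the two computed score values do not appear in this text, and only three of the four weights are listed, so the following Python sketch is only one plausible reading under loudly stated assumptions. It assumes that the score is a weighted sum of idle fractions (unused CPU, unused memory, remaining disk and one minus the load rate) and that the weights are 30, 30, 20 and 20 so that they sum to 100. Both the functional form and the second weight are assumptions, not the patent's formula.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    cpu_t: float          # total CPU (cores)
    cpu_u: float          # used CPU (cores)
    memory_t: float       # total memory (M)
    memory_u: float       # used memory (M)
    disk_t: float         # total disk (G)
    disk_r: float         # remaining disk (G)
    load_pressure: float  # node load rate as a fraction, 0..1


# ASSUMPTION: the source lists "30, 20 and 20"; the fourth value is guessed.
WEIGHTS = (30, 30, 20, 20)
SCORE = 50  # score factor threshold from the example


def score_actual(m, a, b, c, d):
    """ASSUMED form: a higher score means a more idle node. Not the patent's formula (2)."""
    return (
        a * (1 - m.cpu_u / m.cpu_t)
        + b * (1 - m.memory_u / m.memory_t)
        + c * (m.disk_r / m.disk_t)
        + d * (1 - m.load_pressure)
    )


def needs_reduction(m):
    return score_actual(m, *WEIGHTS) >= SCORE


first = NodeMetrics(16, 2, 16000, 4000, 40, 10, 0.60)
second = NodeMetrics(16, 14, 16000, 14000, 50, 40, 0.70)
print(needs_reduction(first), needs_reduction(second))
```

Under these assumptions the lightly loaded first Node scores above the threshold of 50 and the heavily loaded second Node scores below it, which agrees with the outcome described in the example, but the actual formula and weights may differ.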

In order to ensure the normal operation of the service contained in the pod on the Node needing capacity reduction, when the Node is judged to need capacity reduction, the pod on the Node needing capacity reduction is safely evicted and dispatched to other nodes, and finally the Node needing capacity reduction is subjected to capacity reduction operation.

When the capacity expansion and reduction service triggers a capacity reduction operation, the pods on the Node needing capacity reduction are first safely evicted through the pod eviction policy provided by the Kubernetes cluster, so that they are scheduled to other Node nodes with surplus resources; the capacity reduction operation is then performed on that Node. This ensures that the services contained in those pods run normally on other nodes.
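The ordering described above (stop new deployments to the Node, move its pods elsewhere, and only then remove the Node) can be sketched on an in-memory model. This is a simplified simulation, not a call into Kubernetes: the `Node` class, the first-fit placement and the decision to abort when some pod has no target are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: int
    free_mem: int
    schedulable: bool = True
    pods: dict = field(default_factory=dict)  # pod name -> (cpu, mem)


def scale_down(victim, cluster):
    """Evict every pod from `victim` to other nodes, then remove it. Returns True on success."""
    victim.schedulable = False  # stop new pods from landing on the node
    # Plan against a copy of the free capacity so nothing changes if planning fails.
    free = {
        n.name: [n.free_cpu, n.free_mem]
        for n in cluster
        if n is not victim and n.schedulable
    }
    plan = []
    for pod, (cpu, mem) in victim.pods.items():
        target = next(
            (name for name, (fc, fm) in free.items() if fc >= cpu and fm >= mem),
            None,
        )
        if target is None:
            victim.schedulable = True  # assumed policy: abort rather than interrupt a service
            return False
        free[target][0] -= cpu
        free[target][1] -= mem
        plan.append((pod, target))
    by_name = {n.name: n for n in cluster}
    for pod, target in plan:
        cpu, mem = victim.pods.pop(pod)
        node = by_name[target]
        node.pods[pod] = (cpu, mem)
        node.free_cpu -= cpu
        node.free_mem -= mem
    cluster.remove(victim)
    return True


a = Node("a", 10, 10, pods={"p1": (2, 2), "p2": (1, 1)})
b = Node("b", 2, 4)
c = Node("c", 4, 4)
cluster = [a, b, c]
removed = scale_down(a, cluster)
print(removed, [n.name for n in cluster])
```

Planning all placements before moving anything keeps the simulated cluster unchanged when eviction cannot complete, which mirrors the stated goal that services in the evicted pods are not interrupted.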

The acquiring parameter information of each pod and/or operation indexes of each Node in the Kubernetes cluster specifically includes: deploying proxy service on each Node in a Kubernetes cluster, wherein the proxy service is used for monitoring the operation index of each Node; and deploying a capacity expansion service on a Master Node in the Kubernetes cluster, wherein the capacity expansion service interacts with the API Server and the proxy service and is respectively used for acquiring parameter information of each pod and operation indexes of each Node in the Kubernetes cluster.

To obtain the parameter information of each pod and/or the operation indexes of each Node in the Kubernetes cluster, the embodiment of the invention deploys a capacity expansion and reduction service and a proxy service on the Kubernetes cluster. Specifically, as shown in FIG. 2, the proxy service is the Node-exporter proxy service, which is deployed on each Node as a DaemonSet and collects the operation indexes of that Node. The capacity expansion and reduction service is deployed on the Master Node as a Deployment, interacts with the API Server and the Node-exporter proxy service, obtains the parameter information of each pod through the API Server, and obtains the operation indexes of each Node from the Node-exporter proxy service.

The generating of the actual ratio factor according to the parameter information of each pod specifically includes the following. The parameter information comprises state information and resource occupation amount. The state information is one of: pod to be created, pod created and running, pod terminated normally, and pod failed abnormally. Because the state information of the pods is mainly stored in the ETCD storage service, and the ETCD storage service interacts directly with the API Server, the capacity expansion and reduction service can interact directly with the API Server to indirectly obtain the pod state information stored in the ETCD storage service.

After acquiring the parameter information of each pod from the API Server in real time, the capacity expansion and reduction service automatically analyzes information such as the state information and resource occupation amount of the pods to obtain the actual ratio factor. When the service finds that n pods to be created have appeared in the Kubernetes cluster and that their state information has remained in the pod-to-be-created state throughout a first preset time, it calculates the sum of the resource occupation amounts of the pods to be created. The resource occupation amounts of the n pods to be created are denoted pod1, pod2, pod3, …, podn. This embodiment takes a first preset time of 2 minutes as an example: when the state information of the n pods has remained in the pod-to-be-created state for 2 minutes, the sum of pod1, pod2, pod3, …, podn is calculated to generate the actual ratio factor, and whether capacity expansion is needed is judged by comparing the actual ratio factor with the ratio factor threshold. In this embodiment the actual ratio factor is denoted ratio_actual and the ratio factor threshold is denoted ratio. The first preset time and the ratio factor threshold can be set according to requirements.

The comparing of the actual ratio factor and the ratio factor threshold value to determine whether capacity expansion is required specifically includes: when the actual ratio factor is smaller than the ratio factor threshold, the capacity expansion operation is not triggered, but the resource situation of each Node at present is preferentially analyzed, whether the Node with surplus resources can be obtained to deploy the pod to be created (i.e. whether the resources can be allocated or not) after the pod in each Node at present is appropriately allocated is judged, and if the Node with surplus resources can be obtained to deploy the pod to be created after the pod in each Node at present is analyzed, the pod to be created is deployed to the Node with surplus resources after allocation, so that the maximum utilization of the resources is realized.

Assume that the resource occupation amount of a pod to be created is 4U6Gi;

the resource surplus of the first Node is 3U6Gi;

the resource surplus of the second Node is 3U6Gi.

In this case the pod to be created, which needs 4U6Gi, cannot be deployed on either the first Node or the second Node and remains in the pod-to-be-created state. The pods on the first Node are therefore reallocated: a pod running on the first Node whose resource occupation amount is smaller than the 3U6Gi resource surplus of the second Node (2U2Gi, for example) is evicted and scheduled onto the second Node. The resource surplus of the first Node then becomes 5U8Gi, and the pod to be created, which needs 4U6Gi, can be deployed onto the first Node.
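The reallocation in this example can be reproduced with a small search. The sketch below looks for a single running pod whose move to another Node would leave enough room for the pod to be created. The `Node` class and the function names are hypothetical, and the fit check here requires both CPU and memory to be sufficient, which is an assumption distinct from the CPU-first ordering used for the ratio factor.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: int  # surplus CPU in U
    free_mem: int  # surplus memory in Gi
    pods: dict = field(default_factory=dict)  # pod name -> (cpu, mem)


def fits(node, cpu, mem):
    """A request fits only if both CPU and memory are available."""
    return node.free_cpu >= cpu and node.free_mem >= mem


def rebalance_for(pending, nodes):
    """Move one running pod so that `pending` fits; return (pod, source, target) or None."""
    cpu, mem = pending
    for src in nodes:
        for pod, (p_cpu, p_mem) in list(src.pods.items()):
            # Skip pods whose departure would still not free enough room on src.
            if src.free_cpu + p_cpu < cpu or src.free_mem + p_mem < mem:
                continue
            for dst in nodes:
                if dst is not src and fits(dst, p_cpu, p_mem):
                    del src.pods[pod]
                    dst.pods[pod] = (p_cpu, p_mem)
                    src.free_cpu += p_cpu
                    src.free_mem += p_mem
                    dst.free_cpu -= p_cpu
                    dst.free_mem -= p_mem
                    return pod, src.name, dst.name
    return None


first = Node("first", 3, 6, {"running-pod": (2, 2)})
second = Node("second", 3, 6)
move = rebalance_for((4, 6), [first, second])
print(move, (first.free_cpu, first.free_mem), fits(first, 4, 6))
```

After the move, the first Node's surplus is 5U8Gi, as in the text, and the 4U6Gi pod fits on it. If no such move exists the function returns None, which corresponds to the case where the pod stays in the to-be-created state.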

When the actual ratio factor is greater than or equal to the ratio factor threshold, the capacity expansion operation is triggered: a new Node is first added to the Kubernetes cluster, and the pod to be created is then deployed on the new Node. The specific capacity expansion operation is as follows. Depending on the scenario, the Kubernetes cluster may be deployed on different cloud platforms, such as Array cloud, AWS, and the like. Therefore, when the capacity expansion and reduction service performs a capacity expansion operation, it calls the provider interface of the corresponding cloud platform to create a new Node, deploys components such as Kubelet and Kube-proxy on the new Node, adds the new Node to the Kubernetes cluster, and finally schedules the pod to be created onto the new Node to complete its deployment.
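The order of the expansion steps can be sketched as below. `FakeProvider` is a hypothetical in-memory stand-in for a cloud platform's provider interface and does not correspond to any real cloud SDK; the cluster is modelled as a plain dictionary, and the ratio values used to exercise it are illustrative only.

```python
class FakeProvider:
    """Hypothetical stand-in for a cloud platform's provider interface."""

    def __init__(self):
        self.created = 0

    def create_node(self):
        self.created += 1
        return f"new-node-{self.created}"


def expand(provider, cluster, pending_pods, log):
    """Run the expansion steps in the order the embodiment lists them."""
    node = provider.create_node()                             # 1. create the Node
    log.append(("install", node, ("kubelet", "kube-proxy")))  # 2. deploy components
    cluster[node] = []                                        # 3. join the cluster
    cluster[node].extend(pending_pods)                        # 4. schedule pending pods
    return node


def handle_pending(ratio_actual, ratio_threshold, provider, cluster, pending_pods, log):
    """Expand only when the actual ratio factor reaches the threshold."""
    if ratio_actual >= ratio_threshold:
        return expand(provider, cluster, pending_pods, log)
    return None  # below the threshold: reallocation on existing Nodes is tried instead


cluster = {"node-1": ["web"]}
log = []
new_node = handle_pending(0.9, 0.8, FakeProvider(), cluster, ["p1", "p2"], log)
```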

Adding the new Node to the Kubernetes cluster specifically includes: calling the provider interface of the cloud platform on which Kubernetes is deployed to add the new Node. The state information further includes: pod created and running, pod terminated normally, and pod failed abnormally. The operation indexes comprise one or more of the total Node CPU, the Node CPU usage, the total Node memory, the Node memory usage, the total Node disk, the remaining Node disk, and the Node load rate.

To avoid the risk of downtime of the Kubernetes cluster, the capacity expansion and reduction service periodically calls the Node-exporter proxy service to acquire the Node load rate of each Node. In step S206, when the service finds a Node in the Kubernetes cluster whose Node load rate has continuously exceeded the Node load rate threshold within a second preset time, a capacity expansion operation is triggered. The Node load rate threshold and the second preset time can be set as required.

Specifically, the capacity expansion and reduction service first sets a timed task, whose period can be set as required. In this embodiment, the service calls the Node-exporter proxy service on each Node once every 30 s to obtain that Node's load rate. When the load rate of at least one Node exceeds 80% (the Node load rate threshold) for longer than 5 minutes (the second preset time), the load rate of that Node is determined to be too high. The service then performs a capacity expansion operation: it first calls the provider interface of the corresponding cloud platform to create a Node, deploys components such as Kubelet and Kube-proxy on it, and then moves pods from the overloaded Node to the newly created Node. Offloading pods in this way reduces the pressure on the overloaded Node and avoids the risk of a Kubernetes cluster avalanche caused by downtime.
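The sustained-overload check can be sketched with a fixed-size window of samples per Node. The `LoadWatcher` class is a hypothetical helper; it is fed load values directly rather than querying Node-exporter, and it interprets "continuously exceeding the threshold for 5 minutes" as 11 consecutive 30 s samples (first and last sample 5 minutes apart) all above 80%, which is an assumption about the intended semantics.

```python
from collections import deque

SAMPLE_INTERVAL_S = 30   # one call to the Node-exporter proxy service every 30 s
LOAD_THRESHOLD = 0.80    # Node load rate threshold
SECOND_PRESET_S = 300    # second preset time: 5 minutes


class LoadWatcher:
    def __init__(self):
        # Number of consecutive samples whose first and last are SECOND_PRESET_S apart
        self.window = SECOND_PRESET_S // SAMPLE_INTERVAL_S + 1
        self.samples = {}

    def observe(self, node, load):
        """Record one sample; return True if the Node has been overloaded for the whole window."""
        q = self.samples.setdefault(node, deque(maxlen=self.window))
        q.append(load)
        return len(q) == self.window and all(x > LOAD_THRESHOLD for x in q)


watcher = LoadWatcher()
results = [watcher.observe("node-1", 0.9) for _ in range(11)]
```

A single sample at or below the threshold resets the condition, because it stays in the window until 11 newer samples have replaced it.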

Judging whether capacity reduction is needed by comparing the actual score factor with the score factor threshold specifically includes: when the actual score factor is greater than or equal to the score factor threshold, the capacity reduction operation is triggered.

When the actual score factor obtained by analysis is greater than or equal to the score factor threshold, the capacity reduction operation is triggered. The pods on the Node to be reduced are first safely evicted using the pod eviction policy provided by the Kubernetes cluster, so that they are scheduled onto other Nodes with surplus resources; the capacity reduction operation is then performed on that Node. The main purpose of safe eviction is to ensure that the services contained in the pods on the Node to be reduced are not interrupted and continue to run normally on other Nodes.
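The "evict first, then reduce" ordering can be sketched as follows. This is a simplified model under stated assumptions: resources are collapsed into one abstract unit, the data structures are plain dictionaries, and aborting when some pod has nowhere to go is an illustrative choice, not something the embodiment specifies. The computation of the actual score factor itself is not shown here.

```python
def scale_down(score_actual, score_threshold, cluster, free, requests, victim):
    """Evict every pod on `victim` to Nodes with surplus, then remove `victim`.

    cluster:  node name -> list of pod names
    free:     node name -> free capacity (one abstract resource unit)
    requests: pod name -> requested capacity
    Returns True if the Node was removed, False otherwise.
    """
    if score_actual < score_threshold:
        return False                      # threshold not reached: no capacity reduction
    plan = {}
    budget = {n: f for n, f in free.items() if n != victim}
    for pod in cluster[victim]:
        target = next((n for n, f in budget.items() if f >= requests[pod]), None)
        if target is None:
            return False                  # assumption: abort before touching anything
        budget[target] -= requests[pod]
        plan[pod] = target
    for pod, target in plan.items():      # step 1: safely evict the pods
        cluster[target].append(pod)
        free[target] -= requests[pod]
    del cluster[victim], free[victim]     # step 2: reduce capacity by removing the Node
    return True


cluster = {"a": ["p1", "p2"], "b": [], "c": []}
free = {"a": 1, "b": 2, "c": 3}
requests = {"p1": 2, "p2": 3}
removed = scale_down(0.9, 0.8, cluster, free, requests, "a")
```

Planning all placements before moving any pod keeps the cluster unchanged if the eviction cannot complete, which is one way to honor the requirement that services are not interrupted.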

The above is the complete process of the Kubernetes intelligent capacity expansion and reduction method provided in this embodiment; for the specific flow, refer to fig. 5. Throughout the process, the capacity expansion and reduction service relies entirely on the periodically obtained operation indexes of the Nodes and the pod parameter information monitored in real time, and completes the capacity expansion and reduction task autonomously after analysis and calculation. In fig. 5, the capacity reduction operation on the left, the expand-then-offload operation in the middle, and the expand-then-create-pod operation on the right are mutually exclusive tasks: only one of them can be executed to completion at any one time. After the task finishes, the service returns to the monitoring state and waits for the next round of capacity expansion and reduction.

In this embodiment, a Node-exporter proxy service is deployed on the Nodes of the Kubernetes cluster to monitor the operation indexes of each Node, and an intelligent capacity expansion and reduction service deployed on the Master Node acquires the operation indexes of each Node and the parameter information of the pods. Each Node is comprehensively evaluated from its operation indexes to generate an actual score factor, and whether a capacity reduction operation is needed is determined autonomously according to whether the actual score factor reaches the score factor threshold. When capacity reduction is needed, the pods on the Node are safely evicted, so that they run normally on other Nodes and the services they contain are not interrupted. An actual ratio factor is calculated from the parameter information of the pods, and whether a capacity expansion operation is needed is determined autonomously according to whether the actual ratio factor reaches the ratio factor threshold. The automatic expansion and reduction of Nodes is thus completed through real-time intelligent analysis by the capacity expansion and reduction service across the whole cluster, which makes the expansion and reduction of the Kubernetes cluster more accurate, safe, and efficient.

Example 2

For the situation in embodiment 1 in which capacity expansion is performed on a pod, this embodiment provides another possible implementation. Specifically, when the Kubernetes cluster expands a certain pod on the current Node, it first analyzes the resource conditions of each pod, safely evicts the service (i.e., the container in fig. 2) in the pod to be expanded to a pod with surplus resources, and then performs the capacity expansion operation on the pod to be expanded. When the service is safely evicted to the pod with surplus resources, the data of the evicted service is kept in the pod to be expanded, and an address pointer is used to establish a mapping between the pod to be expanded and the pod with surplus resources, so that the data is shared between the two pods. When the evicted service needs the data held in the pod to be expanded, it reads or writes the data directly through the address pointer, which ensures that the evicted service keeps running normally without interruption.

When the pod to be expanded has completed the capacity expansion operation and is ready to recall the safely evicted service, the evicted service first generates a copy service, and the copy service is recalled to the expanded pod. A temporary space is then opened on the expanded pod to store the data generated while the recalled copy service runs. At this point the evicted service and the copy service run simultaneously, in the pod with surplus resources and in the expanded pod respectively. Once the data in the temporary space is synchronized with the data of the evicted service in the pod with surplus resources, the evicted service in the pod with surplus resources is deleted.

Example 3

On the basis of the Kubernetes intelligent capacity expansion and reduction method provided in embodiment 1, the present invention further provides a Kubernetes intelligent capacity expansion and reduction device that can be used to implement the method. Fig. 6 is a schematic structural diagram of the device in this embodiment of the present invention. The Kubernetes intelligent capacity expansion and reduction device of this embodiment includes one or more processors 21 and a memory 22. In fig. 6, one processor 21 is taken as an example.

The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.

The memory 22, as a non-volatile computer-readable storage medium for the Kubernetes intelligent capacity expansion and reduction method, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the Kubernetes intelligent capacity expansion and reduction method of embodiment 1. By running the non-volatile software programs, instructions, and modules stored in the memory 22, the processor 21 executes the various functional applications and data processing of the Kubernetes intelligent capacity expansion and reduction device, that is, it implements the Kubernetes intelligent capacity expansion and reduction method of embodiment 1.

The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the Kubernetes intelligent capacity expansion and reduction method of embodiment 1 above, for example, the steps illustrated in figs. 1-5 described above.

Those of ordinary skill in the art will appreciate that all or part of the steps of the methods of the embodiments may be implemented by a program instructing the associated hardware. The program may be stored on a computer-readable storage medium, which may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A Kubernetes intelligent capacity expansion and reduction method, characterized by comprising the following steps:

2. The Kubernetes intelligent capacity expansion and reduction method according to claim 1, wherein the obtaining of parameter information of each pod and/or operation indexes of each Node in the Kubernetes cluster specifically includes:

3. The Kubernetes intelligent capacity expansion and reduction method according to claim 1, wherein the generating of the actual ratio factor according to the parameter information of each pod specifically includes:

4. The Kubernetes intelligent capacity expansion and reduction method according to claim 3, wherein the comparing of the actual ratio factor with the ratio factor threshold to determine whether capacity expansion is required specifically includes:

5. The Kubernetes intelligent capacity expansion and reduction method according to claim 4, wherein the adding of a new Node to the Kubernetes cluster specifically includes:

6. The Kubernetes intelligent capacity expansion and reduction method according to claim 3, wherein the state information further includes: pod created and running, pod terminated normally, and pod failed abnormally.

7. The Kubernetes intelligent capacity expansion method according to claim 1, wherein the operation index comprises one or more of total amount of node CPUs, usage amount of node CPUs, total amount of node memories, usage amount of node memories, total amount of node disks, remaining amount of node disks and load rate of nodes.

8. The Kubernetes intelligent capacity expansion and reduction method according to claim 7,

wherein when a Node whose Node load rate continuously exceeds the Node load rate threshold within a second preset time appears in the Kubernetes cluster, a capacity expansion operation is triggered.

9. The Kubernetes intelligent capacity expansion and reduction method according to any one of claims 1-8, wherein the comparing of the actual score factor with the score factor threshold to determine whether capacity reduction is required specifically includes:

10. A Kubernetes intelligent capacity expansion and reduction device, characterized by comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the Kubernetes intelligent capacity expansion and reduction method according to any one of claims 1-9.