CN113051075A - Kubernetes intelligent capacity expansion method and device - Google Patents

Kubernetes intelligent capacity expansion method and device

Info

Publication number
CN113051075A
Authority
CN
China
Prior art keywords
node
pod
capacity
kubernetes
expansion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110305822.0A
Other languages
Chinese (zh)
Other versions
CN113051075B (en
Inventor
马兵兵
侯汉祎
刘田龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fiberhome Telecommunication Technologies Co Ltd
Original Assignee
Fiberhome Telecommunication Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fiberhome Telecommunication Technologies Co Ltd filed Critical Fiberhome Telecommunication Technologies Co Ltd
Priority to CN202110305822.0A priority Critical patent/CN113051075B/en
Publication of CN113051075A publication Critical patent/CN113051075A/en
Application granted granted Critical
Publication of CN113051075B publication Critical patent/CN113051075B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of cloud computing container platforms, and provides a Kubernetes intelligent capacity expansion method and a Kubernetes intelligent capacity expansion device, which comprise the steps of obtaining parameter information of each pod and/or operation indexes of each Node in a Kubernetes cluster; generating an actual ratio factor according to the parameter information of each pod, and judging whether capacity expansion is needed or not by comparing the actual ratio factor with a ratio factor threshold; generating an actual score factor according to the operation index of each Node, and judging whether the capacity reduction is needed or not by comparing the actual score factor with a score factor threshold; when the Node nodes need to be subjected to capacity reduction, the pod on the Node nodes needing capacity reduction is safely evicted and dispatched to other nodes, and finally the Node nodes needing capacity reduction are subjected to capacity reduction operation. The automatic expansion and contraction of the Node nodes is completed by real-time intelligent analysis of the expansion and contraction capacity service in the whole cluster, and the expansion and contraction capacity service can enable the expansion and contraction capacity of the Kubernetes cluster to be more accurate and efficient.

Description

Kubernetes intelligent capacity expansion method and device
[ technical field ]
The invention relates to the technical field of cloud computing container platforms, in particular to a Kubernetes intelligent capacity expansion method and device.
[ background of the invention ]
Kubernetes, abbreviated as k8s, is a container cluster management system that Google open-sourced in 2014. It is an open-source version of Borg, which Google has used for nearly 20 years. Because it was already highly mature at release, it attracted wide attention in the industry and quickly became the mainstream container orchestration tool. As a complete platform for supporting distributed systems, it provides containerized applications with a full set of functions, including deployment and operation, resource scheduling, service discovery and dynamic scaling, which makes large-scale container cluster management more convenient.
For cluster management, Kubernetes divides the machines in a cluster into a Master Node and working Nodes. A group of processes related to cluster management runs on the Master Node, while each Node serves as a working node that runs the actual applications; the smallest unit that Kubernetes manages on a Node is the pod. When the services deployed in a cluster grow exponentially, the Nodes in the Kubernetes cluster must be scaled out horizontally, and when the services decrease, Nodes must be removed (capacity reduction). Life cycle management of Nodes is therefore an important part of overall cluster management. It creates a large workload for operation and maintenance personnel, increases operation and maintenance difficulty, and poses potential challenges to the stability of the cluster system.
In view of the above, overcoming the drawbacks of the prior art is an urgent problem in the art.
[ summary of the invention ]
The technical problem to be solved by the invention is as follows:
In the prior art, capacity expansion of a Kubernetes cluster monitors only the load of the cluster's overall resources and adds nodes directly, so the process performs poorly and occupies more cluster resources. Capacity reduction directly deletes Nodes with low resource utilization based only on a comparison of Node resource utilization; it neither safely evicts the pods on the Node nor safely deletes the Node with the lowest comprehensive performance index. In addition, capacity expansion and reduction of a Kubernetes cluster are currently completed through manual intervention, so efficiency is low and operation and maintenance complexity is high.
The invention achieves the above purpose by the following technical scheme:
In a first aspect, the present invention provides a Kubernetes intelligent capacity expansion method, including:
acquiring parameter information of each pod and/or operation indexes of each Node in a Kubernetes cluster;
generating an actual ratio factor according to the parameter information of each pod, and judging whether capacity expansion is needed or not by comparing the actual ratio factor with a ratio factor threshold;
generating an actual score factor according to the operation index of each Node, and judging whether the capacity reduction is needed or not by comparing the actual score factor with a score factor threshold;
when the Node nodes need to be subjected to capacity reduction, the pod on the Node nodes needing capacity reduction is safely evicted and dispatched to other nodes, and finally the Node nodes needing capacity reduction are subjected to capacity reduction operation.
Preferably, the acquiring parameter information of each pod and/or operation index of each Node in the Kubernetes cluster specifically includes:
deploying proxy service on each Node in a Kubernetes cluster, wherein the proxy service is used for monitoring the operation index of each Node;
and deploying a capacity expansion service on a Master Node in the Kubernetes cluster, wherein the capacity expansion service interacts with the API Server and the proxy service and is respectively used for acquiring parameter information of each pod and operation indexes of each Node in the Kubernetes cluster.
Preferably, the generating an actual ratio factor according to the parameter information of each pod specifically includes:
the parameter information comprises state information and resource occupation amount;
when the pod to be created appears in the Kubernetes cluster, and the state information of the pod to be created is continuously in the pod to be created state in the first preset time, calculating the sum of the occupied resource amount of each pod to be created, so as to generate an actual ratio factor.
Preferably, the determining whether capacity expansion is required by comparing the actual ratio factor with a ratio factor threshold specifically includes:
when the actual ratio factor is smaller than the ratio factor threshold, not triggering the capacity expansion operation, firstly allocating the pod in the current Node, and then deploying the pod to be created to the Node with surplus resources;
and when the actual ratio factor is larger than or equal to the ratio factor threshold, triggering expansion operation, firstly adding a new Node in the Kubernetes cluster, and then deploying the pod to be created into the new Node.
Preferably, the adding of the new Node in the Kubernetes cluster specifically includes:
and calling a provider interface of a cloud platform where the Kubernetes cluster is located to add a new Node.
Preferably, the state information further includes: pod created and running, pod normally terminated, and pod abnormally failed.
Preferably, the operation index includes one or more of a total amount of the node CPUs, a total amount of the node memories, a total amount of the node disks, a remaining amount of the node disks, and a load rate of the nodes.
Preferably, when a Node with a Node load rate continuously exceeding the Node load rate threshold value within a second preset time appears in the Kubernetes cluster, the capacity expansion operation is triggered.
Preferably, the step of comparing the actual score factor with the score factor threshold value to determine whether the reduction is required is specifically as follows:
when the actual score factor is larger than or equal to the score factor threshold value, the capacity reduction operation is triggered;
and when the actual score factor is smaller than the score factor threshold value, not triggering the capacity reduction operation.
In a second aspect, the present invention further provides a Kubernetes intelligent capacity expansion device, which includes at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the Kubernetes intelligent capacity expansion method according to the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
The invention deploys a Node-exporter proxy service on each Node of the Kubernetes cluster to monitor the operation indexes of that Node, and deploys an intelligent capacity expansion and reduction service on the Master Node to obtain the operation indexes of each Node and the parameter information of each pod. The service comprehensively evaluates each Node from its operation indexes to generate an actual score factor, and decides autonomously whether capacity reduction is needed according to whether the actual score factor reaches the score factor threshold. When capacity reduction is needed, the pods on the Node are safely evicted so that they run normally on other Nodes and the services they contain are not interrupted. The service also calculates an actual ratio factor from the pod parameter information and decides autonomously whether capacity expansion is needed according to whether the actual ratio factor reaches the ratio factor threshold. Automatic expansion and reduction of Nodes is thus completed through real-time analysis by the capacity expansion and reduction service across the whole cluster, which makes scaling of the Kubernetes cluster more accurate, safe and efficient.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a flowchart of a Kubernetes intelligent capacity expansion method according to an embodiment of the present invention;
Fig. 2 is an architecture diagram of Kubernetes intelligent scalability according to an embodiment of the present invention;
Fig. 3 is a flowchart of a Kubernetes intelligent capacity expansion method according to an embodiment of the present invention;
Fig. 4 is a flowchart of a Kubernetes intelligent capacity expansion method according to an embodiment of the present invention;
Fig. 5 is an overall flowchart of a Kubernetes intelligent capacity expansion method according to an embodiment of the present invention;
Fig. 6 is an architecture diagram of a Kubernetes intelligent scalable device according to an embodiment of the present invention.
[ detailed description of embodiments ]
Kubernetes divides the machines in a cluster into a Master Node and working Nodes. The Master Node is the management node of the Kubernetes cluster. It hosts the ETCD storage service (an optional service), runs the API Server, Controller Manager and Scheduler processes, and is associated with each Node in the cluster. The API Server process is the entry point for controlling the Kubernetes cluster, the ETCD storage service stores the parameter information of each pod, and the API Server interacts directly with ETCD.
A Node is a working node in a Kubernetes cluster that carries the pods assigned to it. Each Node may host multiple pods and runs the following processes: kubelet, kube-proxy and the pods themselves, where each pod consists of several associated containers.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are for convenience only to describe the present invention without requiring the present invention to be necessarily constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
This embodiment addresses the following problems of the prior art. Capacity expansion of a Kubernetes cluster monitors only the load of the cluster's overall resources and adds nodes directly, so the process performs poorly and occupies more cluster resources. Capacity reduction directly deletes Nodes with low resource utilization based only on a comparison of Node resource utilization; it neither safely evicts the pods on the Node nor safely deletes the Node with the lowest comprehensive performance index. In addition, capacity expansion and reduction of a Kubernetes cluster are currently completed through manual intervention, so efficiency is low and operation and maintenance complexity is high.
The embodiment of the invention provides a Kubernetes intelligent capacity expansion method which, as shown in FIG. 1, comprises the following steps:
Step S10, acquiring parameter information of each pod and/or operation indexes of each Node in the Kubernetes cluster;
To obtain the parameter information of each pod and/or the operation indexes of each Node in the Kubernetes cluster, the embodiment of the invention deploys a capacity expansion and reduction service and a proxy service on the Kubernetes cluster, as shown in FIG. 2. The proxy service may be the Node-exporter proxy service, which is deployed on each Node as a DaemonSet and collects the operation indexes of that Node. Node-exporter is provided and maintained officially by the Prometheus project, is not bundled with the installation, and is a proxy service for collecting operation indexes at the server level. The capacity expansion and reduction service is deployed on the Master Node as a Deployment. It interacts with the API Server and the Node-exporter proxy service, indirectly obtaining the parameter information of each pod from the API Server and obtaining the operation indexes of each Node from the Node-exporter proxy service. The parameter information of each pod includes state information, resource occupation amount and other information. The state information of a pod is one of: pod to be created, pod created and running, pod normally terminated, and pod abnormally failed. The state information is mainly stored in the ETCD storage service, and because the ETCD storage service interacts directly with the API Server, the capacity expansion and reduction service can interact with the API Server to indirectly obtain the pod state information stored in ETCD. The operation indexes of each Node include the total Node CPU, the Node CPU usage, the total Node memory, the Node memory usage, the total Node disk, the remaining Node disk, the Node load rate and the like.
Step S20, generating an actual ratio factor according to the parameter information of each pod, and judging whether capacity expansion is needed or not by comparing the actual ratio factor with a ratio factor threshold;
This embodiment provides an example from an actual scenario. After acquiring the parameter information of each pod from the API Server in real time, the capacity expansion and reduction service automatically analyzes the state information, resource occupation amount and other information of the pods to obtain the actual ratio factor.
In step S201, the capacity expansion and reduction service finds that n pods to be created have appeared in the Kubernetes cluster and that the state information of these n pods remains in the to-be-created state throughout a first preset time. In step S202, the sum of the resource occupation amounts of the pods to be created is calculated to generate the actual ratio factor. The resource occupation amounts of the n pods to be created are denoted pod1, pod2, pod3, …, podn. This embodiment takes a first preset time of 2 minutes as an example: when the n pods remain in the to-be-created state for 2 minutes, the sum of pod1, pod2, pod3, …, podn (the actual ratio factor) is calculated. In step S203, the service checks whether the actual ratio factor is greater than or equal to the ratio factor threshold, denoted ratio in this embodiment, to determine whether capacity expansion is needed. The first preset time and the ratio factor threshold can be set as required.
The actual ratio factor is calculated as:
ratio_actual = pod1 + pod2 + pod3 + … + podn (1)
In formula (1), ratio_actual represents the actual ratio factor.
If ratio_actual < ratio, the capacity expansion operation is not triggered;
if ratio_actual ≥ ratio, the capacity expansion operation is triggered.
Assume that at a certain moment the capacity expansion and reduction service finds 4 pods to be created in the Kubernetes cluster, that all 4 remain in the to-be-created state for 2 minutes, and that their resource occupation amounts are pod1 = 1U2Gi, pod2 = 1U3Gi, pod3 = 2U3Gi and pod4 = 2U6Gi, where U is the CPU unit and Gi is the memory unit.
Then ratio_actual = 1U2Gi + 1U3Gi + 2U3Gi + 2U6Gi = 6U14Gi.
If ratio = 5U1Gi, then ratio_actual > ratio because 6U14Gi > 5U1Gi, and in step S205 the capacity expansion and reduction service triggers a capacity expansion operation.
If ratio = 5U26Gi, then ratio_actual > ratio because 6U14Gi > 5U26Gi, and the service again triggers a capacity expansion operation. When ratio_actual and ratio are compared, CPU is the primary indicator: the quantity with the larger CPU value is the larger overall, and memory is compared only when the CPU values are equal.
If ratio = 6U26Gi, then ratio_actual < ratio because 6U14Gi < 6U26Gi, and in step S204 the service does not trigger a capacity expansion operation. Instead, it first analyzes the current resource situation of each Node and determines whether reallocating the pods already running on the Nodes can free up a Node with enough surplus resources to host the pods to be created. If so, the pods to be created are deployed to that Node after the reallocation, which maximizes resource utilization.
The specific reallocation procedure is as follows. The resource situation of each Node is analyzed. If the analysis shows that reallocating pods among the current Nodes can free up a Node with surplus resources for the pods to be created, the Node management interface of the Kubernetes cluster (the API Server) is called. Taking the first Node as the Node that can be freed up in this way, the first Node is first set to the mode in which pods cannot be deployed to it. Some suitable pods on the first Node are then rescheduled to other Nodes that have surplus resources (the second Node in this example), so that the first Node gains surplus resources. Finally, the Node management interface of the Kubernetes cluster is called again to set the first Node back to the mode in which pods can be deployed, so that the pods to be created can be deployed to the first Node.
If the analysis shows that no Node with surplus resources can be freed up by reallocating pods among the current Nodes, the pods to be created remain in the to-be-created state. Once the sum of the resource occupation amounts of the pods that remain in that state (the actual ratio factor) is greater than or equal to the ratio factor threshold, the capacity expansion and reduction service triggers a capacity expansion operation, that is, step S205 is executed.
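The ratio-factor comparison of step S20 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `Resources`, `ratio_actual` and `needs_expansion` are hypothetical, and representing a quantity such as 6U14Gi as a (CPU, memory) tuple is an assumption that happens to reproduce the stated rule that CPU is compared first and memory only breaks ties.

```python
from typing import Iterable, NamedTuple


class Resources(NamedTuple):
    """A resource amount such as 6U14Gi; field order makes CPU the primary key."""
    cpu: int     # CPU units (U)
    mem_gi: int  # memory in Gi


def ratio_actual(pending: Iterable[Resources]) -> Resources:
    """Sum the resource occupation amounts of all pods still to be created."""
    pending = list(pending)
    return Resources(sum(p.cpu for p in pending), sum(p.mem_gi for p in pending))


def needs_expansion(actual: Resources, threshold: Resources) -> bool:
    """Expansion is triggered when actual >= threshold (CPU first, then memory)."""
    return actual >= threshold


# The four pending pods from the worked example: 1U2Gi, 1U3Gi, 2U3Gi, 2U6Gi.
pending_pods = [Resources(1, 2), Resources(1, 3), Resources(2, 3), Resources(2, 6)]
actual = ratio_actual(pending_pods)

for threshold in (Resources(5, 1), Resources(5, 26), Resources(6, 26)):
    print(threshold, needs_expansion(actual, threshold))
```

Tuple comparison in Python is lexicographic, so no custom ordering code is needed for the CPU-then-memory rule.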
Step S30, generating an actual score factor according to the operation indexes of each Node, and judging whether capacity reduction is needed by comparing the actual score factor with the score factor threshold;
This embodiment provides an example from an actual scenario. After the capacity expansion and reduction service obtains the operation indexes of each Node in real time from the Node-exporter proxy service, step S301 generates an actual score factor from those indexes, which include the total Node CPU, the total Node memory, the total Node disk, the remaining Node disk, the Node load rate and the like. Step S302 then checks whether the actual score factor is greater than or equal to the score factor threshold to determine whether capacity reduction is needed. In this embodiment the actual score factor is denoted score_actual and the score factor threshold is denoted score; the threshold can be set as required.
The calculation formula of the actual score factor is as follows:
[Formula (2): image BDA0002987726100000091, not reproduced in the text]
In formula (2), cpu_t represents the total CPU of the node; cpu_u represents the CPU usage of the node; memory_t represents the total memory of the node; memory_u represents the memory usage of the node; disk_t represents the total disk of the node; disk_r represents the remaining disk of the node; and LoadPressure represents the node load rate. A, B, C and D are the weights of the CPU, memory, disk and node load rate indexes respectively, with A + B + C + D = 100; the weights can be set as required. The node load rate mainly reflects, for the server of the current node over a period of time, the ratio of the number of tasks currently running plus the number of tasks in an uninterruptible state to the maximum number of tasks the system can process.
If score_actual ≥ score, the capacity reduction operation is triggered;
if score_actual < score, the capacity reduction operation is not triggered.
Suppose the Kubernetes cluster currently has two Nodes, a first Node and a second Node, and A, B, C and D are 30, 20 and 20, respectively. At a certain moment, cpu_t, cpu_u, memory_t, memory_u, disk_t, disk_r and LoadPressure of the first Node are 16 cores, 2 cores, 16000M, 4000M, 40G, 10G and 60%, respectively; those of the second Node are 16 cores, 14 cores, 16000M, 14000M, 50G, 40G and 70%, respectively.
The actual score factor of the first Node is:
[Formula image BDA0002987726100000101, not reproduced in the text]
The actual score factor of the second Node is:
[Formula image BDA0002987726100000102, not reproduced in the text]
If score = 50, then score_actual > score for the first Node, and in step S303 the capacity expansion and reduction service triggers a capacity reduction operation. It first safely evicts the pods on the first Node through the pod eviction policy provided by the Kubernetes cluster, so that the pods are scheduled to other Nodes with surplus resources, and then performs the capacity reduction operation on the first Node. The main purpose of safe eviction is to ensure that the services contained in the pods on the first Node are not interrupted and continue to run normally on other Nodes.
If score = 50, then score_actual < score for the second Node, and in step S304 the capacity expansion and reduction service does not trigger a capacity reduction operation, that is, no capacity reduction is performed on the second Node.
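The score-factor decision of step S30 can be illustrated with a short sketch. Formula (2) survives in this text only as an image reference, so the weighted form below is an assumption, not the patent's formula: it scores a Node by its idle share of CPU, memory and disk plus its load headroom, which agrees with the stated outcome that the lightly loaded first Node reaches the threshold of 50 and the busy second Node does not. The weights 30, 30, 20 and 20 are likewise assumed (the text lists only three values), and `NodeMetrics`, `score_actual` and `needs_reduction` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    cpu_total: float       # cores
    cpu_used: float        # cores
    mem_total: float       # M
    mem_used: float        # M
    disk_total: float      # G
    disk_remaining: float  # G
    load_pressure: float   # node load rate as a fraction, e.g. 0.6 for 60%


def score_actual(m: NodeMetrics, a=30, b=30, c=20, d=20) -> float:
    """ASSUMED formula: weighted idle share of each resource (weights sum to 100)."""
    return (a * (m.cpu_total - m.cpu_used) / m.cpu_total
            + b * (m.mem_total - m.mem_used) / m.mem_total
            + c * m.disk_remaining / m.disk_total
            + d * (1.0 - m.load_pressure))


def needs_reduction(m: NodeMetrics, threshold: float = 50) -> bool:
    """Capacity reduction is triggered when score_actual >= score."""
    return score_actual(m) >= threshold


# Operation indexes of the two Nodes from the worked example.
first = NodeMetrics(16, 2, 16000, 4000, 40, 10, 0.60)
second = NodeMetrics(16, 14, 16000, 14000, 50, 40, 0.70)

print(needs_reduction(first), needs_reduction(second))
```

Under this assumed formula a higher score means a more idle Node, which matches the rule that a score at or above the threshold marks the Node as a candidate for removal.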
In order to ensure the normal operation of the service contained in the pod on the Node needing capacity reduction, when the Node is judged to need capacity reduction, the pod on the Node needing capacity reduction is safely evicted and dispatched to other nodes, and finally the Node needing capacity reduction is subjected to capacity reduction operation.
When the capacity expansion and reduction service triggers the capacity reduction operation, the pods on the Node to be reduced are first safely evicted through the pod eviction policy provided by the Kubernetes cluster, so that they are scheduled to other Nodes with surplus resources. The capacity reduction operation is then performed on that Node, which ensures that the services contained in its pods continue to run normally on other Nodes.
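The evict-then-remove ordering described above can be sketched as an in-memory simulation. Nothing here calls the Kubernetes API: the dictionaries standing in for cluster state, the helper names `fits`, `plan_eviction` and `scale_in`, and the largest-pod-first placement heuristic are all assumptions made for illustration. The sketch only shows the safety property that the Node is removed solely when every one of its pods has found room elsewhere.

```python
from typing import Dict, List, Optional, Tuple

Res = Tuple[int, int]  # (CPU units, memory Gi)


def fits(req: Res, free: Res) -> bool:
    """A pod fits only if both its CPU and its memory fit."""
    return req[0] <= free[0] and req[1] <= free[1]


def plan_eviction(target: str, pods: Dict[str, List[Res]],
                  free: Dict[str, Res]) -> Optional[List[Tuple[Res, str]]]:
    """Choose a destination for each pod on `target`; None if any pod has no home."""
    remaining = {n: f for n, f in free.items() if n != target}
    plan = []
    # Largest pods first: a simple heuristic chosen for this sketch.
    for req in sorted(pods[target], reverse=True):
        dest = next((n for n, f in remaining.items() if fits(req, f)), None)
        if dest is None:
            return None  # removing the Node would interrupt a service
        f = remaining[dest]
        remaining[dest] = (f[0] - req[0], f[1] - req[1])
        plan.append((req, dest))
    return plan


def scale_in(target: str, pods: Dict[str, List[Res]], free: Dict[str, Res]) -> bool:
    """Move the pods per the plan, then remove the Node; True if it was removed."""
    plan = plan_eviction(target, pods, free)
    if plan is None:
        return False
    for req, dest in plan:
        pods[dest].append(req)
        free[dest] = (free[dest][0] - req[0], free[dest][1] - req[1])
    del pods[target], free[target]
    return True


pods = {"node1": [(1, 2), (2, 3)], "node2": [(2, 4)], "node3": []}
free = {"node1": (5, 9), "node2": (2, 4), "node3": (2, 2)}
removed = scale_in("node1", pods, free)
print(removed, pods, free)
```

Planning the whole move before changing any state keeps the operation all-or-nothing, so a Node whose pods cannot all be placed is left untouched.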
The acquiring parameter information of each pod and/or operation indexes of each Node in the Kubernetes cluster specifically includes: deploying proxy service on each Node in a Kubernetes cluster, wherein the proxy service is used for monitoring the operation index of each Node; and deploying a capacity expansion service on a Master Node in the Kubernetes cluster, wherein the capacity expansion service interacts with the API Server and the proxy service and is respectively used for acquiring parameter information of each pod and operation indexes of each Node in the Kubernetes cluster.
To obtain the parameter information of each pod and/or the operation indexes of each Node in the Kubernetes cluster, the embodiment of the invention deploys a capacity expansion and reduction service and a proxy service on the Kubernetes cluster. Specifically, as shown in FIG. 2, the proxy service is the Node-exporter proxy service, which is deployed on each Node as a DaemonSet and collects the operation indexes of that Node. The capacity expansion and reduction service is deployed on the Master Node as a Deployment, interacts with the API Server and the Node-exporter proxy service, indirectly obtains the parameter information of each pod from the API Server, and obtains the operation indexes of each Node from the Node-exporter proxy service.
Generating the actual ratio factor according to the parameter information of each pod specifically includes the following. The parameter information comprises state information and resource occupation amount. The state information is one of: pod to be created, pod created and running, pod normally terminated, and pod abnormally failed. The state information of a pod is mainly stored in the ETCD storage service, and because the ETCD storage service interacts directly with the API Server, the capacity expansion and reduction service can interact with the API Server to indirectly obtain the pod state information stored in ETCD.
When the pod to be created appears in the Kubernetes cluster, and the state information of the pod to be created is continuously in the pod to be created state in the first preset time, calculating the sum of the occupied resource amount of each pod to be created, so as to generate an actual ratio factor.
After the capacity expansion and reduction service acquires the parameter information of each pod from the API Server in real time, it automatically analyzes the state information, resource occupation amount and other information of the pods to obtain the actual ratio factor. When the service finds that n pods to be created have appeared in the Kubernetes cluster and that these n pods remain in the to-be-created state throughout a first preset time, it calculates the sum of their resource occupation amounts. The resource occupation amounts of the n pods are denoted pod1, pod2, pod3, …, podn. This embodiment takes a first preset time of 2 minutes as an example: when the n pods remain in the to-be-created state for 2 minutes, the sum of pod1, pod2, pod3, …, podn is calculated to generate the actual ratio factor, and whether capacity expansion is needed is determined by comparing the actual ratio factor with the ratio factor threshold. The actual ratio factor is denoted ratio_actual and the ratio factor threshold is denoted ratio in this embodiment. The first preset time and the ratio factor threshold can be set as required.
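The check for pods that stay in the to-be-created state throughout the first preset time can be sketched as a simple filter. The record type `PodInfo`, its fields and the function `continuously_pending` are hypothetical, and the phase labels are assumed stand-ins for the four pod states listed in the text; the 120-second window is the embodiment's 2-minute example.

```python
from dataclasses import dataclass
from typing import List

FIRST_PRESET_SECONDS = 120  # the embodiment's example value of 2 minutes


@dataclass
class PodInfo:
    name: str
    phase: str            # assumed labels: "Pending", "Running", "Succeeded", "Failed"
    pending_since: float  # time in seconds at which the pod entered "Pending"


def continuously_pending(pods: List[PodInfo], now: float,
                         window: float = FIRST_PRESET_SECONDS) -> List[PodInfo]:
    """Return the pods that have been waiting to be created for at least `window` seconds."""
    return [p for p in pods
            if p.phase == "Pending" and now - p.pending_since >= window]


pods = [
    PodInfo("a", "Pending", 0.0),    # waiting for 130 s, past the window
    PodInfo("b", "Pending", 100.0),  # waiting for only 30 s
    PodInfo("c", "Running", 0.0),    # already created and running
]
stuck = continuously_pending(pods, now=130.0)
print([p.name for p in stuck])
```

Only the pods returned by this filter would contribute their resource occupation amounts to the actual ratio factor.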
Comparing the actual ratio factor with the ratio factor threshold to determine whether capacity expansion is required specifically includes the following. When the actual ratio factor is smaller than the ratio factor threshold, the capacity expansion operation is not triggered. Instead, the current resource situation of each Node is analyzed first, to judge whether a Node with surplus resources for deploying the pod to be created can be obtained by appropriately reallocating the pods on the current Nodes (that is, whether the existing resources can be rebalanced). If so, the pods are reallocated and the pod to be created is deployed to the Node that then has surplus resources, so that resource utilization is maximized.
The specific reallocation procedure is as follows. The resource situation of each Node is analyzed. If the analysis shows that a Node with surplus resources for deploying the pod to be created can be obtained by appropriately reallocating the pods on the Nodes, the Node management interface of the API Server in the Kubernetes cluster is called. This embodiment takes the first Node as the example, that is, the first Node is the Node on which reallocation can free enough resources for the pod to be created. The first Node is first set to a mode in which pods cannot be deployed to it. Some suitable pods on the first Node are then evicted and rescheduled to other Nodes that have spare resources (here, the second Node), so that the first Node has surplus resources. Finally, the Node management interface of the Kubernetes API Server is called again to set the first Node back to a mode in which pods can be deployed, so that the pod to be created can be deployed to the first Node.
Assume that the resource occupation amount of a pod to be created is 4U6Gi;
the remaining resources of the first Node are 3U6Gi;
the remaining resources of the second Node are 3U6Gi.
In this case the pod to be created, which needs 4U6Gi, can be deployed on neither the first Node nor the second Node and stays in the pod-to-be-created state. The pods on the first Node are therefore rescheduled: a pod running on the first Node whose resource occupation amount is smaller than the 3U6Gi remaining on the second Node (for example, 2U2Gi) is evicted to the second Node. The remaining resources of the first Node then become 5U8Gi, and the pod to be created, which needs 4U6Gi, can be deployed to the first Node, which now has surplus resources.
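The worked example above can be checked with a short sketch. It assumes that "4U6Gi" means 4 CPU units and 6 Gi of memory and represents each amount as a `(cpu, mem)` tuple; the function names, the node and pod names, and the search for a single pod to move are illustrative choices, not part of the source.

```python
def fits(need, free):
    """True if a (cpu, mem) request fits within a (cpu, mem) remainder."""
    return need[0] <= free[0] and need[1] <= free[1]


def plan_rebalance(pending, free, running):
    """Find one pod whose move frees enough room for the pending pod.

    free:    {node: (cpu, mem)} remaining resources per Node
    running: {node: {pod_name: (cpu, mem)}} pods currently on each Node
    Returns (pod_name, source_node, target_node), or None if no single
    move makes the pending pod deployable.
    """
    for src, pods in running.items():
        for name, size in pods.items():
            # Remainder of the source Node once this pod has been evicted.
            freed = (free[src][0] + size[0], free[src][1] + size[1])
            if not fits(pending, freed):
                continue
            for dst in free:
                if dst != src and fits(size, free[dst]):
                    return name, src, dst
    return None


# The numbers from the example: a 2U2Gi pod moves from the first Node to
# the second, leaving 5U8Gi free on the first Node for the 4U6Gi pod.
free = {"node1": (3, 6), "node2": (3, 6)}
running = {"node1": {"svc-a": (2, 2)}, "node2": {}}
move = plan_rebalance((4, 6), free, running)
```

When no single move works, the function returns `None`, which corresponds to the case in which the pod stays pending and its request keeps counting toward the actual ratio factor.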
If, after the resource situation of the current Nodes has been analyzed, no Node with surplus resources for deploying the pod to be created can be obtained by reallocating the pods on the current Nodes, the pod to be created remains in the pod-to-be-created state. When the sum of the resource occupation amounts of the pods that remain in the pod-to-be-created state (that is, the actual ratio factor) becomes greater than or equal to the ratio factor threshold, the capacity expansion and contraction service triggers the capacity expansion operation, that is, step S205 is executed.
When the actual ratio factor is greater than or equal to the ratio factor threshold, the capacity expansion operation is triggered: a new Node is first added to the Kubernetes cluster, and the pod to be created is then deployed to the new Node. The specific capacity expansion operation is as follows. A Kubernetes cluster can be deployed on different cloud platforms depending on the scenario, such as Array cloud, AWS and the like. When the capacity expansion and contraction service performs a capacity expansion operation, it therefore calls the provider interface of the corresponding cloud platform to create a new Node, deploys components such as Kubelet and Kube-proxy on the new Node, adds the new Node to the Kubernetes cluster, and finally schedules the pod to be created to the new Node to complete its deployment.
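The order of the expansion steps can be sketched as below. `FakeProvider` and `expand` are hypothetical names: the source only says that a provider interface of the cloud platform is called, so the stand-in simply hands out node names and the remaining steps are recorded as strings rather than performed.

```python
class FakeProvider:
    """Hypothetical stand-in for a cloud platform's provider interface."""

    def __init__(self):
        self.created = 0

    def create_node(self):
        self.created += 1
        return f"new-node-{self.created}"


def expand(provider, cluster_nodes, pending_pods):
    """Create a Node, prepare it, join it, then place the pending pods on it."""
    steps = []
    node = provider.create_node()
    steps.append(f"create {node}")
    for component in ("kubelet", "kube-proxy"):
        steps.append(f"install {component} on {node}")
    cluster_nodes.append(node)  # the new Node joins the cluster
    steps.append(f"join {node}")
    placement = {pod: node for pod in pending_pods}
    steps.append(f"schedule {len(placement)} pod(s) to {node}")
    return placement, steps


nodes = ["node-1"]
placement, steps = expand(FakeProvider(), nodes, ["pod-a", "pod-b"])
```

Keeping the provider behind one small object mirrors the point made in the text: the same flow can run on different cloud platforms by swapping the provider.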
Adding the new Node to the Kubernetes cluster specifically includes calling the provider interface of the cloud platform on which the Kubernetes cluster is located to add the new Node. The state information further includes: pod created and running, pod terminated normally, and pod failed abnormally. The operation indexes comprise one or more of the total node CPU, node CPU usage, total node memory, node memory usage, total node disk, remaining node disk, and node load rate.
To avoid the risk of the Kubernetes cluster going down, the capacity expansion and contraction service periodically calls the Node-exporter proxy service to obtain the Node load rate of each Node. In step S206, when the capacity expansion and contraction service finds a Node in the Kubernetes cluster whose Node load rate has continuously exceeded the Node load rate threshold for a second preset time, the capacity expansion operation is triggered. The Node load rate threshold and the second preset time can be set as required.
Specifically, the capacity expansion and contraction service first sets up a timed task, whose period can be set as required. In this embodiment, the service calls the Node-exporter proxy service on each Node once every 30 s to obtain the Node load rate of that Node. When the Node load rate of at least one Node is found to exceed 80% (the Node load rate threshold) for longer than 5 minutes (the second preset time), the Node load rate of that Node is judged to be too high. The capacity expansion and contraction service then performs a capacity expansion operation: it first calls the provider interface of the corresponding cloud platform to create a Node and deploys components such as Kubelet and Kube-proxy on the created Node, and then moves pods from the overloaded Node to the newly created Node. Offloading pods in this way reduces the pressure on the overloaded Node and avoids the risk of a Kubernetes cluster avalanche caused by downtime.
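The sustained-overload check can be illustrated with the embodiment's numbers (a sample every 30 s, an 80% threshold, a 5-minute window). The function name and the representation of readings as a list of percentages are assumptions of this sketch; how the Node-exporter proxy service is actually queried is not shown.

```python
SAMPLE_INTERVAL_SECONDS = 30   # one reading per Node every 30 s
LOAD_THRESHOLD_PERCENT = 80    # Node load rate threshold
SECOND_PRESET_SECONDS = 300    # second preset time (5 minutes)


def is_overloaded(samples,
                  threshold=LOAD_THRESHOLD_PERCENT,
                  preset=SECOND_PRESET_SECONDS,
                  interval=SAMPLE_INTERVAL_SECONDS):
    """Return True if the latest readings stayed above the threshold
    for longer than the preset time.

    samples: load-rate readings in percent, oldest first, taken every
             `interval` seconds.
    """
    run = 0
    for load in reversed(samples):
        if load <= threshold:
            break
        run += 1
    # `run` consecutive readings span (run - 1) sampling intervals.
    return run > 0 and (run - 1) * interval > preset
```

With these defaults, twelve consecutive readings above 80% span 330 s and trigger the check, while eleven span exactly 300 s and do not, because the embodiment requires the duration to exceed 5 minutes. A single reading at or below the threshold resets the run.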
Judging whether capacity reduction is needed by comparing the actual score factor with the score factor threshold specifically includes the following.
When the actual score factor obtained by analysis is greater than or equal to the score factor threshold, the capacity reduction operation is triggered. The pods on the Node to be reduced are first safely evicted using the pod eviction policy provided by the Kubernetes cluster, so that they are scheduled to other Nodes that have surplus resources, and the capacity reduction operation is then performed on that Node. The main purpose of safe eviction is to ensure that the services contained in the pods on the Node to be reduced are not interrupted and continue to run normally on other Nodes.
When the actual score factor is smaller than the score factor threshold, the capacity reduction operation is not triggered.
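A minimal sketch of the capacity reduction decision follows. It treats the score factor as an already computed number and uses a simple first-fit search to find a destination for each pod; both the function name and the rule of abandoning the reduction when any pod has nowhere to go are assumptions made here to reflect the "safe eviction" goal, not details taken from the source.

```python
def plan_safe_shrink(target, free, pods_on_target, actual_score, score_threshold):
    """Return {pod: destination_node} for a safe eviction, or None.

    free:           {node: (cpu, mem)} remaining resources per Node
    pods_on_target: {pod_name: (cpu, mem)} pods on the Node to be reduced
    None is returned when reduction is not triggered, or when some pod
    cannot be placed elsewhere (assumption: never interrupt a service).
    """
    if actual_score < score_threshold:
        return None  # below the threshold: no capacity reduction
    remaining = {n: list(r) for n, r in free.items() if n != target}
    plan = {}
    for pod, (cpu, mem) in pods_on_target.items():
        dest = next(
            (n for n, (c, m) in remaining.items() if cpu <= c and mem <= m),
            None,
        )
        if dest is None:
            return None
        remaining[dest][0] -= cpu  # reserve the space for this pod
        remaining[dest][1] -= mem
        plan[pod] = dest
    return plan
```

Only when a complete plan exists would the pods be evicted and the Node removed, so the services in those pods keep running on other Nodes throughout.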
The above is the complete process of the Kubernetes intelligent capacity expansion and contraction method provided in this embodiment; see fig. 5 for the specific flow. Throughout the process, the capacity expansion and contraction service relies entirely on the periodically obtained operation indexes of the Nodes and the parameter information of the pods monitored in real time, and completes the capacity expansion and contraction task autonomously after analysis and calculation. In fig. 5, the leftmost capacity reduction operation, the middle operation of expanding capacity and then offloading pods, and the rightmost operation of expanding capacity and then creating the pod are mutually exclusive tasks: only one of them can run to completion at a time. After that step the service returns to the monitoring state and waits to execute the next round of capacity expansion and contraction.
In this embodiment, a Node-exporter proxy service is deployed on the Nodes of the Kubernetes cluster to monitor the operation indexes of each Node, and an intelligent capacity expansion and contraction service deployed on the Master Node obtains the operation indexes of each Node and the parameter information of the pods. The Nodes are evaluated comprehensively from the operation indexes to generate an actual score factor, and whether a capacity reduction operation is needed is determined autonomously according to whether the actual score factor reaches the score factor threshold. When capacity reduction is needed, the pods on the Node are safely evicted, so that the pods from the Node being reduced run normally on other Nodes and the services they contain are not interrupted. An actual ratio factor is calculated from the parameter information of the pods, and whether a capacity expansion operation is needed is determined autonomously according to whether the actual ratio factor reaches the ratio factor threshold. The automatic expansion and contraction of Nodes is driven by the real-time intelligent analysis that the capacity expansion and contraction service performs across the whole cluster, which makes expansion and contraction of the Kubernetes cluster more accurate, safe and efficient.
Example 2
For the situation in embodiment 1, this embodiment further provides another implementable scenario for expanding the capacity of a given pod. Specifically, when the Kubernetes cluster expands a certain pod on the current Node, it first analyzes the resource situation of each pod, safely evicts the service in the pod to be expanded (that is, the container in fig. 2) to a pod with surplus resources, and then performs the capacity expansion operation on the pod to be expanded. When the service is safely evicted to the pod with surplus resources, the data of the evicted service is kept in the pod to be expanded, and an address pointer is used to establish a mapping between the pod to be expanded and the pod with surplus resources, so that data is shared between the two pods. When the evicted service needs to use data in the pod to be expanded, it reads or writes that data directly through the address pointer, which ensures that the evicted service keeps running normally without interruption.
When the pod to be expanded has completed the capacity expansion operation and is ready to recall the safely evicted service, the evicted service first generates a copy service, and the copy service is recalled to the pod that has completed the expansion. A temporary space is then opened on the expanded pod to store the data generated while the recalled copy service runs. At this point the evicted service and the copy service run simultaneously, in the pod with surplus resources and in the expanded pod respectively. Once the data in the temporary space is synchronized with the data of the evicted service in the pod with surplus resources, the evicted service in the pod with surplus resources is deleted.
Example 3
On the basis of the Kubernetes intelligent capacity expansion method provided in embodiment 1, the present invention further provides a Kubernetes intelligent capacity expansion device that can be used to implement the method. Fig. 6 is a schematic structural diagram of the device in this embodiment of the present invention. The Kubernetes intelligent capacity expansion device of this embodiment includes one or more processors 21 and a memory 22. In fig. 6, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The memory 22, as a non-volatile computer-readable storage medium for the Kubernetes intelligent capacity expansion method, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the Kubernetes intelligent capacity expansion method in embodiment 1. The processor 21 executes the various functional applications and data processing of the Kubernetes intelligent capacity expansion and contraction device by running the non-volatile software programs, instructions and modules stored in the memory 22, that is, it implements the Kubernetes intelligent capacity expansion and contraction method of embodiment 1.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the Kubernetes intelligent capacity expansion method of embodiment 1 above, for example, the steps illustrated in figs. 1-5 described above.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A Kubernetes intelligent expansion and contraction method is characterized by comprising the following steps:
acquiring parameter information of each pod and/or operation indexes of each Node in a Kubernetes cluster;
generating an actual ratio factor according to the parameter information of each pod, and judging whether capacity expansion is needed or not by comparing the actual ratio factor with a ratio factor threshold;
generating an actual score factor according to the operation index of each Node, and judging whether the capacity reduction is needed or not by comparing the actual score factor with a score factor threshold;
when the Node nodes need to be subjected to capacity reduction, the pod on the Node nodes needing capacity reduction is safely evicted and dispatched to other nodes, and finally the Node nodes needing capacity reduction are subjected to capacity reduction operation.
2. The Kubernetes intelligent expansion and contraction method according to claim 1, wherein the obtaining of parameter information of each pod and/or operation indexes of each Node in the Kubernetes cluster specifically includes:
deploying proxy service on each Node in a Kubernetes cluster, wherein the proxy service is used for monitoring the operation index of each Node;
and deploying a capacity expansion service on a Master Node in the Kubernetes cluster, wherein the capacity expansion service interacts with the API Server and the proxy service and is respectively used for acquiring parameter information of each pod and operation indexes of each Node in the Kubernetes cluster.
3. The Kubernetes intelligent expansion and contraction method according to claim 1, wherein the generating of the actual ratio factor according to the parameter information of each pod is specifically:
the parameter information comprises state information and resource occupation amount;
when the pod to be created appears in the Kubernetes cluster, and the state information of the pod to be created is continuously in the pod to be created state in the first preset time, calculating the sum of the occupied resource amount of each pod to be created, so as to generate an actual ratio factor.
4. The Kubernetes intelligent capacity expansion method according to claim 3, wherein the comparison between the actual ratio factor and the ratio factor threshold value is used to determine whether capacity expansion is required, and specifically:
when the actual ratio factor is smaller than the ratio factor threshold, not triggering the capacity expansion operation, firstly allocating the pod in the current Node, and then deploying the pod to be created to the Node with surplus resources;
and when the actual ratio factor is larger than or equal to the ratio factor threshold, triggering expansion operation, firstly adding a new Node in the Kubernetes cluster, and then deploying the pod to be created into the new Node.
5. The Kubernetes intelligent capacity expansion method according to claim 4, wherein a new Node is added to the Kubernetes cluster, specifically:
and calling a provider interface of a cloud platform where the Kubernetes cluster is located to add a new Node.
6. The Kubernetes intelligent capacity expansion method according to claim 3, wherein the state information further includes: pod created and running, pod terminated normally, and pod failed abnormally.
7. The Kubernetes intelligent capacity expansion method according to claim 1, wherein the operation index comprises one or more of total amount of node CPUs, usage amount of node CPUs, total amount of node memories, usage amount of node memories, total amount of node disks, remaining amount of node disks and load rate of nodes.
8. The Kubernetes intelligent capacity expansion method according to claim 7, wherein,
when a Node whose Node load rate continuously exceeds the Node load rate threshold within a second preset time appears in the Kubernetes cluster, the capacity expansion operation is triggered.
9. The Kubernetes intelligent capacity expansion method according to any one of claims 1-8, wherein the comparison between the actual score factor and the score factor threshold value is used to determine whether capacity reduction is required, and specifically:
when the actual score factor is larger than or equal to the score factor threshold value, the capacity reduction operation is triggered;
and when the actual score factor is smaller than the score factor threshold value, not triggering the capacity reduction operation.
10. A Kubernetes intelligent capacity expansion device, characterized by comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the Kubernetes intelligent capacity expansion method according to any one of claims 1-9.
CN202110305822.0A 2021-03-23 2021-03-23 Kubernetes intelligent capacity expansion method and device Active CN113051075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110305822.0A CN113051075B (en) 2021-03-23 2021-03-23 Kubernetes intelligent capacity expansion method and device

Publications (2)

Publication Number Publication Date
CN113051075A (en) 2021-06-29
CN113051075B (en) 2022-09-09

Family

ID=76514338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110305822.0A Active CN113051075B (en) 2021-03-23 2021-03-23 Kubernetes intelligent capacity expansion method and device

Country Status (1)

Country Link
CN (1) CN113051075B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant