CN114443302A - Container cluster capacity expansion method, system, terminal and storage medium

Info

Publication number: CN114443302A
Application number: CN202210098223.0A
Authority: CN
Inventors: 芮法玲
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2022-01-27
Filing date: 2022-01-27
Publication date: 2022-05-06

Abstract

The invention relates to the technical field of container clusters, and in particular to a container cluster capacity expansion method, system, terminal and storage medium. The method comprises: creating, for each category of working container nodes, placeholder container nodes of the same specification, wherein the placeholder container nodes are in a dormant state and the categories are the result of a two-dimensional classification by specification type and application type; obtaining the category of an abnormal working container node that cannot be scheduled, locating working container nodes of the same category, and obtaining the specification parameters of those nodes; and searching all placeholder container nodes for a target placeholder container node whose specification is not lower than the specification parameters, and releasing the resources held by the target placeholder container node to the abnormal working container node. The invention can relieve application pressure and keep services running normally when business pressure increases.

Description

Container cluster capacity expansion method, system, terminal and storage medium

Technical Field

The invention relates to the technical field of container clusters, in particular to a container cluster capacity expansion method, a system, a terminal and a storage medium.

Background

Elastic expansion of a cluster is usually driven by monitoring the cluster's resource usage: when the cluster load is high, elastic expansion is started to enlarge the cluster and relieve application pressure. This approach requires a separately deployed monitoring component to obtain the cluster's resource usage. If the monitoring component is deployed inside the cluster, it puts pressure on the cluster; if it is deployed outside the cluster, a series of restrictions such as access control come into play. Moreover, a low cluster resource utilization rate does not mean that applications request few resources, so starting elastic expansion on the basis of cluster resource utilization cannot always expand capacity according to actual application usage. Using the fact that a pod cannot be scheduled as the trigger is a simpler, more intuitive and more effective way to trigger capacity expansion. However, if the cluster is expanded only once a pod cannot be scheduled, the pod cannot be scheduled and started until the expansion is complete, and for important services this delay can have a very bad effect.

Disclosure of Invention

To address the problems in the prior art that a monitoring component occupies resources, that pods start late when capacity is expanded only after they cannot be scheduled, and that service continuity is therefore poor, the invention provides a container cluster capacity expansion method, system, terminal and storage medium.

In a first aspect, the present invention provides a container cluster capacity expansion method, including:

creating, for each category of working container nodes, placeholder container nodes of the same specification, wherein the placeholder container nodes are in a dormant state and the categories are the result of a two-dimensional classification by specification type and application type;

obtaining the category of an abnormal working container node that cannot be scheduled, locating working container nodes of the same category according to the category of the abnormal working container node, and obtaining the specification parameters of those working container nodes;

and searching all placeholder container nodes for a target placeholder container node whose specification is not lower than the specification parameters, and releasing the resources held by the target placeholder container node to the abnormal working container node.

Further, before creating placeholder container nodes with equivalent specifications for various kinds of work container nodes, the method further includes:

classifying the working container nodes according to specification sizes, and marking the working container nodes by specification types, wherein the specifications refer to hardware resources occupied by the working container nodes;

and classifying the work container nodes according to application division and marking the work container nodes by application types.

Further, obtaining the belonged type of the abnormal working container node which cannot be scheduled, positioning the same type of working container node according to the belonged type of the abnormal working container node, and obtaining the specification parameters of the same type of working container node, includes:

monitoring the state of the working container nodes, and taking the working container nodes with the state of being incapable of being scheduled as abnormal working container nodes;

acquiring the marking content of the abnormal working container node, and searching the similar working container node with the marking content consistent with the abnormal working container node, wherein the marking content comprises specification types and application types;

and calculating hardware resource parameters occupied by the nodes of the similar working containers, and outputting the hardware resource parameters as specification parameters.

Further, searching target placeholder container nodes with specifications not lower than the specification parameters from all placeholder container nodes, and releasing resources occupied by the target placeholder container nodes to the abnormal working container nodes, including:

sorting all abnormal working container nodes in the scheduling queue in ascending order of requested resources, and selecting target abnormal working container nodes from them in that order;

screening out, from all placeholder container nodes, a plurality of candidate placeholder container nodes whose specifications are not lower than the specification parameters corresponding to the target abnormal working container node;

sorting the candidate placeholder container nodes in ascending order of specification, and selecting the first one as the placeholder container node to be matched;

calculating the actual hardware resources of the placeholder container node to be matched, and judging whether the actual hardware resources are not lower than the requested resource size of the target abnormal working container node:

if so, migrating the target abnormal working container node onto the hardware resources of the placeholder container node to be matched, wherein the placeholder container node to be matched is evicted by the higher-priority target abnormal working container node;

and if not, selecting the next candidate placeholder container node in order as the placeholder container node to be matched.

In a second aspect, the present invention provides a container cluster capacity expansion system, including:

a placeholder creating unit, configured to create, for each category of working container nodes, placeholder container nodes of the same specification, wherein the placeholder container nodes are in a dormant state and the categories are the result of a two-dimensional classification by specification type and application type;

an abnormality acquisition unit, configured to obtain the category of an abnormal working container node that cannot be scheduled, locate working container nodes of the same category according to the category of the abnormal working container node, and obtain the specification parameters of those working container nodes;

and a node capacity expansion unit, configured to search all placeholder container nodes for a target placeholder container node whose specification is not lower than the specification parameters, and release the resources held by the target placeholder container node to the abnormal working container node.

Further, the system further comprises:

the first classification module is used for classifying the working container nodes according to the specification size and marking the working container nodes by specification types, wherein the specification refers to hardware resources occupied by the working container nodes;

and the second classification module is used for classifying the working container nodes according to application division and marking the working container nodes by application types.

Further, the abnormality acquisition unit includes:

the abnormal monitoring module is used for monitoring the state of the working container node and taking the working container node with the state of being incapable of being scheduled as an abnormal working container node;

the mark searching module is used for acquiring mark contents of the abnormal working container nodes and searching similar working container nodes with the mark contents consistent with the abnormal working container nodes, wherein the mark contents comprise specification types and application types;

and the specification acquisition module is used for calculating hardware resource parameters occupied by the nodes of the similar working containers and outputting the hardware resource parameters as specification parameters.

Further, the node capacity expansion unit includes:

the scheduling and sequencing module is used for sequencing all the abnormal working container nodes of the scheduling queue from small to large according to the request resources and selecting target abnormal working container nodes from the abnormal working container nodes according to the sequence;

the to-be-selected screening module is used for screening a plurality of to-be-selected occupied container nodes with the specification not lower than the corresponding specification parameters of the target abnormal working container node from all occupied container nodes;

the occupation ordering module is used for ordering the occupation container nodes to be selected from small to large according to the specification, and selecting the most front occupation container node as the occupation container node to be matched;

the resource judgment module is used for calculating the actual hardware resources of the occupied container node to be matched and judging whether the actual hardware resources are not lower than the request resource size of the target abnormal working container node or not:

the node migration module is used for migrating the target abnormal working container node to the hardware resource of the occupation container node to be matched if the judgment result of the resource judgment module is yes, and the occupation container node to be matched is evicted by the target abnormal working container node with high priority;

and the target reselection module is used for sequentially selecting the next to-be-selected occupation container node as the occupation container node to be matched if the judgment result of the resource judgment module is negative.

In a third aspect, a terminal is provided, including:

a processor, a memory, wherein,

the memory is used for storing a computer program which,

the processor is used for calling and running the computer program from the memory so that the terminal executes the method described above.

In a fourth aspect, a computer storage medium is provided having stored therein instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.

The beneficial effect of the container cluster capacity expansion method, system, terminal and storage medium provided by the invention is that they can respond quickly when infrastructure resources in a k8s cluster become insufficient, whether because a rapid increase in business volume raises the number of application copies or because newly added applications raise resource demand, thereby relieving application pressure and keeping services running normally as business pressure increases.

In addition, the invention has reliable design principle, simple structure and very wide application prospect.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.

FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.

The following explains key terms appearing in the present invention.

Kubernetes (k8s): a container cluster management system open-sourced by Google. It provides containerized applications with a complete set of functions such as deployment, operation, resource scheduling, service discovery and dynamic scaling, and makes large-scale container cluster management more convenient.

Pod: the smallest unit that Kubernetes schedules. A Pod may contain one or more containers and can therefore be regarded as the logical host of those containers. The container nodes referred to in the present invention are Pods.

FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention. The execution subject in fig. 1 may be a container cluster capacity expansion system.

As shown in fig. 1, the method includes:

step 110, creating occupation container nodes with the same specification for various types of working container nodes respectively, wherein the occupation container nodes are in a dormant state, and the types are two-dimensional classification results including specification types and application types;

step 120, obtaining the belonged type of the abnormal working container node which cannot be dispatched, positioning the same type of working container node according to the belonged type of the abnormal working container node, and obtaining the specification parameters of the same type of working container node;

and step 130, searching target placeholder container nodes with specifications not lower than the specification parameters from all placeholder container nodes, and releasing the resources occupied by the target placeholder container nodes to the abnormal working container nodes.

To facilitate understanding of the present invention, the container cluster capacity expansion method is further described below by combining its principle with the process of expanding a container cluster in an embodiment.

Specifically, the container cluster capacity expansion method includes:

S1, creating, for each category of working container nodes, placeholder container nodes of the same specification, wherein the placeholder container nodes are in a dormant state and the categories are the result of a two-dimensional classification by specification type and application type.

The working container nodes in the cluster are divided into different node groups along two dimensions: resource specification and application division. They are first divided by specification and then by application. The application division can simply follow the importance of the application (normal, important, critical) or the application itself (app1, app2, app3). When the second dimension is divided, nodes belonging to different node groups are marked with different labels.

Example node groups: [(s1.large: critical), (s1.large: normal), (s1.medium: critical), (s2.medium: normal)].

Applications in the cluster are classified, and the corresponding labels, such as critical and normal, are added to the applications.
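The two-dimensional grouping described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `WorkNode` class, the `group_nodes` function and the node names are hypothetical, and only the label values (s1.large, critical, and so on) come from the example node groups.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkNode:
    """Hypothetical record for a working container node."""
    name: str
    spec_type: str  # first dimension, e.g. "s1.large"
    app_type: str   # second dimension, e.g. "critical"


def group_nodes(nodes):
    """Group node names by the two-dimensional key (spec_type, app_type)."""
    groups = defaultdict(list)
    for node in nodes:
        groups[(node.spec_type, node.app_type)].append(node.name)
    return dict(groups)


# Node names are made up for the example.
nodes = [
    WorkNode("node-a", "s1.large", "critical"),
    WorkNode("node-b", "s1.large", "normal"),
    WorkNode("node-c", "s1.medium", "critical"),
    WorkNode("node-d", "s2.medium", "normal"),
    WorkNode("node-e", "s1.large", "critical"),
]
node_groups = group_nodes(nodes)
```

Each key of `node_groups` corresponds to one node group, so the pair (specification type, application type) can serve directly as the label attached to the nodes of that group.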

To respond to capacity expansion requirements in time, placeholder container nodes are configured. For example, App1 needs resources for 2 redundant copies, App2 needs resources for 1 redundant copy, and App3 needs no redundancy. A placeholder container node is a pod that only occupies resources without using them, so the placeholder for App1 consists of 2 permanently dormant pods whose requested resources are the same as those of App1, and the placeholder pods that hold resources for different applications are marked with different labels. Because a placeholder is essentially a pod, it triggers capacity expansion when it cannot be scheduled, which achieves the over-provisioning of nodes for the cluster; because the pod is permanently dormant, it does not actually use the node's resources. Priority and preemption are used so that a newly created working pod evicts a placeholder pod: PodPriorityClass is used to give the placeholder pods a lower priority than the working pods.
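One possible way to express such placeholders is sketched below as Kubernetes manifests built in Python. Several things here are assumptions rather than details from the description: the resource kind `PriorityClass` (taken to be what the text calls PodPriorityClass), the priority value -1, the pause image and its tag, the object names, and the CPU and memory requests.

```python
def placeholder_manifests(app, replicas, cpu, memory):
    """Return (PriorityClass, Deployment) manifests for one application's placeholders.

    All names and values are illustrative assumptions.
    """
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "placeholder-priority"},
        # Lower than the default pod priority of 0, so working pods can preempt.
        "value": -1,
        "globalDefault": False,
        "description": "Low priority for dormant placeholder pods.",
    }
    labels = {"role": "placeholder", "app": app}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"placeholder-{app}"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": priority_class["metadata"]["name"],
                    "containers": [{
                        "name": "pause",
                        # The pause container sleeps forever and uses almost nothing.
                        "image": "registry.k8s.io/pause:3.9",
                        # Requests mirror what one copy of the application would request.
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }],
                },
            },
        },
    }
    return priority_class, deployment


# App1 needs resources for 2 redundant copies; the request sizes are made up.
pc, dep = placeholder_manifests("app1", replicas=2, cpu="500m", memory="512Mi")
```

The dictionaries can be serialized to YAML or JSON and applied to a cluster. Using a Deployment means an evicted placeholder pod is recreated, and if it then cannot be scheduled it becomes the trigger for adding a node.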

S2, obtaining the belonged type of the abnormal working container node which can not be dispatched, positioning the same type of working container node according to the belonged type of the abnormal working container node, and obtaining the specification parameters of the same type of working container node.

Monitoring the state of the working container node, and taking the working container node with the state of being incapable of being dispatched as an abnormal working container node; acquiring the marking content of the abnormal working container node, and searching the similar working container node with the marking content consistent with the abnormal working container node, wherein the marking content comprises specification types and application types; and calculating hardware resource parameters occupied by the nodes of the similar working containers, and outputting the hardware resource parameters as specification parameters.

For example, if the abnormal working container node carries the labels (s1.large: critical) and App3: normal, the working container nodes carrying the same labels are searched for; the working container nodes found belong to the same category as the abnormal working container node. The hardware resource parameters occupied by these working container nodes are then obtained as the specification parameters.
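Step S2 can be sketched as follows. The `Pod` class and both functions are hypothetical. The description does not say how the resources of several same-category nodes are combined into one set of specification parameters, so taking the per-resource maximum is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical record for a working container node (pod)."""
    name: str
    labels: tuple          # (spec_type, app_type)
    cpu_millis: int        # CPU occupied, in millicores
    memory_mib: int        # memory occupied, in MiB
    schedulable: bool = True


def find_abnormal(pods):
    """Working container nodes whose state is 'cannot be scheduled'."""
    return [p for p in pods if not p.schedulable]


def spec_parameters(abnormal, pods):
    """Specification parameters taken from schedulable pods with the same labels.

    Assumption: the per-resource maximum over the same-category pods is used.
    Returns None when no same-category pod exists.
    """
    peers = [p for p in pods if p.schedulable and p.labels == abnormal.labels]
    if not peers:
        return None
    return (max(p.cpu_millis for p in peers), max(p.memory_mib for p in peers))


pods = [
    Pod("app1-0", ("s1.large", "critical"), 900, 1024),
    Pod("app1-1", ("s1.large", "critical"), 1000, 2048),
    Pod("app2-0", ("s1.medium", "normal"), 400, 512),
    Pod("app1-2", ("s1.large", "critical"), 1000, 2048, schedulable=False),
]
abnormal = find_abnormal(pods)
params = spec_parameters(abnormal[0], pods)
```

Here `params` holds the CPU and memory figures against which placeholder specifications are compared in the next step.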

S3, searching target occupancy container nodes with specifications not lower than the specification parameters from all the occupancy container nodes, and releasing the resources occupied by the target occupancy container nodes to the abnormal working container nodes.

All abnormal working container nodes in the scheduling queue are sorted in ascending order of requested resources, and target abnormal working container nodes are selected from them in that order. From all placeholder container nodes, a plurality of candidate placeholder container nodes are screened out whose specifications are not lower than the specification parameters corresponding to the target abnormal working container node. The candidate placeholder container nodes are sorted in ascending order of specification, and the first one is selected as the placeholder container node to be matched. The actual hardware resources of the placeholder container node to be matched are calculated, and it is judged whether they are not lower than the requested resource size of the target abnormal working container node. If so, the target abnormal working container node is migrated onto the hardware resources of the placeholder container node to be matched, and the placeholder container node to be matched is evicted by the higher-priority target abnormal working container node. If not, the next candidate placeholder container node is selected in order as the placeholder container node to be matched.
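The matching loop of step S3 can be sketched as follows. For brevity the sketch tracks only CPU in millicores; the class names, field names and sample values are hypothetical, and eviction is modeled simply by removing the matched placeholder from the free list.

```python
from dataclasses import dataclass


@dataclass
class Placeholder:
    """Hypothetical placeholder container node."""
    name: str
    spec_cpu: int    # nominal specification
    actual_cpu: int  # hardware resources actually held


@dataclass
class AbnormalPod:
    """Hypothetical unschedulable working container node."""
    name: str
    request_cpu: int     # requested resources
    spec_param_cpu: int  # specification parameter obtained in step S2


def match(abnormal_pods, placeholders):
    """Map each abnormal pod to the smallest placeholder that can hold it."""
    free = list(placeholders)
    assignments = {}
    # Abnormal pods are handled in ascending order of requested resources.
    for pod in sorted(abnormal_pods, key=lambda p: p.request_cpu):
        # Candidates: specification not lower than the specification parameter,
        # tried from the smallest specification upwards.
        candidates = sorted(
            (ph for ph in free if ph.spec_cpu >= pod.spec_param_cpu),
            key=lambda ph: ph.spec_cpu,
        )
        for ph in candidates:
            if ph.actual_cpu >= pod.request_cpu:
                assignments[pod.name] = ph.name
                free.remove(ph)  # the placeholder is evicted; its resources are taken
                break
    return assignments


placeholders = [
    Placeholder("ph-small", spec_cpu=1000, actual_cpu=800),
    Placeholder("ph-mid", spec_cpu=2000, actual_cpu=2000),
    Placeholder("ph-large", spec_cpu=4000, actual_cpu=4000),
]
abnormal_pods = [
    AbnormalPod("pod-b", request_cpu=1500, spec_param_cpu=2000),
    AbnormalPod("pod-a", request_cpu=900, spec_param_cpu=1000),
]
assignments = match(abnormal_pods, placeholders)
```

In this example `ph-small` passes the specification filter for `pod-a` but fails the check on actual resources, so the loop moves on to `ph-mid`, which is the "select the next candidate" branch of the step.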

For example, all unschedulable pods are sorted from large to small by the CPU size in their requested resources, and an attempt is made to find a suitable placeholder container node for each unschedulable pod in order to expand capacity. First, the corresponding placeholder container nodes are looked up by the application label; there may be several. Starting from the placeholder container node template with the smallest specification, it is assumed that a node of that template is added, and it is checked whether the new node can accommodate the current pod. If no placeholder container node template is suitable, the pod is abandoned. If one is suitable, the specification of that template is added to the snapshot of the cluster state as new node data, and it is calculated whether the remaining pods can also be placed. Suitable placeholder container nodes are thus sought for all unschedulable pods in a loop. If several placeholder container nodes meet the requirement, the one that requires the fewest expansion nodes is selected.
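The simulation described in this example can be sketched as a simple first-fit pass over a snapshot. This is a simplification under stated assumptions: only CPU is considered, the function name and data layout are hypothetical, and the final comparison between alternative plans by number of expansion nodes is not implemented.

```python
def plan_expansion(pending_cpu, templates_cpu):
    """Simulate which template nodes would be added for unschedulable pods.

    pending_cpu:   dict mapping pod name to requested CPU (millicores)
    templates_cpu: CPU capacities of the placeholder templates matching the label
    Returns (new_nodes, abandoned).
    """
    templates = sorted(templates_cpu)  # smallest specification first
    new_nodes = []                     # nodes added to the cluster-state snapshot
    abandoned = []
    # Pods are visited from the largest CPU request to the smallest.
    for name, cpu in sorted(pending_cpu.items(), key=lambda kv: kv[1], reverse=True):
        # First check whether a node already added to the snapshot has room.
        for node in new_nodes:
            if node["free"] >= cpu:
                node["free"] -= cpu
                node["pods"].append(name)
                break
        else:
            # Otherwise try the templates, starting from the smallest.
            fitting = next((t for t in templates if t >= cpu), None)
            if fitting is None:
                abandoned.append(name)  # no template can hold this pod
            else:
                new_nodes.append(
                    {"capacity": fitting, "free": fitting - cpu, "pods": [name]}
                )
    return new_nodes, abandoned


pending = {"p1": 3000, "p2": 1500, "p3": 500, "p4": 9000}
new_nodes, abandoned = plan_expansion(pending, templates_cpu=[2000, 4000])
```

In this run `p4` is abandoned because no template is large enough, and `p3` is placed on the node already simulated for `p1` instead of causing a further node to be added.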

A container cluster is expanded by adding nodes, so that it can accommodate more applications or provide more copies of an application to cope with heavy business pressure. The time needed to add nodes depends on the cloud provider; it is usually not less than 3 minutes and grows as the cluster grows. For ordinary applications, when the cluster cannot accommodate enough copies, waiting for the cluster to add nodes probably has little impact. For important applications that need a timely response, however, the cluster needs to be expanded in advance, because the expansion time is often intolerable once host resources have run out.

As shown in fig. 2, the system 200 includes:

a placeholder creating unit 210, configured to create, for each category of working container nodes, placeholder container nodes of the same specification, wherein the placeholder container nodes are in a dormant state and the categories are the result of a two-dimensional classification by specification type and application type;

an abnormality acquisition unit 220, configured to obtain the category of an abnormal working container node that cannot be scheduled, locate working container nodes of the same category according to the category of the abnormal working container node, and obtain the specification parameters of those working container nodes;

and a node capacity expansion unit 230, configured to search all placeholder container nodes for a target placeholder container node whose specification is not lower than the specification parameters, and release the resources held by the target placeholder container node to the abnormal working container node.

Optionally, as an embodiment of the present invention, the system further includes:

Optionally, as an embodiment of the present invention, the exception obtaining unit includes:

the abnormal monitoring module is used for monitoring the state of the working container node and taking the working container node with the state of being incapable of being dispatched as the abnormal working container node;

and the specification acquisition module is used for calculating the hardware resource parameters occupied by the nodes of the similar working containers and outputting the hardware resource parameters as specification parameters.

Optionally, as an embodiment of the present invention, the node capacity expansion unit includes:

Fig. 3 is a schematic structural diagram of a terminal 300 according to an embodiment of the present invention, where the terminal 300 may be used to execute the container cluster capacity expansion method according to the embodiment of the present invention.

The terminal 300 may include a processor 310, a memory 320 and a communication unit 330. These components communicate via one or more buses. Those skilled in the art will appreciate that the structure shown in the figure does not limit the invention: it may be a bus structure or a star structure, and it may include more or fewer components than shown, combine some components, or arrange the components differently.

The memory 320 may be used for storing instructions executed by the processor 310, and may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The executable instructions in the memory 320, when executed by the processor 310, enable the terminal 300 to perform some or all of the steps in the method embodiments described above.

The processor 310 is the control center of the terminal. It connects the parts of the whole terminal through various interfaces and lines, and performs the functions of the terminal and/or processes data by running or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory. The processor may consist of integrated circuits (ICs), for example a single packaged IC, or several packaged ICs with the same or different functions connected together. For example, the processor 310 may include only a central processing unit (CPU). In the embodiment of the present invention, the CPU may have a single computing core or multiple computing cores.

The communication unit 330 is configured to establish a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to them.

The present invention also provides a computer storage medium. The computer storage medium may store a program which, when executed, may perform some or all of the steps of the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).

Therefore, the present invention can respond quickly when infrastructure resources in a k8s cluster become insufficient, whether because an increase in business volume raises the number of application copies or because newly added applications raise resource demand, thereby relieving application pressure and keeping services running normally even when business pressure increases.

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented using software plus any required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product, where the computer software product is stored in a storage medium, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like, and the storage medium can store program codes, and includes instructions for enabling a computer terminal (which may be a personal computer, a server, or a second terminal, a network terminal, and the like) to perform all or part of the steps of the method in the embodiments of the present invention.

The same and similar parts in the various embodiments in this specification may be referred to each other. Especially, for the terminal embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the description in the method embodiment.

In the several embodiments provided in the present invention, it should be understood that the disclosed system and method may be implemented in other manners. For example, the above-described system embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and these modifications or substitutions fall within the scope of the present invention; any person skilled in the art can readily conceive of such changes or substitutions within the technical scope disclosed by the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A container cluster capacity expansion method is characterized by comprising the following steps:

2. The method of claim 1, wherein prior to creating placeholder container nodes with equivalent specifications for a plurality of categories of work container nodes, respectively, the method further comprises:

3. The method according to claim 2, wherein obtaining the category to which the abnormal working container node which cannot be scheduled belongs, and positioning the same-class working container node according to the category to which the abnormal working container node belongs, and obtaining the specification parameters of the same-class working container node comprises:

monitoring the state of the working container node, and taking the working container node with the state of being incapable of being dispatched as an abnormal working container node;

4. The method of claim 1, wherein searching for a target placeholder container node with a specification not lower than the specification parameter from all placeholder container nodes, and releasing the resources occupied by the target placeholder container node to the abnormal working container node comprises:

if yes, the target abnormal working container node is migrated to the hardware resource of the occupation container node to be matched, and the occupation container node to be matched is evicted by the target abnormal working container node with high priority;

and if not, sequentially selecting the next occupation container node to be selected as the occupation container node to be matched.

5. A container cluster capacity expansion system, comprising:

a placeholder creating unit, configured to create, for each category of working container nodes, placeholder container nodes of the same specification, wherein the placeholder container nodes are in a dormant state and the categories are the result of a two-dimensional classification by specification type and application type;

6. The system of claim 5, further comprising:

7. The system of claim 6, wherein the anomaly obtaining unit comprises:

8. The system of claim 5, wherein the node capacity expansion unit comprises:

9. A terminal, comprising:

a processor;

a memory for storing instructions for execution by the processor;

wherein the processor is configured to perform the method of any one of claims 1-4.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.