CN113297031A - Container group protection method and device in container cluster - Google Patents

Info

Publication number
CN113297031A
Authority
CN
China
Prior art keywords
cluster
container
state
container group
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110499084.8A
Other languages
Chinese (zh)
Inventor
赵明山
张振
王思宇
黄涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Innovation Co
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Singapore Holdings Pte Ltd filed Critical Alibaba Singapore Holdings Pte Ltd
Priority to CN202110499084.8A priority Critical patent/CN113297031A/en
Publication of CN113297031A publication Critical patent/CN113297031A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Abstract

The present specification provides a container group protection method and apparatus in a container cluster. The container group protection method in the container cluster includes: monitoring the operating state of each container group in the container cluster; determining a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster; receiving an interrupt instruction for a first target container group; and rejecting the interrupt instruction if the cluster state value is inconsistent with the expected state.

Description

Container group protection method and device in container cluster
Technical Field
The present specification relates to the field of computer technologies, and in particular to a container group protection method in a container cluster. The present specification also relates to a container group protection apparatus in a container cluster, a computing device, and a computer-readable storage medium.
Background
With the development of computer technology, the cloud native concept has become popular, and two major systems, Docker (a container technology on Linux) and Kubernetes (an orchestration and management system for container groups, abbreviated as K8s), have been widely adopted on the cloud computing platforms of many companies.
A container group (Pod) is the smallest unit managed in a Kubernetes cluster and is a combination of one or more containers. In a Kubernetes cluster, to guarantee high availability of Pods, a workload is often used to define and manage the number of replicas of an application's Pods. However, a workload can only help the number of replicas reach the desired value as quickly as possible; it cannot guarantee that the number of available replicas stays at the desired value at all times, so project interruption or SLA (service level agreement) degradation can still occur in a voluntary disruption scenario (active interruption scenario). A convenient and practical protection method is therefore needed to solve the problem of project interruption or SLA degradation caused by a large number of Pods becoming unavailable in a voluntary disruption scenario.
Disclosure of Invention
In view of this, the present specification provides a container group protection method in a container cluster. The present specification also relates to a container group protection apparatus in a container cluster, a computing device, and a computer-readable storage medium, to address technical deficiencies in the prior art.
According to a first aspect of embodiments herein, there is provided a container group guarding method in a container cluster, comprising:
monitoring an operational status of each container group in the container cluster;
determining a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster;
receiving an interrupt instruction for a first target container group;
rejecting the interrupt instruction if the cluster state value is inconsistent with the expected state.
According to a second aspect of embodiments herein, there is provided a container group protection apparatus in a container cluster, comprising:
a monitoring module configured to monitor an operational status of each container group in the container cluster;
a determination module configured to determine a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster;
an interrupt instruction receiving module configured to receive an interrupt instruction for a first target container group;
a rejection module configured to reject the interrupt instruction if the cluster state value is inconsistent with the expected state.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is to store computer instructions, and the processor is to execute the computer instructions to:
monitoring an operational status of each container group in the container cluster;
determining a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster;
receiving an interrupt instruction for a first target container group;
rejecting the interrupt instruction if the cluster state value is inconsistent with the expected state.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of any of the container group protection methods in a container cluster.
In the container group protection method in the container cluster provided by the present specification, the operating state of each container group in the container cluster is monitored; a cluster state value for the container cluster is determined based on the operating state of each container group and the expected state of the container cluster; an interrupt instruction for a first target container group is received; and the interrupt instruction is rejected if the cluster state value is inconsistent with the expected state. The embodiments of the present specification convert the various situations in an active interruption scenario into interrupt instructions for Pods, which gives the method high generality; it can effectively prevent the project interruption caused by a large number of Pods becoming unavailable in an active interruption scenario and maintain project stability.
Drawings
Fig. 1 is a flowchart of a container group protection method in a container cluster according to an embodiment of the present disclosure;
Fig. 2 is a schematic architecture diagram of a container group protection method in a container cluster according to a second embodiment of the present disclosure;
Fig. 3 is a process flow diagram of a container group protection method in a container cluster according to a second embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a container group protection apparatus in a container cluster according to an embodiment of the present disclosure;
Fig. 5 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at the time of" or "when" or "in response to a determination", depending on the context.
First, the terms to which one or more embodiments of the present specification relate are explained.
Kubernetes: abbreviated K8s, an open source container orchestration engine that supports automated deployment, large-scale scaling, and management of containerized applications.
Pod: the container group, an abstraction of the application in Kubernetes, is the smallest unit that can be scheduled, consists of a set of closely associated containers, and is scheduled onto the same node.
Workload: the controller for application management in Kubernetes, which provides powerful replica management and gray (canary) release capabilities.
Voluntary disruption (active interruption scenario): a scenario in which some action triggered by a user or a cluster administrator causes some of an application's pods to become unavailable, resulting in project interruption or SLA degradation.
Pod Disruption Budget (PDB): used to limit the maximum number of unavailable pod replicas, or the minimum number of available pod replicas that should be kept, when a voluntary disruption occurs, so as to ensure high availability of the application. The PDB object only defines the user's expected state for the application; what actually maintains this expected state is the disruption controller. When it observes add/update/delete events for a pod or PDB, it calculates and updates the PodDisruptionBudgetStatus of the PDB object. The most important fields of PodDisruptionBudgetStatus are DisruptedPods and PodDisruptionsAllowed.
DisruptedPods: records the pods that have passed the pod eviction process but have not yet been observed and processed by the disruption controller.
PodDisruptionsAllowed: indicates the number of pods that are currently allowed to be disrupted.
Disruption Controller: calculates the value of PodDisruptionsAllowed from the status of the pods and the number of DisruptedPods, and updates the status of the PDB object.
In the present specification, a method for protecting a container group in a container cluster is provided, and the present specification relates to a container group protecting apparatus in a container cluster, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
In a Kubernetes cluster, to guarantee high availability of applications, workloads (e.g., Deployment, StatefulSet, etc.) are often used to define and manage applications. A workload not only manages the number of pod replicas of the application, but also provides rich gray release capabilities such as MaxUnavailable and Partition. However, a workload can only help the number of pod replicas of an application reach the desired value as quickly as possible; it cannot guarantee that the number of available pod replicas stays at the desired value at all times. In a voluntary disruption scenario, project interruption or SLA degradation can still occur.
The Kubernetes EvictionREST interface is responsible for the logic that processes the pod eviction resource. When eviction of a pod is triggered in some scenario, EvictionREST determines whether the current eviction is allowed by checking the PodDisruptionsAllowed value of the PDB corresponding to the pod.
When PodDisruptionsAllowed <= 0, the current evict action is prohibited and the pod is not allowed to be evicted.
When PodDisruptionsAllowed > 0, the pod is allowed to be evicted: PodDisruptionsAllowed is decremented by one and the pod is added to DisruptedPods, eventually triggering a pod delete operation.
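The two cases above can be illustrated with a minimal Python sketch. The class PdbStatus and the function try_evict are hypothetical names introduced only for illustration, and the sketch assumes that an allowed eviction is recorded in DisruptedPods as described in the terminology above; it is not the actual EvictionREST code.

```python
from dataclasses import dataclass, field


@dataclass
class PdbStatus:
    # Hypothetical stand-in for the PDB status fields named in the text.
    pod_disruptions_allowed: int
    disrupted_pods: set = field(default_factory=set)


def try_evict(status: PdbStatus, pod_name: str) -> bool:
    """Return True if the eviction is allowed, updating the status in place."""
    if status.pod_disruptions_allowed <= 0:
        # No budget left: the evict action is prohibited.
        return False
    # Budget available: consume one unit and record the pod as disrupted.
    status.pod_disruptions_allowed -= 1
    status.disrupted_pods.add(pod_name)
    return True
```

With a budget of 1, the first call to try_evict succeeds and the second is refused, which is the behavior the PDB check is meant to enforce for evictions only.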
As the principle of this scheme shows, the Kubernetes PDB scheme can only protect against the scenario in which a pod is evicted. It cannot protect the application in scenarios such as cascading deletion of pods caused by deleting the application workload, or application upgrades, in which a large number of pods become unavailable.
Based on this, the present application defines a PodUnavailableBudget (hereinafter referred to as PUB) protection mechanism to provide a container group protection method in a container cluster.
Fig. 1 shows a flowchart of a container group protection method in a container cluster according to an embodiment of the present specification, which specifically includes the following steps:
step 102: monitoring an operational status of each container group in the container cluster.
In the present application, the container cluster is a Kubernetes cluster, and the container cluster directly manages container groups (Pods). A Pod is the smallest basic unit in the container cluster and is the container cluster's abstraction of an application. At least one application can run in a Pod, and in general one Pod runs one or more containers.
The container cluster manages a plurality of Pods, and one or more containers run in each Pod. A container is a lightweight virtualization technology that provides isolation similar to a virtual machine. Images are managed using a layered union file system, which greatly simplifies environment operation and maintenance. A container can be used as a cloud host or as a microservice.
When the container is used as a cloud host, the container can be used as a container cloud host of a business system, can be started, operated, shut down and restarted as a common host, and can be used for building a website, providing a cloud hard disk, a cloud database, deploying services and the like on the container cloud host, for example, the container cloud host can be used as a cloud host of business of e-commerce and ticket-selling websites.
The container can also be used as a micro-service, provides a whole set of platform-independent standardized technology, and deploys various services, such as audio and video processing, artificial intelligence model training and application, e-commerce service, upper-layer business service and the like.
Based on this, the container group protection method in the container cluster provided by the application can be applied to diversified application scenarios in practical application, such as e-commerce, telecommunication, audio and video processing, artificial intelligence, online ticket selling, data statistics, real-time monitoring and the like.
In another embodiment provided herein, monitoring the operational status of each container group in the container cluster comprises:
receiving a creation instruction for a third target container group;
creating the third target container group in the container cluster in response to the creation instruction and creating a corresponding target container group protection object for the third target container group;
and, if the target container group protection object is successfully created, acquiring a container group list and an expected state of the container cluster, and monitoring the running state of each container group in the container group list.
In practical applications, each time a new Pod is created, the container cluster receives a creation instruction for the new Pod and also creates a corresponding container group protection object (PUB Object) for it. After the PUB management unit detects the PUB Object creation event, it can use a selector (Selector) in the container cluster to obtain the list of Pods in the container cluster and the expected state defined by the PUB, and at the same time monitor the running state of each Pod in the container cluster.
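As a rough illustration of obtaining the Pod list through a selector, the following Python sketch filters Pods by label equality. The dictionary layout of a Pod and the function name select_pods are assumptions made for this example; the text does not specify how the selector is expressed.

```python
def select_pods(pods, match_labels):
    """Return the pods whose labels contain every key/value pair in match_labels.

    `pods` is a list of dicts with a "name" and a "labels" dict; this layout is
    hypothetical and only mimics a label selector for illustration.
    """
    selected = []
    for pod in pods:
        labels = pod.get("labels", {})
        if all(labels.get(key) == value for key, value in match_labels.items()):
            selected.append(pod)
    return selected


pods = [
    {"name": "web-0", "labels": {"app": "web"}},
    {"name": "web-1", "labels": {"app": "web", "tier": "front"}},
    {"name": "db-0", "labels": {"app": "db"}},
]
protected = select_pods(pods, {"app": "web"})
```

Here protected would hold the two Pods labeled app=web, which are the Pods whose running state the PUB management unit would then monitor.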
Step 104: a cluster state value for the container cluster is determined based on the operating state of each container group and the expected state of the container cluster.
The cluster state value refers to the UnavailableAllowed value in the PUB, that is, the number of Pods currently allowed to be interrupted in the container cluster. For example, if 10 more Pods are allowed to be interrupted in the container cluster, the cluster state value is 10.
The expected state of the container cluster is a set of constraints (the object specification, Spec) that the container cluster must satisfy during operation, such as the maximum unavailability rate of Pods in the container cluster, the minimum operation value, and the like.
Specifically, the expected state includes a maximum unavailability; the determining a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster comprises:
determining a cluster state value for the cluster of containers based on the operating state of each group of containers and the maximum unavailability.
The expected state of the container cluster includes a maximum unavailability rate (MaxUnavailable), that is, the maximum proportion of Pods allowed to be interrupted in the current container cluster, for example, MaxUnavailable = 20%. The UnavailableAllowed value of the container cluster can be determined from the running state of each container group and the maximum unavailability rate.
In practical applications, the operating state includes available or unavailable; the determining a cluster state value for the cluster of containers based on the operating state of each group of containers and the maximum unavailability includes:
counting the number of container groups in the container cluster, and determining a maximum unavailable value according to the number of the container groups and the maximum unavailable rate;
counting an unavailable container group value of which the running state is unavailable in the container cluster;
determining the cluster state value based on the maximum unavailable value and the unavailable container group value.
First, the total number of Pods registered in the container cluster is counted, and the maximum unavailable value is then determined from the total number of Pods and the maximum unavailability rate. For example, if there are 100 Pods in the container cluster in total and the maximum unavailability rate is 20%, the maximum unavailable value is 20. In practical applications, the expected state may also directly include a maximum unavailable value, e.g., MaxUnavailable = 20, meaning that at most 20 Pods are allowed to be interrupted in the container cluster.
The running state of a Pod can be monitored through the Pod's parameters or in other ways. For example, when the Status parameter of the Pod is Running and the Ready parameter of the Pod is true, the running state of the Pod is determined to be available; in all other cases, the running state of the Pod is determined to be unavailable. The number of container groups whose running state is unavailable in the current container cluster is counted to obtain the unavailable container group value. Once the maximum unavailable value and the unavailable container group value are determined, the cluster state value of the container cluster can be determined. For example, if the maximum unavailable value is 20 and 5 container groups in the container cluster are unavailable, the unavailable container group value is 5, and therefore UnavailableAllowed = 20 - 5 = 15.
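The calculation above can be written out as a short Python sketch. The function names, the tuple representation of a Pod as (status, ready), and the choice to round a percentage down are assumptions for illustration; the text only gives the 100-Pod, 20% example.

```python
def is_available(status: str, ready: bool) -> bool:
    # A Pod counts as available only when it is Running and Ready.
    return status == "Running" and ready


def max_unavailable_value(total_pods: int, max_unavailable) -> int:
    """Accept either a percentage string such as "20%" or an absolute count."""
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        rate = int(max_unavailable[:-1])
        # Rounding down is an assumption; the text does not say how to round.
        return total_pods * rate // 100
    return int(max_unavailable)


def unavailable_allowed(pods, max_unavailable) -> int:
    """pods is a list of (status, ready) tuples, a hypothetical representation."""
    unavailable = sum(1 for status, ready in pods if not is_available(status, ready))
    return max_unavailable_value(len(pods), max_unavailable) - unavailable


# 100 Pods in total, 5 of which are unavailable, as in the example above.
pods = [("Running", True)] * 95 + [("Pending", False)] * 3 + [("Running", False)] * 2
allowed = unavailable_allowed(pods, "20%")
```

For this input the maximum unavailable value is 20 and the unavailable container group value is 5, so allowed evaluates to 15, matching the worked example.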
Step 106: an interrupt instruction for a first target group of containers is received.
The first target container group refers to the Pod targeted by the interrupt instruction. In the present application, interrupt instructions include not only evicting a Pod, but also modifying a Pod and deleting a Pod. Specifically, receiving an interrupt instruction for a first target container group includes:
receiving a modification instruction for the metadata and/or configuration file of the first target container group;
receiving a deletion instruction for the first target container group; or
receiving an eviction instruction for the first target container group.
The metadata of the Pod is used to describe attribute information of the Pod, such as a name of the Pod, an IP address of the Pod, a namespace in which the Pod is located, a name of a node on which the Pod runs, limitations of a CPU and a memory that can be used by the Pod, and the like.
The Pod-description configuration file (Spec definition) is a file describing Pod version, including source file location, required files, build settings of the application, common element names, version and description information, and the like.
After the metadata and/or Spec definition of a Pod is modified, the Pod needs to be restarted for the modification to take effect, but restarting the Pod interrupts the application service inside the Pod. Therefore, when a modification instruction for Pod metadata and/or the configuration file is received, it can be considered that an interrupt instruction for the Pod has been received.
In practical applications, the interrupt instruction for a Pod further includes a delete instruction for the Pod, that is, assigning a value to the deletionTimestamp parameter of the Pod. Once the deletionTimestamp parameter of a Pod has been assigned, the Pod is considered to be deleted.
In practical applications, the interruption instruction for Pod further includes Pod eviction instructions (Pod evictions), and in a container cluster, the eviction mechanism of Pod is divided into active eviction and passive eviction. Under the condition that the node resources of the container cluster are in short supply, in order to ensure the stability of the nodes, the Kubelet in the container cluster triggers Pod eviction, and the Pod which is not used temporarily on the nodes is recycled to release the node resources, wherein the Kubelet eviction is also called passive eviction; when a certain node needs to be maintained or deleted, the Pod deployed in the node needs to be actively evicted manually so as not to influence the continuity of the project. In the container cluster, when an eviction instruction of a Pod is received, it may be considered that an interrupt instruction for the Pod is received.
In the present application, the various situations in a voluntary disruption scenario are converted into the problem of modifying Pod resources. As long as a modification, deletion, or eviction instruction for a Pod in the container cluster is detected, the situation can be regarded as a voluntary disruption scenario, so the method has high generality.
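A minimal Python sketch of this classification follows. It assumes requests are described by an operation name, an optional subresource, and old and new Pod dictionaries, which loosely mirrors how a Kubernetes admission request is shaped (an eviction arrives as a CREATE on the Pod's eviction subresource). The function name and dictionary layout are hypothetical.

```python
def is_interrupt_instruction(operation, subresource="", old_pod=None, new_pod=None):
    """Return True if the request modifies, deletes, or evicts a Pod."""
    # c) Eviction: a CREATE on the "eviction" subresource of the Pod.
    if operation == "CREATE" and subresource == "eviction":
        return True
    # b) Deletion of the Pod.
    if operation == "DELETE":
        return True
    # a) Modification of Pod metadata and/or spec (status-only changes are ignored).
    if operation == "UPDATE":
        old_pod = old_pod or {}
        new_pod = new_pod or {}
        return (
            old_pod.get("metadata") != new_pod.get("metadata")
            or old_pod.get("spec") != new_pod.get("spec")
        )
    return False


old = {"metadata": {"name": "p"}, "spec": {"image": "a:1"}, "status": {"phase": "Running"}}
spec_change = {"metadata": {"name": "p"}, "spec": {"image": "a:2"}, "status": {"phase": "Running"}}
status_change = {"metadata": {"name": "p"}, "spec": {"image": "a:1"}, "status": {"phase": "Failed"}}
```

In this sketch spec_change is treated as an interrupt instruction while status_change is not, since only metadata and spec modifications are listed as interrupt instructions in the text.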
In practical applications, after receiving an interrupt instruction for the first target container group, the method further includes:
judging whether the cluster state value meets the expected state;
if so, determining that the cluster state value is consistent with the expected state;
if not, determining that the cluster state value is inconsistent with the expected state.
In practical applications, after an interrupt instruction for a target Pod is received, the interrupt instruction is intercepted using the PUB Webhook. With the hook technique (also called a hook function), before the system executes the interrupt instruction, the hook program captures the instruction and the hook function obtains control first; the hook function can then process (change) the instruction or forcibly terminate its delivery.
After intercepting the interrupt instruction by the Webhook technology, it needs to be determined whether the cluster state value of the container cluster at this time satisfies the expected state of the container cluster. Specifically, the desired state includes a minimum operational value; judging whether the cluster state value meets the expected state, specifically including:
judging whether the cluster state value is greater than 0 or not, and judging whether the cluster state value is greater than the minimum operation value or not;
determining that the cluster state value is consistent with the expected state if the cluster state value is greater than 0 and the cluster state value is greater than the minimum operational value;
determining that the cluster state value is inconsistent with the desired state if the cluster state value is less than or equal to 0 or the cluster state value is less than or equal to the minimum operational value.
Firstly, judging whether the cluster state value of the container cluster is greater than 0, if the cluster state value is greater than 0, indicating that the Pod can be interrupted again in the container cluster, if the cluster state value is less than or equal to 0, indicating that the Pod can not be interrupted again in the container cluster, and if the Pod is interrupted again, causing project interruption or SLA degradation.
It is further judged whether the cluster state value of the container cluster is greater than the minimum operation value (MinAvailable) in the expected state, where the minimum operation value refers to the minimum number of Pods required to ensure normal operation of the project. For example, an Etcd storage system is deployed in the container cluster and currently has 5 replicas; based on the implementation principle of Etcd, the storage system can be used normally only when the number of Etcd replicas is greater than 3. In this case, MinAvailable = 3 is defined in the PUB expected state corresponding to Etcd, that is, at least 3 Pods must run normally to guarantee normal use of the Etcd system. Suppose the administrator, while scaling the service down, sets the number of available Pods to 2 by mistake. After detecting the change to the configured parameter, the control manager starts to scale down the Pods one by one. When the 1st Pod is deleted, the state value of the container cluster at that moment is judged to be greater than MinAvailable, so the cluster state value is determined to be greater than the minimum operation value; when the 3rd Pod is deleted, the cluster state value is determined to be equal to the minimum operation value.
The cluster state value may be determined to be consistent with the desired state only if the cluster state value is greater than 0 and the cluster state value is greater than a minimum operational value in the desired state. And determining that the cluster state value is inconsistent with the expected state under the condition that the cluster state value is less than or equal to 0 and/or the cluster state value is less than or equal to the minimum operation value in the expected state.
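The two-part judgment can be expressed as a single predicate. The following Python sketch uses the hypothetical function name is_consistent and simply restates the condition given above.

```python
def is_consistent(cluster_state_value: int, min_operational_value: int) -> bool:
    """Consistent only if the value is greater than 0 and greater than the minimum."""
    return cluster_state_value > 0 and cluster_state_value > min_operational_value


# With a minimum operation value of 3, a cluster state value of 4 is consistent,
# while a value equal to the minimum is not.
above_minimum = is_consistent(4, 3)
at_minimum = is_consistent(3, 3)
```

Failing either condition is enough for the cluster state value to be treated as inconsistent with the expected state.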
Step 108: rejecting the interrupt instruction if the cluster state value is inconsistent with the expected state.
If the cluster state value of the container cluster is judged to be inconsistent with the expected state, the current interrupt instruction would cause project interruption or SLA degradation and affect normal use of the project, so the interrupt instruction needs to be rejected. Because the interrupt instruction was intercepted in the preceding steps using the PUB Webhook, it can be rejected directly when the cluster state value is inconsistent with the expected state. It should be noted that after the interrupt instruction is rejected, whether the cluster state value satisfies the expected state may be judged again after a preset time period.
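One way such a rejection could look is sketched below in Python, under the assumption that the PUB Webhook is implemented as a Kubernetes validating admission webhook that answers with an AdmissionReview object; the text does not state this explicitly. The function name review_interrupt, the rejection message, and the 403 status code are choices made for this example.

```python
def review_interrupt(request_uid: str, cluster_state_value: int, min_operational_value: int) -> dict:
    """Build an AdmissionReview-style response that allows or rejects the instruction."""
    consistent = cluster_state_value > 0 and cluster_state_value > min_operational_value
    response = {"uid": request_uid, "allowed": consistent}
    if not consistent:
        # Hypothetical message and code explaining why the instruction was rejected.
        response["status"] = {
            "code": 403,
            "message": "interrupt rejected: cluster state value is inconsistent with the expected state",
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


rejected = review_interrupt("req-1", cluster_state_value=0, min_operational_value=0)
accepted = review_interrupt("req-2", cluster_state_value=5, min_operational_value=3)
```

Because the decision is made in the webhook, the interrupt instruction never reaches the Pod when it is rejected, and the caller can retry later once the cluster state value has recovered.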
In another specific embodiment provided herein, the method further comprises:
receiving a running instruction for a second target container group;
running the second target container group in response to the running instruction;
and, when the running state of the second target container group changes to running, updating the cluster state value according to the running change state value corresponding to the running instruction.
In practical application, the cluster state value of the container cluster is not constant, and the cluster state value dynamically changes according to the operation state of the Pod in the container cluster, for example, when the second target Pod receives an operation instruction in an unavailable state, the second target Pod is started to operate in the container cluster, after the second target Pod successfully operates, the operation change state value corresponding to the operation instruction needs to be acquired, and the operation change state value is usually set to +1, that is, the cluster state value is increased by 1, that is, the number of Pod allowed to be interrupted in the container cluster is increased by one. If the running instruction for the second target Pod is received just after the interruption instruction for the first target Pod is rejected, and the cluster state value of the container cluster may already satisfy the expected state, another embodiment provided in this application may be implemented, where the method further includes:
and under the condition that the cluster state value is consistent with the expected state, executing the interrupt instruction, and updating the cluster state value according to an interrupt change state value corresponding to the interrupt instruction.
Under the condition that the cluster state value is consistent with the expected state, it is described that the interrupt instruction does not affect the item of the container cluster, that is, the interrupt instruction is allowed, after the interrupt instruction is executed, the cluster state value of the container cluster needs to be updated in time according to the interrupt change state value corresponding to the interrupt instruction, in general, in this application, the interrupt change state value may be set to-1, that is, after the interrupt instruction is executed, the cluster state value-1 may be set to identify that the number of Pod allowed to be interrupted in the container cluster at this time is reduced by one.
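The dynamic update of the cluster state value can be sketched as a small Python class. The class name ClusterState and its method names are hypothetical; the +1 and -1 change values follow the typical settings mentioned in the text.

```python
class ClusterState:
    """Tracks the number of Pods that may still be interrupted (a sketch)."""

    RUN_CHANGE = 1          # running change state value, typically +1
    INTERRUPT_CHANGE = -1   # interrupt change state value, typically -1

    def __init__(self, value: int, min_operational_value: int = 0):
        self.value = value
        self.min_operational_value = min_operational_value

    def on_pod_running(self) -> None:
        # A previously unavailable Pod is running again: one more interruption is allowed.
        self.value += self.RUN_CHANGE

    def handle_interrupt(self) -> bool:
        """Execute the interrupt if the state is consistent, otherwise reject it."""
        if self.value > 0 and self.value > self.min_operational_value:
            self.value += self.INTERRUPT_CHANGE
            return True
        return False


state = ClusterState(value=1)
first = state.handle_interrupt()    # allowed, value drops to 0
second = state.handle_interrupt()   # rejected, value stays 0
state.on_pod_running()              # a Pod recovers, value rises to 1
third = state.handle_interrupt()    # allowed again
```

This mirrors the case described above in which an interrupt instruction is first rejected and later accepted once a running instruction for another Pod has raised the cluster state value.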
In the container group protection method in the container cluster provided by the present specification, the operating state of each container group in the container cluster is monitored; a cluster state value is determined for the container cluster based on the operating state of each container group and the expected state of the container cluster; an interrupt instruction for a first target container group is received; and the interrupt instruction is rejected if the cluster state value is inconsistent with the expected state. The embodiments of the specification convert the various situations in an active interruption scenario into interrupt instructions for Pods, which makes the method broadly applicable. The method effectively prevents the project interruption caused by a large number of unavailable Pods in an active interruption scenario and maintains the stability of the project.
Secondly, by means of the Webhook technology, the project system is comprehensively protected without intruding into its internals, project interruption caused by an active interruption scenario is avoided, the stability of the project is further maintained, and the user experience is improved.
The method for protecting the container group in the container cluster is further described below with reference to fig. 2 and 3. Fig. 2 is a schematic diagram illustrating an architecture of a container group protection method in a container cluster according to a second embodiment of the present disclosure.
As shown in fig. 2, a control unit of the Pod group protection mechanism (hereinafter referred to as PUB) monitors the running state of each Pod through the API Server of the container cluster. It calculates the cluster state value (Unavailable Allowed) of the container cluster according to the Pod running states and the expected state of the PUB, and saves the cluster state value to the API Server.
A user or a container cluster administrator may delete a workload, upgrade a workload, scale down a workload, upgrade a Pod in place, drain a node, and so on. Any of these operations triggers a voluntary deletion scenario and, in turn, a modification of Pod resources. The modification of Pod resources includes: a) modifying the Pod metadata and spec; b) deleting the Pod; c) evicting the Pod.
The API Server intercepts the modification instruction through the Admission Webhook and asks the PUB Webhook whether the modification is allowed. The PUB Webhook decides whether to allow the modification instruction according to the cluster state value and the expected state stored in the API Server.
If the modification instruction is allowed, the corresponding operation is executed; if it is not allowed, the operation can be retried after a period of time.
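The allow-or-deny answer returned by the webhook can be sketched as follows. The envelope (`admission.k8s.io/v1`, `AdmissionReview`, `response.uid`, `response.allowed`, `response.status`) is the standard Kubernetes admission response format; the function name, the simplified "value greater than 0" check, and the choice of status code 429 with a retry hint are assumptions made for illustration and are not stated in this specification.

```python
def build_admission_response(uid: str, cluster_state_value: int) -> dict:
    """Build an AdmissionReview v1 response that allows or denies a Pod modification."""
    # Simplified check (assumption): allow while the cluster state value is positive.
    allowed = cluster_state_value > 0
    response = {"uid": uid, "allowed": allowed}
    if not allowed:
        # Assumed status: tell the caller that the request may be retried later.
        response["status"] = {
            "code": 429,
            "message": "cluster state value exhausted; retry later",
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


denied = build_admission_response("req-1", 0)
granted = build_admission_response("req-2", 2)
print(denied["response"]["allowed"], granted["response"]["allowed"])
```

Because the decision is made in the webhook, the protected workloads themselves need no changes, which matches the non-intrusive design described above.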
Fig. 3 is a processing flow chart of a container group protection method in a container cluster according to a second embodiment of the present disclosure, which specifically includes the following steps:
Step 302: Create a corresponding PubObject for each Pod in the container cluster.
Step 304: When a PubObject creation event is detected, acquire the Pod list and the expected state of the container cluster, where the expected state includes the maximum unavailable rate and the minimum operation value.
Step 306: Count the number of container groups in the container cluster, and determine the maximum unavailable value according to the number of container groups and the maximum unavailable rate.
Step 308: Count the unavailable container group value, that is, the number of container groups in the container cluster whose operating state is unavailable, and determine the cluster state value according to the maximum unavailable value and the unavailable container group value.
Step 310: Receive an interrupt instruction for a first target container group.
Step 312: Judge whether the cluster state value satisfies the expected state. If so, go to step 314; otherwise, go to step 316.
Step 314: Execute the interrupt instruction, and update the cluster state value according to the interrupt change state value corresponding to the interrupt instruction.
Step 316: Reject the interrupt instruction.
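Steps 306 to 316 can be walked through with a short numeric sketch. Several points are assumptions rather than statements of this specification: the maximum unavailable rate is expressed as an integer percentage and rounded down, the cluster state value is taken as the difference between the maximum unavailable value and the unavailable container group value, and the check in step 312 is reduced to "value greater than 0" (the minimum operation value is omitted). All function names are hypothetical.

```python
from typing import List, Tuple


def max_unavailable_value(pod_count: int, max_unavailable_percent: int) -> int:
    # Step 306: integer percentage and rounding down are assumptions.
    return pod_count * max_unavailable_percent // 100


def cluster_state_value(pod_available: List[bool], max_unavailable_percent: int) -> int:
    # Step 308: count unavailable Pods; taking the difference is an assumed reading.
    unavailable = sum(1 for ok in pod_available if not ok)
    return max_unavailable_value(len(pod_available), max_unavailable_percent) - unavailable


def handle_interrupt(value: int) -> Tuple[bool, int]:
    # Steps 312 to 316: allow and apply the -1 change, or reject and leave the value as is.
    if value > 0:
        return True, value - 1
    return False, value


pods = [True] * 8 + [False] * 2           # 10 Pods, 2 of them unavailable
value = cluster_state_value(pods, 30)     # maximum unavailable value 3, minus 2 unavailable
allowed, value = handle_interrupt(value)  # first interrupt is executed
allowed_again, value = handle_interrupt(value)  # second interrupt is rejected
print(allowed, allowed_again, value)
```

With 10 Pods and a 30% rate, only one further interruption fits, so the second interrupt instruction is rejected until another Pod becomes available.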
Corresponding to the above method embodiment, the present specification further provides an embodiment of a container group protection device in a container cluster, and fig. 4 shows a schematic structural diagram of a container group protection device in a container cluster provided in an embodiment of the present specification. As shown in fig. 4, the apparatus includes:
a monitoring module 402 configured to monitor an operating state of each container group in the container cluster;
a determination module 404 configured to determine a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster;
an interrupt instruction receiving module 406 configured to receive an interrupt instruction for a first target container group;
a rejection module 408 configured to reject the interrupt instruction if the cluster state value is inconsistent with the expected state.
Optionally, the apparatus further comprises:
an execution module configured to execute the interrupt instruction if the cluster state value is consistent with the expected state, and to update the cluster state value according to the interrupt change state value corresponding to the interrupt instruction.
Optionally, the apparatus further comprises:
a judging module configured to judge whether the cluster state value satisfies the expected state; if so, determine that the cluster state value is consistent with the expected state; if not, determine that the cluster state value is inconsistent with the expected state.
Optionally, the expected state comprises a minimum operation value;
the judging module is further configured to:
judge whether the cluster state value is greater than 0, and judge whether the cluster state value is greater than the minimum operation value;
determine that the cluster state value is consistent with the expected state if the cluster state value is greater than 0 and greater than the minimum operation value;
determine that the cluster state value is inconsistent with the expected state if the cluster state value is less than or equal to 0 or less than or equal to the minimum operation value.
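The comparison described above reduces to a single predicate. A minimal sketch, in which the function and parameter names are hypothetical:

```python
def is_consistent(cluster_state_value: int, min_operation_value: int) -> bool:
    """Return True when the cluster state value is consistent with the expected state."""
    # Both conditions from the text must hold: greater than 0 and greater than the minimum.
    return cluster_state_value > 0 and cluster_state_value > min_operation_value


checks = [
    is_consistent(3, 1),   # both conditions hold
    is_consistent(1, 1),   # not greater than the minimum operation value
    is_consistent(0, -1),  # not greater than 0
]
print(checks)
```

Note that equality with either bound counts as inconsistent, so the interrupt instruction is rejected in those cases.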
Optionally, the apparatus further comprises:
a run instruction receiving module configured to receive a run instruction for a second target container group;
a running module configured to run the second target container group in response to the run instruction;
an updating module configured to update the cluster state value according to the run change state value corresponding to the run instruction when the operating state of the second target container group changes to running.
Optionally, the expected state includes a maximum unavailable rate;
the determination module 404 is further configured to:
determine a cluster state value for the container cluster based on the operating state of each container group and the maximum unavailable rate.
Optionally, the operating state includes available or unavailable;
the determination module 404 is further configured to:
count the number of container groups in the container cluster, and determine a maximum unavailable value according to the number of container groups and the maximum unavailable rate;
count an unavailable container group value, that is, the number of container groups in the container cluster whose operating state is unavailable;
determine the cluster state value based on the maximum unavailable value and the unavailable container group value.
Optionally, the monitoring module 402 is further configured to:
receiving a creation instruction for a third target container group;
creating the third target container group in the container cluster in response to the creation instruction, and creating a corresponding target container group protection object for the third target container group;
acquiring a container group list and the expected state of the container cluster when the target container group protection object is successfully created, and monitoring the operating state of each container group in the container group list.
Optionally, the interrupt instruction receiving module 406 is further configured to:
receiving a modification instruction for the metadata and/or configuration file of the first target container group;
receiving a deletion instruction for the first target container group; or
receiving an eviction instruction for the first target container group.
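The three kinds of interrupt instruction listed above can be recognized from the admission request that the API Server forwards. The sketch below assumes the standard Kubernetes request fields `operation` and `subResource`, where an eviction arrives as a CREATE on the `eviction` subresource of a Pod; the mapping itself and the function name are illustrative assumptions and are not taken from this specification.

```python
from typing import Optional


def classify_interrupt(operation: str, sub_resource: str = "") -> Optional[str]:
    """Map an admission request on a Pod to one of the three interrupt kinds, or None."""
    if operation == "DELETE":
        return "delete"
    if operation == "CREATE" and sub_resource == "eviction":
        return "evict"
    if operation == "UPDATE" and sub_resource == "":
        # Modification of the Pod metadata and/or spec.
        return "modify"
    # Anything else (for example creating a new Pod) is not an interrupt instruction.
    return None


kinds = [
    classify_interrupt("UPDATE"),
    classify_interrupt("DELETE"),
    classify_interrupt("CREATE", "eviction"),
    classify_interrupt("CREATE"),
]
print(kinds)
```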
The container group protection device in a container cluster provided by the present specification monitors the operating state of each container group in the container cluster; determines a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster; receives an interrupt instruction for a first target container group; and rejects the interrupt instruction if the cluster state value is inconsistent with the expected state. The embodiments of the specification convert the various situations in an active interruption scenario into interrupt instructions for Pods, which makes the device broadly applicable. The device effectively prevents the project interruption caused by a large number of unavailable Pods in an active interruption scenario and maintains the stability of the project.
Secondly, by means of the Webhook technology, the project system is comprehensively protected without intruding into its internals, project interruption caused by an active interruption scenario is avoided, the stability of the project is further maintained, and the user experience is improved.
The above is a schematic solution of the container group protection device in a container cluster of the present embodiment. It should be noted that the technical solution of the device and the technical solution of the container group protection method in the container cluster described above belong to the same concept; for details not described in the technical solution of the device, refer to the description of the technical solution of the method.
Fig. 5 illustrates a block diagram of a computing device 500 provided according to an embodiment of the present description. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530, and database 550 is used to store data.
Computing device 500 also includes an access device 540 that enables computing device 500 to communicate via one or more networks 560. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more network interfaces of any type, wired or wireless, e.g., a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 500, as well as other components not shown in FIG. 5, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 5 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 500 may also be a mobile or stationary server.
Wherein processor 520 is configured to execute the following computer instructions:
monitoring an operating state of each container group in the container cluster;
determining a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster;
receiving an interrupt instruction for a first target container group;
rejecting the interrupt instruction if the cluster state value is inconsistent with the expected state.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the container group protection method in the container cluster described above belong to the same concept; for details not described in the technical solution of the computing device, refer to the description of the technical solution of the method.
An embodiment of the present specification also provides a computer readable storage medium storing computer instructions that, when executed by a processor, are operable to:
monitoring an operating state of each container group in the container cluster;
determining a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster;
receiving an interrupt instruction for a first target container group;
rejecting the interrupt instruction if the cluster state value is inconsistent with the expected state.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the container group protection method in the container cluster described above belong to the same concept; for details not described in the technical solution of the storage medium, refer to the description of the technical solution of the method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of combined acts, but those skilled in the art should understand that the present disclosure is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present disclosure. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments, and that the acts and modules involved are not necessarily required by this specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the specification and its practical application, to thereby enable others skilled in the art to best understand the specification and its practical application. The specification is limited only by the claims and their full scope and equivalents.

Claims (12)

1. A container group protection method in a container cluster, comprising:
monitoring an operating state of each container group in the container cluster;
determining a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster;
receiving an interrupt instruction for a first target container group;
rejecting the interrupt instruction if the cluster state value is inconsistent with the expected state.
2. The container group protection method in a container cluster according to claim 1, the method further comprising:
executing the interrupt instruction if the cluster state value is consistent with the expected state, and updating the cluster state value according to an interrupt change state value corresponding to the interrupt instruction.
3. The container group protection method in a container cluster according to claim 2, after receiving the interrupt instruction for the first target container group, the method further comprising:
judging whether the cluster state value meets the expected state;
if so, determining that the cluster state value is consistent with the expected state;
if not, determining that the cluster state value is inconsistent with the expected state.
4. The container group protection method in a container cluster according to claim 3, wherein the expected state comprises a minimum operation value;
the judging whether the cluster state value satisfies the expected state comprises:
judging whether the cluster state value is greater than 0, and judging whether the cluster state value is greater than the minimum operation value;
determining that the cluster state value is consistent with the expected state if the cluster state value is greater than 0 and greater than the minimum operation value;
determining that the cluster state value is inconsistent with the expected state if the cluster state value is less than or equal to 0 or less than or equal to the minimum operation value.
5. The container group protection method in a container cluster according to claim 1, the method further comprising:
receiving a run instruction for a second target container group;
running the second target container group in response to the run instruction;
updating the cluster state value according to a run change state value corresponding to the run instruction when the operating state of the second target container group changes to running.
6. The container group protection method in a container cluster according to claim 1, wherein the expected state comprises a maximum unavailable rate;
the determining a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster comprises:
determining the cluster state value for the container cluster based on the operating state of each container group and the maximum unavailable rate.
7. The container group protection method in a container cluster according to claim 6, wherein the operating state comprises available or unavailable;
the determining the cluster state value for the container cluster based on the operating state of each container group and the maximum unavailable rate comprises:
counting the number of container groups in the container cluster, and determining a maximum unavailable value according to the number of the container groups and the maximum unavailable rate;
counting an unavailable container group value of which the running state is unavailable in the container cluster;
determining the cluster state value based on the maximum unavailable value and the unavailable container group value.
8. The container group protection method in a container cluster according to any one of claims 1 to 7, wherein the monitoring an operating state of each container group in the container cluster comprises:
receiving a creation instruction for a third target container group;
creating the third target container group in the container cluster in response to the creation instruction, and creating a corresponding target container group protection object for the third target container group;
acquiring a container group list and the expected state of the container cluster when the target container group protection object is successfully created, and monitoring the operating state of each container group in the container group list.
9. The container group protection method in a container cluster according to claim 1, wherein the receiving an interrupt instruction for a first target container group comprises:
receiving a modification instruction for the metadata and/or configuration file of the first target container group;
receiving a deletion instruction for the first target container group; or
receiving an eviction instruction for the first target container group.
10. A container group protection device in a container cluster, comprising:
a monitoring module configured to monitor an operating state of each container group in the container cluster;
a determination module configured to determine a cluster state value for the container cluster based on the operating state of each container group and the expected state of the container cluster;
an interrupt instruction receiving module configured to receive an interrupt instruction for a first target container group;
a rejection module configured to reject the interrupt instruction if the cluster state value is inconsistent with the expected state.
11. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer instructions, and the processor is configured to execute the computer instructions to implement the steps of the container group protection method in a container cluster according to any one of claims 1 to 9.
12. A computer readable storage medium storing computer instructions which, when executed by a processor, carry out the steps of the container group protection method in a container cluster according to any one of claims 1 to 9.
CN202110499084.8A 2021-05-08 2021-05-08 Container group protection method and device in container cluster Pending CN113297031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110499084.8A CN113297031A (en) 2021-05-08 2021-05-08 Container group protection method and device in container cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110499084.8A CN113297031A (en) 2021-05-08 2021-05-08 Container group protection method and device in container cluster

Publications (1)

Publication Number Publication Date
CN113297031A true CN113297031A (en) 2021-08-24

Family

ID=77321170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110499084.8A Pending CN113297031A (en) 2021-05-08 2021-05-08 Container group protection method and device in container cluster

Country Status (1)

Country Link
CN (1) CN113297031A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666088A (en) * 2020-06-07 2020-09-15 中信银行股份有限公司 Pod replacement method and device, electronic equipment and computer-readable storage medium
CN111857953A (en) * 2020-07-17 2020-10-30 苏州浪潮智能科技有限公司 Container cluster management method, device, equipment and readable storage medium
CN111897558A (en) * 2020-07-23 2020-11-06 北京三快在线科技有限公司 Kubernets upgrading method and device for container cluster management system
CN112540829A (en) * 2020-12-16 2021-03-23 恒生电子股份有限公司 Container group eviction method, device, node equipment and storage medium
CN112559186A (en) * 2020-12-22 2021-03-26 北京云思畅想科技有限公司 Novel Kubernetes container resource expansion and contraction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KEVIN YAN: "Using PDB to avoid Kubernetes cluster disruption", pages 1 - 10, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/360521649> *
王思宇: "OpenKruise v1.2 adds PersistentPodState for fixed topology and IP reuse of stateful Pods", pages 1 - 5, Retrieved from the Internet <URL:https://developer.aliyun.com/article/948815> *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168951A (en) * 2022-02-11 2022-03-11 阿里云计算有限公司 Abnormality detection method and apparatus
CN114168951B (en) * 2022-02-11 2022-08-16 阿里云计算有限公司 Abnormality detection method and apparatus
CN114756261A (en) * 2022-03-23 2022-07-15 广域铭岛数字科技有限公司 Container cluster upgrading method and system, electronic equipment and medium
CN114756261B (en) * 2022-03-23 2023-04-18 广域铭岛数字科技有限公司 Container cluster upgrading method and system, electronic equipment and medium
CN115510167A (en) * 2022-11-23 2022-12-23 安超云软件有限公司 Distributed database system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40059136

Country of ref document: HK

TA01 Transfer of patent application right

Effective date of registration: 20240228

Address after: # 03-06, Lai Zan Da Building 1, 51 Belarusian Road, Singapore

Applicant after: Alibaba Innovation Co.

Country or region after: Singapore

Address before: Room 01, 45th Floor, AXA Building, 8 Shanton Road, Singapore

Applicant before: Alibaba Singapore Holdings Ltd.

Country or region before: Singapore

TA01 Transfer of patent application right