CN110825490A - Kubernetes container-based application health check method and system

Info

Publication number: CN110825490A
Application number: CN201911023195.0A
Authority: CN
Inventors: 赵凯麟; 王志雄; 韦克璐; 罗明; 谭林春
Original assignee: Guilin Dongxinyun Technology Co Ltd
Current assignee: Guilin Dongxinyun Technology Co Ltd
Priority date: 2019-10-25
Filing date: 2019-10-25
Publication date: 2020-02-21

Abstract

The invention discloses a Kubernetes container-based application health check method, which belongs to the technical field of program control of software application operation and relates in particular to program start-up. By adding a new health check probe, the invention postpones the checking operations of the existing health check probes, so that a slow-starting container can recover quickly from a deadlock or other fault during start-up, while a container that is starting normally is not restarted by mistake.

Description

Kubernetes container-based application health check method and system

Technical Field

The invention relates to the technical field of program control of software application operation, in particular to program starting, and particularly relates to a Kubernetes container-based application health check method.

Background

Kubernetes currently provides two kinds of container health check: the survival probe (liveness probe) and the ready probe (readiness probe). The survival probe indicates whether the container is running. If the survival probe fails, the Kubelet kills the container, and the container is then subject to its restart policy. The ready probe indicates whether the container is ready to serve requests. If the ready probe fails, the endpoint controller removes the Pod's IP address from the endpoints of all Services matching the Pod, i.e., traffic is no longer directed to the Pod.

However, these two existing health checks have difficulty handling slow-starting applications. First, because its start-up takes too long, a slow-starting application may be killed and restarted by the Kubelet through the health check mechanism before it has finished starting. Second, if the application deadlocks during the start-up phase, it may wait a very long time under the existing health check configuration before it is restarted.

Disclosure of Invention

The invention aims to solve the above problems by providing a Kubernetes container-based application health check method. By adding a new health check probe, the method postpones the checking operations of the existing health check probes, so that a slow-starting container can recover quickly from a deadlock or other fault during start-up, while a container that is starting normally is not restarted by mistake.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

A Kubernetes container-based application health check method, comprising:

S1, container start detection procedure: the Kubernetes component Kubelet periodically detects the health state of the container using a start probe;

S2, container start judgment procedure: whether the container has started successfully is judged from the detection result of the start probe; if the container has started successfully, the next procedure is executed; if the container has failed to start, the container is restarted or killed and the flow ends;

S3, container health check procedure: the Kubelet performs the corresponding health checks on the container using the survival probe and the ready probe, respectively.
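The three procedures above can be read as a simple control flow. The following Python sketch is illustrative only: the function name `run_health_check_flow` and the callables passed to it are hypothetical stand-ins rather than Kubelet code, and the waiting between periodic probes is omitted.

```python
from typing import Callable, List


def run_health_check_flow(
    start_probe: Callable[[], bool],
    survival_probe: Callable[[], bool],
    ready_probe: Callable[[], bool],
    failure_threshold: int,
) -> List[str]:
    """Walk through S1-S3 once and return the events that occurred."""
    events: List[str] = []
    failures = 0
    # S1: probe the container repeatedly with the start probe.
    while True:
        if start_probe():
            # S2: start-up succeeded, so move on to S3.
            events.append("started")
            break
        failures += 1
        if failures >= failure_threshold:
            # S2: start-up failed, so restart or kill the container and stop.
            events.append("restart-or-kill")
            return events
    # S3: only now do the survival and ready probes check the container.
    events.append("survival-ok" if survival_probe() else "survival-failed")
    events.append("ready-ok" if ready_probe() else "ready-failed")
    return events
```

In this sketch the survival and ready probes are never called while the start probe is still failing, which is the ordering the method relies on.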

As an option, the details of step S1 are as follows:

S11, the Kubelet acquires the information of the newly created Pod;

S12, the start probe is started according to the probe configuration of the Pod;

S13, the failure threshold and the detection period of the start probe are configured; specifically, the Kubelet configures them according to the failureThreshold and periodSeconds fields of the start probe;

S14, the start probe selects a checking method according to the check configuration; the checking methods comprise an HTTP GET method, a TCP method and a command execution method;

S15, the health check is carried out periodically using the selected checking method; specifically, the start probe performs a health check on the container every periodSeconds seconds using the checking method selected above.
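As a rough illustration of S14 and S15, the sketch below implements the three checking methods with the Python standard library and runs the selected one periodically. The configuration dictionary keys (`http_get`, `tcp`, `exec`) and all function names are assumptions made for this example; they are not Kubelet interfaces. Treating only status code 200 as success follows the example given later in this description.

```python
import http.client
import socket
import subprocess
import time
import urllib.request
from typing import Callable, List


def http_get_check(url: str, timeout: float = 1.0) -> bool:
    """HTTP GET method: succeed when the endpoint answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        return False


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP method: succeed when a connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_check(command: List[str], timeout: float = 5.0) -> bool:
    """Command execution method: succeed when the command exits with code 0."""
    try:
        return subprocess.run(command, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def select_check(config: dict) -> Callable[[], bool]:
    """S14: pick the checking method named in the (hypothetical) config dict."""
    if "http_get" in config:
        return lambda: http_get_check(config["http_get"])
    if "tcp" in config:
        host, port = config["tcp"]
        return lambda: tcp_check(host, port)
    if "exec" in config:
        return lambda: exec_check(config["exec"])
    raise ValueError("no checking method configured")


def probe_periodically(
    check: Callable[[], bool],
    period_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """S15: run the selected check once per period and collect the results."""
    results: List[bool] = []
    for i in range(rounds):
        results.append(check())
        if i < rounds - 1:
            sleep(period_seconds)
    return results
```

The `sleep` parameter is injectable only so that the loop can be exercised without real waiting.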

As an option, the details of step S2 are as follows:

S21, the detection result is judged from the status code returned by the health check; if the current status code is the preset detection success code, the detection has succeeded, the container has started successfully, and procedure S23 below is executed; if the current status code is a preset detection failure code, the detection has failed, the container has not started successfully, and procedure S22 below is executed;

S22, the number of consecutive detection failures is counted and compared with the failure threshold; after failureThreshold consecutive failures the container is judged to have failed to start, the container is restarted or killed, and the flow ends; otherwise, the current start-up state field of the container, ContainerStatus.started, remains at its initial value false;

S23, the current start-up state field of the container is updated from its initial value false to true, and the next procedure is executed.
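The judgment in S21 to S23 amounts to a small state machine driven by consecutive failures. A minimal sketch follows, assuming a hypothetical `StartJudge` class and a success code of 200 as in the HTTP example of this description; the return strings are illustrative labels only.

```python
from dataclasses import dataclass


@dataclass
class StartJudge:
    """Tracks consecutive start probe failures for one container (S21-S23)."""

    failure_threshold: int
    consecutive_failures: int = 0
    started: bool = False  # the initial start-up state is false

    def observe(self, status_code: int, success_code: int = 200) -> str:
        """Feed one probe result; return 'started', 'pending' or 'failed'."""
        if status_code == success_code:
            # S23: flip the start-up state from false to true.
            self.started = True
            self.consecutive_failures = 0
            return "started"
        # S22: count the failure and compare with the threshold.
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            return "failed"  # the caller restarts or kills the container
        return "pending"  # started stays false; keep probing
```

For example, with `failure_threshold=3`, two failed probes leave the container pending and `started` false; a third failure in a row would report `failed`.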

As an option, in step S3, the survival probe and the ready probe are started at the same time as the start probe and remain in a suppressed state after being started, while monitoring the current start-up state of the container; when the container has started successfully and its start-up state has been updated from the initial value false to true, the suppression of the survival probe and the ready probe is released and each performs its corresponding health check on the container.

As an option, in step S3, the survival probe and the ready probe are started only after the container has started successfully and its current start-up state has been updated from the initial value false to true; each then performs its corresponding health check on the container.
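The two options for step S3 differ only in when the survival and ready probes come into existence. The sketch below contrasts them; `ContainerState`, `SuppressedProbe` and `create_probe_after_start` are hypothetical names chosen for this illustration.

```python
from typing import Callable, Optional


class ContainerState:
    """Holds the container's start-up state; the initial value is false."""

    def __init__(self) -> None:
        self.started = False


class SuppressedProbe:
    """First option: the probe exists from the beginning but stays suppressed."""

    def __init__(self, state: ContainerState, check: Callable[[], bool]) -> None:
        self.state = state
        self.check = check

    def tick(self) -> Optional[bool]:
        """Run once per period; return None while the probe is suppressed."""
        if not self.state.started:
            return None  # suppressed: only the start-up state is monitored
        return self.check()  # released: perform the real health check


def create_probe_after_start(
    state: ContainerState, check: Callable[[], bool]
) -> Optional[SuppressedProbe]:
    """Second option: the probe is created only once the container has started."""
    if not state.started:
        return None
    return SuppressedProbe(state, check)
```

In both variants no real survival or ready check reaches the container before its start-up state becomes true.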

Due to the adoption of the technical scheme, the invention has the following beneficial effects:

1. The invention shortens the preparation time of applications that take a long time to start and improves start-up efficiency.

2. When a deadlock occurs during the start-up phase of an application container, the container is killed and restarted as soon as possible, which speeds up fault recovery.

3. Because the start-up health check is independent, the survival probe can detect a container failure, and trigger the restart that recovers from it, in a more timely manner.

Drawings

FIG. 1 is a flow chart of a health check method of the present invention.

FIG. 2 is a flow chart of an example health check method of the present invention.

FIG. 3 is a schematic diagram of an operating system of an example health check method of the present invention.

FIG. 4 is a schematic diagram showing three probe states of an example of the health examination method of the present invention.

Detailed Description

The following further describes the embodiments of the present invention with reference to the drawings.

Example 1

As shown in fig. 1, the Kubernetes container-based application health check method of this embodiment includes the following steps:

Step S1, container start detection procedure: the Kubernetes component Kubelet periodically detects the health state of the container using a start probe. The specific content of step S1 is as follows:

S11, the Kubelet acquires the information of the newly created Pod;

S12, the start probe is started according to the probe configuration of the Pod;

S14, the start probe selects a checking method according to the check configuration; the checking methods comprise an HTTP GET method, a TCP method and a command execution method. One checking method or a combination of several may be selected. When several are combined, a priority is set for each; the running condition of the checking methods is verified in priority order by a self-check procedure, and when a method encounters an error in operation, the next method in order is selected;
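The priority-based combination of checking methods described in S14 could be sketched as follows. This is one possible reading, not the patented implementation: it assumes that an "error in operation" shows up as a raised exception, which is distinct from a probe that runs correctly and reports failure. The function name and the tuple layout are hypothetical.

```python
from typing import Callable, List, Tuple


def check_with_fallback(
    methods: List[Tuple[int, str, Callable[[], bool]]]
) -> Tuple[str, bool]:
    """Try checking methods in priority order (lower number = higher priority).

    A method that raises is treated as having an operating error, and the
    next method in order is tried. Returns (method name, probe result).
    """
    for _, name, method in sorted(methods, key=lambda item: item[0]):
        try:
            return name, method()
        except Exception:
            continue  # operating error: fall through to the next method
    raise RuntimeError("every configured checking method failed to run")
```

Note that a method returning False is a valid probe result and does not trigger the fallback; only a method that cannot run does.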

Step S2, container start judgment procedure: whether the container has started successfully is judged from the detection result of the start probe; if the container has started successfully, the next procedure is executed; if the container has failed to start, the container is restarted or killed and the flow ends. When the container starts successfully, the start probe is closed; when the container fails to start, the start probe stops working and the other probes perform no subsequent detection. The specific content of step S2 is as follows:

S22, the number of consecutive detection failures is counted and compared with the failure threshold; after failureThreshold consecutive failures the container is judged to have failed to start, the container is restarted or killed, and the flow ends; otherwise, the current start-up state of the container, namely the started field, remains at its initial value false, and the flow returns to step S15 to continue the periodic health check. Here, restarting the container means killing it and then starting it again, whereas killing the container means only killing it;

Step S3, container health check procedure: the Kubelet performs the corresponding health checks on the container using the survival probe and the ready probe, respectively. The health check processes of the survival probe and the ready probe are prior art and are not described here.

As an option, in an embodiment, in step S3, the survival probe and the ready probe are started at the same time as the start probe and remain in a suppressed state after being started, while monitoring the current start-up state of the container; when the container has started successfully and the started field has been updated from the initial value false to true, the suppression of the survival probe and the ready probe is released, either in response to the field update operation or once the update has been observed, and each probe performs its corresponding health check on the container.

As an option, in an embodiment, in step S3, the survival probe and the ready probe are started only after the container has started successfully and its current start-up state has been updated from the initial value false to true; each then performs its corresponding health check on the container.

Of course, the survival probe or the ready probe may also be used alone together with the start probe, i.e., the container is configured with both a survival probe and a start probe, or with both a ready probe and a start probe.

A specific example of the Kubernetes container-based application health check method follows.

As shown in fig. 3, this embodiment uses 3 Master nodes as control nodes. The control nodes do not run workloads; only certain Kubernetes components run on them, in the form of containers. The Kubernetes cluster loads and runs several modules, including an application program interface server (API Server), a controller manager (Controller Manager), and a scheduler (Scheduler). The API Server on each Master node is connected to the distributed database etcd, which stores the various resource configurations and states in the cluster.

As shown in fig. 4, the start probe (Start Probe) is run by the Kubelet component.

The API Server is a component belonging to the Kubernetes cluster and can receive requests to create a new container and its corresponding probes.

ContainerStatus is an additional attribute of the container that indicates the container's current start-up state.

As shown in fig. 2, the Kubernetes container-based health check method comprises the following steps:

Step 1: assuming that the survival, ready, and start probes of the container are all configured in this example, the Kubelet periodically checks the health of the container according to the configuration of the start probe to determine whether start-up has succeeded. During this time the survival probe and the ready probe are suppressed.

Referring to fig. 4, the step 1 specifically includes:

11. The Kubelet acquires from the API Server the information of the container to be newly created.

12. The Kubelet starts the three health check probes according to the health check probe configuration of the container. The three health check probes are the survival probe, the ready probe and the start probe.

13. Depending on the configuration of the container's start probe, one of three ways of performing a health check on the container may be selected: (1) HTTP request probing, (2) TCP connection probing, and (3) probing by executing a command in the container. This example uses HTTP request probing.

14. The start probe sends an HTTP GET request to the configured container URL and port every periodSeconds seconds. If the returned status code is 200, the probe is considered successful and the ContainerStatus.started field is set to true, indicating that the container has started successfully. If the returned status code is not 200, the probe is considered to have failed and the ContainerStatus.started field remains false.

The initial default value of the container's current start-up state field is false, which means that the container has not yet started successfully. The survival probe and the ready probe check the start-up state of the container every periodSeconds seconds, according to their configured interval; during start-up the state is always false, and the two probes remain suppressed.

Step 2: whether the container has started successfully is judged from the detection result of the start probe, and the container state (ContainerStatus) field of the Pod is updated.

Referring to fig. 4, the step 2 specifically includes:

21. The start probe sends an HTTP GET request to the configured container URL and port every periodSeconds seconds. If the returned status code is 200, the probe is considered successful and the ContainerStatus.started field is set to true; a single successful probe is enough for the container to be considered successfully started. If the returned status code is not 200, the probe is considered to have failed and the ContainerStatus.started field remains false.

22. When the probing in step 21 fails failureThreshold consecutive times, the container is judged to have failed to start and the Kubelet kills the container. Whether the container is then restarted is determined by the restart policy configured in advance for the container, for example restart or do not restart.
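The restart decision in step 22 can be illustrated with the three restart policy values used by upstream Kubernetes (`Always`, `OnFailure`, `Never`). These names come from Kubernetes itself rather than from the text above, and the function `should_restart` is a hypothetical helper, so the sketch is only an approximation of how a pre-configured policy might be consulted after the Kubelet kills the container.

```python
def should_restart(restart_policy: str, exit_code: int) -> bool:
    """Decide whether a killed or exited container is started again."""
    if restart_policy == "Always":
        return True
    if restart_policy == "OnFailure":
        # Restart only when the container did not exit cleanly.
        return exit_code != 0
    if restart_policy == "Never":
        return False
    raise ValueError(f"unknown restart policy: {restart_policy}")
```

The exit code is passed in explicitly because the sketch makes no claim about which code a container reports after being killed for a failed start probe.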

Step 3: the survival probe and the ready probe of the Kubelet decide whether to remain suppressed according to the ContainerStatus.

Referring to fig. 4, the step 3 specifically includes:

31. The Kubelet starts both the survival probe and the ready probe after the container is created; before any actual health check probing, they monitor the ContainerStatus.started field in a suppressed state. Every periodSeconds seconds, at the interval at which each would perform its health check on the container, the survival probe and the ready probe check the ContainerStatus.started field.

32. When the ContainerStatus.started field is observed to be true, the survival probe and the ready probe are released from suppression and begin actual probing of the container.

As shown in fig. 4, the survival probe and the ready probe are suppressed by the default value of ContainerStatus.started only if the container is configured with a start probe. When no start probe is configured, the ContainerStatus.started field does not suppress the survival probe and the ready probe.

As described above, if the initialDelaySeconds field is not configured, the prior art may, for an application with a long start-up time, cause the problem that "the start-up takes too long, and the application is killed and restarted by the Kubelet through the health check mechanism before it has started successfully". If the initialDelaySeconds field is configured with a long value, then for such an application the problem may occur that "if the application deadlocks in the start-up phase, it may wait a very long time under the existing health check configuration before being restarted"; that is, the liveness probe waits initialDelaySeconds seconds before it starts probing, and then a further periodSeconds × failureThreshold seconds pass before the application is killed and restarted.
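The waiting time described in the preceding paragraph can be made concrete with a small calculation. The function name and the sample values (a 300-second initial delay, a 10-second period and a threshold of 3) are assumptions chosen purely for illustration.

```python
def restart_delay_seconds(
    initial_delay_seconds: int, period_seconds: int, failure_threshold: int
) -> int:
    """Time from container creation until a deadlocked application is killed,
    when only a liveness probe with an initial delay is configured."""
    return initial_delay_seconds + period_seconds * failure_threshold


# With a long initial delay, a deadlock at start-up goes unnoticed for minutes.
long_delay = restart_delay_seconds(300, 10, 3)

# Without an initial delay, a slow application has only this long to start
# before the liveness probe has it killed.
no_delay = restart_delay_seconds(0, 10, 3)
```

With these sample values the first case gives 330 seconds and the second 30 seconds, which shows the trade-off between the two prior-art configurations.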

Therefore, the method adds a start probe as a new health check probe. The start probe is at the same level as the liveness probe and the readiness probe, and its configurable fields can be the same. The difference lies in the order of start-up and checking: the start probe runs first and detects whether the container has started, while the liveness probe and the readiness probe are suppressed; only after the container has started do the liveness probe and the readiness probe carry out their health checks. This solves the problems described above.
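To illustrate how the three probes can share the same configurable fields, the sketch below writes a container's probe configuration as a Python dictionary. It assumes the field names used in upstream Kubernetes Pod specifications (`startupProbe`, `livenessProbe`, `readinessProbe`, `httpGet`, `periodSeconds`, `failureThreshold`), which the text above does not spell out; the path, port and numeric values are hypothetical.

```python
# Hypothetical probe configuration for one container.
container_probes = {
    "startupProbe": {
        "httpGet": {"path": "/healthz", "port": 8080},
        "periodSeconds": 10,
        "failureThreshold": 30,
    },
    "livenessProbe": {
        "httpGet": {"path": "/healthz", "port": 8080},
        "periodSeconds": 10,
        "failureThreshold": 3,
    },
    "readinessProbe": {
        "httpGet": {"path": "/ready", "port": 8080},
        "periodSeconds": 5,
        "failureThreshold": 1,
    },
}


def detection_window_seconds(probe: dict) -> int:
    """Seconds of consecutive failures a probe tolerates before it acts."""
    return probe["periodSeconds"] * probe["failureThreshold"]


# Generous budget while the container starts ...
startup_budget = detection_window_seconds(container_probes["startupProbe"])
# ... and a tight window for detecting failures once it is running.
liveness_window = detection_window_seconds(container_probes["livenessProbe"])
```

Under these sample values the container gets up to 300 seconds to start, while a failure after start-up is detected within 30 seconds, without any initial delay on the liveness probe.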

As mentioned above, by adding a new start probe the method postpones the checking operations of the existing health check probes, so that a slow-starting container can recover quickly from a deadlock or other fault during start-up, while a container that is starting normally is not restarted by mistake. The method has the following specific advantages:

1. The preparation time of applications that take a long time to start is shortened, and start-up efficiency is improved.

Example 2

Based on embodiment 1 above, the Kubernetes container-based application health check system of this embodiment is described below; for details, refer to embodiment 1.

The Kubernetes container-based application health check system of this embodiment comprises the following:

A container start detection module: used for the Kubernetes component Kubelet to periodically detect the health state of the container using a start probe. Specifically, the container start detection module comprises the following:

an acquisition module: used for the Kubelet to acquire the information of the newly created Pod;

a starting module: used for starting the start probe according to the probe configuration of the Pod;

a configuration module: used for configuring the failure threshold and detection period of the start probe; specifically, the Kubelet configures them according to the failureThreshold and periodSeconds fields of the start probe;

a selection module: used for the start probe to select a checking method according to the check configuration; the checking methods comprise an HTTP GET method, a TCP method and a command execution method;

a periodic checking module: used for carrying out the health check periodically using the selected checking method; specifically, the start probe performs a health check on the container every periodSeconds seconds using the checking method selected above.

A container start judgment module: used for judging whether the container has started successfully from the detection result of the start probe; if the container has started successfully, the checking procedure of the container health check module is executed; if the container has failed to start, the container is restarted or killed and the flow ends. The container start judgment module comprises the following:

a judging module: used for judging the detection result from the status code returned by the health check; if the current status code is the preset detection success code, the detection has succeeded, the container has started successfully, and the update procedure of the field updating module below is executed; if the current status code is a preset detection failure code, the detection has failed, the container has not started successfully, and the judgment procedure of the failure determination module below is executed;

a failure determination module: used for counting the number of consecutive detection failures and comparing it with the failure threshold; after failureThreshold consecutive failures the container is judged to have failed to start, the container is restarted or killed, and the flow ends; otherwise, the current start-up state field of the container, ContainerStatus.started, remains at its initial value false, and the flow returns to the periodic health check procedure of the periodic checking module;

a field updating module: used for updating the current start-up state field of the container, ContainerStatus.started, from its initial value false to true, after which the checking procedure of the container health check module described below is executed.

A container health check module: used for the Kubelet to perform the corresponding health checks on the container using the survival probe and the ready probe, respectively.

As an option, in the container health check module, the survival probe and the ready probe are started at the same time as the start probe, remain in a suppressed state after being started, and monitor the current start-up state field of the container, ContainerStatus.started.

As an option, in the container health check module, the survival probe and the ready probe are started only after the container has started successfully and its current start-up state has been updated from the initial value false to true; each then performs its corresponding health check on the container.

As mentioned above, by adding the start probe the system postpones the checking operations of the existing health check probes, so that a slow-starting container can recover quickly from a deadlock or other fault during start-up, while a container that is starting normally is not restarted by mistake.

The foregoing description is directed to the details of preferred and exemplary embodiments of the invention, and not to the limitations defined thereby, which are intended to cover all modifications and equivalents of the invention as may come within the spirit and scope of the invention.

Claims

1. A method for applying a health check based on a Kubernetes container, comprising:

2. The Kubernetes container-based application health check method of claim 1, wherein: the specific content of step S1 is as follows:

S11, the Kubelet acquires the information of the newly created Pod;

S12, the start probe is started according to the probe configuration of the Pod;

3. The Kubernetes container-based application health check method of claim 1, wherein: the specific content of step S2 is as follows:

4. The Kubernetes container-based application health check method of claim 1, wherein: in step S3, the survival probe and the ready probe are started at the same time as the start probe and remain in a suppressed state after being started, and the current start-up state ContainerStatus field of the container is monitored.

5. The Kubernetes container-based application health check method of claim 1, wherein: in step S3, the survival probe and the ready probe are started only after the container has started successfully and its current start-up state has been updated from the initial value false to true; each then performs its corresponding health check on the container.

6. The Kubernetes container-based application health check system of claim 1, comprising:

a container start detection module: used for the Kubernetes component Kubelet to periodically detect the health state of the container using a start probe;

a container start judgment module: used for judging whether the container has started successfully from the detection result of the start probe; if the container has started successfully, the checking procedure of the container health check module is executed; if the container has failed to start, the container is restarted or killed and the flow ends;

7. The Kubernetes container application health check system as claimed in claim 6, wherein: the container start detection module comprises the following contents:

a selection module: used for the start probe to select a checking method according to the check configuration; the checking methods comprise an HTTP GET method, a TCP method and a command execution method;

8. The Kubernetes container application health check system as claimed in claim 6, wherein: the container starting judgment module comprises the following contents:

a failure determination module: used for counting the number of consecutive detection failures and comparing it with the failure threshold; after failureThreshold consecutive failures the container is judged to have failed to start, the container is restarted or killed, and the flow ends; otherwise, the current start-up state ContainerStatus field of the container continues to be maintained.

9. The Kubernetes container application health check system as claimed in claim 6, wherein: in the container health check module, the survival probe and the ready probe are started at the same time as the start probe and remain in a suppressed state after being started, while monitoring the current start-up state ContainerStatus field of the container; when the container has started successfully and the ContainerStatus field has been updated from the initial value false to true, the suppression of the survival probe and the ready probe is released and each performs its corresponding health check on the container.

10. The Kubernetes container application health check system as claimed in claim 6, wherein: in the container health check module, the survival probe and the ready probe are started only after the container has started successfully and its current start-up state has been updated from the initial value false to true; each then performs its corresponding health check on the container.