CN111966467A

CN111966467A - Method and device for disaster recovery based on kubernetes container platform

Info

Publication number: CN111966467A
Application number: CN202010852391.5A
Authority: CN
Inventors: 刘娜
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2020-08-21
Filing date: 2020-08-21
Publication date: 2020-11-20
Anticipated expiration: 2040-08-21
Also published as: CN111966467B

Abstract

The invention provides a method and a device for disaster recovery based on a Kubernetes container platform. The method comprises the following steps: deploying a first backup agent component on a main Kubernetes container platform and a second backup agent component on a standby Kubernetes container platform; configuring the first backup agent component to monitor pod operation information in the main container platform and to synchronize it to the second backup agent component of the standby container platform as backup pod operation information, where both the pod operation information and the backup pod operation information include the correspondence between each pod and its mount volume in the main storage; and configuring the second backup agent component to monitor the health state of the first backup agent component and, when that health state is found to be abnormal, to create a new pod according to the backup pod operation information, the new pod automatically identifying its mount volume in the standby storage for data recovery. The standby storage is kept synchronized with the data in the main storage, and the mount volume information of the two is consistent.

Description

Method and device for disaster recovery based on kubernetes container platform

Technical Field

The invention belongs to the technical field of data center disaster recovery, and particularly relates to a method and a device for disaster recovery based on a Kubernetes container platform.

Background

As a container orchestration and management platform, Kubernetes has been adopted by many enterprises for its strong automated container operation and maintenance capabilities and its large-scale cluster resource management and scheduling capabilities. More and more systems use Kubernetes as a development platform on which to extend or run their own services, so its standing continues to rise as container technology develops.

Deploying multiple control nodes for Kubernetes within a single data center guarantees high availability of the cluster only in the single-data-center scenario. When the data center itself fails, for example through a machine room power failure, an earthquake, or a fire, all of its nodes go down and every service deployed on the Kubernetes cluster is interrupted. Most importantly, service data may be lost because no effective backup mechanism is available. Even if remote data backup is achieved with the backup mechanism of the storage hardware, such a hardware backup is only a raw copy of the file data and does not correspond to the way Kubernetes manages storage, so pods cannot read the backup data even if they are started in a new Kubernetes system.

This is a shortcoming of the prior art; it is therefore necessary to provide a method and a device for disaster recovery based on a Kubernetes container platform.

Disclosure of Invention

To address the defect in the prior art that, even when an existing Kubernetes-based data center performs hardware backup, the backup data cannot be read because no storage management mechanism accompanies it, the invention provides a method and a device for disaster recovery based on a Kubernetes container platform.

In a first aspect, the present invention provides a method for disaster recovery based on a Kubernetes container platform, comprising the following steps:

S1, deploying a first backup agent component on a main Kubernetes container platform, and deploying a second backup agent component on a standby Kubernetes container platform;

S2, configuring the first backup agent component to monitor pod operation information in the main Kubernetes container platform, and to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform to serve as backup pod operation information; the pod operation information and the backup pod operation information both include the correspondence between each pod and its mount volume in the main storage;

S3, configuring the second backup agent component to monitor the health state of the first backup agent component, and starting backup recovery when the health state of the first backup agent component is found to be abnormal;

S4, configuring the second backup agent component to create a new pod according to the backup pod operation information, the new pod automatically identifying its mount volume in the standby storage for data recovery; the standby storage is kept synchronized with the data in the main storage, and the mount volume information of the two is consistent.

Further, step S2 specifically includes the following steps:

S21, configuring the first backup agent component to monitor the operation information of each pod recorded by the main etcd component in the main Kubernetes container platform;

S22, configuring the first backup agent component to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform in an incremental synchronization mode, to serve as backup pod operation information. This ensures that the pod operation information held by the first backup agent component and the second backup agent component is consistent. The first backup agent component sends the pod operation information to the second backup agent component as JSON messages.
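By way of non-limiting illustration, the incremental synchronization of step S22 might be sketched in Python as follows. The message layout (`seq`, `changes`, `op`) and the function names are assumptions of this sketch and are not specified by the invention; the sketch only shows one way to send changes, rather than full snapshots, as JSON messages.

```python
import json


def diff_snapshots(previous, current):
    """Return the changes that turn `previous` into `current`.

    Both arguments map a pod name to its operation-information record (a dict).
    """
    changes = []
    for name, record in current.items():
        if name not in previous:
            changes.append({"op": "create", "pod": name, "record": record})
        elif previous[name] != record:
            changes.append({"op": "update", "pod": name, "record": record})
    for name in previous:
        if name not in current:
            changes.append({"op": "delete", "pod": name})
    return changes


def encode_message(changes, sequence):
    """Serialize one batch of changes as a JSON message (hypothetical layout)."""
    return json.dumps({"seq": sequence, "changes": changes}, sort_keys=True)


def apply_message(backup, message):
    """Apply a JSON message to the standby side's backup copy, in place."""
    for change in json.loads(message)["changes"]:
        if change["op"] == "delete":
            backup.pop(change["pod"], None)
        else:
            backup[change["pod"]] = change["record"]
    return backup
```

In this sketch the first backup agent component would call `diff_snapshots` and `encode_message`, and the second backup agent component would call `apply_message`; after each message is applied, both sides hold the same pod operation information.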

Further, the pod operation information also includes resource quota and image information; the resource quota comprises CPU and memory information;

the correspondence between the pod and the mount volume of the main storage includes the ID of the mount volume corresponding to the pod.
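For illustration only, one pod operation information record carrying the fields listed above could be modeled as follows. The class name, the field names, and the example quantities are assumptions of this sketch; the invention states only which kinds of information the record contains.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PodOperationInfo:
    """One pod's record as exchanged between the backup agent components."""
    name: str        # pod name (assumed identifier)
    image: str       # container image information
    cpu: str         # CPU part of the resource quota, e.g. "500m"
    memory: str      # memory part of the resource quota, e.g. "256Mi"
    volume_id: str   # ID of the mount volume corresponding to the pod

    def to_json(self):
        """Serialize the record for transmission as a JSON message."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild the record on the receiving side."""
        return cls(**json.loads(text))
```

A record serialized with `to_json` on the main side and rebuilt with `from_json` on the standby side compares equal to the original, which is the property the backup pod operation information relies on.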

Further, step S3 specifically includes the following steps:

S31, configuring the second backup agent component to monitor the health state of the first backup agent component in real time through a heartbeat mechanism;

S32, when the second backup agent component detects that it has failed to receive pod operation information from the first backup agent component and the failure has lasted longer than a set threshold, determining that the health state of the first backup agent component is abnormal;

S33, configuring the second backup agent component to start backup recovery.
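Steps S31 to S33 might be sketched as follows, as an illustration rather than a definitive implementation. The class name, the injectable `clock` parameter, and the `start_recovery` callback are assumptions of this sketch; the invention specifies only a heartbeat mechanism, a duration threshold, and the start of backup recovery.

```python
import time


class HeartbeatMonitor:
    """Tracks when the second agent last heard from the first agent."""

    def __init__(self, threshold_seconds, clock=time.monotonic):
        self.threshold = threshold_seconds
        self.clock = clock                  # injectable so the logic can be tested
        self.last_received = clock()
        self.recovery_started = False

    def record_message(self):
        """Call whenever pod operation information (or a heartbeat) arrives."""
        self.last_received = self.clock()

    def is_abnormal(self):
        """S32: abnormal once the silence has lasted longer than the threshold."""
        return self.clock() - self.last_received > self.threshold

    def poll(self, start_recovery):
        """S33: invoke `start_recovery` once, the first time the state is abnormal."""
        if not self.recovery_started and self.is_abnormal():
            self.recovery_started = True
            start_recovery()
        return self.recovery_started
```

The second backup agent component would call `poll` periodically. Guarding with `recovery_started` keeps recovery from being triggered a second time while the main platform remains unreachable.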

Further, step S4 specifically includes the following steps:

S41, configuring the second backup agent component to transmit the backup pod operation information to a standby user interface of the standby Kubernetes container platform;

S42, configuring the standby user interface to simulate the user creation process, creating a new pod according to the backup pod operation information, automatically identifying the mount volume associated with the new pod in the standby storage according to the mount volume ID, and recovering the data. The new pod on the standby Kubernetes container platform takes over the service of the pod on the main Kubernetes container platform.
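As a hedged illustration of step S42, the following sketch turns one backup record into a Pod manifest of the kind a user would submit. It assumes that the standby storage is exported over NFS and that each mount volume is a directory named by its ID under an export root; the `mount_path` field, the `nfs_server` and `export_root` parameters, and the function name are hypothetical. The sketch builds only the manifest and does not call any cluster API.

```python
def build_pod_manifest(record, nfs_server, export_root="/export"):
    """Build a Pod manifest (as a dict) from one backup pod operation record.

    `record` is assumed to contain: name, image, cpu, memory, volume_id, mount_path.
    """
    volume_name = "data"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": record["name"]},
        "spec": {
            "containers": [{
                "name": record["name"],
                "image": record["image"],
                # Resource quota restored from the backup record.
                "resources": {
                    "limits": {"cpu": record["cpu"], "memory": record["memory"]},
                },
                "volumeMounts": [
                    {"name": volume_name, "mountPath": record["mount_path"]},
                ],
            }],
            # The mount volume is located by its ID, which is identical on the
            # main storage and the standby storage.
            "volumes": [{
                "name": volume_name,
                "nfs": {
                    "server": nfs_server,
                    "path": f"{export_root}/{record['volume_id']}",
                },
            }],
        },
    }
```

Because the volume path is derived from the mount volume ID alone, the new pod reaches the synchronized copy of its data without any per-pod configuration on the standby side.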

Further, step S21 also includes: the main user interface in the main Kubernetes container platform obtains the user's pod creation, deletion, and change operations, manages the pods through the main lifecycle management component kubelet, and stores the pod operation information in the main etcd component;

step S2 further includes:

S23, configuring the main storage to synchronize mount volume data to the standby storage, and keeping the mount volume IDs of the main storage consistent with those of the standby storage;

in step S42, the standby user interface is configured to simulate the user creation process, and a new pod is created through the standby lifecycle management component kubelet according to the backup pod operation information. The main storage may synchronize the mount volume data to the standby storage in either a real-time copy mode or a timed copy mode.
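One pass of the timed copy mode of step S23 might look like the following sketch, which assumes that both storages are reachable as local directory trees and that each mount volume is a folder named by its ID. The function name and the directory layout are assumptions; real storage hardware would normally use its own replication facility.

```python
import shutil
from pathlib import Path


def sync_volumes(main_root, standby_root):
    """Copy every mount-volume folder from main to standby storage.

    Folder names (the mount volume IDs) are preserved, so an ID resolves to
    the same relative location on both sides. Returns the IDs that were copied.
    """
    main_root, standby_root = Path(main_root), Path(standby_root)
    synced = []
    for volume_dir in sorted(p for p in main_root.iterdir() if p.is_dir()):
        # dirs_exist_ok=True (Python 3.8+) lets repeated passes refresh the copy.
        shutil.copytree(volume_dir, standby_root / volume_dir.name, dirs_exist_ok=True)
        synced.append(volume_dir.name)
    return synced
```

A scheduler would call `sync_volumes` at a fixed interval. This simple pass overwrites changed files but does not remove files that were deleted on the main side, which a production replication scheme would also need to handle.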

In a second aspect, the present invention provides a device for disaster recovery based on a Kubernetes container platform, comprising:

a backup agent component setting module, used for deploying a first backup agent component on a main Kubernetes container platform and a second backup agent component on a standby Kubernetes container platform;

a pod operation information backup module, used for configuring the first backup agent component to monitor pod operation information in the main Kubernetes container platform and to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform to serve as backup pod operation information; the pod operation information and the backup pod operation information both include the correspondence between each pod and its mount volume in the main storage;

a health state monitoring module, used for configuring the second backup agent component to monitor the health state of the first backup agent component, and for starting backup recovery when the health state of the first backup agent component is found to be abnormal;

a pod and data recovery module, used for configuring the second backup agent component to create a new pod according to the backup pod operation information, the new pod automatically identifying its mount volume in the standby storage for data recovery; the standby storage is kept synchronized with the data in the main storage, and the mount volume information of the two is consistent.

Further, the pod operation information backup module comprises:

a pod operation information monitoring unit, used for configuring the first backup agent component to monitor the operation information of each pod recorded by the main etcd component in the main Kubernetes container platform;

a pod operation information backup unit, used for configuring the first backup agent component to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform in an incremental synchronization mode, to serve as backup pod operation information.

Further, the health state monitoring module comprises:

a backup agent component monitoring unit, used for configuring the second backup agent component to monitor the health state of the first backup agent component in real time through a heartbeat mechanism;

a health state abnormality determining unit, used for determining that the health state of the first backup agent component is abnormal when the second backup agent component detects that it has failed to receive pod operation information from the first backup agent component and the failure has lasted longer than a set threshold;

a backup recovery starting unit, used for configuring the second backup agent component to start backup recovery.

Further, the pod and data recovery module comprises:

a backup pod operation information transmission unit, used for configuring the second backup agent component to transmit the backup pod operation information to a standby user interface of the standby Kubernetes container platform;

a pod creation simulation unit, used for configuring the standby user interface to simulate the user creation process, creating a new pod according to the backup pod operation information, automatically identifying the mount volume associated with the new pod in the standby storage according to the mount volume ID, and recovering the data.

The beneficial effects of the invention are as follows.

The method and the device for disaster recovery based on a Kubernetes container platform add the association between each pod and its mount volume in storage on top of hardware data backup. The backup data is thereby associated with the pod, so the backed-up file data can be used directly when the pod is started on a new Kubernetes container platform, which overcomes the insufficient service management capability of the original Kubernetes container platform.

In addition, the invention has a reliable design principle, a simple structure, and a very wide application prospect.

Therefore, compared with the prior art, the invention has prominent substantive features and represents notable progress, and the beneficial effects of its implementation are evident.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a first schematic flow chart of the method of the present invention;

FIG. 2 is a second schematic flow chart of the method of the present invention;

FIG. 3 is a schematic diagram of the system of the present invention;

In the figures: 1 - backup agent component setting module; 2 - pod operation information backup module; 2.1 - pod operation information monitoring unit; 2.2 - pod operation information backup unit; 3 - health state monitoring module; 3.1 - backup agent component monitoring unit; 3.2 - health state abnormality determining unit; 3.3 - backup recovery starting unit; 4 - pod and data recovery module; 4.1 - backup pod operation information transmission unit; 4.2 - pod creation simulation unit.

Detailed Description

To help those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiments is described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

Pod: in a Kubernetes cluster, the Pod is the basis of all workload types and is a group of one or more containers. All containers in a Pod are co-located and co-scheduled for a particular application; the Pod acts as their logical host and contains the application containers that are related by business function.

kubelet: in a distributed cluster management system, a worker must run on each node to manage the container life cycle; in Kubernetes, this worker program is the kubelet.

etcd: an open-source project from CoreOS whose goal is to build a highly available distributed key-value database. etcd uses the Raft protocol as its consensus algorithm and is implemented in the Go language. It serves as the database of a Kubernetes system and stores all of the data generated while the system manages the cluster.
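To make the key-value model concrete, the following toy store is offered as an illustration only. It is an in-memory stand-in, not the etcd client API: the class and method names are assumptions of this sketch, and real etcd values are binary-encoded rather than Python dicts. The key layout `/registry/pods/<namespace>/<name>` follows the default prefix under which Kubernetes stores pod objects.

```python
class ToyKeyValueStore:
    """In-memory stand-in for a revisioned key-value store."""

    def __init__(self):
        self._data = {}
        self._log = []        # (revision, key, value), value is None for a delete
        self.revision = 0

    def put(self, key, value):
        """Store a value and return the revision assigned to this change."""
        self.revision += 1
        self._data[key] = value
        self._log.append((self.revision, key, value))
        return self.revision

    def delete(self, key):
        """Delete a key if present and return the current revision."""
        if key in self._data:
            self.revision += 1
            del self._data[key]
            self._log.append((self.revision, key, None))
        return self.revision

    def changes_since(self, revision, prefix=""):
        """Return every change after `revision` whose key starts with `prefix`."""
        return [e for e in self._log if e[0] > revision and e[1].startswith(prefix)]
```

A monitoring component that remembers the last revision it has processed can call `changes_since` with the pod prefix to pick up only new pod changes, which is the kind of access pattern that incremental synchronization depends on.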

Example 1:

As shown in FIG. 1, the present invention provides a method for disaster recovery based on a Kubernetes container platform, comprising the following steps:

In some embodiments, the first backup agent component and the second backup agent component are implemented in the Go language, which ensures interoperability between the components and the Kubernetes container platform.

Example 2:

As shown in FIG. 2, the present invention provides a method for disaster recovery based on a Kubernetes container platform, comprising the following steps:

S2, configuring the first backup agent component to monitor pod operation information in the main Kubernetes container platform, and to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform to serve as backup pod operation information; the pod operation information and the backup pod operation information both include the correspondence between each pod and its mount volume in the main storage; the pod operation information also includes resource quota and image information; the resource quota comprises CPU and memory information; the correspondence between the pod and the mount volume of the main storage includes the ID of the mount volume corresponding to the pod; the specific steps are as follows:

S22, configuring the first backup agent component to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform in an incremental synchronization mode, to serve as backup pod operation information;

S3, configuring the second backup agent component to monitor the health state of the first backup agent component, and starting backup recovery when the health state of the first backup agent component is found to be abnormal; the specific steps are as follows:

S33, configuring the second backup agent component to start backup recovery;

S4, configuring the second backup agent component to create a new pod according to the backup pod operation information, the new pod automatically identifying its mount volume in the standby storage for data recovery; the standby storage is kept synchronized with the data in the main storage, and the mount volume information of the two is consistent; the specific steps are as follows:

S42, configuring the standby user interface to simulate the user creation process, creating a new pod according to the backup pod operation information, automatically identifying the mount volume associated with the new pod in the standby storage according to the mount volume ID, and recovering the data.

In some embodiments, step S21 further includes: the main user interface in the main Kubernetes container platform obtains the user's pod creation, deletion, and change operations, manages the pods through the main lifecycle management component kubelet, and stores the pod operation information in the main etcd component.

In some embodiments, step S2 further includes:

S23, configuring the main storage to synchronize mount volume data to the standby storage, and keeping the mount volume IDs of the main storage consistent with those of the standby storage.

In some embodiments, in step S42, the standby user interface is configured to simulate the user creation process, and a new pod is created through the standby lifecycle management component kubelet according to the backup pod operation information.

When the Kubernetes cluster is deployed, storage uses a shared storage mode. Pods in the cluster store the business data generated while the business runs on shared storage devices by way of mount volumes. The associations between pods and mount volumes are stored as files under a designated directory on the control nodes. Each mount volume generates a folder named with a unique ID in the shared storage space, and while a pod runs, its data is stored under the mount volume folder associated with it.
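The storage layout just described can be illustrated with a short sketch. It assumes, purely for illustration, that the association file is a JSON object mapping pod names to mount volume IDs; the invention does not specify the file format, and the function names here are hypothetical.

```python
import json
from pathlib import Path


def load_associations(path):
    """Read the pod-to-mount-volume-ID mapping from an association file."""
    return json.loads(Path(path).read_text())


def volume_folder(storage_root, associations, pod_name):
    """Resolve the folder that holds a pod's data in the shared storage space.

    Each mount volume is a folder named with its unique ID under `storage_root`.
    """
    volume_id = associations[pod_name]
    folder = Path(storage_root) / volume_id
    if not folder.is_dir():
        raise FileNotFoundError(
            f"no folder for volume {volume_id!r} under {storage_root}"
        )
    return folder
```

Because the folder is addressed by the mount volume ID alone, the same lookup works against the standby storage as long as the IDs are kept consistent between the two storages.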

Example 3:

As shown in FIG. 3, the present invention provides a device for disaster recovery based on a Kubernetes container platform, comprising:

a backup agent component setting module 1, used for deploying a first backup agent component on a main Kubernetes container platform and a second backup agent component on a standby Kubernetes container platform;

a pod operation information backup module 2, used for configuring the first backup agent component to monitor pod operation information in the main Kubernetes container platform and to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform to serve as backup pod operation information; the pod operation information and the backup pod operation information both include the correspondence between each pod and its mount volume in the main storage; the pod operation information backup module 2 includes:

a pod operation information monitoring unit 2.1, used for configuring the first backup agent component to monitor the operation information of each pod recorded by the main etcd component in the main Kubernetes container platform;

a pod operation information backup unit 2.2, used for configuring the first backup agent component to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform in an incremental synchronization mode, to serve as backup pod operation information;

a health state monitoring module 3, used for configuring the second backup agent component to monitor the health state of the first backup agent component, and for starting backup recovery when the health state of the first backup agent component is found to be abnormal; the health state monitoring module 3 includes:

a backup agent component monitoring unit 3.1, used for configuring the second backup agent component to monitor the health state of the first backup agent component in real time through a heartbeat mechanism;

a health state abnormality determining unit 3.2, used for determining that the health state of the first backup agent component is abnormal when the second backup agent component detects that it has failed to receive pod operation information from the first backup agent component and the failure has lasted longer than a set threshold;

a backup recovery starting unit 3.3, used for configuring the second backup agent component to start backup recovery;

a pod and data recovery module 4, used for configuring the second backup agent component to create a new pod according to the backup pod operation information, the new pod automatically identifying its mount volume in the standby storage for data recovery; the standby storage is kept synchronized with the data in the main storage, and the mount volume information of the two is consistent; the pod and data recovery module 4 includes:

a backup pod operation information transmission unit 4.1, used for configuring the second backup agent component to transmit the backup pod operation information to a standby user interface of the standby Kubernetes container platform;

a pod creation simulation unit 4.2, used for configuring the standby user interface to simulate the user creation process, creating a new pod according to the backup pod operation information, automatically identifying the mount volume associated with the new pod in the standby storage according to the mount volume ID, and recovering the data.

Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, it is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments without departing from the spirit and scope of the present invention, and any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein falls within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A method for disaster recovery based on a Kubernetes container platform, characterized by comprising the following steps:

2. The method of claim 1, wherein step S2 is as follows:

S22, configuring the first backup agent component to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform in an incremental synchronization mode, to serve as backup pod operation information.

3. The method of claim 2, wherein the pod operation information further includes resource quota and image information; the resource quota comprises CPU and memory information;

4. The method of claim 1, wherein step S3 is as follows:

S33, configuring the second backup agent component to start backup recovery.

5. The method of claim 3, wherein step S4 is as follows:

6. The method according to claim 5, wherein step S21 further includes: the main user interface in the main Kubernetes container platform obtains the user's pod creation, deletion, and change operations, manages the pods through the main lifecycle management component kubelet, and stores the pod operation information in the main etcd component;

step S2 further includes:

in step S42, the standby user interface is configured to simulate the user creation process, and a new pod is created through the standby lifecycle management component kubelet according to the backup pod operation information.

7. A device for disaster recovery based on a Kubernetes container platform, characterized by comprising:

a backup agent component setting module (1), used for deploying a first backup agent component on a main Kubernetes container platform and a second backup agent component on a standby Kubernetes container platform;

a pod operation information backup module (2), used for configuring the first backup agent component to monitor pod operation information in the main Kubernetes container platform and to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform to serve as backup pod operation information; the pod operation information and the backup pod operation information both include the correspondence between each pod and its mount volume in the main storage;

a health state monitoring module (3), used for configuring the second backup agent component to monitor the health state of the first backup agent component, and for starting backup recovery when the health state of the first backup agent component is found to be abnormal;

a pod and data recovery module (4), used for configuring the second backup agent component to create a new pod according to the backup pod operation information, the new pod automatically identifying its mount volume in the standby storage for data recovery; the standby storage is kept synchronized with the data in the main storage, and the mount volume information of the two is consistent.

8. The device according to claim 7, wherein the pod operation information backup module (2) comprises:

a pod operation information monitoring unit (2.1), used for configuring the first backup agent component to monitor the operation information of each pod recorded by the main etcd component in the main Kubernetes container platform;

a pod operation information backup unit (2.2), used for configuring the first backup agent component to synchronize the pod operation information to the second backup agent component of the standby Kubernetes container platform in an incremental synchronization mode, to serve as backup pod operation information.

9. The device according to claim 7, wherein the health state monitoring module (3) comprises:

a backup agent component monitoring unit (3.1), used for configuring the second backup agent component to monitor the health state of the first backup agent component in real time through a heartbeat mechanism;

a health state abnormality determining unit (3.2), used for determining that the health state of the first backup agent component is abnormal when the second backup agent component detects that it has failed to receive pod operation information from the first backup agent component and the failure has lasted longer than a set threshold;

a backup recovery starting unit (3.3), used for configuring the second backup agent component to start backup recovery.

10. The device according to claim 7, wherein the pod and data recovery module (4) comprises:

a backup pod operation information transmission unit (4.1), used for configuring the second backup agent component to transmit the backup pod operation information to a standby user interface of the standby Kubernetes container platform;

a pod creation simulation unit (4.2), used for configuring the standby user interface to simulate the user creation process, creating a new pod according to the backup pod operation information, automatically identifying the mount volume associated with the new pod in the standby storage according to the mount volume ID, and recovering the data.