WO2024103594A1 - Container disaster recovery method, system, apparatus, device and computer-readable storage medium - Google Patents

Container disaster recovery method, system, apparatus, device and computer-readable storage medium

Info

Publication number
WO2024103594A1
Authority
WO
WIPO (PCT)
Prior art keywords
container
disaster recovery
storage system
cluster
data
Application number
PCT/CN2023/084590
Other languages
English (en)
French (fr)
Inventor
郭春庭
Original Assignee
济南浪潮数据技术有限公司
Application filed by 济南浪潮数据技术有限公司
Publication of WO2024103594A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation

Definitions

  • the present application relates to the technical field of disaster recovery, and in particular to a container disaster recovery method, system, device, equipment and computer-readable storage medium.
  • container-based applications are increasingly being adopted in enterprises, from non-core businesses to core businesses, and from stateless applications to stateful applications.
  • the core of this transformation is that container applications use more data persistence, and data persistence within the cluster is prone to disasters. At this time, it is necessary to build disaster recovery capabilities for container applications.
  • the present application provides a container disaster recovery method, which is applied to a first container cluster, including:
  • the container application metadata is copied to obtain first disaster recovery data, and the first disaster recovery data is stored in the first storage system, so that the second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain the reconstructed container application;
  • the container service data is copied to obtain second disaster recovery data, and the second disaster recovery data is stored in the second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application to obtain the restored container service.
  • copying the container application metadata to obtain first disaster recovery data includes:
  • the container application metadata is fully replicated to obtain the first disaster recovery data
  • copying the container application metadata to obtain first disaster recovery data, and storing the first disaster recovery data in a first storage system includes:
  • the container application metadata of the container to be protected is copied to obtain first disaster recovery data, and the first disaster recovery data is added to a preset protection unit; in the preset protection unit, the identification information and the first disaster recovery data are stored correspondingly;
  • the preset protection unit is stored in the first storage system.
  • storing the second disaster recovery data in the second storage system includes:
  • the replication mode is centralized storage replication
  • the second disaster recovery data is stored in the centralized storage system of the second container cluster
  • a remote replication relationship is established between the centralized storage system of the first container cluster and the centralized storage system of the second container cluster, and the centralized storage system of the second container cluster is the second storage system;
  • the replication mode is distributed storage replication
  • the second disaster recovery data is stored in the distributed storage system
  • the distributed storage system is the second storage system
  • the second disaster recovery data is stored in the form of file blocks in the object storage system; the object storage system is the second storage system.
  • the present application provides another container disaster recovery method, which is applied to a second container cluster, including:
  • first disaster recovery data is retrieved from the first storage system, and the container application is rebuilt using the first disaster recovery data to obtain a rebuilt container application; wherein the first disaster recovery data is obtained by the first container cluster replicating its own container application metadata;
  • the second disaster recovery data is retrieved from the second storage system, and the container service is restored in the reconstructed container application using the second disaster recovery data to obtain the restored container service; wherein the second disaster recovery data is obtained by the first container cluster replicating its own container service data.
  • retrieving first disaster recovery data from a first storage system and using the first disaster recovery data to rebuild a container application to obtain a rebuilt container application includes:
  • the container application is rebuilt using the first disaster recovery data and each container application image to obtain a rebuilt container application.
  • retrieving the second disaster recovery data from the second storage system includes:
  • the second disaster recovery data is retrieved from the centralized storage system of the second container cluster.
  • the centralized storage system of the second container cluster has a remote replication relationship with the centralized storage system of the first container cluster.
  • the centralized storage system of the second container cluster is the second storage system.
  • the second disaster recovery data is retrieved from the distributed storage system;
  • the distributed storage system is the second storage system;
  • the second disaster recovery data in the form of file blocks is retrieved from the object storage system; the object storage system is the second storage system.
  • the present application provides another container disaster recovery method, which is applied to a container management platform, including:
  • the container cluster to be protected is configured according to the preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
  • a disaster recovery command is issued to the second container cluster, so that the second container cluster responds to the disaster recovery command and performs disaster recovery using the backup data.
  • before sending the disaster recovery command to the second container cluster, the method further includes:
  • a shutdown instruction is issued to the first container cluster to stop the operation of each container application in the first container cluster.
  • the container disaster recovery method further includes:
  • the backup disaster recovery information is stored in the platform storage system.
  • the present application further discloses a container disaster recovery system, including:
  • the container management platform is used to send a disaster recovery backup command to the first container cluster and send a disaster recovery command to the second container cluster;
  • the first container cluster is used to perform disaster recovery backup according to the disaster recovery backup command to obtain backup data
  • the second container cluster is used to respond to the disaster recovery command and perform disaster recovery using the backup data.
  • the present application further discloses a container disaster recovery device, which is applied to a first container cluster, and includes:
  • a backup command receiving module is used to receive the disaster recovery backup command issued by the container management platform
  • a first replication module is used to respond to the disaster recovery backup command, replicate the container application metadata to obtain first disaster recovery data, and store the first disaster recovery data in the first storage system, so that the second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain the reconstructed container application;
  • the second replication module is used to replicate the container business data to obtain second disaster recovery data, and store the second disaster recovery data in the second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container business in the rebuilt container application to obtain the restored container business.
  • the present application further discloses another container disaster recovery device, which is applied to a second container cluster, including:
  • a recovery command receiving module is used to receive disaster recovery commands issued by the container management platform
  • a container application reconstruction module which is used to respond to the disaster recovery command, retrieve the first disaster recovery data from the first storage system, and use the first disaster recovery data to rebuild the container application to obtain a reconstructed container application; wherein the first disaster recovery data is obtained by the first container cluster replicating its own container application metadata;
  • the container business recovery module is used to retrieve the second disaster recovery data from the second storage system, and use the second disaster recovery data to recover the container business in the rebuilt container application to obtain the recovered container business; wherein the second disaster recovery data is obtained by the first container cluster copying its own container business data.
  • the present application also discloses another container disaster recovery device, which is applied to a container management platform, including:
  • a container cluster configuration module used to configure the container cluster to be protected according to preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
  • a first command issuing module is used to issue a disaster recovery backup command to the first container cluster, so that the first container cluster performs disaster recovery backup according to the disaster recovery backup command to obtain backup data;
  • the second command issuing module is used to issue a disaster recovery command to the second container cluster, so that the second container cluster responds to the disaster recovery command and performs disaster recovery using the backup data.
  • the present application further discloses a container disaster recovery device, including:
  • a processor is used to implement the steps of any one of the above container disaster recovery methods when executing a computer program.
  • the present application further discloses a non-volatile computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of any of the above container disaster recovery methods are implemented.
  • a container disaster recovery solution across container clusters is implemented by constructing a primary and backup container cluster and a container management platform.
  • One container cluster is used for normal business processing and responds to the command of the container management platform for disaster recovery backup.
  • the container application metadata and container business data generated by itself are backed up and stored; the other container cluster responds to the command of the container management platform for disaster recovery.
  • the backup data of the previous container cluster can be directly called to rebuild the container application and restore the container business. In this way, efficient and flexible container disaster recovery is achieved, which can effectively ensure the rapid recovery of container business.
  • FIG. 1 is a schematic diagram of the structure of a container disaster recovery system provided in some embodiments of the present application.
  • FIG. 2 is a schematic diagram of a process of a container disaster recovery method provided in some embodiments of the present application.
  • FIG. 3 is a schematic diagram of a flow chart of another container disaster recovery method provided in some embodiments of the present application.
  • FIG. 4 is a schematic diagram of a flow chart of another container disaster recovery method provided in some embodiments of the present application.
  • FIG. 5 is a working principle diagram of a disaster recovery protection unit state machine provided in some embodiments of the present application.
  • FIG. 6 is a schematic diagram of the structure of another container disaster recovery system provided in some embodiments of the present application.
  • FIG. 7 is a schematic diagram of a process flow of a container disaster recovery device provided in some embodiments of the present application.
  • FIG. 8 is a schematic diagram of a process flow of another container disaster recovery device provided in some embodiments of the present application.
  • FIG. 9 is a schematic diagram of a process flow of another container disaster recovery device provided by the present application in some embodiments.
  • FIG. 10 is a schematic diagram of the structure of a container disaster recovery device provided in some embodiments of the present application.
  • FIG. 11 is a schematic diagram of the structure of a non-volatile computer-readable storage medium provided in some embodiments of the present application.
  • the container disaster recovery method is applied to a container disaster recovery system.
  • Referring to FIG. 1, a schematic diagram of the structure of a container disaster recovery system provided in some embodiments of the present application is shown.
  • the container disaster recovery system includes a first container cluster 100, a second container cluster 200, and a container management platform 300.
  • the first container cluster 100 and the second container cluster 200 are deployed remotely.
  • the remote distance can be reasonably selected according to the bandwidth and latency requirements of the business.
  • the management platform 300 can be deployed at a third party location, or deployed together with one of the two container clusters.
  • the first container cluster 100 is the main container cluster, used to perform disaster recovery backup;
  • the second container cluster 200 is the backup container cluster, used to achieve disaster recovery (this is just an example, the first container cluster 100 and the second container cluster 200 can be the main and backup container clusters to each other);
  • the container management platform 300 is used to achieve container cluster management. Based on this container disaster recovery system, when a container cluster fails, the container application deployed on it will be switched to another container cluster to continue to provide services, thereby achieving the effect of disaster recovery.
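  • The division of roles among the three components can be illustrated with a minimal sketch. The class names `Cluster` and `ManagementPlatform`, the command strings, and the role swap after a switch are assumptions made for illustration only; the swap is one possible reading of the statement that the two clusters can act as main and backup for each other, and the `planned` flag mirrors the shutdown instruction issued before a recovery command.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a container cluster."""
    name: str
    running: bool = True
    log: list = field(default_factory=list)

    def handle(self, command: str) -> None:
        # Record the command; a real cluster would act on it.
        self.log.append(command)
        if command == "shutdown":
            self.running = False


class ManagementPlatform:
    """Hypothetical container management platform (300 in FIG. 1)."""

    def __init__(self, primary: Cluster, backup: Cluster):
        self.primary, self.backup = primary, backup

    def start_protection(self) -> None:
        # Ask the main cluster to start backing up its data.
        self.primary.handle("disaster_recovery_backup")

    def switch_over(self, planned: bool) -> None:
        # Planned switch: stop applications on the main cluster first.
        # Unplanned switch: the main cluster is assumed to have failed.
        if planned:
            self.primary.handle("shutdown")
        self.backup.handle("disaster_recovery")
        # Assumption: the roles swap once the switch has been made.
        self.primary, self.backup = self.backup, self.primary


a, b = Cluster("cluster-100"), Cluster("cluster-200")
platform = ManagementPlatform(a, b)
platform.start_protection()
platform.switch_over(planned=True)
print(a.log, b.log, platform.primary.name)
```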
  • Referring to FIG. 2, a flow chart of a container disaster recovery method provided in some embodiments of the present application is shown.
  • the container disaster recovery method can be applied to a first container cluster, and includes the following S101 to S103.
  • In S101, a disaster recovery backup command is obtained; the command is issued by the container management platform to instruct the first container cluster to perform a disaster recovery backup operation.
  • the disaster recovery backup command can be issued to the first container cluster while the first container cluster is started, so that the first container cluster can perform the disaster recovery backup operation while entering the running state.
  • S102: the container application metadata is copied to obtain first disaster recovery data, and the first disaster recovery data is stored in the first storage system, so that the second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application to obtain a reconstructed container application;
  • This step can be used to implement the replication and storage of container application metadata.
  • After receiving the disaster recovery backup command issued by the container management platform, the first container cluster can immediately respond to the command and copy the container application metadata generated during its operation; the copied container application metadata is the above-mentioned first disaster recovery data, which is stored in the first storage system.
  • the container application metadata is the metadata information of each container application in the first container cluster (it may be all container applications, or it may be a specified part of container applications, which can be determined by analyzing the disaster recovery backup command).
  • the first disaster recovery data can be directly retrieved from the first storage system. Since the container application metadata is the metadata information of each container application in the first container cluster, and the first disaster recovery data is obtained by copying the container application metadata, the second container cluster can use the first disaster recovery data to rebuild the container application and obtain the above-mentioned reconstructed container application.
  • the first storage system can adopt an object storage system.
  • S103: Copy the container service data to obtain second disaster recovery data, and store the second disaster recovery data in the second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to recover the container service in the reconstructed container application, and obtain the recovered container service.
  • This step can be used to implement the replication and storage of container business data.
  • After receiving the disaster recovery backup command issued by the container management platform, the first container cluster can immediately respond to the command and copy the container business data generated during its operation; the copied container business data is the above-mentioned second disaster recovery data, which is stored in the second storage system.
  • the container business data is the ongoing business data information of each container application in the first container cluster (it may be all container applications, or it may be a specified part of container applications, which can be determined by analyzing the disaster recovery backup command).
  • the second disaster recovery data can be directly retrieved from the second storage system. Since the container business data is the container business data of each container application in the first container cluster, and the second disaster recovery data is obtained by copying the container business data, the second container cluster can use the second disaster recovery data to restore the container business in the rebuilt container application and obtain the restored container business. At this point, the business switching between the first container cluster and the second container cluster is completed.
  • the second storage system can adopt an object storage system, a distributed storage system, or a centralized storage system.
  • first storage system and the second storage system can be the same storage system or different storage systems, and this application does not limit this.
  • the execution order of the replication storage of container application metadata in S102 and the replication storage of container business data in S103 is not unique. To ensure work efficiency, the two can be executed at the same time. The replication storage operations of the two can be executed periodically or in real time, and this application also does not limit this.
  • the container disaster recovery method realizes a container disaster recovery solution across container clusters by constructing a primary and backup container cluster and a container management platform.
  • One container cluster is used for normal business processing and performs disaster recovery backup in response to the command of the container management platform.
  • the container application metadata and container business data generated by itself are backed up and stored; the other container cluster responds to the command of the container management platform for disaster recovery.
  • the backup data of the previous container cluster can be directly called to rebuild the container application and restore the container business. In this way, efficient and flexible container disaster recovery is realized, which can effectively ensure the rapid recovery of container business.
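  • The two replication steps S102 and S103 can be sketched as follows. This is a minimal illustration and not the claimed implementation: the function name `disaster_recovery_backup`, the dictionary layout of the cluster state, and the use of plain dictionaries as storage systems are all assumptions.

```python
import copy


def disaster_recovery_backup(cluster_state, first_storage, second_storage):
    """Sketch of S102 and S103 on the first (main) container cluster."""
    # S102: copy container application metadata -> first disaster recovery data.
    first_dr_data = copy.deepcopy(cluster_state["app_metadata"])
    first_storage["first_dr_data"] = first_dr_data

    # S103: copy container business data -> second disaster recovery data.
    second_dr_data = copy.deepcopy(cluster_state["business_data"])
    second_storage["second_dr_data"] = second_dr_data


state = {
    "app_metadata": {"web": {"kind": "Deployment", "replicas": 2}},
    "business_data": {"web": {"orders": [1, 2, 3]}},
}
metadata_store, data_store = {}, {}
disaster_recovery_backup(state, metadata_store, data_store)

# Later changes on the main cluster do not alter the stored copies.
state["business_data"]["web"]["orders"].append(4)
print(data_store["second_dr_data"]["web"]["orders"])
```

  • Passing the same dictionary for both storage arguments models the case in which the first and second storage systems are the same system.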
  • the above-mentioned copying of the container application metadata to obtain the first disaster recovery data may include the following steps:
  • the number of replications in the first container cluster is obtained; when the number of replications is zero, the container application metadata is fully replicated to obtain the first disaster recovery data; when the number of replications is not zero, the container application metadata is incrementally replicated to obtain the first disaster recovery data.
  • the container application metadata therein may or may not change, which is determined by the container business performed therein. Based on this, in order to effectively reduce the amount of copied data, save resources such as network bandwidth, and improve backup efficiency, a full copy can be performed during the initial backup, and an incremental copy can be performed during non-initial backups.
  • the first container cluster can accumulate and save the number of data replications it has performed in real time.
  • When the container application metadata needs to be replicated, the first container cluster can first determine whether its recorded number of replications is zero. If it is zero, this replication is the first backup, and the container application metadata can be fully replicated; if it is not zero, this replication is not the first backup, and the container application metadata can be incrementally replicated.
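  • The choice between full and incremental replication can be sketched as below. The function name `replicate_metadata`, the snapshot comparison used to find changes, and the explicit list of removed entries are assumptions; the source only states that the replication count decides which kind of copy is made.

```python
def replicate_metadata(current, last_snapshot, replication_count):
    """Return (mode, first_dr_data) for one backup round."""
    if replication_count == 0:
        # First backup: copy everything.
        return "full", dict(current)
    # Later backups: copy only entries that were added or changed.
    changed = {k: v for k, v in current.items() if last_snapshot.get(k) != v}
    # Assumption: deletions are recorded so the backup side can drop them.
    removed = [k for k in last_snapshot if k not in current]
    return "incremental", {"changed": changed, "removed": removed}


count, snapshot = 0, {}
metadata = {"web": "v1", "db": "v1"}
mode1, data1 = replicate_metadata(metadata, snapshot, count)
count, snapshot = count + 1, dict(metadata)

metadata = {"web": "v2", "cache": "v1"}
mode2, data2 = replicate_metadata(metadata, snapshot, count)
print(mode1, data1)
print(mode2, data2)
```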
  • the above-mentioned copying of the container application metadata to obtain the first disaster recovery data, and storing the first disaster recovery data in the first storage system may include the following steps:
  • a method for implementing replication and storage of container application metadata is provided.
  • the object of disaster recovery protection is a container application
  • a container application includes various types of resources in the cluster, such as Deployment (a resource type in k8s (Kubernetes) for stateless applications), StatefulSet (a resource type in k8s for stateful applications), PVC (PersistentVolumeClaim, a resource type in k8s for claiming container persistent volumes), and other resources (the k8s container cluster is taken as an example here);
  • each resource includes multiple resource instances. Therefore, a disaster recovery protection unit can be designed for the container application, which can protect the data consistently.
  • the disaster recovery backup command may be parsed first to determine the container applications in the first container cluster that need to be backed up for disaster recovery, i.e., the container applications to be protected (which may be all container applications in the first container cluster, or only some of them); secondly, the identification information of each container application to be protected is added to a preset protection unit.
  • This process may add container application identification information one by one, or may add identification information by namespace in the cluster, in which case the identification information of all container applications in the namespace is added to the preset protection unit.
  • the identification information of a container application must be unique; it may be a unique code, a unique name, an ID number, etc.; further, the container application metadata of each container application to be protected is copied to obtain first disaster recovery data, and the first disaster recovery data is stored in the preset protection unit in correspondence with the identification information, i.e., each piece of identification information and its first disaster recovery data correspond to the same container application to be protected; finally, the preset protection unit is stored in the first storage system to realize disaster recovery backup of the container application metadata.
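  • A protection unit of this kind can be sketched as a mapping from identification information to the copied metadata. The class name `ProtectionUnit`, its method names, and the `"namespace/name"` style identifiers are hypothetical; the sketch only illustrates one-by-one and per-namespace registration followed by storage of the whole unit.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionUnit:
    """Maps each protected application's identification info to its metadata copy."""
    entries: dict = field(default_factory=dict)

    def add_application(self, app_id):
        # Identification information must be unique within the unit.
        if app_id in self.entries:
            raise ValueError(f"duplicate identification info: {app_id}")
        self.entries[app_id] = None  # metadata copy not taken yet

    def add_namespace(self, cluster_apps, namespace):
        # Register every application that belongs to the namespace.
        for app_id, app in cluster_apps.items():
            if app["namespace"] == namespace:
                self.add_application(app_id)

    def copy_metadata(self, cluster_apps):
        # Store each metadata copy next to its identification info.
        for app_id in self.entries:
            self.entries[app_id] = dict(cluster_apps[app_id]["metadata"])


apps = {
    "shop/web": {"namespace": "shop", "metadata": {"kind": "Deployment"}},
    "shop/db": {"namespace": "shop", "metadata": {"kind": "StatefulSet"}},
    "ops/mon": {"namespace": "ops", "metadata": {"kind": "Deployment"}},
}
unit = ProtectionUnit()
unit.add_namespace(apps, "shop")
unit.copy_metadata(apps)

# The whole unit is written to the first storage system (a dict here).
first_storage = {"protection-unit-1": unit.entries}
print(sorted(first_storage["protection-unit-1"]))
```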
  • storing the second disaster recovery data in the second storage system may include the following steps:
  • the second container cluster can be designed to support multiple backend storage disaster recovery methods, including centralized storage, distributed storage, and local storage.
  • When the first container cluster replicates and stores container business data, it can support three implementation methods: centralized storage replication, distributed storage replication, and local storage replication.
  • the second container cluster can support multiple backend storage disaster recovery methods that can be implemented using different plug-ins.
  • the currently specified replication mode can be determined according to the disaster recovery backup command, and then the second disaster recovery data can be stored in different storage systems according to different replication modes.
  • the second storage system may be a centralized storage system of the second container cluster.
  • corresponding centralized storage systems can be constructed for the first container cluster and the second container cluster, and a remote replication relationship can be established between the two to achieve remote synchronous replication between the two. Therefore, after the container business data is replicated to obtain the second disaster recovery data, it can be stored in the centralized storage system of the second container cluster through the remote replication relationship between the two centralized storage systems, so that the second container cluster can directly call it.
  • the second storage system can be a distributed storage system.
  • a distributed storage system can be created in advance, and both the first container cluster and the second container cluster can access its data. Therefore, after the container business data is replicated to obtain the second disaster recovery data, it can be directly stored in the distributed storage system for the second container cluster to call. It should be noted that the implementation of this process depends on the multi-copy mechanism of the distributed storage system.
  • the second storage system may be an object storage system.
  • an object storage system may be created in advance, and both the first container cluster and the second container cluster may access data thereon. Therefore, after the container service data is replicated to obtain the second disaster recovery data, it may be stored in the object storage system in the form of file blocks so that the second container cluster can call it.
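  • Selecting a storage target by replication mode can be sketched as a small plug-in table. The `PLUGINS` registry, the dictionary keys standing in for storage systems, the `replication_mode` field of the command, and the block size are all assumptions; remote replication between the two centralized systems is imitated by a second write.

```python
def store_centralized(data, systems):
    # Write to the main cluster's array; the second write stands in for
    # the remote replication relationship between the two arrays.
    systems["centralized_primary"]["dr"] = data
    systems["centralized_backup"]["dr"] = data


def store_distributed(data, systems):
    # Both clusters can reach the same distributed storage system.
    systems["distributed"]["dr"] = data


def store_object(data, systems, block_size=4):
    # Split the payload into fixed-size file blocks (size chosen arbitrarily).
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    systems["object"]["dr"] = blocks


PLUGINS = {
    "centralized": store_centralized,
    "distributed": store_distributed,
    "object": store_object,
}


def store_second_dr_data(command, data, systems):
    # The replication mode is read from the disaster recovery backup command.
    mode = command["replication_mode"]
    if mode not in PLUGINS:
        raise ValueError(f"unsupported replication mode: {mode}")
    PLUGINS[mode](data, systems)


systems = {
    "centralized_primary": {},
    "centralized_backup": {},
    "distributed": {},
    "object": {},
}
store_second_dr_data({"replication_mode": "object"}, b"order-0001", systems)
print(systems["object"]["dr"])
```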
  • Referring to FIG. 3, a flow chart of another container disaster recovery method provided in some embodiments of the present application is shown.
  • the container disaster recovery method can be applied to the second container cluster, and includes the following S201 to S203.
  • In S201, a disaster recovery command is obtained; the command is issued by the container management platform to instruct the second container cluster to perform a disaster recovery operation.
  • the disaster recovery command can be a planned command or an unplanned command.
  • the planned disaster recovery command is used to implement business switching between normal container clusters, and the unplanned disaster recovery command is used to implement business switching when a container cluster fails.
  • S202: In response to the disaster recovery command, retrieve first disaster recovery data from the first storage system, and use the first disaster recovery data to rebuild the container application to obtain a rebuilt container application; wherein the first disaster recovery data is obtained by the first container cluster replicating its own container application metadata;
  • This step can be used to achieve container application reconstruction.
  • After receiving the disaster recovery command issued by the container management platform, the second container cluster can immediately respond to the command and retrieve the first disaster recovery data from the first storage system.
  • the first disaster recovery data is obtained by the first container cluster copying its own container application metadata, and the container application metadata is the metadata information of the container application in the first container cluster. Therefore, the second container cluster can directly use the first disaster recovery data to rebuild the container application and obtain the above-mentioned reconstructed container application.
  • S203: Retrieve second disaster recovery data from the second storage system, and use the second disaster recovery data to recover the container service in the reconstructed container application to obtain the recovered container service; wherein the second disaster recovery data is obtained by the first container cluster replicating its own container service data.
  • This step can be used to implement container business recovery. After the container application is rebuilt, the container business can be restored to effectively avoid container business interruption.
  • the second disaster recovery data can be directly retrieved from the second storage system.
  • the second disaster recovery data is obtained by the first container cluster copying its own container business data, and the container business data is the business data information of the container application in the first container cluster. Therefore, the second container cluster can directly use the second disaster recovery data to restore the container business and obtain the restored container business.
  • the container disaster recovery method realizes a container disaster recovery solution across container clusters by constructing a primary and backup container cluster and a container management platform.
  • One container cluster is used for normal business processing and performs disaster recovery backup in response to the command of the container management platform.
  • the container application metadata and container business data generated by itself are backed up and stored; the other container cluster responds to the command of the container management platform for disaster recovery.
  • the backup data of the previous container cluster can be directly called to rebuild the container application and restore the container business. In this way, efficient and flexible container disaster recovery is realized, which can effectively ensure the rapid recovery of container business.
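  • The ordering of S202 and S203 on the second container cluster can be sketched as follows: applications are rebuilt first, and business data is then restored into the rebuilt applications. The function name `disaster_recovery`, the storage keys, and the error raised when business data has no rebuilt application are assumptions made for illustration.

```python
def disaster_recovery(first_storage, second_storage):
    """Sketch of S202 and S203 on the second (backup) container cluster."""
    # S202: rebuild each container application from its metadata copy.
    rebuilt = {
        app_id: {"metadata": dict(meta), "business_data": None}
        for app_id, meta in first_storage["first_dr_data"].items()
    }
    # S203: restore business data into the applications rebuilt above.
    for app_id, data in second_storage["second_dr_data"].items():
        if app_id not in rebuilt:
            raise KeyError(f"no rebuilt application for {app_id}")
        rebuilt[app_id]["business_data"] = data
    return rebuilt


first_storage = {"first_dr_data": {"web": {"kind": "Deployment"}}}
second_storage = {"second_dr_data": {"web": {"orders": [1, 2, 3]}}}
recovered = disaster_recovery(first_storage, second_storage)
print(recovered["web"])
```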
  • retrieving the first disaster recovery data from the first storage system and using the first disaster recovery data to rebuild the container application to obtain the reconstructed container application may include:
  • the container application is rebuilt using the first disaster recovery data and each container application image to obtain a rebuilt container application.
  • a method for implementing reconstruction of a container application is provided.
  • a backup method with the protection unit as a whole can be adopted, and in the protection unit, the identification information of the container application and the first disaster recovery data are stored correspondingly.
  • the identification information of the container application to be protected can be first retrieved from the first storage system; in some embodiments, it can be retrieved from a protection unit in a storage system, and then the container application image corresponding to each identification information is pulled from the container management platform, and the container application image is used to realize the corresponding container application reconstruction, wherein the container management platform pre-stores the image data of each container application in each main container cluster (which may refer to the first container cluster).
  • the first disaster recovery data is retrieved from the first storage system, which can also be retrieved from a protection unit in a storage system; thus, the container application image and the container application metadata are combined to achieve container application reconstruction and obtain a reconstructed container application.
  • each container application image and the corresponding container application metadata can be first distributed to each cluster node in the first container cluster, and then the container application is reconstructed on the cluster node.
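  • The reconstruction described above can be sketched roughly as follows. This is a minimal illustration only, not the implementation of the present application: the `ProtectionUnit` class, the `rebuild_applications` function and the `pull_image` callback (standing in for the container management platform's image service) are hypothetical names, and distribution of images to cluster nodes is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProtectionUnit:
    """Hypothetical protection unit: identification info -> backed-up app metadata."""
    entries: Dict[str, dict] = field(default_factory=dict)


def rebuild_applications(unit: ProtectionUnit,
                         pull_image: Callable[[str], str]) -> List[dict]:
    """Pair each pulled image with its backed-up metadata to form a rebuilt app."""
    rebuilt = []
    for identification, metadata in unit.entries.items():
        # pull_image is an assumed stand-in for the container management platform.
        image = pull_image(identification)
        rebuilt.append({"id": identification, "image": image, "spec": dict(metadata)})
    return rebuilt
```

  • Because the identification information and the first disaster recovery data are stored correspondingly in the protection unit, a single lookup yields both the image to pull and the metadata to replay.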
  • retrieving the second disaster recovery data from the second storage system may include the following steps:
  • the second disaster recovery data is retrieved from the centralized storage system of the second container cluster.
  • the centralized storage system of the second container cluster has a remote replication relationship with the centralized storage system of the first container cluster.
  • the centralized storage system of the second container cluster is the second storage system.
  • the second disaster recovery data is retrieved from the distributed storage system;
  • the distributed storage system is the second storage system;
  • the second disaster recovery data in the form of file blocks is retrieved from the object storage system; the object storage system is the second storage system.
  • the second container cluster can be designed to support multiple backend storage disaster recovery methods, including centralized storage, distributed storage, and local storage. Then, for different types of storage systems, different implementation methods can be used when retrieving the second disaster recovery data.
  • one storage method is centralized storage.
  • the second storage system may be a centralized storage system of the second container cluster.
  • corresponding centralized storage systems can be constructed for the first container cluster and the second container cluster, and a remote replication relationship can be established between the two to achieve remote synchronous replication between the two. Therefore, after the first container cluster replicates the container business data to obtain the second disaster recovery data, it can store it in the centralized storage system of the second container cluster through the remote replication relationship between the two centralized storage systems, and the second container cluster can directly call the second disaster recovery data in its own centralized storage system to recover the container business.
  • another storage method is distributed storage.
  • the second storage system may be a distributed storage system.
  • a distributed storage system can be created in advance, and both the first container cluster and the second container cluster can access its data. Therefore, after the first container cluster copies the container business data to obtain the second disaster recovery data, it can directly store it in the distributed storage system for the second container cluster to call. It should be noted that the implementation of this process depends on the multi-copy mechanism of the distributed storage system.
  • another storage method is local storage.
  • the second storage system may be an object storage system.
  • an object storage system can be created in advance, and both the first container cluster and the second container cluster can access its data. Therefore, after the first container cluster copies the container business data to obtain the second disaster recovery data, it can store it in the object storage system in the form of file blocks for the second container cluster to call.
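  • The three retrieval paths above can be illustrated with a small dispatch function. This is a sketch under stated assumptions: the `StorageType` enum, the `backends` dictionary layout and the key names are hypothetical, and plain in-memory dictionaries stand in for the real storage systems.

```python
from enum import Enum
from typing import Dict


class StorageType(Enum):
    CENTRALIZED = "centralized"
    DISTRIBUTED = "distributed"
    LOCAL = "local"


def retrieve_second_dr_data(storage_type: StorageType,
                            backends: Dict[str, dict],
                            volume: str) -> bytes:
    """Fetch the second disaster recovery data from the matching second storage system."""
    if storage_type is StorageType.CENTRALIZED:
        # Data arrived via remote replication into the second cluster's own storage.
        return backends["backup_centralized"][volume]
    if storage_type is StorageType.DISTRIBUTED:
        # Both clusters can access the same distributed storage system.
        return backends["distributed"][volume]
    if storage_type is StorageType.LOCAL:
        # Local-disk data was saved to object storage as file blocks; reassemble them.
        return b"".join(backends["object"][volume])
    raise ValueError(f"unsupported storage type: {storage_type!r}")
```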
  • the present application provides yet another container disaster recovery method.
  • Referring to FIG. 4, which is a flow chart of another container disaster recovery method provided in some embodiments of the present application.
  • the container disaster recovery method can be applied to a container management platform, and includes the following S301 to S303 .
  • S301 configuring a container cluster to be protected according to preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
  • This step can be used to implement disaster recovery configuration.
  • the target object is the container cluster to be protected.
  • Through disaster recovery configuration, a first container cluster and a second container cluster that serve as primary and backup for each other can be obtained.
  • the configuration process can be implemented according to preset disaster recovery configuration information, and the preset disaster recovery configuration information is set by technical personnel according to actual needs, and this application does not limit this.
  • the preset disaster recovery configuration information mainly includes disaster recovery cluster configuration information and disaster recovery protection unit information.
  • the disaster recovery cluster configuration information mainly includes the identification of the two clusters for disaster recovery and the storage information used by each cluster;
  • the disaster recovery protection unit information mainly includes the used disaster recovery configuration, the protected container application identification and the protection unit status, among which the protection unit state machine design is shown in Figure 5, which is a working principle diagram of a disaster recovery protection unit state machine provided in some embodiments of the present application.
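  • The preset disaster recovery configuration information described above could be modelled as in the following sketch. All class and field names are hypothetical, and because the state machine of Figure 5 is not reproduced here, the protection unit status is kept as a free-form string rather than an enumerated set of states.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ClusterStorage:
    """Identification of one cluster plus the storage it uses for disaster recovery."""
    cluster_id: str
    storage_pool: str
    storage_type: str  # e.g. "centralized", "distributed" or "local"


@dataclass(frozen=True)
class DisasterRecoveryConfig:
    """Disaster recovery cluster configuration: the two paired clusters."""
    primary: ClusterStorage
    backup: ClusterStorage


@dataclass
class ProtectionUnitInfo:
    """Disaster recovery protection unit information."""
    config: DisasterRecoveryConfig
    protected_app_ids: List[str] = field(default_factory=list)
    status: str = "unknown"  # actual states are defined by the Figure 5 state machine
```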
  • S302 Send a disaster recovery backup command to the first container cluster, so that the first container cluster performs disaster recovery backup according to the disaster recovery backup command to obtain backup data;
  • This step can be used to implement the issuance of a disaster recovery backup command, and the disaster recovery backup command is issued to the first container cluster, so that the first container cluster responds to the disaster recovery backup command and performs disaster recovery backup.
  • the first container cluster performs disaster recovery backup mainly by backing up some data information during its operation, mainly including container application metadata and container business data, to obtain corresponding backup data. Then, the obtained backup data is stored in the corresponding storage system so that the second container cluster can directly call it.
  • S303 Send a disaster recovery command to the second container cluster, so that the second container cluster responds to the disaster recovery command and performs disaster recovery using the backup data.
  • This step can be used to implement the issuance of a disaster recovery command, and the disaster recovery command is issued to the second container cluster, so that the second container cluster responds to the disaster recovery command and performs disaster recovery.
  • the second container cluster performs disaster recovery mainly by rebuilding the operating state of the first container cluster within the cluster, mainly including container application reconstruction and container business recovery. Since the first container cluster has already backed up data in S302, the second container cluster can directly call the backup data in the storage system and perform disaster recovery in this step.
  • the container disaster recovery method realizes a container disaster recovery solution across container clusters by building a primary and backup container cluster and a container management platform.
  • One container cluster is used for normal business processing and responds to the command of the container management platform for disaster recovery backup.
  • During the disaster recovery backup process, it backs up the data information generated by itself; the other container cluster performs disaster recovery in response to the command of the container management platform.
  • During the disaster recovery process, it can directly call the backup data of the former container cluster for disaster recovery. In this way, efficient and flexible container disaster recovery is realized, which can effectively ensure the rapid recovery of container business.
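  • Steps S301 to S303 can be sketched as a short orchestration routine. This is an illustrative toy only: `DemoCluster`, `run_platform_flow` and the dictionary used as a storage system are hypothetical, and the "commands" are modelled as direct method calls rather than messages sent over a network.

```python
class DemoCluster:
    """Hypothetical in-memory stand-in for a container cluster."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = dict(state or {})

    def backup(self, storage):
        # Disaster recovery backup: copy this cluster's data into the storage system.
        storage["backup"] = dict(self.state)

    def recover(self, storage):
        # Disaster recovery: rebuild local state from the stored backup data.
        self.state = dict(storage["backup"])
        return self.state


def run_platform_flow(config, clusters, storage):
    first = clusters[config["primary"]]   # S301: configure the first (primary) cluster
    second = clusters[config["backup"]]   # S301: configure the second (backup) cluster
    first.backup(storage)                 # S302: issue the disaster recovery backup command
    return second.recover(storage)        # S303: issue the disaster recovery command
```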
  • the method may further include sending a shutdown command to the first container cluster to stop the operation of each container application in the first container cluster.
  • the disaster recovery command issued by the container management platform can be a planned command or an unplanned command.
  • the planned disaster recovery command is used to implement business switching between normal container clusters
  • the unplanned disaster recovery command is used to implement business switching when the container cluster fails.
  • the container application in the first container cluster can be shut down to prevent new requests from entering and causing access errors, and at the same time, the consistency of the switching can be effectively guaranteed. Therefore, before issuing the disaster recovery command to the second container cluster, a shutdown command can be issued to the first container cluster so that the first container cluster responds to the shutdown command and stops the operation of each container application in the cluster.
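  • The ordering of the shutdown command and the disaster recovery command can be sketched as follows. The function name and the command tuples are hypothetical, and the sketch assumes that the shutdown command is only issued for a planned switch, on the reasoning that the first container cluster may be unreachable when it has failed; the text above does not state this restriction explicitly.

```python
def plan_failover_commands(planned: bool) -> list:
    """Return the ordered (target, command) pairs the platform would issue."""
    commands = []
    if planned:
        # Assumption: only a healthy first cluster can be asked to stop its applications.
        commands.append(("first_cluster", "shutdown"))
    # The disaster recovery command always goes to the second cluster, and goes last.
    commands.append(("second_cluster", "disaster_recovery"))
    return commands
```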
  • the container disaster recovery method may further include: copying the platform disaster recovery information to obtain backup disaster recovery information; and storing the backup disaster recovery information in the platform storage system.
  • the platform disaster recovery information can be further backed up to obtain backup disaster recovery information and store it in the corresponding platform storage system.
  • the backup disaster recovery information in the platform storage system can be used to rebuild the container management platform.
  • the platform disaster recovery information can include two parts, one part is the management information of the container management platform itself, and the other part is the above-mentioned preset disaster recovery configuration information.
  • the container disaster recovery system may include:
  • the container management platform 300 is used to send a disaster recovery backup command to the first container cluster 100 and send a disaster recovery command to the second container cluster 200;
  • the first container cluster 100 is used to perform disaster recovery backup according to the disaster recovery backup command to obtain backup data;
  • the second container cluster 200 is used to respond to the disaster recovery command and perform disaster recovery using the backup data.
  • the container disaster recovery system realizes a container disaster recovery solution across container clusters by constructing a primary and backup container cluster and a container management platform.
  • One container cluster is used for normal business processing and performs disaster recovery backup in response to the command of the container management platform.
  • During the disaster recovery backup process, the data information generated by itself is backed up; the other container cluster performs disaster recovery in response to the command of the container management platform.
  • During the disaster recovery process, the backup data of the former container cluster can be directly called for disaster recovery. In this way, efficient and flexible container disaster recovery is realized, which can effectively ensure the rapid recovery of container business.
  • FIG. 6 is a structural diagram of another container disaster recovery system provided in some embodiments of the present application.
  • the container disaster recovery system shown in Figure 6 includes a main k8s cluster, a backup k8s cluster, a container management platform and various storage systems, wherein the main k8s cluster and the backup k8s cluster are used to realize cluster disaster recovery, the container management platform is used to realize the management of the main k8s cluster and the backup k8s cluster, and various storage systems are used to realize data storage.
  • the main k8s cluster includes a container application metadata replication module and a business data replication module
  • the backup k8s cluster includes an image preheating module, a container application replay module and a business data recovery module
  • the container management platform includes a container image service, a disaster recovery control module and a disaster recovery metadata replication module
  • the storage system includes an object storage system, a centralized storage system, a distributed storage system and a platform storage system (metadata backup shown in Figure 6).
  • This module is responsible for the operation of the entire disaster recovery process and calls other modules for disaster recovery according to the various requests received.
  • A disaster recovery configuration needs to be designed, including the identifiers of the two container clusters that require disaster recovery and the storage pool and storage type that each cluster uses for disaster recovery; this configuration is used when creating a disaster recovery protection unit.
  • The two container clusters that require disaster recovery are paired.
  • the database fields included in the pairing relationship can be as shown in Table 1:
  • Table 1 A disaster recovery configuration information table
  • the corresponding protection strategy can be executed according to the selected disaster recovery configuration.
  • the real-time transaction log backup technology of the database can be used to back up the data of the production database instance to the backup database instance in real time.
  • the data backed up here is the disaster recovery information of the above platform.
  • Container application metadata replication module
  • the protection unit is used as the basic unit to extract all the container application metadata in it and save it to the object storage system.
  • full replication is used for the first replication
  • incremental replication is used for subsequent replication.
  • Incremental replication is achieved by monitoring the events of metadata changes of all container applications in the protection unit, which can effectively reduce the amount of data to be replicated and save network bandwidth and other resources.
  • Because metadata changes are monitored through the event mechanism, configuration changes can be made in real time to container applications that have data protection enabled, such as changes to the number of replicas or to the CPU (Central Processing Unit) and memory specifications.
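  • Event-driven incremental replication of container application metadata can be sketched as follows. The event tuple format `(kind, app_id, metadata)` and the function names are hypothetical; a real k8s cluster would deliver such events through its own watch mechanism, which is not modelled here.

```python
import copy


def full_replication(live_metadata: dict) -> dict:
    """First replication: copy all container application metadata."""
    return copy.deepcopy(live_metadata)


def apply_change_events(backup: dict, events: list) -> dict:
    """Subsequent replication: apply only the observed metadata change events."""
    for kind, app_id, metadata in events:
        if kind == "deleted":
            backup.pop(app_id, None)
        else:  # "added" or "modified"
            backup[app_id] = copy.deepcopy(metadata)
    return backup
```

  • Only the changed entries travel to the backup copy, which is what reduces the amount of replicated data compared with repeating a full copy.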
  • This module mainly implements the replication of business data for multiple storage backends in the form of plug-ins.
  • different plug-ins are called to perform data replication operations according to the storage type in the disaster recovery configuration. It mainly includes:
  • the disaster recovery control module obtains the metadata of the container application that needs disaster recovery protection in the main k8s cluster, and extracts all container volumes (i.e. PVC) related to the business data, and then builds the PVC in the backup k8s cluster through the container application replay module. Building PVC in the k8s cluster is equivalent to building an actual storage volume in the storage system (centralized storage system), which corresponds to the PVC one by one.
  • the business data replication module can use the remote replication plug-in to establish a remote replication relationship between the storage volumes corresponding to the PVC in the primary and backup k8s clusters (equivalent to establishing a remote replication relationship between the centralized storage systems of the primary and backup k8s clusters), and start real-time synchronous replication of data to ensure that the data is copied to the centralized storage system corresponding to the backup k8s cluster without loss.
  • If the two clusters are far apart and the latency and bandwidth are limited, the periodic asynchronous replication method can be chosen instead.
  • the plug-in of this type in the business data replication module does not need to do additional processing. It only needs to detect that the distributed storage system has enabled multiple copies, and then rely on the distributed storage system's multi-copy mechanism to automatically synchronize the container business data from the local copy to the remote copy, which can achieve real-time synchronization without data loss.
  • container applications can use local disks as a low-cost solution for business data persistence.
  • the disaster recovery control module obtains the metadata of the container application that needs disaster recovery protection in the main k8s cluster, and extracts all container volumes related to the business data.
  • the business data replication module copies the data in the host directory corresponding to the container volume to the object storage system in the form of file blocks.
  • the first replication uses full replication, and subsequent replication uses incremental replication to reduce network bandwidth and object storage space.
  • the file-based replication method is also a periodic backup.
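  • File-block replication with a full first pass and incremental later passes can be sketched as follows. The use of SHA-256 digests to detect changed blocks, the tiny block size and the dictionary standing in for the object storage system are all assumptions made for illustration; the present application does not specify how changed blocks are identified.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


def replicate_blocks(data: bytes, object_store: dict, size: int = BLOCK_SIZE) -> int:
    """Upload only blocks whose digest changed; return the number of uploads."""
    blocks = split_blocks(data, size)
    uploaded = 0
    for index, block in enumerate(blocks):
        digest = hashlib.sha256(block).hexdigest()
        stored = object_store.get(index)
        if stored is None or stored[0] != digest:
            object_store[index] = (digest, block)
            uploaded += 1
    # Drop blocks that no longer exist because the file shrank.
    for stale in [i for i in object_store if i >= len(blocks)]:
        del object_store[stale]
    return uploaded
```

  • With an empty store every block is uploaded, which corresponds to the full replication; later calls upload only the blocks that differ.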
  • Container application replay module
  • the main purpose is to obtain the container application metadata of the main k8s cluster backed up in the object storage system, and restore the container application based on the container application metadata.
  • Different recovery strategies can be used according to different storage types.
  • Image preheating module
  • This module mainly reads the container application metadata backed up to the object storage system at regular intervals.
  • This module extracts the container image name (corresponding to the above identification information) used by all container applications in the protection unit, and initiates a pull image request to the container image service, and then distributes the container image to each node of the container cluster.
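  • The image preheating step can be sketched as follows. The metadata layout (a `containers` list with an `image` key), the `pull_image` callback standing in for the container image service, and the per-node cache dictionaries are hypothetical simplifications.

```python
def extract_image_names(app_metadata_list: list) -> list:
    """Collect the distinct image names used by the protected applications."""
    names = []
    for metadata in app_metadata_list:
        for container in metadata.get("containers", []):
            if container["image"] not in names:
                names.append(container["image"])
    return names


def preheat_images(app_metadata_list: list, pull_image, node_caches: list) -> list:
    """Pull each image once, then place it in every node's local cache."""
    names = extract_image_names(app_metadata_list)
    for name in names:
        image = pull_image(name)
        for cache in node_caches:
            cache[name] = image
    return names
```

  • Preheating ahead of time means that, when a disaster recovery command arrives, the nodes do not have to wait for image downloads before the applications can be replayed.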
  • the disaster recovery control module controls the container application replay module to pull up the service, and the container application replay module obtains the container application metadata backed up in the object storage system and performs container application recovery:
  • the container application replay module restores all container application metadata, and then the business data recovery module pulls the file block backup in the object storage system to restore it locally, and copies it to the host file directory corresponding to the container volume.
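  • Restoring a local-storage volume from its file-block backup can be sketched as follows. The function name, the single-file layout and the list of in-memory blocks are assumptions for illustration; a real recovery module would stream blocks from the object storage system.

```python
import os


def restore_volume_from_blocks(blocks: list, host_dir: str, file_name: str) -> str:
    """Write the backed-up file blocks, in order, into the volume's host directory."""
    os.makedirs(host_dir, exist_ok=True)
    target = os.path.join(host_dir, file_name)
    with open(target, "wb") as handle:
        for block in blocks:
            handle.write(block)
    return target
```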
  • the k8s container cluster is used as the overall fault domain.
  • the container application can be quickly restored in the container cluster in the other location.
  • the container application can be switched when both clusters are operating normally.
  • Referring to FIG. 7, which is a schematic diagram of the structure of a container disaster recovery device provided in some embodiments of the present application.
  • the container disaster recovery device can be applied to a first container cluster, including:
  • the backup command receiving module 1 is used to receive the disaster recovery backup command issued by the container management platform;
  • the first replication module 2 is used to respond to the disaster recovery backup command, replicate the container application metadata to obtain first disaster recovery data, and store the first disaster recovery data in the first storage system, so that the second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain the reconstructed container application;
  • the second replication module 3 is used to replicate the container business data to obtain second disaster recovery data, and store the second disaster recovery data in the second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container business in the rebuilt container application to obtain the restored container business.
  • the first replication module 2 can be used to obtain the number of replications within the first container cluster; when the number of replications is zero, the container application metadata is fully replicated to obtain the first disaster recovery data; when the number of replications is not zero, the container application metadata is incrementally replicated to obtain the first disaster recovery data.
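  • The choice between full and incremental replication based on the number of replications can be sketched as follows. The function name and the returned dictionary layout are hypothetical, and a simple comparison against the previously replicated snapshot is used here to compute the increment.

```python
def replicate_metadata(current: dict, previous: dict, replication_count: int) -> dict:
    """Return the first disaster recovery data for this replication round."""
    if replication_count == 0:
        # No replication has happened yet: copy everything.
        return {"mode": "full", "changed": dict(current), "removed": []}
    # Otherwise copy only what differs from the previously replicated snapshot.
    changed = {k: v for k, v in current.items()
               if k not in previous or previous[k] != v}
    removed = [k for k in previous if k not in current]
    return {"mode": "incremental", "changed": changed, "removed": removed}
```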
  • the above-mentioned first replication module 2 can be used to determine the container application to be protected in the first container cluster according to the disaster recovery backup command; add the identification information of the container to be protected to the preset protection unit; copy the container application metadata of the container to be protected to obtain the first disaster recovery data, and add the first disaster recovery data to the preset protection unit; in the preset protection unit, the identification information and the first disaster recovery data are stored correspondingly; and the preset protection unit is stored in the first storage system.
  • the second replication module 3 can be used to determine the replication mode according to the disaster recovery backup command; when the replication mode is centralized storage replication, the second disaster recovery data is stored in the centralized storage system of the second container cluster; the centralized storage system of the first container cluster and the centralized storage system of the second container cluster establish a remote replication relationship, and the centralized storage system of the second container cluster is the second storage system; when the replication mode is distributed storage replication, the second disaster recovery data is stored in the distributed storage system; the distributed storage system is the second storage system; when the replication mode is local storage replication, the second disaster recovery data is stored in the object storage system in the form of file blocks; the object storage system is the second storage system.
  • Referring to FIG. 8, which is a schematic diagram of the structure of another container disaster recovery device provided in some embodiments of the present application.
  • the container disaster recovery device can be applied to the second container cluster, including:
  • the recovery command receiving module 4 is used to receive the disaster recovery command issued by the container management platform
  • the container application reconstruction module 5 is used to respond to the disaster recovery command, retrieve the first disaster recovery data from the first storage system, and use the first disaster recovery data to rebuild the container application to obtain a reconstructed container application; wherein the first disaster recovery data is obtained by the first container cluster replicating its own container application metadata;
  • the container service recovery module 6 is used to retrieve the second disaster recovery data from the second storage system, and use the second disaster recovery data to recover the container service in the rebuilt container application to obtain the recovered container service; wherein the second disaster recovery data is obtained by the first container cluster copying its own container service data.
  • the above-mentioned container application reconstruction module 5 can be used to retrieve the identification information of the container application to be protected from the first storage system; in the first storage system, the identification information and the first disaster recovery data are stored correspondingly; the container application image corresponding to each identification information is pulled from the container management platform; the first disaster recovery data is retrieved from the first storage system; the container application is reconstructed using the first disaster recovery data and each container application image to obtain a reconstructed container application.
  • the above-mentioned container business recovery module 6 can be used to determine the storage mode according to the disaster recovery command; when the storage mode is centralized storage, the second disaster recovery data is retrieved from the centralized storage system of the second container cluster, the centralized storage system of the second container cluster has a remote replication relationship with the centralized storage system of the first container cluster, and the centralized storage system of the second container cluster is the second storage system; when the storage mode is distributed storage, the second disaster recovery data is retrieved from the distributed storage system; the distributed storage system is the second storage system; when the storage mode is local storage, the second disaster recovery data in the form of file blocks is retrieved from the object storage system; the object storage system is the second storage system.
  • Referring to FIG. 9, which is a schematic diagram of the structure of a container disaster recovery device provided in some embodiments of the present application.
  • the container disaster recovery device can be applied to a container management platform, including:
  • a container cluster configuration module 7 configured to configure the container cluster to be protected according to preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
  • a first command issuing module 8 is used to issue a disaster recovery backup command to the first container cluster, so that the first container cluster performs disaster recovery backup according to the disaster recovery backup command to obtain backup data;
  • the second command issuing module 9 is used to issue a disaster recovery command to the second container cluster, so that the second container cluster responds to the disaster recovery command and performs disaster recovery using the backup data.
  • the container disaster recovery device may further include a shutdown module, which is used to send a shutdown instruction to the first container cluster before sending the disaster recovery command to the second container cluster, so as to stop the operation of each container application in the first container cluster.
  • the container disaster recovery device may further include a backup module for copying the platform disaster recovery information to obtain backup disaster recovery information; and storing the backup disaster recovery information in the platform storage system.
  • the container disaster recovery device may include:
  • the processor is used to implement the steps of any one of the above container disaster recovery methods when executing a computer program.
  • Referring to FIG. 10, which is a schematic diagram of the composition structure of a container disaster recovery device, the device may include: a processor 10, a memory 11, a communication interface 12, and a communication bus 13.
  • the processor 10, the memory 11, and the communication interface 12 communicate with each other through the communication bus 13.
  • the processor 10 may be a central processing unit (CPU), an application specific integrated circuit, a digital signal processor, a field programmable gate array, or other programmable logic devices.
  • the processor 10 may call a program stored in the memory 11. In some embodiments, the processor 10 may execute operations in an embodiment of the container disaster recovery method.
  • the memory 11 is used to store one or more programs, which may include program codes, and the program codes include computer operation instructions.
  • the memory 11 stores at least a program for implementing the following functions:
  • the container application metadata is copied to obtain first disaster recovery data, and the first disaster recovery data is stored in the first storage system, so that the second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain the reconstructed container application;
  • the container service data is copied to obtain second disaster recovery data, and the second disaster recovery data is stored in the second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application to obtain the restored container service.
  • the memory 11 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application required for at least one function, etc.; the data storage area may store data created during use.
  • the memory 11 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device or other non-volatile solid-state storage device.
  • the communication interface 12 may be an interface of a communication module, and is used to connect to other devices or systems.
  • the structure shown in FIG. 10 does not constitute a limitation on the container disaster recovery device.
  • the container disaster recovery device may include more or fewer components than those shown in FIG. 10 , or combine certain components.
  • a non-volatile computer-readable storage medium 1100 is provided in some embodiments of the present application.
  • a computer program 1101 is stored on the non-volatile computer-readable storage medium 1100.
  • When the computer program 1101 is executed by a processor, the steps of any of the above-mentioned container disaster recovery methods can be implemented.
  • the non-volatile computer-readable storage medium 1100 may include: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and other media that can store program codes.
  • the steps of the method or algorithm described in conjunction with the embodiments disclosed herein may be implemented directly using hardware, a software module executed by a processor, or a combination of the two.
  • the software module may be placed in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

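The present application discloses a container disaster recovery method, system, apparatus, device and computer-readable storage medium, applied to the technical field of disaster recovery. The method is applied to a first container cluster and includes: receiving a disaster recovery backup command issued by a container management platform; in response to the disaster recovery backup command, copying container application metadata to obtain first disaster recovery data, and storing the first disaster recovery data in a first storage system, so that a second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain a rebuilt container application; and copying container service data to obtain second disaster recovery data, and storing the second disaster recovery data in a second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application and obtain a restored container service. This technical solution can achieve efficient and flexible container disaster recovery and ensure rapid recovery of container services.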

Description

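Container disaster recovery method, system, apparatus, device and computer-readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202211417472.8, entitled "Container disaster recovery method, system, apparatus, device and computer-readable storage medium", filed with the Chinese Patent Office on November 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of disaster recovery, and in particular to a container disaster recovery method, system, apparatus, device and computer-readable storage medium.
Background
At present, container-based applications are increasingly being adopted in enterprises, moving from non-core services to core services and from stateless applications to stateful applications. The core of this transformation is that container applications make greater use of data persistence, and data persisted within a cluster is exposed to disasters. Disaster recovery capabilities therefore need to be built for container applications.
Traditional disaster recovery generally relies on the application to handle it on its own; if the application vendor has no disaster recovery capability, business continuity cannot be guaranteed. The alternative is dedicated disaster recovery software, but vendors of such software generally perform file-level replication through intrusive agents, which offers poor real-time performance and flexibility.
Therefore, how to achieve efficient and flexible container disaster recovery and ensure rapid recovery of container services is a problem that those skilled in the art urgently need to solve.
Summary
The purpose of the present application is to provide a container disaster recovery method that can achieve efficient and flexible container disaster recovery and ensure rapid recovery of container services; another purpose of the present application is to provide a container disaster recovery apparatus, system, device and computer-readable storage medium, all of which have the above beneficial effects.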
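In a first aspect, the present application provides a container disaster recovery method, which is applied to a first container cluster, including:
receiving a disaster recovery backup command issued by a container management platform;
in response to the disaster recovery backup command, copying container application metadata to obtain first disaster recovery data, and storing the first disaster recovery data in a first storage system, so that a second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain a rebuilt container application;
copying container service data to obtain second disaster recovery data, and storing the second disaster recovery data in a second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application and obtain a restored container service.
In some embodiments, copying the container application metadata to obtain the first disaster recovery data includes:
obtaining the number of copies already performed in the first container cluster;
when the number of copies is zero, performing a full copy of the container application metadata to obtain the first disaster recovery data;
when the number of copies is not zero, performing an incremental copy of the container application metadata to obtain the first disaster recovery data.
In some embodiments, copying the container application metadata to obtain the first disaster recovery data, and storing the first disaster recovery data in the first storage system, includes:
determining the container applications to be protected in the first container cluster according to the disaster recovery backup command;
adding the identification information of the containers to be protected to a preset protection unit;
copying the container application metadata of the containers to be protected to obtain the first disaster recovery data, and adding the first disaster recovery data to the preset protection unit; in the preset protection unit, the identification information and the first disaster recovery data are stored in correspondence with each other;
storing the preset protection unit in the first storage system.
In some embodiments, storing the second disaster recovery data in the second storage system includes:
determining the copy mode according to the disaster recovery backup command;
when the copy mode is centralized storage replication, storing the second disaster recovery data in the centralized storage system of the second container cluster; a remote replication relationship is established between the centralized storage system of the first container cluster and the centralized storage system of the second container cluster, and the centralized storage system of the second container cluster is the second storage system;
when the copy mode is distributed storage replication, storing the second disaster recovery data in a distributed storage system; the distributed storage system is the second storage system;
when the copy mode is local storage replication, storing the second disaster recovery data in an object storage system in the form of file blocks; the object storage system is the second storage system.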
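In a second aspect, the present application provides another container disaster recovery method, which is applied to a second container cluster, including:
receiving a disaster recovery restore command issued by a container management platform;
in response to the disaster recovery restore command, retrieving first disaster recovery data from a first storage system, and using the first disaster recovery data to rebuild the container application and obtain a rebuilt container application; wherein the first disaster recovery data is obtained by a first container cluster copying its own container application metadata;
retrieving second disaster recovery data from a second storage system, and using the second disaster recovery data to restore the container service in the rebuilt container application and obtain a restored container service; wherein the second disaster recovery data is obtained by the first container cluster copying its own container service data.
In some embodiments, retrieving the first disaster recovery data from the first storage system, and using the first disaster recovery data to rebuild the container application and obtain the rebuilt container application, includes:
retrieving the identification information of the container applications to be protected from the first storage system; in the first storage system, the identification information and the first disaster recovery data are stored in correspondence with each other;
pulling the container application image corresponding to each piece of identification information from the container management platform;
retrieving the first disaster recovery data from the first storage system;
rebuilding the container application using the first disaster recovery data and the container application images to obtain the rebuilt container application.
In some embodiments, retrieving the second disaster recovery data from the second storage system includes:
determining the storage mode according to the disaster recovery restore command;
when the storage mode is centralized storage, retrieving the second disaster recovery data from the centralized storage system of the second container cluster; a remote replication relationship is established between the centralized storage system of the second container cluster and the centralized storage system of the first container cluster, and the centralized storage system of the second container cluster is the second storage system;
when the storage mode is distributed storage, retrieving the second disaster recovery data from a distributed storage system; the distributed storage system is the second storage system;
when the storage mode is local storage, retrieving the second disaster recovery data in the form of file blocks from an object storage system; the object storage system is the second storage system.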
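In a third aspect, the present application provides yet another container disaster recovery method, which is applied to a container management platform, including:
configuring the container clusters to be protected according to preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
issuing a disaster recovery backup command to the first container cluster, so that the first container cluster performs disaster recovery backup according to the disaster recovery backup command to obtain backup data;
issuing a disaster recovery restore command to the second container cluster, so that the second container cluster responds to the disaster recovery restore command and uses the backup data to perform disaster recovery restoration.
In some embodiments, before issuing the disaster recovery restore command to the second container cluster, the method further includes:
issuing a shutdown instruction to the first container cluster, so that the container applications in the first container cluster stop running.
In some embodiments, the container disaster recovery method further includes:
copying platform disaster recovery information to obtain backup disaster recovery information;
storing the backup disaster recovery information in a platform storage system.
In a fourth aspect, the present application further discloses a container disaster recovery system, including:
a container management platform, configured to issue a disaster recovery backup command to a first container cluster and to issue a disaster recovery restore command to a second container cluster;
the first container cluster, configured to perform disaster recovery backup according to the disaster recovery backup command to obtain backup data;
the second container cluster, configured to respond to the disaster recovery restore command and use the backup data to perform disaster recovery restoration.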
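In a fifth aspect, the present application further discloses a container disaster recovery apparatus, which is applied to a first container cluster, including:
a backup command receiving module, configured to receive a disaster recovery backup command issued by a container management platform;
a first copy module, configured to, in response to the disaster recovery backup command, copy container application metadata to obtain first disaster recovery data, and store the first disaster recovery data in a first storage system, so that a second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain a rebuilt container application;
a second copy module, configured to copy container service data to obtain second disaster recovery data, and store the second disaster recovery data in a second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application and obtain a restored container service.
In a sixth aspect, the present application further discloses another container disaster recovery apparatus, which is applied to a second container cluster, including:
a restore command receiving module, configured to receive a disaster recovery restore command issued by a container management platform;
a container application rebuilding module, configured to, in response to the disaster recovery restore command, retrieve first disaster recovery data from a first storage system, and use the first disaster recovery data to rebuild the container application and obtain a rebuilt container application; wherein the first disaster recovery data is obtained by a first container cluster copying its own container application metadata;
a container service restoration module, configured to retrieve second disaster recovery data from a second storage system, and use the second disaster recovery data to restore the container service in the rebuilt container application and obtain a restored container service; wherein the second disaster recovery data is obtained by the first container cluster copying its own container service data.
In a seventh aspect, the present application further discloses yet another container disaster recovery apparatus, which is applied to a container management platform, including:
a container cluster configuration module, configured to configure the container clusters to be protected according to preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
a first command issuing module, configured to issue a disaster recovery backup command to the first container cluster, so that the first container cluster performs disaster recovery backup according to the disaster recovery backup command to obtain backup data;
a second command issuing module, configured to issue a disaster recovery restore command to the second container cluster, so that the second container cluster responds to the disaster recovery restore command and uses the backup data to perform disaster recovery restoration.
In an eighth aspect, the present application further discloses a container disaster recovery device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the above container disaster recovery methods when executing the computer program.
In a ninth aspect, the present application further discloses a non-volatile computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of any of the above container disaster recovery methods are implemented.
With the technical solution provided in some embodiments of the present application, a cross-cluster container disaster recovery solution is implemented by building primary and standby container clusters and a container management platform. One container cluster performs normal service processing and, in response to a command from the container management platform, performs disaster recovery backup, during which it backs up and stores the container application metadata and container service data that it generates. The other container cluster performs disaster recovery restoration in response to a command from the container management platform, during which it can directly use the backup data of the former container cluster to rebuild container applications and restore container services. Efficient and flexible container disaster recovery is thereby achieved, and rapid recovery of container services can be effectively ensured.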
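Brief Description of the Drawings
To explain the prior art and the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the prior art and of the embodiments are briefly introduced below. The drawings described below cover only some of the embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort, and those other drawings also fall within the scope of protection of the present application.
FIG. 1 is a schematic structural diagram of a container disaster recovery system provided in some embodiments of the present application;
FIG. 2 is a schematic flowchart of a container disaster recovery method provided in some embodiments of the present application;
FIG. 3 is a schematic flowchart of another container disaster recovery method provided in some embodiments of the present application;
FIG. 4 is a schematic flowchart of yet another container disaster recovery method provided in some embodiments of the present application;
FIG. 5 is a working principle diagram of a disaster recovery protection unit state machine provided in some embodiments of the present application;
FIG. 6 is a schematic structural diagram of another container disaster recovery system provided in some embodiments of the present application;
FIG. 7 is a schematic structural diagram of a container disaster recovery apparatus provided in some embodiments of the present application;
FIG. 8 is a schematic structural diagram of another container disaster recovery apparatus provided in some embodiments of the present application;
FIG. 9 is a schematic structural diagram of yet another container disaster recovery apparatus provided in some embodiments of the present application;
FIG. 10 is a schematic structural diagram of a container disaster recovery device provided in some embodiments of the present application;
FIG. 11 is a schematic structural diagram of a non-volatile computer-readable storage medium provided in some embodiments of the present application.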
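Detailed Description
The core of the present application is to provide a container disaster recovery method that can achieve efficient and flexible container disaster recovery and ensure rapid recovery of container services; another core of the present application is to provide a container disaster recovery apparatus, system, device and computer-readable storage medium, all of which have the above beneficial effects.
To describe the technical solutions in the embodiments of the present application more clearly and completely, they are introduced below with reference to the drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
It should be noted that the container disaster recovery method provided in some embodiments of the present application is applied to a container disaster recovery system. Referring to FIG. 1, which is a schematic structural diagram of a container disaster recovery system provided in some embodiments of the present application, the system includes a first container cluster 100, a second container cluster 200 and a container management platform 300. The first container cluster 100 and the second container cluster 200 are deployed at different sites, and the distance between the sites can be chosen according to the bandwidth and latency requirements of the services. The container management platform 300 can be deployed at a third site or co-located with one of the two container clusters. The first container cluster 100 is the primary container cluster and implements disaster recovery backup; the second container cluster 200 is the standby container cluster and implements disaster recovery restoration (this is only an example; the first container cluster 100 and the second container cluster 200 may act as primary and standby for each other); the container management platform 300 implements container cluster management. Based on this system, when one container cluster fails, the container applications deployed on it are switched to the other container cluster, which continues to provide service, thereby achieving disaster recovery.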
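Referring to FIG. 2, which is a schematic flowchart of a container disaster recovery method provided in some embodiments of the present application, the method can be applied to the first container cluster and includes the following steps S101 to S103.
S101: receive a disaster recovery backup command issued by the container management platform;
This step obtains the disaster recovery backup command, which is issued by the container management platform and instructs the first container cluster to perform the disaster recovery backup operation. In some embodiments, to ensure high reliability of the container disaster recovery system and avoid interruption of container services due to a sudden failure of the first container cluster, the disaster recovery backup command can be issued to the first container cluster when the cluster starts, so that the cluster performs the disaster recovery backup operation as soon as it enters the running state.
S102: in response to the disaster recovery backup command, copy the container application metadata to obtain first disaster recovery data, and store the first disaster recovery data in the first storage system, so that the second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain a rebuilt container application;
This step copies and stores the container application metadata. After receiving the disaster recovery backup command issued by the container management platform, the first container cluster can immediately respond to it by copying the container application metadata generated during its own operation to obtain the copied container application metadata, that is, the first disaster recovery data, and storing it in the first storage system. The container application metadata is the metadata information of the container applications in the first container cluster (possibly all container applications, or possibly a specified subset, as determined by parsing the disaster recovery backup command).
Further, when container disaster recovery is required (for example, when the first container cluster fails), the second container cluster can retrieve the first disaster recovery data directly from the first storage system. Because the container application metadata is the metadata information of the container applications in the first container cluster, and the first disaster recovery data is a copy of that metadata, the second container cluster can use the first disaster recovery data to rebuild the container applications and obtain the rebuilt container applications. In some embodiments, the first storage system can be an object storage system.
S103: copy the container service data to obtain second disaster recovery data, and store the second disaster recovery data in the second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application and obtain a restored container service.
This step copies and stores the container service data. After receiving the disaster recovery backup command issued by the container management platform, the first container cluster can immediately respond to it by copying the container service data generated during its own operation to obtain the copied container service data, that is, the second disaster recovery data, and storing it in the second storage system. The container service data is the service data of the services currently in progress in the container applications of the first container cluster (possibly all container applications, or possibly a specified subset, as determined by parsing the disaster recovery backup command).
In some embodiments, when container disaster recovery is required (for example, when the first container cluster fails), the second container cluster can, after rebuilding the container applications, retrieve the second disaster recovery data directly from the second storage system. Because the container service data is the service data of the container applications in the first container cluster, and the second disaster recovery data is a copy of that service data, the second container cluster can use the second disaster recovery data to restore the container services in the rebuilt container applications and obtain the restored container services. This completes the service switchover between the first container cluster and the second container cluster. In some embodiments, the second storage system can be an object storage system, a distributed storage system or a centralized storage system.
It should be noted that the first storage system and the second storage system may be the same storage system or different storage systems, which is not limited in the present application. In addition, for the first container cluster, the order in which the copy-and-store of container application metadata in S102 and the copy-and-store of container service data in S103 are executed is not fixed; for efficiency, the two can be executed at the same time. Both operations can be executed on a schedule or in real time, which is likewise not limited in the present application.
The container disaster recovery method provided in some embodiments implements a cross-cluster container disaster recovery solution by building primary and standby container clusters and a container management platform. One container cluster performs normal service processing and, in response to a command from the container management platform, performs disaster recovery backup, during which it backs up and stores the container application metadata and container service data that it generates. The other container cluster performs disaster recovery restoration in response to a command from the container management platform, during which it can directly use the backup data of the former container cluster to rebuild container applications and restore container services. Efficient and flexible container disaster recovery is thereby achieved, and rapid recovery of container services can be effectively ensured.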
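In some embodiments, copying the container application metadata to obtain the first disaster recovery data may include the following steps:
obtaining the number of copies already performed in the first container cluster; when the number of copies is zero, performing a full copy of the container application metadata to obtain the first disaster recovery data; when the number of copies is not zero, performing an incremental copy of the container application metadata to obtain the first disaster recovery data.
It can be understood that, while the first container cluster is running, its container application metadata may or may not change, depending on the container services being carried out. To reduce the amount of data copied, save resources such as network bandwidth, and improve backup efficiency, a full copy can be performed for the first backup and an incremental copy for every later backup.
In an actual implementation, the first container cluster can accumulate and save, in real time, the number of data copies it has performed. When the container application metadata needs to be copied, the cluster first checks whether the recorded number of copies is zero. If it is zero, this copy is the first backup, and a full copy of the container application metadata is performed; if it is not zero, this copy is not the first backup, and an incremental copy of the container application metadata is performed.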
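The full-versus-incremental decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `replicate_metadata`, the dictionary layout, and the idea of diffing against the last snapshot are all assumptions made for the example.

```python
def replicate_metadata(current: dict, last_snapshot: dict, copy_count: int) -> dict:
    """Build the first disaster recovery data for one replication run.

    `current` and `last_snapshot` map an application identifier to its
    metadata; `copy_count` is the number of copies already performed.
    All names here are hypothetical.
    """
    if copy_count == 0:
        # First backup: copy every metadata entry (full copy).
        return {"mode": "full", "changed": dict(current), "deleted": []}
    # Later backups: keep only entries that were added or modified,
    # plus the identifiers that disappeared (incremental copy).
    changed = {
        key: value
        for key, value in current.items()
        if key not in last_snapshot or last_snapshot[key] != value
    }
    deleted = [key for key in last_snapshot if key not in current]
    return {"mode": "incremental", "changed": changed, "deleted": deleted}
```

A caller would increment its saved copy counter after each successful run, so that only the very first run takes the full-copy branch.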
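In some embodiments, copying the container application metadata to obtain the first disaster recovery data, and storing the first disaster recovery data in the first storage system, may include the following steps:
determining the container applications to be protected in the first container cluster according to the disaster recovery backup command; adding the identification information of the containers to be protected to a preset protection unit; copying the container application metadata of the containers to be protected to obtain the first disaster recovery data, and adding the first disaster recovery data to the preset protection unit, where the identification information and the first disaster recovery data are stored in correspondence with each other; and storing the preset protection unit in the first storage system.
Some embodiments provide an implementation of copying and storing container application metadata. It can be understood that the object of disaster recovery protection is the container application. Taking a k8s container cluster as an example, a container application includes various types of resources in the cluster, such as deployment (a type of resource in k8s (Kubernetes), for stateless applications), statefulset (a type of resource in k8s, for stateful applications) and PVC (PersistentVolumeClaim, a container persistent volume, a type of resource in k8s), and each type of resource in turn includes multiple resource instances. A disaster recovery protection unit can therefore be designed for the container application, and this protection unit can protect the data consistently.
In some embodiments, the disaster recovery backup command is first parsed to determine the container applications in the first container cluster that need disaster recovery backup, that is, the container applications to be protected (possibly all container applications in the first container cluster, or possibly a specified subset). Next, the identification information of each container application to be protected is added to the preset protection unit. This can be done by adding container application identifiers one by one, or by namespace, in which case the identification information of all container applications in the namespace is added to the preset protection unit. The identification information of a container application should be unique and can be a unique code, a unique name, an ID number or the like. Then, the container application metadata of each container application to be protected is copied to obtain the first disaster recovery data, which is stored in the preset protection unit in correspondence with the identification information, so that each identifier and its corresponding first disaster recovery data in the preset protection unit belong to the same container application to be protected. Finally, the preset protection unit is stored in the first storage system, completing the disaster recovery backup of the container application metadata.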
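As an illustration of the protection unit described above, the following sketch keeps identifiers and copied metadata side by side and supports both selection styles (one by one, or by namespace). The class `ProtectionUnit`, the function `build_protection_unit`, and the `namespace` metadata key are hypothetical names chosen for the example.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ProtectionUnit:
    """Hypothetical protection unit: identifier -> copied metadata."""
    name: str
    entries: dict = field(default_factory=dict)

    def add_application(self, app_id: str, metadata: dict) -> None:
        # Identifiers must be unique inside one protection unit.
        if app_id in self.entries:
            raise ValueError(f"duplicate identifier: {app_id}")
        # Deep copy so later changes in the cluster do not alter the backup.
        self.entries[app_id] = copy.deepcopy(metadata)


def build_protection_unit(name, cluster_apps, selected_ids=None, namespace=None):
    """Select applications one by one (selected_ids) or by namespace."""
    unit = ProtectionUnit(name)
    for app_id, metadata in cluster_apps.items():
        by_id = selected_ids is not None and app_id in selected_ids
        by_namespace = namespace is not None and metadata.get("namespace") == namespace
        if by_id or by_namespace:
            unit.add_application(app_id, metadata)
    return unit
```

Storing the whole unit as one object (for example, one serialized document in an object store) keeps each identifier and its metadata together, which is what allows the standby side to look both up from the same place.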
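In some embodiments, storing the second disaster recovery data in the second storage system may include the following steps:
determining the copy mode according to the disaster recovery backup command; when the copy mode is centralized storage replication, storing the second disaster recovery data in the centralized storage system of the second container cluster, where a remote replication relationship is established between the centralized storage system of the first container cluster and the centralized storage system of the second container cluster, and the centralized storage system of the second container cluster is the second storage system; when the copy mode is distributed storage replication, storing the second disaster recovery data in a distributed storage system, which is the second storage system; when the copy mode is local storage replication, storing the second disaster recovery data in an object storage system in the form of file blocks, where the object storage system is the second storage system.
It can be understood that, when building a container cloud platform, users may use different storage systems depending on data center planning and cost. To address this, the second container cluster can be designed to support disaster recovery on multiple storage backends, including centralized storage, distributed storage and local storage. Correspondingly, when the first container cluster copies and stores container service data, it can support three implementations: centralized storage replication, distributed storage replication and local storage replication. Support for the different storage backends can be implemented with different plug-ins.
On this basis, after the container service data is copied to obtain the second disaster recovery data, the currently specified copy mode is first determined from the disaster recovery backup command, and the second disaster recovery data is then stored in a different storage system depending on the copy mode.
When the copy mode is centralized storage replication, the second storage system can be the centralized storage system of the second container cluster.
In some embodiments, for centralized storage, a centralized storage system can be built for each of the first container cluster and the second container cluster, and a remote replication relationship can be established between the two to achieve remote synchronous replication. After the container service data is copied to obtain the second disaster recovery data, the data can be stored in the centralized storage system of the second container cluster through the remote replication relationship between the two centralized storage systems, so that the second container cluster can use it directly.
When the copy mode is distributed storage replication, the second storage system can be a distributed storage system. In some embodiments, for distributed storage, a distributed storage system can be created in advance, and both the first container cluster and the second container cluster can access data in it. After the container service data is copied to obtain the second disaster recovery data, the data can be stored directly in the distributed storage system for the second container cluster to use. It should be noted that this process relies on the multi-replica mechanism of the distributed storage system.
When the copy mode is local storage replication, the second storage system can be an object storage system. In some embodiments, for local storage, an object storage system can be created in advance, and both the first container cluster and the second container cluster can access data in it. After the container service data is copied to obtain the second disaster recovery data, the data can be stored in the object storage system in the form of file blocks for the second container cluster to use.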
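The three copy modes above lend themselves to a plug-in style dispatch. The sketch below is one possible shape under stated assumptions: the mode strings, the `copy_mode` key in the command, the in-memory `stores` dictionary and the 4-byte block size are all illustrative stand-ins, and the real storage systems are replaced by Python lists.

```python
from typing import Callable, Dict

# Registry of copy-mode plug-ins (hypothetical): mode name -> handler.
PLUGINS: Dict[str, Callable[..., str]] = {}


def plugin(mode: str):
    """Decorator that registers a handler for one copy mode."""
    def register(handler):
        PLUGINS[mode] = handler
        return handler
    return register


@plugin("centralized")
def to_centralized(data: bytes, stores: dict) -> str:
    # Stand-in for remote replication into the standby cluster's array.
    stores["standby_centralized"].append(data)
    return "standby_centralized"


@plugin("distributed")
def to_distributed(data: bytes, stores: dict) -> str:
    # Stand-in for a write that the multi-replica mechanism propagates.
    stores["distributed"].append(data)
    return "distributed"


@plugin("local")
def to_object_store(data: bytes, stores: dict, block_size: int = 4) -> str:
    # Local storage mode: split into file blocks before upload.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    stores["object"].extend(blocks)
    return "object"


def store_second_dr_data(command: dict, data: bytes, stores: dict) -> str:
    """Pick the plug-in named by the backup command and store the data."""
    mode = command["copy_mode"]
    if mode not in PLUGINS:
        raise ValueError(f"unsupported copy mode: {mode}")
    return PLUGINS[mode](data, stores)
```

Keeping the dispatch in a registry means a new backend only needs a new decorated handler; the command handling itself does not change.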
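Referring to FIG. 3, which is a schematic flowchart of another container disaster recovery method provided in some embodiments of the present application, the method can be applied to the second container cluster and includes the following steps S201 to S203.
S201: receive a disaster recovery restore command issued by the container management platform;
This step obtains the disaster recovery restore command, which is issued by the container management platform and instructs the second container cluster to perform the disaster recovery restoration operation. The disaster recovery restore command can be planned or unplanned: a planned command implements a service switchover between healthy container clusters, and an unplanned command implements a service switchover when a container cluster fails.
S202: in response to the disaster recovery restore command, retrieve the first disaster recovery data from the first storage system, and use the first disaster recovery data to rebuild the container application and obtain a rebuilt container application; wherein the first disaster recovery data is obtained by the first container cluster copying its own container application metadata;
This step rebuilds the container applications. After receiving the disaster recovery restore command issued by the container management platform, the second container cluster can immediately respond to it by retrieving the first disaster recovery data from the first storage system. The first disaster recovery data is obtained by the first container cluster copying its own container application metadata, and the container application metadata is the metadata information of the container applications in the first container cluster. The second container cluster can therefore use the first disaster recovery data directly to rebuild the container applications and obtain the rebuilt container applications.
S203: retrieve the second disaster recovery data from the second storage system, and use the second disaster recovery data to restore the container service in the rebuilt container application and obtain a restored container service; wherein the second disaster recovery data is obtained by the first container cluster copying its own container service data.
This step restores the container services. Once the container applications are rebuilt, the container services in them can be restored, which effectively avoids interruption of the container services. In implementation, the second disaster recovery data can be retrieved directly from the second storage system. The second disaster recovery data is obtained by the first container cluster copying its own container service data, and the container service data is the service data of the services in progress in the container applications of the first container cluster. The second container cluster can therefore use the second disaster recovery data directly to restore the container services and obtain the restored container services.
The container disaster recovery method provided in some embodiments implements a cross-cluster container disaster recovery solution by building primary and standby container clusters and a container management platform. One container cluster performs normal service processing and, in response to a command from the container management platform, performs disaster recovery backup, during which it backs up and stores the container application metadata and container service data that it generates. The other container cluster performs disaster recovery restoration in response to a command from the container management platform, during which it can directly use the backup data of the former container cluster to rebuild container applications and restore container services. Efficient and flexible container disaster recovery is thereby achieved, and rapid recovery of container services can be effectively ensured.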
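In some embodiments, retrieving the first disaster recovery data from the first storage system, and using the first disaster recovery data to rebuild the container application and obtain the rebuilt container application, may include:
retrieving the identification information of the container applications to be protected from the first storage system; in the first storage system, the identification information and the first disaster recovery data are stored in correspondence with each other;
pulling the container application image corresponding to each piece of identification information from the container management platform;
retrieving the first disaster recovery data from the first storage system;
rebuilding the container application using the first disaster recovery data and the container application images to obtain the rebuilt container application.
Some embodiments provide an implementation of rebuilding container applications. For copying and storing container application metadata, the backup can be performed with the protection unit as a whole, and, in the protection unit, the identification information of each container application and the first disaster recovery data are stored in correspondence with each other.
On this basis, the identification information of the container applications to be protected is first retrieved from the first storage system; in some embodiments, it can be retrieved from the protection unit in the first storage system. The container application image corresponding to each piece of identification information is then pulled from the container management platform; the image is used to rebuild the corresponding container application. The container management platform pre-stores the image data of each container application in each primary container cluster (which may refer to the first container cluster).
Next, the first disaster recovery data is retrieved from the first storage system, again possibly from the protection unit in the first storage system. Combining the container application images with the container application metadata then makes it possible to rebuild the container applications and obtain the rebuilt container applications. When combining the images and the metadata for rebuilding, each container application image and its corresponding container application metadata can first be distributed to the cluster nodes of the first container cluster, and the container applications are then rebuilt on those nodes.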
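The rebuild sequence above (read identifiers, pull images, read metadata, rebuild on cluster nodes) can be sketched as follows. This is an illustrative outline only: `rebuild_applications`, the `pull_image` callback and the round-robin placement on nodes are assumptions for the example, not details given in the source.

```python
def rebuild_applications(protection_unit: dict, pull_image, nodes: list) -> list:
    """Rebuild applications from a protection unit (hypothetical layout).

    protection_unit: identifier -> container application metadata
    pull_image:      callable(identifier) -> image reference, standing in
                     for a pull from the management platform's image service
    nodes:           names of the cluster nodes that will run the rebuilds
    """
    if not nodes:
        raise ValueError("at least one cluster node is required")
    # Step 1: read the identifiers of the protected applications.
    identifiers = sorted(protection_unit)
    # Step 2: pull the image that corresponds to each identifier.
    images = {app_id: pull_image(app_id) for app_id in identifiers}
    rebuilt = []
    for index, app_id in enumerate(identifiers):
        rebuilt.append({
            "id": app_id,
            "image": images[app_id],
            # Step 3: the stored metadata becomes the rebuilt spec.
            "spec": protection_unit[app_id],
            # Step 4: distribute image and metadata to a node
            # (round-robin is an assumption made for this sketch).
            "node": nodes[index % len(nodes)],
        })
    return rebuilt
```

For example, with two applications and two nodes, each node receives one image together with the matching metadata before the application is recreated there.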
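In some embodiments, retrieving the second disaster recovery data from the second storage system may include the following steps:
determining the storage mode according to the disaster recovery restore command;
when the storage mode is centralized storage, retrieving the second disaster recovery data from the centralized storage system of the second container cluster; a remote replication relationship is established between the centralized storage system of the second container cluster and the centralized storage system of the first container cluster, and the centralized storage system of the second container cluster is the second storage system;
when the storage mode is distributed storage, retrieving the second disaster recovery data from a distributed storage system; the distributed storage system is the second storage system;
when the storage mode is local storage, retrieving the second disaster recovery data in the form of file blocks from an object storage system; the object storage system is the second storage system.
In some embodiments, when building a container cloud platform, users may use different storage systems depending on data center planning and cost. To address this, the second container cluster can be designed to support disaster recovery on multiple storage backends, including centralized storage, distributed storage and local storage. The second disaster recovery data can then be retrieved in different ways for different types of storage system.
In some embodiments, one storage mode is centralized storage, in which case the second storage system can be the centralized storage system of the second container cluster.
For centralized storage, a centralized storage system can be built for each of the first container cluster and the second container cluster, and a remote replication relationship can be established between the two to achieve remote synchronous replication. After copying the container service data to obtain the second disaster recovery data, the first container cluster can store the data in the centralized storage system of the second container cluster through the remote replication relationship between the two centralized storage systems, and the second container cluster can then directly use the second disaster recovery data in its own centralized storage system to restore the container services.
In some embodiments, another storage mode is distributed storage, in which case the second storage system can be a distributed storage system.
For distributed storage, a distributed storage system can be created in advance, and both the first container cluster and the second container cluster can access data in it. After copying the container service data to obtain the second disaster recovery data, the first container cluster can store the data directly in the distributed storage system for the second container cluster to use. It should be noted that this process relies on the multi-replica mechanism of the distributed storage system.
In some embodiments, yet another storage mode is local storage, in which case the second storage system can be an object storage system.
For local storage, an object storage system can be created in advance, and both the first container cluster and the second container cluster can access data in it. After copying the container service data to obtain the second disaster recovery data, the first container cluster can store the data in the object storage system in the form of file blocks for the second container cluster to use.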
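Some embodiments of the present application provide yet another container disaster recovery method.
Referring to FIG. 4, which is a schematic flowchart of yet another container disaster recovery method provided in some embodiments of the present application, the method can be applied to the container management platform and includes the following steps S301 to S303.
S301: configure the container clusters to be protected according to preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
This step performs the disaster recovery configuration. Before container disaster recovery is carried out, the target objects of the disaster recovery need to be configured; here the target objects are the container clusters to be protected. The configuration yields a first container cluster and a second container cluster that act as primary and standby for each other.
In some embodiments, the configuration is carried out according to preset disaster recovery configuration information, which is set by technical personnel according to actual needs and is not limited in the present application.
In some embodiments, the preset disaster recovery configuration information mainly includes disaster recovery cluster configuration information and disaster recovery protection unit information. The disaster recovery cluster configuration information mainly includes the identifiers of the two clusters taking part in disaster recovery and the storage information used by each cluster. The disaster recovery protection unit information mainly includes the disaster recovery configuration in use, the identifiers of the protected container applications, and the protection unit state. The protection unit state machine is designed as shown in FIG. 5, which is a working principle diagram of a disaster recovery protection unit state machine provided in some embodiments of the present application.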
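The two parts of the preset configuration described above can be modelled as plain data classes. The sketch below is an assumption-laden illustration: every field name is hypothetical, and the default state `"created"` is a placeholder, since the actual states are defined by the state machine in FIG. 5, which is not reproduced in the text.

```python
from dataclasses import dataclass, field
from typing import List

ALLOWED_STORAGE_TYPES = {"centralized", "distributed", "local"}


@dataclass
class ClusterStorage:
    cluster_id: str      # identifier of one cluster in the pair
    storage_pool: str    # storage pool the cluster uses for DR
    storage_type: str    # one of ALLOWED_STORAGE_TYPES


@dataclass
class DRClusterConfig:
    name: str
    primary: ClusterStorage
    standby: ClusterStorage


@dataclass
class ProtectionUnitInfo:
    dr_config: str                       # name of the DRClusterConfig in use
    app_ids: List[str] = field(default_factory=list)
    state: str = "created"               # placeholder; real states are in FIG. 5


def validate(config: DRClusterConfig, unit: ProtectionUnitInfo) -> List[str]:
    """Return a list of problems; an empty list means the pair is usable."""
    problems = []
    if config.primary.cluster_id == config.standby.cluster_id:
        problems.append("primary and standby must be different clusters")
    for cluster in (config.primary, config.standby):
        if cluster.storage_type not in ALLOWED_STORAGE_TYPES:
            problems.append(f"unknown storage type: {cluster.storage_type}")
    if unit.dr_config != config.name:
        problems.append("protection unit refers to a different configuration")
    if not unit.app_ids:
        problems.append("protection unit protects no applications")
    return problems
```

Validating the pairing before any backup command is issued catches configuration mistakes, such as pairing a cluster with itself, at the platform rather than during a switchover.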
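S302: issue a disaster recovery backup command to the first container cluster, so that the first container cluster performs disaster recovery backup according to the disaster recovery backup command to obtain backup data;
This step issues the disaster recovery backup command to the first container cluster, so that the first container cluster responds to the command and performs disaster recovery backup.
In some embodiments, disaster recovery backup by the first container cluster mainly means backing up certain data generated during its own operation, chiefly the container application metadata and the container service data, to obtain the corresponding backup data. The backup data is then stored in the corresponding storage system so that the second container cluster can use it directly.
S303: issue a disaster recovery restore command to the second container cluster, so that the second container cluster responds to the disaster recovery restore command and uses the backup data to perform disaster recovery restoration.
This step issues the disaster recovery restore command to the second container cluster, so that the second container cluster responds to the command and performs disaster recovery restoration.
In some embodiments, disaster recovery restoration by the second container cluster mainly means reconstructing, within the cluster, the running state of the first container cluster, chiefly by rebuilding the container applications and restoring the container services. Because the first container cluster has already backed up its data in S302, the second container cluster in this step can retrieve the backup data directly from the storage system and perform disaster recovery restoration.
The container disaster recovery method provided in some embodiments implements a cross-cluster container disaster recovery solution by building primary and standby container clusters and a container management platform. One container cluster performs normal service processing and, in response to a command from the container management platform, performs disaster recovery backup, during which it backs up the data it generates. The other container cluster performs disaster recovery restoration in response to a command from the container management platform, during which it can directly use the backup data of the former container cluster. Efficient and flexible container disaster recovery is thereby achieved, and rapid recovery of container services can be effectively ensured.
In some embodiments, before the disaster recovery restore command is issued to the second container cluster, the method may further include: issuing a shutdown instruction to the first container cluster, so that the container applications in the first container cluster stop running.
It can be understood that the disaster recovery restore command issued by the container management platform can be planned or unplanned: a planned command implements a service switchover between healthy container clusters, and an unplanned command implements a service switchover when a container cluster fails. When the command is planned, the container applications in the first container cluster can be shut down before the command is issued, which prevents new requests from entering and causing access errors and also helps keep the switchover consistent. Therefore, before the disaster recovery restore command is issued to the second container cluster, a shutdown instruction can first be issued to the first container cluster, so that the first container cluster responds to the shutdown instruction and stops the container applications in the cluster.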
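The ordering constraint described above (shut down the first cluster before restoring on the second, but only for a planned switchover) can be sketched in a few lines. The function `switch_over`, the `send` callback and the command strings are hypothetical names used only for illustration.

```python
def switch_over(planned: bool, send) -> list:
    """Issue the commands for one switchover, in order.

    send(target, command) delivers one command to one cluster; it is a
    stand-in for whatever channel the management platform really uses.
    Returns the ordered list of (target, command) pairs that were sent.
    """
    commands = []
    if planned:
        # Planned switchover: stop the primary's applications first so no
        # new requests arrive while services move to the standby cluster.
        commands.append(("first_cluster", "stop"))
    # Planned or unplanned, the standby cluster is then told to restore.
    commands.append(("second_cluster", "restore"))
    for target, command in commands:
        send(target, command)
    return commands
```

In the unplanned case the first cluster is assumed to be unreachable, so no shutdown instruction is attempted.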
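In some embodiments, the container disaster recovery method may further include: copying platform disaster recovery information to obtain backup disaster recovery information; and storing the backup disaster recovery information in a platform storage system.
The container management platform may itself suffer an unexpected failure while running. To address this, the platform disaster recovery information can also be backed up to obtain backup disaster recovery information, which is stored in the corresponding platform storage system. When the container management platform fails, the backup disaster recovery information in the platform storage system can be used to rebuild the platform. The platform disaster recovery information can include two parts: the management information of the container management platform itself, and the preset disaster recovery configuration information described above.
As shown in FIG. 1, the container disaster recovery system may include:
a container management platform 300, configured to issue a disaster recovery backup command to the first container cluster 100 and to issue a disaster recovery restore command to the second container cluster 200;
the first container cluster 100, configured to perform disaster recovery backup according to the disaster recovery backup command to obtain backup data;
the second container cluster 200, configured to respond to the disaster recovery restore command and use the backup data to perform disaster recovery restoration.
The container disaster recovery system provided in some embodiments implements a cross-cluster container disaster recovery solution by building primary and standby container clusters and a container management platform. One container cluster performs normal service processing and, in response to a command from the container management platform, performs disaster recovery backup, during which it backs up the data it generates. The other container cluster performs disaster recovery restoration in response to a command from the container management platform, during which it can directly use the backup data of the former container cluster. Efficient and flexible container disaster recovery is thereby achieved, and rapid recovery of container services can be effectively ensured.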
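On this basis, taking k8s clusters as an example, refer to FIG. 6, which is a schematic structural diagram of another container disaster recovery system provided in some embodiments of the present application. The system shown in FIG. 6 includes a primary k8s cluster, a standby k8s cluster, a container management platform and various storage systems. The primary and standby k8s clusters implement cluster disaster recovery, the container management platform manages the two clusters, and the storage systems store the data. The primary k8s cluster includes a container application metadata replication module and a service data replication module; the standby k8s cluster includes an image preheating module, a container application replay module and a service data restoration module; the container management platform includes a container image service, a disaster recovery control module and a disaster recovery metadata replication module; the storage systems include an object storage system, a centralized storage system, a distributed storage system and a platform storage system (shown as metadata backup in FIG. 6). The container disaster recovery flow based on these functional modules is as follows:
1. Disaster recovery control module:
This module runs the whole disaster recovery flow and calls the other modules according to the requests it receives. Before the flow starts, the disaster recovery configuration must be designed, including the identifiers of the two container clusters taking part in disaster recovery and the storage pool and storage type each cluster uses for disaster recovery; this configuration is used when a disaster recovery protection unit is created. The two container clusters are set up as a pair, and in some embodiments the database fields of the pairing relationship can be as shown in Table 1:
Table 1: a disaster recovery configuration information table
During disaster recovery, the corresponding protection policy can then be executed according to the selected disaster recovery configuration.
2. Disaster recovery metadata replication module:
The real-time transaction log backup technology of the database can be used to back up the data of the production database instance to a backup database instance in real time. The data backed up here is the platform disaster recovery information described above.
3. Container application metadata replication module:
This module takes the protection unit as its basic unit, extracts all container application metadata in it together, and saves the metadata to the object storage system. The first copy is a full copy and subsequent copies are incremental. Incremental copying is implemented by listening for change events on all container application metadata in the protection unit, which reduces the amount of data copied and saves resources such as network bandwidth. In addition, because metadata changes are monitored through an event mechanism, the configuration of container applications that already have data protection enabled, such as the number of replicas and the CPU (Central Processing Unit) and memory specifications, can be modified in real time.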
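The event-driven incremental replication described for this module can be sketched as a small listener. The class `MetadataReplicator` is hypothetical, a dictionary stands in for the object storage system, and the event kinds `ADDED`, `MODIFIED` and `DELETED` are modelled on the event types that a Kubernetes watch reports; how the source system actually subscribes to changes is not specified.

```python
class MetadataReplicator:
    """Keeps a backup of protection-unit metadata up to date from events."""

    def __init__(self, initial: dict):
        # First run: full copy of everything in the protection unit.
        self.backup = dict(initial)
        self.events_applied = 0

    def on_event(self, kind: str, app_id: str, metadata=None) -> None:
        # Later runs: apply only the single change carried by the event.
        if kind in ("ADDED", "MODIFIED"):
            self.backup[app_id] = metadata
        elif kind == "DELETED":
            self.backup.pop(app_id, None)
        else:
            raise ValueError(f"unknown event kind: {kind}")
        self.events_applied += 1
```

Because each event carries one changed object, a live configuration change such as a new replica count reaches the backup as a single small update rather than a fresh full copy.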
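4. Service data replication module:
This module replicates service data for multiple storage backends in the form of plug-ins; in some embodiments, it calls a different plug-in to perform the data copy depending on the storage type in the disaster recovery configuration. The modes mainly include:
4.1. Centralized storage remote replication: the disaster recovery control module obtains the metadata of the container applications in the primary k8s cluster that need disaster recovery protection, extracts from it all container volumes (that is, PVCs) related to service data, and then builds the PVCs in the standby k8s cluster through the container application replay module. Building a PVC in a k8s cluster is equivalent to building an actual storage volume in the storage system (the centralized storage system), in one-to-one correspondence with the PVC. The service data replication module can then use the remote replication plug-in to establish a remote replication relationship between the storage volumes that correspond to the PVCs in the primary and standby k8s clusters (equivalent to establishing a remote replication relationship between the centralized storage systems of the two clusters) and enable real-time synchronous replication, ensuring that data is replicated without loss to the centralized storage system of the standby k8s cluster. In addition, when the two clusters are far apart and latency and bandwidth are constrained, periodic asynchronous replication can be selected.
4.2. Distributed storage multi-replica: the plug-in for this mode in the service data replication module needs no additional processing. It only needs to check that multi-replica is enabled in the distributed storage system, and then relies on the multi-replica mechanism of the distributed storage system to automatically synchronize container service data from the local replica to the remote replica, achieving real-time synchronization without data loss.
4.3. Remote file-level copy: when no commercial storage is available, container applications can use local disks as a low-cost option for persisting service data. In this mode, the disaster recovery control module obtains the metadata of the container applications in the primary k8s cluster that need disaster recovery protection and extracts from it all container volumes related to service data. The service data replication module then copies the data under the host directories corresponding to the container volumes to the object storage system in the form of file blocks; the first copy is a full copy and subsequent copies are incremental, to reduce network bandwidth and object storage space usage. File-based replication is likewise a periodic backup.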
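For the file-block mode in 4.3, one way to get "full first, incremental afterwards" is to address each block by a hash of its content and upload only blocks the object store does not yet hold. That technique is an assumption introduced here for illustration; the source only says that data is copied as file blocks with a full first copy and incremental later copies. The function names, the dictionary used as an object store and the 4-byte block size are likewise hypothetical.

```python
import hashlib


def split_blocks(data: bytes, block_size: int) -> list:
    """Cut file content into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def backup_file(path: str, data: bytes, object_store: dict, block_size: int = 4) -> dict:
    """Upload only blocks that are not yet in the object store."""
    block_keys, uploaded = [], 0
    for block in split_blocks(data, block_size):
        key = hashlib.sha256(block).hexdigest()
        if key not in object_store:
            # New or changed block: this is the incremental part.
            object_store[key] = block
            uploaded += 1
        block_keys.append(key)
    # The manifest records which blocks, in order, make up the file.
    return {"path": path, "blocks": block_keys, "uploaded": uploaded}


def restore_file(manifest: dict, object_store: dict) -> bytes:
    """Reassemble a file on the standby side from its manifest."""
    return b"".join(object_store[key] for key in manifest["blocks"])
```

On the first run every block is new, so the copy is effectively full; on later runs only blocks whose content changed are uploaded, which is where the bandwidth and storage savings come from.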
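5. Container application replay module:
This module mainly obtains the container application metadata of the primary k8s cluster backed up in the object storage system and restores the container applications based on that metadata. Different restoration strategies can be used for different storage types.
6. Service data restoration module:
This module mainly selects a different plug-in to restore the service data depending on the storage type.
7. Image preheating module:
This module mainly reads, on a schedule, the container application metadata backed up to the object storage system. It extracts the names of the container images (corresponding to the identification information described above) used by all container applications in the protection unit, sends image pull requests to the container image service, and then distributes the container images to the nodes of the container cluster. On this basis, when cluster services are switched, the disaster recovery control module directs the container application replay module to bring up the services, and the container application replay module obtains the container application metadata backed up in the object storage system and restores the container applications. When centralized storage remote replication is selected, PVCs must be filtered out during container application restoration, because in this mode the PVCs have already been created and must not be overwritten, otherwise data would be lost. When distributed storage multi-replica is selected, all container application metadata can be restored, and the standby k8s cluster automatically locates the local replica in the distributed storage that corresponds to each PVC to restore the service data. When remote file copy is selected, the container application replay module restores all container application metadata, and the service data restoration module then pulls the file block backups from the object storage system, restores them locally, and copies them to the host file directories corresponding to the container volumes.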
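The image preheating step and the per-storage-type replay rules described above can be sketched as follows. The resource dictionaries are a simplified, hypothetical stand-in for real Kubernetes manifests, and the function names and storage-type strings are assumptions made for the example.

```python
def collect_images(resources: list) -> list:
    """Image names used by all applications in a protection unit."""
    images = set()
    for resource in resources:
        for container in resource.get("containers", []):
            images.add(container["image"])
    # Sorted so that pull requests are issued in a stable order.
    return sorted(images)


def resources_to_replay(resources: list, storage_type: str) -> list:
    """Choose which backed-up resources to recreate on the standby cluster."""
    if storage_type == "centralized":
        # PVCs already exist on the standby side and must not be
        # overwritten, so they are filtered out of the replay.
        return [r for r in resources if r["kind"] != "PersistentVolumeClaim"]
    if storage_type in ("distributed", "local"):
        # All metadata, including PVCs, is replayed.
        return list(resources)
    raise ValueError(f"unknown storage type: {storage_type}")


def needs_data_pull(storage_type: str) -> bool:
    """Only the file-copy mode pulls file blocks from the object store."""
    return storage_type == "local"
```

Running `collect_images` on a schedule, ahead of any switchover, is what lets the standby nodes already hold the images when the replay module brings the applications up.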
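The container disaster recovery system described above treats the k8s container cluster as a whole as the fault domain. When one of the container clusters at the two sites fails, the container applications can be quickly restored in the container cluster at the other site; the container applications can also be switched over while both clusters are running normally.
Referring to FIG. 7, which is a schematic structural diagram of a container disaster recovery apparatus provided in some embodiments of the present application, the apparatus can be applied to the first container cluster and includes:
a backup command receiving module 1, configured to receive a disaster recovery backup command issued by the container management platform;
a first copy module 2, configured to, in response to the disaster recovery backup command, copy container application metadata to obtain first disaster recovery data, and store the first disaster recovery data in the first storage system, so that the second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain a rebuilt container application;
a second copy module 3, configured to copy container service data to obtain second disaster recovery data, and store the second disaster recovery data in the second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application and obtain a restored container service.
In some embodiments, the first copy module 2 can be configured to obtain the number of copies already performed in the first container cluster; when the number of copies is zero, perform a full copy of the container application metadata to obtain the first disaster recovery data; and when the number of copies is not zero, perform an incremental copy of the container application metadata to obtain the first disaster recovery data.
In some embodiments, the first copy module 2 can be configured to determine the container applications to be protected in the first container cluster according to the disaster recovery backup command; add the identification information of the containers to be protected to a preset protection unit; copy the container application metadata of the containers to be protected to obtain the first disaster recovery data, and add the first disaster recovery data to the preset protection unit, where the identification information and the first disaster recovery data are stored in correspondence with each other; and store the preset protection unit in the first storage system.
In some embodiments, the second copy module 3 can be configured to determine the copy mode according to the disaster recovery backup command; when the copy mode is centralized storage replication, store the second disaster recovery data in the centralized storage system of the second container cluster, where a remote replication relationship is established between the centralized storage system of the first container cluster and the centralized storage system of the second container cluster, and the centralized storage system of the second container cluster is the second storage system; when the copy mode is distributed storage replication, store the second disaster recovery data in a distributed storage system, which is the second storage system; and when the copy mode is local storage replication, store the second disaster recovery data in an object storage system in the form of file blocks, where the object storage system is the second storage system.
For an introduction to the apparatus provided in some embodiments of the present application, refer to the method embodiments above; details are not repeated here.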
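Referring to FIG. 8, which is a schematic structural diagram of another container disaster recovery apparatus provided in some embodiments of the present application, the apparatus can be applied to the second container cluster and includes:
a restore command receiving module 4, configured to receive a disaster recovery restore command issued by the container management platform;
a container application rebuilding module 5, configured to, in response to the disaster recovery restore command, retrieve first disaster recovery data from the first storage system, and use the first disaster recovery data to rebuild the container application and obtain a rebuilt container application; wherein the first disaster recovery data is obtained by the first container cluster copying its own container application metadata;
a container service restoration module 6, configured to retrieve second disaster recovery data from the second storage system, and use the second disaster recovery data to restore the container service in the rebuilt container application and obtain a restored container service; wherein the second disaster recovery data is obtained by the first container cluster copying its own container service data.
In some embodiments, the container application rebuilding module 5 can be configured to retrieve the identification information of the container applications to be protected from the first storage system, where the identification information and the first disaster recovery data are stored in correspondence with each other; pull the container application image corresponding to each piece of identification information from the container management platform; retrieve the first disaster recovery data from the first storage system; and rebuild the container application using the first disaster recovery data and the container application images to obtain the rebuilt container application.
In some embodiments, the container service restoration module 6 can be configured to determine the storage mode according to the disaster recovery restore command; when the storage mode is centralized storage, retrieve the second disaster recovery data from the centralized storage system of the second container cluster, where a remote replication relationship is established between the centralized storage system of the second container cluster and the centralized storage system of the first container cluster, and the centralized storage system of the second container cluster is the second storage system; when the storage mode is distributed storage, retrieve the second disaster recovery data from a distributed storage system, which is the second storage system; and when the storage mode is local storage, retrieve the second disaster recovery data in the form of file blocks from an object storage system, where the object storage system is the second storage system.
For an introduction to the apparatus provided in some embodiments of the present application, refer to the method embodiments above; details are not repeated here.
Referring to FIG. 9, which is a schematic structural diagram of yet another container disaster recovery apparatus provided in some embodiments of the present application, the apparatus can be applied to the container management platform and includes:
a container cluster configuration module 7, configured to configure the container clusters to be protected according to preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
a first command issuing module 8, configured to issue a disaster recovery backup command to the first container cluster, so that the first container cluster performs disaster recovery backup according to the disaster recovery backup command to obtain backup data;
a second command issuing module 9, configured to issue a disaster recovery restore command to the second container cluster, so that the second container cluster responds to the disaster recovery restore command and uses the backup data to perform disaster recovery restoration.
In some embodiments, the container disaster recovery apparatus may further include a shutdown module, configured to issue a shutdown instruction to the first container cluster before the disaster recovery restore command is issued to the second container cluster, so that the container applications in the first container cluster stop running.
In some embodiments, the container disaster recovery apparatus may further include a backup module, configured to copy platform disaster recovery information to obtain backup disaster recovery information, and store the backup disaster recovery information in a platform storage system.
For an introduction to the apparatus provided in some embodiments of the present application, refer to the method embodiments above; details are not repeated here.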
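Referring to FIG. 10, which is a schematic structural diagram of a container disaster recovery device provided in some embodiments of the present application, the device may include:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the above container disaster recovery methods when executing the computer program.
As shown in FIG. 10, which is a schematic diagram of the composition of the container disaster recovery device, the device may include a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 communicate with one another through the communication bus 13.
In some embodiments of the present application, the processor 10 may be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array or another programmable logic device.
The processor 10 can call the program stored in the memory 11; in some embodiments, the processor 10 can perform the operations in the embodiments of the container disaster recovery method.
The memory 11 is used to store one or more programs. A program may include program code, and the program code includes computer operation instructions. In some embodiments of the present application, the memory 11 stores at least a program for implementing the following functions:
receiving a disaster recovery backup command issued by a container management platform;
in response to the disaster recovery backup command, copying container application metadata to obtain first disaster recovery data, and storing the first disaster recovery data in a first storage system, so that a second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain a rebuilt container application;
copying container service data to obtain second disaster recovery data, and storing the second disaster recovery data in a second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application and obtain a restored container service.
In some embodiments, the memory 11 may include a program storage area and a data storage area. The program storage area may store an operating system, an application required for at least one function, and so on; the data storage area may store data created during use.
In addition, the memory 11 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device or other volatile solid-state storage device.
The communication interface 12 may be an interface of a communication module and is used to connect to other devices or systems.
It should be noted that the structure shown in FIG. 10 does not constitute a limitation on the container disaster recovery device. In practical applications, the device may include more or fewer components than those shown in FIG. 10, or combine certain components.
Referring to FIG. 11, a non-volatile computer-readable storage medium 1100 is provided in some embodiments of the present application. A computer program 1101 is stored on the non-volatile computer-readable storage medium 1100, and when the computer program 1101 is executed by a processor, the steps of any of the above container disaster recovery methods can be implemented.
The non-volatile computer-readable storage medium 1100 may include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
For an introduction to the non-volatile computer-readable storage medium provided in some embodiments of the present application, refer to the method embodiments above; details are not repeated here.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Professionals will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may be placed in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The technical solution provided by the present application has been described in detail above. Specific examples are used herein to explain the principles and implementation of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. It should be noted that a person of ordinary skill in the art can make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the present application.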

Claims (23)

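  1. A container disaster recovery method, characterized in that it is applied to a first container cluster and comprises:
    receiving a disaster recovery backup command issued by a container management platform;
    in response to the disaster recovery backup command, copying container application metadata to obtain first disaster recovery data, and storing the first disaster recovery data in a first storage system, so that a second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain a rebuilt container application;
    copying container service data to obtain second disaster recovery data, and storing the second disaster recovery data in a second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application and obtain a restored container service.
  2. The container disaster recovery method according to claim 1, characterized in that copying the container application metadata to obtain the first disaster recovery data comprises:
    obtaining the number of copies already performed in the first container cluster;
    when the number of copies is zero, performing a full copy of the container application metadata to obtain the first disaster recovery data;
    when the number of copies is not zero, performing an incremental copy of the container application metadata to obtain the first disaster recovery data.
  3. The container disaster recovery method according to claim 1, characterized in that copying the container application metadata to obtain the first disaster recovery data, and storing the first disaster recovery data in the first storage system, comprises:
    determining the container applications to be protected in the first container cluster according to the disaster recovery backup command;
    adding the identification information of the containers to be protected to a preset protection unit;
    copying the container application metadata of the containers to be protected to obtain the first disaster recovery data, and adding the first disaster recovery data to the preset protection unit; in the preset protection unit, the identification information and the first disaster recovery data are stored in correspondence with each other;
    storing the preset protection unit in the first storage system.
  4. The container disaster recovery method according to claim 1, characterized in that storing the second disaster recovery data in the second storage system comprises:
    determining the copy mode according to the disaster recovery backup command;
    when the copy mode is centralized storage replication, storing the second disaster recovery data in the centralized storage system of the second container cluster; a remote replication relationship is established between the centralized storage system of the first container cluster and the centralized storage system of the second container cluster, and the centralized storage system of the second container cluster is the second storage system;
    when the copy mode is distributed storage replication, storing the second disaster recovery data in a distributed storage system; the distributed storage system is the second storage system;
    when the copy mode is local storage replication, storing the second disaster recovery data in an object storage system in the form of file blocks; the object storage system is the second storage system.
  5. The container disaster recovery method according to claim 1, characterized in that the container application metadata is the metadata information of each container application in the first container cluster, and the container service data is the service data of the services in progress in each container application in the first container cluster.
  6. The container disaster recovery method according to claim 1, characterized in that the first storage system is an object storage system, and the second storage system is an object storage system, a distributed storage system or a centralized storage system.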
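  7. A container disaster recovery method, characterized in that it is applied to a second container cluster and comprises:
    receiving a disaster recovery restore command issued by a container management platform;
    in response to the disaster recovery restore command, retrieving first disaster recovery data from a first storage system, and using the first disaster recovery data to rebuild the container application and obtain a rebuilt container application; wherein the first disaster recovery data is obtained by a first container cluster copying its own container application metadata;
    retrieving second disaster recovery data from a second storage system, and using the second disaster recovery data to restore the container service in the rebuilt container application and obtain a restored container service; wherein the second disaster recovery data is obtained by the first container cluster copying its own container service data.
  8. The container disaster recovery method according to claim 7, characterized in that retrieving the first disaster recovery data from the first storage system, and using the first disaster recovery data to rebuild the container application and obtain the rebuilt container application, comprises:
    retrieving the identification information of the container applications to be protected from the first storage system; in the first storage system, the identification information and the first disaster recovery data are stored in correspondence with each other;
    pulling the container application image corresponding to each piece of identification information from the container management platform;
    retrieving the first disaster recovery data from the first storage system;
    rebuilding the container application using the first disaster recovery data and the container application images to obtain the rebuilt container application.
  9. The container disaster recovery method according to claim 7, characterized in that retrieving the second disaster recovery data from the second storage system comprises:
    determining the storage mode according to the disaster recovery restore command;
    when the storage mode is centralized storage, retrieving the second disaster recovery data from the centralized storage system of the second container cluster; a remote replication relationship is established between the centralized storage system of the second container cluster and the centralized storage system of the first container cluster, and the centralized storage system of the second container cluster is the second storage system;
    when the storage mode is distributed storage, retrieving the second disaster recovery data from a distributed storage system; the distributed storage system is the second storage system;
    when the storage mode is local storage, retrieving the second disaster recovery data in the form of file blocks from an object storage system; the object storage system is the second storage system.
  10. The container disaster recovery method according to claim 7, characterized in that the disaster recovery restore command is a planned command or an unplanned command;
    wherein a planned disaster recovery restore command implements a service switchover between healthy container clusters, and an unplanned disaster recovery restore command implements a service switchover when a container cluster fails.
  11. The container disaster recovery method according to claim 7, characterized in that the container management platform pre-stores the image data of each container application in each first container cluster.
  12. The container disaster recovery method according to claim 7, characterized in that rebuilding the container application using the first disaster recovery data and the container application images comprises:
    distributing each container application image and the corresponding container application metadata to the cluster nodes of the first container cluster, so as to rebuild the container applications on those cluster nodes.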
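  13. A container disaster recovery method, characterized in that it is applied to a container management platform and comprises:
    configuring the container clusters to be protected according to preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
    issuing a disaster recovery backup command to the first container cluster, so that the first container cluster performs disaster recovery backup according to the disaster recovery backup command to obtain backup data;
    issuing a disaster recovery restore command to the second container cluster, so that the second container cluster responds to the disaster recovery restore command and uses the backup data to perform disaster recovery restoration.
  14. The container disaster recovery method according to claim 13, characterized in that, before issuing the disaster recovery restore command to the second container cluster, the method further comprises:
    issuing a shutdown instruction to the first container cluster, so that the container applications in the first container cluster stop running.
  15. The container disaster recovery method according to claim 13, characterized in that it further comprises:
    copying platform disaster recovery information to obtain backup disaster recovery information;
    storing the backup disaster recovery information in a platform storage system.
  16. The container disaster recovery method according to claim 15, characterized in that the platform disaster recovery information comprises the management information of the container management platform itself and the preset disaster recovery configuration information.
  17. The container disaster recovery method according to claim 13, characterized in that the preset disaster recovery configuration information comprises disaster recovery cluster configuration information and disaster recovery protection unit information;
    wherein the disaster recovery cluster configuration information comprises the identifiers of the two clusters taking part in disaster recovery and the storage information used by each cluster;
    the disaster recovery protection unit information comprises the disaster recovery configuration in use, the identifiers of the protected container applications, and the protection unit state.
  18. A container disaster recovery system, characterized in that it comprises:
    a container management platform, configured to issue a disaster recovery backup command to a first container cluster and to issue a disaster recovery restore command to a second container cluster;
    the first container cluster, configured to perform disaster recovery backup according to the disaster recovery backup command to obtain backup data;
    the second container cluster, configured to respond to the disaster recovery restore command and use the backup data to perform disaster recovery restoration.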
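  19. A container disaster recovery apparatus, characterized in that it is applied to a first container cluster and comprises:
    a backup command receiving module, configured to receive a disaster recovery backup command issued by a container management platform;
    a first copy module, configured to, in response to the disaster recovery backup command, copy container application metadata to obtain first disaster recovery data, and store the first disaster recovery data in a first storage system, so that a second container cluster uses the first disaster recovery data in the first storage system to rebuild the container application and obtain a rebuilt container application;
    a second copy module, configured to copy container service data to obtain second disaster recovery data, and store the second disaster recovery data in a second storage system, so that the second container cluster uses the second disaster recovery data in the second storage system to restore the container service in the rebuilt container application and obtain a restored container service.
  20. A container disaster recovery apparatus, characterized in that it is applied to a second container cluster and comprises:
    a restore command receiving module, configured to receive a disaster recovery restore command issued by a container management platform;
    a container application rebuilding module, configured to, in response to the disaster recovery restore command, retrieve first disaster recovery data from a first storage system, and use the first disaster recovery data to rebuild the container application and obtain a rebuilt container application; wherein the first disaster recovery data is obtained by a first container cluster copying its own container application metadata;
    a container service restoration module, configured to retrieve second disaster recovery data from a second storage system, and use the second disaster recovery data to restore the container service in the rebuilt container application and obtain a restored container service; wherein the second disaster recovery data is obtained by the first container cluster copying its own container service data.
  21. A container disaster recovery apparatus, characterized in that it is applied to a container management platform and comprises:
    a container cluster configuration module, configured to configure the container clusters to be protected according to preset disaster recovery configuration information to obtain a first container cluster and a second container cluster;
    a first command issuing module, configured to issue a disaster recovery backup command to the first container cluster, so that the first container cluster performs disaster recovery backup according to the disaster recovery backup command to obtain backup data;
    a second command issuing module, configured to issue a disaster recovery restore command to the second container cluster, so that the second container cluster responds to the disaster recovery restore command and uses the backup data to perform disaster recovery restoration.
  22. A container disaster recovery device, characterized in that it comprises:
    a memory, configured to store a computer program;
    a processor, configured to implement the container disaster recovery method according to any one of claims 1 to 17 when executing the computer program.
  23. A non-volatile computer-readable storage medium, characterized in that a computer program is stored on the non-volatile computer-readable storage medium, and when the computer program is executed by a processor, the container disaster recovery method according to any one of claims 1 to 17 is implemented.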
PCT/CN2023/084590 2022-11-14 2023-03-29 容器容灾方法、系统、装置、设备及计算机可读存储介质 WO2024103594A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211417472.8 2022-11-14
CN202211417472.8A CN115658390A (zh) 2022-11-14 2022-11-14 容器容灾方法、系统、装置、设备及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2024103594A1 true WO2024103594A1 (zh) 2024-05-23

Family

ID=85021217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084590 WO2024103594A1 (zh) 2022-11-14 2023-03-29 容器容灾方法、系统、装置、设备及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN115658390A (zh)
WO (1) WO2024103594A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658390A (zh) * 2022-11-14 2023-01-31 济南浪潮数据技术有限公司 容器容灾方法、系统、装置、设备及计算机可读存储介质
CN116627661B (zh) * 2023-07-24 2023-11-03 杭州谐云科技有限公司 算力资源调度的方法和系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170060710A1 (en) * 2015-08-28 2017-03-02 Netapp Inc. Trust relationship migration for data mirroring
CN110377459A (zh) * 2019-06-28 2019-10-25 苏州浪潮智能科技有限公司 一种容灾系统、容灾处理方法、监控节点和备份集群
CN112422628A (zh) * 2020-10-19 2021-02-26 天翼电子商务有限公司 Redis-canal跨机房缓存同步系统
CN114741234A (zh) * 2021-01-07 2022-07-12 华为技术有限公司 数据的备份存储方法、设备及系统
CN115174364A (zh) * 2022-06-30 2022-10-11 济南浪潮数据技术有限公司 一种容灾场景下的数据还原方法、装置以及介质
CN115658390A (zh) * 2022-11-14 2023-01-31 济南浪潮数据技术有限公司 容器容灾方法、系统、装置、设备及计算机可读存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108958971A (zh) * 2018-06-14 2018-12-07 北京小米移动软件有限公司 信息的备份方法、装置及设备
CN111611109A (zh) * 2020-05-22 2020-09-01 苏州浪潮智能科技有限公司 一种分布式集群的备份方法、系统、设备以及介质
CN111338854B (zh) * 2020-05-25 2020-10-02 南京云信达科技有限公司 基于Kubernetes集群快速恢复数据的方法及系统
US11409614B2 (en) * 2020-08-07 2022-08-09 EMC IP Holding Company LLC Systems and methods for multiple recovery types using single backup type
CN114328007B (zh) * 2021-11-19 2024-03-22 苏州浪潮智能科技有限公司 一种容器备份还原方法、装置及其介质
CN114466027B (zh) * 2022-01-26 2023-08-04 苏州浪潮智能科技有限公司 一种云原生数据库服务提供方法、系统、设备及介质

Also Published As

Publication number Publication date
CN115658390A (zh) 2023-01-31

Similar Documents

Publication Publication Date Title
CN103226502B (zh) 一种数据灾备控制系统及数据恢复方法
WO2024103594A1 (zh) 容器容灾方法、系统、装置、设备及计算机可读存储介质
US11429305B2 (en) Performing backup operations using replicas
JP4668763B2 (ja) ストレージ装置のリストア方法及びストレージ装置
US6691245B1 (en) Data storage with host-initiated synchronization and fail-over of remote mirror
US7793060B2 (en) System method and circuit for differential mirroring of data
US8615578B2 (en) Using a standby data storage system to detect the health of a cluster of data storage servers
CN103853837B (zh) Oracle全自动不停生产数据库的表级备份恢复方法
WO2021136422A1 (zh) 状态管理方法、主备应用服务器的切换方法及电子设备
JP2008171387A (ja) 継続的データ保護を備えたバックアップシステム
JP2003517651A (ja) 高度利用可能ファイルサーバ
WO2010118657A1 (zh) 一种数据恢复的方法、数据节点及分布式文件系统
WO2020088533A1 (zh) 虚拟化平台的容灾方法及装置
JP2002297456A (ja) バックアップ処理方法及びその実施システム並びにその処理プログラム
WO2020063600A1 (zh) 数据容灾方法与站点
US20110197040A1 (en) Storage system and storage control method
WO2024120227A1 (zh) 容器数据保护系统、方法、装置、设备及可读存储介质
CN111984465A (zh) 数据库远程备份方法、装置、介质和电子设备
CN107038091A (zh) 一种基于异步远程镜像的数据安全性保护系统与电力应用系统数据保护方法
WO2015043155A1 (zh) 一种基于命令集的网元备份与恢复方法及装置
CN110928728A (zh) 一种基于快照的虚拟机复制、切换方法及系统
JP2013543179A (ja) アイテム単位でのリカバリー
CN112214358A (zh) 一种GaussDB分布式数据库的备份恢复系统及其方法
CN114356650A (zh) 数据备份方法、装置、设备、系统及存储介质
CN114328009A (zh) 基于虚拟化和快照的异构数据库统一容灾备份方法和装置