CN114584458A

CN114584458A - Cluster disaster recovery management method, system, equipment and storage medium based on ETCD

Info

Publication number: CN114584458A
Application number: CN202210209647.XA
Authority: CN
Inventors: 雷特; 白小龙
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-03-03
Filing date: 2022-03-03
Publication date: 2022-06-03
Anticipated expiration: 2042-03-03
Also published as: CN114584458B

Abstract

The invention relates to the field of cloud storage and discloses an ETCD-based cluster disaster recovery management method, system, device and storage medium. The method comprises the following steps: establishing a standby storage system for a main storage system, and setting a key value mark for disaster recovery switching for a standby ETCD cluster in the standby storage system; monitoring the state of the main storage system and, when the main storage system is found to be in an unavailable state, connecting to the standby ETCD cluster in the standby storage system and acquiring its key value; and switching the bottom-layer cluster traffic and the user traffic of the main storage system to the standby storage system according to the key value of the standby ETCD cluster. Because the disaster recovery switching is based on ETCD, the embodiments of the invention can achieve system-level disaster recovery.

Description

Cluster disaster recovery management method, system, equipment and storage medium based on ETCD

Technical Field

The invention relates to the technical field of cloud storage, in particular to a cluster disaster recovery management method, a cluster disaster recovery management system, cluster disaster recovery management equipment and a storage medium based on ETCD.

Background

A disaster recovery system consists of two or more sets of IT systems with the same function, established at sites far away from each other. The systems monitor each other's health status and can switch functions between them, so that when one system stops working because of an accident (such as a fire or an earthquake), the whole application system can be switched to another system and continue to work normally.

At present, in a part of cloud architectures, a cloud storage resource management platform does not support a disaster recovery function, so that a system is interrupted for a long time when accidents such as fire, flood, earthquake and the like occur, and service continuity cannot be ensured.

Disclosure of Invention

The invention provides an ETCD-based cluster disaster recovery management method, system, device and storage medium, and aims to solve the technical problem that existing cloud storage resource management platforms do not support a disaster recovery function, so that service is interrupted for a long time when an accident occurs.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

a cluster disaster recovery management method based on ETCD comprises the following steps:

establishing a standby storage system for a main storage system, and setting a key value mark for disaster recovery switching for a standby ETCD cluster in the standby storage system;

monitoring the state of the main storage system, and when the main storage system is monitored to be in an unavailable state, connecting a standby ETCD cluster in the standby storage system and acquiring a key value of the standby ETCD cluster;

and switching the bottom-layer cluster traffic and the user traffic of the main storage system to the standby storage system according to the key value of the standby ETCD cluster.

The technical scheme adopted by the embodiment of the invention also comprises the following steps: the switching of the bottom-layer cluster traffic and the user traffic of the main storage system to the standby storage system according to the key value of the standby ETCD cluster comprises:

judging, according to the key value of the standby ETCD cluster, whether the bottom-layer cluster traffic can be switched to the standby storage system; if so,

loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the bottom-layer cluster traffic to the standby storage system.

The technical scheme adopted by the embodiment of the invention also comprises the following steps: the switching the bottom-layer cluster traffic and the user traffic of the primary storage system to the standby storage system according to the key value of the standby ETCD cluster further comprises:

after the switching of the bottom-layer cluster traffic is completed, automatically promoting the standby storage system to main storage system, and switching the user traffic to the promoted main storage system.

The technical scheme adopted by the embodiment of the invention also comprises the following steps: after switching the bottom-layer cluster traffic and the user traffic of the main storage system to the standby storage system according to the key value of the standby ETCD cluster, the method further comprises the following steps:

monitoring the state of the main storage system, and automatically switching the bottom-layer cluster traffic and the user traffic back to the main storage system when the main storage system is restored to the available state.

The technical scheme adopted by the embodiment of the invention also comprises the following steps: the main storage system and the standby storage system are CSSP systems respectively, and the standby CSSP system and the main CSSP system are independent.

The technical scheme adopted by the embodiment of the invention also comprises the following steps: the switching of the bottom-layer cluster traffic and the user traffic of the main storage system to the standby storage system according to the key value of the standby ETCD cluster specifically includes:

when the main CSSP system is in an unavailable state, executing disaster recovery switching logic, loading configuration contents of the standby ETCD, wherein the configuration contents comprise mysql and rabbitmq, switching the bottom layer cluster flow of the main CSSP system to the standby CSSP system, and managing bottom layer storage resources by the standby CSSP system; the mysql is a relational database management system, and the rabbitmq is a message queue service module;

and monitoring the main CSSP system with a cluster management module and, when it detects that a service port of the main CSSP system is disconnected and the bottom-layer cluster traffic has been switched, automatically promoting the standby CSSP system to main CSSP system and switching the user traffic to the promoted main CSSP system.

The embodiment of the invention adopts another technical scheme that: a cluster disaster recovery management system based on ETCD comprises:

an ETCD marking module, used for establishing a standby storage system for a main storage system, and setting a key value mark for disaster recovery switching for a standby ETCD cluster in the standby storage system;

a state monitoring module, used for monitoring the state of the main storage system and, when the main storage system is monitored to be in an unavailable state, connecting to the standby ETCD cluster in the standby storage system and acquiring the key value of the standby ETCD cluster;

a disaster recovery switching module, used for switching the bottom-layer cluster traffic and the user traffic of the main storage system to the standby storage system according to the key value of the standby ETCD cluster.

The technical scheme adopted by the embodiment of the invention also comprises the following steps: the disaster recovery switching module switches the bottom layer cluster flow and the user flow of the main storage system to the standby storage system according to the key value of the standby ETCD cluster specifically:

loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the bottom-layer cluster traffic to the standby storage system.

The embodiment of the invention adopts another technical scheme that: an ETCD-based cluster disaster recovery management device, the device comprising a processor, a memory coupled to the processor, wherein,

the memory stores program instructions for implementing the cluster disaster recovery management method based on the ETCD;

the processor is configured to execute the program instructions stored by the memory to perform the ETCD-based cluster disaster recovery management operations.

The embodiment of the invention adopts another technical scheme that: a storage medium storing program instructions executable by a processor to perform the above-described ETCD-based cluster disaster recovery management method.

The invention has the following beneficial effects: the ETCD-based cluster disaster recovery management method, system, device and storage medium perform disaster recovery switching based on ETCD. When the main storage system cannot provide services because of force majeure, the main storage system and the standby storage system can be switched quickly, so that the stability of the cluster is guaranteed, the services used by users are not affected, and the high availability of the cloud service is effectively improved. Meanwhile, because ETCD is a strongly consistent middleware, fault misjudgment and false switching caused by data inconsistency can be avoided, and system-level disaster recovery can be achieved.

Drawings

Fig. 1 is a schematic flow chart of an ETCD-based cluster disaster recovery management method according to a first embodiment of the present invention;

fig. 2 is a schematic flowchart of an ETCD-based cluster disaster recovery management method according to a second embodiment of the present invention;

fig. 3 is a schematic flow chart of an ETCD-based cluster disaster recovery management method according to a third embodiment of the present invention;

fig. 4 is a schematic diagram of a CSSP system disaster recovery architecture according to an embodiment of the present application;

fig. 5 is a schematic structural diagram of an ETCD-based cluster disaster recovery management system according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of an ETCD-based cluster disaster recovery management device according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a storage medium structure according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first", "second" and "third" in the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise. All directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

Fig. 1 is a schematic flow chart of an ETCD-based cluster disaster recovery management method according to a first embodiment of the present invention. The ETCD-based cluster disaster recovery management method comprises the following steps:

S10: establishing a standby storage system for the main storage system, and setting a key value mark for disaster recovery switching for a standby ETCD cluster in the standby storage system;

In this step, a standby storage system is provided for the main storage system, and no data synchronization is required between the two. Before a disaster occurs, the main storage system provides users with the management functions for the underlying kvm and storage clusters; data traffic does not pass through the standby storage system, and the standby storage system does not provide services externally. ETCD is a distributed, highly available, consistent key-value store; here it serves as the configuration center and stores the configuration data of the entire system.

S11: monitoring the state of a main storage system, connecting a standby ETCD cluster in a standby storage system when the main storage system is monitored to be in an unavailable state, and acquiring a key value of the standby ETCD cluster;

S12: switching the traffic of the main storage system to the standby storage system according to the key value of the standby ETCD cluster.

In this step, the traffic switching includes bottom-layer cluster traffic switching and user traffic switching. The bottom-layer cluster traffic switching is specifically as follows: judging, according to the key value of the standby ETCD cluster, whether the bottom-layer cluster traffic can be switched to the standby storage system; if so, loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the bottom-layer cluster traffic to the standby storage system, which then manages the bottom-layer storage resources. After the disaster recovery switching is completed, the standby storage system takes over from the main storage system and provides users with the management functions for the underlying KVM (Kernel-based Virtual Machine, the virtualization layer that hosts the virtual machines) and storage clusters. The standby storage system is completely independent of the main storage system; only database (DB) data is synchronized.
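The bottom-layer switching decision described in this step can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: `FakeEtcd` is an in-memory stand-in for a client of the standby ETCD cluster, and the key names (`dr/can_switch`, `dr/config/`, `dr/heartbeat/`) and the flag value `"true"` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeEtcd:
    """In-memory stand-in for the standby ETCD cluster (hypothetical)."""
    data: Dict[str, str] = field(default_factory=dict)

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value

    def get_prefix(self, prefix: str) -> Dict[str, str]:
        return {k: v for k, v in self.data.items() if k.startswith(prefix)}


def switch_bottom_layer(standby: FakeEtcd, node_id: str) -> Optional[Dict[str, str]]:
    """Return the loaded standby configuration, or None if switching is not allowed."""
    # 1. Read the disaster recovery flag (assumed key name and value format).
    if standby.get("dr/can_switch") != "true":
        return None
    # 2. Load the configuration content held by the standby ETCD cluster.
    config = standby.get_prefix("dr/config/")
    # 3. Report a heartbeat so the standby system can see this node.
    standby.put(f"dr/heartbeat/{node_id}", "alive")
    return config


standby = FakeEtcd({"dr/can_switch": "true", "dr/config/mysql": "standby-db:3306"})
loaded = switch_bottom_layer(standby, "node-1")
```

Returning `None` when the flag is not set keeps the "cannot switch" branch explicit, so a caller can end the procedure without touching the bottom-layer traffic.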

Taking the CSSP system as an example, user traffic switching mainly means switching the external service address of res-mgr: res-mgr is the entry point of the CSSP system and provides services externally through a domain name. The user traffic switching is specifically as follows: before disaster recovery switching, keepalived monitors the main CSSP system; when keepalived detects that a service port of the main CSSP system is down (disconnected) and the bottom-layer cluster traffic has been switched, the standby CSSP system is automatically promoted to main CSSP system, and the user traffic is switched to the promoted main CSSP system.
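The promotion rule in the paragraph above (service port down and bottom-layer traffic already switched) can be illustrated with a small sketch. It does not use or reproduce keepalived itself, which is driven by its own configuration; the function names and the idea of returning an entry address are assumptions made for the example.

```python
import socket


def port_is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_promote_standby(main_port_up: bool, bottom_layer_switched: bool) -> bool:
    """Promote the standby system only when both conditions from the text hold."""
    return (not main_port_up) and bottom_layer_switched


def entry_address(main_addr: str, standby_addr: str, promote: bool) -> str:
    """Address that should sit behind the external service name (illustrative)."""
    return standby_addr if promote else main_addr
```

A monitoring loop would call `port_is_up` periodically and pass the result, together with the state of the bottom-layer switch, to `should_promote_standby`. Requiring both conditions prevents user traffic from moving to a standby system whose bottom-layer cluster has not yet been taken over.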

Based on the above, the ETCD-based cluster disaster recovery management method according to the first embodiment of the present invention performs disaster recovery switching based on ETCD, and when the main storage system cannot provide services because of force majeure, the main storage system and the standby storage system can be switched quickly. Because ETCD is a strongly consistent middleware, fault misjudgment and false switching caused by data inconsistency do not occur, and system-level disaster recovery can be achieved.

Please refer to fig. 2, which is a flowchart illustrating an ETCD-based cluster disaster recovery management method according to a second embodiment of the present application. The ETCD-based cluster disaster recovery management method comprises the following steps:

S20: establishing a standby storage system for the main storage system, and setting a key value mark for disaster recovery switching for a standby ETCD cluster in the standby storage system;

In this step, a standby storage system is provided for the main storage system, and no data synchronization is required between the two. Before a disaster occurs, the main storage system provides users with the management functions for the underlying kvm and storage clusters; data traffic does not pass through the standby storage system, and the standby storage system does not provide services externally.

S21: monitoring the main storage system in real time, judging whether an ETCD cluster of the main storage system is in an available state or not, and if so, continuing to execute S21; otherwise, go to S22;

S22: connecting to the standby ETCD cluster in the standby storage system, and acquiring the key value of the standby ETCD cluster after the connection succeeds;

S23: judging, according to the key value of the standby ETCD cluster, whether the bottom-layer cluster traffic can be switched to the standby storage system; if so, executing S24; otherwise, going to S27;

S24: loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the bottom-layer cluster traffic to the standby storage system;

In this step, the main storage system is unavailable when disaster recovery switching is performed, so the bottom-layer cluster traffic is switched to the standby storage system, which then manages the bottom-layer storage resources. After the disaster recovery switching is completed, the standby storage system takes over from the main storage system and provides users with the management functions for the underlying KVM (Kernel-based Virtual Machine, the virtualization layer that hosts the virtual machines) and storage clusters. The standby storage system is completely independent of the main storage system; only database (DB) data is synchronized.

S25: after the switching of the bottom-layer cluster traffic is completed, automatically promoting the standby storage system to main storage system, and switching the user traffic to the promoted system;

S26: judging whether the main storage system has been restored to an available state, and automatically switching the bottom-layer cluster traffic and the user traffic back to the main storage system once it has;

S27: end.

Based on the above, the ETCD-based cluster disaster recovery management method according to the second embodiment of the present application performs disaster recovery switching based on ETCD. When the main storage system cannot provide services because of force majeure, the main storage system and the standby storage system can be switched quickly, so that the stability of the cluster is guaranteed, the services used by users are not affected, and the high availability of the cloud service is effectively improved. Meanwhile, because ETCD is a strongly consistent middleware, fault misjudgment and false switching caused by data inconsistency can be avoided, and system-level disaster recovery is achieved.
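The control flow of steps S21 to S27 can be sketched as a single function. This is a simplified model under stated assumptions: the three callbacks are hypothetical hooks standing in for the real monitoring, connection and key-value checks, the `max_checks` bound exists only so the example terminates, and ending the procedure when the standby connection fails is an assumption, since the text only describes the successful connection.

```python
from typing import Callable, List


def run_failover(
    main_available: Callable[[], bool],
    connect_standby: Callable[[], bool],
    can_switch: Callable[[], bool],
    max_checks: int = 3,
) -> List[str]:
    """Walk through steps S21 to S27 and return the steps that were taken."""
    trace: List[str] = []
    # S21: keep monitoring while the main storage system is available.
    for _ in range(max_checks):
        trace.append("S21")
        if not main_available():
            break
    else:
        return trace  # the main system stayed healthy during this run
    # S22: connect to the standby ETCD cluster and read its key value.
    trace.append("S22")
    if not connect_standby():
        trace.append("S27")  # assumption: a failed connection ends the procedure
        return trace
    # S23: the key value decides whether switching is allowed.
    trace.append("S23")
    if not can_switch():
        trace.append("S27")
        return trace
    trace.append("S24")  # load config, report heartbeat, switch bottom-layer traffic
    trace.append("S25")  # promote the standby system, switch user traffic
    trace.append("S26")  # switch back once the main system is available again
    trace.append("S27")
    return trace


states = iter([True, False])  # main system healthy once, then unavailable
trace = run_failover(lambda: next(states), lambda: True, lambda: True)
```

Recording the visited steps in a list makes the branch taken at S23 easy to inspect when the function is exercised with different callbacks.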

Further, in order to explain the implementation process of the embodiments of the present application more clearly, the cluster disaster recovery management of a CSSP (Cloud Storage Service Platform) system is taken as an example. Please refer to fig. 3, which is a flowchart illustrating an ETCD-based cluster disaster recovery management method according to a third embodiment of the present application. The ETCD-based cluster disaster recovery management method in the third embodiment of the application comprises the following steps:

S30: establishing a standby CSSP system for the main CSSP system, and setting a key (CSSP_can_switch) value mark for disaster recovery switching for a standby ETCD cluster in the standby CSSP system;

In this step, fig. 4 shows a schematic diagram of the disaster recovery architecture of the CSSP system according to an embodiment of the present application. The CSSP system comprises service components such as res-mgr, mysql, rabbitmq and etcd: res-mgr is the storage resource management service; mysql is an open-source relational database management system that stores persistent data on the management side; rabbitmq is an open-source message queue service used to communicate with the bottom-layer services; and etcd serves as the configuration center and stores the configuration data of the entire system. The PM (physical machine) is the data gateway layer of efs and provides NAS (Network Attached Storage) services; it hosts the efs-server process, the vm (Virtual Machine) and the nasagent inside the vm. The efs-server provides interfaces for managing the virtual machine resources and network resources of the physical machine, which res-mgr calls. The nasagent provides interfaces for managing resources inside the vm, such as disks, networks, compute and software (nfsd, samba, syslog). The efs-server calls the nasagent interfaces through grpc to manage the resources inside the vm.

In the embodiment of the application, a standby CSSP system is provided for the main CSSP system. In the main and standby systems, res-mgr, rabbitmq, etcd and the other service components are deployed in cluster or high-availability mode. The res-mgr service is stateless, and rabbitmq and etcd do not need persistent data at run time, so no data synchronization is needed between the main CSSP system and the standby CSSP system. Before a disaster occurs, the main CSSP system provides users with the management functions for the underlying kvm and storage clusters through res-mgr, mysql, rabbitmq, etcd and the like; data traffic does not pass through the standby CSSP system, and the standby CSSP system does not provide services externally.

S31: monitoring the main CSSP system in real time, judging whether an ETCD cluster of the main CSSP system is in an available state or not, and if so, continuing to execute S31; otherwise, go to S32;

In this step, when services such as efs-server and nasagent detect that the main CSSP system is unavailable, they execute the disaster recovery switching logic to switch the bottom-layer cluster traffic of the main CSSP system to the standby CSSP system; after the disaster recovery switching is completed, the standby CSSP system manages the bottom-layer storage resources.

S32: connecting a standby ETCD cluster in the standby CSSP system, and acquiring a key value of the standby ETCD cluster after the connection is successful;

S33: judging, according to the key value of the standby ETCD cluster, whether the bottom-layer cluster traffic can be switched to the standby CSSP system; if so, executing S34; otherwise, going to S37;

S34: loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the bottom-layer cluster traffic to the standby CSSP system;

In this step, the loaded configuration content includes the mysql, rabbitmq and other related configurations. As the CSSP disaster recovery architecture in fig. 4 shows, the bottom-layer efs-server and rpcserver provide the vm management function through the rabbitmq service, and the heartbeat module provides heartbeat reporting and configuration modification through the etcd service. Disaster recovery switching therefore only needs to consider two things: the bottom-layer cluster traffic of services such as efs-server, rpcserver and heartbeat in the underlying kvm cluster, and the upper-layer user traffic. The nasagent in the vm and the efs-server on the PM physical machine in the kvm cluster use services such as etcd and rabbitmq of the CSSP system. Because the etcd and rabbitmq of the main CSSP system and of the standby CSSP system are independent of each other, the related configuration and access services of nasagent and efs-server in the kvm cluster must be switched to the standby CSSP system when the bottom-layer cluster traffic is switched. After the switching of the bottom-layer cluster traffic is completed, the standby CSSP system takes over from the main CSSP system and provides users with the management functions for the underlying kvm and storage clusters. The standby CSSP system is completely independent of the main CSSP system; only DB data is synchronized.

S35: after the switching of the bottom-layer cluster traffic is completed, automatically promoting the standby CSSP system to main CSSP system by means of a cluster management module, and switching the user traffic to the promoted CSSP system;

In this step, user traffic switching mainly means switching the external service address of res-mgr: res-mgr is the entry point of the CSSP system and provides services externally through a domain name. In the embodiment of the application, the cluster management module uses keepalived to switch the user traffic of the CSSP system. keepalived is service software that guarantees high availability in cluster management: it health-checks the cluster service nodes and exposes a single service IP externally through the Virtual Router Redundancy Protocol (VRRP), behind which there can be several service systems or nodes. Specifically, before disaster recovery switching, keepalived monitors the main CSSP system; when keepalived detects that a service port of the main CSSP system is down and the bottom-layer cluster traffic has been switched, it automatically promotes the standby CSSP system to main CSSP system and switches the user traffic to the promoted CSSP system.

S36: judging whether the main CSSP system has been restored to an available state, and automatically switching the bottom-layer cluster traffic and the user traffic back to the main CSSP system once it has;

S37: end.

Based on the above, the ETCD-based cluster disaster recovery management method according to the third embodiment of the present application performs disaster recovery switching between the main CSSP system and the standby CSSP system based on ETCD. When the main CSSP system cannot provide services because of force majeure, the main CSSP system and the standby CSSP system can be switched quickly, so that the stability of the cluster is guaranteed, the services used by users are not affected, and the high availability of the cloud service is effectively improved. Meanwhile, because ETCD is a strongly consistent middleware, fault misjudgment and false switching caused by data inconsistency can be avoided, and system-level disaster recovery is achieved.
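For the CSSP example, the effect of steps S33 and S34 on a bottom-layer service such as efs-server can be sketched as repointing its configuration at the standby services. The key name `CSSP_can_switch` comes from step S30; everything else here is an assumption for illustration, including the flag value `"true"`, the `config/...` key layout, the addresses, and the use of a plain dictionary in place of the standby ETCD cluster.

```python
from typing import Dict, Optional

FLAG_KEY = "CSSP_can_switch"  # key name taken from step S30

# Hypothetical snapshot of the standby ETCD cluster; addresses are made up.
standby_etcd: Dict[str, str] = {
    FLAG_KEY: "true",
    "config/mysql": "mysql://standby-cssp:3306",
    "config/rabbitmq": "amqp://standby-cssp:5672",
}


def repoint_agent(agent_cfg: Dict[str, str], etcd: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return a new agent configuration pointing at the standby CSSP services."""
    if etcd.get(FLAG_KEY) != "true":
        return None  # S33: switching is not permitted, go to S37
    new_cfg = dict(agent_cfg)  # copy so the original configuration is kept
    # S34: load the mysql and rabbitmq configuration from the standby ETCD cluster.
    new_cfg["mysql"] = etcd["config/mysql"]
    new_cfg["rabbitmq"] = etcd["config/rabbitmq"]
    return new_cfg


efs_server_cfg = {
    "name": "efs-server",
    "mysql": "mysql://main-cssp:3306",
    "rabbitmq": "amqp://main-cssp:5672",
}
switched = repoint_agent(efs_server_cfg, standby_etcd)
```

Keeping the original configuration untouched is one simple way to support the switch back in S36: the service can return to `efs_server_cfg` once the main CSSP system is available again.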

In an alternative embodiment, it is also possible to: and uploading the result of the cluster disaster recovery management method based on the ETCD to a block chain.

Specifically, digest information is obtained from the result of the ETCD-based cluster disaster recovery management method by hashing that result, for example with the SHA-256 algorithm. Uploading the digest information to the blockchain ensures its security and provides fairness and transparency for the user. The user can download the digest information from the blockchain to verify whether the result of the ETCD-based cluster disaster recovery management method has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
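The digest step can be illustrated with Python's standard `hashlib` module. The record layout and the canonical JSON encoding are assumptions made for the example; the text only states that the result is hashed, for instance with SHA-256, and the blockchain upload itself is not modeled here.

```python
import hashlib
import json


def result_digest(result: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of the result."""
    # Sorting keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_untampered(result: dict, recorded_digest: str) -> bool:
    """Compare a freshly computed digest with the one recorded earlier."""
    return result_digest(result) == recorded_digest


# Hypothetical switching record; the field names are illustrative only.
record = {"event": "failover", "from": "main", "to": "standby"}
digest = result_digest(record)
```

A user who later retrieves `digest` from the blockchain can call `is_untampered` on the locally held result; any change to the record produces a different digest.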

Fig. 5 is a schematic structural diagram of an ETCD-based cluster disaster recovery management system according to an embodiment of the present invention. The ETCD-based cluster disaster recovery management system 40 in the embodiment of the invention comprises:

the etc flag module 41: the system comprises a main storage system, a standby storage system, a key value mark, a disaster recovery switching system and a cluster management system, wherein the main storage system is used for establishing the standby storage system, and the key value mark is used for setting a standby ETCD cluster in the standby storage system; in the embodiment of the application, a backup storage system is arranged for the main storage system, and data synchronization is not required between the main storage system and the backup storage system. Before disaster tolerance, a management function of the underlying kvm and storage cluster management is provided for a user through the main storage system, data traffic does not pass through the standby storage system, and the standby storage system does not provide services to the outside.

Condition monitoring module 42: used for monitoring the state of the main storage system and, when the main storage system is detected to be in an unavailable state, connecting to the standby ETCD cluster in the standby storage system and acquiring the key value of the standby ETCD cluster;

Disaster recovery switching module 43: used for switching the traffic of the main storage system to the standby storage system according to the key value of the standby ETCD cluster. The traffic switching comprises underlying cluster traffic switching and user traffic switching. The underlying cluster traffic switching is specifically as follows: whether the underlying cluster traffic can be switched to the standby storage system is judged according to the key value; if so, the configuration content of the standby ETCD cluster is loaded, the heartbeat is reported to the standby ETCD cluster, the underlying cluster traffic is switched to the standby storage system, and the underlying storage resources are managed by the standby storage system. After the disaster recovery switching is completed, the standby storage system takes over from the main storage system and provides the user with management functions for the underlying KVM (an abbreviation of Keyboard Video Mouse; a KVM can access and control a computer by connecting directly to its keyboard, video, or mouse ports) and storage clusters. The standby storage system is completely independent of the main storage system, and only data synchronization of a database (DB) is performed.
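The underlying cluster traffic switching just described (judge by key value, load configuration, report heartbeat, switch) can be sketched as follows. This is an illustrative sketch only: `FakeEtcd` is an in-memory stand-in and not a real ETCD client API, and the key names `/dr/switch`, `/config/...`, `/heartbeat/...` and the value `standby-ready` are hypothetical, since the document does not specify the format of the key value mark.

```python
from dataclasses import dataclass, field

# Hypothetical key and value for the disaster recovery mark (assumption).
DR_KEY = "/dr/switch"
DR_ALLOWED = "standby-ready"


@dataclass
class FakeEtcd:
    """In-memory stand-in for the standby ETCD cluster (not a real client)."""
    kv: dict = field(default_factory=dict)

    def get(self, key):
        return self.kv.get(key)

    def put(self, key, value):
        self.kv[key] = value


@dataclass
class ClusterAgent:
    """An underlying cluster node that reports to one storage system."""
    name: str
    active_system: str = "main"
    config: dict = field(default_factory=dict)


def switch_underlying_traffic(agent: ClusterAgent, standby: FakeEtcd) -> bool:
    """Switch the agent to the standby system if the key value allows it."""
    # Step 1: judge from the key value whether switching is permitted.
    if standby.get(DR_KEY) != DR_ALLOWED:
        return False
    # Step 2: load the configuration content held by the standby ETCD cluster.
    agent.config = {k: v for k, v in standby.kv.items()
                    if k.startswith("/config/")}
    # Step 3: report a heartbeat to the standby ETCD cluster.
    standby.put(f"/heartbeat/{agent.name}", "alive")
    # Step 4: the standby storage system now manages this agent.
    agent.active_system = "standby"
    return True
```

In this sketch the agent stays attached to the main system whenever the mark is absent or has a different value, which mirrors the "if so" condition in the description.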

Taking the CSSP system as an example, user traffic switching mainly involves switching res-mgr and the external service address; because the entry point of the CSSP system is res-mgr, a domain name is used to provide services externally. The user traffic switching is specifically as follows: before disaster recovery switching, keepalive is used to monitor the main CSSP system; when keepalive detects that the service port of the main CSSP system is down and the underlying cluster traffic has been switched, keepalive automatically promotes the standby CSSP system to main, and the user traffic is switched to the standby CSSP system.
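The promotion condition for user traffic can be sketched as a small check. This is not the keepalive tool's own configuration or API; it is a hypothetical Python illustration of the two conditions named above (main service port down, underlying cluster traffic already switched). The function names and the one-second timeout are assumptions.

```python
import socket


def port_is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def should_promote_standby(main_port_up: bool, underlying_switched: bool) -> bool:
    """Promote the standby system only when both conditions hold."""
    # User traffic moves only after the main service port is down AND the
    # underlying cluster traffic has already been switched to the standby.
    return (not main_port_up) and underlying_switched
```

Requiring both conditions means a transient port failure on the main system does not by itself move user traffic while the underlying cluster is still managed by the main system.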

Based on the above, the ETCD-based cluster disaster recovery management system according to the embodiment of the invention performs disaster recovery switching based on ETCD. When the main storage system cannot provide services due to force majeure, the main storage system and the standby storage system can be switched quickly, so that the stability of the cluster is ensured, the services used by users are not affected, and the high availability of cloud services is effectively improved. Meanwhile, because ETCD is a strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency can be avoided, and system-level disaster recovery is realized.

Fig. 6 is a schematic structural diagram of an ETCD-based cluster disaster recovery management device according to an embodiment of the present invention. The device 50 comprises a processor 51 and a memory 52 coupled to the processor 51.

The memory 52 stores program instructions for implementing the above-described ETCD-based cluster disaster recovery management method.

The processor 51 is configured to execute program instructions stored by the memory 52 to perform ETCD-based cluster disaster recovery management operations.

The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capabilities. The processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The terminal of the embodiment of the application performs disaster recovery switching based on ETCD. When the main storage system cannot provide services due to force majeure, the main storage system and the standby storage system can be switched quickly, so that the stability of the cluster is ensured, the services used by users are not affected, and the high availability of cloud services is effectively improved. Meanwhile, because ETCD is a strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency can be avoided, and system-level disaster recovery is realized.

Fig. 7 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The storage medium of the embodiment of the present invention stores a program file 61 capable of implementing all the methods described above. The program file 61 may be stored in the storage medium in the form of a software product, and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, or terminal devices such as a computer, a server, a mobile phone, or a tablet.

The storage medium of the embodiment of the application enables disaster recovery switching based on ETCD. When the main storage system cannot provide services due to force majeure, the main storage system and the standby storage system can be switched quickly, so that the stability of the cluster is ensured, the services used by users are not affected, and the high availability of cloud services is effectively improved. Meanwhile, because ETCD is a strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency can be avoided, and system-level disaster recovery is realized.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described system embodiments are merely illustrative. For example, the division into units is merely a logical division, and an actual implementation may use another division: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware, or in the form of a software functional unit. The above description is only an embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structures or equivalent processes made by using the contents of the present specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are included in the scope of the present invention.

Claims

1. A cluster disaster recovery management method based on ETCD is characterized by comprising the following steps:

2. The ETCD-based cluster disaster recovery management method according to claim 1, wherein the switching of the underlying cluster traffic and the user traffic of the primary storage system to a backup storage system according to the key value of the backup ETCD cluster comprises:

and loading the configuration content of the standby ETCD cluster, reporting the heartbeat to the standby ETCD cluster, and switching the flow of the bottom layer cluster to a standby storage system.

3. The ETCD-based cluster disaster recovery management method according to claim 2, wherein switching the primary storage system's underlying cluster traffic and user traffic to a backup storage system according to the backup ETCD cluster's key value further comprises:

4. The ETCD-based cluster disaster recovery management method according to claim 3, wherein switching the underlying cluster traffic and the user traffic of the primary storage system to a backup storage system according to the key value of the backup ETCD cluster further comprises:

5. The ETCD-based cluster disaster recovery management method according to any one of claims 1 to 4, wherein the main storage system and the backup storage system are CSSP systems, and the backup CSSP system and the main CSSP system are independent.

6. The ETCD-based cluster disaster recovery management method according to claim 5, wherein the switching of the bottom-layer cluster traffic and the user traffic of the main storage system to the standby storage system according to the key value is specifically:

when the main CSSP system is in an unavailable state, executing disaster recovery switching logic, loading configuration contents of the standby ETCD, wherein the configuration contents comprise a MySQL relational database management system and a RabbitMQ message queue service module, switching the bottom-layer cluster flow of the main CSSP system to the standby CSSP system, and managing bottom-layer storage resources by the standby CSSP system;

7. An ETCD-based cluster disaster recovery management system, characterized by comprising:

ETCD marking module: used for establishing a standby storage system for the main storage system, and for setting a key value mark for disaster recovery switching for a standby ETCD cluster in the standby storage system;

8. The ETCD-based cluster disaster recovery management system according to claim 7, wherein the disaster recovery switching module switches the bottom-layer cluster traffic and the user traffic of the main storage system to the backup storage system according to the key value of the backup ETCD cluster specifically:

and after the switching of the bottom layer cluster flow is completed, automatically upgrading the standby storage system into a main storage system, and switching the user flow into the upgraded main storage system.

9. An ETCD-based cluster disaster recovery management device, the device comprising a processor, a memory coupled to the processor, wherein,

the memory stores program instructions for implementing the ETCD-based cluster disaster recovery management method according to any one of claims 1 to 6;

the processor is configured to execute the program instructions stored by the memory to execute the ETCD-based cluster disaster recovery management method.

10. A storage medium storing program instructions executable by a processor to perform the ETCD-based cluster disaster recovery management method according to any one of claims 1 to 6.