CN114584458B - Cluster disaster recovery management method, system, equipment and storage medium based on ETCD - Google Patents

Info

Publication number
CN114584458B
Authority
CN
China
Prior art keywords
storage system
cluster
standby
etcd
disaster recovery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210209647.XA
Other languages
Chinese (zh)
Other versions
CN114584458A (en
Inventor
雷特
白小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210209647.XA priority Critical patent/CN114584458B/en
Publication of CN114584458A publication Critical patent/CN114584458A/en
Application granted granted Critical
Publication of CN114584458B publication Critical patent/CN114584458B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/22 Arrangements for detecting or preventing errors in the information received using redundant apparatus to increase reliability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of cloud storage and discloses an ETCD-based cluster disaster recovery management method, system, device, and storage medium. The method comprises: establishing a standby storage system for a primary storage system, and setting, in the standby ETCD cluster of the standby storage system, a key-value flag used for disaster recovery switching; monitoring the state of the primary storage system and, when the primary storage system is detected to be unavailable, connecting to the standby ETCD cluster in the standby storage system and obtaining the key value of the standby ETCD cluster; and switching the underlying cluster traffic and the user traffic of the primary storage system to the standby storage system according to the key value of the standby ETCD cluster. Because disaster recovery switching is based on ETCD, embodiments of the invention can achieve system-level disaster recovery.

Description

Cluster disaster recovery management method, system, equipment and storage medium based on ETCD
Technical Field
The present invention relates to the field of cloud storage technologies, and in particular, to an ETCD-based cluster disaster recovery management method, system, device, and storage medium.
Background
A disaster recovery system consists of two or more functionally identical IT systems built at geographically distant sites. The systems can monitor each other's health and switch functions between them, so that when one system stops working because of an accident (such as a fire or an earthquake), the whole application can be switched to the other system and continue to operate normally.
At present, in some cloud architectures the cloud storage resource management platform does not support disaster recovery. When an accident such as a fire, flood, or earthquake occurs, the system is interrupted for a long time and service continuity cannot be guaranteed.
Disclosure of Invention
The invention provides an ETCD-based cluster disaster recovery management method, system, device, and storage medium. It aims to solve the technical problem that existing cloud storage resource management platforms do not support disaster recovery, so that the system is interrupted for a long time when an accident occurs.
To solve the above technical problem, the invention adopts the following technical solution:
An ETCD-based cluster disaster recovery management method includes:
establishing a standby storage system for a primary storage system, and setting, in the standby ETCD cluster of the standby storage system, a key-value flag used for disaster recovery switching;
monitoring the state of the primary storage system and, when the primary storage system is detected to be unavailable, connecting to the standby ETCD cluster in the standby storage system and obtaining the key value of the standby ETCD cluster;
and switching the underlying cluster traffic and the user traffic of the primary storage system to the standby storage system according to the key value of the standby ETCD cluster.
In a further technical solution of the embodiment of the invention, switching the underlying cluster traffic and the user traffic of the primary storage system to the standby storage system according to the key value of the standby ETCD cluster includes:
determining, according to the key value of the standby ETCD cluster, whether the underlying cluster traffic can be switched to the standby storage system and, if so,
loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the underlying cluster traffic to the standby storage system.
In a further technical solution of the embodiment of the invention, switching the underlying cluster traffic and the user traffic of the primary storage system to the standby storage system according to the key value of the standby ETCD cluster further includes:
after the underlying cluster traffic has been switched, automatically promoting the standby storage system to primary and switching the user traffic to the promoted storage system.
In a further technical solution of the embodiment of the invention, after the underlying cluster traffic and the user traffic of the primary storage system have been switched to the standby storage system according to the key value of the standby ETCD cluster, the method further includes:
continuing to monitor the state of the primary storage system and, when the primary storage system returns to an available state, automatically switching the underlying cluster traffic and the user traffic back to the primary storage system.
In a further technical solution of the embodiment of the invention, the primary storage system and the standby storage system are each a CSSP system, and the standby CSSP system is independent of the primary CSSP system.
In a further technical solution of the embodiment of the invention, switching the underlying cluster traffic and the user traffic of the primary storage system to the standby storage system according to the key value of the standby ETCD cluster specifically includes:
when the primary CSSP system is unavailable, executing the disaster recovery switching logic and loading the configuration content of the standby ETCD cluster, the configuration content including the mysql and rabbitmq configuration; and switching the underlying cluster traffic of the primary CSSP system to the standby CSSP system, so that the standby CSSP system manages the underlying storage resources; wherein mysql is a relational database management system and rabbitmq is a message queue service module;
and monitoring the primary CSSP system with a cluster management module and, when the service port of the primary CSSP system is detected to be down and the underlying cluster traffic has been switched, automatically promoting the standby CSSP system to primary and switching the user traffic to the promoted CSSP system.
Another technical solution adopted by the embodiment of the invention is an ETCD-based cluster disaster recovery management system, comprising:
an ETCD marking module, configured to establish a standby storage system for a primary storage system and to set, in the standby ETCD cluster of the standby storage system, a key-value flag used for disaster recovery switching;
a state monitoring module, configured to monitor the state of the primary storage system and, when the primary storage system is detected to be unavailable, to connect to the standby ETCD cluster in the standby storage system and obtain the key value of the standby ETCD cluster;
a disaster recovery switching module, configured to switch the underlying cluster traffic and the user traffic of the primary storage system to the standby storage system according to the key value of the standby ETCD cluster.
In a further technical solution of the embodiment of the invention, the disaster recovery switching module switches the underlying cluster traffic and the user traffic of the primary storage system to the standby storage system according to the key value of the standby ETCD cluster by:
determining, according to the key value of the standby ETCD cluster, whether the underlying cluster traffic can be switched to the standby storage system and, if so,
loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the underlying cluster traffic to the standby storage system;
and, after the underlying cluster traffic has been switched, automatically promoting the standby storage system to primary and switching the user traffic to the promoted storage system.
A further technical solution adopted by the embodiment of the invention is an ETCD-based cluster disaster recovery management device, the device comprising a processor and a memory coupled to the processor, wherein
the memory stores program instructions for implementing the ETCD-based cluster disaster recovery management method;
the processor is configured to execute the program instructions stored in the memory to perform the ETCD-based cluster disaster recovery management method.
A further technical solution adopted by the embodiment of the invention is a storage medium storing program instructions executable by a processor, the program instructions being used to execute the ETCD-based cluster disaster recovery management method.
The beneficial effects of the invention are as follows: the ETCD-based cluster disaster recovery management method, system, device, and storage medium perform disaster recovery switching based on ETCD. When the primary storage system cannot provide service because of force majeure, the primary and standby storage systems can be switched quickly, which keeps the cluster stable, leaves users' use of the service unaffected, and effectively improves the availability of the cloud service. In addition, because ETCD is strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency do not occur, and system-level disaster recovery can be achieved.
Drawings
Fig. 1 is a schematic flow chart of an ETCD-based cluster disaster recovery management method according to a first embodiment of the present invention;
Fig. 2 is a schematic flow chart of an ETCD-based cluster disaster recovery management method according to a second embodiment of the present invention;
Fig. 3 is a schematic flow chart of an ETCD-based cluster disaster recovery management method according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of the disaster recovery architecture of a CSSP system according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an ETCD-based cluster disaster recovery management system according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an ETCD-based cluster disaster recovery management device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first," "second," "third," and the like in this disclosure are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first", "second", or "third" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless specifically defined otherwise. All directional indications (such as up, down, left, right, front, and back) in embodiments of the present invention are used only to explain the relative positions, movements, and so on of the components in a particular orientation (as shown in the drawings); if that orientation changes, the directional indication changes accordingly. Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements but may include other steps or elements that are not listed or that are inherent to the process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. Appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Fig. 1 is a flow chart of an ETCD-based cluster disaster recovery management method according to a first embodiment of the invention. The cluster disaster recovery management method based on ETCD of the first embodiment of the invention comprises the following steps:
S10: establishing a standby storage system for the primary storage system, and setting, in the standby ETCD cluster of the standby storage system, a key-value flag used for disaster recovery switching;
In this step, a standby storage system is set up for the primary storage system, and no data synchronization is required between the two. Before disaster recovery, the primary storage system provides users with management of the underlying kvm and storage clusters; data traffic does not pass through the standby storage system, and the standby storage system does not serve external requests. ETCD is a distributed, highly available, consistent key-value store; here it acts as the configuration center and stores the configuration data of the whole system.
S11: monitoring the state of the primary storage system and, when the primary storage system is detected to be unavailable, connecting to the standby ETCD cluster in the standby storage system and obtaining the key value of the standby ETCD cluster;
S12: switching the traffic of the primary storage system to the standby storage system according to the key value of the standby ETCD cluster.
In this step, traffic switching comprises switching the underlying cluster traffic and switching the user traffic. The underlying cluster traffic is switched as follows: determine, according to the key value of the standby ETCD cluster, whether the underlying cluster traffic can be switched to the standby storage system; if so, load the configuration content of the standby ETCD cluster, report heartbeats to the standby ETCD cluster, and switch the underlying cluster traffic to the standby storage system, which then manages the underlying storage resources. Once the disaster recovery switch is complete, the standby storage system takes over from the primary storage system the management functions provided to users for the underlying KVM (Keyboard Video Mouse; a KVM can access and control a computer by connecting directly to its keyboard, video, or mouse ports) and for storage cluster management. The standby storage system is completely independent of the primary storage system; only DB (database) data is synchronized.
Taking a CSSP system as an example, user traffic switching mainly concerns res-mgr and the externally exposed service address, because res-mgr is the entry point of the CSSP system and serves external requests through a domain name. User traffic is switched as follows: before the disaster recovery switch, keepalived monitors the primary CSSP system; when keepalived detects that the service port of the primary CSSP system is down and the underlying cluster traffic has been switched, the standby CSSP system is automatically promoted to primary, and the user traffic is switched to the promoted CSSP system.
In summary, the ETCD-based cluster disaster recovery management method of the first embodiment performs disaster recovery switching based on ETCD, so that the primary and standby storage systems can be switched quickly when the primary storage system cannot provide service because of force majeure. Because ETCD is strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency do not occur, and system-level disaster recovery can be achieved.
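Steps S10 to S12 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: `FakeEtcd` is an in-memory stand-in rather than a real etcd client, and the key name `dr/can_switch`, the value `"true"`, and all class and function names are hypothetical, since the first embodiment does not name them.

```python
from dataclasses import dataclass, field

# Hypothetical key name and value; the first embodiment does not name them.
SWITCH_KEY = "dr/can_switch"


@dataclass
class FakeEtcd:
    """In-memory stand-in for an etcd cluster (not a real etcd client)."""
    data: dict = field(default_factory=dict)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value

    def get(self, key: str):
        return self.data.get(key)


@dataclass
class StorageSystem:
    name: str
    etcd: FakeEtcd
    available: bool = True


def failover(primary: StorageSystem, standby: StorageSystem) -> str:
    """Return the name of the system that should carry traffic (S11-S12)."""
    if primary.available:                  # S11: primary is still healthy
        return primary.name
    flag = standby.etcd.get(SWITCH_KEY)    # S11: read the standby key value
    if flag == "true":                     # S12: switch only if flagged
        return standby.name
    return primary.name                    # no switch is permitted


primary = StorageSystem("primary", FakeEtcd())
standby = StorageSystem("standby", FakeEtcd())
standby.etcd.put(SWITCH_KEY, "true")       # S10: mark the standby cluster

print(failover(primary, standby))          # primary is available here
primary.available = False
print(failover(primary, standby))          # primary is unavailable here
```

The decision is driven only by the value stored in the standby cluster, which reflects the text's point that the standby side, not the failed primary, is the source of truth during a switch.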
Fig. 2 is a flow chart of an ETCD-based cluster disaster recovery management method according to a second embodiment of the present application. The cluster disaster recovery management method based on ETCD in the second embodiment of the application comprises the following steps:
S20: establishing a standby storage system for a primary storage system, and setting, in the standby ETCD cluster of the standby storage system, a key-value flag indicating that disaster recovery switching is allowed;
In this step, a standby storage system is set up for the primary storage system, and no data synchronization is required between the two. Before disaster recovery, the primary storage system provides users with management of the underlying kvm and storage clusters; data traffic does not pass through the standby storage system, and the standby storage system does not serve external requests.
S21: monitoring the primary storage system in real time and determining whether the ETCD cluster of the primary storage system is available; if so, continuing to execute S21; otherwise, executing S22;
S22: connecting to the standby ETCD cluster in the standby storage system and, once connected, obtaining the key value of the standby ETCD cluster;
S23: determining, according to the key value of the standby ETCD cluster, whether the underlying cluster traffic can be switched to the standby storage system; if so, executing S24; otherwise, executing S27;
S24: loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the underlying cluster traffic to the standby storage system;
In this step, the primary storage system is unavailable at the time of the disaster recovery switch, so the underlying cluster traffic is switched to the standby storage system, which then manages the underlying storage resources. Once the disaster recovery switch is complete, the standby storage system takes over from the primary storage system the management functions provided to users for the underlying KVM (Keyboard Video Mouse; a KVM can access and control a computer by connecting directly to its keyboard, video, or mouse ports) and for storage cluster management. The standby storage system is completely independent of the primary storage system; only DB (database) data is synchronized.
S25: after the underlying cluster traffic has been switched, automatically promoting the standby storage system to primary and switching the user traffic to the standby storage system;
S26: determining whether the primary storage system has returned to an available state and, when it has, automatically switching the underlying cluster traffic and the user traffic back to the primary storage system;
S27: end.
In summary, the ETCD-based cluster disaster recovery management method of the second embodiment performs disaster recovery switching based on ETCD. When the primary storage system cannot provide service because of force majeure, the primary and standby storage systems can be switched quickly, which keeps the cluster stable, leaves users' use of the service unaffected, and effectively improves the availability of the cloud service. In addition, because ETCD is strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency do not occur, and system-level disaster recovery is achieved.
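The control flow of S21 to S27, including the switch back in S26, can be sketched as a small state machine. This is a simplified illustration, not the patented implementation: the function `run_flow`, the `Active` enum, and the flag value `"true"` are hypothetical, and health checks are replayed from a list instead of being probed in real time.

```python
from enum import Enum


class Active(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"


def run_flow(primary_health, standby_key_value, switch_value="true"):
    """Replay S21-S27 over a sequence of primary health observations.

    primary_health: iterable of booleans (True = primary ETCD available).
    standby_key_value: value read from the standby ETCD cluster in S22.
    Returns the active system recorded after each observation.
    """
    active = Active.PRIMARY
    history = []
    for healthy in primary_health:
        if active is Active.PRIMARY and not healthy:
            # S22-S23: read the standby key and decide whether to switch.
            if standby_key_value != switch_value:
                history.append(active)
                break                      # S27: end without switching
            # S24-S25: switch cluster traffic, promote standby, move users.
            active = Active.STANDBY
        elif active is Active.STANDBY and healthy:
            # S26: primary recovered, switch both kinds of traffic back.
            active = Active.PRIMARY
        history.append(active)
    return history


print([a.value for a in run_flow([True, False, False, True], "true")])
```

Replaying an outage followed by a recovery shows the switch to the standby system and the later switch back; a standby key that does not permit switching ends the flow early, as in S23 and S27.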
To describe the implementation of the embodiments of the present application more clearly, cluster disaster recovery management of a CSSP (Cloud Storage Service Platform, a cloud storage resource management platform) system is described below as an example. Fig. 3 is a flow chart of an ETCD-based cluster disaster recovery management method according to a third embodiment of the present application. The ETCD-based cluster disaster recovery management method of the third embodiment comprises the following steps:
S30: establishing a standby CSSP system for a primary CSSP system, and setting, in the standby ETCD cluster of the standby CSSP system, a key-value flag (key cssp_can_switch) indicating that disaster recovery switching is allowed;
In this step, Fig. 4 shows a schematic diagram of the disaster recovery architecture of the CSSP system according to an embodiment of the present application. The CSSP system includes service components such as res-mgr, mysql, rabbitmq, and etcd. res-mgr is the storage resource management service; mysql is an open-source relational database management system that stores the persistent data of the management side; rabbitmq is an open-source message queue service that handles communication with the underlying services; and etcd acts as the configuration center and stores the configuration data of the whole system. The PM (physical machine) is the data gateway layer of effs and provides the NAS (Network Attached Storage) service; it hosts the effs-server process, the vm (virtual machine), and the nasagent inside the vm. effs-server provides interfaces, called by res-mgr, for managing the network resources of the physical machine and the virtual machine resources on it. nasagent provides interfaces for managing resources inside the vm, such as disk, network, computing, and software resources (nfsd, samba, syslog). efs-server calls the nasagent interfaces through grpc to manage the resources inside the vm.
In the embodiment of the application, a standby CSSP system is set up for the primary CSSP system, and the primary and standby service components such as res-mgr, rabbitmq, and etcd are deployed as clusters or in high-availability mode. The res-mgr service is stateless, and rabbitmq and etcd hold no runtime data that must be persisted, so no data synchronization is needed between the primary and standby CSSP systems. Before disaster recovery, the primary CSSP system provides users with management of the underlying kvm and storage clusters through res-mgr, mysql, rabbitmq, etcd, and so on; data traffic does not pass through the standby CSSP system, and the standby CSSP system does not serve external requests.
S31: monitoring the primary CSSP system in real time and determining whether the ETCD cluster of the primary CSSP system is available; if so, continuing to execute S31; otherwise, executing S32;
In this step, when services such as ffs-server and nasagent detect that the primary CSSP system is unavailable, the disaster recovery switching logic is executed to switch the underlying cluster traffic of the primary CSSP system to the standby CSSP system; after the switch is complete, the standby CSSP system manages the underlying storage resources.
S32: connecting to the standby ETCD cluster in the standby CSSP system and, once connected, obtaining the key value of the standby ETCD cluster;
S33: determining, according to the key value of the standby ETCD cluster, whether the underlying cluster traffic can be switched to the standby CSSP system; if so, executing S34; otherwise, executing S37;
S34: loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the underlying cluster traffic to the standby CSSP system;
In this step, the loaded configuration content includes the mysql, rabbitmq, and other related configuration. As the CSSP disaster recovery architecture in Fig. 4 shows, in the CSSP system the underlying efs-server and rpcserver provide vm management through the rabbitmq service, and the heartbeat module reports heartbeats, modifies configuration, and so on through the etcd service. Disaster recovery switching therefore only needs to consider two things: switching the underlying cluster traffic of services such as ffs-server, rpcserver, and heartbeat in the underlying kvm cluster, and switching the upper-layer user traffic. The nasagent and effs-server in the vm on the PM in the kvm cluster use the etcd and rabbitmq of the CSSP system. Because the etcd and rabbitmq of the primary CSSP system are independent of those of the standby CSSP system, switching the underlying cluster traffic requires switching the relevant configuration and the accessed services of the nasagent and effs-server in the kvm cluster to the standby CSSP system. After the underlying cluster traffic has been switched, the standby CSSP system takes over from the primary CSSP system the management functions provided to users for the underlying kvm and storage clusters. The standby CSSP system is completely independent of the primary CSSP system; only DB data is synchronized.
S35: after the underlying cluster traffic has been switched, using a cluster management module to automatically promote the standby CSSP system to primary and switching the user traffic to the promoted CSSP system;
In this step, user traffic switching mainly concerns res-mgr and the externally exposed service address, because res-mgr is the entry point of the CSSP system and serves external requests through a domain name. In the embodiment of the application, the cluster management module uses keepalived to switch the user traffic of the CSSP system. keepalived is service software used in cluster management to keep a cluster highly available: it health-checks the cluster's service nodes and, through the Virtual Router Redundancy Protocol (VRRP), exposes a single service IP to the outside while several service systems or nodes sit behind it. Specifically, before the disaster recovery switch, keepalived monitors the primary CSSP system; when keepalived detects that the service port of the primary CSSP system is down and the underlying cluster traffic has been switched, the standby CSSP system is automatically promoted to primary, and the user traffic is switched to the promoted CSSP system.
S36: determining whether the primary CSSP system has returned to an available state and, when it has, automatically switching the underlying cluster traffic and the user traffic back to the primary CSSP system;
S37: end.
In summary, the ETCD-based cluster disaster recovery management method of the third embodiment performs disaster recovery switching between the primary and standby CSSP systems based on ETCD. When the primary CSSP system cannot provide service because of force majeure, the primary and standby CSSP systems can be switched quickly, which keeps the cluster stable, leaves users' use of the service unaffected, and effectively improves the availability of the cloud service. In addition, because ETCD is strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency do not occur, and system-level disaster recovery is achieved.
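The core of S33 to S35 in the third embodiment can be sketched as follows: check the `cssp_can_switch` flag, load the mysql and rabbitmq configuration from the standby etcd, report a heartbeat there, and promote the standby only when both conditions in the text hold. This is a hedged illustration only. A plain dictionary stands in for the standby etcd cluster, and everything except the key `cssp_can_switch` is an assumption: the `config/...` and `heartbeat/...` key layout, the flag value `"true"`, the host names, the node name, and the function names.

```python
import json
import time

# In-memory stand-in for the standby etcd cluster. Every key except
# cssp_can_switch, and every value, is hypothetical.
standby_etcd = {
    "cssp_can_switch": "true",
    "config/mysql": json.dumps({"host": "mysql.standby.example", "port": 3306}),
    "config/rabbitmq": json.dumps({"host": "mq.standby.example", "port": 5672}),
}


def switch_cluster_traffic(etcd: dict, node: str, now: float) -> dict:
    """S33-S34: check the flag, load configuration, report a heartbeat."""
    if etcd.get("cssp_can_switch") != "true":
        raise RuntimeError("standby cluster is not marked as switchable")
    config = {
        name: json.loads(etcd[f"config/{name}"])
        for name in ("mysql", "rabbitmq")
    }
    etcd[f"heartbeat/{node}"] = str(now)   # heartbeat now goes to the standby
    return config


def should_promote_standby(primary_port_up: bool, cluster_switched: bool) -> bool:
    """S35: promote only if the primary port is down AND S34 has completed."""
    return (not primary_port_up) and cluster_switched


config = switch_cluster_traffic(standby_etcd, "pm-node-1", time.time())
print(config["rabbitmq"]["host"])
print(should_promote_standby(primary_port_up=False, cluster_switched=True))
```

Requiring both conditions before promotion mirrors the text: a port that is down is not sufficient on its own, because user traffic should move only after the underlying cluster traffic has already been switched.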
In an alternative embodiment, the result of the ETCD-based cluster disaster recovery management method may also be uploaded to a blockchain.
Specifically, corresponding digest information is obtained from the result of the ETCD-based cluster disaster recovery management method by hashing that result, for example with the SHA-256 algorithm. Uploading the digest information to the blockchain ensures its security as well as its fairness and transparency to the user. The user can download the digest information from the blockchain to verify whether the result of the ETCD-based cluster disaster recovery management method has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
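The digest step can be illustrated with Python's standard `hashlib` module. This is a minimal sketch: the shape of the result record and the choice to serialize it as canonical JSON before hashing are assumptions made for the example, since the text only states that the result is hashed, for example with SHA-256.

```python
import hashlib
import hmac
import json


def digest_of_result(result: dict) -> str:
    """Serialize the result deterministically and return its SHA-256 hex digest."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_untampered(result: dict, published_digest: str) -> bool:
    """Compare a locally recomputed digest with the one retrieved from the chain."""
    return hmac.compare_digest(digest_of_result(result), published_digest)


# Hypothetical result record of one disaster recovery switch.
record = {"event": "failover", "from": "main", "to": "standby"}
published = digest_of_result(record)

# A verifier later recomputes the digest and compares it.
ok = is_untampered(record, published)
tampered = is_untampered({**record, "to": "main"}, published)
```

Sorting the keys before hashing keeps the digest stable regardless of the order in which the record's fields were assembled.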
Fig. 5 is a schematic structural diagram of an ETCD-based disaster recovery management system according to an embodiment of the present invention. The cluster disaster recovery management system 40 based on ETCD according to the embodiment of the present invention includes:
ETCD marking module 41: configured to establish a standby storage system for the main storage system, and to set, for the standby ETCD cluster in the standby storage system, a key value mark used for disaster recovery switching. In the embodiment of the application, the standby storage system is set up for the main storage system in such a way that data synchronization between the main storage system and the standby storage system is not needed. Before disaster recovery, the main storage system provides the user with the management functions for the underlying KVM and storage cluster management; the data traffic does not pass through the standby storage system, and the standby storage system does not provide services externally.
Status monitoring module 42: configured to monitor the state of the main storage system and, when the main storage system is detected to be in an unavailable state, to connect to the standby ETCD cluster in the standby storage system and obtain the key value of the standby ETCD cluster;
Disaster recovery switching module 43: configured to switch the traffic of the main storage system to the standby storage system according to the key value of the standby ETCD cluster. The traffic switching comprises bottom-layer cluster traffic switching and user traffic switching. The bottom-layer cluster traffic switching is specifically: judging, according to the key value, whether the bottom-layer cluster traffic can be switched to the standby storage system; if so, loading the configuration content of the standby ETCD cluster, reporting heartbeats to the standby ETCD cluster, and switching the bottom-layer cluster traffic to the standby storage system, which then manages the bottom-layer storage resources. When the disaster recovery switch is completed, the standby storage system takes over the management functions of the main storage system and provides the user with the underlying KVM (short for Keyboard Video Mouse; a KVM gives access to and control of a computer by connecting directly to its keyboard, video, or mouse ports) and storage cluster management. The standby storage system is completely independent of the main storage system; only DB (database) data is synchronized between them.
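The bottom-layer cluster traffic switch described above (check the key value, load the configuration, report a heartbeat) can be sketched as follows. This is only an illustration under stated assumptions: `InMemoryKV` is a stand-in for the standby ETCD cluster rather than a real ETCD client, and the key names, the `"enabled"` mark value, and the node identifier are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


class InMemoryKV:
    """Stand-in for the standby ETCD cluster (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


SWITCH_KEY = "/dr/switch"            # hypothetical key holding the switching mark
CONFIG_KEY = "/dr/config"            # hypothetical key holding configuration content
HEARTBEAT_PREFIX = "/dr/heartbeat/"  # hypothetical prefix for heartbeat reports


@dataclass
class SwitchResult:
    switched: bool
    config: Optional[str]


def switch_cluster_traffic(kv: InMemoryKV, node_id: str) -> SwitchResult:
    """Switch bottom-layer cluster traffic only if the key value mark allows it."""
    if kv.get(SWITCH_KEY) != "enabled":
        # The mark does not permit switching: leave everything untouched.
        return SwitchResult(switched=False, config=None)
    config = kv.get(CONFIG_KEY)                            # load configuration content
    kv.put(HEARTBEAT_PREFIX + node_id, str(time.time()))   # report heartbeat
    return SwitchResult(switched=True, config=config)


standby_etcd = InMemoryKV()
standby_etcd.put(SWITCH_KEY, "enabled")
standby_etcd.put(CONFIG_KEY, "db=mysql;mq=rabbitmq")
result = switch_cluster_traffic(standby_etcd, "storage-node-1")
```

Gating the whole sequence on a single key keeps the decision in one strongly consistent place, which is the property the text relies on to avoid erroneous switching.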
Taking a CSSP system as an example, user traffic switching is mainly the switching of res-mgr and the external service address; because res-mgr is the entry point of the CSSP system, the service is exposed externally through a domain name. The user traffic switching is specifically: before disaster recovery switching, Keepalived monitors the main CSSP system; when Keepalived detects that the service port of the main CSSP system is down and the bottom-layer cluster traffic has been switched, the standby CSSP system is automatically promoted to main, and the user traffic is switched to the standby CSSP system.
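The promotion condition above has two parts: the main system's service port is down, and the bottom-layer cluster traffic has already been switched. A minimal sketch of that gate follows; the TCP connect probe is one plausible way to check a service port and is an assumption here, as are the function names.

```python
import socket


def port_is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_promote_standby(main_port_up: bool, cluster_traffic_switched: bool) -> bool:
    """Promote the standby only when both conditions from the text hold."""
    return (not main_port_up) and cluster_traffic_switched


# Demonstrate the probe against a temporary local listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.listen(1)
probe_port = listener.getsockname()[1]
up_while_listening = port_is_up("127.0.0.1", probe_port)
listener.close()
```

Requiring both conditions means a port outage alone does not move user traffic before the standby is actually managing the bottom-layer resources.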
Based on the above, the ETCD-based cluster disaster recovery management system in the embodiment of the present invention performs disaster recovery switching based on ETCD. When the main storage system cannot provide services due to force majeure, the main storage system and the standby storage system can be switched quickly, so that the stability of the cluster is ensured, users' use of the service is not affected, and the high availability of the cloud service is effectively improved. Meanwhile, because ETCD is a strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency do not occur, and system-level disaster recovery is realized.
Fig. 6 is a schematic structural diagram of an ETCD-based cluster disaster recovery management device according to an embodiment of the invention. The device 50 includes a processor 51 and a memory 52 coupled to the processor 51.
The memory 52 stores program instructions for implementing the ETCD-based cluster disaster recovery management method described above.
Processor 51 is operative to execute program instructions stored in memory 52 to perform ETCD-based cluster disaster recovery management operations.
The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capability. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.
The device performs disaster recovery switching based on ETCD. When the main storage system cannot provide services due to force majeure, the main storage system and the standby storage system can be switched quickly, so that the stability of the cluster is ensured, users' use of the service is not affected, and the high availability of the cloud service is effectively improved. Meanwhile, because ETCD is a strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency do not occur, and system-level disaster recovery is realized.
Fig. 7 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The storage medium of the embodiment of the present invention stores a program file 61 capable of implementing all the methods described above. The program file 61 may be stored in the storage medium in the form of a software product and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code, or a terminal device such as a computer, a server, a mobile phone, or a tablet.
The storage medium enables disaster recovery switching based on ETCD. When the main storage system cannot provide services due to force majeure, the main storage system and the standby storage system can be switched quickly, so that the stability of the cluster is ensured, users' use of the service is not affected, and the high availability of the cloud service is effectively improved. Meanwhile, because ETCD is a strongly consistent middleware, fault misjudgment and erroneous switching caused by data inconsistency do not occur, and system-level disaster recovery is realized.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the partitioning of elements is merely a logical functional partitioning, and there may be additional partitioning in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not implemented. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware or in the form of software functional units. The foregoing describes only embodiments of the present invention and does not limit the patent scope of the invention; all equivalent structures or equivalent processes made using the description and the accompanying drawings of the present invention, or applied directly or indirectly in other related technical fields, are likewise included in the scope of the invention.

Claims (6)

1. The cluster disaster recovery management method based on ETCD is characterized by comprising the following steps:
establishing a standby storage system for a main storage system, and setting a key value mark for disaster recovery switching for a standby ETCD cluster in the standby storage system;
monitoring the state of the main storage system, and when the main storage system is monitored to be in an unavailable state, connecting a standby ETCD cluster in the standby storage system and acquiring a key value of the standby ETCD cluster;
switching the bottom layer cluster flow and the user flow of the main storage system to a standby storage system according to the key value of the standby ETCD cluster;
monitoring the state of the main storage system, and automatically switching the bottom layer cluster flow and the user flow back to the main storage system when the main storage system is restored to the available state;
wherein the switching the bottom layer cluster traffic and the user traffic of the main storage system to the backup storage system according to the key value of the backup ETCD cluster includes:
judging whether the bottom layer cluster flow can be switched to the standby storage system according to the key value of the standby ETCD cluster, if so,
loading configuration content of the backup ETCD cluster, reporting heartbeat to the backup ETCD cluster, and switching the bottom layer cluster flow to a backup storage system;
after the bottom layer cluster flow is switched, the standby storage system is automatically upgraded to a main storage system, and the user flow is switched to the upgraded main storage system.
2. The ETCD based cluster disaster recovery management method of claim 1, wherein the primary storage system and the backup storage system are a primary CSSP system and a backup CSSP system, respectively, and the backup CSSP system is independent from the primary CSSP system.
3. The ETCD-based cluster disaster recovery management method according to claim 2, wherein the switching the bottom layer cluster traffic and the user traffic of the primary storage system to the backup storage system according to the key value specifically includes:
when the main CSSP system is in an unavailable state, disaster recovery switching logic is executed, configuration content of the standby ETCD cluster is loaded, wherein the configuration content comprises a MySQL relational database management system and a RabbitMQ message queue service module, the bottom cluster flow of the main CSSP system is switched to the standby CSSP system, and the standby CSSP system manages bottom storage resources;
and monitoring the main CSSP system by adopting a cluster management module, and automatically upgrading the standby storage system to the main storage system and switching the user flow to the upgraded main CSSP system when the service port of the main CSSP system is monitored to be disconnected and the bottom cluster flow is switched.
4. An ETCD-based cluster disaster recovery management system, comprising:
ETCD marking module: the method comprises the steps of establishing a standby storage system for a main storage system, and setting a key value mark for disaster recovery switching for a standby ETCD cluster in the standby storage system;
the state monitoring module: the method comprises the steps of monitoring the state of a main storage system, connecting a standby ETCD cluster in the standby storage system when the main storage system is monitored to be in an unavailable state, and acquiring a key value of the standby ETCD cluster;
disaster recovery switching module: the method comprises the steps that the bottom layer cluster flow and the user flow of a main storage system are switched to the standby storage system according to the key value of the standby ETCD cluster, the state of the main storage system is monitored, and when the main storage system is restored to the available state, the bottom layer cluster flow and the user flow are automatically switched back to the main storage system;
the disaster recovery switching module switches the bottom layer cluster flow and the user flow of the main storage system to the backup storage system according to the key value of the backup ETCD cluster specifically comprises:
judging whether the bottom layer cluster flow can be switched to the standby storage system according to the key value of the standby ETCD cluster, if so,
loading configuration content of the backup ETCD cluster, reporting heartbeat to the backup ETCD cluster, and switching the bottom layer cluster flow to a backup storage system;
after the bottom layer cluster flow is switched, the standby storage system is automatically upgraded to a main storage system, and the user flow is switched to the upgraded main storage system.
5. An ETCD-based cluster disaster recovery management device, characterized in that the device comprises a processor, a memory coupled to the processor, wherein,
the memory stores program instructions for implementing the ETCD-based cluster disaster recovery management method of any one of claims 1 to 3;
the processor is configured to execute the program instructions stored in the memory to execute the ETCD-based cluster disaster recovery management method.
6. A storage medium storing program instructions executable by a processor for performing the ETCD-based cluster disaster recovery management method as set forth in any one of claims 1 to 3.
CN202210209647.XA 2022-03-03 2022-03-03 Cluster disaster recovery management method, system, equipment and storage medium based on ETCD Active CN114584458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210209647.XA CN114584458B (en) 2022-03-03 2022-03-03 Cluster disaster recovery management method, system, equipment and storage medium based on ETCD

Publications (2)

Publication Number Publication Date
CN114584458A 2022-06-03
CN114584458B 2023-06-06

Family

ID=81774231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210209647.XA Active CN114584458B (en) 2022-03-03 2022-03-03 Cluster disaster recovery management method, system, equipment and storage medium based on ETCD

Country Status (1)

Country Link
CN (1) CN114584458B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115208743A (en) * 2022-07-18 2022-10-18 中国工商银行股份有限公司 ETCD-based cross-site cluster deployment method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111176888A (en) * 2018-11-13 2020-05-19 浙江宇视科技有限公司 Cloud storage disaster recovery method, device and system
CN113590386A (en) * 2021-07-30 2021-11-02 深圳前海微众银行股份有限公司 Disaster recovery method, system, terminal device and computer storage medium for data

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176909A (en) * 2011-12-26 2013-06-26 中国银联股份有限公司 Service processing method and service processing system
CN107231221B (en) * 2016-03-25 2020-10-23 阿里巴巴集团控股有限公司 Method, device and system for controlling service flow among data centers
CN110046064B (en) * 2018-01-15 2020-08-04 厦门靠谱云股份有限公司 Cloud server disaster tolerance implementation method based on fault drift
CN110086726A (en) * 2019-04-22 2019-08-02 航天云网科技发展有限责任公司 A method of automatically switching Kubernetes host node
CN110740167A (en) * 2019-09-20 2020-01-31 北京浪潮数据技术有限公司 distributed storage system and node monitoring method thereof
CN111371599A (en) * 2020-02-26 2020-07-03 山东汇贸电子口岸有限公司 Cluster disaster recovery management system based on ETCD
CN112367214B (en) * 2020-10-12 2022-06-14 成都精灵云科技有限公司 Method for rapidly detecting and switching main node based on etcd
CN113949691A (en) * 2021-10-15 2022-01-18 湖南麒麟信安科技股份有限公司 ETCD-based virtual network address high-availability implementation method and system

Also Published As

Publication number Publication date
CN114584458A (en) 2022-06-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant