WO2017071274A1 - Method and apparatus for disaster recovery in an active-active cluster system - Google Patents

Method and apparatus for disaster recovery in an active-active cluster system

Info

Publication number
WO2017071274A1
WO2017071274A1 (PCT/CN2016/087915)
Authority
WO
WIPO (PCT)
Prior art keywords
arbitration
storage array
host
service
application
Prior art date
Application number
PCT/CN2016/087915
Other languages
English (en)
French (fr)
Inventor
陈怡佳
刘辉
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP16858717.8A priority Critical patent/EP3285168B1/en
Publication of WO2017071274A1 publication Critical patent/WO2017071274A1/zh
Priority to US15/892,003 priority patent/US10671498B2/en
Priority to US16/839,205 priority patent/US11194679B2/en
Priority to US17/529,770 priority patent/US11809291B2/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/2025Failover techniques using centralised failover control functionality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/142Reconfiguring to eliminate the error
    • G06F11/1425Reconfiguring to eliminate the error by reconfiguration of node membership
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/2033Failover techniques switching over of hardware resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2069Management of state, configuration or failover
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2064Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring while ensuring consistency
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2071Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using a plurality of controllers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices

Definitions

  • The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for disaster recovery in an active-active cluster system.
  • the active-active cluster system includes a host cluster, several storage arrays, and an arbitration server.
  • Take storage array A and storage array B as an example. When the active-active cluster system runs normally, the host cluster can deliver read and write services to both storage array A and storage array B.
  • When the host cluster needs to deliver a write service to storage array A, a host in the cluster system first sends the write data to storage array A; storage array A then writes the data to both storage array A and storage array B, and only after the data has been written to both arrays does storage array A return a write-completion acknowledgment to the host. The host cluster writes data to storage array B through a similar basic process.
  • When split-brain occurs between storage array A and storage array B, that is, when a communication failure occurs between storage array A and storage array B, one storage array in the active-active cluster system can take over the services automatically, avoiding service downtime and data loss.
  • For example, while the host is delivering data to storage array A, communication between storage array A and storage array B fails. After detecting the failure, storage array A and storage array B each initiate an arbitration request to the arbitration server, and the arbitration server determines by logical judgment that storage array A takes over the services and returns the arbitration result to storage array A and storage array B.
  • Although storage array A is determined to take over the services, the data the host delivered to storage array A is not synchronized to storage array B because of the fault between the two arrays. If storage array B has not yet disconnected from the host when storage array A takes over the services, the data the host reads from storage array B will be wrong data, which causes I/O (Input/Output) isolation, that is, fencing.
  • In the prior art, to guarantee data consistency between storage array A and storage array B, the two arrays follow a software-enforced agreement that requires storage array B to stop its services before storage array A provides services unilaterally. For example, the agreement may stipulate that, after storage array A detects that storage array B cannot communicate, storage array A wait 10 seconds before accepting services, thereby ensuring that storage array B has already stopped its services before storage array A serves unilaterally.
  • The prior-art method therefore imposes strict timing requirements on storage array A and storage array B. In actual operation, however, uncontrollable factors such as high system CPU usage and network delay can prevent storage array B from stopping its services in time, which disorders the timing, leaves data inconsistent between the storage arrays, and results in I/O isolation.
  • The embodiments of the present invention provide a method and apparatus for disaster recovery in an active-active cluster system, which can solve the prior-art problem that uncontrollable factors during the actual operation of the storage arrays prevent storage array B from stopping its services in time, disordering the timing, leaving data inconsistent between the storage arrays, and causing I/O isolation.
  • An embodiment of the present invention provides a disaster recovery method in an active-active cluster system, used in a system consisting of a host cluster and at least one pair of storage arrays, where the host cluster includes an arbitration host, the arbitration host includes an arbitration unit, the arbitration host is an application host having an arbitration function, and the at least one pair of storage arrays includes a first storage array and a second storage array. The method includes:
  • the arbitration host receives an arbitration request, and the arbitration request is sent when the first storage array or the second storage array detects that the peer storage array is faulty;
  • the arbitration host suspends delivering services to the first storage array and the second storage array;
  • the arbitration host determines, according to a logical judgment, the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array;
  • the arbitration host stops services with the arbitration failure storage array;
  • the arbitration host sends arbitration win information to the arbitration winning storage array, so that the arbitration winning storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode;
  • the arbitration host resumes the delivered services with the arbitration winning storage array.
  • the host cluster further includes at least one application host; and the arbitration host receiving the arbitration request includes:
  • the arbitration host receives an arbitration request sent by the at least one application host, where the arbitration request sent by the at least one application host is forwarded by the at least one application host after it receives the arbitration request sent by the first storage array or the second storage array.
  • Further, before the arbitration host sends the notification to the arbitration winning storage array, the method further includes:
  • the arbitration host sends a first indication to the at least one application host, where the first indication is used to instruct the at least one application host to suspend delivering services to the first storage array and the second storage array;
  • the arbitration host receives response information from the at least one application host, where the response information indicates that the at least one application host has stopped services with the arbitration failure storage array.
  • Further, after the arbitration host resumes the delivered services with the arbitration winning storage array, the method further includes:
  • the arbitration host sends a second indication to the at least one application host, where the second indication is used to instruct the at least one application host to resume delivering services to the arbitration winning storage array.
  • Further, after the arbitration host resumes the delivered services with the arbitration winning storage array, the method further includes: receiving a service recovery request from the arbitration failure storage array;
  • the arbitration host restores the delivered services of the arbitration failure storage array.
  • Further, the host cluster further includes at least one application host, and receiving the service recovery request from the arbitration failure storage array includes: receiving a service recovery request sent by the at least one application host, which the at least one application host sends after receiving the service recovery request sent by the arbitration failure storage array.
  • Further, after the arbitration host restores the delivered services of the arbitration failure storage array, the method further includes:
  • the arbitration host sends a third indication to the at least one application host, where the third indication is used to instruct the at least one application host to resume the delivered services of the arbitration failure storage array.
  • An embodiment of the present invention provides an apparatus for disaster recovery in an active-active cluster system, where the apparatus includes a host cluster and at least one pair of storage arrays, the host cluster includes an arbitration host, the arbitration host includes an arbitration unit, the arbitration host is an application host having an arbitration function, and the at least one pair of storage arrays includes a first storage array and a second storage array. The arbitration host further includes a receiving unit, a suspending unit, a determining unit, a stopping unit, a sending unit, and a recovery unit;
  • the receiving unit is configured to receive an arbitration request, where the arbitration request is sent when the first storage array or the second storage array detects that the peer storage array is faulty;
  • the suspending unit is configured to suspend sending services to the first storage array and the second storage array;
  • the determining unit is configured to determine, according to a logical judgment, an arbitration winning storage array and an arbitration failure storage array in the first storage array and the second storage array;
  • the stopping unit is configured to stop services with the arbitration failure storage array;
  • the sending unit is configured to send arbitration win information to the arbitration winning storage array, so that the arbitration winning storage array changes the manner of receiving write data from a synchronous write local and remote storage array manner to a write-only local mode;
  • the recovery unit is configured to restore the delivered service with the arbitration winning storage array.
  • Further, the host cluster further includes at least one application host;
  • the receiving unit is specifically configured to receive an arbitration request sent by the at least one application host, where that arbitration request is forwarded by the at least one application host after it receives the arbitration request sent by the first storage array or the second storage array.
  • Further, the sending unit is further configured to send a first indication to the at least one application host, where the first indication is used to instruct the at least one application host to suspend delivering services to the first storage array and the second storage array;
  • the receiving unit is further configured to receive response information from the at least one application host, where the response information indicates that the at least one application host has stopped delivering services to the second storage array;
  • the sending unit is further configured to send a second indication to the at least one application host, where the second indication is used to instruct the at least one application host to resume delivering services to the arbitration winning storage array.
  • Further, the receiving unit is further configured to receive a service recovery request from the arbitration failure storage array;
  • the recovery unit is further configured to restore the delivered services of the arbitration failure storage array.
  • Further, the host cluster further includes at least one application host;
  • the receiving unit is specifically configured to receive a service recovery request sent by the at least one application host, where the at least one application host sends the service recovery request after receiving the service recovery request sent by the arbitration failure storage array.
  • Further, the sending unit is further configured to send a third indication to the at least one application host, where the third indication is used to instruct the at least one application host to resume the delivered services of the arbitration failure storage array.
  • In the embodiments of the present invention, an arbitration unit is set in the arbitration host, and the arbitration host is an application host having an arbitration function, so it can perform the arbitration function of the prior-art arbitration server. When the first storage array or the second storage array detects that its peer storage array is faulty, it sends an arbitration request to the arbitration host. After receiving the arbitration request, the arbitration host suspends delivering services to the first storage array and the second storage array; after determining the arbitration result, the arbitration host stops the services with the arbitration failure storage array and resumes the delivered services with the arbitration winning storage array.
  • This avoids the data inconsistency caused when, during a communication failure between the first storage array and the second storage array and before the arbitration result is determined, service data delivered to the first storage array or the second storage array cannot be synchronized, and thus avoids I/O isolation. At the same time, the execution process of the embodiments imposes no strict timing requirements on storage array A and storage array B, avoiding the disordered timing and inter-array data inconsistency, and the resulting I/O isolation, caused by uncontrollable factors when the storage arrays actually run.
  • FIG. 1 is a schematic architectural diagram according to an embodiment of the present invention;
  • FIG. 2 is a flowchart of a method according to an embodiment of the present invention;
  • FIG. 3 is a flowchart of a method according to another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a device according to another embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a network device according to another embodiment of the present invention.
  • the embodiment of the present invention is applicable to a dual-active cluster system, where the system includes a host cluster and at least one pair of storage arrays, the host cluster includes an arbitration host, and the arbitration host is an application host having an arbitration function.
  • because the arbitration host includes an arbitration unit, it has the arbitration function.
  • For example, as shown in FIG. 1, the host cluster includes an arbitration host and at least one application host, and the pair of storage arrays includes a first storage array and a second storage array. The hosts in the host cluster (including the arbitration host and the application hosts) can communicate with each other; each host can be connected to both the first storage array and the second storage array; and the first storage array and the second storage array are connected to each other for data communication.
  • The host cluster may also include only the arbitration host, in which case the communication manner is the same as that in FIG. 1.
  • An embodiment of the present invention provides a method for disaster tolerance in a dual-active cluster system. As shown in FIG. 2, the method includes:
  • 101. The arbitration host receives an arbitration request.
  • The arbitration request is sent when the first storage array or the second storage array detects that the peer storage array is faulty.
  • 102. The arbitration host suspends delivering services to the first storage array and the second storage array.
  • 103. The arbitration host determines, according to a logical judgment, the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array.
  • 104. The arbitration host stops services with the arbitration failure storage array.
  • 105. The arbitration host sends arbitration win information to the arbitration winning storage array.
  • The arbitration win information is sent so that the arbitration winning storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode. In the synchronous-write-local-and-remote mode, a write service delivered by a host must be synchronized to the second storage array; in the write-only-local mode, a write service delivered by a host only needs to be written locally.
  • 106. The arbitration host resumes the delivered services with the arbitration winning storage array.
  • It should be noted that, in this embodiment of the present invention, the arbitration host may be an application host selected from the host cluster, or an arbitration host added to the host cluster. There may also be two or more arbitration hosts: one serves as the arbitration host during normal operation and the others serve as standby arbitration hosts, and when the working arbitration host fails, a standby arbitration host can be selected to perform the corresponding operations. The manner of selecting an application host in the host cluster as the arbitration host may be arbitrary and is not limited herein.
  • It should also be noted that an application host in the host cluster can be used directly as the arbitration host, so no separate arbitration server needs to be configured. This reduces cost, simplifies deployment, shrinks the fault domain, avoids the single point of failure of the prior-art arbitration server, and improves system reliability. Combining the arbitration function with a host reduces system complexity and maintenance cost, avoids the arbitration misjudgment caused in the prior art when the arbitration server and the host services are not on the same network, and makes networking more flexible.
  • The embodiments of the present invention can solve the data inconsistency problem that may arise in an active-active cluster system when the link between the first and second storage arrays is broken, the two storage arrays split-brain into two independently operating systems, and the two independent storage arrays briefly provide services at the same time.
  • In this embodiment of the present invention, an arbitration unit is set in the arbitration host, and the arbitration host is an application host having an arbitration function, so it can perform the arbitration function of the prior-art arbitration server. When the first storage array or the second storage array detects that its peer storage array is faulty, it sends an arbitration request to the arbitration host. After receiving the arbitration request, the arbitration host suspends delivering services to the first storage array and the second storage array; after determining the arbitration result, the arbitration host stops the services with the arbitration failure storage array and resumes the delivered services with the arbitration winning storage array.
  • This avoids the data inconsistency caused when, during a communication failure between the first storage array and the second storage array and before the arbitration result is determined, service data delivered to the first storage array or the second storage array cannot be synchronized, and thus avoids I/O isolation. At the same time, the execution process of this embodiment imposes no strict timing requirements on storage array A and storage array B, avoiding the disordered timing and inter-array data inconsistency, and the resulting I/O isolation, caused by uncontrollable factors when the storage arrays actually run.
  • a further embodiment of the present invention provides a method for disaster tolerance in a dual-active cluster system. As shown in FIG. 3, the method includes:
  • 201. When the first storage array or the second storage array detects that the peer storage array is faulty, it sends an arbitration request to the arbitration host.
  • The arbitration request sent by the first storage array or the second storage array may be sent directly to the arbitration host in the host cluster, or to another host with which the array communicates; after receiving the arbitration request sent by the first storage array or the second storage array, the other host forwards the arbitration request to the arbitration host.
  • 202. The arbitration host suspends delivering services to the first storage array and the second storage array, and instructs the other application hosts to suspend delivering services to the first storage array and the second storage array.
  • When the host cluster has only the arbitration host, the arbitration host also performs the functions of an application host, and there is no need to instruct other application hosts to suspend delivering services to the first storage array and the second storage array.
  • It should be noted that this step avoids the data inconsistency that would be caused if hosts with application-host functionality in the host cluster delivered services to the first storage array and the second storage array during this period.
  • 203. The arbitration host determines, according to a logical judgment, that the first storage array is the arbitration winning storage array and the second storage array is the arbitration failure storage array.
  • This embodiment takes the first storage array as the arbitration winning storage array and the second storage array as the arbitration failure storage array as an example. The method by which the arbitration host performs the logical judgment is not limited herein (a hypothetical example of one such policy is sketched at the end of this section).
  • 204. The arbitration host stops services with the second storage array and instructs the other application hosts to stop their services with the second storage array.
  • Stopping services with the second storage array in this step may be a software-level stop, which prevents the second storage array from getting in contact with hosts that have application-host functionality while the fault between it and the first storage array persists.
  • When the host cluster has only the arbitration host, there is no need to instruct other application hosts to stop their services with the second storage array.
  • 205. The arbitration host receives response information from the other application hosts. The response information indicates that the other application hosts have stopped their services with the second storage array.
  • 206. The arbitration host sends arbitration win information to the first storage array. The arbitration win information indicates that the first storage array takes over the services with the host cluster while the fault between the first storage array and the second storage array persists.
  • 207. The first storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode. In the synchronous-write-local-and-remote mode, a write service delivered by a host must be synchronized to the second storage array; in the write-only-local mode, a write service delivered by a host only needs to be written locally.
  • 208. The arbitration host resumes the delivered services with the first storage array and instructs the other application hosts to resume their delivered services with the first storage array. When the host cluster has only the arbitration host, there is no need to instruct other application hosts to resume the delivered services with the first storage array.
  • It should be noted that the above steps complete the arbitration process for a fault between the first storage array and the second storage array, and the manner of data processing after arbitration. After communication between the first storage array and the second storage array is restored, the first storage array must first synchronize the data that diverged during the fault to the second storage array, change its manner of receiving write data from the write-only-local mode back to the synchronous-write-local-and-remote mode, and on completion reply to the second storage array that the operation is complete. After receiving the reply from the first storage array, the second storage array initiates a service recovery request to the arbitration host.
  • 209. The arbitration host receives the service recovery request.
  • The service recovery request sent by the second storage array may be sent directly to the arbitration host in the host cluster, or to other application hosts with which the second storage array communicates; after receiving the service recovery request sent by the second storage array, the other application hosts forward it to the arbitration host.
  • 210. The arbitration host restores the delivered services of the second storage array and instructs the other application hosts to restore their delivered services of the second storage array. When the host cluster has only the arbitration host, there is no need to instruct other application hosts to restore the delivered services of the second storage array.
  • 211. The arbitration host replies to the second storage array that the service recovery request has been processed. After this step, the first storage array and the second storage array return to the normal data processing flow.
  • In this embodiment of the present invention, an arbitration unit is set in the arbitration host, and the arbitration host is an application host having an arbitration function, so it can perform the arbitration function of the prior-art arbitration server. When the first storage array or the second storage array detects that its peer storage array is faulty, it sends an arbitration request to the arbitration host. After receiving the arbitration request, the arbitration host suspends delivering services to the first storage array and the second storage array; after determining the arbitration result, the arbitration host stops the services with the arbitration failure storage array and resumes the delivered services with the arbitration winning storage array.
  • This avoids the data inconsistency caused when, during a communication failure between the first storage array and the second storage array and before the arbitration result is determined, service data delivered to the first storage array or the second storage array cannot be synchronized, and thus avoids I/O isolation. At the same time, the execution process of this embodiment imposes no strict timing requirements on storage array A and storage array B, avoiding the disordered timing and inter-array data inconsistency, and the resulting I/O isolation, caused by uncontrollable factors when the storage arrays actually run.
  • A further embodiment of the present invention provides an apparatus 30 for disaster recovery in an active-active cluster system. As shown in FIG. 4, the apparatus 30 includes a host cluster and at least one pair of storage arrays; the host cluster includes an arbitration host; the arbitration host includes an arbitration unit and is an application host having an arbitration function; the at least one pair of storage arrays includes a first storage array and a second storage array; and the arbitration host further includes a receiving unit 31, a suspending unit 32, a determining unit 33, a stopping unit 34, a sending unit 35, and a recovery unit 36.
  • The receiving unit 31 is configured to receive an arbitration request, where the arbitration request is sent when the first storage array or the second storage array detects that the peer storage array is faulty;
  • the suspending unit 32 is configured to suspend delivering services to the first storage array and the second storage array;
  • the determining unit 33 is configured to determine, according to a logical judgment, the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array;
  • the stopping unit 34 is configured to stop services with the arbitration failure storage array;
  • the sending unit 35 is configured to send arbitration win information to the arbitration winning storage array, so that the arbitration winning storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode;
  • the recovery unit 36 is configured to resume the delivered services with the arbitration winning storage array.
  • the host cluster further includes at least one application host.
  • the receiving unit 31 is specifically configured to receive an arbitration request sent by the first storage array or the second storage array.
  • Further, the host cluster further includes at least one application host; the receiving unit 31 is specifically configured to receive an arbitration request sent by the at least one application host, where that arbitration request is forwarded by the at least one application host after it receives the arbitration request sent by the first storage array or the second storage array.
  • Further, the host cluster further includes at least one application host;
  • the sending unit 35 is further configured to send a first indication to the at least one application host, where the first indication is used to instruct the at least one application host to suspend delivering services to the first storage array and the second storage array;
  • the receiving unit 31 is further configured to receive response information from the at least one application host, where the response information indicates that the at least one application host has stopped delivering services to the second storage array;
  • the sending unit 35 is further configured to send a second indication to the at least one application host, where the second indication is used to instruct the at least one application host to resume delivering services to the arbitration winning storage array.
  • the receiving unit 31 is further configured to receive a service recovery request of the arbitration failure storage array.
  • the recovery unit 36 is further configured to restore the delivered service of the arbitration failure storage array.
  • Further, the receiving unit 31 is specifically configured to receive the service recovery request sent by the arbitration failure storage array.
  • Further, the host cluster further includes at least one application host; the receiving unit 31 is specifically configured to receive a service recovery request sent by the at least one application host, where the at least one application host sends the service recovery request after receiving the service recovery request sent by the arbitration failure storage array.
  • Further, the sending unit 35 is further configured to send a third indication to the at least one application host, where the third indication is used to instruct the at least one application host to resume the delivered services of the arbitration failure storage array.
  • In this embodiment of the present invention, an arbitration unit is set in the arbitration host, and the arbitration host is an application host having an arbitration function, so it can perform the arbitration function of the prior-art arbitration server. When the first storage array or the second storage array detects that its peer storage array is faulty, it sends an arbitration request to the arbitration host. After receiving the arbitration request, the arbitration host suspends delivering services to the first storage array and the second storage array; after determining the arbitration result, the arbitration host stops the services with the arbitration failure storage array and resumes the delivered services with the arbitration winning storage array.
  • This avoids the data inconsistency caused when, during a communication failure between the first storage array and the second storage array and before the arbitration result is determined, service data delivered to the first storage array or the second storage array cannot be synchronized, and thus avoids I/O isolation. At the same time, the execution process of this embodiment imposes no strict timing requirements on storage array A and storage array B, avoiding the disordered timing and inter-array data inconsistency, and the resulting I/O isolation, caused by uncontrollable factors when the storage arrays actually run.
  • A further embodiment of the present invention provides a network device 40 for disaster recovery in an active-active cluster system, used in a system consisting of a host cluster and at least one pair of storage arrays, where the host cluster includes an arbitration host, the arbitration host includes an arbitration unit, and the at least one pair of storage arrays includes a first storage array and a second storage array.
  • In this embodiment of the present invention, the network device 40 serves as the arbitration host.
  • As shown in FIG. 5, the network device 40 includes a processor 41 and an interface circuit 42; FIG. 5 also shows a memory 43 and a bus 44. The processor 41, the interface circuit 42, and the memory 43 are connected by the bus 44 and communicate with each other through it.
  • It should be noted that the processor 41 here may be one processing element or a collective name for multiple processing elements. For example, the processing element may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, for example, one or more digital signal processors (DSP) or one or more field-programmable gate arrays (FPGA).
  • The memory 43 may be one storage device or a collective name for multiple storage elements, and is used to store executable program code or the parameters, data, and the like required for the access network management device to run. The memory 43 may include random access memory (RAM) and may also include non-volatile memory, such as disk storage or flash memory.
  • The bus 44 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • The bus 44 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is shown with only one thick line in FIG. 5, but this does not mean that there is only one bus or one type of bus.
  • The network device 40 may also include an input/output apparatus connected to the bus 44, so as to connect to other parts such as the processor 41 through the bus 44.
  • the processor 41 calls the program code in the memory 43 for performing the operations performed by the network device 40 in the above method embodiment.
  • Specifically, the processor 41 is configured to receive an arbitration request through the interface circuit 42, where the arbitration request is sent when the first storage array or the second storage array detects that the peer storage array is faulty; to suspend delivering services to the first storage array and the second storage array; to determine, according to a logical judgment, the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array; to stop services with the arbitration failure storage array; to send arbitration win information to the arbitration winning storage array through the interface circuit 42, so that the arbitration winning storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode; and to resume the delivered services with the arbitration winning storage array.
  • the processor 41 is further configured to receive, by using the interface circuit 42, an arbitration request sent by the first storage array or the second storage array.
  • the host cluster further includes at least one application host.
  • In another implementation of this embodiment, the host cluster further includes at least one application host; the processor 41 is further configured to receive, through the interface circuit 42, an arbitration request sent by the at least one application host, where that arbitration request is forwarded by the at least one application host after it receives the arbitration request sent by the first storage array or the second storage array.
  • In another implementation of this embodiment, the host cluster further includes at least one application host; the processor 41 is further configured to send a first indication to the at least one application host through the interface circuit 42, where the first indication is used to instruct the at least one application host to suspend delivering services to the first storage array and the second storage array; to receive response information from the at least one application host through the interface circuit 42, where the response information indicates that the at least one application host has stopped delivering services to the second storage array; and to send a second indication to the at least one application host through the interface circuit 42, where the second indication is used to instruct the at least one application host to resume delivering services to the arbitration winning storage array.
  • In another implementation of this embodiment, the processor 41 is further configured to receive, through the interface circuit 42, the service recovery request of the arbitration failure storage array, and to restore the delivered services of the arbitration failure storage array.
  • the processor 41 is further configured to receive, by using the interface circuit 42, a service recovery request sent by the arbitration failure storage array.
  • In another implementation of this embodiment, the host cluster further includes at least one application host; the processor 41 is further configured to receive, through the interface circuit 42, a service recovery request sent by the at least one application host, where the service recovery request sent by the at least one application host is sent after the at least one application host receives the service recovery request sent by the arbitration failure storage array.
  • In another implementation of this embodiment, the processor 41 is further configured to send a third indication to the at least one application host through the interface circuit 42, where the third indication is used to instruct the at least one application host to resume the delivered services of the arbitration failure storage array.
  • In this embodiment of the present invention, an arbitration unit is set in the network device 40, and the network device 40 is an application host having an arbitration function, so it can perform the arbitration function of the prior-art arbitration server. When the first storage array or the second storage array detects that the peer storage array is faulty, it sends an arbitration request to the arbitration host; after receiving the arbitration request, the network device 40 suspends delivering services to the first storage array and the second storage array; and after determining the arbitration result, the network device 40 stops the services with the arbitration failure storage array and resumes the delivered services with the arbitration winning storage array.
  • This avoids the data inconsistency caused when, during a communication failure between the first storage array and the second storage array and before the arbitration result is determined, service data delivered to the first storage array or the second storage array cannot be synchronized, and thus avoids I/O isolation. At the same time, the execution process of this embodiment imposes no strict timing requirements on storage array A and storage array B, avoiding the disordered timing and inter-array data inconsistency, and the resulting I/O isolation, caused by uncontrollable factors when the storage arrays actually run.
  • The apparatus for disaster recovery in the active-active cluster system provided by the embodiments of the present invention can implement the foregoing method embodiments; for specific function implementations, refer to the descriptions in the method embodiments.
  • The method and apparatus for disaster recovery in the active-active cluster system provided by the embodiments of the present invention are applicable to, but not limited to, a system consisting of a host cluster and at least one pair of storage arrays.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
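
The "logical judgment" by which the arbitration host picks the winning storage array is deliberately left open in the text above. Purely as a hypothetical illustration (none of the names or the policy below come from the patent), a minimal arbiter might let the array whose fault report arrived first win and break exact ties with a static priority:

    from dataclasses import dataclass, field

    @dataclass
    class ArbitrationRequest:
        array_id: str        # e.g. "A" or "B"
        received_at: float   # arbitration host's local receive time

    @dataclass
    class Arbiter:
        priority: tuple = ("A", "B")  # assumed static tie-break order
        requests: list = field(default_factory=list)

        def submit(self, req: ArbitrationRequest) -> None:
            self.requests.append(req)

        def decide(self) -> str:
            # Earliest reporter wins; static priority breaks exact ties.
            best = min(self.requests,
                       key=lambda r: (r.received_at,
                                      self.priority.index(r.array_id)))
            return best.array_id

    arb = Arbiter()
    arb.submit(ArbitrationRequest("B", received_at=2.0))
    arb.submit(ArbitrationRequest("A", received_at=1.5))
    assert arb.decide() == "A"  # A reported the peer fault first

Any deterministic rule works here; what matters to the scheme is that the decision is made in one place, on the arbitration host, after all host I/O has been suspended.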

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Computer And Data Communications (AREA)

Abstract

A method and apparatus for disaster recovery in an active-active cluster system, relating to the field of communications technologies, capable of solving the prior-art problem that uncontrollable factors during storage array operation prevent storage array B from stopping its services in time, causing I/O isolation. The method is used in a system consisting of a host cluster and at least one pair of storage arrays, where the host cluster includes an arbitration host, the arbitration host includes an arbitration unit, the arbitration host is an application host having an arbitration function, and the pair of storage arrays includes a first storage array and a second storage array. The arbitration host receives an arbitration request (101); suspends delivering services to the first storage array and the second storage array (102); determines, according to a logical judgment, the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array (103); stops services with the arbitration failure storage array (104); sends arbitration win information to the arbitration winning storage array (105); and resumes the delivered services with the arbitration winning storage array (106).

Description

Method and apparatus for disaster recovery in an active-active cluster system
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for disaster recovery in an active-active cluster system.
Background
AA (Active-Active) is mainly applied to disaster recovery backup of data in an active-active cluster system. The active-active cluster system includes a host cluster, several storage arrays, and an arbitration server. Taking storage array A and storage array B as an example: when the active-active cluster system runs normally, the host cluster can deliver read and write services to storage array A and storage array B at the same time. When the host cluster needs to deliver a write service to storage array A, a host in the cluster system first delivers the write data to storage array A; storage array A then writes the delivered data to both storage array A and storage array B, and after the data has been written to both arrays, storage array A returns a write-completion acknowledgment to the host. The process by which the host cluster writes data to storage array B is similar to the basic flow above. When split-brain occurs between storage array A and storage array B, that is, when a communication failure occurs between them, one storage array in the active-active cluster system can take over the services automatically, avoiding service downtime and data loss. For example, the host delivers data to storage array A and communication between storage array A and storage array B fails; after detecting the failure, storage array A and storage array B each initiate an arbitration request to the arbitration server, and the arbitration server determines by logical judgment that storage array A takes over the services and returns the arbitration result to storage array A and storage array B. Although storage array A is determined to take over the services, the data the host delivered to storage array A is not synchronized to storage array B because of the fault between the two arrays. If storage array B has not yet disconnected from the host when storage array A takes over the services, the data the host reads from storage array B will be wrong data, which causes I/O (Input/Output) isolation, that is, fencing. In the prior art, to guarantee data consistency between storage array A and storage array B, the two arrays follow a software-enforced agreement that requires storage array B to stop its services before storage array A provides services unilaterally; for example, the agreement may stipulate that, after storage array A detects that storage array B cannot communicate, storage array A wait 10 seconds before accepting services, thereby ensuring that storage array B has already stopped its services before storage array A serves unilaterally.
The prior art has at least the following problems: the prior-art method imposes strict timing requirements on storage array A and storage array B, but when the storage arrays actually run there are uncontrollable factors, such as high system CPU usage and network delay, that prevent storage array B from stopping its services in time, which disorders the timing, leaves data inconsistent between the storage arrays, and causes I/O isolation.
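
To make the background concrete, the synchronous write path described above can be sketched as follows. This is a minimal toy model; the class and field names are illustrative and not taken from the patent:

    class StorageArray:
        def __init__(self, name):
            self.name = name
            self.data = {}    # block address -> payload
            self.peer = None  # remote mirror, set after construction

        def write(self, addr, payload):
            self.data[addr] = payload           # local write
            if self.peer is not None:
                self.peer.data[addr] = payload  # synchronous remote write
            return "ack"                        # host sees success only now

    array_a, array_b = StorageArray("A"), StorageArray("B")
    array_a.peer, array_b.peer = array_b, array_a

    assert array_a.write(0x10, b"payload") == "ack"
    assert array_b.data[0x10] == b"payload"  # mirrored before the ack

If the inter-array link fails mid-stream, the mirroring step silently stops reaching the peer, which is exactly the divergence the arbitration mechanism below is designed to contain.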
Summary
The embodiments of the present invention provide a method and apparatus for disaster recovery in an active-active cluster system, which can solve the prior-art problem that uncontrollable factors during the actual operation of the storage arrays prevent storage array B from stopping its services in time, disordering the timing, leaving data inconsistent between the storage arrays, and causing I/O isolation.
To achieve the foregoing objective, the embodiments of the present invention adopt the following technical solutions:
According to a first aspect, an embodiment of the present invention provides a method for disaster recovery in an active-active cluster system, used in a system consisting of a host cluster and at least one pair of storage arrays, where the host cluster includes an arbitration host, the arbitration host includes an arbitration unit, the arbitration host is an application host having an arbitration function, and the at least one pair of storage arrays includes a first storage array and a second storage array. The method includes:
the arbitration host receives an arbitration request, where the arbitration request is sent when the first storage array or the second storage array detects that the peer storage array is faulty;
the arbitration host suspends delivering services to the first storage array and the second storage array;
the arbitration host determines, according to a logical judgment, the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array;
the arbitration host stops services with the arbitration failure storage array;
the arbitration host sends arbitration win information to the arbitration winning storage array, so that the arbitration winning storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode;
the arbitration host resumes the delivered services with the arbitration winning storage array.
With reference to the first aspect, in a first implementation of the first aspect, the host cluster further includes at least one application host, and the arbitration host receiving the arbitration request includes:
the arbitration host receives an arbitration request sent by the at least one application host, where the arbitration request sent by the at least one application host is forwarded by the at least one application host after it receives the arbitration request sent by the first storage array or the second storage array.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, before the arbitration host sends the notification to the arbitration winning storage array, the method further includes:
the arbitration host sends a first indication to the at least one application host, where the first indication is used to instruct the at least one application host to suspend delivering services to the first storage array and the second storage array;
the arbitration host receives response information from the at least one application host, where the response information indicates that the at least one application host has stopped services with the arbitration failure storage array;
after the arbitration host resumes the delivered services with the arbitration winning storage array, the method further includes:
the arbitration host sends a second indication to the at least one application host, where the second indication is used to instruct the at least one application host to resume delivering services to the arbitration winning storage array.
With reference to the first aspect, in a third implementation of the first aspect, after the arbitration host resumes the delivered services between the host cluster and the arbitration winning storage array, the method further includes:
receiving a service recovery request from the arbitration failure storage array;
the arbitration host restores the delivered services of the arbitration failure storage array.
With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, the host cluster further includes at least one application host, and receiving the service recovery request from the arbitration failure storage array includes:
receiving a service recovery request sent by the at least one application host, where the service recovery request sent by the at least one application host is sent after the at least one application host receives the service recovery request sent by the arbitration failure storage array.
With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, after the arbitration host restores the delivered services of the arbitration failure storage array, the method further includes:
the arbitration host sends a third indication to the at least one application host, where the third indication is used to instruct the at least one application host to resume the delivered services of the arbitration failure storage array.
According to a second aspect, an embodiment of the present invention provides an apparatus for disaster recovery in an active-active cluster system, where the apparatus includes a host cluster and at least one pair of storage arrays, the host cluster includes an arbitration host, the arbitration host includes an arbitration unit, the arbitration host is an application host having an arbitration function, and the at least one pair of storage arrays includes a first storage array and a second storage array. The arbitration host further includes a receiving unit, a suspending unit, a determining unit, a stopping unit, a sending unit, and a recovery unit;
the receiving unit is configured to receive an arbitration request, where the arbitration request is sent when the first storage array or the second storage array detects that the peer storage array is faulty;
the suspending unit is configured to suspend delivering services to the first storage array and the second storage array;
the determining unit is configured to determine, according to a logical judgment, the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array;
the stopping unit is configured to stop services with the arbitration failure storage array;
the sending unit is configured to send arbitration win information to the arbitration winning storage array, so that the arbitration winning storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode;
the recovery unit is configured to resume the delivered services with the arbitration winning storage array.
With reference to the second aspect, in a first implementation of the second aspect, the host cluster further includes at least one application host;
the receiving unit is specifically configured to receive an arbitration request sent by the at least one application host, where the arbitration request sent by the at least one application host is forwarded by the at least one application host after it receives the arbitration request sent by the first storage array or the second storage array.
With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the sending unit is further configured to send a first indication to the at least one application host, where the first indication is used to instruct the at least one application host to suspend delivering services to the first storage array and the second storage array;
the receiving unit is further configured to receive response information from the at least one application host, where the response information indicates that the at least one application host has stopped delivering services to the second storage array;
the sending unit is further configured to send a second indication to the at least one application host, where the second indication is used to instruct the at least one application host to resume delivering services to the arbitration winning storage array.
With reference to the second aspect, in a third implementation of the second aspect, the receiving unit is further configured to receive a service recovery request from the arbitration failure storage array;
the recovery unit is further configured to restore the delivered services of the arbitration failure storage array.
With reference to the third implementation of the second aspect, in a fourth implementation of the second aspect, the host cluster further includes at least one application host;
the receiving unit is specifically configured to receive a service recovery request sent by the at least one application host, where the service recovery request sent by the at least one application host is sent after the at least one application host receives the service recovery request sent by the arbitration failure storage array.
With reference to the fourth implementation of the second aspect, in a fifth implementation of the second aspect, the sending unit is further configured to send a third indication to the at least one application host, where the third indication is used to instruct the at least one application host to resume the delivered services of the arbitration failure storage array.
According to the method and apparatus for disaster recovery in an active-active cluster system provided by the embodiments of the present invention, an arbitration unit is set in the arbitration host, and the arbitration host is an application host having an arbitration function, so it can perform the arbitration function of the prior-art arbitration server. When the first storage array or the second storage array detects that its peer storage array is faulty, it sends an arbitration request to the arbitration host. After receiving the arbitration request, the arbitration host suspends delivering services to the first storage array and the second storage array; after determining the arbitration result, the arbitration host stops the services with the arbitration failure storage array and resumes the delivered services with the arbitration winning storage array. This avoids the data inconsistency caused when, during a communication failure between the first storage array and the second storage array and before the arbitration result is determined, service data delivered to the first storage array or the second storage array cannot be synchronized, and thus avoids I/O isolation. At the same time, the execution process of the embodiments of the present invention imposes no strict timing requirements on storage array A and storage array B, avoiding the disordered timing and inter-array data inconsistency, and the resulting I/O isolation, caused by uncontrollable factors when the storage arrays actually run.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic architectural diagram according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a network device according to another embodiment of the present invention.
Description of Embodiments
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
To make the advantages of the technical solutions of the present invention clearer, the following describes the present invention in detail with reference to the accompanying drawings and embodiments.
The embodiments of the present invention are applicable to an active-active cluster system. The system includes a host cluster and at least one pair of storage arrays. The host cluster includes an arbitration host, and the arbitration host is an application host having an arbitration function; because the arbitration host includes an arbitration unit, it has the arbitration function. For example, as shown in FIG. 1, the host cluster includes an arbitration host and at least one application host, and the pair of storage arrays includes a first storage array and a second storage array. The hosts in the host cluster (including the arbitration host and the application hosts) can communicate with each other; each host can be connected to both the first storage array and the second storage array; and the first storage array and the second storage array are connected to each other for data communication. The host cluster may also include only the arbitration host, in which case the communication manner is the same as that in FIG. 1.
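
The FIG. 1 topology can be modeled in a few lines: every host, the arbitration host included, links to both storage arrays, and the two arrays link to each other. The names below are illustrative only:

    hosts  = ["arbitration_host", "app_host_1", "app_host_2"]
    arrays = ["array_1", "array_2"]

    links  = {(h, a) for h in hosts for a in arrays}  # host <-> array links
    links |= {("array_1", "array_2")}                 # inter-array mirror link

    assert ("arbitration_host", "array_1") in links
    assert ("array_1", "array_2") in links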
An embodiment of the present invention provides a method for disaster recovery in an active-active cluster system. As shown in FIG. 2, the method includes the following steps.
101. The arbitration host receives an arbitration request.
The arbitration request is sent when the first storage array or the second storage array detects that the peer storage array is faulty.
102. The arbitration host suspends delivering services to the first storage array and the second storage array.
103. The arbitration host determines, according to a logical judgment, the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array.
104. The arbitration host stops services with the arbitration failure storage array.
105. The arbitration host sends arbitration win information to the arbitration winning storage array.
The arbitration win information is sent so that the arbitration winning storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode. In the synchronous-write-local-and-remote mode, a write service delivered by a host must be synchronized to the second storage array; in the write-only-local mode, a write service delivered by a host only needs to be written locally.
106. The arbitration host resumes the delivered services with the arbitration winning storage array.
It should be noted that, in this embodiment of the present invention, the arbitration host may be an application host selected from the host cluster, or an arbitration host added to the host cluster. There may also be two or more arbitration hosts: one serves as the arbitration host during normal operation and the others serve as standby arbitration hosts, and when the working arbitration host fails, a standby arbitration host can be selected to perform the corresponding operations. The manner of selecting an application host in the host cluster as the arbitration host may be arbitrary and is not limited herein.
It should further be noted that, in the embodiments of the present invention, an application host in the host cluster can be used directly as the arbitration host, so no separate arbitration server needs to be configured. This reduces cost, simplifies deployment, shrinks the fault domain, avoids the single point of failure of the prior-art arbitration server, and improves system reliability. Combining the arbitration function with a host reduces system complexity and maintenance cost, avoids the arbitration misjudgment caused in the prior art when the arbitration server and the host services are not on the same network, and makes networking more flexible. The embodiments of the present invention can solve the data inconsistency problem that may arise in an active-active cluster system when the link between the first and second storage arrays is broken, the two storage arrays split-brain into two independently operating systems, and the two independent storage arrays briefly provide services at the same time.
In this embodiment of the present invention, an arbitration unit is set in the arbitration host, and the arbitration host is an application host having an arbitration function, so it can perform the arbitration function of the prior-art arbitration server. When the first storage array or the second storage array detects that its peer storage array is faulty, it sends an arbitration request to the arbitration host. After receiving the arbitration request, the arbitration host suspends delivering services to the first storage array and the second storage array; after determining the arbitration result, the arbitration host stops the services with the arbitration failure storage array and resumes the delivered services with the arbitration winning storage array. This avoids the data inconsistency caused when, during a communication failure between the first storage array and the second storage array and before the arbitration result is determined, service data delivered to the first storage array or the second storage array cannot be synchronized, and thus avoids I/O isolation. At the same time, the execution process of this embodiment imposes no strict timing requirements on storage array A and storage array B, avoiding the disordered timing and inter-array data inconsistency, and the resulting I/O isolation, caused by uncontrollable factors when the storage arrays actually run.
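
The sequencing of steps 101 to 106 can be condensed into a runnable sketch. The class, its method names, and the static winner-selection rule are assumptions made for illustration; the patent leaves the transport and the judgment logic open:

    class ArbitrationHost:
        def __init__(self):
            self.suspended, self.stopped, self.log = set(), set(), []

        def suspend_io(self, array): self.suspended.add(array)
        def stop_io(self, array):    self.stopped.add(array)
        def resume_io(self, array):  self.suspended.discard(array)
        def send(self, array, msg):  self.log.append((array, msg))

        def decide(self, a, b):
            # Placeholder "logical judgment": statically prefer array a.
            return a, b

        def handle_arbitration_request(self, a, b):
            self.suspend_io(a); self.suspend_io(b)  # 102: pause all delivery
            winner, loser = self.decide(a, b)       # 103: pick winner, loser
            self.stop_io(loser)                     # 104: cut the loser first
            self.send(winner, "ARBITRATION_WIN")    # 105: winner goes local-only
            self.resume_io(winner)                  # 106: resume I/O to winner
            return winner

    host = ArbitrationHost()
    assert host.handle_arbitration_request("A", "B") == "A"

The order matters: the loser is cut off before the winner is told to serve unilaterally, which is what removes the wall-clock agreement the background criticizes.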
Another embodiment of the present invention provides a method for disaster recovery in an active-active cluster system. As shown in FIG. 3, the method includes the following steps.
201. When the first storage array or the second storage array detects that the peer storage array is faulty, it sends an arbitration request to the arbitration host.
The arbitration request sent by the first storage array or the second storage array may be sent directly to the arbitration host in the host cluster, or to another host with which the array communicates; after receiving the arbitration request sent by the first storage array or the second storage array, the other host forwards the arbitration request to the arbitration host.
202. The arbitration host suspends delivering services to the first storage array and the second storage array, and instructs the other application hosts to suspend delivering services to the first storage array and the second storage array.
When the host cluster has only the arbitration host, the arbitration host also performs the functions of an application host, and there is no need to instruct other application hosts to suspend delivering services to the first storage array and the second storage array.
It should be noted that this step avoids the data inconsistency that would be caused if hosts with application-host functionality in the host cluster delivered services to the first storage array and the second storage array during this period.
203. The arbitration host determines, according to a logical judgment, that the first storage array is the arbitration winning storage array and the second storage array is the arbitration failure storage array.
The arbitration host determines the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array according to a logical judgment. This embodiment of the present invention takes the first storage array as the arbitration winning storage array and the second storage array as the arbitration failure storage array as an example. The method by which the arbitration host performs the logical judgment is not limited herein.
204. The arbitration host stops services with the second storage array and instructs the other application hosts to stop their services with the second storage array.
Stopping services with the second storage array in this step may be a software-level stop, which prevents the second storage array from getting in contact with hosts that have application-host functionality while the fault between it and the first storage array persists. When the host cluster has only the arbitration host, there is no need to instruct other application hosts to stop their services with the second storage array.
205. The arbitration host receives response information from the other application hosts.
The response information indicates that the other application hosts have stopped their services with the second storage array.
206. The arbitration host sends arbitration win information to the first storage array.
The arbitration win information indicates that the first storage array takes over the services with the host cluster while the fault between the first storage array and the second storage array persists.
207. The first storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode.
In the synchronous-write-local-and-remote mode, a write service delivered by a host must be synchronized to the second storage array; in the write-only-local mode, a write service delivered by a host only needs to be written locally.
208. The arbitration host resumes the delivered services with the first storage array and instructs the other application hosts to resume their delivered services with the first storage array.
When the host cluster has only the arbitration host, there is no need to instruct other application hosts to resume the delivered services with the first storage array.
It should be noted that the above steps complete the arbitration process for a fault between the first storage array and the second storage array, and the manner of data processing after arbitration. After communication between the first storage array and the second storage array is restored, the first storage array must first synchronize the data that diverged during the fault to the second storage array, change its manner of receiving write data from the write-only-local mode back to the synchronous-write-local-and-remote mode, and on completion reply to the second storage array that the operation is complete. After receiving the reply from the first storage array, the second storage array initiates a service recovery request to the arbitration host.
209. The arbitration host receives the service recovery request.
The service recovery request sent by the second storage array may be sent directly to the arbitration host in the host cluster, or to other application hosts with which the second storage array communicates; after receiving the service recovery request sent by the second storage array, the other application hosts forward the service recovery request to the arbitration host.
210. The arbitration host restores the delivered services of the second storage array and instructs the other application hosts to restore their delivered services of the second storage array.
When the host cluster has only the arbitration host, there is no need to instruct other application hosts to restore the delivered services of the second storage array.
211. The arbitration host replies to the second storage array that the service recovery request has been processed.
After this step, the first storage array and the second storage array return to the normal data processing flow.
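
The resynchronization and recovery flow described in the note after step 208 and in steps 209 to 211 can be sketched as follows, using an assumed in-memory model (the field and function names are not from the patent):

    from types import SimpleNamespace

    def resume_io(host, array):
        host.serving.add(array.name)

    def recover(winner, loser, arb_host, app_hosts):
        # 1. The winner replays the writes the loser missed during the fault.
        loser.data.update(winner.diff_log)
        winner.write_mode = "sync_local_and_remote"  # leave write-only-local
        winner.diff_log.clear()
        # 2. The loser, seeing the resync confirmed, requests service recovery.
        arb_host.recovery_requests.append(loser.name)
        # 3. The arbitration host restores delivery on itself and every
        #    application host, then replies that the request is processed.
        for h in (arb_host, *app_hosts):
            resume_io(h, loser)
        return "recovery_request_processed"

    winner = SimpleNamespace(name="A", data={}, diff_log={7: b"x"},
                             write_mode="write_local_only")
    loser  = SimpleNamespace(name="B", data={})
    arb    = SimpleNamespace(serving=set(), recovery_requests=[])
    apps   = [SimpleNamespace(serving=set())]

    assert recover(winner, loser, arb, apps) == "recovery_request_processed"
    assert loser.data[7] == b"x"
    assert winner.write_mode == "sync_local_and_remote"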
In this embodiment of the present invention, an arbitration unit is set in the arbitration host, and the arbitration host is an application host having an arbitration function, so it can perform the arbitration function of the prior-art arbitration server. When the first storage array or the second storage array detects that its peer storage array is faulty, it sends an arbitration request to the arbitration host. After receiving the arbitration request, the arbitration host suspends delivering services to the first storage array and the second storage array; after determining the arbitration result, the arbitration host stops the services with the arbitration failure storage array and resumes the delivered services with the arbitration winning storage array. This avoids the data inconsistency caused when, during a communication failure between the first storage array and the second storage array and before the arbitration result is determined, service data delivered to the first storage array or the second storage array cannot be synchronized, and thus avoids I/O isolation. At the same time, the execution process of this embodiment imposes no strict timing requirements on storage array A and storage array B, avoiding the disordered timing and inter-array data inconsistency, and the resulting I/O isolation, caused by uncontrollable factors when the storage arrays actually run.
Another embodiment of the present invention provides an apparatus 30 for disaster recovery in an active-active cluster system. As shown in FIG. 4, the apparatus 30 includes a host cluster and at least one pair of storage arrays; the host cluster includes an arbitration host; the arbitration host includes an arbitration unit and is an application host having an arbitration function; and the at least one pair of storage arrays includes a first storage array and a second storage array. The arbitration host further includes a receiving unit 31, a suspending unit 32, a determining unit 33, a stopping unit 34, a sending unit 35, and a recovery unit 36.
The receiving unit 31 is configured to receive an arbitration request, where the arbitration request is sent when the first storage array or the second storage array detects that the peer storage array is faulty.
The suspending unit 32 is configured to suspend delivering services to the first storage array and the second storage array.
The determining unit 33 is configured to determine, according to a logical judgment, the arbitration winning storage array and the arbitration failure storage array from the first storage array and the second storage array.
The stopping unit 34 is configured to stop services with the arbitration failure storage array.
The sending unit 35 is configured to send arbitration win information to the arbitration winning storage array, so that the arbitration winning storage array changes its manner of receiving write data from the synchronous-write-local-and-remote mode to the write-only-local mode.
The recovery unit 36 is configured to resume the delivered services with the arbitration winning storage array.
The host cluster further includes at least one application host.
Further, the receiving unit 31 is specifically configured to receive an arbitration request sent by the first storage array or the second storage array.
Further, the host cluster further includes at least one application host; the receiving unit 31 is specifically configured to receive an arbitration request sent by the at least one application host, where the arbitration request sent by the at least one application host is forwarded by the at least one application host after it receives the arbitration request sent by the first storage array or the second storage array.
Further, the host cluster further includes at least one application host; the sending unit 35 is further configured to send a first indication to the at least one application host, where the first indication is used to instruct the at least one application host to suspend delivering services to the first storage array and the second storage array;
the receiving unit 31 is further configured to receive response information from the at least one application host, where the response information indicates that the at least one application host has stopped delivering services to the second storage array;
the sending unit 35 is further configured to send a second indication to the at least one application host, where the second indication is used to instruct the at least one application host to resume delivering services to the arbitration winning storage array.
Further, the receiving unit 31 is further configured to receive a service recovery request from the arbitration failure storage array;
the recovery unit 36 is further configured to restore the delivered services of the arbitration failure storage array.
Further, the receiving unit 31 is specifically configured to receive the service recovery request sent by the arbitration failure storage array.
Further, the host cluster further includes at least one application host; the receiving unit 31 is specifically configured to receive a service recovery request sent by the at least one application host, where the service recovery request sent by the at least one application host is sent after the at least one application host receives the service recovery request sent by the arbitration failure storage array.
Further, the sending unit 35 is further configured to send a third indication to the at least one application host, where the third indication is used to instruct the at least one application host to resume the delivered services of the arbitration failure storage array.
In this embodiment of the present invention, an arbitration unit is set in the arbitration host, and the arbitration host is an application host having an arbitration function, so it can perform the arbitration function of the prior-art arbitration server. When the first storage array or the second storage array detects that its peer storage array is faulty, it sends an arbitration request to the arbitration host. After receiving the arbitration request, the arbitration host suspends delivering services to the first storage array and the second storage array; after determining the arbitration result, the arbitration host stops the services with the arbitration failure storage array and resumes the delivered services with the arbitration winning storage array. This avoids the data inconsistency caused when, during a communication failure between the first storage array and the second storage array and before the arbitration result is determined, service data delivered to the first storage array or the second storage array cannot be synchronized, and thus avoids I/O isolation. At the same time, the execution process of this embodiment imposes no strict timing requirements on storage array A and storage array B, avoiding the disordered timing and inter-array data inconsistency, and the resulting I/O isolation, caused by uncontrollable factors when the storage arrays actually run.
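
Seen from the storage array's side, the apparatus above drives two behaviors: raising an arbitration request when the peer stops responding, and switching to the write-only-local mode on receiving the win message. The sketch below assumes a heartbeat-timeout detector and a callback transport, neither of which is specified in the patent:

    class MirroredArray:
        def __init__(self, name, send_to_host):
            self.name = name
            self.write_mode = "sync_local_and_remote"
            self.send_to_host = send_to_host  # callback into the host cluster

        def on_heartbeat_timeout(self):
            # Peer unreachable: ask the arbitration host to pick a survivor.
            self.send_to_host({"type": "ARBITRATION_REQUEST", "from": self.name})

        def on_message(self, msg):
            if msg == "ARBITRATION_WIN":
                self.write_mode = "write_local_only"  # serve single-sided

    outbox = []
    array_a = MirroredArray("A", outbox.append)
    array_a.on_heartbeat_timeout()
    array_a.on_message("ARBITRATION_WIN")
    assert outbox == [{"type": "ARBITRATION_REQUEST", "from": "A"}]
    assert array_a.write_mode == "write_local_only"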
本发明又一实施例提供一种双活集群系统中容灾的网络设备40,用于主机集群和至少一对存储阵列组成的系统,所述主机集群包括仲裁主机,所述仲裁主机中包括仲裁单元,所述至少一对存储阵列包括第一存储阵列和第二存储阵列,本发明实施例中所述网络设备40作为仲裁主机,如图5所示,所述网络设备40包括处理器41和接口电路42,图5中还示出了存储器43和总线44,该处理器41、接口电路42和存储器43通过总线44连接并完成相互间的通信。
需要说明的是,这里的处理器41可以是一个处理元件,也可以是多个处理元件的统称。例如,该处理元件可以是中央处理器(Central Processing Unit,CPU),也可以是特定集成电路(Application Specific Integrated Circuit,ASIC),或者是被配置成实施本发明实施例的一个或多个集成电路,例如:一个或多个微处理器(digital singnal processor,DSP),或,一个或者多个现场可编程门阵列(Field Programmable Gate Array,FPGA)。
The memory 43 may be one storage apparatus or a collective name for multiple storage elements, and is used to store executable program code and the parameters, data, and the like required for the network device to run. The memory 43 may include random access memory (RAM), and may also include non-volatile memory, for example, disk storage or flash memory.
The bus 44 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 44 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in FIG. 5, but this does not mean that there is only one bus or only one type of bus.
The network device 40 may further include an input/output apparatus that is connected to the bus 44 so as to connect to the processor 41 and other components through the bus 44.
The processor 41 invokes the program code in the memory 43 to perform the operations performed by the network device 40 in the method embodiments above.
Specifically, the processor 41 is configured to: receive an arbitration request through the interface circuit 42, where the arbitration request is sent when the first storage array or the second storage array detects a fault of the peer storage array; suspend delivering services to the first storage array and the second storage array; determine, according to a logical judgment, the arbitration-winning storage array and the arbitration-losing storage array among the first storage array and the second storage array; stop services with the arbitration-losing storage array; send, through the interface circuit 42, arbitration-win information to the arbitration-winning storage array, so that the arbitration-winning storage array changes its mode of receiving write data from writing synchronously to the local and remote storage arrays to writing locally only; and resume delivering services to the arbitration-winning storage array.
In one implementation of this embodiment of the present invention, the processor 41 is further configured to receive, through the interface circuit 42, the arbitration request sent by the first storage array or the second storage array.
The host cluster further includes at least one application host.
In another implementation of this embodiment of the present invention, the host cluster further includes at least one application host; the processor 41 is further configured to receive, through the interface circuit 42, an arbitration request sent by the at least one application host, where the arbitration request sent by the at least one application host is forwarded by the at least one application host after it receives the arbitration request sent by the first storage array or the second storage array.
In another implementation of this embodiment of the present invention, the host cluster further includes at least one application host; the processor 41 is further configured to: send, through the interface circuit 42, a first indication to the at least one application host, where the first indication instructs the at least one application host to suspend delivering services to the first storage array and the second storage array; receive, through the interface circuit 42, response information from the at least one application host, where the response information indicates that the at least one application host has stopped delivering services to the arbitration-losing storage array; and send, through the interface circuit 42, a second indication to the at least one application host, where the second indication instructs the at least one application host to resume delivering services to the arbitration-winning storage array.
In another implementation of this embodiment of the present invention, the processor 41 is further configured to receive, through the interface circuit 42, a service restoration request of the arbitration-losing storage array, and to resume delivering services to the arbitration-losing storage array.
In another implementation of this embodiment of the present invention, the processor 41 is further configured to receive, through the interface circuit 42, the service restoration request sent by the arbitration-losing storage array.
In another implementation of this embodiment of the present invention, the host cluster further includes at least one application host; the processor 41 is further configured to receive, through the interface circuit 42, a service restoration request sent by the at least one application host, where the service restoration request sent by the at least one application host is sent by the at least one application host after it receives the service restoration request sent by the arbitration-losing storage array.
In another implementation of this embodiment of the present invention, the processor 41 is further configured to send, through the interface circuit 42, a third indication to the at least one application host, where the third indication instructs the at least one application host to resume delivering services to the arbitration-losing storage array.
In this embodiment of the present invention, an arbitration unit is provided in the network device 40, and the network device 40 is an application host having an arbitration function that can perform the arbitration function of the prior-art arbitration server. When the first storage array and the second storage array detect a fault of the peer storage array, each sends an arbitration request to the network device 40. After receiving an arbitration request, the network device 40 suspends delivering services to the first storage array and the second storage array; after determining the arbitration result, it stops services with the arbitration-losing storage array and resumes delivering services to the arbitration-winning storage array. This prevents the data inconsistency that would arise if, during a communication fault between the first storage array and the second storage array and before the arbitration result is determined, the arbitration host or an application host delivered service data to the first storage array or the second storage array that could not be synchronized, and it thereby avoids I/O isolation. Moreover, the execution process of this embodiment imposes strict sequencing requirements on storage array A and storage array B, which prevents uncontrollable factors in actual operation from disordering the sequence, making the data between the storage arrays inconsistent, and causing I/O isolation.
The apparatus for disaster tolerance in an active-active cluster system provided in the embodiments of the present invention can implement the method embodiments provided above; for the specific function implementation, refer to the descriptions in the method embodiments, and details are not repeated here. The method and apparatus for disaster tolerance in an active-active cluster system provided in the embodiments of the present invention are applicable to, but not limited to, a system composed of a host cluster and at least one pair of storage arrays.
A person of ordinary skill in the art may understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium, and when the program is executed, the processes of the above method embodiments may be included. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

  1. A disaster tolerance method for an active-active cluster system, applied to a system composed of a host cluster and at least one pair of storage arrays, wherein the host cluster comprises an arbitration host, the arbitration host comprises an arbitration unit, the arbitration host is an application host having an arbitration function, and the at least one pair of storage arrays comprises a first storage array and a second storage array, the method comprising:
    receiving, by the arbitration host, an arbitration request, wherein the arbitration request is sent when the first storage array or the second storage array detects a fault of the peer storage array;
    suspending, by the arbitration host, delivery of services to the first storage array and the second storage array;
    determining, by the arbitration host according to a logical judgment, an arbitration-winning storage array and an arbitration-losing storage array among the first storage array and the second storage array;
    stopping, by the arbitration host, services with the arbitration-losing storage array;
    sending, by the arbitration host, arbitration-win information to the arbitration-winning storage array, so that the arbitration-winning storage array changes its mode of receiving write data from writing synchronously to the local and remote storage arrays to writing locally only; and
    resuming, by the arbitration host, delivery of services to the arbitration-winning storage array.
  2. The method according to claim 1, wherein the host cluster further comprises at least one application host, and the receiving, by the arbitration host, of the arbitration request comprises:
    receiving, by the arbitration host, an arbitration request sent by the at least one application host, wherein the arbitration request sent by the at least one application host is forwarded by the at least one application host after the at least one application host receives the arbitration request sent by the first storage array or the second storage array.
  3. The method according to claim 2, wherein before the arbitration host sends the arbitration-win information to the arbitration-winning storage array, the method further comprises:
    sending, by the arbitration host, a first indication to the at least one application host, wherein the first indication instructs the at least one application host to suspend delivery of services to the first storage array and the second storage array; and
    receiving, by the arbitration host, response information from the at least one application host, wherein the response information indicates that the at least one application host has stopped services with the arbitration-losing storage array;
    and wherein, after the arbitration host resumes delivery of services to the arbitration-winning storage array, the method further comprises:
    sending, by the arbitration host, a second indication to the at least one application host, wherein the second indication instructs the at least one application host to resume delivery of services to the arbitration-winning storage array.
  4. The method according to claim 1, wherein after the arbitration host resumes delivery of services between the host cluster and the arbitration-winning storage array, the method further comprises:
    receiving a service restoration request of the arbitration-losing storage array; and
    resuming, by the arbitration host, delivery of services to the arbitration-losing storage array.
  5. The method according to claim 4, wherein the host cluster further comprises at least one application host, and the receiving of the service restoration request of the arbitration-losing storage array comprises:
    receiving a service restoration request sent by the at least one application host, wherein the service restoration request sent by the at least one application host is sent by the at least one application host after the at least one application host receives the service restoration request sent by the arbitration-losing storage array.
  6. The method according to claim 5, wherein after the arbitration host resumes delivery of services to the arbitration-losing storage array, the method further comprises:
    sending, by the arbitration host, a third indication to the at least one application host, wherein the third indication instructs the at least one application host to resume delivery of services to the arbitration-losing storage array.
  7. A disaster tolerance apparatus for an active-active cluster system, wherein the apparatus comprises a host cluster and at least one pair of storage arrays, the host cluster comprises an arbitration host, the arbitration host comprises an arbitration unit, the arbitration host is an application host having an arbitration function, and the at least one pair of storage arrays comprises a first storage array and a second storage array, the arbitration host further comprising: a receiving unit, a suspending unit, a determining unit, a stopping unit, a sending unit, and a restoring unit, wherein:
    the receiving unit is configured to receive an arbitration request, wherein the arbitration request is sent when the first storage array or the second storage array detects a fault of the peer storage array;
    the suspending unit is configured to suspend delivery of services to the first storage array and the second storage array;
    the determining unit is configured to determine, according to a logical judgment, an arbitration-winning storage array and an arbitration-losing storage array among the first storage array and the second storage array;
    the stopping unit is configured to stop services with the arbitration-losing storage array;
    the sending unit is configured to send arbitration-win information to the arbitration-winning storage array, so that the arbitration-winning storage array changes its mode of receiving write data from writing synchronously to the local and remote storage arrays to writing locally only; and
    the restoring unit is configured to resume delivery of services to the arbitration-winning storage array.
  8. The apparatus according to claim 7, wherein the host cluster further comprises at least one application host; and
    the receiving unit is specifically configured to receive an arbitration request sent by the at least one application host, wherein the arbitration request sent by the at least one application host is forwarded by the at least one application host after the at least one application host receives the arbitration request sent by the first storage array or the second storage array.
  9. The apparatus according to claim 8, wherein the sending unit is further configured to send a first indication to the at least one application host, wherein the first indication instructs the at least one application host to suspend delivery of services to the first storage array and the second storage array;
    the receiving unit is further configured to receive response information from the at least one application host, wherein the response information indicates that the at least one application host has stopped delivering services to the arbitration-losing storage array; and
    the sending unit is further configured to send a second indication to the at least one application host, wherein the second indication instructs the at least one application host to resume delivery of services to the arbitration-winning storage array.
  10. The apparatus according to claim 7, wherein the receiving unit is further configured to receive a service restoration request of the arbitration-losing storage array; and
    the restoring unit is further configured to resume delivery of services to the arbitration-losing storage array.
  11. The apparatus according to claim 10, wherein the host cluster further comprises at least one application host; and
    the receiving unit is specifically configured to receive a service restoration request sent by the at least one application host, wherein the service restoration request sent by the at least one application host is sent by the at least one application host after the at least one application host receives the service restoration request sent by the arbitration-losing storage array.
  12. The apparatus according to claim 11, wherein the sending unit is further configured to send a third indication to the at least one application host, wherein the third indication instructs the at least one application host to resume delivery of services to the arbitration-losing storage array.
PCT/CN2016/087915 2015-10-30 2016-06-30 Disaster tolerance method and apparatus in active-active cluster system WO2017071274A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP16858717.8A EP3285168B1 (en) 2015-10-30 2016-06-30 Disaster tolerance method and apparatus in active-active cluster system
US15/892,003 US10671498B2 (en) 2015-10-30 2018-02-08 Method and apparatus for redundancy in active-active cluster system
US16/839,205 US11194679B2 (en) 2015-10-30 2020-04-03 Method and apparatus for redundancy in active-active cluster system
US17/529,770 US11809291B2 (en) 2015-10-30 2021-11-18 Method and apparatus for redundancy in active-active cluster system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510727389.4 2015-10-30
CN201510727389.4A 2015-10-30 2015-10-30 Disaster tolerance method and apparatus in active-active cluster system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/892,003 Continuation US10671498B2 (en) 2015-10-30 2018-02-08 Method and apparatus for redundancy in active-active cluster system

Publications (1)

Publication Number Publication Date
WO2017071274A1 true WO2017071274A1 (zh) 2017-05-04

Family

ID=55504495

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/087915 WO2017071274A1 (zh) 2015-10-30 2016-06-30 Disaster tolerance method and apparatus in active-active cluster system

Country Status (4)

Country Link
US (3) US10671498B2 (zh)
EP (1) EP3285168B1 (zh)
CN (1) CN105426275B (zh)
WO (1) WO2017071274A1 (zh)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426275B (zh) 2015-10-30 2019-04-19 Chengdu Huawei Technology Co., Ltd. Disaster tolerance method and apparatus in active-active cluster system
CN105893176B * 2016-03-28 2019-02-26 Hangzhou MacroSAN Technologies Co., Ltd. Management method and apparatus for a network storage system
CN107526652B * 2016-06-21 2021-08-20 Huawei Technologies Co., Ltd. Data synchronization method and storage device
CN106301900B * 2016-08-08 2019-08-23 Huawei Technologies Co., Ltd. Device arbitration method and device
CN107844259B 2016-09-18 2020-06-16 Huawei Technologies Co., Ltd. Data access method, routing apparatus, and storage system
CN108123976B * 2016-11-30 2020-11-20 Alibaba Group Holding Ltd. Inter-cluster data backup method, apparatus, and system
CN107016029B * 2016-12-13 2020-11-06 Advanced New Technologies Co., Ltd. Service data processing method, apparatus, and system
CN107302598A * 2017-08-21 2017-10-27 Changsha Shutong Information Technology Co., Ltd. Novel implementation method for active-active storage arbitration
CN107766003A * 2017-10-31 2018-03-06 Zhengzhou Yunhai Information Technology Co., Ltd. Active-active storage method, apparatus, and system, and computer-readable storage medium
CN108037942B * 2017-12-06 2021-04-09 CETC Rongwei Electronic Technology Co., Ltd. Adaptive data recovery and update method and apparatus for embedded devices
CN108040102A * 2017-12-06 2018-05-15 Maipu Communication Technology Co., Ltd. Data synchronization method and apparatus, and hierarchical network management system
CN110535714B 2018-05-25 2023-04-18 Huawei Technologies Co., Ltd. Arbitration method and related apparatus
CN112612653A * 2018-08-31 2021-04-06 Chengdu Huawei Technology Co., Ltd. Service restoration method and apparatus, arbitration server, and storage system
CN114780293A * 2022-04-26 2022-07-22 Beijing Kejie Technology Co., Ltd. Hadoop-based geo-redundant active-active disaster tolerance method, apparatus, device, and readable storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US1704480A (en) * 1928-01-25 1929-03-05 Kicileski Mike Vehicle brake mechanism
US5673384A (en) * 1995-10-06 1997-09-30 Hewlett-Packard Company Dual disk lock arbitration between equal sized partition of a cluster
US6643795B1 (en) * 2000-03-30 2003-11-04 Hewlett-Packard Development Company, L.P. Controller-based bi-directional remote copy system with storage site failover capability
US6782416B2 (en) * 2001-01-12 2004-08-24 Hewlett-Packard Development Company, L.P. Distributed and geographically dispersed quorum resource disks
US7016946B2 (en) * 2001-07-05 2006-03-21 Sun Microsystems, Inc. Method and system for establishing a quorum for a geographically distributed cluster of computers
US7120821B1 (en) * 2003-07-24 2006-10-10 Unisys Corporation Method to revive and reconstitute majority node set clusters
US6859811B1 (en) * 2004-01-15 2005-02-22 Oracle International Corporation Cluster database with remote data mirroring
US8726067B1 (en) * 2011-07-29 2014-05-13 Emc Corporation Utilizing both application and storage networks for distributed storage over asynchronous distances
CN102662803A (zh) * 2012-03-13 2012-09-12 深圳华北工控股份有限公司 一种双控双活冗余设备
CN103106048A (zh) * 2013-01-30 2013-05-15 浪潮电子信息产业股份有限公司 一种多控多活的存储系统
US9852034B2 (en) * 2014-03-24 2017-12-26 International Business Machines Corporation Efficient high availability for a SCSI target over a fibre channel
US9442792B2 (en) * 2014-06-23 2016-09-13 Vmware, Inc. Using stretched storage to optimize disaster recovery
US9489273B2 (en) * 2014-06-23 2016-11-08 Vmware, Inc. Using stretched storage to optimize disaster recovery
US10095590B2 (en) * 2015-05-06 2018-10-09 Stratus Technologies, Inc Controlling the operating state of a fault-tolerant computer system
US9836366B2 (en) * 2015-10-27 2017-12-05 Netapp, Inc. Third vote consensus in a cluster using shared storage devices

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8108715B1 (en) * 2010-07-02 2012-01-31 Symantec Corporation Systems and methods for resolving split-brain scenarios in computer clusters
CN103684941A (zh) * 2013-11-23 2014-03-26 广东新支点技术服务有限公司 基于仲裁服务器的集群裂脑预防方法和装置
CN103607310A (zh) * 2013-11-29 2014-02-26 华为技术有限公司 一种异地容灾的仲裁方法
CN104469699A (zh) * 2014-11-27 2015-03-25 华为技术有限公司 集群仲裁方法和多集群配合系统
CN105426275A (zh) * 2015-10-30 2016-03-23 成都华为技术有限公司 双活集群系统中容灾的方法及装置

Also Published As

Publication number Publication date
US20220075698A1 (en) 2022-03-10
EP3285168A4 (en) 2018-07-04
US11194679B2 (en) 2021-12-07
CN105426275B (zh) 2019-04-19
US11809291B2 (en) 2023-11-07
US20180165168A1 (en) 2018-06-14
EP3285168A1 (en) 2018-02-21
US10671498B2 (en) 2020-06-02
EP3285168B1 (en) 2019-08-07
CN105426275A (zh) 2016-03-23
US20200233762A1 (en) 2020-07-23

Similar Documents

Publication Publication Date Title
WO2017071274A1 2017-05-04 Disaster tolerance method and apparatus in active-active cluster system
US7437598B2 (en) System, method and circuit for mirroring data
US7519856B2 (en) Fault tolerant system and controller, operation method, and operation program used in the fault tolerant system
US9916113B2 (en) System and method for mirroring data
US7694177B2 (en) Method and system for resynchronizing data between a primary and mirror data storage system
US7793060B2 (en) System method and circuit for differential mirroring of data
US9141493B2 (en) Isolating a PCI host bridge in response to an error event
US9916216B2 (en) Selectively coupling a PCI host bridge to multiple PCI communication paths
US20150339200A1 (en) Intelligent disaster recovery
WO2016202051A1 2016-12-22 Method and apparatus for managing active and standby nodes in a communication system, and high-availability cluster
US7797571B2 (en) System, method and circuit for mirroring data
CN106850255B Multi-machine backup implementation method
WO2016107443A1 Snapshot processing method and related device
CN109445984B Service restoration method and apparatus, arbitration server, and storage system
TWI669605B Fault tolerance method for virtual machine group and fault tolerance system thereof
US11947431B1 (en) Replication data facility failure detection and failover automation
JP5870174B1 Data transmission system
US8713359B1 (en) Autonomous primary-mirror synchronized reset
CN117827544A Hot backup system and method, electronic device, and storage medium
CN117785568A Dual-master dual-machine hot standby method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16858717

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE