CN112131045A - Storage design and fault recovery method and equipment for dual-computer hot standby system - Google Patents

Storage design and fault recovery method and equipment for dual-computer hot standby system

Info

Publication number
CN112131045A
Authority
CN
China
Prior art keywords
node
copy
virtual machine
slave node
slave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010921737.2A
Other languages
Chinese (zh)
Inventor
赵胜龑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zstack Information Technology Co ltd
Original Assignee
Shanghai Zstack Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zstack Information Technology Co ltd filed Critical Shanghai Zstack Information Technology Co ltd
Priority to CN202010921737.2A priority Critical patent/CN112131045A/en
Publication of CN112131045A publication Critical patent/CN112131045A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1489 Generic software techniques for error detection or fault masking through recovery blocks

Abstract

The application provides a storage design and fault recovery method and equipment for a dual-computer hot-standby system. The master node receives a data write request and writes the data to the master node primary copy and the master node synchronous virtual machine copy, which are kept synchronized in a full synchronization mode. Primary-standby synchronization is established between the master node and the slave node, and a slave node primary copy and a slave node synchronous virtual machine copy are created, which are likewise kept synchronized in a full synchronization mode. With this storage design, data consistency can be ensured even when the slave node is recovered on its own after both machines fail.

Description

Storage design and fault recovery method and equipment for dual-computer hot standby system
Technical Field
The present application relates to the field of computer technologies, and in particular, to a storage design and fault recovery technique for a dual-computer hot-standby system.
Background
In the prior art, in a fault-tolerant system with dual-node local storage, a fault that occurs during primary-standby synchronization is likely to leave the primary and standby copies inconsistent, because the synchronization did not complete.
In this case, fault-tolerant systems typically do not allow the slave node to be recovered on its own, because the data synchronized from the master node is likely to be incomplete and may crash the slave node system. Alternatively, the writes to the copies diverge, so that when the original master node later recovers, the primary-standby synchronization connection in the fault-tolerant system cannot be re-established automatically because of the conflicting writes.
In real deployments, the power supply in most environments is not as stable as in a laboratory or an enterprise machine room. Both machines may lose power at the same time, after which only one node, chosen arbitrarily, is brought back; customers commonly need this scenario to be handled. It includes the case in which the master node and the slave node fail simultaneously and the slave node is then recovered.
If the master node and the slave node fail simultaneously and the master node is recovered, incomplete copy synchronization is not an issue: the master node always holds the latest data, so the slave node only needs to resynchronize when it comes back online. If instead the slave node is recovered after the failure, the primary and standby data are likely to be inconsistent.
There is currently no scheme that addresses this scenario. Most primary-standby synchronization systems simply avoid it, for example by refusing to recover a single node and requiring both nodes to be recovered, or by disallowing independent recovery of the slave node after both nodes fail. How to recover the slave node while ensuring the integrity of the synchronized data is therefore an urgent problem to be solved.
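The failure scenario described in this Background can be illustrated with a small sketch. It is not taken from the application: the block-dictionary model, the function names `checkpoint_sync` and `is_consistent`, and the `fail_after` parameter are all hypothetical, chosen only to show how an interrupted primary-standby synchronization can leave the standby copy holding a mix of old and new data.

```python
def checkpoint_sync(primary, standby, fail_after=None):
    """Copy blocks from primary to standby one at a time.

    fail_after (hypothetical) simulates both machines losing power after
    that many blocks have been copied. Returns True if the sync completed.
    """
    for copied, block in enumerate(sorted(primary)):
        if fail_after is not None and copied >= fail_after:
            return False  # power lost in the middle of the synchronization
        standby[block] = primary[block]
    return True


def is_consistent(copy):
    """A copy is consistent here if all blocks carry the same version tag."""
    versions = {data.split("-")[0] for data in copy.values()}
    return len(versions) <= 1


# Latest state on the master node and the last fully synchronized state
# on the slave node (illustrative data only).
primary = {0: "v2-a", 1: "v2-b", 2: "v2-c"}
standby = {0: "v1-a", 1: "v1-b", 2: "v1-c"}

completed = checkpoint_sync(primary, standby, fail_after=1)
print(completed, is_consistent(primary), is_consistent(standby))
```

In this model the master copy stays internally consistent, so recovering the master is safe, while the standby copy ends up with blocks from two different versions, which is why recovering the slave node alone is normally refused.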
Disclosure of Invention
The present application aims to provide a storage design method for a dual-computer hot-standby system, so as to solve the problem that data synchronization cannot be guaranteed when a slave node recovers in the prior art.
According to an aspect of the present application, a storage design method for a dual-computer hot-standby system is provided, where the method includes:
the master node receives a data write request and writes the data to the master node primary copy and the master node synchronous virtual machine copy, wherein the master node primary copy and the master node synchronous virtual machine copy are synchronized in a full synchronization mode;
establishing primary-standby synchronization between the master node and the slave node, and creating a slave node primary copy and a slave node synchronous virtual machine copy, wherein the slave node primary copy and the slave node synchronous virtual machine copy are synchronized in a full synchronization mode.
Further, the full synchronization mode is implemented based on drbd synchronization.
According to another aspect of the present application, there is also provided a dual-node failure recovery method based on the foregoing storage design method, where the method includes:
the slave node receives a data write request, starts the master node synchronous virtual machine copy, and synchronizes the original slave node copy through the master node synchronous virtual machine copy;
establishing primary-standby synchronization between the master node synchronous virtual machine copy and the slave node synchronous virtual machine copy, so that the original slave node copy is synchronized through the master node synchronous virtual machine copy.
According to yet another aspect of the present application, there is also provided a computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the operations of the method as described above.
Compared with the prior art, in the present application the master node receives a data write request and writes the data to the master node primary copy and the master node synchronous virtual machine copy, which are kept synchronized in a full synchronization mode. Primary-standby synchronization is established between the master node and the slave node, and a slave node primary copy and a slave node synchronous virtual machine copy are created, which are likewise kept synchronized in a full synchronization mode. With this storage design, data consistency can be ensured even when the slave node is recovered on its own after both machines fail.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 illustrates a flow diagram of a storage design method for a dual-computer hot-standby system, according to an aspect of the present application;
FIG. 2 is a schematic diagram of a storage design for a dual-computer hot-standby system according to a preferred embodiment of the present application;
FIG. 3 illustrates a flow diagram of a dual-node failure recovery method implemented based on the storage design method illustrated in FIG. 1, according to another aspect of the present application;
FIG. 4 is a schematic diagram illustrating dual-node failure recovery based on the storage design illustrated in FIG. 2, according to a preferred embodiment of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transient media), such as modulated data signals and carrier waves.
To further illustrate the technical means and effects adopted by the present application, the following description clearly and completely describes the technical solution of the present application with reference to the accompanying drawings and preferred embodiments.
FIG. 1 shows a flowchart of a storage design method for a dual-computer hot-standby system according to an aspect of the present application, where the method includes the following steps:
S11: the master node receives a data write request and writes the data to the master node primary copy and the master node synchronous virtual machine copy, wherein the master node primary copy and the master node synchronous virtual machine copy are synchronized in a full synchronization mode;
S12: establishing primary-standby synchronization between the master node and the slave node, and creating a slave node primary copy and a slave node synchronous virtual machine copy, wherein the slave node primary copy and the slave node synchronous virtual machine copy are synchronized in a full synchronization mode.
In this application, the master node and the slave node are the nodes between which data is synchronized. For example, once primary-standby synchronization is established, the data on the master node is synchronized so that the slave node stores the same data. The master node and the slave node include, but are not limited to, physical machine nodes or nodes that are virtual concepts. Physical machine nodes include, but are not limited to, computer devices such as personal computers, laptops, industrial computers, network hosts, single network servers, multiple sets of network servers, and/or clouds. A cloud is made up of a large number of computers or network servers based on cloud computing, which is a type of distributed computing: a virtual supercomputer consisting of a collection of loosely coupled computers. The specific master node and slave node are not limited in this application. The faults handled by the dual-computer hot-standby system include power failure, and the scheme of the present application can be used to recover the slave node when both machines lose power simultaneously.
In this embodiment, in step S11, the data write request may be any data request sent to the master node. The master node primary copy is the copy on the master node that takes part in primary-standby synchronization. The master node synchronous virtual machine copy is the virtual machine copy on the slave node that corresponds to the master node primary copy; it is synchronized with the master node primary copy in the full synchronization mode.
Preferably, the full synchronization mode is implemented based on drbd synchronization. For example, drbd protocol C is a fully synchronous mode: a write operation returns success only after the peer end has also written the data successfully. In this mode, a fault that occurs during a write only causes that write to fail; it does not leave the primary and secondary drbd devices inconsistent. drbd synchronization is only an example; other existing or future ways of implementing the full synchronization mode, if applicable to the present application, are also included in the scope of the present application.
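The write semantics described above can be sketched as follows. This is a minimal in-memory model, not drbd itself: the `Disk` class, the `sync_write` function, and the roll-back-on-failure behaviour are assumptions made for illustration, chosen only to show a write that is acknowledged if and only if both ends have stored it.

```python
class Disk:
    """Hypothetical in-memory block device that can be switched offline."""

    def __init__(self):
        self.blocks = {}
        self.online = True

    def write(self, block, data):
        if not self.online:
            raise IOError("disk unavailable")
        self.blocks[block] = data


def sync_write(local, peer, block, data):
    """Return True only if both the local and the peer write succeed.

    On failure the local block is restored (a simplification of this
    sketch), so an unacknowledged write leaves both copies identical.
    """
    had_old = block in local.blocks
    old = local.blocks.get(block)
    try:
        local.write(block, data)
        peer.write(block, data)
    except IOError:
        if had_old:
            local.blocks[block] = old
        else:
            local.blocks.pop(block, None)
        return False
    return True


primary_copy, mirror_copy = Disk(), Disk()
print(sync_write(primary_copy, mirror_copy, 0, "a"))  # both ends reachable
mirror_copy.online = False
print(sync_write(primary_copy, mirror_copy, 0, "b"))  # peer unreachable
print(primary_copy.blocks == mirror_copy.blocks)
```

The point of the model is the invariant, not the mechanism: after any call, acknowledged or not, the two copies hold the same data, which is the property the storage design relies on.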
In this embodiment, in step S12, the master node and the slave node establish primary-standby synchronization. The slave node primary copy is the data that corresponds to the master node primary copy once primary-standby synchronization is established. The slave node synchronous virtual machine copy is the virtual machine copy on the master node that corresponds to the slave node primary copy; it is synchronized with the slave node primary copy in the full synchronization mode, for example drbd synchronization.
FIG. 2 shows a schematic diagram of a storage design for a dual-computer hot-standby system according to a preferred embodiment of the present application. H1 is the master node and H2 is the slave node. PVM is the master node primary copy, PVM' is the master node synchronous virtual machine copy, SVM is the slave node primary copy, and SVM' is the slave node synchronous virtual machine copy. PVM establishes drbd synchronization with PVM', SVM establishes drbd synchronization with SVM', and PVM establishes primary-standby synchronization with SVM.
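The four-copy layout can be written down as a small data model. The sketch below is illustrative only: the labels `PVM'` and `SVM'` for the two synchronous virtual machine copies, the `Copy` dataclass, and the helper functions are this sketch's own naming, and the placement of each copy follows the description of steps S11 and S12 (the master node's synchronous copy resides on the slave node, and the slave node's synchronous copy resides on the master node).

```python
from dataclasses import dataclass, field


@dataclass
class Copy:
    name: str
    host: str                      # node on which the copy is stored
    blocks: dict = field(default_factory=dict)


# Pairs kept identical in full synchronization mode (e.g. drbd).
FULL_SYNC_PAIRS = [("PVM", "PVM'"), ("SVM", "SVM'")]
# Primary-standby synchronization direction between the two nodes.
PRIMARY_STANDBY = ("PVM", "SVM")


def build_layout():
    """Create the four copies on master node H1 and slave node H2."""
    copies = [
        Copy("PVM", "H1"), Copy("PVM'", "H2"),
        Copy("SVM", "H2"), Copy("SVM'", "H1"),
    ]
    return {c.name: c for c in copies}


def write_full_sync(layout, source, block, data):
    """Write a block to a copy and to its full-sync mirror."""
    mirror = dict(FULL_SYNC_PAIRS)[source]
    layout[source].blocks[block] = data
    layout[mirror].blocks[block] = data


def primary_standby_sync(layout):
    """Propagate the master primary copy to the slave primary copy."""
    src, dst = PRIMARY_STANDBY
    for block, data in layout[src].blocks.items():
        write_full_sync(layout, dst, block, data)


layout = build_layout()
write_full_sync(layout, "PVM", 0, "data")   # step S11
primary_standby_sync(layout)                # step S12
print({name: c.blocks for name, c in layout.items()})
```

Note that after the S11 write alone, the slave node already holds an exact mirror of the master's data in `PVM'`, independent of whether the primary-standby synchronization to `SVM` has finished.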
FIG. 3 is a flow chart of a dual-node fault recovery method implemented based on the storage design method shown in FIG. 1 according to another aspect of the present application, the method including the steps of:
S31: the slave node receives a data write request, starts the master node synchronous virtual machine copy, and synchronizes the original slave node copy through the master node synchronous virtual machine copy;
S32: establishing primary-standby synchronization between the master node synchronous virtual machine copy and the slave node synchronous virtual machine copy, so that the original slave node copy is synchronized through the master node synchronous virtual machine copy.
Fig. 4 is a schematic diagram illustrating a dual-node failure recovery based on the storage design method illustrated in fig. 2 according to a preferred embodiment of the present application.
In FIG. 2, after H1 and H2 fail simultaneously, PVM and SVM are likely to be inconsistent. If the slave node is then recovered on its own, it is not feasible to start the H2 slave node directly from SVM. In the storage design of the present application, however, as shown in FIG. 4, the PVM' copy can be used to start the virtual machine, because PVM and PVM' are guaranteed to be synchronized. Afterwards, PVM' becomes the drbd primary copy, PVM' and SVM' establish primary-standby synchronization, and PVM and SVM become drbd secondary copies that accept drbd synchronization from the "new primary" PVM'. This resolves both the availability problem and the data inconsistency problem.
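The recovery flow can be sketched in a few lines. This is an assumption-laden illustration rather than the application's implementation: copies are modelled as plain dictionaries of blocks, the labels `PVM'` and `SVM'` denote the two synchronous virtual machine copies, and the function names `recover_slave_alone` and `master_rejoins` are hypothetical.

```python
def recover_slave_alone(copies):
    """Recover the slave node after both nodes failed.

    copies maps a copy label to its on-disk blocks. The mirror of the
    master's data on the slave node (PVM') becomes the new primary, and
    the possibly incomplete SVM is overwritten from it.
    """
    new_primary = copies["PVM'"]
    copies["SVM"] = dict(new_primary)  # resynchronize the original slave copy
    return {
        "PVM'": "drbd primary",
        "PVM": "drbd secondary",
        "SVM": "drbd secondary",
    }


def master_rejoins(copies):
    """When the old master returns, PVM catches up from the new primary."""
    copies["PVM"] = dict(copies["PVM'"])


# State after a simultaneous failure in the middle of primary-standby sync:
# PVM and PVM' agree, while SVM received only part of the update.
copies = {
    "PVM":  {0: "v2", 1: "v2"},
    "PVM'": {0: "v2", 1: "v2"},
    "SVM":  {0: "v2", 1: "v1"},
}
roles = recover_slave_alone(copies)
copies["PVM'"][2] = "v3"          # new write accepted while H1 is still down
master_rejoins(copies)
print(roles["PVM'"], copies["PVM"] == copies["PVM'"])
```

The design choice the sketch highlights is that the virtual machine is never started from the copy that may be incomplete; it is started from the copy whose consistency is guaranteed by full synchronization.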
Furthermore, the embodiment of the present application also provides a computer readable medium, on which computer readable instructions are stored, and the computer readable instructions can be executed by a processor to implement the foregoing method.
An embodiment of the present application further provides an apparatus for storage design, where the apparatus includes:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the foregoing method.
For example, the computer readable instructions, when executed, cause the one or more processors to: receive, at the master node, a data write request and write the data to the master node primary copy and the master node synchronous virtual machine copy, wherein the master node primary copy and the master node synchronous virtual machine copy are synchronized in a full synchronization mode; and establish primary-standby synchronization between the master node and the slave node, and create a slave node primary copy and a slave node synchronous virtual machine copy, wherein the slave node primary copy and the slave node synchronous virtual machine copy are synchronized in a full synchronization mode.
In addition, an embodiment of the present application further provides an apparatus for dual-node failure recovery, where the apparatus includes:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the foregoing method.
For example, the computer readable instructions, when executed, cause the one or more processors to: receive, at the slave node, a data write request, start the master node synchronous virtual machine copy, and synchronize the original slave node copy through the master node synchronous virtual machine copy; and establish primary-standby synchronization between the master node synchronous virtual machine copy and the slave node synchronous virtual machine copy, so that the original slave node copy is synchronized through the master node synchronous virtual machine copy.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (6)

1. A storage design method for a dual-computer hot standby system, wherein the method comprises the following steps:
the master node receives a data write request and writes the data to the master node primary copy and the master node synchronous virtual machine copy, wherein the master node primary copy and the master node synchronous virtual machine copy are synchronized in a full synchronization mode;
establishing primary-standby synchronization between the master node and the slave node, and creating a slave node primary copy and a slave node synchronous virtual machine copy, wherein the slave node primary copy and the slave node synchronous virtual machine copy are synchronized in a full synchronization mode.
2. The method of claim 1, wherein the full sync mode is implemented based on drbd sync.
3. A dual-node failure recovery method based on the storage design method of claim 1 or 2, wherein the method comprises:
the slave node receives a data write request, starts the master node synchronous virtual machine copy, and synchronizes the original slave node copy through the master node synchronous virtual machine copy;
establishing primary-standby synchronization between the master node synchronous virtual machine copy and the slave node synchronous virtual machine copy, so that the original slave node copy is synchronized through the master node synchronous virtual machine copy.
4. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 3.
5. An apparatus for storage design, wherein the apparatus comprises:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of claim 1 or 2.
6. An apparatus for dual node failure recovery, wherein the apparatus comprises:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of claim 3.
CN202010921737.2A 2020-09-04 2020-09-04 Storage design and fault recovery method and equipment for dual-computer hot standby system Pending CN112131045A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010921737.2A CN112131045A (en) 2020-09-04 2020-09-04 Storage design and fault recovery method and equipment for dual-computer hot standby system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010921737.2A CN112131045A (en) 2020-09-04 2020-09-04 Storage design and fault recovery method and equipment for dual-computer hot standby system

Publications (1)

Publication Number Publication Date
CN112131045A (en) 2020-12-25

Family

ID=73848069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010921737.2A Pending CN112131045A (en) 2020-09-04 2020-09-04 Storage design and fault recovery method and equipment for dual-computer hot standby system

Country Status (1)

Country Link
CN (1) CN112131045A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110099147A1 (en) * 2009-10-26 2011-04-28 Mcalister Grant Alexander Macdonald Provisioning and managing replicated data instances
CN104461792A (en) * 2014-12-03 2015-03-25 浪潮集团有限公司 HA method for clearing single-point failure of NAMENODE of HADOOP distributed file system

Similar Documents

Publication Publication Date Title
EP3474516B1 (en) Data processing method and device
CN107426265A (en) The synchronous method and apparatus of data consistency
WO2018049983A1 (en) Data synchronization method and system, and synchronization acquisition method and device
CN103051681B (en) Collaborative type log system facing to distribution-type file system
CN102890716B (en) The data back up method of distributed file system and distributed file system
CN103902405B (en) Quasi-continuity data replication method and device
CN106802892B (en) Method and equipment for checking consistency of main and standby data
CN111611109A (en) Backup method, system, device and medium for distributed cluster
CN111651275A (en) MySQL cluster automatic deployment system and method
CN107817951B (en) Method and device for realizing Ceph cluster fusion
CN108228581B (en) Zookeeper compatible communication method, server and system
CN108762982A (en) A kind of database restoring method, apparatus and system
CN107621994B (en) Method and device for creating data snapshot
CN107528703B (en) Method and equipment for managing node equipment in distributed system
CN115955488B (en) Distributed storage copy cross-machine room placement method and device based on copy redundancy
US8977897B2 (en) Computer-readable recording medium, data management method, and storage device
CN107045426B (en) Multi-copy reading method and system
CN112131045A (en) Storage design and fault recovery method and equipment for dual-computer hot standby system
CN113535477B (en) Method and equipment for data disaster recovery
CN115237674A (en) Data backup method, device and medium for SDN controller based on opennaylight
CN114281600A (en) Disaster recovery backup and recovery method, device, equipment and storage medium
CN110532134B (en) NAS data backup disaster recovery method and device
CN115328931A (en) Database cluster data verification method and device, storage medium and electronic equipment
CN115705269A (en) Data synchronization method, system, server and storage medium
CN114610533A (en) Database processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination