CN112131045A

CN112131045A - Storage design and fault recovery method and equipment for dual-computer hot standby system

Info

Publication number: CN112131045A
Application number: CN202010921737.2A
Authority: CN
Inventors: 赵胜龑
Original assignee: Shanghai Zstack Information Technology Co ltd
Current assignee: Shanghai Zstack Information Technology Co ltd
Priority date: 2020-09-04
Filing date: 2020-09-04
Publication date: 2020-12-25

Abstract

The application provides a storage design and fault recovery method and equipment for a dual-computer hot standby system. A master node receives a data write request and writes the data to a master node primary copy and a master node synchronous virtual machine copy, the two copies being synchronized in a full synchronization mode. Primary-standby synchronization is established between the master node and a slave node, and a slave node primary copy and a slave node synchronous virtual machine copy are established, these two copies likewise being synchronized in the full synchronization mode. With this storage design, data consistency can be ensured even when only the slave node is recovered after both machines fail.

Description

Storage design and fault recovery method and equipment for dual-computer hot standby system

Technical Field

The present application relates to the field of computer technologies, and in particular, to a storage design and fault recovery technique for a dual-computer hot-standby system.

Background

In the prior art, for a fault-tolerant system with dual-node local storage, if a fault occurs during primary-standby synchronization, the primary and standby copies are likely to become inconsistent because the synchronization did not complete.

In this case, fault-tolerant systems typically do not allow the slave node to be recovered on its own, because the data synchronized from the master node is likely to be incomplete, which can crash the slave node system. Alternatively, writes to the copy fork, so that when the original master node later recovers, the synchronous connection between the master node and the standby node cannot be re-established automatically because of the write-fork conflict.

In practical product deployments, most sites do not have power supplies as stable as those of a laboratory or an enterprise machine room. Both machines may therefore lose power at the same time, after which either node may be the one that is recovered; customers commonly require this scenario to be supported. It includes the case where the master node and the slave node fail simultaneously and the slave node is then recovered.

If the master node and the standby node fail simultaneously and the master node is recovered, incomplete copy synchronization is not a concern: the master node always holds the latest data, so the slave node only needs to resynchronize when it comes back online. If instead the slave node is recovered after the failure, the master data and the standby data are likely to be inconsistent.

No scheme currently addresses this scenario. Most primary-standby synchronization systems simply avoid it, for example by refusing to recover a single node and requiring both nodes to be recovered, or by otherwise preventing the slave node from being recovered alone after both nodes fail. How to recover the slave node while ensuring the integrity of the synchronized data is therefore an urgent problem to be solved.
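The asymmetry between recovering the master node and recovering the slave node can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that primary-standby synchronization ships state in rounds, so the standby can lag behind the master; the names `Node`, `write` and `sync_to` are hypothetical and do not come from the application.

```python
class Node:
    """A node holding a local copy as a block-number -> data mapping."""

    def __init__(self, name):
        self.name = name
        self.copy = {}

    def write(self, block, data):
        self.copy[block] = data

    def sync_to(self, standby):
        """One completed synchronization round: ship the current state."""
        standby.copy = dict(self.copy)


master = Node("H1")
standby = Node("H2")

master.write(0, "a")
master.sync_to(standby)   # the standby is up to date at this point
master.write(1, "b")      # written after the last completed round

# Assumed failure point: both nodes go down before the next round.
missing_on_standby = sorted(set(master.copy) - set(standby.copy))
print(missing_on_standby)
```

In this toy run the master still holds every write, while the standby lacks the block written after the last completed round, which is why recovering only the standby is the problematic case.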

Disclosure of Invention

The present application aims to provide a storage design method for a dual-computer hot-standby system, so as to solve the problem that data synchronization cannot be guaranteed when a slave node recovers in the prior art.

According to an aspect of the present application, a storage design method for a dual-computer hot-standby system is provided, where the method includes:

a master node receives a data write request and writes the data to a master node primary copy and a master node synchronous virtual machine copy, wherein the master node primary copy and the master node synchronous virtual machine copy are synchronized based on a full synchronization mode;

establishing primary-standby synchronization between the master node and a slave node, and establishing a slave node primary copy and a slave node synchronous virtual machine copy, wherein the slave node primary copy and the slave node synchronous virtual machine copy are synchronized based on the full synchronization mode.

Further, the full synchronization mode is implemented based on DRBD synchronization.

According to another aspect of the present application, there is also provided a dual-node failure recovery method based on the foregoing storage design method, where the method includes:

a slave node receives a data write request, starts the master node synchronous virtual machine copy, and synchronizes the original slave node copy through the master node synchronous virtual machine copy;

establishing primary-standby synchronization between the master node synchronous virtual machine copy and the slave node synchronous virtual machine copy, so as to synchronize the original slave node slave copy through the master node synchronous virtual machine copy.

According to yet another aspect of the present application, there is also provided a computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the operations of the method as described above.

Compared with the prior art, in the present application the master node receives a data write request and writes the data to the master node primary copy and the master node synchronous virtual machine copy, the two copies being synchronized in a full synchronization mode. Primary-standby synchronization is established between the master node and the slave node, and the slave node primary copy and the slave node synchronous virtual machine copy are established, these two copies likewise being synchronized in the full synchronization mode. With this storage design, data consistency can be ensured even when only the slave node is recovered after both machines fail.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:

FIG. 1 illustrates a flow diagram of a storage design method for a dual-computer hot standby system, according to an aspect of the present application;

FIG. 2 is a schematic diagram of a storage design for a dual-computer hot standby system according to a preferred embodiment of the present application;

FIG. 3 illustrates a flow diagram of a dual-node failure recovery method implemented based on the storage design method illustrated in FIG. 1, according to another aspect of the present application;

FIG. 4 is a schematic diagram illustrating dual-node failure recovery based on the storage design illustrated in FIG. 2, according to a preferred embodiment of the present application.

The same or similar reference numbers in the drawings identify the same or similar elements.

Detailed Description

The present invention is described in further detail below with reference to the attached drawing figures.

In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.

To further illustrate the technical means and effects adopted by the present application, the following description clearly and completely describes the technical solution of the present application with reference to the accompanying drawings and preferred embodiments.

FIG. 1 shows a flowchart of a storage design method for a dual-computer hot standby system according to an aspect of the present application, where the method includes the following steps:

S11: the master node receives a data write request and writes the data to the master node primary copy and the master node synchronous virtual machine copy, wherein the master node primary copy and the master node synchronous virtual machine copy are synchronized based on a full synchronization mode;

S12: establishing primary-standby synchronization between the master node and the slave node, and establishing a slave node primary copy and a slave node synchronous virtual machine copy, wherein the slave node primary copy and the slave node synchronous virtual machine copy are synchronized based on the full synchronization mode.

In this application, the master node and the slave node are the nodes between which data is synchronized. For example, by establishing primary-standby synchronization, the data in the master node is synchronized so that the same data is stored in the slave node. The master node and the slave node include, but are not limited to, physical machine nodes and virtual nodes. A physical machine node may be a computer device including, but not limited to, a personal computer, a laptop, an industrial computer, a network host, a single network server, a set of multiple network servers, and/or a cloud. The cloud is made up of a large number of computers or network servers based on cloud computing, a type of distributed computing in which a collection of loosely coupled computers forms a virtual supercomputer. The specific master node and slave node are not limited in this application. The faults covered by the dual-computer hot standby system include power failure, and the scheme of the application can be used to recover the slave node when both machines lose power simultaneously.

In this embodiment, in step S11, the data write request may be any data request sent to the master node. The master node primary copy is the copy data on the master node that takes part in primary-standby synchronization, and the master node synchronous virtual machine copy is the virtual machine copy, located on the slave node, that corresponds to the master node primary copy. The virtual machine copy and the master node primary copy are synchronized based on a full synchronization mode.

Preferably, the full synchronization mode is implemented based on DRBD synchronization. For example, DRBD can run under its protocol C, a fully synchronous mode in which a write operation returns success only after the peer end has also written the data successfully. In this mode, if a fault occurs during a write, the write does not report success, and the primary and standby DRBD devices do not become inconsistent. DRBD synchronization is only an example; other existing or future ways of implementing the full synchronization mode, where applicable to the present application, are also included in its scope.
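The acknowledgement rule of a full synchronization mode can be sketched in Python as follows. This is a toy model under stated assumptions, not a description of DRBD internals: `Replica` and `full_sync_write` are hypothetical names, and a replica failure is simulated with a simple `online` flag.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data, modelled as block-number -> data."""
    name: str
    blocks: dict = field(default_factory=dict)
    online: bool = True


def full_sync_write(local, peer, block, data):
    """Report success only if BOTH replicas stored the write.

    Toy rule: if either side is unavailable, nothing is written and the
    caller sees a failure, so the two replicas never diverge.
    """
    if not (local.online and peer.online):
        return False
    local.blocks[block] = data
    peer.blocks[block] = data
    return True


primary_copy = Replica("master node primary copy")
sync_vm_copy = Replica("master node synchronous virtual machine copy")

ok_first = full_sync_write(primary_copy, sync_vm_copy, 0, "a")

sync_vm_copy.online = False            # simulated fault on the peer side
ok_second = full_sync_write(primary_copy, sync_vm_copy, 1, "b")
```

In this sketch the second write is reported as failed and is stored on neither copy, so the two copies remain identical after the fault.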

In this embodiment, in step S12, the master node and the slave node establish primary-standby synchronization. The slave node primary copy is the data that corresponds to the master node primary copy once primary-standby synchronization is established, and the slave node synchronous virtual machine copy is the virtual machine copy, located on the master node, that corresponds to the slave node primary copy. The virtual machine copy and the slave node primary copy are synchronized based on the full synchronization mode, for example DRBD synchronization.

FIG. 2 shows a schematic diagram of a storage design for a dual-computer hot standby system according to a preferred embodiment of the present application. H1 is the master node and H2 is the slave node. PVM is the master node primary copy, PVM' is the master node synchronous virtual machine copy, SVM is the slave node primary copy, and SVM' is the slave node synchronous virtual machine copy. PVM establishes DRBD synchronization with PVM', SVM establishes DRBD synchronization with SVM', and PVM establishes primary-standby synchronization with SVM.
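The four copies and three synchronization relationships of FIG. 2 can be written down as a small data structure. The following Python sketch is illustrative only: the type names `Copy`, `Link` and `SyncKind` and the helper `copies_on` are hypothetical, and the host of each synchronous virtual machine copy follows the description of steps S11 and S12 above.

```python
from dataclasses import dataclass
from enum import Enum


class SyncKind(Enum):
    FULL_SYNC = "full synchronization (for example DRBD)"
    PRIMARY_STANDBY = "primary-standby synchronization"


@dataclass(frozen=True)
class Copy:
    role: str
    host: str


@dataclass(frozen=True)
class Link:
    source: Copy
    target: Copy
    kind: SyncKind


MASTER_PRIMARY = Copy("master node primary copy", "H1")
MASTER_SYNC_VM = Copy("master node synchronous virtual machine copy", "H2")
SLAVE_PRIMARY = Copy("slave node primary copy", "H2")
SLAVE_SYNC_VM = Copy("slave node synchronous virtual machine copy", "H1")

LINKS = [
    Link(MASTER_PRIMARY, MASTER_SYNC_VM, SyncKind.FULL_SYNC),
    Link(SLAVE_PRIMARY, SLAVE_SYNC_VM, SyncKind.FULL_SYNC),
    Link(MASTER_PRIMARY, SLAVE_PRIMARY, SyncKind.PRIMARY_STANDBY),
]


def copies_on(host):
    """List the copies stored on a host, in order of first appearance."""
    seen = []
    for link in LINKS:
        for copy in (link.source, link.target):
            if copy.host == host and copy not in seen:
                seen.append(copy)
    return seen


roles_on_h2 = [copy.role for copy in copies_on("H2")]
```

Listing the copies on H2 makes the point of the design visible: besides its own primary copy, the slave node also holds a fully synchronized copy of the master node's data.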

FIG. 3 is a flow chart of a dual-node fault recovery method implemented based on the storage design method shown in FIG. 1 according to another aspect of the present application, the method including the steps of:

S31: the slave node receives a data write request, starts the master node synchronous virtual machine copy, and synchronizes the original slave node slave copy through the master node synchronous virtual machine copy;

S32: establishing primary-standby synchronization between the master node synchronous virtual machine copy and the slave node synchronous virtual machine copy, so as to synchronize the original slave node synchronous copy through the master node synchronous virtual machine copy.

In FIG. 2, after H1 and H2 fail simultaneously, PVM and SVM are likely to be inconsistent. If only the slave node is then recovered, starting the H2 slave node directly from SVM is not feasible. Under the storage design of the present application, however, as shown in FIG. 4, the master node synchronous virtual machine copy PVM' can be used to start the virtual machine, because PVM and PVM' are necessarily in sync. After synchronization, PVM' becomes the DRBD primary copy, PVM' and the slave node synchronous virtual machine copy SVM' establish primary-standby synchronization, and PVM and SVM become DRBD slave copies that accept DRBD synchronization from the "new master" PVM'. This addresses both the availability problem and the data inconsistency problem.
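The recovery path of step S31 can be traced with a short Python simulation. It is a sketch under simplifying assumptions: copies are plain dictionaries, the fully synchronous write is modelled as applying to both copies at once, and all function and variable names are hypothetical. Step S32 is not modelled.

```python
class Copy:
    """A stored copy: block-number -> data, plus its current role."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.role = "standby"


def guest_write(primary, sync_vm, block, data):
    """Fully synchronous write: both copies receive the block (toy model)."""
    primary.blocks[block] = data
    sync_vm.blocks[block] = data


def primary_standby_round(src, dst_primary, dst_sync_vm):
    """One completed primary-standby round, mirrored on the slave side."""
    dst_primary.blocks = dict(src.blocks)
    dst_sync_vm.blocks = dict(src.blocks)


def recover_on_slave(master_sync_vm, slave_primary):
    """S31 sketch: start from the master node synchronous virtual machine
    copy and resynchronize the stale slave copy from it."""
    master_sync_vm.role = "primary"
    slave_primary.blocks = dict(master_sync_vm.blocks)
    return master_sync_vm


master_primary = Copy("master node primary copy")
master_sync_vm = Copy("master node synchronous virtual machine copy")
slave_primary = Copy("slave node primary copy")
slave_sync_vm = Copy("slave node synchronous virtual machine copy")

guest_write(master_primary, master_sync_vm, 0, "a")
primary_standby_round(master_primary, slave_primary, slave_sync_vm)
guest_write(master_primary, master_sync_vm, 1, "b")   # not yet shipped

# Assumed failure point: both nodes go down; only the slave node returns.
stale_before = slave_primary.blocks != master_sync_vm.blocks
new_primary = recover_on_slave(master_sync_vm, slave_primary)
```

In this run the slave copy is stale at the moment of failure, yet the copy used to restart the virtual machine contains every acknowledged write, and the stale copy is brought up to date from it.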

Furthermore, the embodiment of the present application also provides a computer readable medium, on which computer readable instructions are stored, and the computer readable instructions can be executed by a processor to implement the foregoing method.

An embodiment of the present application further provides a storage design apparatus, where the apparatus includes:

one or more processors; and

a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the foregoing method.

For example, the computer readable instructions, when executed, cause the one or more processors to: receive, at a master node, a data write request and write the data to a master node primary copy and a master node synchronous virtual machine copy, wherein the master node primary copy and the master node synchronous virtual machine copy are synchronized based on a full synchronization mode; and establish primary-standby synchronization between the master node and a slave node, and establish a slave node primary copy and a slave node synchronous virtual machine copy, wherein the slave node primary copy and the slave node synchronous virtual machine copy are synchronized based on the full synchronization mode.

In addition, an embodiment of the present application further provides an apparatus for dual-node failure recovery, where the apparatus includes:

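one or more processors; and

a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the foregoing method.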

For example, the computer readable instructions, when executed, cause the one or more processors to: receive, at a slave node, a data write request, start the master node synchronous virtual machine copy, and synchronize the original slave node copy through the master node synchronous virtual machine copy; and establish primary-standby synchronization between the master node synchronous virtual machine copy and the slave node synchronous virtual machine copy, so as to synchronize the original slave node slave copy through the master node synchronous virtual machine copy.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims

1. A storage design method for a dual-computer hot standby system is provided, wherein the method comprises the following steps:

2. The method of claim 1, wherein the full synchronization mode is implemented based on DRBD synchronization.

3. A dual-node failure recovery method based on the storage design method of claim 1 or 2, wherein the method comprises:

4. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 3.

5. A storage design apparatus, wherein the apparatus comprises:

one or more processors; and

a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of claim 1 or 2.

6. An apparatus for dual node failure recovery, wherein the apparatus comprises:

one or more processors; and

a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of claim 3.