CN109062735B

CN109062735B - Disaster recovery method of storage system, storage system and related device

Info

Publication number: CN109062735B
Application number: CN201810872084.6A
Authority: CN
Inventors: 赵阳
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2018-08-02
Filing date: 2018-08-02
Publication date: 2022-04-26
Anticipated expiration: 2038-08-02
Also published as: CN109062735A

Abstract

The application provides a disaster recovery method of a storage system, wherein the storage system comprises a production site, a first disaster recovery site and a second disaster recovery site, and the disaster recovery method comprises the following steps: the first disaster recovery site copies the data of the production site; when incremental data appear on a production site or a first disaster recovery site, recording the incremental data by using the incremental snapshot; updating incremental data by the second disaster recovery site; when the production site is unavailable, the first disaster recovery site replaces the production site; and when the production site and the first disaster recovery site are unavailable, the second disaster recovery site replaces the production site. By realizing data protection of three sites for the storage system and only updating incremental data by utilizing the second disaster recovery site, the influence that remote replication does not support cascade is reduced, the efficiency of data synchronization between the sites is improved to the maximum extent, and the disaster recovery capability of the whole storage system is effectively improved. The present application further provides a storage system and a computer-readable storage medium, which have the above-mentioned advantages and are not described herein again.

Description

Disaster recovery method of storage system, storage system and related device

Technical Field

The present application relates to the field of storage, and in particular, to a disaster recovery method for a storage system, a storage system, and a computer-readable storage medium.

Background

With the rapid development of information technology, information systems play an increasingly important role in key business of various industries. In the fields of communication, finance, medical treatment, electronic commerce, logistics, government and the like, the interruption of service of an information system can cause huge economic loss, influence brand image and possibly cause important data loss. Therefore, ensuring service continuity is a key to information system construction.

In recent years, large-scale natural disasters have occurred frequently. To ensure service continuity, the 'two places and three centers' disaster recovery solution, which combines a same-city disaster recovery center with a remote disaster recovery center, is increasingly valued and accepted by the industry.

In the existing 'two places and three centers' disaster recovery solution, the MCS software InMetro is implemented on the basis of synchronous replication, and MCS remote replication does not support cascading, that is, a given volume can be in only one remote replication relationship. As a result, the existing disaster recovery method synchronizes data inefficiently across multiple centers, which directly leads to poor disaster recovery capability.

Therefore, how to improve the disaster tolerance capability of the storage system is an urgent problem to be solved by those skilled in the art.

Summary of the Application

The present application aims to provide a disaster recovery method for a storage system, a storage system and a computer-readable storage medium, which solve the problems of low data synchronization efficiency and poor disaster recovery capability of the existing disaster recovery schemes.

In order to solve the above technical problem, the present application provides a disaster recovery method for a storage system, where the storage system includes a production site, a first disaster recovery site, and a second disaster recovery site, and the specific technical solution is as follows:

the first disaster recovery site copies the data of the production site, so that two up-to-date copies of the data are kept at the same time;

when incremental data appear in the production site or the first disaster recovery site, recording the incremental data by using an incremental snapshot;

sending the incremental snapshot volume of the incremental snapshot to the second disaster recovery site, so that the second disaster recovery site updates the incremental data;

when the production site is unavailable, the first disaster recovery site replaces the production site; and when the production site and the first disaster recovery site are unavailable, the second disaster recovery site replaces the production site.

Wherein the updating of the incremental data by the second disaster recovery site includes:

the second disaster recovery site updates the incremental data according to a preset strategy; the preset strategy specifically comprises the following steps:

generating an incremental snapshot according to the production volume of the production site or the incremental data of the first disaster recovery volume of the first disaster recovery site, and generating a protection snapshot at the same time;

remotely copying the incremental snapshot;

determining whether the remote copy of the incremental snapshot succeeded;

if so, stopping generating the incremental snapshot and the protection snapshot;

and if not, utilizing the protection snapshot to execute a rollback operation on the second disaster recovery volume of the second disaster recovery site.

Wherein, the updating the incremental data by the second disaster recovery site according to a preset policy includes:

and the second disaster recovery site periodically updates the incremental data according to a preset strategy.

Wherein, when the production site and the first disaster recovery site are both unavailable, the method further comprises:

determining whether the remote copy has been consistently stopped;

if so, stopping the preset strategy, and deleting the preset strategy, the incremental snapshot, the protection snapshot and the remotely copied data;

if not, determining whether the remote copy is required to complete;

and if the remote copy is not required to complete, stopping the preset strategy, stopping the remote copy, deleting the remotely copied data, and rolling back the second disaster recovery site to the consistency protection snapshot.

Wherein, when any one of the production site and the first disaster recovery site is unavailable, the method further comprises:

when the remote copy is not synchronized and has been consistently stopped, taking a snapshot of the second disaster recovery volume;

and when the remote copy is synchronized and there is no need to wait for it to complete, or the remote copy is not synchronized and has not been consistently stopped, taking a snapshot of the changed volume of the second disaster recovery volume.

Wherein the copying of the data of the production site by the first disaster recovery site comprises:

and the first disaster recovery site synchronously or asynchronously copies the data of the production site.

The present application also provides a storage system, comprising:

a production site, a first disaster recovery site and a second disaster recovery site;

the first disaster recovery site is used for copying the data of the production site and replacing the production site when the production site is unavailable;

the second disaster recovery site is used for updating the incremental data of the production site or the first disaster recovery site, and replacing the production site when the production site and the first disaster recovery site are unavailable.

The present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the disaster recovery method as described above.

The application provides a disaster recovery method for a storage system, where the storage system includes a production site, a first disaster recovery site, and a second disaster recovery site, and the disaster recovery method includes: the first disaster recovery site copies the data of the production site to simultaneously store two pieces of latest data; when incremental data appear in the production site or the first disaster recovery site, recording the incremental data by using an incremental snapshot; sending the incremental snapshot volume of the incremental snapshot to the second disaster recovery site, so that the second disaster recovery site updates the incremental data; when the production site is unavailable, the first disaster recovery site replaces the production site; and when the production site and the first disaster recovery site are unavailable, the second disaster recovery site replaces the production site.

According to the present application, three-site data protection is implemented for the storage system, and the second disaster recovery site updates only the incremental data of the production site or the first disaster recovery site. Unlike the existing 'two places and three centers' disaster recovery solution, the sites do not first have to compare their data and then synchronize it. This reduces the impact of remote replication not supporting cascading, improves the efficiency of data synchronization between sites as far as possible, and effectively improves the disaster recovery capability of the whole storage system. The present application further provides a storage system and a computer-readable storage medium, which have the same advantages and are not described again here.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a flowchart of a disaster recovery method for a storage system according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a first connection relationship among the production site, the first disaster recovery site, and the second disaster recovery site according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a second connection relationship among the production site, the first disaster recovery site, and the second disaster recovery site according to an embodiment of the present application;

Fig. 4 is a flowchart of the preset policy according to an embodiment of the present application;

Fig. 5 is a flowchart of starting the third copy according to an embodiment of the present application;

Fig. 6 is a flowchart of creating a snapshot of the third copy according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, fig. 1 is a flowchart of a disaster recovery method of a storage system according to an embodiment of the present application, where the storage system includes a production site, a first disaster recovery site, and a second disaster recovery site, and the disaster recovery method includes:

s101: the first disaster recovery site copies the data of the production site to simultaneously store two pieces of latest data;

In the present application, the storage system includes two disaster recovery sites in addition to the production site. Usually, at least one disaster recovery site must stay synchronized with the production site, that is, the production site and that disaster recovery site both hold the latest data in real time, so that two up-to-date copies exist at any moment. If the production site fails, whether because of a natural disaster or an equipment fault, the disaster recovery site can take its place and keep the service of the whole storage system running. The disaster recovery site that stays synchronized with the production site is usually located in the same city. A natural disaster, however, often disables all storage devices in an area at once; in that case a second disaster recovery site in a different location is needed so that the services provided by the storage system are not interrupted.

The present application does not limit how the first disaster recovery site replicates the data of the production site: the replication may be synchronous or asynchronous, may be periodic asynchronous replication, or may use an active-active data center, and so on. Nor does the application limit whether the production site and the first disaster recovery site are in the same city; it is only required that the two sites maintain a data synchronization relationship.

The data of the production site may be referred to as the production volume Prod_disk. The data of the first disaster recovery site may be referred to as the first disaster recovery volume Primary_DR_disk; it stores a copy of the production volume with an RPO of 0. The data of the second disaster recovery site is the second disaster recovery volume Secondary_DR_disk, that is, the third copy of the production volume. Together these form the 'three sites, three copies' disaster recovery scheme proposed by the application. In this description, 'production site' and 'production volume', 'first disaster recovery site' and 'first disaster recovery volume', and 'second disaster recovery site' and 'second disaster recovery volume' are used interchangeably. RPO (recovery point objective) measures how far a copy may lag behind the production data. An RPO of 0 means that there is no difference between Prod_disk and Primary_DR_disk; a nonzero RPO means that the two volumes differ, and the RPO value gives the size of that difference.

S102: when incremental data appear in the production site or the first disaster recovery site, recording the incremental data by using an incremental snapshot;

Whenever incremental data appears on the production site or the first disaster recovery site, it is recorded with an incremental snapshot. Under normal conditions the data of the production site and of the first disaster recovery site are synchronized, but the second disaster recovery site is connected to only one of the two. Either site is able to record incremental data with an incremental snapshot; in practice, only the site connected to the second disaster recovery site needs to do so.
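As a rough illustration of this step, the sketch below models an incremental snapshot as a record of only the blocks written since the snapshot was started, and shows how that record could later be applied at the second disaster recovery site. The `Volume` and `IncrementalSnapshot` classes and the block-level granularity are assumptions made for illustration; the application does not specify how the snapshot is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Volume:
    """A toy block volume (hypothetical): block index -> block contents."""
    blocks: Dict[int, bytes] = field(default_factory=dict)


@dataclass
class IncrementalSnapshot:
    """Records only the blocks written after the snapshot was started."""
    changed: Dict[int, bytes] = field(default_factory=dict)

    def record_write(self, volume: Volume, index: int, data: bytes) -> None:
        # Apply the write to the volume and remember it as incremental data.
        volume.blocks[index] = data
        self.changed[index] = data


def apply_increment(target: Volume, snap: IncrementalSnapshot) -> None:
    """Update the target volume with the recorded incremental data only."""
    target.blocks.update(snap.changed)


prod = Volume({0: b"a", 1: b"b"})
secondary = Volume({0: b"a", 1: b"b"})  # assumed to start in sync with prod
lcx = IncrementalSnapshot()
lcx.record_write(prod, 1, b"B")   # overwrite an existing block
lcx.record_write(prod, 2, b"c")   # write a new block
apply_increment(secondary, lcx)   # only two blocks are transferred
```

Only the two changed blocks travel to the second site, which is the point of the incremental approach: no full comparison of the volumes is needed.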

S103: sending the incremental snapshot volume of the incremental snapshot to the second disaster recovery site, so that the second disaster recovery site updates the incremental data;

The second disaster recovery site updates the incremental data of either the production site or the first disaster recovery site, but not both, because it can be connected to only one of them. Referring to fig. 2 and fig. 3, fig. 2 is a schematic diagram of a first connection relationship among the production site, the first disaster recovery site, and the second disaster recovery site according to an embodiment of the present application, and fig. 3 is a schematic diagram of a second such connection relationship. In fig. 2 and fig. 3, each ellipse represents the volume of a site, that is, the production volume of the production site, the first disaster recovery volume of the first disaster recovery site, and the second disaster recovery volume of the second disaster recovery site. There are two connection modes:

Parallel connection (1+2): as the name implies, the first disaster recovery site and the second disaster recovery site are both connected to the production site, that is, as shown in the figure, the first disaster recovery volume and the second disaster recovery volume are each connected to the production volume.

Series connection (1+1+1): in this connection mode, the production site, the first disaster recovery site, and the second disaster recovery site are connected in sequence.

That is, while the production site and the first disaster recovery site remain connected to each other, the second disaster recovery site can be connected to either of them. Because the production site and the first disaster recovery site hold the same data, the second disaster recovery site receives the same incremental data whichever site it is connected to. The difference lies in where the incremental snapshot is stored: if the second disaster recovery site is connected to the production site, the incremental snapshot is stored on the production site; if it is connected to the first disaster recovery site, the incremental snapshot is stored on the first disaster recovery site.
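The effect of the connection mode on where the incremental snapshot lives can be summarized in a few lines. This is a minimal sketch; the `Topology` enum and the site labels are hypothetical names introduced here, not identifiers from the application.

```python
from enum import Enum


class Topology(Enum):
    PARALLEL = "1+2"    # second DR site is connected to the production site
    SERIES = "1+1+1"    # second DR site is connected to the first DR site


def incremental_snapshot_holder(topology: Topology) -> str:
    """Return the site that stores the incremental snapshot."""
    if topology is Topology.PARALLEL:
        return "production"
    return "first_dr"


print(incremental_snapshot_holder(Topology.PARALLEL))
print(incremental_snapshot_holder(Topology.SERIES))
```

In both modes the second site ends up with the same data; only the source of the snapshot differs.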

In this step, only the incremental data on the production site or the first disaster recovery site is updated, which means that it is not necessary to generate a snapshot of all data of the production site or the first disaster recovery site at this time, but only the incremental snapshot is generated. Of course, it can be understood that, in order to ensure the consistency of the second disaster recovery volume, the second disaster recovery site generates its own consistency protection snapshot before receiving the incremental snapshot.

S104: when the production site is unavailable, the first disaster recovery site replaces the production site; and when the production site and the first disaster recovery site are unavailable, the second disaster recovery site replaces the production site.

The present application does not limit the reason why the production site is unavailable; it may be, for example, a natural disaster or an equipment failure. In any such case, the first disaster recovery site replaces the production site.

And if the production site and the first disaster recovery site are unavailable, starting the second disaster recovery site to replace the production site.
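The replacement order in step S104 amounts to a simple priority rule, sketched below. The function name, the boolean health flags, and the `None` result for the case where no site is available are assumptions for illustration; the application does not describe how availability is detected.

```python
from typing import Optional


def active_site(production_up: bool,
                first_dr_up: bool,
                second_dr_up: bool = True) -> Optional[str]:
    """Pick the site that serves as the production site."""
    if production_up:
        return "production"
    if first_dr_up:
        # Production site unavailable: the first DR site replaces it.
        return "first_dr"
    if second_dr_up:
        # Both unavailable: the second DR site replaces the production site.
        return "second_dr"
    return None  # no site left; this case is outside the described method


print(active_site(production_up=False, first_dr_up=True))
```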

In this embodiment, data protection across three sites (the production site, the first disaster recovery site, and the second disaster recovery site) is implemented for the storage system, and the second disaster recovery site updates only the incremental data of the production site or the first disaster recovery site. Unlike the existing 'two places and three centers' disaster recovery solution, the sites do not first have to compare their data and then synchronize it. This reduces the impact of remote replication not supporting cascading, improves the efficiency of data synchronization between sites as far as possible, and effectively improves the disaster recovery capability of the whole storage system.

Based on the above embodiment, as a preferred embodiment, the updating, by the second disaster recovery site, the incremental data of the production site or the first disaster recovery site may specifically be:

the second disaster recovery site updates the incremental data of the production site or the first disaster recovery site according to a preset strategy; the preset strategy specifically comprises the following steps:

generating an incremental snapshot according to the production volume of the production site or the incremental data of the disaster recovery volume of the first disaster recovery site, and generating a protection snapshot at the same time;

remotely copying the incremental snapshot;

determining whether the remote copy of the incremental snapshot succeeded;

and if not, performing rollback operation on the second disaster recovery volume of the second disaster recovery site by using the protection snapshot.

The incremental snapshot is denoted by LCx, the protection snapshot by LCy, and the remote copy process by RCy; see fig. 4, which is a flowchart of the preset policy provided by the embodiment of the present application.

In this embodiment, it is assumed by default that, before the second disaster recovery site is updated, a check determines whether the production volume is in a consistent state. The remote copy of the incremental snapshot, that is, the process by which the second disaster recovery site updates the incremental data, is performed only when this check passes. Remote copy is used because the sites are typically far apart. The production volume is in a consistent state when it holds the same data as its copy (i.e., the first disaster recovery volume). This condition is normally satisfied. When it is not, the second disaster recovery site cannot update the incremental data.

When the consistency of the production volume is satisfied, the incremental snapshot LCx, the protection snapshot LCy and the remote copy RCy are started in sequence. The protection snapshot LCy is used to protect the data consistency of the second disaster recovery volume.

When both the incremental snapshot LCx and the remote copy RCy are complete, the protection snapshot LCy is stopped, as is the remote copy RCy, representing that the second disaster recovery volume has updated the incremental snapshot at this time.

Of course, as shown in fig. 4, the above process may be cyclic, that is, the second disaster recovery site periodically updates the incremental data of the production site or the first disaster recovery site according to the preset policy. In other words, as long as incremental data is present and the production volume is in a consistent state, the incremental data can keep being remotely copied to the second disaster recovery volume. The user then determines the RPO by setting the period; for example, if the update runs once a day, one unit of RPO represents one day. Typically the RPO is less than twice the period, and the lower limit of the period is determined by the link bandwidth between the sites. A reasonable period has no fixed value and can be chosen by those skilled in the art.
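The relationship between period, link bandwidth, and RPO can be made concrete with a small calculation. The sketch below assumes that one period's increment must finish transferring within that period, which gives a lower limit on the period, and uses the stated bound of twice the period for the RPO. The function names and the example figures are illustrative assumptions, not values from the application.

```python
def min_period_seconds(increment_bytes: float,
                       link_bytes_per_second: float) -> float:
    """Shortest period in which one increment can cross the link (assumed model)."""
    if link_bytes_per_second <= 0:
        raise ValueError("link bandwidth must be positive")
    return increment_bytes / link_bytes_per_second


def rpo_upper_bound_seconds(period_seconds: float) -> float:
    """Bound stated in the description: the RPO is less than twice the period."""
    return 2 * period_seconds


# Hypothetical figures: 36 GB of changes per cycle over a 10 MB/s link.
period = min_period_seconds(36e9, 10e6)
print(period)                           # seconds needed to ship one increment
print(rpo_upper_bound_seconds(period))  # RPO stays below this value
```

With a daily period (86400 seconds), the same bound gives an RPO of less than two days.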

If the remote copy process fails, the remote copy may be restarted. If the remote copy still cannot be completed, the second disaster recovery volume may be rolled back to the previous consistency point by using the protection snapshot LCy.
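One cycle of the preset policy, including the retry and rollback behavior just described, might look like the following sketch. Volumes are modeled as plain dictionaries, the remote copy is an injected callable, and the retry count is arbitrary; all of these, along with the function names, are assumptions for illustration rather than the MCS implementation.

```python
from typing import Callable, Dict

Blocks = Dict[int, bytes]


def run_policy_cycle(increment: Blocks,
                     secondary: Blocks,
                     remote_copy: Callable[[Blocks, Blocks], bool],
                     production_consistent: bool = True,
                     max_attempts: int = 2) -> str:
    """Run one cycle of the preset policy and report its outcome."""
    if not production_consistent:
        # Without a consistent production volume, no update takes place.
        return "skipped"
    lcx = dict(increment)   # LCx: incremental snapshot of the source volume
    lcy = dict(secondary)   # LCy: protection snapshot of the second DR volume
    for _ in range(max_attempts):
        if remote_copy(lcx, secondary):   # RCy: remote copy of the increment
            return "updated"              # LCy and RCy can now be stopped
    # Remote copy could not be completed: roll back to the consistency point.
    secondary.clear()
    secondary.update(lcy)
    return "rolled_back"


def good_link(lcx: Blocks, target: Blocks) -> bool:
    target.update(lcx)
    return True


def broken_link(lcx: Blocks, target: Blocks) -> bool:
    # Simulate a link that fails after writing only the first block.
    for index, data in lcx.items():
        target[index] = data
        break
    return False


secondary_volume = {0: b"a"}
outcome = run_policy_cycle({0: b"A", 1: b"b"}, secondary_volume, broken_link)
print(outcome, secondary_volume)
```

The protection snapshot is what makes a half-finished transfer harmless: the second disaster recovery volume is either fully updated or restored to its earlier consistent state.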

This embodiment combines incremental snapshots with remote copy to realize three-site, three-copy data protection, and avoids the low data synchronization efficiency caused by the fact that MCS remote copy does not support cascading.

Based on the foregoing embodiment, as a preferred embodiment, referring to fig. 5, fig. 5 is a flowchart of starting a third copy provided in the embodiment of the present application, and when both the production site and the first disaster recovery site are unavailable, the method further includes:

judging whether the remote copying is stopped consistently;

if not, judging whether remote copying is required to be completed or not;

Once the production site and the first disaster recovery site are both unavailable, the second disaster recovery site is started, and the read-write permission of the second disaster recovery site is started.

When the second disaster recovery site is started, it is first determined whether the remote copy has been consistently stopped, that is, whether all remote copy processes have stopped. If so, the preset policy is stopped, and the preset policy, the incremental snapshot LCx, the protection snapshot LCy, and the remotely copied data are deleted; LCy2 and LC_target may also be included. LCy2 is the rollback snapshot relationship of LCy, that is, the consistency protection snapshot. LC_target is the incremental snapshot volume of the production volume and is used to transmit the incremental changes of the production volume to Secondary_DR_disk, that is, the second disaster recovery volume. change_disk is the changed volume of the second disaster recovery volume and is used to protect the data consistency of the second disaster recovery volume during periodic asynchronous replication.

If the remote copy has not been consistently stopped and is not required to complete, the preset policy is terminated, the remote copy is terminated directly, the remotely copied data is deleted, and the second disaster recovery volume Secondary_DR_disk is rolled back to the consistency protection snapshot. Finally, the incremental snapshot LCx and LC_target are deleted (if the remote copy is asynchronous, this deletion must wait until the rollback to LCy2 has completed, after which LCy, LCy2, and change_disk are deleted).

The incremental snapshot LCx, the protection snapshot LCy, the preset policy, and so on are deleted so that the policy can be reconfigured when the second disaster recovery site serves as the production site, because the second disaster recovery site cannot synchronize its subsequent incremental data back to the original production site or the first disaster recovery site.
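The decision flow for starting the third copy can be sketched as a function that returns the list of actions to perform. The action labels are hypothetical names chosen here. The branch in which the remote copy must complete is not spelled out in the text, so the sketch simply assumes the system waits for the copy to finish.

```python
from typing import List


def start_third_copy_actions(consistently_stopped: bool,
                             must_complete_copy: bool) -> List[str]:
    """Return the cleanup actions taken before the second DR site takes over."""
    if consistently_stopped:
        return [
            "stop_policy",
            "delete_policy",
            "delete_LCx",
            "delete_LCy",
            "delete_remote_copied_data",
        ]
    if not must_complete_copy:
        return [
            "stop_policy",
            "stop_remote_copy",
            "delete_remote_copied_data",
            "rollback_Secondary_DR_disk_to_LCy2",
        ]
    # Assumption: when the copy must complete, wait for it before proceeding.
    return ["wait_for_remote_copy"]


print(start_third_copy_actions(consistently_stopped=False,
                               must_complete_copy=False))
```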

Referring to fig. 6, fig. 6 is a flowchart of creating a third copy snapshot according to an embodiment of the present application, and based on the foregoing embodiment, as a preferred embodiment, when either one of the production site and the first disaster recovery site is unavailable, the method further includes:

and when the remote copy is synchronous and does not need to wait for the completion of the remote copy, snapshot is taken for the changed volume of the second disaster recovery volume.

the first disaster recovery site synchronously replicates or asynchronously replicates data of the production site.

This embodiment covers the case in which only one of the production site and the first disaster recovery site is unavailable. The second disaster recovery volume then does not need to be started (that is, its read-write permission does not need to be enabled); it only needs to be inspected and verified, so only a snapshot of the third copy has to be created. Specifically, when the remote copy is not synchronized and has been consistently stopped, a snapshot is taken of the second disaster recovery volume, that is, of the data of the entire second disaster recovery site. The snapshot makes it convenient to view the data of the disaster recovery volume.

When the remote copy is synchronized and there is no need to wait for it to complete, or when the remote copy is not synchronized and has not been consistently stopped, a snapshot is taken of the changed volume of the second disaster recovery volume, that is, of change_disk.
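The choice of snapshot source for the third copy depends on three conditions, which the sketch below encodes directly. The function and parameter names are hypothetical. The remaining combination (synchronized, but the copy must be waited for) is not described in the text, so the sketch returns `None` for it.

```python
from typing import Optional


def third_copy_snapshot_source(in_sync: bool,
                               consistently_stopped: bool,
                               wait_for_copy: bool) -> Optional[str]:
    """Return the volume to snapshot when creating the third-copy snapshot."""
    if not in_sync:
        # Not synchronized: the choice depends on whether the copy
        # has been consistently stopped.
        return "Secondary_DR_disk" if consistently_stopped else "change_disk"
    if not wait_for_copy:
        return "change_disk"
    return None  # case not covered by the description


print(third_copy_snapshot_source(in_sync=False,
                                 consistently_stopped=True,
                                 wait_for_copy=False))
```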

Referring to fig. 2 and 3, the present application also provides a storage system comprising:

the first disaster recovery site is used for copying data of the production site and replacing the production site when the production site is unavailable;

the second disaster recovery site is used for updating the incremental data of the production site or the first disaster recovery site and replacing the production site when the production site and the first disaster recovery site are unavailable.

Of course, the production site and the first disaster recovery site can also be used for recording data by using the incremental snapshot when incremental data occurs, and sending the incremental snapshot volume of the incremental snapshot to the second disaster recovery site.

In fig. 2 and 3, the production volumes correspond to production sites, the first disaster recovery volumes correspond to first disaster recovery sites, and the second disaster recovery volumes correspond to second disaster recovery sites. Fig. 2 and 3 illustrate two different connection modes included in the storage system provided by the present application.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed, can implement the steps provided by the above-described embodiments. The storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another. Because the system provided by the embodiments corresponds to the method provided by the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.

The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims

1. A disaster recovery method of a storage system, wherein the storage system comprises a production site, a first disaster recovery site and a second disaster recovery site, the disaster recovery method comprising:

sending the incremental snapshot volume of the incremental snapshot to the second disaster recovery site, so that the second disaster recovery site updates the incremental data; before the incremental snapshot, the second disaster recovery site generates a consistency protection snapshot of the second disaster recovery site;

when the production site is unavailable, the first disaster recovery site replaces the production site; when the production site and the first disaster recovery site are both unavailable, the second disaster recovery site replaces the production site;

generating an incremental snapshot according to the production volume of the production site or the incremental data of the first disaster recovery volume of the first disaster recovery site;

remotely copying the incremental snapshot;

judging whether the incremental snapshot is copied remotely or not;

2. The disaster recovery method according to claim 1, wherein the updating of the incremental data by the second disaster recovery site according to a preset policy includes:

3. The disaster recovery method according to claim 1, wherein when neither the production site nor the first disaster recovery site is available, further comprising:

judging whether the remote copying is stopped consistently;

if not, judging whether remote copying is required to be completed or not;

4. The disaster recovery method according to claim 1, wherein when either of the production site and the first disaster recovery site is unavailable, further comprising:

when the remote copying is consistently stopped and is not synchronous, snapshot is taken on the second disaster recovery volume;

5. The disaster recovery method of claim 1, wherein the first disaster recovery site replicating the data of the production site comprises:

6. A storage system, comprising:

the first disaster recovery site is used for copying the data of the production site and replacing the production site when the production site is unavailable; when incremental data appear in the production site or the first disaster recovery site, recording the incremental data by using an incremental snapshot;

the second disaster recovery site is configured to update incremental data of the production site or the first disaster recovery site by using an incremental snapshot volume of the incremental snapshot, and replace the production site when both the production site and the first disaster recovery site are unavailable;

the second disaster recovery site is further used for updating the incremental data according to a preset strategy; the preset strategy specifically comprises the following steps:

remotely copying the incremental snapshot;

judging whether the incremental snapshot is copied remotely or not;

7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the disaster recovery method according to any one of claims 1 to 5.