WO2016107042A1 - Data incremental backup method and apparatus, and NAS device

Info

Publication number: WO2016107042A1
Application number: PCT/CN2015/078495
Authority: WO
Inventors: 徐帆; 黄茁
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-12-30
Filing date: 2015-05-07
Publication date: 2016-07-07
Also published as: CN105808373A

Abstract

Disclosed are a data incremental backup method and apparatus, and a NAS device. The method comprises: segmenting data blocks; determining whether each data segment of the data blocks stored in a source end is correspondingly identical to each data segment of the data blocks stored in a destination end; and if not, sending, to the destination end for incremental backup, one or more data segments of the source end that are different from those of the destination end. Therefore, the problems in the related art that it is impossible to use a data backup function to perform reliable data incremental backup based on NDMP and a backup server is under heavy load are solved, so that data incremental backup can be realized based on NDMP, the load on the backup server is alleviated, and the bandwidth transmission is reduced.

Description

Data incremental backup method, device and NAS device

Technical field

The present invention relates to the field of information storage technologies, and in particular, to a data incremental backup method, device, and NAS device.

Background art

With the continuous growth of information data, enterprises are increasingly aware of data protection, and disaster recovery backup has become one of the essential functions of storage devices. However, for massive amounts of data and equipment, how to manage them in a unified manner and how to reduce the impact of disaster recovery backup on online services are important issues to be solved in the current storage industry.

Therefore, the Network Data Management Protocol (NDMP) is used to centrally control and manage enterprise-level data. The NDMP architecture enables backup application vendors to control local backup and recovery devices on a network attached storage server, for example, a Network Attached Storage (NAS) device. NDMP provides a common interface between any backup software application and NAS devices. In this way, application vendors can support a variety of NAS devices without having to redesign expensive programming logic. Moreover, NAS device vendors can work seamlessly with any NDMP-compliant application.

NDMP divides the control flow and the data flow of backup and recovery operations into separate sessions. This provides more flexibility in configuring the environment used to protect NAS device data. Because the sessions are independent, they can be launched from various locations and directed to different locations, which allows a more flexible design of NDMP-based topologies.

In the current implementation of NDMP, the control flow follows a unified interface defined by the standard protocol, and the data flow can be customized according to the device characteristics of each NAS vendor. In order to effectively reduce the bandwidth usage during data stream transmission, device vendors have introduced an incremental backup mechanism. The main incremental backup methods are as follows:

Method 1: Record the timestamp of each backup. Before each backup, obtain the timestamp of the most recent backup operation and compare the modification time of each file to be backed up with that latest backup time. If the file has not been modified, the file is filtered out; otherwise, the file attributes and file data blocks are sent to the backup destination device. This method carries certain risks: for example, the system time may be changed. In addition, this incremental backup method can only send the entire changed file and cannot send only the data blocks that differ.
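Method 1 can be illustrated with a minimal sketch. The function name `files_to_send` and the `mtimes` mapping are hypothetical and are used here only for illustration; the related art does not prescribe any particular interface.

```python
def files_to_send(mtimes, last_backup_time):
    """Select files modified after the most recent backup (Method 1).

    mtimes: hypothetical mapping of file path -> modification timestamp.
    Whole files are selected, so unchanged blocks inside them are resent too.
    """
    return sorted(path for path, mtime in mtimes.items() if mtime > last_backup_time)


mtimes = {"a.txt": 100, "b.txt": 250}
print(files_to_send(mtimes, last_backup_time=200))  # ['b.txt']
# If the recorded backup time is wrong because the system clock was changed,
# a genuinely modified file is silently filtered out.
print(files_to_send(mtimes, last_backup_time=300))  # []
```

The second call shows the risk named above: the decision depends entirely on clock values, not on file content.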

Method 2: Use file system mirroring. Before each backup, the current file system snapshot is compared with the baseline snapshot, and the data blocks that differ are sent to the backup destination device. This method relies on the snapshot function of the file system, and because the NDMP protocol cannot perform snapshot management, snapshots accumulate into a backlog.

Both of the above methods have certain defects and cannot be well integrated with NDMP. Therefore, in the related art, the data backup function cannot perform reliable incremental data backup based on NDMP, and the backup server is heavily loaded.

Summary of the invention

The present invention provides a data incremental backup method, which solves at least the problem that the data backup function in the related art cannot perform reliable data incremental backup based on NDMP, and the backup server has a heavy burden.

According to an aspect of the present invention, a data incremental backup method is provided, including: segmenting a data block; determining whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end; and if the determination result is no, sending the one or more differing data segments of the source end to the destination end for incremental backup.

Optionally, before the data block is segmented, the method further includes: determining whether the backup attribute information of the data block at the source end and at the destination end is the same, where the backup attribute information is used to identify backup history information of the data block; and if the determination result is no, segmenting the data block for incremental data backup.

Optionally, determining whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end includes: calculating a first hash value corresponding to each data segment of the data block stored at the source end, and a second hash value corresponding to each data segment of the data block stored at the destination end; and determining, according to the first hash value and the second hash value, whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end.

Optionally, determining, according to the first hash value and the second hash value, whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end includes: determining whether the first hash value is the same as the second hash value; if the determination result is yes, determining that the data segment of the data block stored at the source end is consistent with the data segment of the data block stored at the destination end; and/or, if the determination result is no, determining that the data segment of the data block stored at the source end is inconsistent with the data segment of the data block stored at the destination end.

Optionally, if the determination result is no, sending the one or more differing data segments of the source end to the destination end for incremental backup includes: generating, according to the calculated first hash value, a first hash list corresponding to each data segment of the source end, and generating, according to the calculated second hash value, a second hash list corresponding to each data segment of the destination end; and storing the one or more differing data segments of the source end in the first hash list and sending it to the destination end for incremental backup.

According to another aspect of the present invention, a data incremental backup apparatus is provided, including: a data segmentation module, configured to segment a data block; a first determining module, configured to determine whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end; and a sending module, configured to send, if the determination result of the first determining module is no, the one or more differing data segments of the source end to the destination end for incremental backup.

Optionally, the apparatus further includes: a second determining module, configured to determine whether the backup attribute information of the data block at the source end and at the destination end is the same, where the backup attribute information is used to identify backup history information of the data block. The data segmentation module is further configured to segment the data block for incremental data backup if the determination result of the second determining module is no.

Optionally, the first determining module includes: a first acquiring unit, configured to acquire a first hash value corresponding to each data segment of the data block stored at the source end, and a second hash value corresponding to each data segment of the data block stored at the destination end; and a determining unit, configured to determine, according to the first hash value and the second hash value, whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end.

Optionally, the determining unit includes: a judging subunit, configured to determine whether the first hash value and the second hash value are the same; a first determining subunit, configured to determine, if the result of the judging subunit is yes, that the data segment of the data block stored at the source end is consistent with the data segment of the data block stored at the destination end; and/or a second determining subunit, configured to determine, if the result of the judging subunit is no, that the data segment of the data block stored at the source end is inconsistent with the data segment of the data block stored at the destination end.

Optionally, the sending module includes: a second acquiring unit, configured to acquire a first hash list that corresponds to each data segment of the source end and is generated according to the first hash value, and a second hash list that corresponds to each data segment of the destination end and is generated according to the second hash value; and a sending unit, configured to store the one or more differing data segments of the source end in the first hash list and send it to the destination end for incremental backup.

According to still another aspect of the present invention, a network attached storage (NAS) device is provided. The NAS device supports the Network Data Management Protocol (NDMP) and includes the apparatus of any of the above.

By segmenting the data block, determining whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end, and, if the determination result is no, sending the one or more differing data segments of the source end to the destination end for incremental backup, the invention solves the problem in the related art that the data backup function cannot perform reliable incremental data backup based on NDMP and that the backup server is heavily loaded. Incremental data backup is thereby achieved based on NDMP, the load on the backup server is reduced, and the transmission bandwidth is reduced.

Description of the drawings

The drawings described herein are intended to provide a further understanding of the invention and form a part of the invention. In the drawings:

FIG. 1 is a flow chart of a data incremental backup method according to an embodiment of the present invention;

FIG. 2 is a block diagram showing the structure of a data incremental backup apparatus according to an embodiment of the present invention;

FIG. 3 is a block diagram of a preferred structure of a data incremental backup apparatus according to an embodiment of the present invention;

FIG. 4 is a block diagram showing a preferred structure of the first determining module 24 in the data incremental backup apparatus according to an embodiment of the present invention;

FIG. 5 is a block diagram showing a preferred structure of the determining unit 44 in the first determining module 24 in the data incremental backup apparatus according to an embodiment of the present invention;

FIG. 6 is a block diagram showing a preferred structure of the sending module 26 in the data incremental backup apparatus according to an embodiment of the present invention;

FIG. 7 is a structural block diagram of a network attached storage (NAS) device according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an NDMP operation mode in accordance with a preferred embodiment of the present invention;

FIG. 9 is a schematic diagram of NDMP data flow command interaction in accordance with a preferred embodiment of the present invention.

Detailed description

The invention will be described in detail below with reference to the drawings in conjunction with the embodiments. It should be noted that the embodiments in the present application and the features in the embodiments may be combined with each other without conflict.

In this embodiment, a data incremental backup method is provided. FIG. 1 is a flowchart of a data incremental backup method according to an embodiment of the present invention. As shown in FIG. 1 , the process includes:

Step S102: segment the data block;

Step S104: determine whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end;

Step S106: if the determination result is no, send the one or more differing data segments of the source end to the destination end for incremental backup.
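Steps S102 to S106 can be sketched as follows. This is only an illustration under stated assumptions: fixed-size segments, comparison of segments at the same index, and the hypothetical function names `split_segments` and `differing_segments`, none of which are fixed by the text.

```python
def split_segments(block: bytes, segment_size: int) -> list[bytes]:
    """Step S102: cut a data block into fixed-size segments (the last may be shorter)."""
    return [block[i:i + segment_size] for i in range(0, len(block), segment_size)]


def differing_segments(source: bytes, destination: bytes, segment_size: int) -> dict[int, bytes]:
    """Steps S104/S106: return {segment index: source segment} for segments that differ."""
    src = split_segments(source, segment_size)
    dst = split_segments(destination, segment_size)
    return {
        i: seg
        for i, seg in enumerate(src)
        if i >= len(dst) or seg != dst[i]  # missing at destination counts as different
    }


source = b"AAAABBBBCCCCDD"
destination = b"AAAAXXXXCCCC"
# Only segments 1 and 3 would be sent to the destination end.
print(differing_segments(source, destination, segment_size=4))  # {1: b'BBBB', 3: b'DD'}
```

Here the segments are compared directly; the hash-based comparison described later avoids moving destination data to the source end.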

By segmenting the data block and comparing it segment by segment, differential backup is performed on the basis of the data segments. This solves the problem in the related art that the data backup function cannot perform reliable incremental data backup based on NDMP and that the backup server is heavily loaded, and thereby achieves incremental data backup based on NDMP, reduces the load on the backup server, and reduces the transmission bandwidth.

To ensure the correctness of the backup, before the data block is segmented, it may first be determined whether the backup attribute information of the data block at the source end and at the destination end is the same, where the backup attribute information is used to identify the backup history information of the data block; if the determination result is no, the data block is segmented for incremental data backup. This processing ensures the accuracy of the backup.
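The attribute pre-check can be sketched as below. The attribute dictionaries and the function name `needs_segment_comparison` are hypothetical; the text does not list which attributes make up the backup attribute information.

```python
from typing import Optional


def needs_segment_comparison(source_attr: Optional[dict], dest_attr: Optional[dict]) -> bool:
    """Return True when the data block must be segmented and compared.

    The dicts are hypothetical stand-ins for the backup attribute information;
    None models "no backup history at the destination end".
    """
    if dest_attr is None:
        return True
    return source_attr != dest_attr


print(needs_segment_comparison({"size": 12, "mtime": 1000}, {"size": 12, "mtime": 1000}))  # False
print(needs_segment_comparison({"size": 12, "mtime": 2000}, {"size": 12, "mtime": 1000}))  # True
print(needs_segment_comparison({"size": 12, "mtime": 2000}, None))                         # True
```

When the attributes match, the segmentation and hashing work is skipped entirely, which is where the saving in this step comes from.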

Several approaches may be used to determine whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end. One approach is to compare the data segments directly. Another, optional approach is to compare hash values: first calculate a first hash value corresponding to each data segment of the data block stored at the source end, and a second hash value corresponding to each data segment of the data block stored at the destination end; then determine, according to the first hash value and the second hash value, whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end.

Determining consistency according to the first hash value and the second hash value includes: determining whether the first hash value is the same as the second hash value; if the determination result is yes, determining that the data segment of the data block stored at the source end is consistent with the data segment of the data block stored at the destination end; and/or, if the determination result is no, determining that the data segment of the data block stored at the source end is inconsistent with the data segment of the data block stored at the destination end.
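The hash-based comparison can be sketched as follows. MD5 from the Python standard library is used purely as an example hash, and the function names are hypothetical; the embodiment does not mandate a specific hash function at this point.

```python
import hashlib


def segment_hashes(block: bytes, segment_size: int) -> list[str]:
    """Compute one hash value per fixed-size segment of the block."""
    return [
        hashlib.md5(block[i:i + segment_size]).hexdigest()
        for i in range(0, len(block), segment_size)
    ]


def segments_consistent(first_hashes: list[str], second_hashes: list[str]) -> list[bool]:
    """For each source segment, True if the destination hash at the same index is identical."""
    return [
        i < len(second_hashes) and h == second_hashes[i]
        for i, h in enumerate(first_hashes)
    ]


first = segment_hashes(b"AAAABBBBCCCC", 4)   # first hash values (source end)
second = segment_hashes(b"AAAAXXXXCCCC", 4)  # second hash values (destination end)
print(segments_consistent(first, second))    # [True, False, True]
```

Only the hash values cross the network for the comparison, so unchanged segments cost a few bytes each instead of their full length.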

After it is determined from the hash values whether the data segments are the same, if the determination result is no, the one or more differing data segments of the source end may be sent to the destination end for backup as follows: first, a first hash list corresponding to each data segment of the source end is generated according to the calculated first hash value, and a second hash list corresponding to each data segment of the destination end is generated according to the calculated second hash value; then the one or more differing data segments of the source end are stored in the first hash list, which is sent to the destination end for incremental backup. In other words, the data segments that need to be updated are fed back directly in the form of a table, which is direct and clear.
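A minimal sketch of the two hash lists, assuming dictionary entries with hypothetical keys (`id`, `offset`, `length`, `hash`, `data`) and SHA-1 as an example hash. The differing source segments are attached to the first list, which is what would then be sent.

```python
import hashlib


def build_hash_list(block: bytes, segment_size: int) -> list[dict]:
    """Build one entry per segment: id, offset, length, and a hash; data starts empty."""
    entries = []
    for seg_id, offset in enumerate(range(0, len(block), segment_size)):
        seg = block[offset:offset + segment_size]
        entries.append({
            "id": seg_id,
            "offset": offset,
            "length": len(seg),
            "hash": hashlib.sha1(seg).hexdigest(),
            "data": b"",
        })
    return entries


def fill_differences(first_list: list[dict], second_list: list[dict], source_block: bytes) -> list[dict]:
    """Attach source data to entries of the first list whose hash differs from the second list."""
    dest_hashes = {e["id"]: e["hash"] for e in second_list}
    for entry in first_list:
        if dest_hashes.get(entry["id"]) != entry["hash"]:
            start = entry["offset"]
            entry["data"] = source_block[start:start + entry["length"]]
    return first_list


source, dest = b"AAAABBBBCCCC", b"AAAAXXXXCCCC"
first = fill_differences(build_hash_list(source, 4), build_hash_list(dest, 4), source)
print([(e["id"], e["data"]) for e in first])  # [(0, b''), (1, b'BBBB'), (2, b'')]
```

Entries for unchanged segments carry no payload, so the list that is sent grows only with the amount of changed data.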

In this embodiment, a data incremental backup apparatus is also provided. The apparatus is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 2 is a structural block diagram of a data incremental backup apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes a data segmentation module 22, a first determining module 24, and a sending module 26. The apparatus is described below.

The data segmentation module 22 is configured to segment the data block. The first determining module 24 is connected to the data segmentation module 22 and is configured to determine whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end. The sending module 26 is connected to the first determining module 24 and is configured to send, if the determination result of the first determining module 24 is no, the one or more differing data segments of the source end to the destination end for incremental backup.

FIG. 3 is a block diagram of a preferred structure of a data incremental backup apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes a second determining module 32 in addition to all the modules shown in FIG. 2. The apparatus is described below.

The second determining module 32 is connected to the data segmentation module 22 and is configured to determine whether the backup attribute information of the data block at the source end and at the destination end is the same, where the backup attribute information is used to identify the backup history information of the data block. The data segmentation module 22 is further configured to segment the data block for incremental data backup if the determination result of the second determining module 32 is no.

FIG. 4 is a block diagram of a preferred structure of the first determining module 24 in the data incremental backup apparatus. As shown in FIG. 4, the first determining module 24 includes a first acquiring unit 42 and a determining unit 44. The first determining module 24 is described below.

The first acquiring unit 42 is configured to acquire a first hash value corresponding to each data segment of the data block stored at the source end, and a second hash value corresponding to each data segment of the data block stored at the destination end. The determining unit 44 is connected to the first acquiring unit 42 and is configured to determine, according to the first hash value and the second hash value, whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end.

FIG. 5 is a block diagram showing a preferred structure of the determining unit 44 in the first determining module 24 of the data incremental backup apparatus according to an embodiment of the present invention. As shown in FIG. 5, the determining unit 44 includes a judging subunit 52, and a first determining subunit 54 and/or a second determining subunit 56. The determining unit 44 is described below.

The judging subunit 52 is configured to determine whether the first hash value and the second hash value are the same. The first determining subunit 54 is configured to determine, if the result of the judging subunit 52 is yes, that the data segment of the data block stored at the source end is consistent with the data segment of the data block stored at the destination end; and/or the second determining subunit 56 is configured to determine, if the result of the judging subunit 52 is no, that the data segment of the data block stored at the source end is inconsistent with the data segment of the data block stored at the destination end.

FIG. 6 is a block diagram showing a preferred structure of the sending module 26 in the data incremental backup apparatus according to an embodiment of the present invention. As shown in FIG. 6, the sending module 26 includes a second acquiring unit 62 and a sending unit 64. The sending module 26 is described below.

The second acquiring unit 62 is configured to acquire a first hash list that corresponds to each data segment of the source end and is generated according to the first hash value, and a second hash list that corresponds to each data segment of the destination end and is generated according to the second hash value. The sending unit 64 is connected to the second acquiring unit 62 and is configured to store the one or more differing data segments of the source end in the first hash list and send it to the destination end for incremental backup.

FIG. 7 is a structural block diagram of a network attached storage (NAS) device according to an embodiment of the present invention. As shown in FIG. 7, the NAS device 70 supports the Network Data Management Protocol (NDMP) and includes the data incremental backup apparatus 72 described in any of the above.

To address the above problems in the related art, this embodiment performs reliable incremental data backup based on NDMP. Given that NDMP backup is file-based, the mature rsync algorithm mechanism is used to implement incremental backup over the custom data stream. The solution includes the following processing: for NAS devices that support the NDMP protocol, data stream interaction commands between the source end and the destination end are defined, so that the source end can obtain information about the backed-up data files and data blocks at the destination end and, based on the rsync algorithm, send the changed data blocks to the destination end for incremental backup.

Preferred embodiments of the invention are described below.

FIG. 8 is a schematic diagram of an NDMP operation mode according to a preferred embodiment of the present invention. As shown in FIG. 8, storage server A is the source end, storage server B is the destination end, and the data management software (DMA) establishes an NDMP control flow with A and with B, respectively, over which control commands are exchanged. After the server backup parameters are set through control commands, the DMA notifies A and B to establish a data flow channel and start the backup process.

Table 1 lists the functions of the NDMP data stream interaction commands according to a preferred embodiment of the present invention.

Table 1

Command name	Function	Remarks
NDMP_GET_ATTR_EX	Get the attribute information of the specified file	Returns empty if the file does not exist
NDMP_GET_HASH_EX	Get the specified segment hash table	Specifies the block length and the number of segments
NDMP_SET_HASH_EX	Set the specified segment hash table	Carries the segment data that needs to be updated

Table 1 shows the custom data stream commands between A and B. NDMP_GET_ATTR_EX is used to obtain, from the peer host, the attribute information of the file to be backed up; if the returned attribute is empty, the file has no backup history. NDMP_GET_HASH_EX is used to obtain, from the peer host, the segment hash values of the data blocks of the backed-up file. NDMP_SET_HASH_EX is used to update the data block segment data at the peer.
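The three commands can be represented as a small enumeration. This is a sketch only: the text gives the command names but not their wire encoding, so the string values, the `handle_get_attr` helper, and the in-memory `store` are assumptions made for illustration.

```python
from enum import Enum


class DataStreamCommand(Enum):
    """Custom data stream commands named in Table 1 (wire encoding not given in the text)."""
    NDMP_GET_ATTR_EX = "NDMP_GET_ATTR_EX"  # get attribute information of a file
    NDMP_GET_HASH_EX = "NDMP_GET_HASH_EX"  # get the segment hash table of a file
    NDMP_SET_HASH_EX = "NDMP_SET_HASH_EX"  # update segment data at the peer


def handle_get_attr(store: dict, path: str):
    """Hypothetical peer-side handler: return stored attributes, or None if no backup history."""
    entry = store.get(path)
    return None if entry is None else entry["attr"]


store = {"/vol/a.txt": {"attr": {"size": 12, "mtime": 1000}}}
print(handle_get_attr(store, "/vol/a.txt"))    # {'size': 12, 'mtime': 1000}
print(handle_get_attr(store, "/vol/missing"))  # None
```

Returning an empty result for a missing file lets the source end treat "never backed up" and "attributes changed" through the same code path.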

Table 2 describes the element fields of the NDMP hash table according to a preferred embodiment of the present invention.

Table 2

Field name	Type	Description
id	uint32_t	Segment number
offset	uint64_t	Segment offset
length	uint32_t	Segment length
md5_sum	uint32_t	Segment weak hash value
sha1_sum	uint32_t	Segment strong hash value
data	BYTE	Segment data

As shown in Table 2, each element in the table records the information for one data segment, including the segment ID number, the start offset and length of the data block, the 128-bit strong and weak hash values of the data block, and the data used to update the peer.
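One element of the hash table can be modelled as a small record type. The field names follow Table 2; the Python types, the use of raw digest bytes for the two hash fields, and the `needs_update` helper are assumptions made for this sketch.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class HashTableElement:
    """One element of the segment hash table, mirroring the fields of Table 2."""
    id: int            # segment number
    offset: int        # segment offset within the file
    length: int        # segment length in bytes
    md5_sum: bytes     # weak hash value of the segment
    sha1_sum: bytes    # strong hash value of the segment
    data: bytes = b""  # segment data, filled only when the peer must be updated

    def needs_update(self) -> bool:
        """Hypothetical helper: True when this element carries data for the peer."""
        return len(self.data) > 0


seg = b"BBBB"
elem = HashTableElement(
    id=1, offset=4, length=len(seg),
    md5_sum=hashlib.md5(seg).digest(),
    sha1_sum=hashlib.sha1(seg).digest(),
)
print(elem.needs_update())  # False
elem.data = seg
print(elem.needs_update())  # True
```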

FIG. 9 is a schematic diagram of NDMP data flow command interaction according to a preferred embodiment of the present invention. As shown in FIG. 9, when source end A backs up a specified file, it first sends a file attribute query request to destination end B and compares the attributes fed back by B with the attributes of the local file. If the file attributes have not changed, the file data blocks are not sent. If the file attributes have changed, A segments the file data blocks, calculates the hash value of each data block, and stores the values in a table; at the same time, it sends a data block segment hash request to B. A then compares the hash table fed back by B with the hash table of the local file data blocks. For data blocks whose hash values are the same, the data field is left empty in the hash table; for data blocks whose hash values differ, the content of the source data block is added to the hash table. B receives the hash table carrying the updated data block content and updates the data blocks at the destination end.
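The interaction of FIG. 9 can be simulated in memory as a rough sketch. Everything here is illustrative: the `Destination` class, its method names, and the `backup` function are hypothetical, SHA-1 is an example hash, and segments are compared at the same offset only, without the rolling-window search of the full rsync algorithm.

```python
import hashlib


def hash_table(data: bytes, seg_len: int) -> list[dict]:
    """Segment the data and record id, offset, length, and a hash per segment."""
    table = []
    for i, off in enumerate(range(0, len(data), seg_len)):
        seg = data[off:off + seg_len]
        table.append({"id": i, "offset": off, "length": len(seg),
                      "hash": hashlib.sha1(seg).hexdigest(), "data": None})
    return table


class Destination:
    """Hypothetical in-memory stand-in for destination end B."""

    def __init__(self):
        self.files = {}  # path -> {"attr": ..., "data": bytes}

    def get_attr(self, path):               # models NDMP_GET_ATTR_EX
        entry = self.files.get(path)
        return entry["attr"] if entry else None

    def get_hash(self, path, seg_len):      # models NDMP_GET_HASH_EX
        entry = self.files.get(path)
        return hash_table(entry["data"], seg_len) if entry else []

    def set_hash(self, path, attr, table):  # models NDMP_SET_HASH_EX
        old = self.files.get(path, {"data": b""})["data"]
        parts = []
        for e in table:
            if e["data"] is not None:
                parts.append(e["data"])     # changed segment: take the data that was sent
            else:
                parts.append(old[e["offset"]:e["offset"] + e["length"]])  # reuse local copy
        self.files[path] = {"attr": attr, "data": b"".join(parts)}


def backup(path, attr, data, dest, seg_len=4):
    """Source end A: back up one file and return the number of payload bytes sent."""
    if dest.get_attr(path) == attr:
        return 0                            # attributes unchanged: send nothing
    remote = {e["id"]: e["hash"] for e in dest.get_hash(path, seg_len)}
    table = hash_table(data, seg_len)
    sent = 0
    for e in table:
        if remote.get(e["id"]) != e["hash"]:
            e["data"] = data[e["offset"]:e["offset"] + e["length"]]
            sent += e["length"]
    dest.set_hash(path, attr, table)
    return sent


b = Destination()
print(backup("/vol/f", {"mtime": 1}, b"AAAABBBBCCCC", b))  # 12 (first backup sends everything)
print(backup("/vol/f", {"mtime": 2}, b"AAAAXXXXCCCC", b))  # 4  (only the changed segment)
print(b.files["/vol/f"]["data"])                           # b'AAAAXXXXCCCC'
```

In this toy run the second backup transfers 4 of 12 payload bytes; the actual saving depends on how much of a file changes between backups.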

By applying the rsync-based incremental backup technique to the NDMP data stream, 50% of the bandwidth usage can be saved, and the load of the server backup task is greatly reduced.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one given here. Alternatively, they may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.

The above description is only the preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes can be made to the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.

Industrial applicability

As described above, the above embodiments and preferred embodiments solve the problem in the related art that the data backup function cannot perform reliable incremental data backup based on NDMP and that the backup server is heavily loaded, thereby achieving incremental data backup based on NDMP, reducing the load on the backup server, and reducing the transmission bandwidth.

Claims

A data incremental backup method, comprising:

segmenting a data block;

determining whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end; and

if the determination result is no, sending the one or more differing data segments of the source end to the destination end for incremental backup.

The method of claim 1, wherein before the data block is segmented, the method further comprises:

determining whether the backup attribute information of the data block at the source end and at the destination end is the same, wherein the backup attribute information is used to identify backup history information of the data block; and

if the determination result is no, segmenting the data block for incremental data backup.

The method of claim 1, wherein determining whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end comprises:

acquiring a first hash value corresponding to each data segment of the data block stored at the source end, and a second hash value corresponding to each data segment of the data block stored at the destination end; and

determining, according to the acquired first hash value and second hash value, whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end.

The method according to claim 3, wherein determining, according to the first hash value and the second hash value, whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end comprises:

determining whether the first hash value is the same as the second hash value; and

if the determination result is yes, determining that the data segment of the data block stored at the source end is consistent with the data segment of the data block stored at the destination end; and/or, if the determination result is no, determining that the data segment of the data block stored at the source end is inconsistent with the data segment of the data block stored at the destination end.

The method according to claim 3, wherein, if the determination result is no, sending the one or more differing data segments of the source end to the destination end for incremental backup comprises:

acquiring a first hash list that corresponds to each data segment of the source end and is generated according to the first hash value, and a second hash list that corresponds to each data segment of the destination end and is generated according to the second hash value; and

storing the one or more differing data segments of the source end in the first hash list and sending it to the destination end for incremental backup.

A data incremental backup apparatus, comprising:

a data segmentation module, configured to segment a data block;

a first determining module, configured to determine whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end; and

a sending module, configured to send, if the determination result of the first determining module is no, the one or more differing data segments of the source end to the destination end for incremental backup.

The apparatus of claim 6, further comprising:

a second determining module, configured to determine whether the backup attribute information of the data block at the source end and at the destination end is the same, wherein the backup attribute information is used to identify backup history information of the data block;

wherein the data segmentation module is further configured to segment the data block for incremental data backup if the determination result of the second determining module is no.

The apparatus of claim 7, wherein the first determining module comprises:

a first acquiring unit, configured to acquire a first hash value corresponding to each data segment of the data block stored at the source end, and a second hash value corresponding to each data segment of the data block stored at the destination end; and

a determining unit, configured to determine, according to the first hash value and the second hash value, whether each data segment of the data block stored at the source end is consistent with the corresponding data segment of the data block stored at the destination end.

The apparatus according to claim 8, wherein the determining unit comprises:

a judging subunit, configured to determine whether the first hash value is the same as the second hash value; and

a first determining subunit, configured to determine, if the result of the judging subunit is yes, that the data segment of the data block stored at the source end is consistent with the data segment of the data block stored at the destination end; and/or a second determining subunit, configured to determine, if the result of the judging subunit is no, that the data segment of the data block stored at the source end is inconsistent with the data segment of the data block stored at the destination end.

The apparatus of claim 8, wherein the sending module comprises:

a second acquiring unit, configured to acquire a first hash list that corresponds to each data segment of the source end and is generated according to the first hash value, and a second hash list that corresponds to each data segment of the destination end and is generated according to the second hash value; and

a sending unit, configured to store the one or more differing data segments of the source end in the first hash list and send it to the destination end for incremental backup.

A network attached storage (NAS) device supporting the Network Data Management Protocol (NDMP), comprising the apparatus of any one of claims 6 to 10.