WO2012065408A1 - Disaster tolerance data backup method and system - Google Patents


Info

Publication number
WO2012065408A1
WO2012065408A1 · PCT/CN2011/073780
Authority
WO
WIPO (PCT)
Prior art keywords
data block
data
backed
block
current data
Prior art date
Application number
PCT/CN2011/073780
Other languages
French (fr)
Chinese (zh)
Inventor
赵巍 (Zhao Wei)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2012065408A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/085 Retrieval of network configuration; tracking network configuration history
    • H04L 41/0853 Retrieval of network configuration; tracking network configuration history by actively collecting configuration information or by backing up configuration information
    • H04L 41/0856 Retrieval of network configuration; tracking network configuration history by actively collecting configuration information or by backing up configuration information, by backing up or archiving configuration information

Definitions

  • The invention relates to network management systems, and particularly to a method and system for off-site disaster recovery data backup of a network management system based on deduplication.

Background
  • The network management system is a system for managing the network elements (NEs) of a communication network and holds the configuration data of every NE in the network. This configuration data is critical: if it is lost, the NEs cannot run their services. For disaster tolerance, the configuration data must be backed up off-site. If the network management system is damaged by an earthquake, fire, or similar event, the off-site backup of the configuration data can be restored so that the NEs keep running their services. In general, off-site backup of configuration data is required once a day.
  • One existing off-site backup technique for disaster recovery data simply exports the configuration data to a file named by date and copies the file to a remote backup system, but this causes data redundancy.
  • the present invention provides a method and system for disaster recovery data backup.
  • A method for backing up disaster recovery data includes: receiving a data file to be backed up sent by the network management system; splitting the data file to obtain data blocks; calculating a data fingerprint for the current data block using a weak-check-value hash algorithm and a strong-check-value hash algorithm, and searching the backed-up data for a target data block with the same fingerprint; if one exists, comparing the target data block with the current data block byte by byte; and backing up the current data block according to the comparison result.
  • Calculating the data fingerprint with the two algorithms and searching for a target data block includes: calculating a first data fingerprint for the current data block with the weak-check-value hash algorithm and searching the backed-up data for a target block with the same first fingerprint; if one is found, calculating a second data fingerprint with the strong-check-value hash algorithm and searching the backed-up data for a target block with the same second fingerprint.
  • Performing the backup of the current data block according to the comparison result includes: when the comparison result is identical, determining that the current data block is a duplicate data block and storing its logical index information; when the comparison result differs, determining that the current data block is a new unique data block and storing its meta information.
  • The method further includes: if no target data block with the same data fingerprint is found in the backed-up data, storing the meta information of the current data block.
  • Splitting the data file to be backed up means splitting it according to the number of network elements.
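The claimed flow can be sketched in Python. This is an illustrative sketch only, not the patent's implementation: CRC32 and MD5 stand in for the weak and strong check-value hash algorithms named later in the text, and every function and variable name here is an assumption. (For brevity the sketch computes both fingerprints up front, whereas the text computes the strong value only after a weak match.)

```python
import hashlib
import zlib

crc_index = {}   # weak fingerprint (CRC32) -> list of (md5, block_id, block)
next_id = 0      # logical index numbers are assigned sequentially

def backup_block(block: bytes):
    """Back up one block; return (logical index number, is_duplicate)."""
    global next_id
    crc = zlib.crc32(block)               # weak check value, 32 bits
    md5 = hashlib.md5(block).digest()     # strong check value, 128 bits
    for cand_md5, block_id, stored in crc_index.get(crc, []):
        # Final byte-by-byte comparison guards against hash collisions.
        if cand_md5 == md5 and stored == block:
            return block_id, True         # duplicate: store only the index
    next_id += 1
    crc_index.setdefault(crc, []).append((md5, next_id, block))
    return next_id, False                 # new unique block: store meta info

def backup_file(data: bytes, block_size: int) -> list:
    """Split a file and back up each block; return the logical file (id list)."""
    return [backup_block(data[i:i + block_size])[0]
            for i in range(0, len(data), block_size)]
```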
  • a system for backing up disaster recovery data including:
  • a receiving module configured to receive a data file to be backed up sent by the network management system
  • a segmentation module configured to segment the data file to be backed up to obtain a divided data block
  • a calculation and search module, configured to calculate the data fingerprint of the current data block using the weak-check-value and strong-check-value hash algorithms, and to search the backed-up data for a target data block with the same fingerprint;
  • a comparison module, configured to compare the target data block with the current data block byte by byte when a target block with the same fingerprint is found in the backed-up data; and a backup module, configured to back up the current data block according to the comparison result of the comparison module.
  • The calculation and search module is specifically configured to: calculate a first data fingerprint for the current data block with the weak-check-value hash algorithm and search the backed-up data for a target block with the same first fingerprint; if one is found, calculate a second data fingerprint with the strong-check-value hash algorithm and search the backed-up data for a target block with the same second fingerprint.
  • The comparison module is specifically configured to: when the comparison result is identical, determine that the current data block is a duplicate data block and store its logical index information; when the comparison result differs, determine that the current data block is a new unique data block and store its meta information.
  • the backup module is further configured to: if the target data block having the same data fingerprint value is not found in the backed up data file, store the meta information of the current data block.
  • The meta information of the current data block includes: the current data block itself, the logical index information of the current data block, and the weak and strong check values of the current data block.
  • The invention splits the received data file to be backed up, calculates the data fingerprints of the resulting data blocks with a weak-check-value hash algorithm and a strong-check-value hash algorithm, and uses the fingerprint value as the key of a hash search. After a target data block with the same fingerprint is found, the target block is compared with the current block byte by byte, and the backup is performed according to the comparison result. This allows data files of various formats to be backed up, improving the applicability of backup files. Deduplication is performed in real time, which effectively controls the rapid growth of backup data, increasing effective storage space and improving storage efficiency. Moreover, backup files can be compressed, reducing network bandwidth usage and improving system performance.

DRAWINGS
  • FIG. 1 is a schematic flowchart of a method for backing up disaster recovery data provided by the present invention
  • FIG. 2 is a detailed flow chart of a method for backing up remote disaster recovery data of a network management system provided by the present invention
  • FIG. 3 is a schematic structural diagram of a system for backing up disaster recovery data provided by the present invention.

Detailed description
  • the present invention provides a method for backing up disaster recovery data, including: Step 101: Receive a data file to be backed up sent by a network management system;
  • Step 102: Split the data file to be backed up to obtain data blocks.
  • Step 103: Using the weak-check-value hash algorithm and the strong-check-value hash algorithm, calculate the data fingerprint of the current data block, and search the backed-up data for a target data block with the same fingerprint.
  • Step 104: If one is found, compare the target data block with the current data block byte by byte.
  • Step 105: Back up the current data block according to the comparison result.
  • If no target data block with the same data fingerprint is found in the backed-up data, the meta information of the current data block is stored.
  • Performing the backup of the current data block according to the comparison result includes: when the comparison result is identical, determining that the current data block is a duplicate data block and storing its logical index information; when the comparison result differs, determining that the current data block is a new unique data block and storing its meta information.
  • Using the weak-check-value hash algorithm and the strong-check-value hash algorithm to calculate the data fingerprint of the current data block and to search the backed-up data for a target data block with the same fingerprint includes:
  • calculating a first data fingerprint for the current data block with the weak-check-value hash algorithm and using it to search the backed-up data for a target block with the same first fingerprint; if one is found, calculating a second data fingerprint for the current data block with the strong-check-value hash algorithm and searching the backed-up data for a target block with the same second fingerprint.
  • The data file to be backed up is split according to the number of network elements to obtain data blocks.
  • the meta-information includes a current data block, logical index information of the current data block, a weak check value of the current data block, and a strong check value.
  • The method for backing up the remote disaster recovery data of the network management system includes: Step 201: The network management system exports the data file to be backed up, containing the network element configuration, and transfers the file to an off-site backup system.
  • The data file can be in any format, text or binary.
  • Step 202: The backup system receives the data file to be backed up and divides it into a group of data blocks according to the number of network elements.
  • the backup data file is segmented with a pre-determined data block size.
  • the data block size can be determined according to the number of network elements.
  • For example, the configuration data file of 10,000 network elements can be around 300 MB. If the block size used to split the file is too small, the system resource overhead is too large; if the granularity is too coarse, deduplication is less effective, so the two must be traded off. Testing yields the following empirical values: with fewer than 1,000 network elements the block size can be 1 KB; between 1,000 and 5,000, 4 KB; between 5,000 and 10,000, 8 KB.
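The empirical values above can be captured in a small helper. This is a hypothetical sketch: the function names are made up, and the assignment of the boundary counts 1,000 and 5,000 to the 4 KB bucket is an assumption, since the text gives only ranges.

```python
def block_size_for(ne_count: int) -> int:
    """Empirical block sizes quoted in the text (boundary handling assumed)."""
    if ne_count < 1000:
        return 1024          # 1 KB
    if ne_count <= 5000:
        return 4 * 1024      # 4 KB
    return 8 * 1024          # 8 KB (tested up to ~10,000 NEs)

def split_file(data: bytes, ne_count: int):
    """Split a file and assign each block a unique logical index number
    (step 202); returns a list of (logical index number, block bytes)."""
    size = block_size_for(ne_count)
    return [(i + 1, data[i * size:(i + 1) * size])
            for i in range((len(data) + size - 1) // size)]
```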
  • After the data file is split, the backup system assigns each data block unique logical index information; the logical index information may be a logical index number.
  • The data fingerprint is an essential feature of a data block: each unique data block has a unique data fingerprint value, obtained by performing a hash operation on the content of the block.
  • Commonly used Hash algorithms are FNV1, CRC, MD5, SHA1, SHA-256, SHA-512, and so on.
  • Different hash algorithms have different collision probabilities (a hash collision means that different data blocks can produce the same data fingerprint), produce fingerprints with different numbers of bits, and require different amounts of computation. A hash algorithm with a lower collision probability and a longer fingerprint is much more computationally expensive.
  • the CRC Hash algorithm is a weak check value hash algorithm, which is fast.
  • the data fingerprint value calculated by the CRC Hash algorithm is 32 bits.
  • the MD5 Hash algorithm is a strong checksum hash algorithm. There is a very low probability of collision occurrence, and the calculated data fingerprint value is 128 bits.
  • Strong and weak check-value hash algorithms are usually distinguished by a 128-bit threshold: an algorithm producing fewer than 128 bits is a weak-check-value hash algorithm, and one producing 128 bits or more is a strong-check-value hash algorithm.
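The 32-bit versus 128-bit distinction can be seen directly with Python's standard library, using CRC32 as the weak check value and MD5 as the strong one. The sample block content is made up for illustration.

```python
import hashlib
import zlib

block = b"NE-0001;ip=10.0.0.1;state=up"   # hypothetical NE configuration record

weak = zlib.crc32(block)                  # CRC32: 32-bit weak check value, fast
strong = hashlib.md5(block).digest()      # MD5: 128-bit strong check value

assert weak.bit_length() <= 32            # fits in 32 bits
assert len(strong) * 8 == 128             # 16 bytes = 128 bits
```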
  • Step 203: The backup system uses the CRC hash algorithm and the MD5 hash algorithm to calculate data fingerprints for the data blocks, as follows. For each data block produced in step 202, calculate its CRC check value with the CRC hash algorithm, then perform a hash search in the backed-up data using that CRC value to determine whether a matching item with the same CRC check value exists. If not, the data block is a new unique data block and is stored.
  • If the MD5 hash algorithm generated no collisions, the MD5 check value calculated for a data block would be unique; that is, each block would correspond to a unique data fingerprint, representable as a 1:1 mapping.
  • The element items of a traditional hash table are represented by a two-tuple: <md5_hashkey, block>, where md5_hashkey is the MD5 check value of the data block and block is the data block itself.
  • If the MD5 check value calculated for data block 1 equals the MD5 check value calculated for data block 2, then multiple data blocks correspond to one data fingerprint value, and a 1:n mapping must be used.
  • A triplet is therefore used to represent an element item of the hash table: <md5_hashkey, block_nr, block_IDs>, where block_nr is the number of data blocks with the same MD5 check value and block_IDs holds the logical index numbers of those data blocks.
  • block_nr and block_IDs are combined into a linked list with the following structure: block_nr | block_ID1 | block_ID2 | ... | block_IDn
  • block_ID1 | block_ID2 | ... | block_IDn is the linked list of data block logical index numbers, referred to as the block_ID list; block_nr is the length of the list.
  • The linked-list method is in fact used to resolve the hash collision problem, and the block_ID list length of each hash table element is not fixed. The linked-list method is used to find the target data block with the same MD5 check value in the backed-up data as follows:
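A minimal sketch of this structure, assuming a Python dict keyed by the MD5 value whose entries hold the block_ID list (block_nr is simply the list length), with the byte comparison resolving collisions. All names are illustrative assumptions.

```python
import hashlib

md5_table = {}      # md5_hashkey -> block_ID list (block_nr == len(list))
blocks_by_id = {}   # logical index number -> stored block bytes

def index_block(block_id: int, block: bytes) -> None:
    """Add a stored block to the MD5 hash table."""
    key = hashlib.md5(block).digest()
    md5_table.setdefault(key, []).append(block_id)
    blocks_by_id[block_id] = block

def find_duplicate(block: bytes):
    """Traverse the block_ID list of the matching entry and compare byte by
    byte, so an MD5 collision can never cause a false duplicate."""
    for block_id in md5_table.get(hashlib.md5(block).digest(), []):
        if blocks_by_id[block_id] == block:
            return block_id
    return None
```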
  • After a fingerprint match, the data block may be a duplicate data block, or the CRC hash algorithm or MD5 hash algorithm may have collided and the data block may not actually be a duplicate.
  • In this embodiment, the first data fingerprint of the current block is calculated with the weak-check-value algorithm and looked up first, and then the second data fingerprint is calculated with the strong-check-value algorithm and looked up. In practice the order can also be reversed: calculate and look up the strong fingerprint first, then the weak one. The principle is similar and is not repeated here.
  • Step 204: The backup system compares the retrieved target data block with the current data block byte by byte. If they are identical, go to step 205; if they differ, go to step 206.
  • Step 205: Determine that the current data block is a duplicate data block, and store the logical index number of the current data block.
  • Step 206: Determine that the current data block is a new unique block, and store the meta information of the current data block, which includes: the current data block, the logical index number of the current data block, the CRC check value, and the MD5 check value.
  • In step 203, after a match is found, the block_ID list of the matching element item is traversed, and the target data block corresponding to each logical index number in the block_ID list is compared byte by byte with the current data block.
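Steps 204 to 206 can be sketched as a single decision helper. This is a hypothetical helper: the return convention and all names are assumptions, not the patent's API.

```python
import hashlib
import zlib

def back_up_current(current, block_id, targets, stored):
    """'targets' is the block_ID list from the matching hash entry;
    'stored' maps logical index number -> already backed-up block bytes."""
    for target_id in targets:                 # traverse the block_ID list
        if stored[target_id] == current:      # step 204: byte-by-byte compare
            return ("index", target_id)       # step 205: store only the index
    # step 206: new unique block -> store the full meta information
    meta = {"block": current,
            "index": block_id,
            "crc": zlib.crc32(current),
            "md5": hashlib.md5(current).hexdigest()}
    stored[block_id] = current
    return ("meta", meta)
```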
  • the data storage of the present invention can adopt the RAID 5 mode.
  • In the backup system, a data file is represented by a logical file, which consists of meta information: a set of data fingerprints.
  • After the backup of the NE configuration data file is complete, when recovery is required the backup system performs a file read: it first reads the logical file, then retrieves the corresponding data block for each data block fingerprint, restores a copy of the physical file, and sends the copy to the network management system for recovery.
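The recovery path described above amounts to resolving each block reference of the logical file back to its stored block and concatenating the results. A minimal sketch, assuming an in-memory map from logical index number to block bytes:

```python
def restore(logical_file, blocks_by_id):
    """Rebuild the physical file copy from the logical file, an ordered
    list of logical index numbers referencing deduplicated blocks."""
    return b"".join(blocks_by_id[block_id] for block_id in logical_file)
```

Note that duplicate blocks appear once in storage but may be referenced many times in the logical file, which is where the storage saving comes from.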
  • the present invention provides a system for backing up disaster recovery data, including: a receiving module 301, configured to receive a data file to be backed up sent by the network management system;
  • the segmentation module 302 is configured to divide the data file to be backed up to obtain a divided data block
  • The calculation and search module 303 is configured to calculate the data fingerprint of the current data block using the weak-check-value and strong-check-value hash algorithms, and to search the backed-up data for a target data block with the same fingerprint;
  • the comparing module 304 is configured to compare the target data block with the current data block byte by byte when the target data block having the same data fingerprint value is found in the backed up data file;
  • the backup module 305 is configured to perform backup of the current data block according to the comparison result of the comparison module 304.
  • the backup module 305 is further configured to store meta information of the current data block if the target data block having the same data fingerprint value is not found in the backed up data file.
  • the comparing module 304 is specifically configured to: when the comparison result is the same, determine that the current data block is a duplicate data block, and store logical index information of the current data block; when the comparison result is different, determine The current data block is a new unique data block that stores the meta information of the current data block.
  • The calculation and search module 303 is specifically configured to: first calculate a first data fingerprint for the current data block with the weak-check-value hash algorithm and search the backed-up data for a target block with the same first fingerprint; if one is found, calculate a second data fingerprint with the strong-check-value hash algorithm and search the backed-up data for a target block with the same second fingerprint.
  • the segmentation module 302 is specifically configured to segment the data file to be backed up according to the number of network elements to obtain a divided data block.
  • An existing remote disaster recovery data backup method works on text files: it compares text content and deletes the duplicate data.
  • Existing disaster recovery data has about an 80% data repetition rate, but the text-comparison redundancy-removal method does not reach an 80% deduplication rate.
  • It also cannot handle binary files, for example, which limits the format of the backup file and gives poor applicability.
  • The disaster recovery data backup solution provided by the present invention can be applied to backup files in various formats, such as text files and binary database files. Since redundant data is deleted based on comparing binary data blocks of a few KB, the deduplication rate is high, close to 80%.
  • Existing data file splitting methods either use a single fixed-block strategy or compute block boundary features in advance from the file content type before splitting; in practice, with network management configuration data, their deduplication rate is not high.
  • The solution provided by the present invention, targeting the specific file format of the configuration data exported by the network management system and its periodic backup, determines the data block size from the number of network elements, effectively improving the deduplication rate of the backup data.
  • Existing methods use a single hash algorithm to calculate data fingerprints. The disaster recovery backup method described here combines a weak-check-value hash algorithm with a strong-check-value hash algorithm: calculation is fast, the collision probability is greatly reduced at a low performance cost, and system performance is improved.
  • The combination of the weak-check-value and strong-check-value hash algorithms allows duplicate data to be deleted in real time. After receiving a data file from the network management system, the backup system can deduplicate it online and store it as local logical files, without waiting for the system to be idle to process it offline. The whole processing cycle is short and can cope with urgent, short-interval off-site backup operations.
  • Existing disaster recovery backup methods only choose a hash algorithm with a smaller collision probability when deleting duplicate data and do not actually solve the hash collision problem, so they cannot be used for off-site disaster recovery backup of network management data, where a collision would cause huge economic losses.
  • The solution provided by the present invention solves the collision problem by traversing all data blocks with the same data fingerprint value and performing a complete byte comparison. Data security is thus greatly improved, and the solution can be applied to off-site disaster recovery backup of network management configuration data, which demands very high data security.
  • The disaster recovery data backup solution provided by the present invention can be applied to data files of various formats and has strong applicability; it effectively controls the rapid growth of backup data, thereby increasing effective storage space, improving storage efficiency, and saving network bandwidth.


Abstract

The present invention discloses a disaster tolerance data backup method and system, and belongs to the field of network management. The method includes: receiving a data file to be backed up, which is transmitted by a network management system (101); segmenting the data file to be backed up to obtain segmented data blocks (102); by utilizing a weak check value hash algorithm and a strong check value hash algorithm, calculating the data fingerprint value of the current data block and searching whether a target data block with the same data fingerprint value is in the data file which has been backed up (103); if yes, comparing the target data block with the current data block byte by byte (104); backing up the current data block according to the comparison result (105). The system includes a reception module, a segmentation module, a calculation and search module, a comparison module and a backup module. The technical solution of the present invention can improve the applicability of a data backup file, reduce the occupation of storage space and improve system performance.

Description

容灾数据 * 的方法及系统 技术领域  Method and system for disaster tolerance data * Technical field
本发明属于网络管理系统, 特别涉及一种基于重复数据删除的网管系 统异地容灾数据备份的方法及系统。 背景技术  The invention belongs to a network management system, and particularly relates to a method and system for backing up disaster recovery data of a network management system based on deduplication. Background technique
网管系统是管理通讯网网元的系统, 配置了全网网元的配置数据, 这 些网元配置数据非常重要, 如果没有这些配置数据, 网元就不能正常运行 业务。 基于容灾的考虑, 配置数据需要在异地备份起来。 一旦网管系统遭 受地震、 火灾等而被损坏, 则可以将异地备份的配置数据恢复过来, 以保 证网元能正常运行业务。 一般而言, 配置数据的异地备份要求每天备份一 次。  The network management system is a system for managing the network elements of the communication network. The configuration data of the network elements of the entire network is configured. The configuration data of these network elements is very important. If the configuration data is not available, the network element cannot run the service normally. Based on disaster tolerance considerations, configuration data needs to be backed up offsite. Once the network management system is damaged by earthquakes, fires, etc., you can restore the configuration data of the offsite backup to ensure that the network element can run the service normally. In general, offsite backups of configuration data require a backup once a day.
现有的一种容灾数据的异地备份技术只是简单的将配置数据导出成文 件, 文件按照日期来命名, 然后将文件拷贝到远程备份系统中, 但这样做 会产生数据冗余的问题。 也有其他网管系统考虑到了备份数据的冗余数据 的处理, 具体处理方法如下:  The existing off-site backup technology for disaster recovery data simply exports the configuration data to a file, the file is named by date, and then the file is copied to the remote backup system, but this will cause data redundancy. There are also other network management systems that take into account the processing of redundant data for backup data. The specific processing methods are as follows:
将配置数据导出, 生成文本文件, 用以记录每个网元具体的配置数据。 将该生成的文本文件拷贝到远程备份系统时, 备份系统会将今天的配置数 据与昨天保存的配置数据对比, 提取出有变化的网元的配置数据, 保存到 今天的备份文件中, 未发生变化的网元的配置数据则不保存。  Export the configuration data to generate a text file to record the specific configuration data of each network element. When the generated text file is copied to the remote backup system, the backup system compares the current configuration data with the configuration data saved yesterday, and extracts the configuration data of the changed network element and saves it to the current backup file. The configuration data of the changed network element is not saved.
这种做法存在明显的缺陷: 对网管系统备份的文件有严格要求, 网管 系统和备份系统要遵守同样的文件格式规定, 才能实现数据备份及以后的 恢复, 不能适用到所有网管系统, 适用性差; 此外, 还要求备份文件必须 是文本文件, 而文本文件不能压缩, 占用存储空间大, 而传输未压缩的文 本文件, 占用网络带宽, 对系统资源消耗大, 影响系统性能。 发明内容 There are obvious defects in this approach: There are strict requirements for the files backed up by the network management system. The network management system and the backup system must comply with the same file format requirements in order to achieve data backup and subsequent recovery. It cannot be applied to all network management systems, and the applicability is poor. In addition, it is required that the backup file must be a text file, and the text file cannot be compressed, occupying a large storage space, and transmitting uncompressed text. This document occupies network bandwidth and consumes a lot of system resources, which affects system performance. Summary of the invention
为了提高数据备份文件的适用性, 减少存储空间占用, 提高系统性能, 本发明提供了一种容灾数据备份的方法及系统。  In order to improve the applicability of data backup files, reduce storage space occupation, and improve system performance, the present invention provides a method and system for disaster recovery data backup.
为解决上述技术问题, 本发明的技术方案是这样实现的:  In order to solve the above technical problem, the technical solution of the present invention is implemented as follows:
一种容灾数据备份的方法, 包括: 接收网管系统发送的待备份数据文 件; 对所述待备份数据文件进行分割, 得到分割的数据块; 利用弱校验值 哈希算法和强校验值哈希算法, 针对当前数据块计算其数据指紋值, 并在 已备份数据文件中查找是否有相同数据指紋值的目标数据块; 如果有, 则 将所述目标数据块与所述当前数据块进行逐字节比较; 根据比较结果进行 所述当前数据块的备份。  A method for backing up disaster recovery data includes: receiving a data file to be backed up sent by the network management system; dividing the data file to be backed up to obtain a divided data block; using a weak check value hash algorithm and a strong check value a hash algorithm, calculating a data fingerprint value for the current data block, and searching for a target data block having the same data fingerprint value in the backed up data file; if yes, performing the target data block with the current data block Byte-by-byte comparison; the backup of the current data block is performed according to the comparison result.
所述利用弱校验值哈希算法和强校验值哈希算法, 针对当前数据块计 算其数据指紋值, 并在已备份数据文件中查找是否有相同数据指紋值的目 标数据块, 包括: 利用弱校验值哈希算法针对当前数据块计算其第一数据 指紋值, 并以所述第一数据指紋值在所述已备份数据文件中查找是否有相 同所述第一数据指紋值的目标数据块, 如果有, 则利用强校验值哈希算法 针对所述当前数据块计算其第二数据指紋值, 并在所述已备份数据文件中 查找相同所述第二数据指紋值的目标数据块。  The weak check value hash algorithm and the strong check value hash algorithm are used to calculate a data fingerprint value for the current data block, and to find a target data block having the same data fingerprint value in the backed up data file, including: Calculating a first data fingerprint value for the current data block by using a weak parity value hash algorithm, and searching for the target of the first data fingerprint value in the backed up data file by using the first data fingerprint value a data block, if yes, calculating a second data fingerprint value for the current data block by using a strong check value hash algorithm, and searching for the target data of the same second data fingerprint value in the backed up data file Piece.
所述根据比较结果进行所述当前数据块的备份, 包括: 当比较结果相 同时, 则确定所述当前数据块是重复数据块, 存储所述当前数据块的逻辑 索引信息; 当比较结果不同时, 则确定所述当前数据块是新的唯一数据块, 存储所述当前数据块的元信息。  The performing the backup of the current data block according to the comparison result, including: determining, when the comparison result is the same, determining that the current data block is a duplicate data block, and storing logical index information of the current data block; when the comparison result is different And determining that the current data block is a new unique data block, and storing meta information of the current data block.
The method further includes: if no target data block with the same data fingerprint value is found in the backed-up data files, storing the meta information of the current data block.
Splitting the data file to be backed up means: splitting the data file to be backed up according to the number of network elements.
A system for backing up disaster recovery data includes:
a receiving module, configured to receive a data file to be backed up sent by the network management system;
a splitting module, configured to split the data file to be backed up to obtain divided data blocks;
a calculation and lookup module, configured to use a weak-checksum hash algorithm and a strong-checksum hash algorithm to calculate a data fingerprint value for the current data block, and to search the backed-up data files for a target data block with the same data fingerprint value;
a comparison module, configured to compare the target data block with the current data block byte by byte when a target data block with the same data fingerprint value is found in the backed-up data files; and a backup module, configured to back up the current data block according to the comparison result of the comparison module.
The calculation and lookup module is specifically configured to: use the weak-checksum hash algorithm to calculate a first data fingerprint value for the current data block, and search the backed-up data files, with the first data fingerprint value, for a target data block having the same first data fingerprint value; if such a block exists, use the strong-checksum hash algorithm to calculate a second data fingerprint value for the current data block, and search the backed-up data files for a target data block having the same second data fingerprint value.
The comparison module is specifically configured to: when the comparison result is identical, determine that the current data block is a duplicate data block and store the logical index information of the current data block; when the comparison result differs, determine that the current data block is a new unique data block and store the meta information of the current data block.
The backup module is further configured to: if no target data block with the same data fingerprint value is found in the backed-up data files, store the meta information of the current data block.
The meta information of the current data block includes: the current data block, the logical index information of the current data block, and the weak and strong checksum values of the current data block. The present invention splits the received data file to be backed up, calculates data fingerprint values for the divided data blocks with a weak-checksum hash algorithm and a strong-checksum hash algorithm, performs a hash lookup with the data fingerprint value as the key, compares the target data block with the current data block byte by byte after a target data block with the same data fingerprint value is found, and backs up the data block according to the comparison result. This allows data files of any format to be backed up, improving the applicability of backup files; deduplication can be performed in real time, effectively controlling the rapid growth of backup data, thereby increasing effective storage space and improving storage efficiency; moreover, backup files can be compressed, reducing network bandwidth usage and improving system performance. DRAWINGS
The drawings described herein are provided for a further understanding of the present invention and constitute a part of the present invention; the exemplary embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of the method for backing up disaster recovery data provided by the present invention;
FIG. 2 is a detailed flowchart of the method for backing up off-site disaster recovery data of a network management system provided by the present invention;
FIG. 3 is a schematic structural diagram of the system for backing up disaster recovery data provided by the present invention. DETAILED DESCRIPTION
To make the technical problem to be solved, the technical solution, and the beneficial effects of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
As shown in FIG. 1, the present invention provides a method for backing up disaster recovery data, including: Step 101: receiving a data file to be backed up sent by the network management system;
Step 102: splitting the data file to be backed up to obtain divided data blocks;
Step 103: using a weak-checksum hash algorithm and a strong-checksum hash algorithm, calculating a data fingerprint value for the current data block, and searching the backed-up data files for a target data block with the same data fingerprint value;
Step 104: if such a block exists, comparing the target data block with the current data block byte by byte; Step 105: backing up the current data block according to the comparison result.
In a preferred embodiment of the present invention, if no target data block with the same data fingerprint value is found in the backed-up data files, the meta information of the current data block is stored.
In a preferred embodiment of the present invention, backing up the current data block according to the comparison result includes: when the comparison result is identical, determining that the current data block is a duplicate data block and storing the logical index information of the current data block; when the comparison result differs, determining that the current data block is a new unique data block and storing the meta information of the current data block.
In a preferred embodiment of the present invention, using the weak-checksum hash algorithm and the strong-checksum hash algorithm to calculate the data fingerprint value for the current data block and searching the backed-up data files for a target data block with the same data fingerprint value includes:
first using the weak-checksum hash algorithm to calculate a first data fingerprint value for the current data block, and searching the backed-up data files, with the first data fingerprint value, for a target data block having the same first data fingerprint value; if such a block exists, using the strong-checksum hash algorithm to calculate a second data fingerprint value for the current data block, and searching the backed-up data files for a target data block having the same second data fingerprint value.
In a preferred embodiment of the present invention, the data file to be backed up is split according to the number of network elements to obtain the divided data blocks.
In a preferred embodiment of the present invention, the meta information includes the current data block, the logical index information of the current data block, and the weak and strong checksum values of the current data block.
The implementation of the present invention is described in detail below with reference to the drawings.
As shown in FIG. 2, the method for backing up off-site disaster recovery data of a network management system provided by the present invention includes: Step 201: the network management system exports the data file to be backed up containing the network element configuration, and transfers the data file to the off-site backup system.
The data file may be a file of any format, either text or binary.
Step 202: the backup system receives the data file to be backed up and splits it into a group of data blocks according to the number of network elements.
Specifically, the data file to be backed up is split using a predefined data block size. The data block size can be determined according to the number of network elements; the configuration data file of 10,000 network elements may be 300 MB. If the block granularity of the split file is too fine, the system resource overhead is too large; if the granularity is too coarse, deduplication is ineffective. A trade-off is needed between the two. Testing yields the following empirical values: when the number of network elements is within 1,000, the data block size can be 1 KB; when the number of network elements is between 1,000 and 5,000, the data block size can be 4 KB; when the number of network elements is between 5,000 and 10,000, the data block size can be 8 KB.
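The empirical block-size rule above can be sketched as follows. This is only an illustration of the stated thresholds (1,000 / 5,000 / 10,000 network elements mapping to 1 / 4 / 8 KB); the function and variable names are assumptions, not part of the patent.

```python
def block_size_for(ne_count: int) -> int:
    """Pick a block size in bytes from the empirical thresholds in the text.

    Hypothetical helper; the patent only states the 1 KB / 4 KB / 8 KB rule.
    """
    if ne_count <= 1000:
        return 1024          # 1 KB
    elif ne_count <= 5000:
        return 4 * 1024      # 4 KB
    else:
        return 8 * 1024      # 8 KB


def split_file(data: bytes, ne_count: int) -> list:
    """Split the file to be backed up into fixed-size blocks (Step 202)."""
    size = block_size_for(ne_count)
    return [data[i:i + size] for i in range(0, len(data), size)]
```

Fixed-size chunking keyed to the network element count keeps the split cheap while keeping block granularity matched to the expected file size.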
After the data file to be backed up is split, the backup system assigns unique logical index information to each data block; the logical index information may be a logical index number.
Step 203: the backup system calculates a data fingerprint for the current data block and performs a hash lookup in the backed-up data files with the data fingerprint value as the key to obtain a target data block with the same data fingerprint.
Specifically, the data fingerprint is an essential characteristic of a data block: each unique data block has a unique data fingerprint value, obtained by performing a hash computation over the content of the data block. Commonly used hash algorithms include FNV1, CRC, MD5, SHA-1, SHA-256, and SHA-512. Different hash algorithms have different collision probabilities (all hash algorithms are subject to collisions, i.e., different data blocks may produce the same data fingerprint), produce fingerprint values of different bit lengths, and differ in computational cost. A hash algorithm with a lower collision probability and a longer fingerprint value is considerably more expensive to compute.
Calculating data fingerprints for data blocks requires a trade-off between performance and data safety. The CRC hash algorithm is a weak-checksum hash algorithm and is fast to compute; in this embodiment, the data fingerprint value calculated by the CRC hash algorithm is 32 bits. The MD5 hash algorithm is a strong-checksum hash algorithm with a very low collision probability; its calculated data fingerprint value is 128 bits. Strong- and weak-checksum hash algorithms are usually distinguished at 128 bits: below 128 bits an algorithm counts as a weak-checksum hash, and at 128 bits or above as a strong-checksum hash. The backup system uses the CRC hash algorithm and the MD5 hash algorithm to calculate data fingerprints for data blocks, as follows: for each data block obtained in Step 202, the CRC checksum is first calculated with the CRC hash algorithm, and a hash lookup is performed in the backed-up data files with that CRC checksum as the key to determine whether a matching entry with the same CRC checksum exists. If not, the data block is a new unique data block, and the data block, its logical index number, and its CRC and MD5 checksums are stored, where the MD5 checksum is calculated with the MD5 hash algorithm. If a match exists, the MD5 checksum of the data block is calculated with the MD5 hash algorithm, and a hash lookup is performed in the backed-up data files with that MD5 checksum to determine whether a matching entry with the same MD5 checksum exists. If one exists, a possible duplicate data block has been found, and the procedure proceeds to Step 204; if not, the data block is stored and the related meta information is created.
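The two-stage fingerprint check can be sketched as below. This is a non-authoritative illustration: Python's `zlib.crc32` stands in for the 32-bit weak-checksum CRC hash and `hashlib.md5` for the 128-bit strong-checksum hash, and the indexes are plain in-memory sets rather than the backup system's actual storage.

```python
import hashlib
import zlib


def weak_fingerprint(block: bytes) -> int:
    """32-bit CRC checksum (weak-checksum hash)."""
    return zlib.crc32(block)


def strong_fingerprint(block: bytes) -> bytes:
    """128-bit MD5 checksum (strong-checksum hash)."""
    return hashlib.md5(block).digest()


def classify(block: bytes, crc_index: set, md5_index: set) -> str:
    """Return 'new' if the block is certainly new, or 'maybe-duplicate'
    if both fingerprints match and a byte-level comparison (Step 204)
    is still required before declaring it a duplicate."""
    if weak_fingerprint(block) not in crc_index:
        return "new"                 # cheap check failed: store block + meta info
    if strong_fingerprint(block) not in md5_index:
        return "new"                 # CRC collided, MD5 did not: still a new block
    return "maybe-duplicate"         # both match: proceed to byte comparison
```

The cheap 32-bit check filters out most new blocks before the more expensive MD5 computation is ever run.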
Generally speaking, the MD5 hash algorithm does not collide: the MD5 checksum calculated for a data block is unique, i.e., one data block corresponds to one unique data fingerprint value, which can be expressed as a 1:1 mapping. An element of a traditional hash table is represented by a pair:
<md5_hashkey, block>, where md5_hashkey is the MD5 checksum of the data block. In practice, however, two data blocks may share the same MD5 checksum (for example, the MD5 checksum calculated for data block 1 may equal the MD5 checksum calculated for data block 2), so multiple data blocks correspond to one data fingerprint value, and a 1:n mapping is needed. In the present invention, a hash table element is represented by a triple:
<md5_hashkey, block_nr, block_IDs>
where md5_hashkey is the MD5 checksum of the data block, block_nr is the number of data blocks with the same MD5 checksum, and block_IDs are the logical index numbers of those data blocks. In the algorithm design of the present invention, block_nr and block_IDs are merged into a linked list with the following structure:

block_nr | block_ID1 | block_ID2 | ... | block_IDn

where block_ID1 | block_ID2 | ... | block_IDn is the linked list of data block logical index numbers, hereinafter denoted the block_ID list, and block_nr is the length of the list.
In the present invention, the chaining (linked-list) method is in fact used to resolve hash collisions; the block_ID list of each hash table element has a variable length. The chaining method is used to find the target data block with the same MD5 checksum in the backed-up data files as follows:
(1) Calculate the MD5 checksum md5_hashkey of the current data block, i.e., md5_hashkey = hash_md5(block);
(2) Look up the hash table of the backed-up data files with md5_hashkey: bindex = hash_value(md5_hashkey, hash table), where bindex denotes the logical index entry found for the current data block;
(3) If no matching element is found in the hash table, i.e., bindex == NULL, insert md5_hashkey directly into the hash table, with block_nr = 1 and block_ID1 = the logical index number of the current data block;
(4) If a matching element is found in the hash table, the data block may be a duplicate data block; alternatively, both the CRC hash algorithm and the MD5 hash algorithm may have collided, in which case the data block is not a duplicate.
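Steps (1) through (4) can be sketched with an ordinary dict keyed by md5_hashkey, whose value plays the role of the <md5_hashkey, block_nr, block_IDs> triple: a Python list provides block_IDs directly and block_nr via its length. The name hash_md5 follows the text; the surrounding structure is an assumption made for illustration.

```python
import hashlib


def hash_md5(block: bytes) -> bytes:
    """MD5 checksum of a data block, as in step (1)."""
    return hashlib.md5(block).digest()


def lookup_or_insert(hash_table: dict, block: bytes, logical_index: int):
    """Steps (1)-(4): return the block_ID list of a matching entry for
    further byte-level checking, or None after inserting a fresh entry."""
    md5_hashkey = hash_md5(block)                   # step (1)
    block_id_list = hash_table.get(md5_hashkey)     # step (2)
    if block_id_list is None:                       # step (3): bindex == NULL
        hash_table[md5_hashkey] = [logical_index]   # block_nr = 1, block_ID1 = index
        return None
    return block_id_list                            # step (4): possible duplicate
```

Returning the block_ID list, rather than a yes/no answer, lets the caller run the byte-by-byte comparison of Step 204 against every candidate in the chain.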
In this step, the weak-checksum algorithm is first used to calculate the first data fingerprint value of the current data block, followed by a fingerprint lookup; the strong-checksum algorithm is then used to calculate the second data fingerprint value, followed by another fingerprint lookup. In practice, the order may be reversed: the strong-checksum algorithm can be used first to calculate the second data fingerprint value and perform its lookup, after which the weak-checksum algorithm calculates the first data fingerprint value and performs its lookup. The principle is similar and is not repeated here.
Step 204: the backup system compares the obtained target data block with the current data block at the byte level. If the comparison result is identical, proceed to Step 205; if the comparison result differs, proceed to Step 206. Step 205: determine that the current data block is a duplicate data block, and store the logical index number of the current data block.
Step 206: determine that the current data block is a new unique block, and store the meta information of the current data block, which includes: the current data block, its logical index number, its CRC checksum, and its MD5 checksum.
For Steps 204 to 206, continuing from Step 203: after a matching entry is found, traverse the block_ID list of that entry and compare, byte by byte, the target data block corresponding to each logical index number in the block_ID list with the current data block. If an identical block is found, the current data block already exists, and the logical index number of the current data block is stored. If no identical block is found, the data block is stored: the current data block is appended to the end of the block_ID list and written to the file, the block_nr value is incremented by 1, and block_IDn is set to the logical index number of the current data block. The data storage of the present invention may use RAID 5.
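The walk over the block_ID list in Steps 204 to 206 can be sketched as follows. Here block_store is a hypothetical mapping from logical index numbers to stored block contents (the patent writes blocks to a file); comparing two bytes objects with == performs exactly the byte-by-byte comparison the text requires.

```python
def dedup_block(block: bytes, block_id_list: list, block_store: dict,
                next_index: int):
    """Steps 204-206: compare the current block against every candidate
    in block_id_list; return (is_duplicate, logical_index)."""
    for bid in block_id_list:                 # Step 204: byte-by-byte comparison
        if block_store[bid] == block:
            return True, bid                  # Step 205: duplicate, reuse its index
    # Step 206: new unique block sharing a fingerprint (hash collision)
    block_id_list.append(next_index)          # block_nr grows by one via len()
    block_store[next_index] = block
    return False, next_index
```

Only blocks whose strong and weak fingerprints both matched ever reach this comparison, so the linear scan over the chain stays short in practice.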
At this point, each data file corresponds to one logical file representation in the backup system, composed of meta information consisting of a group of data fingerprints.
After the backup of the network element configuration data file is completed, when recovery is needed, the backup system performs the file read: it first reads the logical file, then retrieves the corresponding data blocks according to the data block fingerprints, restores a physical file copy, and sends this copy to the network management system for recovery.
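The restore path just described (read the logical file, fetch each referenced block, reassemble the physical copy) admits an equally small sketch. Here a "logical file" is simplified to an ordered list of logical index numbers; the patent's real metadata would also carry the block fingerprints.

```python
def restore_file(logical_file: list, block_store: dict) -> bytes:
    """Rebuild a physical file copy from its logical representation:
    look up each referenced block and concatenate them in order."""
    return b"".join(block_store[bid] for bid in logical_file)
```

Because duplicate blocks are stored once but can appear many times in a logical file, the restored copy may be far larger than the deduplicated storage it came from.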
As shown in FIG. 3, the present invention provides a system for backing up disaster recovery data, including: a receiving module 301, configured to receive a data file to be backed up sent by the network management system;
a splitting module 302, configured to split the data file to be backed up to obtain divided data blocks; a calculation and lookup module 303, configured to use a weak-checksum hash algorithm and a strong-checksum hash algorithm to calculate a data fingerprint value for the current data block, and to search the backed-up data files for a target data block with the same data fingerprint value;
a comparison module 304, configured to compare the target data block with the current data block byte by byte when a target data block with the same data fingerprint value is found in the backed-up data files; and a backup module 305, configured to back up the current data block according to the comparison result of the comparison module 304.
In a preferred embodiment of the present invention, the backup module 305 is further configured to store the meta information of the current data block if no target data block with the same data fingerprint value is found in the backed-up data files.
In a preferred embodiment of the present invention, the comparison module 304 is specifically configured to: when the comparison result is identical, determine that the current data block is a duplicate data block and store the logical index information of the current data block; when the comparison result differs, determine that the current data block is a new unique data block and store the meta information of the current data block.
In a preferred embodiment of the present invention, the calculation and lookup module 303 is specifically configured to first use the weak-checksum hash algorithm to calculate a first data fingerprint value for the current data block and search the backed-up data files, with the first data fingerprint value, for a target data block having the same first data fingerprint value; if such a block exists, to use the strong-checksum hash algorithm to calculate a second data fingerprint value for the current data block and search the backed-up data files for a target data block having the same second data fingerprint value.
In a preferred embodiment of the present invention, the splitting module 302 is specifically configured to split the data file to be backed up according to the number of network elements to obtain the divided data blocks.
Existing off-site disaster recovery data backup methods work on text files, comparing text content to delete duplicate data. Existing disaster recovery backup data has about an 80% data repetition rate, yet removing redundancy by text comparison does not reach an 80% deduplication rate. Moreover, such methods are helpless with binary files, which restricts the backup file formats and limits applicability. The disaster recovery data backup solution provided by the present invention applies to backup files of any format, such as text files and binary database files. Because redundant data is removed by comparing binary data blocks of a few KB, the deduplication rate is high and can approach 80%; furthermore, the more frequently data is backed up and the shorter the interval, the higher the deduplication rate.

In the prior art, data file splitting either uses a single fixed chunking strategy, or pre-computes block boundary features according to the file content type before chunking; applied to network management system configuration data, however, the achieved deduplication rate is not high. The disaster recovery data backup solution provided by the present invention, targeting the specific file format of network-management configuration data files and the scenario of periodic configuration data backup, determines the data block size according to the number of network elements, effectively improving the deduplication rate of the backup data.
The prior art uses a single hash algorithm to calculate data fingerprints. The disaster recovery data backup method provided by the present invention combines a weak-checksum hash algorithm with a strong-checksum hash algorithm to calculate data fingerprints: computation is fast, and the collision probability is greatly reduced at a small performance cost, improving system performance. In addition, combining the weak- and strong-checksum hash algorithms allows deduplication to be performed in real time: after receiving the data file from the network management system, the backup system can deduplicate online and convert the file into a local logical file for storage, without waiting for the system to become idle for offline processing. The entire processing cycle is short, which makes it possible to handle urgent off-site backup operations at very short intervals.
Existing disaster recovery data backup methods merely adopt a hash algorithm with a smaller collision probability when deleting duplicate data, without solving the hash collision problem, so they cannot be used for off-site disaster recovery data backup of network management systems, where a single collision would cause enormous economic loss. The disaster recovery data backup solution provided by the present invention solves the collision problem by traversing all data blocks with the same data fingerprint value and performing a complete byte comparison, greatly improving the data security of the network management system and making the solution applicable to off-site disaster recovery backup of network management system configuration data, a scenario with very high data security requirements.
In summary, the disaster recovery data backup solution provided by the present invention applies to various data file formats and has strong applicability; it effectively controls the rapid growth of backup data, increasing effective storage space and improving storage efficiency, thereby saving total storage and management costs; backup files can be compressed, saving network bandwidth for data transmission; and space, power supply, cooling, and other operation and maintenance costs can be saved.
The above description shows and describes a preferred embodiment of the present invention, but, as stated above, it should be understood that the present invention is not limited to the form disclosed herein and should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications, and environments, and can be altered within the scope of the inventive concept described herein through the above teachings or through skill or knowledge in the related art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the present invention shall fall within the protection scope of the appended claims of the present invention.

Claims

1. A method for backing up disaster recovery data, comprising:
receiving a data file to be backed up sent by a network management system;
splitting the data file to be backed up to obtain divided data blocks;
using a weak-checksum hash algorithm and a strong-checksum hash algorithm, calculating a data fingerprint value for the current data block, and searching backed-up data files for a target data block with the same data fingerprint value;
如果有, 则将所述目标数据块与所述当前数据块进行逐字节比较; 根据比较结果进行所述当前数据块的备份。  If yes, the target data block is compared byte by byte with the current data block; and the current data block is backed up according to the comparison result.
2. The method according to claim 1, wherein calculating the data fingerprint value for the current data block by using the weak check value hash algorithm and the strong check value hash algorithm, and searching the backed-up data files for a target data block having the same data fingerprint value, comprises:
calculating a first data fingerprint value for the current data block by using the weak check value hash algorithm, and searching the backed-up data files by the first data fingerprint value for a target data block having the same first data fingerprint value; and if such a target data block exists, calculating a second data fingerprint value for the current data block by using the strong check value hash algorithm, and searching the backed-up data files for a target data block having the same second data fingerprint value.
3. The method according to claim 1, wherein backing up the current data block according to the comparison result comprises:
when the comparison result is identical, determining that the current data block is a duplicate data block, and storing logical index information of the current data block;
when the comparison result is different, determining that the current data block is a new unique data block, and storing meta information of the current data block.
4. The method according to claim 1, further comprising: if no target data block having the same data fingerprint value is found in the backed-up data files, storing the meta information of the current data block.
5. The method according to any one of claims 1 to 4, wherein dividing the data file to be backed up comprises: dividing the data file to be backed up according to the number of network elements.
6. The method according to claim 3 or 4, wherein the meta information of the current data block comprises: the current data block, the logical index information of the current data block, and the weak check value and the strong check value of the current data block.
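The deduplication flow defined by claims 1 to 6 — divide the file into blocks, filter with a cheap weak fingerprint, confirm with a strong fingerprint, and verify byte by byte before deciding whether to store the block itself or only its logical index — can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the fixed 4 KiB block size (the patent divides according to the number of network elements), the use of `zlib.adler32` as the weak check value, `hashlib.sha256` as the strong check value, and the in-memory index are all assumptions.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size for illustration

# index of already backed-up blocks: weak fingerprint -> [(strong fingerprint, block bytes)]
backed_up_index = {}
# logical index info: block number -> (weak, strong) fingerprints of the stored copy
logical_index = {}

def backup_file(data: bytes):
    """Back up a file block by block, storing only unique blocks."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    stored, duplicates = 0, 0
    for n, block in enumerate(blocks):
        weak = zlib.adler32(block)               # first (weak) data fingerprint
        candidates = backed_up_index.get(weak, [])
        strong = None
        match = False
        if candidates:                           # weak match: compute the strong fingerprint
            strong = hashlib.sha256(block).hexdigest()
            for cand_strong, cand_bytes in candidates:
                # byte-by-byte comparison guards against residual hash collisions
                if cand_strong == strong and cand_bytes == block:
                    match = True
                    break
        if match:                                # duplicate block: keep only logical index info
            logical_index[n] = (weak, strong)
            duplicates += 1
        else:                                    # new unique block: store its meta information
            strong = strong or hashlib.sha256(block).hexdigest()
            backed_up_index.setdefault(weak, []).append((strong, block))
            logical_index[n] = (weak, strong)
            stored += 1
    return stored, duplicates
```

In this two-tier scheme the weak hash rejects almost all non-matching blocks at minimal CPU cost, while the strong hash plus byte-wise comparison ensures a block is only treated as a duplicate when it really is one.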
7. A system for disaster tolerance data backup, comprising:
a receiving module, configured to receive a data file to be backed up sent by a network management system;
a dividing module, configured to divide the data file to be backed up to obtain divided data blocks;
a calculating and searching module, configured to calculate a data fingerprint value for the current data block by using a weak check value hash algorithm and a strong check value hash algorithm, and to search the backed-up data files for a target data block having the same data fingerprint value;
a comparing module, configured to compare the target data block with the current data block byte by byte when a target data block having the same data fingerprint value is found in the backed-up data files; and a backup module, configured to back up the current data block according to the comparison result of the comparing module.
8. The system according to claim 7, wherein the calculating and searching module is configured to: calculate a first data fingerprint value for the current data block by using the weak check value hash algorithm, and search the backed-up data files by the first data fingerprint value for a target data block having the same first data fingerprint value; and if such a target data block exists, calculate a second data fingerprint value for the current data block by using the strong check value hash algorithm, and search the backed-up data files for a target data block having the same second data fingerprint value.
9. The system according to claim 7, wherein the comparing module is configured to:
when the comparison result is identical, determine that the current data block is a duplicate data block, and store the logical index information of the current data block;
when the comparison result is different, determine that the current data block is a new unique data block, and store the meta information of the current data block.
10. The system according to claim 7, wherein the backup module is further configured to: store the meta information of the current data block if no target data block having the same data fingerprint value is found in the backed-up data files.
11. The system according to claim 9 or 10, wherein the meta information of the current data block comprises: the current data block, the logical index information of the current data block, and the weak check value and the strong check value of the current data block.
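Claims 7 to 11 recast the same method as cooperating modules. A minimal object-oriented sketch of that decomposition follows; the class names, the choice of Adler-32 and SHA-256 as the weak and strong check values, and the return values are illustrative assumptions, not details taken from the patent.

```python
import hashlib
import zlib

class ComputeAndSearchModule:
    """Computes weak/strong fingerprints and searches the backed-up index (claims 7-8)."""
    def __init__(self):
        self.index = {}  # weak check value -> [(strong check value, block bytes)]

    def find_target(self, block: bytes):
        weak = zlib.adler32(block)
        if weak not in self.index:            # no weak match: no candidate target block
            return weak, None, None
        strong = hashlib.sha256(block).hexdigest()
        for cand_strong, cand_bytes in self.index[weak]:
            if cand_strong == strong:         # strong match: return the candidate for comparison
                return weak, strong, cand_bytes
        return weak, strong, None

class BackupModule:
    """Stores meta info for unique blocks and logical index info for duplicates (claims 9-11)."""
    def __init__(self, searcher: ComputeAndSearchModule):
        self.searcher = searcher
        self.meta = []       # (block, logical index, weak, strong) for each unique block
        self.logical = []    # logical index entries for duplicate blocks

    def backup_block(self, n: int, block: bytes) -> str:
        weak, strong, target = self.searcher.find_target(block)
        # comparing module: byte-by-byte check of the target block against the current block
        if target is not None and target == block:
            self.logical.append((n, weak, strong))
            return "duplicate"
        strong = strong or hashlib.sha256(block).hexdigest()
        self.searcher.index.setdefault(weak, []).append((strong, block))
        self.meta.append((block, n, weak, strong))
        return "unique"
```

Splitting fingerprint lookup from the store decision mirrors the claimed module boundaries: the search module owns the fingerprint index, while the backup module decides between storing a block's full meta information and storing only a logical index entry.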
PCT/CN2011/073780 2010-11-17 2011-05-06 Disaster tolerance data backup method and system WO2012065408A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010548146.1 2010-11-17
CN201010548146.1A CN101989929B (en) 2010-11-17 2010-11-17 Disaster recovery data backup method and system

Publications (1)

Publication Number Publication Date
WO2012065408A1 true WO2012065408A1 (en) 2012-05-24

Family

ID=43746287

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/073780 WO2012065408A1 (en) 2010-11-17 2011-05-06 Disaster tolerance data backup method and system

Country Status (2)

Country Link
CN (1) CN101989929B (en)
WO (1) WO2012065408A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI588670B (en) * 2016-05-25 2017-06-21 精品科技股份有限公司 System and method for segment backup

Families Citing this family (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101989929B (en) * 2010-11-17 2014-07-02 中兴通讯股份有限公司 Disaster recovery data backup method and system
CN102156727A (en) * 2011-04-01 2011-08-17 华中科技大学 Method for deleting repeated data by using double-fingerprint hash check
CN102184198B (en) * 2011-04-22 2016-04-27 张伟 Be applicable to the data de-duplication method of operating load protection system
CN102799598A (en) * 2011-05-25 2012-11-28 英业达股份有限公司 Data recovery method for deleting repeated data
CN102202098A (en) * 2011-05-25 2011-09-28 成都市华为赛门铁克科技有限公司 Data processing method and device
CN102541685A (en) * 2011-11-16 2012-07-04 中标软件有限公司 Linux system backup method and Linux system repair method
CN103428242B (en) * 2012-05-18 2016-12-14 阿里巴巴集团控股有限公司 A kind of method of increment synchronization, Apparatus and system
CN103713963B (en) * 2012-09-29 2017-06-23 南京壹进制信息技术股份有限公司 A kind of efficient file backup and restoration methods
CN103034564B (en) * 2012-12-05 2016-06-15 华为技术有限公司 Data disaster tolerance drilling method, data disaster tolerance practice device and system
CN103269352A (en) * 2012-12-07 2013-08-28 北京奇虎科技有限公司 Point-to-point (P2P) file downloading method and device
CN103269351A (en) * 2012-12-07 2013-08-28 北京奇虎科技有限公司 File download method and device
CN103259729B (en) * 2012-12-10 2018-03-02 上海德拓信息技术股份有限公司 Network data compaction transmission method based on zero collision hash algorithm
CN103365745A (en) * 2013-06-07 2013-10-23 上海爱数软件有限公司 Block level backup method based on content-addressed storage and system
CN103399853A (en) * 2013-06-28 2013-11-20 苏州海客科技有限公司 Method for selecting file cutting granularity
CN103473278A (en) * 2013-08-28 2013-12-25 苏州天永备网络科技有限公司 Repeating data processing technology
CN104750743A (en) * 2013-12-31 2015-07-01 中国银联股份有限公司 System and method for ticking and rechecking transaction files
CN103744939B (en) * 2013-12-31 2017-07-14 华为技术有限公司 A kind of recording method of daily record, the restoration methods and log manager of daily record
CN103795783A (en) * 2014-01-14 2014-05-14 上海上讯信息技术股份有限公司 Data synchronization method and system
CN103942125A (en) * 2014-05-06 2014-07-23 南宁博大全讯科技有限公司 Automatic backup method and system
CN103970852A (en) * 2014-05-06 2014-08-06 浪潮电子信息产业股份有限公司 Data de-duplication method of backup server
CN104268034B (en) * 2014-10-09 2017-11-07 中国人民解放军国防科学技术大学 A kind of data back up method and device and data reconstruction method and device
CN104375905A (en) * 2014-11-07 2015-02-25 北京云巢动脉科技有限公司 Incremental backing up method and system based on data block
CN104484402B (en) * 2014-12-15 2018-02-09 新华三技术有限公司 A kind of method and device of deleting duplicated data
CN106934293B (en) * 2015-12-29 2020-04-24 航天信息股份有限公司 Collision calculation device and method for digital abstract
CN107346271A (en) * 2016-05-05 2017-11-14 华为技术有限公司 The method and calamity of Backup Data block are for end equipment
CN106326035A (en) * 2016-08-13 2017-01-11 南京叱咤信息科技有限公司 File-metadata-based incremental backup method
CN106817419B (en) * 2017-01-19 2020-06-30 四川奥诚科技有限责任公司 VoLTE AS network element-based data extraction and analysis method and device and service terminal
CN106802841B (en) * 2017-01-19 2020-06-09 四川奥诚科技有限责任公司 Data extraction and analysis method and device and server
CN107704342A (en) * 2017-09-26 2018-02-16 郑州云海信息技术有限公司 A kind of snap copy method, system, device and readable storage medium storing program for executing
CN107729766B (en) * 2017-09-30 2020-02-07 中国联合网络通信集团有限公司 Data storage method, data reading method and system thereof
CN108090355B (en) * 2017-11-28 2020-10-27 西安交通大学 APK automatic triggering tool
CN108089949A (en) * 2017-12-29 2018-05-29 广州创慧信息科技有限公司 A kind of method and system of automatic duplicating of data
CN108304503A (en) * 2018-01-18 2018-07-20 阿里巴巴集团控股有限公司 A kind of processing method of data, device and equipment
WO2020232591A1 (en) * 2019-05-19 2020-11-26 深圳齐心集团股份有限公司 Stationery information distributed planning system based on big data
CN110692047A (en) * 2019-05-19 2020-01-14 深圳齐心集团股份有限公司 Stationery information scheduling system based on big data
CN110618790B (en) * 2019-09-06 2023-04-28 上海电力大学 Mist storage data redundancy elimination method based on repeated data deletion
CN113254262B (en) * 2020-02-13 2023-09-05 中国移动通信集团广东有限公司 Database disaster recovery method and device and electronic equipment
CN112202910B (en) * 2020-10-10 2021-10-08 上海威固信息技术股份有限公司 Computer distributed storage system
CN114691430A (en) * 2022-04-24 2022-07-01 北京科技大学 Incremental backup method and system for CAD (computer-aided design) engineering data files

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216791A (en) * 2008-01-04 2008-07-09 华中科技大学 File backup method based on fingerprint
CN101706825A (en) * 2009-12-10 2010-05-12 华中科技大学 Replicated data deleting method based on file content types
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services
CN101989929A (en) * 2010-11-17 2011-03-23 中兴通讯股份有限公司 Disaster recovery data backup method and system


Also Published As

Publication number Publication date
CN101989929A (en) 2011-03-23
CN101989929B (en) 2014-07-02

Similar Documents

Publication Publication Date Title
WO2012065408A1 (en) Disaster tolerance data backup method and system
CN109871366B (en) Block chain fragment storage and query method based on erasure codes
CN104932956B (en) A kind of cloud disaster-tolerant backup method towards big data
US9639289B2 (en) Systems and methods for retaining and using data block signatures in data protection operations
US9251160B1 (en) Data transfer between dissimilar deduplication systems
EP2256934B1 (en) Method and apparatus for content-aware and adaptive deduplication
WO2017096532A1 (en) Data storage method and apparatus
US8543555B2 (en) Dictionary for data deduplication
CN102222085B (en) Data de-duplication method based on combination of similarity and locality
CN105069111B (en) Block level data duplicate removal method based on similitude in cloud storage
US11182256B2 (en) Backup item metadata including range information
US7680998B1 (en) Journaled data backup during server quiescence or unavailability
US9002800B1 (en) Archive and backup virtualization
CN109445702B (en) block-level data deduplication storage system
JP2013541083A (en) System and method for scalable reference management in a storage system based on deduplication
WO2017020576A1 (en) Method and apparatus for file compaction in key-value storage system
CN105487942A (en) Backup and remote copy method based on data deduplication
CN108415671B (en) Method and system for deleting repeated data facing green cloud computing
Sun et al. Data backup and recovery based on data de-duplication
CN102722450B (en) Storage method for redundancy deletion block device based on location-sensitive hash
US7949630B1 (en) Storage of data addresses with hashes in backup systems
US20140258237A1 (en) Handling restores in an incremental backup storage system
CN112416879B (en) NTFS file system-based block-level data deduplication method
US10678754B1 (en) Per-tenant deduplication for shared storage
US20210240350A1 (en) Method, device, and computer program product for recovering based on reverse differential recovery

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11841034

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11841034

Country of ref document: EP

Kind code of ref document: A1