WO2012065408A1 - Disaster tolerance data backup method and system - Google Patents


Info

Publication number
WO2012065408A1
WO2012065408A1 · PCT/CN2011/073780
Authority
WO
WIPO (PCT)
Prior art keywords
data block
data
backed
block
current data
Prior art date
Application number
PCT/CN2011/073780
Other languages
French (fr)
Chinese (zh)
Inventor
赵巍 (Zhao Wei)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2012065408A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/085 Retrieval of network configuration; tracking network configuration history
    • H04L 41/0853 Retrieval of network configuration; tracking network configuration history by actively collecting configuration information or by backing up configuration information
    • H04L 41/0856 Retrieval of network configuration; tracking network configuration history by actively collecting configuration information or by backing up configuration information, by backing up or archiving configuration information

Definitions

  • The invention relates to network management systems, and particularly to a method and system for off-site disaster recovery data backup of a network management system based on deduplication.

Background
  • The network management system is a system for managing the network elements (NEs) of a communication network and holds the configuration data of every NE in the network. This configuration data is critical: if it is lost, the NEs cannot run their services. For disaster tolerance, the configuration data must be backed up off-site. If the network management system is damaged by an earthquake, fire, or similar event, the off-site backup of the configuration data can be restored so that the NEs keep running their services. In general, off-site backup of configuration data is required once a day.
  • One existing off-site backup technique for disaster recovery data simply exports the configuration data to a file named by date and copies the file to a remote backup system, but this causes data redundancy.
  • the present invention provides a method and system for disaster recovery data backup.
  • A method for backing up disaster recovery data includes: receiving a data file to be backed up sent by the network management system; splitting the data file to obtain data blocks; calculating a data fingerprint for the current data block using a weak-check-value hash algorithm and a strong-check-value hash algorithm, and searching the backed-up data for a target data block with the same fingerprint; if one exists, comparing the target data block with the current data block byte by byte; and backing up the current data block according to the comparison result.
  • Calculating the data fingerprint with the two algorithms and searching for a target data block includes: calculating a first data fingerprint for the current data block with the weak-check-value hash algorithm and searching the backed-up data for a target block with the same first fingerprint; if one is found, calculating a second data fingerprint with the strong-check-value hash algorithm and searching the backed-up data for a target block with the same second fingerprint.
  • Performing the backup of the current data block according to the comparison result includes: when the comparison result is identical, determining that the current data block is a duplicate data block and storing its logical index information; when the comparison result differs, determining that the current data block is a new unique data block and storing its meta information.
  • The method further includes: if no target data block with the same data fingerprint is found in the backed-up data, storing the meta information of the current data block.
  • Splitting the data file to be backed up means splitting it according to the number of network elements.
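The claimed flow can be sketched in Python. This is an illustrative sketch only, not the patent's implementation: CRC32 and MD5 stand in for the weak and strong check-value hash algorithms named later in the text, and every function and variable name here is an assumption. (For brevity the sketch computes both fingerprints up front, whereas the text computes the strong value only after a weak match.)

```python
import hashlib
import zlib

crc_index = {}   # weak fingerprint (CRC32) -> list of (md5, block_id, block)
next_id = 0      # logical index numbers are assigned sequentially

def backup_block(block: bytes):
    """Back up one block; return (logical index number, is_duplicate)."""
    global next_id
    crc = zlib.crc32(block)               # weak check value, 32 bits
    md5 = hashlib.md5(block).digest()     # strong check value, 128 bits
    for cand_md5, block_id, stored in crc_index.get(crc, []):
        # Final byte-by-byte comparison guards against hash collisions.
        if cand_md5 == md5 and stored == block:
            return block_id, True         # duplicate: store only the index
    next_id += 1
    crc_index.setdefault(crc, []).append((md5, next_id, block))
    return next_id, False                 # new unique block: store meta info

def backup_file(data: bytes, block_size: int) -> list:
    """Split a file and back up each block; return the logical file (id list)."""
    return [backup_block(data[i:i + block_size])[0]
            for i in range(0, len(data), block_size)]
```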
  • a system for backing up disaster recovery data including:
  • a receiving module configured to receive a data file to be backed up sent by the network management system
  • a segmentation module configured to segment the data file to be backed up to obtain a divided data block
  • a calculation and search module, configured to calculate the data fingerprint of the current data block using the weak-check-value and strong-check-value hash algorithms, and to search the backed-up data for a target data block with the same fingerprint;
  • a comparison module, configured to compare the target data block with the current data block byte by byte when a target block with the same fingerprint is found in the backed-up data; and a backup module, configured to back up the current data block according to the comparison result of the comparison module.
  • The calculation and search module is specifically configured to: calculate a first data fingerprint for the current data block with the weak-check-value hash algorithm and search the backed-up data for a target block with the same first fingerprint; if one is found, calculate a second data fingerprint with the strong-check-value hash algorithm and search the backed-up data for a target block with the same second fingerprint.
  • The comparison module is specifically configured to: when the comparison result is identical, determine that the current data block is a duplicate data block and store its logical index information; when the comparison result differs, determine that the current data block is a new unique data block and store its meta information.
  • the backup module is further configured to: if the target data block having the same data fingerprint value is not found in the backed up data file, store the meta information of the current data block.
  • The meta information of the current data block includes: the current data block itself, the logical index information of the current data block, and the weak and strong check values of the current data block.
  • The invention splits the received data file to be backed up, calculates the data fingerprints of the resulting data blocks with a weak-check-value hash algorithm and a strong-check-value hash algorithm, and uses the fingerprint value as the key of a hash search. After a target data block with the same fingerprint is found, the target block is compared with the current block byte by byte, and the backup is performed according to the comparison result. This allows data files of various formats to be backed up, improving the applicability of backup files. Deduplication is performed in real time, which effectively controls the rapid growth of backup data, increasing effective storage space and improving storage efficiency. Moreover, backup files can be compressed, reducing network bandwidth usage and improving system performance.

DRAWINGS
  • FIG. 1 is a schematic flowchart of a method for backing up disaster recovery data provided by the present invention
  • FIG. 2 is a detailed flow chart of a method for backing up remote disaster recovery data of a network management system provided by the present invention
  • FIG. 3 is a schematic structural diagram of a system for backing up disaster recovery data provided by the present invention.

Detailed description
  • the present invention provides a method for backing up disaster recovery data, including: Step 101: Receive a data file to be backed up sent by a network management system;
  • Step 102: Split the data file to be backed up to obtain data blocks.
  • Step 103: Using the weak-check-value hash algorithm and the strong-check-value hash algorithm, calculate the data fingerprint of the current data block, and search the backed-up data for a target data block with the same fingerprint.
  • Step 104: If one is found, compare the target data block with the current data block byte by byte.
  • Step 105: Back up the current data block according to the comparison result.
  • If no target data block with the same data fingerprint is found in the backed-up data, the meta information of the current data block is stored.
  • Performing the backup of the current data block according to the comparison result includes: when the comparison result is identical, determining that the current data block is a duplicate data block and storing its logical index information; when the comparison result differs, determining that the current data block is a new unique data block and storing its meta information.
  • Using the weak-check-value hash algorithm and the strong-check-value hash algorithm to calculate the data fingerprint of the current data block and to search the backed-up data for a target data block with the same fingerprint includes:
  • calculating a first data fingerprint for the current data block with the weak-check-value hash algorithm and using it to search the backed-up data for a target block with the same first fingerprint; if one is found, calculating a second data fingerprint for the current data block with the strong-check-value hash algorithm and searching the backed-up data for a target block with the same second fingerprint.
  • The data file to be backed up is split according to the number of network elements to obtain data blocks.
  • the meta-information includes a current data block, logical index information of the current data block, a weak check value of the current data block, and a strong check value.
  • The method for backing up the remote disaster recovery data of the network management system includes: Step 201: The network management system exports the data file to be backed up, containing the network element configuration, and transfers the file to an off-site backup system.
  • The data file can be in any format, text or binary.
  • Step 202: The backup system receives the data file to be backed up and divides it into a group of data blocks according to the number of network elements.
  • the backup data file is segmented with a pre-determined data block size.
  • the data block size can be determined according to the number of network elements.
  • For example, the configuration data file of 10,000 network elements can be around 300 MB. If the block size used to split the file is too small, the system resource overhead is too large; if the granularity is too coarse, deduplication is less effective, so the two must be traded off. Testing yields the following empirical values: with fewer than 1,000 network elements the block size can be 1 KB; between 1,000 and 5,000, 4 KB; between 5,000 and 10,000, 8 KB.
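The empirical values above can be captured in a small helper. This is a hypothetical sketch: the function names are made up, and the assignment of the boundary counts 1,000 and 5,000 to the 4 KB bucket is an assumption, since the text gives only ranges.

```python
def block_size_for(ne_count: int) -> int:
    """Empirical block sizes quoted in the text (boundary handling assumed)."""
    if ne_count < 1000:
        return 1024          # 1 KB
    if ne_count <= 5000:
        return 4 * 1024      # 4 KB
    return 8 * 1024          # 8 KB (tested up to ~10,000 NEs)

def split_file(data: bytes, ne_count: int):
    """Split a file and assign each block a unique logical index number
    (step 202); returns a list of (logical index number, block bytes)."""
    size = block_size_for(ne_count)
    return [(i + 1, data[i * size:(i + 1) * size])
            for i in range((len(data) + size - 1) // size)]
```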
  • After the data file is split, the backup system assigns each data block unique logical index information; the logical index information may be a logical index number.
  • The data fingerprint is an essential feature of a data block: each unique data block has a unique data fingerprint value, obtained by performing a hash operation on the content of the block.
  • Commonly used Hash algorithms are FNV1, CRC, MD5, SHA1, SHA-256, SHA-512, and so on.
  • Different hash algorithms have different collision probabilities (a hash collision means that different data blocks can produce the same data fingerprint), produce fingerprints with different numbers of bits, and require different amounts of computation. A hash algorithm with a lower collision probability and a longer fingerprint is much more computationally expensive.
  • the CRC Hash algorithm is a weak check value hash algorithm, which is fast.
  • the data fingerprint value calculated by the CRC Hash algorithm is 32 bits.
  • the MD5 Hash algorithm is a strong checksum hash algorithm. There is a very low probability of collision occurrence, and the calculated data fingerprint value is 128 bits.
  • Strong and weak check-value hash algorithms are usually distinguished by a 128-bit threshold: an algorithm producing fewer than 128 bits is a weak-check-value hash algorithm, and one producing 128 bits or more is a strong-check-value hash algorithm.
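The 32-bit versus 128-bit distinction can be seen directly with Python's standard library, using CRC32 as the weak check value and MD5 as the strong one. The sample block content is made up for illustration.

```python
import hashlib
import zlib

block = b"NE-0001;ip=10.0.0.1;state=up"   # hypothetical NE configuration record

weak = zlib.crc32(block)                  # CRC32: 32-bit weak check value, fast
strong = hashlib.md5(block).digest()      # MD5: 128-bit strong check value

assert weak.bit_length() <= 32            # fits in 32 bits
assert len(strong) * 8 == 128             # 16 bytes = 128 bits
```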
  • Step 203: The backup system uses the CRC hash algorithm and the MD5 hash algorithm to calculate data fingerprints for the data blocks, as follows. For each data block produced in step 202, calculate its CRC check value with the CRC hash algorithm, then perform a hash search in the backed-up data using that CRC value to determine whether a matching item with the same CRC check value exists. If not, the data block is a new unique data block and is stored.
  • If the MD5 hash algorithm generated no collisions, the MD5 check value calculated for a data block would be unique; that is, each block would correspond to a unique data fingerprint, representable as a 1:1 mapping.
  • The element items of a traditional hash table are represented by a two-tuple: <md5_hashkey, block>, where md5_hashkey is the MD5 check value of the data block and block is the data block itself.
  • If the MD5 check value calculated for data block 1 equals the MD5 check value calculated for data block 2, then multiple data blocks correspond to one data fingerprint value, and a 1:n mapping must be used.
  • A triplet is therefore used to represent an element item of the hash table: <md5_hashkey, block_nr, block_IDs>, where block_nr is the number of data blocks with the same MD5 check value and block_IDs holds the logical index numbers of those data blocks.
  • block_nr and block_IDs are combined into a linked list with the following structure: block_nr | block_ID1 | block_ID2 | ... | block_IDn
  • block_ID1 | block_ID2 | ... | block_IDn is the linked list of data block logical index numbers, referred to as the block_ID list; block_nr is the length of the list.
  • The linked-list method is in fact used to resolve the hash collision problem, and the block_ID list length of each hash table element is not fixed. The linked-list method is used to find the target data block with the same MD5 check value in the backed-up data as follows:
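A minimal sketch of this structure, assuming a Python dict keyed by the MD5 value whose entries hold the block_ID list (block_nr is simply the list length), with the byte comparison resolving collisions. All names are illustrative assumptions.

```python
import hashlib

md5_table = {}      # md5_hashkey -> block_ID list (block_nr == len(list))
blocks_by_id = {}   # logical index number -> stored block bytes

def index_block(block_id: int, block: bytes) -> None:
    """Add a stored block to the MD5 hash table."""
    key = hashlib.md5(block).digest()
    md5_table.setdefault(key, []).append(block_id)
    blocks_by_id[block_id] = block

def find_duplicate(block: bytes):
    """Traverse the block_ID list of the matching entry and compare byte by
    byte, so an MD5 collision can never cause a false duplicate."""
    for block_id in md5_table.get(hashlib.md5(block).digest(), []):
        if blocks_by_id[block_id] == block:
            return block_id
    return None
```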
  • After a fingerprint match, the data block may be a duplicate data block, or the CRC hash algorithm or MD5 hash algorithm may have collided and the data block may not actually be a duplicate.
  • In this embodiment, the first data fingerprint of the current block is calculated with the weak-check-value algorithm and looked up first, and then the second data fingerprint is calculated with the strong-check-value algorithm and looked up. In practice the order can also be reversed: calculate and look up the strong fingerprint first, then the weak one. The principle is similar and is not repeated here.
  • Step 204: The backup system compares the retrieved target data block with the current data block byte by byte. If they are identical, go to step 205; if they differ, go to step 206.
  • Step 205: Determine that the current data block is a duplicate data block, and store the logical index number of the current data block.
  • Step 206: Determine that the current data block is a new unique block, and store the meta information of the current data block, which includes: the current data block, the logical index number of the current data block, the CRC check value, and the MD5 check value.
  • In step 203, after a match is found, the block_ID list of the matching element item is traversed, and the target data block corresponding to each logical index number in the block_ID list is compared byte by byte with the current data block.
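Steps 204 to 206 can be sketched as a single decision helper. This is a hypothetical helper: the return convention and all names are assumptions, not the patent's API.

```python
import hashlib
import zlib

def back_up_current(current, block_id, targets, stored):
    """'targets' is the block_ID list from the matching hash entry;
    'stored' maps logical index number -> already backed-up block bytes."""
    for target_id in targets:                 # traverse the block_ID list
        if stored[target_id] == current:      # step 204: byte-by-byte compare
            return ("index", target_id)       # step 205: store only the index
    # step 206: new unique block -> store the full meta information
    meta = {"block": current,
            "index": block_id,
            "crc": zlib.crc32(current),
            "md5": hashlib.md5(current).hexdigest()}
    stored[block_id] = current
    return ("meta", meta)
```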
  • the data storage of the present invention can adopt the RAID 5 mode.
  • In the backup system, a data file is represented by a logical file, which consists of meta information: a set of data fingerprints.
  • After the backup of the NE configuration data file is complete, when recovery is required the backup system performs a file read: it first reads the logical file, then retrieves the corresponding data block for each data block fingerprint, restores a copy of the physical file, and sends the copy to the network management system for recovery.
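The recovery path described above amounts to resolving each block reference of the logical file back to its stored block and concatenating the results. A minimal sketch, assuming an in-memory map from logical index number to block bytes:

```python
def restore(logical_file, blocks_by_id):
    """Rebuild the physical file copy from the logical file, an ordered
    list of logical index numbers referencing deduplicated blocks."""
    return b"".join(blocks_by_id[block_id] for block_id in logical_file)
```

Note that duplicate blocks appear once in storage but may be referenced many times in the logical file, which is where the storage saving comes from.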
  • the present invention provides a system for backing up disaster recovery data, including: a receiving module 301, configured to receive a data file to be backed up sent by the network management system;
  • the segmentation module 302 is configured to divide the data file to be backed up to obtain a divided data block
  • The calculation and search module 303 is configured to calculate the data fingerprint of the current data block using the weak-check-value and strong-check-value hash algorithms, and to search the backed-up data for a target data block with the same fingerprint;
  • the comparing module 304 is configured to compare the target data block with the current data block byte by byte when the target data block having the same data fingerprint value is found in the backed up data file;
  • the backup module 305 is configured to perform backup of the current data block according to the comparison result of the comparison module 304.
  • the backup module 305 is further configured to store meta information of the current data block if the target data block having the same data fingerprint value is not found in the backed up data file.
  • the comparing module 304 is specifically configured to: when the comparison result is the same, determine that the current data block is a duplicate data block, and store logical index information of the current data block; when the comparison result is different, determine The current data block is a new unique data block that stores the meta information of the current data block.
  • The calculation and search module 303 is specifically configured to: first calculate a first data fingerprint for the current data block with the weak-check-value hash algorithm and search the backed-up data for a target block with the same first fingerprint; if one is found, calculate a second data fingerprint with the strong-check-value hash algorithm and search the backed-up data for a target block with the same second fingerprint.
  • the segmentation module 302 is specifically configured to segment the data file to be backed up according to the number of network elements to obtain a divided data block.
  • An existing remote disaster recovery data backup method works on text files: it compares text content and deletes the duplicate data.
  • Existing disaster recovery data has about an 80% data repetition rate, but the text-comparison redundancy-removal method does not reach an 80% deduplication rate.
  • It also cannot handle binary files, for example, which limits the format of the backup file and gives poor applicability.
  • The disaster recovery data backup solution provided by the present invention can be applied to backup files in various formats, such as text files and binary database files. Since redundant data is deleted based on comparing binary data blocks of a few KB, the deduplication rate is high, close to 80%.
  • Existing data file splitting methods either use a single fixed-block strategy or compute block boundary features in advance from the file content type before splitting; in practice, with network management configuration data, their deduplication rate is not high.
  • The solution provided by the present invention, targeting the specific file format of the configuration data exported by the network management system and its periodic backup, determines the data block size from the number of network elements, effectively improving the deduplication rate of the backup data.
  • Existing methods use a single hash algorithm to calculate data fingerprints. The disaster recovery backup method described here combines a weak-check-value hash algorithm with a strong-check-value hash algorithm: calculation is fast, the collision probability is greatly reduced at a low performance cost, and system performance is improved.
  • The combination of the weak-check-value and strong-check-value hash algorithms allows duplicate data to be deleted in real time. After receiving a data file from the network management system, the backup system can deduplicate it online and store it as local logical files, without waiting for the system to be idle to process it offline. The whole processing cycle is short and can cope with urgent, short-interval off-site backup operations.
  • Existing disaster recovery backup methods only choose a hash algorithm with a smaller collision probability when deleting duplicate data and do not actually solve the hash collision problem, so they cannot be used for off-site disaster recovery backup of network management data, where a collision would cause huge economic losses.
  • The solution provided by the present invention solves the collision problem by traversing all data blocks with the same data fingerprint value and performing a complete byte comparison. Data security is thus greatly improved, and the solution can be applied to off-site disaster recovery backup of network management configuration data, which demands very high data security.
  • The disaster recovery data backup solution provided by the present invention can be applied to data files of various formats and has strong applicability; it effectively controls the rapid growth of backup data, thereby increasing effective storage space, improving storage efficiency, and saving network bandwidth.


Abstract

The present invention discloses a disaster tolerance data backup method and system, and belongs to the field of network management. The method includes: receiving a data file to be backed up, which is transmitted by a network management system (101); segmenting the data file to be backed up to obtain segmented data blocks (102); by utilizing a weak check value hash algorithm and a strong check value hash algorithm, calculating the data fingerprint value of the current data block and searching whether a target data block with the same data fingerprint value is in the data file which has been backed up (103); if yes, comparing the target data block with the current data block byte by byte (104); backing up the current data block according to the comparison result (105). The system includes a reception module, a segmentation module, a calculation and search module, a comparison module and a backup module. The technical solution of the present invention can improve the applicability of a data backup file, reduce the occupation of storage space and improve system performance.

Description

容灾数据 * 的方法及系统 技术领域  Method and system for disaster tolerance data * Technical field
本发明属于网络管理系统, 特别涉及一种基于重复数据删除的网管系 统异地容灾数据备份的方法及系统。 背景技术  The invention belongs to a network management system, and particularly relates to a method and system for backing up disaster recovery data of a network management system based on deduplication. Background technique
网管系统是管理通讯网网元的系统, 配置了全网网元的配置数据, 这 些网元配置数据非常重要, 如果没有这些配置数据, 网元就不能正常运行 业务。 基于容灾的考虑, 配置数据需要在异地备份起来。 一旦网管系统遭 受地震、 火灾等而被损坏, 则可以将异地备份的配置数据恢复过来, 以保 证网元能正常运行业务。 一般而言, 配置数据的异地备份要求每天备份一 次。  The network management system is a system for managing the network elements of the communication network. The configuration data of the network elements of the entire network is configured. The configuration data of these network elements is very important. If the configuration data is not available, the network element cannot run the service normally. Based on disaster tolerance considerations, configuration data needs to be backed up offsite. Once the network management system is damaged by earthquakes, fires, etc., you can restore the configuration data of the offsite backup to ensure that the network element can run the service normally. In general, offsite backups of configuration data require a backup once a day.
现有的一种容灾数据的异地备份技术只是简单的将配置数据导出成文 件, 文件按照日期来命名, 然后将文件拷贝到远程备份系统中, 但这样做 会产生数据冗余的问题。 也有其他网管系统考虑到了备份数据的冗余数据 的处理, 具体处理方法如下:  The existing off-site backup technology for disaster recovery data simply exports the configuration data to a file, the file is named by date, and then the file is copied to the remote backup system, but this will cause data redundancy. There are also other network management systems that take into account the processing of redundant data for backup data. The specific processing methods are as follows:
将配置数据导出, 生成文本文件, 用以记录每个网元具体的配置数据。 将该生成的文本文件拷贝到远程备份系统时, 备份系统会将今天的配置数 据与昨天保存的配置数据对比, 提取出有变化的网元的配置数据, 保存到 今天的备份文件中, 未发生变化的网元的配置数据则不保存。  Export the configuration data to generate a text file to record the specific configuration data of each network element. When the generated text file is copied to the remote backup system, the backup system compares the current configuration data with the configuration data saved yesterday, and extracts the configuration data of the changed network element and saves it to the current backup file. The configuration data of the changed network element is not saved.
这种做法存在明显的缺陷: 对网管系统备份的文件有严格要求, 网管 系统和备份系统要遵守同样的文件格式规定, 才能实现数据备份及以后的 恢复, 不能适用到所有网管系统, 适用性差; 此外, 还要求备份文件必须 是文本文件, 而文本文件不能压缩, 占用存储空间大, 而传输未压缩的文 本文件, 占用网络带宽, 对系统资源消耗大, 影响系统性能。 发明内容 There are obvious defects in this approach: There are strict requirements for the files backed up by the network management system. The network management system and the backup system must comply with the same file format requirements in order to achieve data backup and subsequent recovery. It cannot be applied to all network management systems, and the applicability is poor. In addition, it is required that the backup file must be a text file, and the text file cannot be compressed, occupying a large storage space, and transmitting uncompressed text. This document occupies network bandwidth and consumes a lot of system resources, which affects system performance. Summary of the invention
为了提高数据备份文件的适用性, 减少存储空间占用, 提高系统性能, 本发明提供了一种容灾数据备份的方法及系统。  In order to improve the applicability of data backup files, reduce storage space occupation, and improve system performance, the present invention provides a method and system for disaster recovery data backup.
为解决上述技术问题, 本发明的技术方案是这样实现的:  In order to solve the above technical problem, the technical solution of the present invention is implemented as follows:
一种容灾数据备份的方法, 包括: 接收网管系统发送的待备份数据文 件; 对所述待备份数据文件进行分割, 得到分割的数据块; 利用弱校验值 哈希算法和强校验值哈希算法, 针对当前数据块计算其数据指紋值, 并在 已备份数据文件中查找是否有相同数据指紋值的目标数据块; 如果有, 则 将所述目标数据块与所述当前数据块进行逐字节比较; 根据比较结果进行 所述当前数据块的备份。  A method for backing up disaster recovery data includes: receiving a data file to be backed up sent by the network management system; dividing the data file to be backed up to obtain a divided data block; using a weak check value hash algorithm and a strong check value a hash algorithm, calculating a data fingerprint value for the current data block, and searching for a target data block having the same data fingerprint value in the backed up data file; if yes, performing the target data block with the current data block Byte-by-byte comparison; the backup of the current data block is performed according to the comparison result.
所述利用弱校验值哈希算法和强校验值哈希算法, 针对当前数据块计 算其数据指紋值, 并在已备份数据文件中查找是否有相同数据指紋值的目 标数据块, 包括: 利用弱校验值哈希算法针对当前数据块计算其第一数据 指紋值, 并以所述第一数据指紋值在所述已备份数据文件中查找是否有相 同所述第一数据指紋值的目标数据块, 如果有, 则利用强校验值哈希算法 针对所述当前数据块计算其第二数据指紋值, 并在所述已备份数据文件中 查找相同所述第二数据指紋值的目标数据块。  The weak check value hash algorithm and the strong check value hash algorithm are used to calculate a data fingerprint value for the current data block, and to find a target data block having the same data fingerprint value in the backed up data file, including: Calculating a first data fingerprint value for the current data block by using a weak parity value hash algorithm, and searching for the target of the first data fingerprint value in the backed up data file by using the first data fingerprint value a data block, if yes, calculating a second data fingerprint value for the current data block by using a strong check value hash algorithm, and searching for the target data of the same second data fingerprint value in the backed up data file Piece.
所述根据比较结果进行所述当前数据块的备份, 包括: 当比较结果相 同时, 则确定所述当前数据块是重复数据块, 存储所述当前数据块的逻辑 索引信息; 当比较结果不同时, 则确定所述当前数据块是新的唯一数据块, 存储所述当前数据块的元信息。  The performing the backup of the current data block according to the comparison result, including: determining, when the comparison result is the same, determining that the current data block is a duplicate data block, and storing logical index information of the current data block; when the comparison result is different And determining that the current data block is a new unique data block, and storing meta information of the current data block.
The method further includes: if no target data block with the same data fingerprint value is found in the backed-up data files, storing the meta information of the current data block.
Splitting the data file to be backed up means: splitting the data file to be backed up according to the number of network elements.
A system for backing up disaster recovery data includes:
a receiving module, configured to receive a data file to be backed up sent by the network management system;
a splitting module, configured to split the data file to be backed up to obtain divided data blocks;
a calculation and lookup module, configured to use a weak-checksum hash algorithm and a strong-checksum hash algorithm to calculate a data fingerprint value for the current data block, and to search the backed-up data files for a target data block with the same data fingerprint value;
a comparison module, configured to compare the target data block with the current data block byte by byte when a target data block with the same data fingerprint value is found in the backed-up data files; and a backup module, configured to back up the current data block according to the comparison result of the comparison module.
The calculation and lookup module is specifically configured to: use the weak-checksum hash algorithm to calculate a first data fingerprint value for the current data block, and search the backed-up data files, with the first data fingerprint value, for a target data block having the same first data fingerprint value; if such a block exists, use the strong-checksum hash algorithm to calculate a second data fingerprint value for the current data block, and search the backed-up data files for a target data block having the same second data fingerprint value.
The comparison module is specifically configured to: when the comparison result is identical, determine that the current data block is a duplicate data block and store the logical index information of the current data block; when the comparison result differs, determine that the current data block is a new unique data block and store the meta information of the current data block.
The backup module is further configured to: if no target data block with the same data fingerprint value is found in the backed-up data files, store the meta information of the current data block.
The meta information of the current data block includes: the current data block, the logical index information of the current data block, and the weak and strong checksum values of the current data block. The present invention splits the received data file to be backed up, calculates data fingerprint values for the divided data blocks with a weak-checksum hash algorithm and a strong-checksum hash algorithm, performs a hash lookup with the data fingerprint value as the key, compares the target data block with the current data block byte by byte after a target data block with the same data fingerprint value is found, and backs up the data block according to the comparison result. This allows data files of any format to be backed up, improving the applicability of backup files; deduplication can be performed in real time, effectively controlling the rapid growth of backup data, thereby increasing effective storage space and improving storage efficiency; moreover, backup files can be compressed, reducing network bandwidth usage and improving system performance. DRAWINGS
The drawings described herein are provided for a further understanding of the present invention and constitute a part of the present invention; the exemplary embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of the method for backing up disaster recovery data provided by the present invention;
FIG. 2 is a detailed flowchart of the method for backing up off-site disaster recovery data of a network management system provided by the present invention;
FIG. 3 is a schematic structural diagram of the system for backing up disaster recovery data provided by the present invention. DETAILED DESCRIPTION
To make the technical problem to be solved, the technical solution, and the beneficial effects of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
As shown in FIG. 1, the present invention provides a method for backing up disaster recovery data, including: Step 101: receiving a data file to be backed up sent by the network management system;
Step 102: splitting the data file to be backed up to obtain divided data blocks;
Step 103: using a weak-checksum hash algorithm and a strong-checksum hash algorithm, calculating a data fingerprint value for the current data block, and searching the backed-up data files for a target data block with the same data fingerprint value;
Step 104: if such a block exists, comparing the target data block with the current data block byte by byte; Step 105: backing up the current data block according to the comparison result.
In a preferred embodiment of the present invention, if no target data block with the same data fingerprint value is found in the backed-up data files, the meta information of the current data block is stored.
In a preferred embodiment of the present invention, backing up the current data block according to the comparison result includes: when the comparison result is identical, determining that the current data block is a duplicate data block and storing the logical index information of the current data block; when the comparison result differs, determining that the current data block is a new unique data block and storing the meta information of the current data block.
In a preferred embodiment of the present invention, using the weak-checksum hash algorithm and the strong-checksum hash algorithm to calculate the data fingerprint value for the current data block and searching the backed-up data files for a target data block with the same data fingerprint value includes:
first using the weak-checksum hash algorithm to calculate a first data fingerprint value for the current data block, and searching the backed-up data files, with the first data fingerprint value, for a target data block having the same first data fingerprint value; if such a block exists, using the strong-checksum hash algorithm to calculate a second data fingerprint value for the current data block, and searching the backed-up data files for a target data block having the same second data fingerprint value.
In a preferred embodiment of the present invention, the data file to be backed up is split according to the number of network elements to obtain the divided data blocks.
In a preferred embodiment of the present invention, the meta information includes the current data block, the logical index information of the current data block, and the weak and strong checksum values of the current data block.
The implementation of the present invention is described in detail below with reference to the drawings.
As shown in FIG. 2, the method for backing up off-site disaster recovery data of a network management system provided by the present invention includes: Step 201: the network management system exports the data file to be backed up containing the network element configuration, and transfers the data file to the off-site backup system.
The data file may be a file of any format, either text or binary.
Step 202: the backup system receives the data file to be backed up and splits it into a group of data blocks according to the number of network elements.
Specifically, the data file to be backed up is split using a predefined data block size. The data block size can be determined according to the number of network elements; the configuration data file of 10,000 network elements may be 300 MB. If the block granularity of the split file is too fine, the system resource overhead is too large; if the granularity is too coarse, deduplication is ineffective. A trade-off is needed between the two. Testing yields the following empirical values: when the number of network elements is within 1,000, the data block size can be 1 KB; when the number of network elements is between 1,000 and 5,000, the data block size can be 4 KB; when the number of network elements is between 5,000 and 10,000, the data block size can be 8 KB.
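The empirical block-size rule above can be sketched as follows. This is only an illustration of the stated thresholds (1,000 / 5,000 / 10,000 network elements mapping to 1 / 4 / 8 KB); the function and variable names are assumptions, not part of the patent.

```python
def block_size_for(ne_count: int) -> int:
    """Pick a block size in bytes from the empirical thresholds in the text.

    Hypothetical helper; the patent only states the 1 KB / 4 KB / 8 KB rule.
    """
    if ne_count <= 1000:
        return 1024          # 1 KB
    elif ne_count <= 5000:
        return 4 * 1024      # 4 KB
    else:
        return 8 * 1024      # 8 KB


def split_file(data: bytes, ne_count: int) -> list:
    """Split the file to be backed up into fixed-size blocks (Step 202)."""
    size = block_size_for(ne_count)
    return [data[i:i + size] for i in range(0, len(data), size)]
```

Fixed-size chunking keyed to the network element count keeps the split cheap while keeping block granularity matched to the expected file size.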
After the data file to be backed up is split, the backup system assigns unique logical index information to each data block; the logical index information may be a logical index number.
Step 203: the backup system calculates a data fingerprint for the current data block and performs a hash lookup in the backed-up data files with the data fingerprint value as the key to obtain a target data block with the same data fingerprint.
Specifically, the data fingerprint is an essential characteristic of a data block: each unique data block has a unique data fingerprint value, obtained by performing a hash computation over the content of the data block. Commonly used hash algorithms include FNV1, CRC, MD5, SHA-1, SHA-256, and SHA-512. Different hash algorithms have different collision probabilities (all hash algorithms are subject to collisions, i.e., different data blocks may produce the same data fingerprint), produce fingerprint values of different bit lengths, and differ in computational cost. A hash algorithm with a lower collision probability and a longer fingerprint value is considerably more expensive to compute.
Calculating data fingerprints for data blocks requires a trade-off between performance and data safety. The CRC hash algorithm is a weak-checksum hash algorithm and is fast to compute; in this embodiment, the data fingerprint value calculated by the CRC hash algorithm is 32 bits. The MD5 hash algorithm is a strong-checksum hash algorithm with a very low collision probability; its calculated data fingerprint value is 128 bits. Strong- and weak-checksum hash algorithms are usually distinguished at 128 bits: below 128 bits an algorithm counts as a weak-checksum hash, and at 128 bits or above as a strong-checksum hash. The backup system uses the CRC hash algorithm and the MD5 hash algorithm to calculate data fingerprints for data blocks, as follows: for each data block obtained in Step 202, the CRC checksum is first calculated with the CRC hash algorithm, and a hash lookup is performed in the backed-up data files with that CRC checksum as the key to determine whether a matching entry with the same CRC checksum exists. If not, the data block is a new unique data block, and the data block, its logical index number, and its CRC and MD5 checksums are stored, where the MD5 checksum is calculated with the MD5 hash algorithm. If a match exists, the MD5 checksum of the data block is calculated with the MD5 hash algorithm, and a hash lookup is performed in the backed-up data files with that MD5 checksum to determine whether a matching entry with the same MD5 checksum exists. If one exists, a possible duplicate data block has been found, and the procedure proceeds to Step 204; if not, the data block is stored and the related meta information is created.
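The two-stage fingerprint check can be sketched as below. This is a non-authoritative illustration: Python's `zlib.crc32` stands in for the 32-bit weak-checksum CRC hash and `hashlib.md5` for the 128-bit strong-checksum hash, and the indexes are plain in-memory sets rather than the backup system's actual storage.

```python
import hashlib
import zlib


def weak_fingerprint(block: bytes) -> int:
    """32-bit CRC checksum (weak-checksum hash)."""
    return zlib.crc32(block)


def strong_fingerprint(block: bytes) -> bytes:
    """128-bit MD5 checksum (strong-checksum hash)."""
    return hashlib.md5(block).digest()


def classify(block: bytes, crc_index: set, md5_index: set) -> str:
    """Return 'new' if the block is certainly new, or 'maybe-duplicate'
    if both fingerprints match and a byte-level comparison (Step 204)
    is still required before declaring it a duplicate."""
    if weak_fingerprint(block) not in crc_index:
        return "new"                 # cheap check failed: store block + meta info
    if strong_fingerprint(block) not in md5_index:
        return "new"                 # CRC collided, MD5 did not: still a new block
    return "maybe-duplicate"         # both match: proceed to byte comparison
```

The cheap 32-bit check filters out most new blocks before the more expensive MD5 computation is ever run.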
Generally speaking, the MD5 hash algorithm does not collide: the MD5 checksum calculated for a data block is unique, i.e., one data block corresponds to one unique data fingerprint value, which can be expressed as a 1:1 mapping. An element of a traditional hash table is represented by a pair:
<md5_hashkey, block>, where md5_hashkey is the MD5 checksum of the data block. In practice, however, two data blocks may share the same MD5 checksum (for example, the MD5 checksum calculated for data block 1 may equal the MD5 checksum calculated for data block 2), so multiple data blocks correspond to one data fingerprint value, and a 1:n mapping is needed. In the present invention, a hash table element is represented by a triple:
<md5_hashkey, block_nr, block_IDs>
where md5_hashkey is the MD5 checksum of the data block, block_nr is the number of data blocks with the same MD5 checksum, and block_IDs are the logical index numbers of those data blocks. In the algorithm design of the present invention, block_nr and block_IDs are merged into a linked list with the following structure:

block_nr | block_ID1 | block_ID2 | ... | block_IDn

where block_ID1 | block_ID2 | ... | block_IDn is the linked list of data block logical index numbers, hereinafter denoted the block_ID list, and block_nr is the length of the list.
In the present invention, the chaining (linked-list) method is in fact used to resolve hash collisions; the block_ID list of each hash table element has a variable length. The chaining method is used to find the target data block with the same MD5 checksum in the backed-up data files as follows:
(1) Calculate the MD5 checksum md5_hashkey of the current data block, i.e., md5_hashkey = hash_md5(block);
(2) Look up the hash table of the backed-up data files with md5_hashkey: bindex = hash_value(md5_hashkey, hash table), where bindex denotes the logical index entry found for the current data block;
(3) If no matching element is found in the hash table, i.e., bindex == NULL, insert md5_hashkey directly into the hash table, with block_nr = 1 and block_ID1 = the logical index number of the current data block;
(4) If a matching element is found in the hash table, the data block may be a duplicate data block; alternatively, both the CRC hash algorithm and the MD5 hash algorithm may have collided, in which case the data block is not a duplicate.
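Steps (1) through (4) can be sketched with an ordinary dict keyed by md5_hashkey, whose value plays the role of the <md5_hashkey, block_nr, block_IDs> triple: a Python list provides block_IDs directly and block_nr via its length. The name hash_md5 follows the text; the surrounding structure is an assumption made for illustration.

```python
import hashlib


def hash_md5(block: bytes) -> bytes:
    """MD5 checksum of a data block, as in step (1)."""
    return hashlib.md5(block).digest()


def lookup_or_insert(hash_table: dict, block: bytes, logical_index: int):
    """Steps (1)-(4): return the block_ID list of a matching entry for
    further byte-level checking, or None after inserting a fresh entry."""
    md5_hashkey = hash_md5(block)                   # step (1)
    block_id_list = hash_table.get(md5_hashkey)     # step (2)
    if block_id_list is None:                       # step (3): bindex == NULL
        hash_table[md5_hashkey] = [logical_index]   # block_nr = 1, block_ID1 = index
        return None
    return block_id_list                            # step (4): possible duplicate
```

Returning the block_ID list, rather than a yes/no answer, lets the caller run the byte-by-byte comparison of Step 204 against every candidate in the chain.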
In this step, the weak-checksum algorithm is first used to calculate the first data fingerprint value of the current data block, followed by a fingerprint lookup; the strong-checksum algorithm is then used to calculate the second data fingerprint value, followed by another fingerprint lookup. In practice, the order may be reversed: the strong-checksum algorithm can be used first to calculate the second data fingerprint value and perform its lookup, after which the weak-checksum algorithm calculates the first data fingerprint value and performs its lookup. The principle is similar and is not repeated here.
Step 204: the backup system compares the obtained target data block with the current data block at the byte level. If the comparison result is identical, proceed to Step 205; if the comparison result differs, proceed to Step 206. Step 205: determine that the current data block is a duplicate data block, and store the logical index number of the current data block.
Step 206: determine that the current data block is a new unique block, and store the meta information of the current data block, which includes: the current data block, its logical index number, its CRC checksum, and its MD5 checksum.
For Steps 204 to 206, continuing from Step 203: after a matching entry is found, traverse the block_ID list of that entry and compare, byte by byte, the target data block corresponding to each logical index number in the block_ID list with the current data block. If an identical block is found, the current data block already exists, and the logical index number of the current data block is stored. If no identical block is found, the data block is stored: the current data block is appended to the end of the block_ID list and written to the file, the block_nr value is incremented by 1, and block_IDn is set to the logical index number of the current data block. The data storage of the present invention may use RAID 5.
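The walk over the block_ID list in Steps 204 to 206 can be sketched as follows. Here block_store is a hypothetical mapping from logical index numbers to stored block contents (the patent writes blocks to a file); comparing two bytes objects with == performs exactly the byte-by-byte comparison the text requires.

```python
def dedup_block(block: bytes, block_id_list: list, block_store: dict,
                next_index: int):
    """Steps 204-206: compare the current block against every candidate
    in block_id_list; return (is_duplicate, logical_index)."""
    for bid in block_id_list:                 # Step 204: byte-by-byte comparison
        if block_store[bid] == block:
            return True, bid                  # Step 205: duplicate, reuse its index
    # Step 206: new unique block sharing a fingerprint (hash collision)
    block_id_list.append(next_index)          # block_nr grows by one via len()
    block_store[next_index] = block
    return False, next_index
```

Only blocks whose strong and weak fingerprints both matched ever reach this comparison, so the linear scan over the chain stays short in practice.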
At this point, each data file corresponds to one logical file representation in the backup system, composed of meta information consisting of a group of data fingerprints.
After the backup of the network element configuration data file is completed, when recovery is needed, the backup system performs the file read: it first reads the logical file, then retrieves the corresponding data blocks according to the data block fingerprints, restores a physical file copy, and sends this copy to the network management system for recovery.
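The restore path just described (read the logical file, fetch each referenced block, reassemble the physical copy) admits an equally small sketch. Here a "logical file" is simplified to an ordered list of logical index numbers; the patent's real metadata would also carry the block fingerprints.

```python
def restore_file(logical_file: list, block_store: dict) -> bytes:
    """Rebuild a physical file copy from its logical representation:
    look up each referenced block and concatenate them in order."""
    return b"".join(block_store[bid] for bid in logical_file)
```

Because duplicate blocks are stored once but can appear many times in a logical file, the restored copy may be far larger than the deduplicated storage it came from.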
As shown in FIG. 3, the present invention provides a system for backing up disaster recovery data, including: a receiving module 301, configured to receive a data file to be backed up sent by the network management system;
a splitting module 302, configured to split the data file to be backed up to obtain divided data blocks; a calculation and lookup module 303, configured to use a weak-checksum hash algorithm and a strong-checksum hash algorithm to calculate a data fingerprint value for the current data block, and to search the backed-up data files for a target data block with the same data fingerprint value;
a comparison module 304, configured to compare the target data block with the current data block byte by byte when a target data block with the same data fingerprint value is found in the backed-up data files; and a backup module 305, configured to back up the current data block according to the comparison result of the comparison module 304.
In a preferred embodiment of the present invention, the backup module 305 is further configured to store the meta information of the current data block if no target data block with the same data fingerprint value is found in the backed-up data files.
In a preferred embodiment of the present invention, the comparison module 304 is specifically configured to: when the comparison result is identical, determine that the current data block is a duplicate data block and store the logical index information of the current data block; when the comparison result differs, determine that the current data block is a new unique data block and store the meta information of the current data block.
In a preferred embodiment of the present invention, the calculation and lookup module 303 is specifically configured to first use the weak-checksum hash algorithm to calculate a first data fingerprint value for the current data block and search the backed-up data files, with the first data fingerprint value, for a target data block having the same first data fingerprint value; if such a block exists, to use the strong-checksum hash algorithm to calculate a second data fingerprint value for the current data block and search the backed-up data files for a target data block having the same second data fingerprint value.
In a preferred embodiment of the present invention, the splitting module 302 is specifically configured to split the data file to be backed up according to the number of network elements to obtain the divided data blocks.
Existing off-site disaster recovery data backup methods work on text files, comparing text content to delete duplicate data. Existing disaster recovery backup data has about an 80% data repetition rate, yet removing redundancy by text comparison does not reach an 80% deduplication rate. Moreover, such methods are helpless with binary files, which restricts the backup file formats and limits applicability. The disaster recovery data backup solution provided by the present invention applies to backup files of any format, such as text files and binary database files. Because redundant data is removed by comparing binary data blocks of a few KB, the deduplication rate is high and can approach 80%; furthermore, the more frequently data is backed up and the shorter the interval, the higher the deduplication rate.

In the prior art, data file splitting either uses a single fixed chunking strategy, or pre-computes block boundary features according to the file content type before chunking; applied to network management system configuration data, however, the achieved deduplication rate is not high. The disaster recovery data backup solution provided by the present invention, targeting the specific file format of network-management configuration data files and the scenario of periodic configuration data backup, determines the data block size according to the number of network elements, effectively improving the deduplication rate of the backup data.
The prior art uses a single hash algorithm to calculate data fingerprints. The disaster recovery data backup method provided by the present invention combines a weak-checksum hash algorithm with a strong-checksum hash algorithm to calculate data fingerprints: computation is fast, and the collision probability is greatly reduced at a small performance cost, improving system performance. In addition, combining the weak- and strong-checksum hash algorithms allows deduplication to be performed in real time: after receiving the data file from the network management system, the backup system can deduplicate online and convert the file into a local logical file for storage, without waiting for the system to become idle for offline processing. The entire processing cycle is short, which makes it possible to handle urgent off-site backup operations at very short intervals.
Existing disaster recovery data backup methods merely adopt a hash algorithm with a smaller collision probability when deleting duplicate data, without solving the hash collision problem, so they cannot be used for off-site disaster recovery data backup of network management systems, where a single collision would cause enormous economic loss. The disaster recovery data backup solution provided by the present invention solves the collision problem by traversing all data blocks with the same data fingerprint value and performing a complete byte comparison, greatly improving the data security of the network management system and making the solution applicable to off-site disaster recovery backup of network management system configuration data, a scenario with very high data security requirements.
In summary, the disaster recovery data backup solution provided by the present invention applies to various data file formats and has strong applicability; it effectively controls the rapid growth of backup data, increasing effective storage space and improving storage efficiency, thereby saving total storage and management costs; backup files can be compressed, saving network bandwidth for data transmission; and space, power supply, cooling, and other operation and maintenance costs can be saved.
The above description shows and describes a preferred embodiment of the present invention, but, as stated above, it should be understood that the present invention is not limited to the form disclosed herein and should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications, and environments, and can be altered within the scope of the inventive concept described herein through the above teachings or through skill or knowledge in the related art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the present invention shall fall within the protection scope of the appended claims of the present invention.

Claims

1. A method for backing up disaster recovery data, comprising:
receiving a data file to be backed up sent by a network management system;
splitting the data file to be backed up to obtain divided data blocks;
using a weak-checksum hash algorithm and a strong-checksum hash algorithm, calculating a data fingerprint value for the current data block, and searching backed-up data files for a target data block with the same data fingerprint value;
如果有, 则将所述目标数据块与所述当前数据块进行逐字节比较; 根据比较结果进行所述当前数据块的备份。  If yes, the target data block is compared byte by byte with the current data block; and the current data block is backed up according to the comparison result.
2. The method according to claim 1, wherein calculating the data fingerprint value for the current data block by using the weak check value hash algorithm and the strong check value hash algorithm, and searching the backed-up data files for a target data block having the same data fingerprint value, comprises:
calculating a first data fingerprint value for the current data block by using the weak check value hash algorithm, and searching the backed-up data files by the first data fingerprint value for a target data block having the same first data fingerprint value; and if such a target data block exists, calculating a second data fingerprint value for the current data block by using the strong check value hash algorithm, and searching the backed-up data files for a target data block having the same second data fingerprint value.
3. The method according to claim 1, wherein backing up the current data block according to the comparison result comprises:
when the comparison result is identical, determining that the current data block is a duplicate data block, and storing logical index information of the current data block;
when the comparison result is different, determining that the current data block is a new unique data block, and storing meta information of the current data block.
4. The method according to claim 1, further comprising: if no target data block having the same data fingerprint value is found in the backed-up data files, storing the meta information of the current data block.
5. The method according to any one of claims 1 to 4, wherein dividing the data file to be backed up comprises: dividing the data file to be backed up according to the number of network elements.
6. The method according to claim 3 or 4, wherein the meta information of the current data block comprises: the current data block, the logical index information of the current data block, and the weak check value and the strong check value of the current data block.
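The deduplication flow defined by claims 1 to 6 — divide the file into blocks, filter with a cheap weak fingerprint, confirm with a strong fingerprint, and verify byte by byte before deciding whether to store the block itself or only its logical index — can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the fixed 4 KiB block size (the patent divides according to the number of network elements), the use of `zlib.adler32` as the weak check value, `hashlib.sha256` as the strong check value, and the in-memory index are all assumptions.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size for illustration

# index of already backed-up blocks: weak fingerprint -> [(strong fingerprint, block bytes)]
backed_up_index = {}
# logical index info: block number -> (weak, strong) fingerprints of the stored copy
logical_index = {}

def backup_file(data: bytes):
    """Back up a file block by block, storing only unique blocks."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    stored, duplicates = 0, 0
    for n, block in enumerate(blocks):
        weak = zlib.adler32(block)               # first (weak) data fingerprint
        candidates = backed_up_index.get(weak, [])
        strong = None
        match = False
        if candidates:                           # weak match: compute the strong fingerprint
            strong = hashlib.sha256(block).hexdigest()
            for cand_strong, cand_bytes in candidates:
                # byte-by-byte comparison guards against residual hash collisions
                if cand_strong == strong and cand_bytes == block:
                    match = True
                    break
        if match:                                # duplicate block: keep only logical index info
            logical_index[n] = (weak, strong)
            duplicates += 1
        else:                                    # new unique block: store its meta information
            strong = strong or hashlib.sha256(block).hexdigest()
            backed_up_index.setdefault(weak, []).append((strong, block))
            logical_index[n] = (weak, strong)
            stored += 1
    return stored, duplicates
```

In this two-tier scheme the weak hash rejects almost all non-matching blocks at minimal CPU cost, while the strong hash plus byte-wise comparison ensures a block is only treated as a duplicate when it really is one.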
7. A system for disaster tolerance data backup, comprising:
a receiving module, configured to receive a data file to be backed up sent by a network management system;
a dividing module, configured to divide the data file to be backed up to obtain divided data blocks;
a calculating and searching module, configured to calculate a data fingerprint value for the current data block by using a weak check value hash algorithm and a strong check value hash algorithm, and to search the backed-up data files for a target data block having the same data fingerprint value;
a comparing module, configured to compare the target data block with the current data block byte by byte when a target data block having the same data fingerprint value is found in the backed-up data files; and a backup module, configured to back up the current data block according to the comparison result of the comparing module.
8. The system according to claim 7, wherein the calculating and searching module is configured to: calculate a first data fingerprint value for the current data block by using the weak check value hash algorithm, and search the backed-up data files by the first data fingerprint value for a target data block having the same first data fingerprint value; and if such a target data block exists, calculate a second data fingerprint value for the current data block by using the strong check value hash algorithm, and search the backed-up data files for a target data block having the same second data fingerprint value.
9. The system according to claim 7, wherein the comparing module is configured to:
when the comparison result is identical, determine that the current data block is a duplicate data block, and store the logical index information of the current data block;
when the comparison result is different, determine that the current data block is a new unique data block, and store the meta information of the current data block.
10. The system according to claim 7, wherein the backup module is further configured to: store the meta information of the current data block if no target data block having the same data fingerprint value is found in the backed-up data files.
11. The system according to claim 9 or 10, wherein the meta information of the current data block comprises: the current data block, the logical index information of the current data block, and the weak check value and the strong check value of the current data block.
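Claims 7 to 11 recast the same method as cooperating modules. A minimal object-oriented sketch of that decomposition follows; the class names, the choice of Adler-32 and SHA-256 as the weak and strong check values, and the return values are illustrative assumptions, not details taken from the patent.

```python
import hashlib
import zlib

class ComputeAndSearchModule:
    """Computes weak/strong fingerprints and searches the backed-up index (claims 7-8)."""
    def __init__(self):
        self.index = {}  # weak check value -> [(strong check value, block bytes)]

    def find_target(self, block: bytes):
        weak = zlib.adler32(block)
        if weak not in self.index:            # no weak match: no candidate target block
            return weak, None, None
        strong = hashlib.sha256(block).hexdigest()
        for cand_strong, cand_bytes in self.index[weak]:
            if cand_strong == strong:         # strong match: return the candidate for comparison
                return weak, strong, cand_bytes
        return weak, strong, None

class BackupModule:
    """Stores meta info for unique blocks and logical index info for duplicates (claims 9-11)."""
    def __init__(self, searcher: ComputeAndSearchModule):
        self.searcher = searcher
        self.meta = []       # (block, logical index, weak, strong) for each unique block
        self.logical = []    # logical index entries for duplicate blocks

    def backup_block(self, n: int, block: bytes) -> str:
        weak, strong, target = self.searcher.find_target(block)
        # comparing module: byte-by-byte check of the target block against the current block
        if target is not None and target == block:
            self.logical.append((n, weak, strong))
            return "duplicate"
        strong = strong or hashlib.sha256(block).hexdigest()
        self.searcher.index.setdefault(weak, []).append((strong, block))
        self.meta.append((block, n, weak, strong))
        return "unique"
```

Splitting fingerprint lookup from the store decision mirrors the claimed module boundaries: the search module owns the fingerprint index, while the backup module decides between storing a block's full meta information and storing only a logical index entry.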
PCT/CN2011/073780 2010-11-17 2011-05-06 Disaster tolerance data backup method and system WO2012065408A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010548146.1 2010-11-17
CN201010548146.1A CN101989929B (en) 2010-11-17 2010-11-17 Disaster recovery data backup method and system

Publications (1)

Publication Number Publication Date
WO2012065408A1 true WO2012065408A1 (en) 2012-05-24

Family

ID=43746287

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/073780 WO2012065408A1 (en) 2010-11-17 2011-05-06 Disaster tolerance data backup method and system

Country Status (2)

Country Link
CN (1) CN101989929B (en)
WO (1) WO2012065408A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI588670B (en) * 2016-05-25 2017-06-21 精品科技股份有限公司 System and method for segment backup

Families Citing this family (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101989929B (en) * 2010-11-17 2014-07-02 中兴通讯股份有限公司 Disaster recovery data backup method and system
CN102156727A (en) * 2011-04-01 2011-08-17 华中科技大学 Method for deleting repeated data by using double-fingerprint hash check
CN102184198B (en) * 2011-04-22 2016-04-27 张伟 Be applicable to the data de-duplication method of operating load protection system
CN102799598A (en) * 2011-05-25 2012-11-28 英业达股份有限公司 Data recovery method for deleting repeated data
CN102202098A (en) * 2011-05-25 2011-09-28 成都市华为赛门铁克科技有限公司 Data processing method and device
CN102541685A (en) * 2011-11-16 2012-07-04 中标软件有限公司 Linux system backup method and Linux system repair method
CN103428242B (en) * 2012-05-18 2016-12-14 阿里巴巴集团控股有限公司 A kind of method of increment synchronization, Apparatus and system
CN103713963B (en) * 2012-09-29 2017-06-23 南京壹进制信息技术股份有限公司 A kind of efficient file backup and restoration methods
CN103034564B (en) * 2012-12-05 2016-06-15 华为技术有限公司 Data disaster tolerance drilling method, data disaster tolerance practice device and system
CN103269352A (en) * 2012-12-07 2013-08-28 北京奇虎科技有限公司 Point-to-point (P2P) file downloading method and device
CN103269351A (en) * 2012-12-07 2013-08-28 北京奇虎科技有限公司 File download method and device
CN103259729B (en) * 2012-12-10 2018-03-02 上海德拓信息技术股份有限公司 Network data compaction transmission method based on zero collision hash algorithm
CN103365745A (en) * 2013-06-07 2013-10-23 上海爱数软件有限公司 Block level backup method based on content-addressed storage and system
CN103399853A (en) * 2013-06-28 2013-11-20 苏州海客科技有限公司 Method for selecting file cutting granularity
CN103473278A (en) * 2013-08-28 2013-12-25 苏州天永备网络科技有限公司 Repeating data processing technology
CN104750743A (en) * 2013-12-31 2015-07-01 中国银联股份有限公司 System and method for ticking and rechecking transaction files
CN103744939B (en) * 2013-12-31 2017-07-14 华为技术有限公司 A kind of recording method of daily record, the restoration methods and log manager of daily record
CN103795783A (en) * 2014-01-14 2014-05-14 上海上讯信息技术股份有限公司 Data synchronization method and system
CN103942125A (en) * 2014-05-06 2014-07-23 南宁博大全讯科技有限公司 Automatic backup method and system
CN103970852A (en) * 2014-05-06 2014-08-06 浪潮电子信息产业股份有限公司 Data de-duplication method of backup server
CN104268034B (en) * 2014-10-09 2017-11-07 中国人民解放军国防科学技术大学 A kind of data back up method and device and data reconstruction method and device
CN104375905A (en) * 2014-11-07 2015-02-25 北京云巢动脉科技有限公司 Incremental backing up method and system based on data block
CN104484402B (en) * 2014-12-15 2018-02-09 新华三技术有限公司 A kind of method and device of deleting duplicated data
CN106934293B (en) * 2015-12-29 2020-04-24 航天信息股份有限公司 Collision calculation device and method for digital abstract
CN107346271A (en) * 2016-05-05 2017-11-14 华为技术有限公司 The method and calamity of Backup Data block are for end equipment
CN106326035A (en) * 2016-08-13 2017-01-11 南京叱咤信息科技有限公司 File-metadata-based incremental backup method
CN106817419B (en) * 2017-01-19 2020-06-30 四川奥诚科技有限责任公司 VoLTE AS network element-based data extraction and analysis method and device and service terminal
CN106802841B (en) * 2017-01-19 2020-06-09 四川奥诚科技有限责任公司 Data extraction and analysis method and device and server
CN107704342A (en) * 2017-09-26 2018-02-16 郑州云海信息技术有限公司 A kind of snap copy method, system, device and readable storage medium storing program for executing
CN107729766B (en) * 2017-09-30 2020-02-07 中国联合网络通信集团有限公司 Data storage method, data reading method and system thereof
CN108090355B (en) * 2017-11-28 2020-10-27 西安交通大学 APK automatic triggering tool
CN108089949A (en) * 2017-12-29 2018-05-29 广州创慧信息科技有限公司 A kind of method and system of automatic duplicating of data
CN108304503A (en) * 2018-01-18 2018-07-20 阿里巴巴集团控股有限公司 A kind of processing method of data, device and equipment
WO2020232591A1 (en) * 2019-05-19 2020-11-26 深圳齐心集团股份有限公司 Stationery information distributed planning system based on big data
CN110692047A (en) * 2019-05-19 2020-01-14 深圳齐心集团股份有限公司 Stationery information scheduling system based on big data
CN110618790B (en) * 2019-09-06 2023-04-28 上海电力大学 Mist storage data redundancy elimination method based on repeated data deletion
CN113254262B (en) * 2020-02-13 2023-09-05 中国移动通信集团广东有限公司 Database disaster recovery method and device and electronic equipment
CN112202910B (en) * 2020-10-10 2021-10-08 上海威固信息技术股份有限公司 Computer distributed storage system
CN114691430A (en) * 2022-04-24 2022-07-01 北京科技大学 Incremental backup method and system for CAD (computer-aided design) engineering data files

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216791A (en) * 2008-01-04 2008-07-09 华中科技大学 File backup method based on fingerprint
CN101706825A (en) * 2009-12-10 2010-05-12 华中科技大学 Replicated data deleting method based on file content types
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services
CN101989929A (en) * 2010-11-17 2011-03-23 中兴通讯股份有限公司 Disaster recovery data backup method and system


Also Published As

Publication number Publication date
CN101989929A (en) 2011-03-23
CN101989929B (en) 2014-07-02

Similar Documents

Publication Publication Date Title
WO2012065408A1 (en) Disaster tolerance data backup method and system
CN109871366B (en) Block chain fragment storage and query method based on erasure codes
CN104932956B (en) A kind of cloud disaster-tolerant backup method towards big data
US9639289B2 (en) Systems and methods for retaining and using data block signatures in data protection operations
US9251160B1 (en) Data transfer between dissimilar deduplication systems
EP2256934B1 (en) Method and apparatus for content-aware and adaptive deduplication
WO2017096532A1 (en) Data storage method and apparatus
US8543555B2 (en) Dictionary for data deduplication
CN102222085B (en) Data de-duplication method based on combination of similarity and locality
CN105069111B (en) Block level data duplicate removal method based on similitude in cloud storage
US11182256B2 (en) Backup item metadata including range information
US7680998B1 (en) Journaled data backup during server quiescence or unavailability
US9002800B1 (en) Archive and backup virtualization
CN109445702B (en) block-level data deduplication storage system
JP2013541083A (en) System and method for scalable reference management in a storage system based on deduplication
WO2017020576A1 (en) Method and apparatus for file compaction in key-value storage system
CN105487942A (en) Backup and remote copy method based on data deduplication
CN108415671B (en) Method and system for deleting repeated data facing green cloud computing
Sun et al. Data backup and recovery based on data de-duplication
CN102722450B (en) Storage method for redundancy deletion block device based on location-sensitive hash
US7949630B1 (en) Storage of data addresses with hashes in backup systems
US20140258237A1 (en) Handling restores in an incremental backup storage system
CN112416879B (en) NTFS file system-based block-level data deduplication method
US10678754B1 (en) Per-tenant deduplication for shared storage
US20210240350A1 (en) Method, device, and computer program product for recovering based on reverse differential recovery

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11841034

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11841034

Country of ref document: EP

Kind code of ref document: A1