CN101989929B - Disaster recovery data backup method and system - Google Patents

Disaster recovery data backup method and system

Info

Publication number
CN101989929B
Authority
CN
China
Prior art keywords
data
data block
backup
block
current data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201010548146.1A
Other languages
Chinese (zh)
Other versions
CN101989929A (en)
Inventor
赵巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201010548146.1A priority Critical patent/CN101989929B/en
Publication of CN101989929A publication Critical patent/CN101989929A/en
Priority to PCT/CN2011/073780 priority patent/WO2012065408A1/en
Application granted granted Critical
Publication of CN101989929B publication Critical patent/CN101989929B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/085 Retrieval of network configuration; Tracking network configuration history
    • H04L41/0853 Retrieval of network configuration; Tracking network configuration history by actively collecting configuration information or by backing up configuration information
    • H04L41/0856 Retrieval of network configuration; Tracking network configuration history by actively collecting configuration information or by backing up configuration information by backing up or archiving configuration information

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a disaster recovery data backup method and system, and belongs to the field of network management. The method comprises the following steps: receiving a data file to be backed up that is sent by a network management system; segmenting the data file to be backed up to obtain data blocks; computing data fingerprint values for the data blocks using a weak-checksum hash algorithm and a strong-checksum hash algorithm, and searching the already backed-up data files for a target data block having the same data fingerprint value; if such a target data block exists, comparing the target data block with the current data block byte by byte; and backing up the current data block according to the comparison result. The system comprises a receiving module, a segmentation module, a calculation and search module, a comparison module and a backup module. The technical solution can improve the applicability of backup data files, reduce storage space usage and improve system performance.

Description

Disaster recovery data backup method and system
Technical field
The invention relates to network management systems, and in particular to a method and system for remote disaster recovery data backup of a network management system based on data deduplication.
Background art
A network management system manages the network elements of a communication network and holds the configuration data of every network element in the network. This configuration data is critically important: without it, the network elements cannot run their services normally. For disaster recovery purposes, the configuration data must be backed up at a remote site. If the network management system is damaged by an earthquake, fire or similar event, the remotely backed-up configuration data can be restored so that network element services continue to run normally. In general, the configuration data is required to be backed up remotely once a day.
One existing remote backup technique for disaster recovery data simply exports the configuration data to a file named by date and copies the file to the remote backup system, which produces redundant data. Other network management systems do address redundancy in the backed-up data, as follows:
The configuration data is exported to a text file that records the specific configuration data of each network element. When this text file is copied to the remote backup system, the backup system compares today's configuration data with the configuration data saved yesterday, extracts the configuration data of the network elements that have changed, and saves it in today's backup file; the configuration data of unchanged network elements is not saved.
This approach has obvious drawbacks. It places strict requirements on the files that the network management system backs up: the network management system and the backup system must follow the same file format specification for backup and later recovery to work, so the approach cannot accommodate all network management systems and has poor applicability. It also requires the backup file to be a text file; the text file cannot be compressed and therefore occupies a large amount of storage space, and transmitting uncompressed text consumes network bandwidth and system resources, which degrades system performance.
Summary of the invention
To improve the applicability of backup data files, reduce storage space usage and improve system performance, the present invention provides a disaster recovery data backup method and system. The technical solution is as follows:
A disaster recovery data backup method, comprising:
receiving a data file to be backed up that is sent by a network management system;
segmenting the data file to be backed up to obtain segmented data blocks;
first computing a first data fingerprint value for a current data block using a weak-checksum hash algorithm, and searching the backed-up data files for a target data block having the same first data fingerprint value; if one exists, computing a second data fingerprint value for the current data block using a strong-checksum hash algorithm, and searching the backed-up data files for the target data block having the same second data fingerprint value by looking up a hash table of the backed-up data files; wherein each element entry of the hash table is represented as a triple comprising: the MD5 checksum of a data block, the number of data blocks having the same MD5 checksum, and the logical index numbers of those data blocks; and wherein the number of data blocks having the same MD5 checksum and the logical index numbers of those data blocks are combined to form a linked list;
if such a target data block exists, comparing the target data block with the current data block byte by byte;
backing up the current data block according to the comparison result.
In a preferred embodiment of the present invention, backing up the current data block according to the comparison result comprises:
when the comparison result is identical, determining that the current data block is a duplicate data block, and storing the logical index information of the current data block;
when the comparison result is different, determining that the current data block is a new unique data block, and storing the meta-information of the current data block.
In a preferred embodiment of the present invention, if no target data block with the same data fingerprint value is found in the backed-up data files, the meta-information of the current data block is stored.
In a preferred embodiment of the present invention, the data file to be backed up is segmented according to the number of network elements to obtain the segmented data blocks.
In a preferred embodiment of the present invention, the meta-information of the current data block comprises: the logical index information of the current data block, the current data block itself, and the weak checksum and strong checksum of the current data block.
A disaster recovery data backup system, comprising:
a receiving module, configured to receive a data file to be backed up that is sent by a network management system;
a segmentation module, configured to segment the data file to be backed up to obtain segmented data blocks;
a calculation and search module, configured to first compute a first data fingerprint value for a current data block using a weak-checksum hash algorithm, and search the backed-up data files for a target data block having the same first data fingerprint value; and, if one exists, to compute a second data fingerprint value for the current data block using a strong-checksum hash algorithm, and search the backed-up data files for the target data block having the same second data fingerprint value by looking up a hash table of the backed-up data files; wherein each element entry of the hash table is represented as a triple comprising: the MD5 checksum of a data block, the number of data blocks having the same MD5 checksum, and the logical index numbers of those data blocks; and wherein the number of data blocks having the same MD5 checksum and the logical index numbers of those data blocks are combined to form a linked list;
a comparison module, configured to compare the target data block with the current data block byte by byte when a target data block with the same data fingerprint value is found in the backed-up data files;
a backup module, configured to back up the current data block according to the comparison result of the comparison module.
In a preferred embodiment of the present invention, the comparison module is specifically configured to: when the comparison result is identical, determine that the current data block is a duplicate data block, and store the logical index information of the current data block;
and, when the comparison result is different, determine that the current data block is a new unique data block, and store the meta-information of the current data block.
In a preferred embodiment of the present invention, the backup module is further configured to store the meta-information of the current data block if no target data block with the same data fingerprint value is found in the backed-up data files.
In a preferred embodiment of the present invention, the meta-information of the current data block comprises: the logical index information of the current data block, the current data block itself, and the weak checksum and strong checksum of the current data block.
The present invention segments the received data file to be backed up, computes data fingerprint values for the segmented data blocks using a weak-checksum hash algorithm and a strong-checksum hash algorithm, and performs a hash lookup keyed on the data fingerprint value. After a target data block with the same data fingerprint value is found, the target data block and the current data block are compared byte by byte, and the data block is backed up according to the comparison result. Data files of any format can therefore be backed up, which improves the applicability of backup files. Duplicate data can be removed in real time, which effectively curbs the rapid growth of backed-up data, increases the effective storage space and improves storage efficiency. The backup file can also be compressed, which reduces network bandwidth usage and improves system performance.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form a part of it. The illustrative embodiments of the present invention and their descriptions are intended to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the disaster recovery data backup method provided by the present invention;
Fig. 2 is a detailed flowchart of the method for remote disaster recovery data backup of a network management system provided by the present invention;
Fig. 3 is a structural diagram of the disaster recovery data backup system provided by the present invention.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution and the beneficial effects of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
As shown in Fig. 1, the present invention provides a disaster recovery data backup method, comprising:
Step 101: receive a data file to be backed up that is sent by a network management system;
Step 102: segment the data file to be backed up to obtain segmented data blocks;
Step 103: compute a data fingerprint value for the current data block using a weak-checksum hash algorithm and a strong-checksum hash algorithm, and search the backed-up data files for a target data block having the same data fingerprint value;
Step 104: if one exists, compare the target data block with the current data block byte by byte;
Step 105: back up the current data block according to the comparison result.
In a preferred embodiment of the invention, if no target data block with the same data fingerprint value is found in the backed-up data files, the meta-information of the current data block is stored.
In a preferred embodiment of the invention, backing up the current data block according to the comparison result comprises: when the comparison result is identical, determining that the current data block is a duplicate data block, and storing the logical index information of the current data block; when the comparison result is different, determining that the current data block is a new unique data block, and storing the meta-information of the current data block.
In a preferred embodiment of the invention, computing a data fingerprint value for the current data block using a weak-checksum hash algorithm and a strong-checksum hash algorithm, and searching the backed-up data files for a target data block having the same data fingerprint value, comprises:
first computing a first data fingerprint value for the current data block using the weak-checksum hash algorithm, and searching the backed-up data files for a target data block having the same first data fingerprint value; if one exists, computing a second data fingerprint value for the current data block using the strong-checksum hash algorithm, and searching the backed-up data files for the target data block having the same second data fingerprint value.
In a preferred embodiment of the invention, the data file to be backed up is segmented according to the number of network elements to obtain the segmented data blocks.
In a preferred embodiment of the invention, the meta-information comprises the logical index information of the current data block, the current data block itself, and the weak checksum and strong checksum of the current data block.
The implementation of the present invention is described in detail below with reference to the drawings.
As shown in Fig. 2, the method for remote disaster recovery data backup of a network management system provided by the present invention comprises:
Step 201: the network management system exports the data file of network element configuration to be backed up and transmits this data file to the remote backup system.
The data file may be in any format, text or binary.
Step 202: the backup system receives the data file to be backed up and divides it into a group of data blocks according to the number of network elements.
Specifically, the data file to be backed up is segmented using a predefined data block size. The data block size can be determined according to the number of network elements; the configuration data file for 10000 network elements can be 300 MB. If the block granularity is too fine, the system resource overhead is too large; if it is too coarse, deduplication is ineffective. A compromise between the two is needed, and testing yielded the following empirical values: for up to 1000 network elements the data block size can be 1 KB, for 1000 to 5000 network elements it can be 4 KB, and for 5000 to 10000 network elements it can be 8 KB.
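The empirical thresholds above can be sketched as a small lookup function. This is only an illustration: the function name `choose_block_size` is hypothetical, and how the boundary values (exactly 1000 or 5000 network elements) and counts above 10000 are handled is an assumption, since the text does not specify them.

```python
def choose_block_size(num_network_elements: int) -> int:
    """Return a block size in bytes for the given number of network elements.

    Thresholds follow the empirical values quoted in the description.
    Boundary handling and the behaviour above 10000 network elements are
    assumptions made for this sketch.
    """
    if num_network_elements <= 0:
        raise ValueError("number of network elements must be positive")
    if num_network_elements <= 1000:
        return 1 * 1024      # 1 KB
    if num_network_elements <= 5000:
        return 4 * 1024      # 4 KB
    return 8 * 1024          # 8 KB (assumed to apply above 10000 as well)
```

A caller would evaluate this once per backup run, before segmentation, so that all blocks of one data file share the same size.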
After the data file to be backed up has been segmented, the backup system assigns unique logical index information to each data block; this logical index information can be a logical index number.
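A minimal sketch of fixed-size segmentation with index assignment might look as follows. The name `split_into_blocks` and the `first_index` parameter are hypothetical; numbering blocks sequentially from a caller-supplied starting value is an assumption, as the text only requires that each index be unique.

```python
from typing import List, Tuple


def split_into_blocks(data: bytes, block_size: int,
                      first_index: int = 0) -> List[Tuple[int, bytes]]:
    """Split data into fixed-size blocks and pair each with a logical index.

    The last block may be shorter than block_size. Sequential numbering
    starting at first_index is an assumption of this sketch.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        (first_index + offset // block_size, data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]
```

Because the slices are taken over raw bytes, the same routine applies to text and binary files alike.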
Step 203: the backup system computes a data fingerprint for the current data block and performs a hash lookup in the backed-up data files keyed on the data fingerprint value, to obtain a target data block with the same data fingerprint.
Specifically, the data fingerprint is an essential characteristic of a data block: each unique data block has a unique data fingerprint value, which is obtained by applying a hash operation to the block contents. Commonly used hash algorithms include FNV1, CRC, MD5, SHA1, SHA-256 and SHA-512. Different hash algorithms have different collision probabilities (every hash algorithm is subject to collisions, so different data blocks may produce the same data fingerprint), produce data fingerprint values of different bit lengths, and require different amounts of computation. A hash algorithm with a lower collision probability and a longer data fingerprint value requires much more computation.
Computing the data fingerprint of a data block involves a trade-off between performance and data security. The CRC hash algorithm is a weak-checksum hash algorithm and is fast to compute; in this embodiment the data fingerprint value computed with the CRC algorithm is 32 bits long. The MD5 hash algorithm is a strong-checksum hash algorithm with a very low collision probability, and the data fingerprint value it computes is 128 bits long. Strong and weak hash algorithms are normally distinguished by a threshold of 128 bits: algorithms producing fewer than 128 bits are regarded as weak, and those producing 128 bits or more as strong. The backup system uses the CRC hash algorithm and the MD5 hash algorithm to compute data fingerprints for data blocks, as follows:
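The two fingerprints can be computed with Python's standard library, as sketched below. The function names are hypothetical, and using `zlib.crc32` (the common CRC-32 variant) is an assumption; the description only says that a 32-bit CRC is used.

```python
import hashlib
import zlib


def weak_fingerprint(block: bytes) -> int:
    """32-bit CRC of the block (weak checksum, fast to compute)."""
    return zlib.crc32(block) & 0xFFFFFFFF


def strong_fingerprint(block: bytes) -> bytes:
    """128-bit MD5 digest of the block (strong checksum), as 16 raw bytes."""
    return hashlib.md5(block).digest()
```

The weak fingerprint fits in a single machine word, which makes it cheap to use as a first-pass filter before the more expensive MD5 is computed.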
For each data block produced in step 202, the CRC checksum is first computed with the CRC hash algorithm, and a hash lookup keyed on this CRC checksum is performed in the backed-up data files to determine whether there is an entry with the same CRC checksum. If there is none, the data block is a new unique data block, and the logical index number of the data block, the data block itself, and its CRC checksum and MD5 checksum are stored. If there is one, the MD5 checksum of the data block is computed with the MD5 hash algorithm, and a hash lookup keyed on this MD5 checksum is performed in the backed-up data files to determine whether there is an entry with the same MD5 checksum. If there is, the data block is judged to be a possible duplicate and the procedure goes to step 204; if there is not, the data block is stored and the related meta-information is created.
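The two-stage decision described above can be sketched as follows. The function `classify_block` and the use of in-memory sets for the two indexes are assumptions for illustration; the description does not say how the CRC index is stored.

```python
import hashlib
import zlib
from typing import Set


def classify_block(block: bytes, known_crcs: Set[int],
                   known_md5s: Set[bytes]) -> str:
    """Return "new" or "candidate" for a block, using CRC first, then MD5.

    "candidate" means both checksums matched an existing entry, so a
    byte-by-byte comparison (step 204) is still required.
    """
    crc = zlib.crc32(block) & 0xFFFFFFFF
    if crc not in known_crcs:
        # No CRC match: the MD5 lookup is skipped entirely. (Storing the
        # block's meta-information would still compute its MD5 checksum.)
        return "new"
    md5 = hashlib.md5(block).digest()
    if md5 not in known_md5s:
        return "new"        # CRC collided with a different block
    return "candidate"      # possible duplicate, must be verified
```

The point of the ordering is that most new blocks are rejected by the cheap 32-bit check, so the MD5 lookup only runs for blocks that are likely duplicates.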
In general, the MD5 hash algorithm is assumed not to produce collisions, so the MD5 checksum computed for a data block (block) is unique; that is, one data block corresponds to one unique data fingerprint value, which can be represented as a 1:1 mapping. The element entry of a traditional hash table is represented as a pair:
<md5_hashkey, block>, where md5_hashkey denotes the MD5 checksum of the data block.
In practice, however, two data blocks may have the same MD5 checksum; for example, the MD5 checksum computed for data block 1 may equal the MD5 checksum computed for data block 2. Multiple data blocks then correspond to one data fingerprint value, and a 1:n mapping is needed. The present invention represents the element entry of the hash table as a triple:
<md5_hashkey,block_nr,block_IDs>
Here md5_hashkey denotes the MD5 checksum of the data block, block_nr denotes the number of data blocks with the same MD5 checksum, and block_IDs denotes the logical index numbers of those data blocks. In the algorithm design of the present invention, block_nr and block_IDs are merged into a single linked list, structured as follows:
block_nr|block_ID1|block_ID2|...|block_IDn
where block_ID1|block_ID2|...|block_IDn is the linked list of data block logical index numbers, referred to below as the block_ID list, and block_nr is the length of that list.
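One way to picture the triple is the small data structure below. The class `HashEntry` and its method `as_chain` are hypothetical names; deriving block_nr from the length of the list rather than storing it separately is a design choice of this sketch, made so the count can never disagree with the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HashEntry:
    """One hash table element entry: <md5_hashkey, block_nr, block_IDs>."""
    md5_hashkey: bytes
    block_ids: List[int] = field(default_factory=list)

    @property
    def block_nr(self) -> int:
        # block_nr is the length of the block_ID list.
        return len(self.block_ids)

    def as_chain(self) -> str:
        """Render the entry as block_nr|block_ID1|...|block_IDn."""
        return "|".join(str(v) for v in [self.block_nr, *self.block_ids])
```

For example, an entry whose MD5 checksum is shared by the blocks with logical index numbers 3 and 7 renders as `2|3|7`.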
The present invention in effect uses chaining to resolve hash collisions, and the block_ID list of each hash table element entry has variable length. A target data block with the same MD5 checksum is searched for in the backed-up data files using chaining as follows:
(1) compute the MD5 checksum hashkey of the current data block block, i.e. hashkey=hash_md5 (block);
(2) look up hashkey in the hash table of the backed-up data files, bindex=hash_value (hashkey, hash table), where bindex denotes the logical index number of the current data block;
(3) if no matching element entry is found in the hash table, i.e. bindex==NULL, insert hashkey directly into the hash table with block_nr=1 and block_ID1 set to the logical index number of the current data block block;
(4) if a matching element entry is found in the hash table, the data block is judged to be a possible duplicate data block; it is also possible that both the CRC hash algorithm and the MD5 hash algorithm have collided and that the data block is not a duplicate.
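Steps (1) to (4) can be sketched with a Python dictionary standing in for the hash table. The function name `lookup_md5` is hypothetical, and representing each entry's block_ID list as a Python list (whose length plays the role of block_nr) is an assumption of this sketch.

```python
import hashlib
from typing import Dict, List


def lookup_md5(table: Dict[bytes, List[int]], block: bytes,
               block_id: int) -> List[int]:
    """Look up a block's MD5 checksum; insert a new entry if none matches.

    Returns the logical index numbers of possible duplicates. An empty
    list means no entry matched and a new one was inserted (step 3).
    """
    hashkey = hashlib.md5(block).digest()     # step (1)
    chain = table.get(hashkey)                # step (2)
    if chain is None:                         # step (3): block_nr becomes 1
        table[hashkey] = [block_id]
        return []
    # Step (4): candidates only; a byte-by-byte comparison must confirm
    # whether any of them really equals the current block.
    return list(chain)
```

Returning a copy of the chain keeps callers from modifying the table by accident while they iterate over the candidates.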
In this step, the weak-checksum algorithm is used first to compute the first data fingerprint value of the current data block and a lookup is performed on that value; the strong-checksum algorithm is then used to compute the second data fingerprint value of the current data block and a second lookup is performed. In practice the order can also be reversed: the strong-checksum algorithm is used first to compute the second data fingerprint value and perform a lookup, and the weak-checksum algorithm is then used to compute the first data fingerprint value and perform a lookup. The principle is similar and is not repeated here.
Step 204: the backup system compares the obtained target data block with the current data block at the byte level. If the comparison result is identical, go to step 205; if it is different, go to step 206.
Step 205: determine that the current data block is a duplicate data block, and store the logical index number of the current data block.
Step 206: determine that the current data block is a new unique data block, and store the meta-information of the current data block, which comprises: the logical index number of the current data block, the current data block itself, and its CRC checksum and MD5 checksum.
Steps 204 to 206 continue from step 203. After a matching entry is found, the block_ID list of the matching element entry is traversed, and the target data block corresponding to each logical index number in the block_ID list is compared byte by byte with the current data block. If they are identical, the current data block already exists, and only the logical index number of the current data block is stored. If no identical block is found, the data block is stored: the current data block is appended to the end of the block_ID list and written to file, block_nr is incremented by 1, and block_IDn is set to the logical index number of the current data block. Data storage in the present invention can use RAID 5.
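The traversal and byte-by-byte comparison can be sketched as below. The names `backup_block`, `table` and `store` are hypothetical, the in-memory dictionaries stand in for the hash table and the block files, and the `fingerprint` parameter is added only so that collisions can be forced when experimenting; none of these come from the patent text.

```python
import hashlib
from typing import Callable, Dict, List


def backup_block(block: bytes, block_id: int,
                 table: Dict[bytes, List[int]], store: Dict[int, bytes],
                 fingerprint: Callable[[bytes], bytes] =
                 lambda b: hashlib.md5(b).digest()) -> int:
    """Back up one block and return the index under which its content lives.

    If an identical block is already stored, its index is returned and no
    data is written. Otherwise the block is appended to the chain of its
    fingerprint and written to the store.
    """
    chain = table.setdefault(fingerprint(block), [])
    for candidate_id in chain:
        # bytes equality compares the full contents, byte by byte.
        if store[candidate_id] == block:
            return candidate_id          # duplicate: keep only a reference
    chain.append(block_id)               # block_nr grows by one
    store[block_id] = block              # new unique block is written
    return block_id
```

Passing a deliberately poor fingerprint, such as `lambda b: b[:1]`, makes different blocks share one chain and shows that the comparison, not the hash, decides whether a block is a duplicate.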
At this point, the data file is represented in the backup system merely as a logical file made up of meta-information consisting of a set of data fingerprints.
Once the data file of network element configuration has been backed up and a recovery is needed, the backup system reads the file: it first reads the logical file, then retrieves the corresponding data blocks according to the data block fingerprints, reconstructs a physical copy of the file, and sends this copy to the network management system for recovery.
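Restoration can be sketched as a simple reassembly. In this illustration the logical file is modelled as an ordered list of logical index numbers and the block store as a dictionary; both, along with the name `restore_file`, are assumptions, since the text describes the logical file only as meta-information made up of data fingerprints.

```python
from typing import Dict, Sequence


def restore_file(logical_file: Sequence[int],
                 block_store: Dict[int, bytes]) -> bytes:
    """Rebuild the physical file copy from an ordered list of block references.

    A duplicate block appears in logical_file as a repeated reference to
    the single stored copy.
    """
    try:
        return b"".join(block_store[block_id] for block_id in logical_file)
    except KeyError as exc:
        raise ValueError(f"missing data block {exc.args[0]}") from exc
```

Because duplicates are stored once and referenced many times, the restored copy can be larger than the total size of the stored blocks.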
As shown in Fig. 3, the present invention provides a disaster recovery data backup system, comprising:
a receiving module 301, configured to receive a data file to be backed up that is sent by a network management system;
a segmentation module 302, configured to segment the data file to be backed up to obtain segmented data blocks;
a calculation and search module 303, configured to compute a data fingerprint value for the current data block using a weak-checksum hash algorithm and a strong-checksum hash algorithm, and to search the backed-up data files for a target data block having the same data fingerprint value;
a comparison module 304, configured to compare the target data block with the current data block byte by byte when a target data block with the same data fingerprint value is found in the backed-up data files;
a backup module 305, configured to back up the current data block according to the comparison result of the comparison module 304.
In a preferred embodiment of the invention, the backup module 305 is further configured to store the meta-information of the current data block if no target data block with the same data fingerprint value is found in the backed-up data files.
In a preferred embodiment of the invention, the comparison module 304 is specifically configured to: when the comparison result is identical, determine that the current data block is a duplicate data block, and store the logical index information of the current data block;
and, when the comparison result is different, determine that the current data block is a new unique data block, and store the meta-information of the current data block.
In a preferred embodiment of the invention, the calculation and search module 303 is specifically configured to first compute a first data fingerprint value for the current data block using the weak-checksum hash algorithm, and search the backed-up data files for a target data block having the same first data fingerprint value; and, if one exists, to compute a second data fingerprint value for the current data block using the strong-checksum hash algorithm, and search the backed-up data files for the target data block having the same second data fingerprint value.
In a preferred embodiment of the invention, the segmentation module 302 is specifically configured to segment the data file to be backed up according to the number of network elements to obtain the segmented data blocks.
Existing remote disaster recovery data backup methods work on text files and deduplicate by comparing text content. Existing disaster recovery backup data has a duplication rate of 80%, but removing redundancy by text comparison does not achieve an 80% deduplication rate. Such methods also cannot handle binary files, so they restrict the format of the backup file and have poor applicability. The disaster recovery data backup method provided by the present invention is applicable to backup files of any format, such as text files and binary database files. Because redundant data is removed by comparing binary data blocks of a few KB each, the deduplication rate is high and can approach 80%. Moreover, the more often data is backed up and the shorter the interval, the higher the deduplication ratio.
The prior art uses a single method or strategy for segmenting data files, or computes block boundary features in advance according to the file content type and then divides the file into blocks; when applied to the configuration data of a network management system, however, the resulting deduplication rate is not high. The disaster recovery data backup method provided by the present invention targets the specific file format of network management configuration data files and the scenario of periodic configuration data backup, and determines the data block size according to the number of network elements, which effectively improves the deduplication rate of the backed-up data.
The prior art computes data fingerprints with a single hash algorithm, whereas the disaster recovery data backup method provided by the present invention combines a weak-checksum hash algorithm with a strong-checksum hash algorithm. The computation is fast, and the probability of collision is greatly reduced at a small performance cost, which improves system performance. In addition, combining the weak-checksum and strong-checksum hash algorithms allows duplicate data to be removed in real time: after receiving a data file from the network management system, the backup system can remove duplicate data online and convert the file into a local logical file for storage, without deferring the work to offline processing when the system is idle. The whole processing cycle is short, so the method can cope with remote backup operations performed at very short intervals.
Existing disaster recovery data backup methods merely use a hash algorithm with a low collision probability when removing duplicate data and do not solve the hash collision problem, so they cannot be used for remote disaster recovery backup of a network management system, where a single collision would cause huge economic loss. The disaster recovery data backup method provided by the present invention resolves the collision problem by traversing all data blocks with the same data fingerprint value and performing a full byte-by-byte comparison. This greatly improves the data security of the network management system, so the method can be applied where data security requirements are very high, such as remote disaster recovery backup of network management configuration data.
In summary, the disaster recovery data backup method provided by the present invention is applicable to data files of any format and has strong applicability; it effectively curbs the rapid growth of backed-up data, which increases the effective storage space, improves storage efficiency, and in turn reduces total storage cost and management cost; it saves the network bandwidth used for data transmission; and it saves operation and maintenance costs such as floor space, power supply and cooling.
The above description illustrates and describes a preferred embodiment of the present invention. As stated earlier, however, it should be understood that the present invention is not limited to the form disclosed here, and this description should not be regarded as excluding other embodiments. The invention can be used in various other combinations, modifications and environments, and can be modified within the scope of the inventive concept described here, through the above teachings or through the skill or knowledge of the related art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the scope of protection of the appended claims.

Claims (9)

1. A method for disaster recovery data backup, characterized by comprising:
receiving a data file to be backed up that is sent by a network management system;
splitting the data file to be backed up to obtain data blocks;
for a current data block, first calculating a first data fingerprint value using a weak check value hash algorithm, and using the first data fingerprint value to search a backup data file for a target data block having the same first data fingerprint value; if such a block exists, calculating a second data fingerprint value for the current data block using a strong check value hash algorithm, and searching the backup data file, by looking up a hash table of the backup data file, for the target data block having the same second data fingerprint value; wherein each element entry of the hash table is represented as a triple comprising: the MD5 check value of a data block, the number of data blocks having the same MD5 check value, and the logical index numbers of those data blocks; and wherein the number of data blocks having the same MD5 check value and the logical index numbers of those data blocks are combined to form a linked list;
if such a target data block exists, performing a byte-by-byte comparison of the target data block and the current data block;
backing up the current data block according to the comparison result.
2. the method for claim 1, is characterized in that, the described backup of carrying out described current data block according to comparative result, comprising:
In the time that comparative result is identical, determines that described current data block is repeating data piece, and store the logic index information of described current data block;
In the time that comparative result is different, determines that described current data block is new unique data piece, and store the metamessage of described current data block.
3. the method for claim 1, is characterized in that, if do not find the target data block of identical data fingerprint value in backup data files, stores the metamessage of described current data block.
4. the method as described in claim 1 to 3 any one, is characterized in that, according to NE quantity, described data file to be backed up is cut apart the data block that obtains cutting apart.
5. method as claimed in claim 2 or claim 3, is characterized in that, the metamessage of described current data block comprises: the logic index information of current data block, current data block, the weak check value of current data block and strong check value.
6. A system for disaster recovery data backup, characterized by comprising:
a receiving module, configured to receive a data file to be backed up that is sent by a network management system;
a splitting module, configured to split the data file to be backed up to obtain data blocks;
a calculation and search module, configured to first calculate, for a current data block, a first data fingerprint value using a weak check value hash algorithm, and to use the first data fingerprint value to search a backup data file for a target data block having the same first data fingerprint value; and, if such a block exists, to calculate a second data fingerprint value for the current data block using a strong check value hash algorithm, and to search the backup data file, by looking up a hash table of the backup data file, for the target data block having the same second data fingerprint value; wherein each element entry of the hash table is represented as a triple comprising: the MD5 check value of a data block, the number of data blocks having the same MD5 check value, and the logical index numbers of those data blocks; and wherein the number of data blocks having the same MD5 check value and the logical index numbers of those data blocks are combined to form a linked list;
a comparison module, configured to perform a byte-by-byte comparison of the target data block and the current data block when a target data block having the same second data fingerprint value is found in the backup data file;
a backup module, configured to back up the current data block according to the comparison result of the comparison module.
7. The system of claim 6, characterized in that the comparison module is specifically configured to: when the comparison result is identical, determine that the current data block is a duplicate data block, and store the logical index information of the current data block;
and, when the comparison result is different, determine that the current data block is a new unique data block, and store the meta-information of the current data block.
8. The system of claim 6, characterized in that the backup module is further configured to store the meta-information of the current data block if no target data block having the same data fingerprint value is found in the backup data file.
9. The system of claim 7, characterized in that the meta-information of the current data block comprises: the current data block, the logical index information of the current data block, and the weak check value and strong check value of the current data block.
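As an illustration of the lookup flow in claims 1, 2, 3 and 5, the following Python sketch models the two-level fingerprint search and the triple-based hash table entry. It is a sketch under stated assumptions, not the claimed implementation. The claims do not name the weak check algorithm, so Adler-32 from `zlib` stands in for it. MD5 is used for the strong check value because the claims name it in the hash table entry. The names `HashEntry`, `BackupFile`, and `backup_block` are hypothetical, and a Python list stands in for the linked list of logical index numbers.

```python
import hashlib
import zlib
from dataclasses import dataclass, field


@dataclass
class HashEntry:
    """Triple: MD5 value, number of blocks with that MD5, and their logical indexes."""
    md5: str
    count: int = 0
    indexes: list = field(default_factory=list)  # stands in for the linked list


class BackupFile:
    """Hypothetical backup data file holding unique blocks and their meta-information."""

    def __init__(self):
        self.blocks = {}          # logical index -> block bytes
        self.meta = {}            # logical index -> (weak value, strong value)
        self.weak_values = set()  # weak fingerprints of all stored blocks
        self.hash_table = {}      # MD5 hex digest -> HashEntry
        self.references = []      # logical indexes recorded for duplicate blocks

    def _store_unique(self, block: bytes, weak: int) -> int:
        # The strong value is computed here because it is part of the meta-information.
        strong = hashlib.md5(block).hexdigest()
        index = len(self.blocks)
        self.blocks[index] = block
        self.meta[index] = (weak, strong)
        self.weak_values.add(weak)
        entry = self.hash_table.setdefault(strong, HashEntry(strong))
        entry.count += 1
        entry.indexes.append(index)
        return index

    def backup_block(self, block: bytes) -> str:
        weak = zlib.adler32(block)  # first fingerprint (assumed algorithm)
        if weak in self.weak_values:
            # Weak match: compute the second fingerprint and consult the hash table.
            strong = hashlib.md5(block).hexdigest()
            entry = self.hash_table.get(strong)
            if entry is not None:
                # Byte-by-byte comparison against every block sharing the MD5 value.
                for index in entry.indexes:
                    if self.blocks[index] == block:
                        self.references.append(index)  # store only the logical index
                        return "duplicate"
        # No weak match, no strong match, or the bytes differ: new unique block.
        self._store_unique(block, weak)
        return "unique"


backup = BackupFile()
results = [backup.backup_block(b)
           for b in (b"ne-1 config", b"ne-2 config", b"ne-1 config")]
```

The sample block contents are invented for the example. One design point worth noting: when the weak fingerprint has no match, the hash table search is skipped entirely, which is where the two-level scheme saves work on data that is mostly new.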
CN201010548146.1A 2010-11-17 2010-11-17 Disaster recovery data backup method and system Active CN101989929B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201010548146.1A CN101989929B (en) 2010-11-17 2010-11-17 Disaster recovery data backup method and system
PCT/CN2011/073780 WO2012065408A1 (en) 2010-11-17 2011-05-06 Disaster tolerance data backup method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010548146.1A CN101989929B (en) 2010-11-17 2010-11-17 Disaster recovery data backup method and system

Publications (2)

Publication Number Publication Date
CN101989929A CN101989929A (en) 2011-03-23
CN101989929B true CN101989929B (en) 2014-07-02

Family

ID=43746287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010548146.1A Active CN101989929B (en) 2010-11-17 2010-11-17 Disaster recovery data backup method and system

Country Status (2)

Country Link
CN (1) CN101989929B (en)
WO (1) WO2012065408A1 (en)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101989929B (en) * 2010-11-17 2014-07-02 中兴通讯股份有限公司 Disaster recovery data backup method and system
CN102156727A (en) * 2011-04-01 2011-08-17 华中科技大学 Method for deleting repeated data by using double-fingerprint hash check
CN102184198B (en) * 2011-04-22 2016-04-27 张伟 Be applicable to the data de-duplication method of operating load protection system
CN102202098A (en) * 2011-05-25 2011-09-28 成都市华为赛门铁克科技有限公司 Data processing method and device
CN102799598A (en) * 2011-05-25 2012-11-28 英业达股份有限公司 Data recovery method for deleting repeated data
CN102541685A (en) * 2011-11-16 2012-07-04 中标软件有限公司 Linux system backup method and Linux system repair method
CN103428242B (en) * 2012-05-18 2016-12-14 阿里巴巴集团控股有限公司 A kind of method of increment synchronization, Apparatus and system
CN103713963B (en) * 2012-09-29 2017-06-23 南京壹进制信息技术股份有限公司 A kind of efficient file backup and restoration methods
CN103034564B (en) * 2012-12-05 2016-06-15 华为技术有限公司 Data disaster tolerance drilling method, data disaster tolerance practice device and system
CN103269351A (en) * 2012-12-07 2013-08-28 北京奇虎科技有限公司 File download method and device
CN103269352A (en) * 2012-12-07 2013-08-28 北京奇虎科技有限公司 Point-to-point (P2P) file downloading method and device
CN103259729B (en) * 2012-12-10 2018-03-02 上海德拓信息技术股份有限公司 Network data compaction transmission method based on zero collision hash algorithm
CN103365745A (en) * 2013-06-07 2013-10-23 上海爱数软件有限公司 Block level backup method based on content-addressed storage and system
CN103399853A (en) * 2013-06-28 2013-11-20 苏州海客科技有限公司 Method for selecting file cutting granularity
CN103473278A (en) * 2013-08-28 2013-12-25 苏州天永备网络科技有限公司 Repeating data processing technology
CN103744939B (en) * 2013-12-31 2017-07-14 华为技术有限公司 A kind of recording method of daily record, the restoration methods and log manager of daily record
CN104750743A (en) * 2013-12-31 2015-07-01 中国银联股份有限公司 System and method for ticking and rechecking transaction files
CN103795783A (en) * 2014-01-14 2014-05-14 上海上讯信息技术股份有限公司 Data synchronization method and system
CN103970852A (en) * 2014-05-06 2014-08-06 浪潮电子信息产业股份有限公司 Data de-duplication method of backup server
CN103942125A (en) * 2014-05-06 2014-07-23 南宁博大全讯科技有限公司 Automatic backup method and system
CN104268034B (en) * 2014-10-09 2017-11-07 中国人民解放军国防科学技术大学 A kind of data back up method and device and data reconstruction method and device
CN104375905A (en) * 2014-11-07 2015-02-25 北京云巢动脉科技有限公司 Incremental backing up method and system based on data block
CN104484402B (en) * 2014-12-15 2018-02-09 新华三技术有限公司 A kind of method and device of deleting duplicated data
CN106934293B (en) * 2015-12-29 2020-04-24 航天信息股份有限公司 Collision calculation device and method for digital abstract
CN107346271A (en) * 2016-05-05 2017-11-14 华为技术有限公司 The method and calamity of Backup Data block are for end equipment
TWI588670B (en) * 2016-05-25 2017-06-21 精品科技股份有限公司 System and method for segment backup
CN106326035A (en) * 2016-08-13 2017-01-11 南京叱咤信息科技有限公司 File-metadata-based incremental backup method
CN106802841B (en) * 2017-01-19 2020-06-09 四川奥诚科技有限责任公司 Data extraction and analysis method and device and server
CN106817419B (en) * 2017-01-19 2020-06-30 四川奥诚科技有限责任公司 VoLTE AS network element-based data extraction and analysis method and device and service terminal
CN107704342A (en) * 2017-09-26 2018-02-16 郑州云海信息技术有限公司 A kind of snap copy method, system, device and readable storage medium storing program for executing
CN107729766B (en) * 2017-09-30 2020-02-07 中国联合网络通信集团有限公司 Data storage method, data reading method and system thereof
CN108090355B (en) * 2017-11-28 2020-10-27 西安交通大学 APK automatic triggering tool
CN108089949A (en) * 2017-12-29 2018-05-29 广州创慧信息科技有限公司 A kind of method and system of automatic duplicating of data
CN108304503A (en) * 2018-01-18 2018-07-20 阿里巴巴集团控股有限公司 A kind of processing method of data, device and equipment
WO2020232592A1 (en) * 2019-05-19 2020-11-26 深圳齐心集团股份有限公司 Stationery information scheduling system based on big data
WO2020232591A1 (en) * 2019-05-19 2020-11-26 深圳齐心集团股份有限公司 Stationery information distributed planning system based on big data
CN110618790B (en) * 2019-09-06 2023-04-28 上海电力大学 Mist storage data redundancy elimination method based on repeated data deletion
CN113254262B (en) * 2020-02-13 2023-09-05 中国移动通信集团广东有限公司 Database disaster recovery method and device and electronic equipment
CN112202910B (en) * 2020-10-10 2021-10-08 上海威固信息技术股份有限公司 Computer distributed storage system
CN114691430A (en) * 2022-04-24 2022-07-01 北京科技大学 Incremental backup method and system for CAD (computer-aided design) engineering data files

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216791A (en) * 2008-01-04 2008-07-09 华中科技大学 File backup method based on fingerprint
CN101706825A (en) * 2009-12-10 2010-05-12 华中科技大学 Replicated data deleting method based on file content types
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101989929B (en) * 2010-11-17 2014-07-02 中兴通讯股份有限公司 Disaster recovery data backup method and system

Also Published As

Publication number Publication date
CN101989929A (en) 2011-03-23
WO2012065408A1 (en) 2012-05-24

Similar Documents

Publication Publication Date Title
CN101989929B (en) Disaster recovery data backup method and system
CN109871366B (en) Block chain fragment storage and query method based on erasure codes
US9268783B1 (en) Preferential selection of candidates for delta compression
US8918390B1 (en) Preferential selection of candidates for delta compression
US20170199699A1 (en) Systems and methods for retaining and using data block signatures in data protection operations
Geer Reducing the storage burden via data deduplication
US9141301B1 (en) Method for cleaning a delta storage system
CN102411637B (en) Metadata management method of distributed file system
CN104932841A (en) Saving type duplicated data deleting method in cloud storage system
CN102722583A (en) Hardware accelerating device for data de-duplication and method
US9400610B1 (en) Method for cleaning a delta storage system
US20120259825A1 (en) Data management method and data management system
US9165001B1 (en) Multi stream deduplicated backup of collaboration server data
CN102222085A (en) Data de-duplication method based on combination of similarity and locality
CN103970852A (en) Data de-duplication method of backup server
CN106708653B (en) Mixed tax big data security protection method based on erasure code and multiple copies
TW202111564A (en) Log-structured storage systems
CN102831222A (en) Differential compression method based on data de-duplication
TW202113580A (en) Log-structured storage systems
WO2017096532A1 (en) Data storage method and apparatus
WO2017020576A1 (en) Method and apparatus for file compaction in key-value storage system
CN103067525A (en) Cloud storage data backup method based on characteristic codes
CN105487942A (en) Backup and remote copy method based on data deduplication
CN104735110A (en) Metadata management method and system
CN108415671B (en) Method and system for deleting repeated data facing green cloud computing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant