CN101650677A - File data backup method based on Delta increment - Google Patents
- Publication number
- CN101650677A (application CN200910017342A)
- Authority
- CN
- China
- Prior art keywords
- delta
- file
- module
- backup
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention provides a file data backup method based on Delta increments. It uses Delta file increment technology to reduce the amount of data stored, freeing more backup space on disk so that backup data can be kept longer and much of the bandwidth required for offline storage can be saved. The system architecture comprises a Delta Sequence module, a Delta reading module, a Delta merging module, a Delta comparing module and a Delta Processor module. The functions of the modules and the file backup steps are as follows: the Delta Sequence module determines whether two files are identical by comparing them at the file, byte and Hash levels; it therefore maintains an ordered file sequence, byte sequence and Hash sequence and provides a compare-and-filter function, so that two sequences can be checked for equality quickly by passing through the sequences in turn. Applied to a data backup system, the method increases disk backup utilization, saves backup space and addresses the challenge posed by rapid data growth.
Description
Technical field
The present invention is a file incremental backup technology, generally used in file-based backup systems, that aims to reduce the storage capacity consumed in a storage system. Delta file increment technology can reduce the stored data to 1/10 of its original size, freeing more backup space. This not only allows backup data to be kept on disk longer but also saves much of the bandwidth required for offline storage.
Background art
Insufficient storage space has always been a headache for IT staff. The fix is not just a matter of buying more storage devices, because adjusting the storage architecture brings a series of configuration tasks in its wake. Apart from the complexity and tedium of this work, expanding storage capacity may also require downtime, which can seriously affect the normal operation of the enterprise.
To protect its data, an enterprise must back it up regularly, and this is one reason data accumulates so quickly. In particular, some enterprises now back up first to faster disks and then, in stages, to devices such as tape. For enterprises that must complete a large volume of backups between the end of one working day and the start of the next, disk backup is a good approach: backup is fast and so is recovery. However, disk backup undoubtedly accelerates the consumption of disk space.
Enterprise applications generally hold large numbers of files and e-mail messages. If every backup were a full backup of all files and data, a very large amount of storage space would be required. For this reason the industry generally uses incremental backup and differential backup. A differential backup does not clear the archive bit after the backup completes, whereas an incremental backup does clear it, which prevents files from being backed up again unnecessarily. The archive bit also lets the user see which files actually need to be backed up.
Both incremental backup and differential backup run into the same problem. Files such as Outlook PST files and database files are large and change frequently, so each time either method runs it regards the whole file as changed and backs it up in full.
The challenge posed by today's rapid data growth is therefore to provide a method that, when a large file changes, backs up only the changed part of the file rather than the whole file.
Summary of the invention
The purpose of the invention is to provide a file incremental backup technology, generally used in file-based backup systems, that reduces the storage capacity consumed in a storage system.
The purpose of the invention is achieved as follows. Delta file increment technology is used to reduce the amount of data stored, freeing more backup space on disk, so that backup data can be kept on disk longer and much of the bandwidth required for offline storage is saved. The system architecture comprises a Delta Sequence module, a Delta reading module, a Delta merging module, a Delta comparing module and a Delta Processor module. The functions of the modules and the file backup steps are as follows:
Delta Sequence module: to determine whether two files are identical, they are compared at the file, byte and Hash levels. This module therefore maintains an ordered file sequence, byte sequence and Hash sequence and provides a compare-and-filter function, so that two sequences can be checked for equality quickly by passing through the sequences in turn;
Delta reading module: after a file has been modified, the parts that differ from the original are extracted by Delta comparison and stored as a separate Delta increment file; that is, each comparison yields the differences between the latest file and the source file;
Delta merging module: repeated comparisons produce many Delta versions, and as the number of modifications grows, the Delta increment between the new file and the source file also grows. The Delta merging module then merges the most recent Delta increment data into the source file and takes the merged file as the new source file. The next time the file is modified, Delta comparison is carried out against this latest file as the baseline, which keeps the Delta increment files from growing without bound;
Delta comparing module: the file is read line by line or byte by byte and the parts that differ are identified. The changed parts are extracted separately and their positions are recorded in an index, forming a Delta increment file. Combining this increment file with the source file reproduces the file as modified at that point;
Delta Processor module: provides a multitasking Delta data comparison mechanism so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency;
In the architecture, a uniquely identifying Hash value is generated for each line of each file, and the byte stream of a line is compared to find where it differs, ensuring that the differing parts of two different files are obtained correctly.
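The per-line Hash comparison can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and a purely positional comparison of lines; the patent names neither, and the function names `line_hashes` and `changed_lines` are hypothetical.

```python
import hashlib

def line_hashes(data: bytes) -> list[str]:
    """Return one SHA-256 hex digest per line of `data` (hash choice is an assumption)."""
    return [hashlib.sha256(line).hexdigest() for line in data.splitlines()]

def changed_lines(source: bytes, modified: bytes) -> list[int]:
    """Indices of lines whose hashes differ, or that exist in only one file."""
    a, b = line_hashes(source), line_hashes(modified)
    longest = max(len(a), len(b))
    return [
        i for i in range(longest)
        if i >= len(a) or i >= len(b) or a[i] != b[i]
    ]

source = b"alpha\nbravo\ncharlie\n"
modified = b"alpha\nbrXvo\ncharlie\ndelta\n"
print(changed_lines(source, modified))  # [1, 3]
```

Only the lines flagged here would then need a byte-level comparison. Because the comparison is positional, an inserted line shifts every later line and marks them all as changed; a production tool would need an alignment step that this sketch omits.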
In the architecture, in the processing step of the Delta merging module, each Delta increment file is merged with the preceding source file to form the file as modified at that point; that is, each Delta increment file corresponds to one file version.
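The correspondence between Delta increment files and file versions might look like the following sketch. The delta layout (a mapping from line index to replacement line) and the name `apply_delta` are assumptions made for illustration; the patent does not specify a delta format.

```python
def apply_delta(source_lines: list[bytes], delta: dict[int, bytes]) -> list[bytes]:
    """Rebuild one file version from the source lines plus one delta.

    Assumed delta layout: {line index: replacement line}. Indices past the
    end of the source are appended in ascending order.
    """
    result = list(source_lines)  # copy, so the stored source is never mutated
    for index, new_line in sorted(delta.items()):
        if index < len(result):
            result[index] = new_line
        else:
            result.append(new_line)
    return result

source = [b"alpha", b"bravo", b"charlie"]
deltas = {
    "v1": {1: b"brXvo"},
    "v2": {1: b"brXvo", 2: b"charlIE"},
}
# One stored source plus one small delta per version is enough to restore any version.
versions = {name: apply_delta(source, d) for name, d in deltas.items()}
print(versions["v2"])  # [b'alpha', b'brXvo', b'charlIE']
```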
In the architecture, a file sequence, a Hash sequence and a byte sequence are defined for each file.
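One way to read the three sequences is as a tiered filter in which the cheapest comparison runs first. The sketch below is an interpretation under stated assumptions: the file-level sequence is reduced to the file size, SHA-256 stands in for the unspecified hash, and `FileSequences`, `build_sequences` and `same_file` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class FileSequences:
    size: int                 # file-level sequence: here just the byte length
    hashes: tuple[str, ...]   # Hash sequence: one digest per line
    lines: tuple[bytes, ...]  # byte sequence: raw bytes of each line

def build_sequences(data: bytes) -> FileSequences:
    """Compute all three sequences for one file."""
    lines = tuple(data.splitlines())
    hashes = tuple(hashlib.sha256(line).hexdigest() for line in lines)
    return FileSequences(len(data), hashes, lines)

def same_file(a: FileSequences, b: FileSequences) -> bool:
    """Compare file level first, then hashes, then bytes; stop at the first mismatch."""
    if a.size != b.size:
        return False
    if a.hashes != b.hashes:
        return False
    return a.lines == b.lines

print(same_file(build_sequences(b"a\nb\n"), build_sequences(b"a\nb\n")))  # True
print(same_file(build_sequences(b"a\nb\n"), build_sequences(b"a\nc\n")))  # False
```

The final byte comparison only runs when sizes and hashes already agree, which is the filtering effect the Delta Sequence module describes.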
The beneficial effects of the invention are as follows. Suppose that in an enterprise application a 1 GB file is modified 100 times and each modification changes only 10 KB. A traditional backup method must back up the whole file each time, requiring 100 GB of space. With this Delta-based file incremental backup technology, only 10 KB needs to be backed up per modification, so 100 modifications require only 1000 KB, roughly 100,000 times less than 100 GB. For network-based backup in particular, this greatly reduces network bandwidth usage and thereby greatly improves backup efficiency.
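The arithmetic of this example can be checked with a short calculation. It assumes binary units (1 KB = 1024 bytes) and ignores the one stored copy of the source file and any index overhead in the delta files.

```python
KB = 1024
GB = 1024 ** 3

file_size = 1 * GB      # size of the file being backed up
change_size = 10 * KB   # data changed by each modification
revisions = 100         # number of modifications

full_backup_total = file_size * revisions     # every revision stored whole
delta_backup_total = change_size * revisions  # only the changed data stored

print(full_backup_total // GB)                  # 100 (GB)
print(delta_backup_total // KB)                 # 1000 (KB)
print(full_backup_total // delta_backup_total)  # 104857
```

The ratio is therefore on the order of 100,000 to 1 under these assumptions.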
Applying this technology in a data backup system therefore increases disk backup utilization and saves backup space, addressing the challenge posed by rapid data growth.
Description of drawings
Figure 1 is a structural diagram of file backup based on Delta technology;
Figure 2 is a diagram of the first file increment change based on Delta technology;
Figure 3 is a diagram of the second file increment change based on Delta technology;
Figure 4 is a diagram of file merging/archiving based on Delta technology.
Detailed description
The method of the invention is explained in detail below with reference to the drawings.
Delta file increment technology can reduce the stored data to 1/10 of its original size, freeing more backup space. This not only allows backup data to be kept on disk longer but also saves much of the bandwidth required for offline storage. The system architecture comprises a Delta Sequence module, a Delta reading module, a Delta merging module, a Delta comparing module and a Delta Processor module, as shown in the figure, wherein:
Delta Sequence module: to determine whether two files are identical, they are compared at the file, byte and Hash levels. This module therefore maintains an ordered file sequence, byte sequence and Hash sequence and provides a compare-and-filter function, so that two sequences can be checked for equality quickly by passing through the sequences in turn.
Delta reading module: after a file has been modified, the parts that differ from the original are extracted by Delta comparison and stored as a separate Delta increment file. That is, each comparison yields the differences between the latest file and the source file.
Delta merging module: repeated comparisons produce many Delta versions, and as the number of modifications grows, the Delta increment between the new file and the source file may also grow. The Delta merging module can then merge the most recent Delta increment data into the source file and take the merged file as the new source file. The next time the file is modified, Delta comparison is carried out against this latest file as the baseline, which keeps the Delta increment files from growing without bound.
Delta comparing module: the file is read line by line or byte by byte and the parts that differ are identified. The changed parts are extracted separately and their positions are recorded in an index, forming a Delta increment file. Combining this increment file with the source file reproduces the file as modified at that point.
Delta Processor module: provides a multitasking Delta data comparison mechanism so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency.
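The multitasking comparison of the Delta Processor module could be sketched with a thread pool. This is an illustration only: the patent does not say how concurrency is implemented, and `differs` and `compare_many` are hypothetical names. Whole-stream SHA-256 digests stand in for the real Delta comparison.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def differs(pair: tuple[bytes, bytes]) -> bool:
    """True when the two byte streams have different SHA-256 digests."""
    old, new = pair
    return hashlib.sha256(old).digest() != hashlib.sha256(new).digest()

def compare_many(pairs: list[tuple[bytes, bytes]], workers: int = 4) -> list[bool]:
    """Compare several (old, new) pairs concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(differs, pairs))

pairs = [(b"same", b"same"), (b"old text", b"new text"), (b"", b"x")]
print(compare_many(pairs))  # [False, True, True]
```

In a real backup job each task would more likely read from disk, where overlapping I/O across files is the main benefit of running comparisons in parallel.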
Embodiment
The process of implementing this architecture is described below with a concrete example.
Figure 2 depicts how a modified file is finally turned into a Delta increment file by the Delta increment technique. Relative to the source file, the middle byte of the first line, the first byte of the second line and the last two bytes of the third line have changed. The Delta comparing module analyses the modified file against the source file and separates out the changed bytes, giving the final form shown in the figure: only the changed bytes are extracted, and unchanged bytes are left as they are.
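The change pattern described for Figure 2 can be reproduced with a small byte-level sketch. The sample data, the `(line, column, new byte)` record layout and the name `byte_delta` are assumptions for illustration; the actual contents of the figure are not available here.

```python
def byte_delta(source: bytes, modified: bytes) -> list[tuple[int, int, bytes]]:
    """Return (line, column, new_byte) for each changed byte.

    Assumes both files have the same number of lines and equal line lengths,
    i.e. an in-place modification like the one the figure describes.
    """
    delta = []
    for row, (old, new) in enumerate(zip(source.splitlines(), modified.splitlines())):
        for col, (o, n) in enumerate(zip(old, new)):
            if o != n:
                delta.append((row, col, bytes([n])))  # position index plus new value
    return delta

source = b"abcde\nfghij\nklmno"
modified = b"abXde\nYghij\nklmZW"  # middle of line 1, start of line 2, end of line 3
print(byte_delta(source, modified))
# [(0, 2, b'X'), (1, 0, b'Y'), (2, 3, b'Z'), (2, 4, b'W')]
```

Four short records describe the whole modification, while the fifteen unchanged bytes are not stored again.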
In Figure 3, the second comparison is similar to the first Delta comparison: both comparisons are made against the source file rather than against the previous version. The main reason is that the first comparison has already produced relatively stable Hash and byte sequences, which do not need to be regenerated for the next comparison, so comparison is more efficient. Like the first comparison, the second also produces a Delta increment file.
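The efficiency argument, that the source file's sequences are built once and reused for later comparisons, can be sketched as a small cache object. `SourceIndex` and `delta_against` are hypothetical names, SHA-256 is an assumed hash, and the sketch does not handle a modified file that is shorter than the source.

```python
import hashlib

class SourceIndex:
    """Hash sequence of the source file, computed once and reused."""

    def __init__(self, source: bytes) -> None:
        self.lines = source.splitlines()
        # Built a single time; every later comparison reads this list.
        self.hashes = [hashlib.sha256(line).digest() for line in self.lines]

    def delta_against(self, modified: bytes) -> dict[int, bytes]:
        """Lines of `modified` that differ from the cached source lines."""
        delta = {}
        for i, line in enumerate(modified.splitlines()):
            if i >= len(self.hashes) or hashlib.sha256(line).digest() != self.hashes[i]:
                delta[i] = line
        return delta

index = SourceIndex(b"alpha\nbravo\ncharlie")
delta1 = index.delta_against(b"alpha\nbrXvo\ncharlie")  # first comparison
delta2 = index.delta_against(b"alpha\nbrXvo\ncharlIE")  # second comparison, same source
print(delta1)  # {1: b'brXvo'}
print(delta2)  # {1: b'brXvo', 2: b'charlIE'}
```

Note that the second delta repeats the change from the first, because both are taken against the same source. This is the growth that the merge step is meant to contain.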
Figure 4 depicts a source file being merged with a Delta increment file; the merged file becomes the new source file. Subsequent modifications of the file are compared against this new source file rather than the initial source file.
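The effect of merging on later deltas can be shown with a minimal sketch. It assumes a line-indexed delta layout and positional comparison, and the names `merge` and `diff` are hypothetical; the patent does not define the merge format.

```python
def merge(source: list[bytes], delta: dict[int, bytes]) -> list[bytes]:
    """Fold a delta into the source and return the new baseline."""
    merged = list(source)
    for i, line in sorted(delta.items()):
        if i < len(merged):
            merged[i] = line
        else:
            merged.append(line)
    return merged

def diff(base: list[bytes], current: list[bytes]) -> dict[int, bytes]:
    """Lines of `current` that differ from `base` (positional comparison)."""
    return {i: line for i, line in enumerate(current)
            if i >= len(base) or base[i] != line}

source = [b"alpha", b"bravo", b"charlie"]
v2 = [b"alpha", b"brXvo", b"charlIE"]
v3 = [b"alpha", b"brXvo", b"charlIE!"]

delta_v2 = diff(source, v2)           # two changed lines
new_source = merge(source, delta_v2)  # v2 becomes the baseline
delta_v3_old = diff(source, v3)       # against the initial source
delta_v3_new = diff(new_source, v3)   # against the merged baseline
print(len(delta_v3_old), len(delta_v3_new))  # 2 1
```

After the merge, the next delta records only what changed since the new baseline, which is how merging keeps increment files small.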
This completes the whole file data backup process based on Delta technology. The technology differs from traditional incremental backup in that it tracks the Delta increment changes within a file, instead of considering only whether the file has changed.
Applying this technology in a data backup system therefore increases disk backup utilization and saves backup space, addressing the challenge posed by rapid data growth.
Claims (4)
1. A file data backup method based on Delta increments, characterized in that Delta file increment technology is used to reduce the amount of data stored, freeing more backup space on disk, so that backup data can be kept on disk longer and much of the bandwidth required for offline storage is saved; the system architecture comprises a Delta Sequence module, a Delta reading module, a Delta merging module, a Delta comparing module and a Delta Processor module, wherein the functions of the modules and the file backup steps are as follows:
Delta Sequence module: to determine whether two files are identical, they are compared at the file, byte and Hash levels; this module therefore maintains an ordered file sequence, byte sequence and Hash sequence and provides a compare-and-filter function, so that two sequences can be checked for equality quickly by passing through the sequences in turn;
Delta reading module: after a file has been modified, the parts that differ from the original are extracted by Delta comparison and stored as a separate Delta increment file; that is, each comparison yields the differences between the latest file and the source file;
Delta merging module: repeated comparisons produce many Delta versions, and as the number of modifications grows, the Delta increment between the new file and the source file also grows; the Delta merging module then merges the most recent Delta increment data into the source file and takes the merged file as the new source file; the next time the file is modified, Delta comparison is carried out against this latest file as the baseline, which keeps the Delta increment files from growing without bound;
Delta comparing module: the file is read line by line or byte by byte and the parts that differ are identified; the changed parts are extracted separately and their positions are recorded in an index, forming a Delta increment file; combining this increment file with the source file reproduces the file as modified at that point;
Delta Processor module: provides a multitasking Delta data comparison mechanism so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency.
2. The method according to claim 1, characterized in that, in the architecture, a uniquely identifying Hash value is generated for each line of each file, and the byte stream of a line is compared to find where it differs, ensuring that the differing parts of two different files are obtained correctly.
3. The method according to claim 1, characterized in that, in the architecture, in the processing step of the Delta merging module, each Delta increment file is merged with the preceding source file to form the file as modified at that point; that is, each Delta increment file corresponds to one file version.
4. The Delta Sequence module according to claim 1, characterized in that, in the architecture, a file sequence, a Hash sequence and a byte sequence are defined for each file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910017342A CN101650677A (en) | 2009-07-27 | 2009-07-27 | File data backup method based on Delta increment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910017342A CN101650677A (en) | 2009-07-27 | 2009-07-27 | File data backup method based on Delta increment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101650677A | 2010-02-17 |
Family
ID=41672916
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910017342A Pending CN101650677A (en) | 2009-07-27 | 2009-07-27 | File data backup method based on Delta increment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101650677A (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101848274A (en) * | 2010-03-12 | 2010-09-29 | 深圳市同洲电子股份有限公司 | Methods and devices for backup and recovery of records in mobile terminal |
CN102737098B (en) * | 2011-03-29 | 2017-11-10 | 日本电气株式会社 | Distributed file system |
CN102737098A (en) * | 2011-03-29 | 2012-10-17 | 日本电气株式会社 | Distributed file system |
CN103544075A (en) * | 2011-12-31 | 2014-01-29 | 华为数字技术(成都)有限公司 | Data processing method and system |
CN103309847A (en) * | 2012-03-06 | 2013-09-18 | 百度在线网络技术(北京)有限公司 | Method and equipment for realizing file comparison |
CN103312743A (en) * | 2012-03-09 | 2013-09-18 | 盛乐信息技术(上海)有限公司 | Data synchronization device and method |
CN103379150A (en) * | 2012-04-19 | 2013-10-30 | 北京智慧风云科技有限公司 | Cloud service file management system |
CN103377208A (en) * | 2012-04-19 | 2013-10-30 | 北京智慧风云科技有限公司 | Method for updating files in cloud service file management system |
CN102682127B (en) * | 2012-05-16 | 2014-12-03 | 北京像素软件科技股份有限公司 | Data version control method |
CN102682127A (en) * | 2012-05-16 | 2012-09-19 | 北京像素软件科技股份有限公司 | Data version control method |
CN103793182A (en) * | 2012-09-04 | 2014-05-14 | Lsi公司 | Scalable storage protection |
US10191676B2 (en) | 2012-09-04 | 2019-01-29 | Seagate Technology Llc | Scalable storage protection |
US9613656B2 (en) | 2012-09-04 | 2017-04-04 | Seagate Technology Llc | Scalable storage protection |
US10664548B2 (en) | 2013-07-12 | 2020-05-26 | Trading Technologies International, Inc. | Tailored messaging |
CN105474250A (en) * | 2013-07-12 | 2016-04-06 | 贸易技术国际公司 | Tailored messaging |
CN104794143A (en) * | 2014-07-30 | 2015-07-22 | 北京中科同向信息技术有限公司 | Agent-free backup technology |
CN105404562A (en) * | 2014-08-18 | 2016-03-16 | 北京云巢动脉科技有限公司 | Method and system for realizing efficient backup of mirror file of operating system |
US11119863B2 (en) | 2015-09-25 | 2021-09-14 | Huawei Technologies Co., Ltd. | Data backup method and data processing system |
US11132260B2 (en) | 2015-09-25 | 2021-09-28 | Huawei Technologies Co., Ltd. | Data processing method and apparatus |
CN105516349A (en) * | 2016-01-04 | 2016-04-20 | 陈华锋 | File transmission method and system |
CN107111534A (en) * | 2016-06-28 | 2017-08-29 | 华为技术有限公司 | A kind of method and apparatus of data processing |
CN109597828A (en) * | 2018-09-29 | 2019-04-09 | 阿里巴巴集团控股有限公司 | A kind of off-line data checking method, device and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Open date: 20100217 |