CN101650677A - File data backup method based on Delta increment - Google Patents

Info

Publication number
CN101650677A
Authority
CN
China
Prior art keywords
delta
file
module
backup
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910017342A
Other languages
Chinese (zh)
Inventor
刘正伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Langchao Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Langchao Electronic Information Industry Co Ltd
Priority to CN200910017342A
Publication of CN101650677A

Abstract

The invention provides a file data backup method based on Delta increments. It uses Delta file increment technology to reduce the amount of stored data, freeing more disk space for backups so that backup data can be kept longer and a large amount of the bandwidth required for off-line storage can be saved. The system structure comprises a Delta Sequence module, a Delta reading module, a Delta merging module, a Delta comparing module and a Delta Processor module. The functions of the modules and the file backup steps are as follows: the Delta Sequence module determines whether two files are identical by comparing them by file, byte and Hash; it therefore orders the file sequence, the byte sequence and the Hash sequence and provides a compare-and-filter function, so that two sequences can be checked for equality quickly by passing through them. Applied in a data backup system, the method increases the utilization of disk backup space, saves backup space and helps meet the challenge of rapidly growing data.

Description

A file data backup method based on Delta increments
Technical field
The present invention is a file incremental backup technique, generally used in file-based backup systems and intended to reduce the storage capacity consumed in a storage system. Using "Delta" file increment technology, the stored data can be reduced to 1/10 of its original size, which frees more backup space. This not only lets backup data be kept on disk for longer, but also saves a large amount of the bandwidth required for offline storage.
Background technology
Insufficient storage space has always been a headache for IT staff. The problem is not just buying more storage devices, but also all the configuration work that follows once the storage architecture has been adjusted. Quite apart from how complicated and tedious that work is, expanding storage capacity may require downtime, which can seriously affect the normal operation of an enterprise.
To protect its data, an enterprise must back it up regularly, and this is one of the reasons data accumulates so quickly. In particular, some enterprises now back up first to faster disks and then move the data on to devices such as tape. For an enterprise that must complete a large volume of backups between the end of one working day and the start of the next, disk backup is a good approach: backup is fast and so is restore. However, disk backup undoubtedly accelerates the consumption of disk space.
Enterprise applications generally hold a large number of files and e-mails. If every backup were a full backup of all files and data, a very large amount of storage space would be needed. For this reason the industry generally uses incremental backup and differential backup. A differential backup does not clear the archive bit after the backup completes, whereas an incremental backup does clear it, which avoids backing up some files again unnecessarily. Using the archive bit also lets the user see which files actually need to be backed up.
Incremental and differential backup both run into the same problem: files such as Outlook PST files and database files are large and change frequently, so both methods treat the file as changed and therefore back it up in full.
How to provide a method that, when a large file changes, backs up only the changed part of the file rather than the whole file is therefore a challenge posed by today's rapid data growth.
Summary of the invention
The purpose of the present invention is to provide a file incremental backup technique, generally used in file-based backup systems and intended to reduce the storage capacity consumed in a storage system.
The purpose of the invention is achieved as follows. "Delta" file increment technology is used to reduce the amount of stored data, so that the disk frees more backup space, backup data can be kept on disk for longer, and a large amount of the bandwidth required for offline storage is saved. The system architecture comprises a Delta Sequence module, a Delta read module, a Delta merge module, a Delta comparison module and a Delta Processor module. The functions of the modules and the file backup steps are as follows:
Delta Sequence module: to determine whether two files are identical, they are compared by file, byte and Hash. This module therefore orders the file sequence, the byte sequence and the Hash sequence and provides a compare-and-filter function, so that two sequences can be checked for equality quickly by passing through them;
Delta read module: after a file has been modified, the parts that differ from the original are extracted by Delta comparison and stored separately as a Delta increment file; that is, each comparison yields the differences between the latest file and the source file;
Delta merge module: after repeated comparisons a file accumulates many Delta versions, and as the number of modifications grows, the Delta increment between the new file and the source file also grows. At that point the Delta merge module merges the most recent Delta increment data into the source file and takes the merged result as the new source file. The next time the file is modified, Delta comparison uses this latest file as the baseline, which limits the continual growth of the Delta increment file;
Delta comparison module: the file is read line by line, as lines or as bytes, and the parts that do not match are identified. The changed parts are extracted separately and their positions are recorded in an index, forming a Delta increment file. Combining this increment file with the source file reconstructs the file as modified at that point;
Delta Processor module: provides a multitask Delta data comparison mechanism, so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency;
In the architecture, a HASH value that uniquely identifies each line is generated, and the differing parts of that line's byte stream are compared, to ensure that the differences between two different files are obtained correctly.
In the architecture, in the processing step of the Delta merge module, each Delta increment file is merged with the previous source file to form the file as modified at that point; that is, each Delta increment file corresponds to one file version.
In the architecture, a file sequence, a Hash sequence and a byte sequence are defined for each file.
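The three per-file sequences and the compare-and-filter step can be illustrated with a minimal Python sketch. The names `FileSequences`, `build_sequences` and `changed_lines` are hypothetical, SHA-256 is an assumed choice (the text does not name a hash function), and lines are compared by position, so inserted or deleted lines are not realigned.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSequences:
    """Hypothetical container for the three sequences kept per file."""
    file_digest: str    # whole-file hash, used as the file-level entry
    line_hashes: tuple  # one hash per line (the Hash sequence)
    lines: tuple        # raw bytes of each line (the byte sequence)


def build_sequences(data: bytes) -> FileSequences:
    """Split the data into lines and hash the file and each line."""
    lines = tuple(data.splitlines(keepends=True))
    return FileSequences(
        file_digest=hashlib.sha256(data).hexdigest(),
        line_hashes=tuple(hashlib.sha256(line).hexdigest() for line in lines),
        lines=lines,
    )


def changed_lines(old: FileSequences, new: FileSequences) -> list:
    """Filter step: return the indices of lines whose hashes differ."""
    if old.file_digest == new.file_digest:
        return []  # identical files are filtered out without a line scan
    count = max(len(old.line_hashes), len(new.line_hashes))
    result = []
    for i in range(count):
        a = old.line_hashes[i] if i < len(old.line_hashes) else None
        b = new.line_hashes[i] if i < len(new.line_hashes) else None
        if a != b:
            result.append(i)
    return result
```

Only the lines reported by `changed_lines` would then need a byte-level comparison; unchanged lines are skipped on the strength of their hashes.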
The beneficial effects of the invention are as follows. Suppose that in an enterprise application a 1 GB file is modified 100 times and each modification changes only 10 KB of data. A traditional backup method must back up the whole file each time, which requires 100 GB of space. With this Delta-based file incremental backup technique, only 10 KB needs to be backed up per change, so 100 changes require only 1000 KB of backup data, several orders of magnitude less than 100 GB. For network-based backup in particular, this greatly reduces network bandwidth usage and thereby greatly improves backup efficiency.
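The arithmetic in this example can be checked with a few lines of Python. This is a sketch under stated assumptions: binary units are used (1 GB = 1024³ bytes, 1 KB = 1024 bytes), and the overhead of the index stored with each Delta increment file is ignored.

```python
GIB = 1024 ** 3  # assumed: 1 GB taken as 1024^3 bytes
KIB = 1024       # assumed: 1 KB taken as 1024 bytes

file_size = 1 * GIB      # size of the file being backed up
change_size = 10 * KIB   # data changed by each modification
revisions = 100          # number of modifications

# Full backup stores the whole file once per revision.
full_backup_total = file_size * revisions

# Delta backup stores only the changed data per revision.
delta_backup_total = change_size * revisions

ratio = full_backup_total / delta_backup_total
print(f"full: {full_backup_total // GIB} GB, "
      f"delta: {delta_backup_total // KIB} KB, ratio: {ratio:,.0f}x")
```

Under these figures the full backups total 100 GB, the deltas total 1000 KB, and the ratio between them is on the order of 10^5.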
Applying this technique in a data backup system therefore increases the utilization of disk backup space and saves backup space, helping to meet the challenge of rapidly growing data.
Description of drawings
Figure 1 is a structural diagram of file backup based on the Delta technique;
Figure 2 is a diagram of the first file increment change based on the Delta technique;
Figure 3 is a diagram of the second file increment change based on the Delta technique;
Figure 4 is a diagram of file merging/archiving based on the Delta technique.
Detailed description of the embodiments
The method of the present invention is described in detail below with reference to the accompanying drawings.
Using "Delta" file increment technology, the stored data can be reduced to 1/10 of its original size, which frees more backup space. This not only lets backup data be kept on disk for longer, but also saves a large amount of the bandwidth required for offline storage. As shown in the figure, the system architecture comprises a Delta Sequence module, a Delta read module, a Delta merge module, a Delta comparison module and a Delta Processor module, wherein:
Delta Sequence module: to determine whether two files are identical, they are compared by file, byte and Hash. This module therefore orders the file sequence, the byte sequence and the Hash sequence and provides a compare-and-filter function, so that two sequences can be checked for equality quickly by passing through them.
Delta read module: after a file has been modified, the parts that differ from the original are extracted by Delta comparison and stored separately as a Delta increment file. That is, each comparison yields the differences between the latest file and the source file.
Delta merge module: after repeated comparisons a file accumulates many Delta versions, and as the number of modifications grows, the Delta increment between the new file and the source file may also grow. The Delta merge module can then be used to merge the most recent Delta increment data into the source file and take the merged result as the new source file. The next time the file is modified, Delta comparison uses this latest file as the baseline, which limits the continual growth of the Delta increment file.
Delta comparison module: the file is read line by line, as lines or as bytes, and the parts that do not match are identified. The changed parts are extracted separately and their positions are recorded in an index, forming a Delta increment file. Combining this increment file with the source file reconstructs the file as modified at that point.
Delta Processor module: provides a multitask Delta data comparison mechanism, so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency.
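As an illustration of the multitask comparison provided by the Delta Processor module, the sketch below compares several file pairs concurrently using Python's standard `concurrent.futures.ThreadPoolExecutor`. The function names `diff_lines` and `compare_many` and the task tuple layout are assumptions made for this example; the text does not specify how tasks are scheduled.

```python
from concurrent.futures import ThreadPoolExecutor


def diff_lines(task):
    """Compare one (name, old_bytes, new_bytes) task line by line.

    Returns (name, indices of lines that differ). Lines are compared
    by position; a missing line counts as a difference.
    """
    name, old, new = task
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    count = max(len(old_lines), len(new_lines))
    changed = []
    for i in range(count):
        a = old_lines[i] if i < len(old_lines) else None
        b = new_lines[i] if i < len(new_lines) else None
        if a != b:
            changed.append(i)
    return name, changed


def compare_many(tasks, max_workers=4):
    """Run several comparisons at once and collect results by file name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(diff_lines, tasks))
```

A thread pool suits workloads dominated by file I/O; for CPU-bound comparison of large in-memory buffers in CPython, a process pool may be the better fit.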
Example
The process of implementing this architecture is described below with a concrete example.
Figure 2 shows how a file, after being modified, is finally turned into a Delta increment file by the Delta increment technique. Relative to the source file, the middle byte of the first line, the first byte of the second line and the last two bytes of the third line have changed. The Delta comparison module analyses the modified file against the source file and separates out the changed bytes, producing the final result shown in the figure: only the changed bytes are extracted, and unchanged bytes are left as they are.
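The change pattern described for Figure 2 can be reproduced with a small Python sketch. The record layout `(line_no, byte_offset, new_bytes)` and the function names `make_delta` and `apply_delta` are hypothetical, and the sketch assumes in-place edits only: the modified file must have the same number of lines and the same line lengths as the source.

```python
def make_delta(source: bytes, modified: bytes):
    """Return [(line_no, byte_offset, new_bytes), ...] for in-place edits."""
    src_lines = source.split(b"\n")
    mod_lines = modified.split(b"\n")
    if len(src_lines) != len(mod_lines):
        raise ValueError("sketch only handles in-place edits")
    delta = []
    for line_no, (s, m) in enumerate(zip(src_lines, mod_lines)):
        if len(s) != len(m):
            raise ValueError("sketch only handles in-place edits")
        start = None
        # Scan one position past the end so an open run is always closed.
        for off in range(len(s) + 1):
            differs = off < len(s) and s[off] != m[off]
            if differs and start is None:
                start = off
            elif not differs and start is not None:
                delta.append((line_no, start, m[start:off]))
                start = None
    return delta


def apply_delta(source: bytes, delta) -> bytes:
    """Rebuild the modified file from the source and the indexed changes."""
    lines = [bytearray(line) for line in source.split(b"\n")]
    for line_no, off, new in delta:
        lines[line_no][off:off + len(new)] = new
    return b"\n".join(bytes(line) for line in lines)
```

For a three-line source, changing the middle byte of line 1, the first byte of line 2 and the last two bytes of line 3 yields three delta records, and `apply_delta` on the source restores the modified file.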
In Figure 3 the comparison is similar to the first Delta comparison: both comparisons are made against the source file rather than against the previous revision. The main reason is that the first comparison has already produced a relatively fixed Hash sequence and byte sequence, which do not need to be regenerated for the next comparison, improving the efficiency of both comparisons. As in the first comparison, the second comparison also generates a Delta increment file.
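A minimal sketch of this reuse, assuming a hypothetical `SourceBaseline` class that hashes the source once and compares every later revision against those cached hashes. SHA-256 and the `{line_no: new_line}` delta layout are illustrative choices, and the sketch assumes revisions never have fewer lines than the source.

```python
import hashlib


class SourceBaseline:
    """Caches the per-line hashes of the source so they are computed once."""

    def __init__(self, data: bytes):
        self.lines = data.split(b"\n")
        self.line_hashes = [hashlib.sha256(line).digest() for line in self.lines]

    def delta_against(self, revision: bytes) -> dict:
        """Return {line_no: new_line} for lines that differ from the source."""
        delta = {}
        for i, line in enumerate(revision.split(b"\n")):
            digest = hashlib.sha256(line).digest()
            # Lines beyond the end of the source are always part of the delta.
            if i >= len(self.line_hashes) or digest != self.line_hashes[i]:
                delta[i] = line
        return delta
```

Because each delta is taken against the same source, the second delta contains every change made since the source, including those already captured by the first.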
Figure 4 shows a source file being merged with a Delta increment file; the merged file becomes a new source file. Subsequent modifications of the file are compared against this new source file rather than against the initial source file.
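The effect of merging can be sketched as follows. The helpers `line_delta` and `merge_delta` are hypothetical names, files are represented as lists of lines, and a delta is an assumed `{line_no: new_line}` mapping compared by position.

```python
def line_delta(base, revision):
    """Return {line_no: new_line} for lines of revision that differ from base."""
    return {
        i: line
        for i, line in enumerate(revision)
        if i >= len(base) or base[i] != line
    }


def merge_delta(base, delta):
    """Apply a delta to the base lines and return the new baseline."""
    merged = list(base)
    for i in sorted(delta):
        if i < len(merged):
            merged[i] = delta[i]
        else:
            merged.append(delta[i])  # lines added past the end of the base
    return merged


source = [b"a", b"b", b"c"]
rev2 = [b"a", b"B", b"C"]
rev3 = [b"a", b"B", b"C", b"d"]

# Merge the most recent delta into the source to form a new baseline.
baseline = merge_delta(source, line_delta(source, rev2))

# Against the original source the next delta keeps growing;
# against the merged baseline it holds only the newest change.
delta_vs_source = line_delta(source, rev3)
delta_vs_baseline = line_delta(baseline, rev3)
```

Here `delta_vs_source` carries three changed lines while `delta_vs_baseline` carries one, which is the growth the merge step is meant to limit.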
This completes the whole file data backup process based on the Delta technique. The technique differs from traditional incremental backup in that it introduces the Delta increment changes between file versions instead of considering only whether a file has changed.
Applying this technique in a data backup system therefore increases the utilization of disk backup space and saves backup space, helping to meet the challenge of rapidly growing data.

Claims (4)

1. A file data backup method based on Delta increments, characterized in that "Delta" file increment technology is used to reduce the amount of stored data, so that the disk frees more backup space, backup data can be kept on disk for longer, and a large amount of the bandwidth required for offline storage is saved; the system architecture comprises a Delta Sequence module, a Delta read module, a Delta merge module, a Delta comparison module and a Delta Processor module, wherein the functions of the modules and the file backup steps are as follows:
Delta Sequence module: to determine whether two files are identical, they are compared by file, byte and Hash; this module therefore orders the file sequence, the byte sequence and the Hash sequence and provides a compare-and-filter function, so that two sequences can be checked for equality quickly by passing through them;
Delta read module: after a file has been modified, the parts that differ from the original are extracted by Delta comparison and stored separately as a Delta increment file; that is, each comparison yields the differences between the latest file and the source file;
Delta merge module: after repeated comparisons a file accumulates many Delta versions, and as the number of modifications grows, the Delta increment between the new file and the source file also grows; at that point the Delta merge module merges the most recent Delta increment data into the source file and takes the merged result as the new source file; the next time the file is modified, Delta comparison uses this latest file as the baseline, which limits the continual growth of the Delta increment file;
Delta comparison module: the file is read line by line, as lines or as bytes, and the parts that do not match are identified; the changed parts are extracted separately and their positions are recorded in an index, forming a Delta increment file; combining this increment file with the source file reconstructs the file as modified at that point;
Delta Processor module: provides a multitask Delta data comparison mechanism, so that multiple files and multiple byte streams can be compared simultaneously, improving comparison efficiency.
2. The method according to claim 1, characterized in that, in the architecture, a HASH value that uniquely identifies each line is generated, and the differing parts of that line's byte stream are compared, to ensure that the differences between two different files are obtained correctly.
3. The method according to claim 1, characterized in that, in the architecture, in the processing step of the Delta merge module, each Delta increment file is merged with the previous source file to form the file as modified at that point; that is, each Delta increment file corresponds to one file version.
4. The Delta Sequence module according to claim 1, characterized in that, in the architecture, a file sequence, a Hash sequence and a byte sequence are defined for each file.
CN200910017342A 2009-07-27 2009-07-27 File data backup method based on Delta increment Pending CN101650677A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910017342A CN101650677A (en) 2009-07-27 2009-07-27 File data backup method based on Delta increment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910017342A CN101650677A (en) 2009-07-27 2009-07-27 File data backup method based on Delta increment

Publications (1)

Publication Number Publication Date
CN101650677A true CN101650677A (en) 2010-02-17

Family

ID=41672916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910017342A Pending CN101650677A (en) 2009-07-27 2009-07-27 File data backup method based on Delta increment

Country Status (1)

Country Link
CN (1) CN101650677A (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101848274A (en) * 2010-03-12 2010-09-29 深圳市同洲电子股份有限公司 Methods and devices for backup and recovery of records in mobile terminal
CN102682127A (en) * 2012-05-16 2012-09-19 北京像素软件科技股份有限公司 Data version control method
CN102737098A (en) * 2011-03-29 2012-10-17 日本电气株式会社 Distributed file system
CN103312743A (en) * 2012-03-09 2013-09-18 盛乐信息技术(上海)有限公司 Data synchronization device and method
CN103309847A (en) * 2012-03-06 2013-09-18 百度在线网络技术(北京)有限公司 Method and equipment for realizing file comparison
CN103377208A (en) * 2012-04-19 2013-10-30 北京智慧风云科技有限公司 Method for updating files in cloud service file management system
CN103379150A (en) * 2012-04-19 2013-10-30 北京智慧风云科技有限公司 Cloud service file management system
CN103544075A (en) * 2011-12-31 2014-01-29 华为数字技术(成都)有限公司 Data processing method and system
CN103793182A (en) * 2012-09-04 2014-05-14 Lsi公司 Scalable storage protection
CN104794143A (en) * 2014-07-30 2015-07-22 北京中科同向信息技术有限公司 Agent-free backup technology
CN105404562A (en) * 2014-08-18 2016-03-16 北京云巢动脉科技有限公司 Method and system for realizing efficient backup of mirror file of operating system
CN105474250A (en) * 2013-07-12 2016-04-06 贸易技术国际公司 Tailored messaging
CN105516349A (en) * 2016-01-04 2016-04-20 陈华锋 File transmission method and system
CN107111534A (en) * 2016-06-28 2017-08-29 华为技术有限公司 A kind of method and apparatus of data processing
CN109597828A (en) * 2018-09-29 2019-04-09 阿里巴巴集团控股有限公司 A kind of off-line data checking method, device and server
US11119863B2 (en) 2015-09-25 2021-09-14 Huawei Technologies Co., Ltd. Data backup method and data processing system
US11132260B2 (en) 2015-09-25 2021-09-28 Huawei Technologies Co., Ltd. Data processing method and apparatus

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101848274A (en) * 2010-03-12 2010-09-29 深圳市同洲电子股份有限公司 Methods and devices for backup and recovery of records in mobile terminal
CN102737098B (en) * 2011-03-29 2017-11-10 日本电气株式会社 Distributed file system
CN102737098A (en) * 2011-03-29 2012-10-17 日本电气株式会社 Distributed file system
CN103544075A (en) * 2011-12-31 2014-01-29 华为数字技术(成都)有限公司 Data processing method and system
CN103309847A (en) * 2012-03-06 2013-09-18 百度在线网络技术(北京)有限公司 Method and equipment for realizing file comparison
CN103312743A (en) * 2012-03-09 2013-09-18 盛乐信息技术(上海)有限公司 Data synchronization device and method
CN103379150A (en) * 2012-04-19 2013-10-30 北京智慧风云科技有限公司 Cloud service file management system
CN103377208A (en) * 2012-04-19 2013-10-30 北京智慧风云科技有限公司 Method for updating files in cloud service file management system
CN102682127B (en) * 2012-05-16 2014-12-03 北京像素软件科技股份有限公司 Data version control method
CN102682127A (en) * 2012-05-16 2012-09-19 北京像素软件科技股份有限公司 Data version control method
CN103793182A (en) * 2012-09-04 2014-05-14 Lsi公司 Scalable storage protection
US10191676B2 (en) 2012-09-04 2019-01-29 Seagate Technology Llc Scalable storage protection
US9613656B2 (en) 2012-09-04 2017-04-04 Seagate Technology Llc Scalable storage protection
US10664548B2 (en) 2013-07-12 2020-05-26 Trading Technologies International, Inc. Tailored messaging
CN105474250A (en) * 2013-07-12 2016-04-06 贸易技术国际公司 Tailored messaging
CN104794143A (en) * 2014-07-30 2015-07-22 北京中科同向信息技术有限公司 Agent-free backup technology
CN105404562A (en) * 2014-08-18 2016-03-16 北京云巢动脉科技有限公司 Method and system for realizing efficient backup of mirror file of operating system
US11119863B2 (en) 2015-09-25 2021-09-14 Huawei Technologies Co., Ltd. Data backup method and data processing system
US11132260B2 (en) 2015-09-25 2021-09-28 Huawei Technologies Co., Ltd. Data processing method and apparatus
CN105516349A (en) * 2016-01-04 2016-04-20 陈华锋 File transmission method and system
CN107111534A (en) * 2016-06-28 2017-08-29 华为技术有限公司 A kind of method and apparatus of data processing
CN109597828A (en) * 2018-09-29 2019-04-09 阿里巴巴集团控股有限公司 A kind of off-line data checking method, device and server

Similar Documents

Publication Publication Date Title
CN101650677A (en) File data backup method based on Delta increment
US20210385273A1 (en) System and method for real-time cloud data synchronization using a database binary log
EP2052337B1 (en) Retro-fitting synthetic full copies of data
CN101604268A (en) A kind of method for filtering monitored directory change events
US20070156793A1 (en) Synthetic full copies of data and dynamic bulk-to-brick transformation
CN101582076A (en) Data de-duplication method based on data base
CN101409877B (en) Method for generating call ticket
US20110196840A1 (en) System and method for incremental backup storage
CN102339321A (en) Network file system with version control and method using same
CN102436408A (en) Data storage cloud and cloud backup method based on Map/Dedup
EP1450270A2 (en) System and method of distributing replication commands
CN108614876B (en) Redis database-based system and data processing method
CN103645970A (en) Realizing method and device for de-weighting increments among multiple snapshots for remote copy
US8543581B2 (en) Synchronizing records between databases
CN110175211B (en) Data synchronization method and device
CN103853607A (en) Task scheduling mutual backup method
ATE425623T1 (en) HIGH PERFORMANCE LOCKING MANAGEMENT FOR FLASH COPY IN N-SHARED STORAGE SYSTEMS
CN111488243B (en) Backup and recovery method and device for MongoDB database, electronic equipment and storage medium
CN104142943A (en) Database expansion method and database
CN104461931A (en) Method for output processing of trace logs of multi-kernel storage device and multi-kernel environment
CN109788077A (en) A kind of cloud standby system that supporting cluster and its method
CN104572730A (en) Method and device for importing and exporting digital resources
CN106126487A (en) A kind of journal file method for splitting and device
CN108228592B (en) Data archiving method and data archiving device based on binary log
EL-SAYED et al. Impact of small files on hadoop performance: literature survey and open points

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20100217