CN106326035A - File-metadata-based incremental backup method - Google Patents
- Publication number
- CN106326035A
- Authority
- CN
- China
- Prior art keywords
- file
- metadata
- characteristic table
- code value
- incremental backup
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
Abstract
The invention discloses a file-metadata-based incremental backup method comprising the following steps: establishing, in a memory, a file feature table B that records the MD5 (Message-Digest Algorithm 5) code values of source disk files that have been copied; establishing, on a destination disk, a file feature table A whose attributes are the same as those of file feature table B; chunking a file to be stored using CDC (content-defined chunking); calculating the MD5 code value of the chunked metadata and recording it in file feature table A; and comparing the MD5 code values in file feature table A and file feature table B. Compared with the conventional art, the method helps eliminate duplicate data between files, reduces the space occupied by data to a greater extent, alleviates the growth of storage space in the storage system, makes maximum use of existing resources, and reduces storage cost.
Description
Technical field
The present invention relates to the technical field of file storage, and in particular to an incremental backup method based on file metadata.
Background art
In recent years, digitization has become the main trend of international development, and countries attach great importance to the development of digital information. As China's digitization continues to advance, digital information is growing explosively and data occupies more and more space. Centralized storage systems such as archiving and backup systems contain a large amount of redundant data. Research has found that up to 60% of the data held in storage systems is redundant, and that the proportion grows over time. Under these circumstances, eliminating duplicate data and saving storage space has become a key problem that storage systems need to solve.
Summary of the invention
In view of the technical problem described in the background, the present invention proposes an incremental backup method based on file metadata.
The technical solution of the present invention is realized as follows:
An incremental backup method based on file metadata, characterized by comprising the steps of:
S1, establishing, on the destination disk, a file feature table A for the file currently to be backed up;
S2, establishing, in the memory, a file feature table B that records the MD5 code values of the source disk files that have previously been copied;
S3, chunking the file to be stored, calculating the MD5 code value of each piece of chunked metadata, and recording it in file feature table A on the destination disk;
S4, comparing the MD5 code values in file feature table A with those in file feature table B; if an MD5 code value in file feature table A cannot be found in file feature table B, the corresponding metadata is copied and backed up in the memory; if an MD5 code value in file feature table A matches one in file feature table B, the metadata is not backed up in the memory;
S5, updating file feature table B in the memory.
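Steps S1 to S5 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names `md5_hex` and `incremental_backup`, the use of a `set` for file feature table B, and the `store` dictionary standing in for the backup target are all hypothetical and do not come from the patent text. The chunks are assumed to have been produced already by the chunking step in S3.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 code value of one chunk as a hex string."""
    return hashlib.md5(data).hexdigest()


def incremental_backup(chunks, table_b, store):
    """Copy only the chunks whose MD5 code value is absent from table B.

    chunks  -- iterable of bytes objects (result of the chunking in S3)
    table_b -- set of MD5 code values of data copied earlier (S2)
    store   -- dict standing in for the backup target (hypothetical)
    Returns the number of chunks that were copied.
    """
    # S1 + S3: build table A for the file that is being backed up.
    table_a = [(md5_hex(chunk), chunk) for chunk in chunks]

    copied = 0
    for code, chunk in table_a:      # S4: look up each code value of A in B
        if code not in table_b:
            store[code] = chunk      # not found in B: copy the chunk
            table_b.add(code)        # S5: update table B for the next run
            copied += 1
    return copied


table_b, store = set(), {}
print(incremental_backup([b"alpha", b"beta"], table_b, store))   # 2
print(incremental_backup([b"alpha", b"gamma"], table_b, store))  # 1
```

Updating table B inside the loop is a design choice of this sketch: it also suppresses repeated chunks within a single file, which the patent text does not spell out.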
Preferably, the metadata attributes recorded in file feature table B in S2 include some or all of: file name, file size, file creation time, file modification time, user-defined file metadata, and file storage path.
Preferably, the metadata attributes recorded in file feature table A in S3 include some or all of: file name, file size, file creation time, file modification time, user-defined file metadata, and file storage path.
Preferably, in S4, every MD5 code value in file feature table A is compared with the MD5 code values in file feature table B; after the comparison is complete, all metadata in file feature table A that is identical to metadata in file feature table B is deleted.
Preferably, in S3, the CDC (content-defined chunking) method is used to chunk the file to be stored.
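The attributes listed above, and the deletion of matching entries from file feature table A after the comparison in S4, might be modelled as follows. The names `FeatureRecord`, `make_record`, and `prune_table_a` are hypothetical, and the record layout is an assumption rather than the patent's own data structure. Note that `st_ctime` is the creation time only on Windows; on Unix-like systems it is the time of the last metadata change.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRecord:
    """One row of a file feature table (layout assumed for illustration)."""
    md5: str            # MD5 code value
    name: str           # file name
    size: int           # file size in bytes
    created: float      # file creation time (see st_ctime caveat above)
    modified: float     # file modification time
    path: str           # file storage path
    user_meta: tuple = ()   # user-defined metadata, if any


def make_record(path: str, md5: str) -> FeatureRecord:
    """Fill a record from the file system attributes of `path`."""
    st = os.stat(path)
    return FeatureRecord(
        md5=md5,
        name=os.path.basename(path),
        size=st.st_size,
        created=st.st_ctime,
        modified=st.st_mtime,
        path=os.path.abspath(path),
    )


def prune_table_a(table_a, table_b_codes):
    """Drop every row of table A whose MD5 code value already exists in B."""
    return [rec for rec in table_a if rec.md5 not in table_b_codes]
```

After pruning, table A holds only the rows that still need to be copied.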
MD5, as referred to in the present invention, is short for Message-Digest Algorithm 5. It is a revision of MD4 published by Rivest in 1991 and is a hash algorithm widely used in the computer field to ensure that transmitted information is complete and consistent; mainstream programming languages generally provide an MD5 implementation. MD5 is more complex than MD4 and somewhat slower, but it is more secure and performs better in resisting analysis and differential attacks.
It processes the input in 512-bit blocks and, after computation, produces four 32-bit values, which are finally concatenated into a 128-bit hash value. The basic procedure is: take remainders, adjust the length, and iterate over the chaining variables to obtain the result.
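The block and digest sizes described above can be checked with Python's standard `hashlib` module. The helper `md5_words` is a hypothetical name introduced here only to show the 128-bit digest as four 32-bit words.

```python
import hashlib


def md5_words(data: bytes):
    """Return the 128-bit MD5 digest as a list of four 32-bit words."""
    digest = hashlib.md5(data).digest()      # 16 bytes = 128 bits
    # MD5 writes its four state words out in little-endian byte order.
    return [int.from_bytes(digest[i:i + 4], "little") for i in range(0, 16, 4)]


h = hashlib.md5()
print(h.block_size * 8)                  # 512: bits per input block
print(h.digest_size * 8)                 # 128: bits in the digest
print(hashlib.md5(b"abc").hexdigest())   # 900150983cd24fb0d6963f7d28e17f72
print(len(md5_words(b"abc")))            # 4
```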
In the present invention, the chunking procedure of the CDC method used in S3 is as follows:
The Rabin fingerprint function computes a value h over the content of a sliding window, and the probability that h mod D equals r is 1/D, so the matching positions are fairly evenly scattered. From a probabilistic analysis, h mod D equals r roughly once for every D bytes the window slides, so the expected length of a variable-length chunk is D. This is only an expected value, of course; the resulting metadata chunks may still turn out too large or too small. CDC chunking divides two files with the same content into exactly the same metadata chunks. Moreover, because the Rabin function identifies strings well, when a file undergoes an insertion, deletion, or modification, only the few breakpoints after the change point need to be re-divided, while the boundaries of the other metadata chunks stay the same. A small change to a file therefore does not cause it to be divided into completely different metadata chunks in which no duplicate content can be found.
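The boundary rule described above (cut wherever the window hash h satisfies h mod D = r) can be illustrated with a short sketch. It deliberately swaps in a simple polynomial rolling hash in place of a true Rabin fingerprint, and the function name `cdc_chunks`, the parameter names, and the `min_size` and `max_size` guards against overly small or large chunks are assumptions made for illustration.

```python
def cdc_chunks(data: bytes, window: int = 16, divisor: int = 64,
               residue: int = 0, min_size: int = 1, max_size: int = 4096):
    """Split data where the hash of the last `window` bytes hits the residue.

    A polynomial rolling hash stands in for the Rabin fingerprint.
    """
    base, mod = 257, (1 << 61) - 1
    top = pow(base, window - 1, mod)     # weight of the oldest byte in the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            # Remove the contribution of the byte that slid out of the window.
            h = (h - data[i - window] * top) % mod
        h = (h * base + byte) % mod      # shift the window and add the new byte
        size = i - start + 1
        hit = i + 1 >= window and h % divisor == residue
        if (hit and size >= min_size) or size >= max_size:
            chunks.append(data[start:i + 1])   # breakpoint after byte i
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing chunk without a breakpoint
    return chunks
```

Because the hash depends only on the bytes inside the window, inserting a byte near the start of a file shifts the later breakpoints along with the content, so with the size guards effectively disabled every chunk after the first breakpoint is unchanged. When `min_size` or `max_size` forces or suppresses a cut, boundaries can differ for a short stretch before they line up again.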
Compared with the prior art, the present invention has the following advantageous technical effects:
The present invention adopts an incremental backup approach based on file metadata. First, a file feature table B is established in the memory to record the MD5 code values of the source disk files that have previously been copied; at the same time, a file feature table A with the same attributes as file feature table B is established on the destination disk. The file to be stored is chunked using the CDC method, and the MD5 code value of each piece of chunked metadata is calculated and recorded in file feature table A. The MD5 code values in file feature table A and file feature table B are then compared: if an MD5 code value in file feature table A cannot be found in file feature table B, the metadata is copied and backed up in the memory; if an MD5 code value in file feature table A matches one in file feature table B, the metadata is not backed up in the memory. Finally, file feature table B in the memory is updated in preparation for the next backup. Compared with conventional techniques, this incremental backup approach helps eliminate duplicate data between files, reduces the space occupied by data to a greater extent, alleviates the growth of storage space in storage systems, makes maximum use of existing resources, and lowers storage cost.
Brief description of the drawings
Fig. 1 is a flow block diagram of a specific embodiment of the file-metadata-based incremental backup method proposed by the present invention.
Detailed description
The present invention is further explained below with reference to a specific embodiment.
An incremental backup method based on file metadata comprises: S1, establishing, on the destination disk, a file feature table A for the file currently to be backed up; S2, establishing, in the memory, a file feature table B that records the MD5 code values of the source disk files that have previously been copied; S3, chunking the file to be stored, calculating the MD5 code value of each piece of chunked metadata, and recording it in file feature table A on the destination disk; S4, comparing the MD5 code values in file feature table A with those in file feature table B; if an MD5 code value in file feature table A cannot be found in file feature table B, the corresponding metadata is copied and backed up in the memory; if an MD5 code value in file feature table A matches one in file feature table B, the metadata is not backed up in the memory; S5, updating file feature table B in the memory in preparation for the next backup.
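To show how the updated file feature table B prepares for the next backup, the following sketch runs the embodiment twice over a changing file. Several details are assumptions that are not in the patent text: fixed-size chunking stands in for CDC to keep the example short, each chunk is stored on the destination as a file named by its MD5 code value, file feature table A is written as a text file ending in `.tableA`, and the names `backup_file` and `demo` are hypothetical.

```python
import hashlib
import os
import tempfile


def backup_file(src_path, dest_dir, table_b, chunk_size=4096):
    """Run S1 to S5 once for one file; return the number of chunks written."""
    os.makedirs(dest_dir, exist_ok=True)
    table_a = []                                  # S1: table A for this file
    written = 0
    with open(src_path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)          # S3: fixed-size stand-in for CDC
            if not chunk:
                break
            code = hashlib.md5(chunk).hexdigest()
            table_a.append(code)
            if code not in table_b:               # S4: compare with table B
                with open(os.path.join(dest_dir, code), "wb") as out:
                    out.write(chunk)
                table_b.add(code)                 # S5: update table B
                written += 1
    # Keep table A on the destination, one MD5 code value per line.
    table_a_path = os.path.join(dest_dir, os.path.basename(src_path) + ".tableA")
    with open(table_a_path, "w") as f:
        f.write("\n".join(table_a))
    return written


def demo():
    """Back up a file, change its last chunk, and back it up again."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dest = os.path.join(tmp, "dest")
        table_b = set()
        with open(src, "wb") as f:
            f.write(b"aaaabbbbcccc")
        first = backup_file(src, dest, table_b, chunk_size=4)
        with open(src, "wb") as f:
            f.write(b"aaaabbbbdddd")
        second = backup_file(src, dest, table_b, chunk_size=4)
        return first, second


print(demo())  # (3, 1): only the changed chunk is copied on the second run
```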
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.
Claims (5)
1. An incremental backup method based on file metadata, characterized by comprising the steps of:
S1, establishing, on the destination disk, a file feature table A for the file currently to be backed up;
S2, establishing, in the memory, a file feature table B that records the MD5 code values of the source disk files that have previously been copied;
S3, chunking the file to be stored, calculating the MD5 code value of each piece of chunked metadata, and recording it in file feature table A on the destination disk;
S4, comparing the MD5 code values in file feature table A with those in file feature table B; if an MD5 code value in file feature table A cannot be found in file feature table B, the corresponding metadata is copied and backed up in the memory; if an MD5 code value in file feature table A matches one in file feature table B, the metadata is not backed up in the memory;
S5, updating file feature table B in the memory.
2. The incremental backup method based on file metadata according to claim 1, characterized in that the metadata attributes recorded in file feature table B in S2 include some or all of: file name, file size, file creation time, file modification time, user-defined file metadata, and file storage path.
3. The incremental backup method based on file metadata according to claim 1, characterized in that the metadata attributes recorded in file feature table A in S3 include some or all of: file name, file size, file creation time, file modification time, user-defined file metadata, and file storage path.
4. The incremental backup method based on file metadata according to claim 1, characterized in that, in S4, every MD5 code value in file feature table A is compared with the MD5 code values in file feature table B, and after the comparison is complete, all metadata in file feature table A that is identical to metadata in file feature table B is deleted.
5. The incremental backup method based on file metadata according to claim 1, characterized in that, in S3, the CDC method is used to chunk the file to be stored.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610671739.4A CN106326035A (en) | 2016-08-13 | 2016-08-13 | File-metadata-based incremental backup method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610671739.4A CN106326035A (en) | 2016-08-13 | 2016-08-13 | File-metadata-based incremental backup method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106326035A (en) | 2017-01-11 |
Family
ID=57739356
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610671739.4A Pending CN106326035A (en) | 2016-08-13 | 2016-08-13 | File-metadata-based incremental backup method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106326035A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101706825A (en) * | 2009-12-10 | 2010-05-12 | 华中科技大学 | Replicated data deleting method based on file content types |
CN101989929A (en) * | 2010-11-17 | 2011-03-23 | 中兴通讯股份有限公司 | Disaster recovery data backup method and system |
CN102810108A (en) * | 2011-06-02 | 2012-12-05 | 英业达股份有限公司 | Method for processing repeated data |
EP2905709A2 (en) * | 2014-02-11 | 2015-08-12 | Atlantis Computing, Inc. | Method and apparatus for replication of files and file systems using a deduplication key space |
CN104375905A (en) * | 2014-11-07 | 2015-02-25 | 北京云巢动脉科技有限公司 | Incremental backing up method and system based on data block |
CN104932841A (en) * | 2015-06-17 | 2015-09-23 | 南京邮电大学 | Saving type duplicated data deleting method in cloud storage system |
Non-Patent Citations (1)
Title |
---|
Duan Mengbo et al.: "Research on Content-Based Data Deduplication Technology", Computer Knowledge and Technology * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106873908A (en) * | 2017-01-17 | 2017-06-20 | 北京联想核芯科技有限公司 | Data storage method and device |
CN106873908B (en) * | 2017-01-17 | 2019-11-12 | 深圳忆联信息系统有限公司 | Data storage method and device |
CN106850342B (en) * | 2017-01-20 | 2020-11-24 | 苏州浪潮智能科技有限公司 | Method and device for testing compatibility and stability of switch |
CN106850342A (en) * | 2017-01-20 | 2017-06-13 | 郑州云海信息技术有限公司 | Method and device for testing compatibility and stability of switch |
CN107704342A (en) * | 2017-09-26 | 2018-02-16 | 郑州云海信息技术有限公司 | Snapshot copy method, system, device and readable storage medium |
CN111367871B (en) * | 2020-02-29 | 2022-06-10 | 华南理工大学 | Method for increment synchronization among files based on SAPCI (software application programming interface) variable-length blocks |
CN111367871A (en) * | 2020-02-29 | 2020-07-03 | 华南理工大学 | Method for increment synchronization among files based on SAPCI (software application programming interface) variable-length blocks |
CN112507100A (en) * | 2020-12-18 | 2021-03-16 | 北京百度网讯科技有限公司 | Method and device for updating question-answering system |
CN112507100B (en) * | 2020-12-18 | 2023-12-22 | 北京百度网讯科技有限公司 | Update processing method and device of question-answering system |
CN112882866A (en) * | 2021-02-24 | 2021-06-01 | 上海泰宇信息技术股份有限公司 | Backup method suitable for massive files |
CN112882866B (en) * | 2021-02-24 | 2023-12-15 | 上海泰宇信息技术股份有限公司 | Backup method suitable for mass files |
CN115145943A (en) * | 2022-09-06 | 2022-10-04 | 北京麦聪软件有限公司 | Multi-data-source metadata rapid comparison method, system, device and storage medium |
CN115145943B (en) * | 2022-09-06 | 2023-02-28 | 北京麦聪软件有限公司 | Method, system, equipment and storage medium for rapidly comparing metadata of multiple data sources |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170111 |