CN106326035A

CN106326035A - File-metadata-based incremental backup method

Info

Publication number: CN106326035A
Application number: CN201610671739.4A
Authority: CN
Inventors: 闫旋
Original assignee: Nanjing Chicha Information Technology Co Ltd
Current assignee: Nanjing Chicha Information Technology Co Ltd
Priority date: 2016-08-13
Filing date: 2016-08-13
Publication date: 2017-01-11

Abstract

The invention discloses a file-metadata-based incremental backup method comprising the following steps: establishing in memory a file feature table B that records the MD5 (Message-Digest Algorithm 5) code values of source disk files that have already been copied; establishing on the destination disk a file feature table A whose attributes are the same as those of file feature table B; chunking the file to be stored using content-defined chunking (CDC); calculating the MD5 code value of each chunk of metadata and recording it in file feature table A; and comparing the MD5 code values in file feature table A with those in file feature table B. Compared with the conventional art, the incremental backup method helps eliminate duplicate data across files, reduces the space occupied by data to a greater extent, alleviates the growth in space required by the storage system, makes maximum use of existing resources, and reduces storage cost.

Description

File-metadata-based incremental backup method

Technical field

The present invention relates to the technical field of file storage, and in particular to an incremental backup method based on file metadata.

Background art

In recent years, digitalization has been the main trend of international development, and countries attach great importance to it. With the steady advance of digitalization in China, digital information is growing explosively and data occupies more and more space. Centralized storage systems such as archives and backups contain a large amount of redundant data. Research has found that up to 60% of the data held in a storage system is redundant, and that this proportion increases over time. Under these circumstances, eliminating duplicate data and saving storage space has become a key problem that storage systems need to solve.

Summary of the invention

In view of the technical problem described in the background art, the present invention proposes an incremental backup method based on file metadata.

The technical solution of the present invention is implemented as follows:

An incremental backup method based on file metadata, characterized by comprising the steps of:

S1, establishing on the destination disk a file feature table A for the file that currently needs to be backed up;

S2, establishing in memory a file feature table B that records the MD5 code values of source disk files that have already been copied;

S3, chunking the file to be stored, calculating the MD5 code value of each chunk of metadata, and recording it in file feature table A on the destination disk;

S4, comparing the MD5 code values in file feature table A with those in file feature table B: if an MD5 code value in file feature table A cannot be found in file feature table B, copying the corresponding metadata to the storage device as a backup; if an MD5 code value in file feature table A matches an MD5 code value in file feature table B, not backing up the corresponding metadata to the storage device;

S5, updating file feature table B in memory.
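Steps S1 to S5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `split_fixed`, `incremental_backup`, and `store` are hypothetical, the destination disk is modelled as a plain dictionary, and fixed-size chunks stand in for the chunking of S3 purely to keep the example short.

```python
import hashlib


def split_fixed(data: bytes, size: int) -> list[bytes]:
    """Stand-in chunker (fixed-size pieces); an assumption made for brevity."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def incremental_backup(data: bytes, table_b: set[str],
                       store: dict[str, bytes], size: int = 4) -> list[str]:
    """Run S1-S5 once and return the digests of the chunks actually copied."""
    # S1: table A for the file being backed up (MD5 digest -> chunk).
    table_a: dict[str, bytes] = {}
    # S3: chunk the file and record each chunk's MD5 in table A.
    for chunk in split_fixed(data, size):
        table_a[hashlib.md5(chunk).hexdigest()] = chunk
    # S4: copy only the chunks whose digest is not found in table B.
    copied = [digest for digest in table_a if digest not in table_b]
    for digest in copied:
        store[digest] = table_a[digest]
    # S5: update table B so the next run treats these chunks as known.
    table_b.update(copied)
    return copied


table_b: set[str] = set()       # S2: digests of data copied previously
store: dict[str, bytes] = {}    # hypothetical destination storage
first = incremental_backup(b"AAAABBBBCCCC", table_b, store)
second = incremental_backup(b"AAAABBBBDDDD", table_b, store)
print(len(first), len(second))  # chunks copied by each run
```

In this sketch the second run copies only the chunk that changed, because the digests of the unchanged chunks are already present in table B.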

Preferably, the metadata attributes recorded in file feature table B in S2 include some or all of: file name, file size, file creation time, file modification time, user-defined file metadata, and file storage path.

Preferably, the metadata attributes recorded in file feature table A in S3 include some or all of: file name, file size, file creation time, file modification time, user-defined file metadata, and file storage path.
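One possible shape for a row of either feature table is sketched below in Python. The class name `FeatureEntry`, its field names, and the helper `describe` are illustrative assumptions and do not come from the patent text. Note also that `st_ctime` is not a true creation time on Unix systems, where it reports the last inode change.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class FeatureEntry:
    """One row of a file feature table (field names are illustrative)."""
    md5: str
    name: str
    size: int
    created: float    # st_ctime: only approximates creation time on Unix
    modified: float
    path: str
    custom: dict[str, str] = field(default_factory=dict)  # user-defined metadata


def describe(path: str) -> FeatureEntry:
    """Build a table row from a file on disk."""
    with open(path, "rb") as fh:
        digest = hashlib.md5(fh.read()).hexdigest()
    st = os.stat(path)
    return FeatureEntry(md5=digest, name=os.path.basename(path), size=st.st_size,
                        created=st.st_ctime, modified=st.st_mtime,
                        path=os.path.abspath(path))


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "report.txt")
    with open(sample, "wb") as fh:
        fh.write(b"hello")
    entry = describe(sample)
print(entry.name, entry.size, entry.md5)
```

Because both tables carry the same attributes, a single record type of this kind could serve table A and table B alike.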

Preferably, in S4, every MD5 code value in file feature table A is compared with the MD5 code values in file feature table B; after the comparison is complete, all metadata in file feature table A that is identical to metadata in file feature table B is deleted.
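The compare-then-delete behaviour of this preferred form of S4 can be sketched as follows. The function name `prune_matches` is hypothetical, and short placeholder strings are used in place of real MD5 code values.

```python
def prune_matches(table_a: dict[str, bytes], table_b: set[str]) -> list[str]:
    """Compare every digest in table A with table B and delete the matches from A.

    Returns the digests that were deleted, i.e. those already backed up.
    """
    # Collect the matches first so the dict is not modified while iterating.
    matched = [digest for digest in table_a if digest in table_b]
    for digest in matched:
        del table_a[digest]
    return matched


# Placeholder digests stand in for real MD5 code values.
table_a = {"d1": b"chunk-1", "d2": b"chunk-2", "d3": b"chunk-3"}
table_b = {"d1", "d3", "d9"}
removed = prune_matches(table_a, table_b)
print(removed, sorted(table_a))
```

After pruning, table A holds only the entries that still need to be copied.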

Preferably, in S3 the file to be stored is chunked using the CDC (content-defined chunking) method.

MD5, as referred to in the present invention, is short for Message-Digest Algorithm 5. It was designed by Rivest in 1991 as an improved version of MD4 and is a hash algorithm widely used in the computer field to ensure that information is transmitted completely and consistently; mainstream programming languages generally provide an MD5 implementation. MD5 is more complex than MD4 and somewhat slower, but more secure, performing better in resisting analysis and differential attacks.

It processes the input in 512-bit blocks and, after computation, produces four 32-bit values, which are finally concatenated into a 128-bit hash value. The basic procedure is to take remainders, adjust the length, and iterate over the chaining variables to obtain the result.
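The block and digest sizes described above can be checked with Python's standard `hashlib` module. The helper `padded_len` is a hypothetical name introduced here; it computes the padded message length as defined in RFC 1321, where a mandatory padding byte and a 64-bit length field are appended and the total is rounded up to a multiple of 512 bits.

```python
import hashlib
import struct


def padded_len(n_bytes: int) -> int:
    """Padded message length in bytes: 1 padding byte + 8 length bytes, rounded up to 64."""
    return ((n_bytes + 8) // 64 + 1) * 64


md5 = hashlib.md5(b"abc")
digest = md5.digest()

# The 128-bit digest is the concatenation of four 32-bit little-endian words.
words = struct.unpack("<4I", digest)

print(md5.block_size * 8)   # input block size in bits
print(len(digest) * 8)      # digest size in bits
print([f"{w:08x}" for w in words])
print(padded_len(55), padded_len(56))
```

A 55-byte message still fits in one 512-bit block after padding, while a 56-byte message needs two, which is the length-adjustment step mentioned in the text.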

In the present invention, the chunking procedure of the CDC method used in S3 is as follows:

Because the values that the Rabin fingerprint function computes over the sliding-window content are well dispersed, the probability that a fingerprint value h satisfies h mod D = r is 1/D. From a probabilistic point of view, a position where h mod D equals r therefore occurs about once for every D bytes the window slides, so the expected length of a variable-length chunk is D. This is only an expected value, of course, and the resulting metadata chunks may still turn out too large or too small. The CDC method divides the matching content of two files into exactly the same metadata chunks. Moreover, because the Rabin function identifies strings well, when a file undergoes an insertion, deletion, or modification, only the few breakpoints following the point of change need to be recomputed, and the boundaries of the other metadata chunks remain essentially unchanged. A small change to a file therefore does not cause it to be divided into completely different metadata chunks, which would make duplicate content impossible to find.
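The boundary rule h mod D = r can be sketched in Python as below. This is an illustration under stated assumptions, not the patented procedure: a simple polynomial rolling hash stands in for a true Rabin fingerprint, and the constants `WINDOW`, `BASE`, `MOD`, `D`, and `R`, as well as the minimum chunk length of one window, are arbitrary choices made for the example.

```python
import hashlib
import random

WINDOW = 16            # sliding-window width in bytes (assumed)
BASE = 257             # polynomial base of the rolling hash (assumed)
MOD = (1 << 31) - 1    # prime modulus of the rolling hash (assumed)
D, R = 64, 0           # cut where h % D == R


def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut after each position where the hash of the last WINDOW bytes equals R mod D."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # take in the newest byte
        # Require at least WINDOW bytes per chunk so the window is always full.
        if i + 1 - start >= WINDOW and h % D == R:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # trailing remainder
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"X" + original                             # insert one byte at the front

before = {hashlib.md5(c).hexdigest() for c in cdc_chunks(original)}
after = {hashlib.md5(c).hexdigest() for c in cdc_chunks(edited)}
shared = len(before & after)
print(len(before), len(after), shared)
```

Because each cut depends only on the bytes inside the window, the one-byte insertion shifts the later boundaries along with the content, so most chunk digests are expected to coincide between the two versions. With fixed-size chunking the same insertion would shift every boundary relative to the content.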

Compared with the prior art, the present invention has the following advantageous technical effects:

The present invention adopts an incremental backup method based on file metadata. A file feature table B is first established in memory to record the MD5 code values of source disk files that have already been copied, and a file feature table A with the same attributes as file feature table B is established on the destination disk. The file to be stored is chunked using the CDC method, and the MD5 code value of each chunk of metadata is calculated and recorded in file feature table A. The MD5 code values in file feature table A are then compared with those in file feature table B: if an MD5 code value in file feature table A cannot be found in file feature table B, the corresponding metadata is copied to the storage device as a backup; if an MD5 code value in file feature table A matches one in file feature table B, the corresponding metadata is not backed up to the storage device. Finally, file feature table B is updated in preparation for the next backup. Compared with conventional techniques, this incremental backup method helps eliminate duplicate data across files, reduces the space occupied by data to a greater extent, alleviates the growth in space required by the storage system, makes maximum use of existing resources, and reduces storage cost.

Brief description of the drawings

Fig. 1 is a schematic flow block diagram of a specific embodiment of the file-metadata-based incremental backup method proposed by the present invention.

Detailed description of the embodiments

The present invention is further explained below with reference to a specific embodiment.

An incremental backup method based on file metadata: S1, establishing on the destination disk a file feature table A for the file that currently needs to be backed up; S2, establishing in memory a file feature table B that records the MD5 code values of source disk files that have already been copied; S3, chunking the file to be stored, calculating the MD5 code value of each chunk of metadata, and recording it in file feature table A on the destination disk; S4, comparing the MD5 code values in file feature table A with those in file feature table B: if an MD5 code value in file feature table A cannot be found in file feature table B, copying the corresponding metadata to the storage device as a backup; if an MD5 code value in file feature table A matches an MD5 code value in file feature table B, not backing up the corresponding metadata to the storage device; S5, updating file feature table B in memory in preparation for the next backup.

The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and in accordance with its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. An incremental backup method based on file metadata, characterized by comprising the steps of:

S3, chunking the file to be stored, calculating the MD5 code value of each chunk of metadata, and recording it in file feature table A on the destination disk;

S4, comparing the MD5 code values in file feature table A with those in file feature table B: if an MD5 code value in file feature table A cannot be found in file feature table B, copying the corresponding metadata to the storage device as a backup; if an MD5 code value in file feature table A matches an MD5 code value in file feature table B, not backing up the corresponding metadata to the storage device;

S5, updating file feature table B in memory.

2. The incremental backup method based on file metadata according to claim 1, characterized in that the metadata attributes recorded in file feature table B in S2 include some or all of: file name, file size, file creation time, file modification time, user-defined file metadata, and file storage path.

3. The incremental backup method based on file metadata according to claim 1, characterized in that the metadata attributes recorded in file feature table A in S3 include some or all of: file name, file size, file creation time, file modification time, user-defined file metadata, and file storage path.

4. The incremental backup method based on file metadata according to claim 1, characterized in that in S4 every MD5 code value in file feature table A is compared with the MD5 code values in file feature table B, and after the comparison is complete, all metadata in file feature table A that is identical to metadata in file feature table B is deleted.

5. The incremental backup method based on file metadata according to claim 1, characterized in that in S3 the file to be stored is chunked using the CDC method.