CN104375906B - Fast verification method for large-scale backup data based on a file system - Google Patents

Fast verification method for large-scale backup data based on a file system

Info

Publication number
CN104375906B
Authority
CN
China
Prior art keywords
file
files
blocks
total amount
archive index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410664300.XA
Other languages
Chinese (zh)
Other versions
CN104375906A (en)
Inventor
何新敏
夏旭东
武新
崔维力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Original Assignee
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority to CN201410664300.XA
Publication of CN104375906A
Application granted
Publication of CN104375906B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a fast verification method for large-scale backup data based on a file system. The method uses a backup index file that records the folders and file blocks of the actually backed-up data. Each folder record contains the path of the folder; each file block record contains the path of the file to which the block belongs, the start offset, the file block size, and the check value. The beneficial effects of the invention are as follows: the file block is the minimum backup unit, so the check value of a whole file need not be considered; this organization of the backup data facilitates parallel verification using multi-core CPU resources; and when data is verified, besides verifying all file blocks, it is only necessary to check the sum of the file sizes and the existence of the folders to ensure that the backup data is strictly correct, which avoids large-scale merging.

Description

Fast verification method for large-scale backup data based on a file system
Technical field
The invention belongs to the field of databases, and in particular concerns the verification of database backup data based on a file system.
Background art
No database can do without its own backup/recovery system. The ways of backing up a database and the forms of organizing the backup data are varied. Among them, backing up by directly copying binary physical file blocks is comparatively fast and reliable. In the prior art, because the underlying storage of a database is generally based on a file system, the backed-up data presents the following problems in massive-data scenarios:
First, the amount of computation is huge: all backup data must undergo check calculation;
Second, the data volume is large, so it is difficult to verify the backup data directly with a single check value;
Third, in practical application systems the minimum unit of file operation is usually the file block, whereas the correctness of backup data requires the correctness of whole files. Deriving the result for a file from the results computed for its file blocks involves large-scale merging when the data volume is massive.
Summary of the invention
The problem to be solved by the invention is to provide a method that can quickly verify the correctness of massive backup data based on a file system, and that is especially suitable for situations in which large-scale merging must be avoided.
To solve the above technical problem, the technical solution adopted by the invention is a fast verification method for large-scale backup data based on a file system, comprising a backup index file. The backup index file records the folders and file blocks of the actually backed-up data; each folder record contains the path of the folder; each file block record contains the path of the file to which the block belongs, the start offset, the file block size, and the check value.
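The two kinds of index record described above could be modeled roughly as follows. This is only an illustrative sketch: the source does not specify the on-disk layout of the index file, so the tab-separated line format, the `D`/`B` type tags, and the names `FolderRecord`, `FileBlockRecord`, and `parse_record` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FolderRecord:
    path: str  # path of the folder


@dataclass(frozen=True)
class FileBlockRecord:
    path: str      # path of the file that contains the block
    offset: int    # start offset of the block within that file
    size: int      # size of the block in bytes
    checksum: int  # check value of the block


Record = Union[FolderRecord, FileBlockRecord]


def parse_record(line: str) -> Record:
    """Parse one line of the ASSUMED layout (not taken from the patent):
    'D<TAB>path' for a folder,
    'B<TAB>path<TAB>offset<TAB>size<TAB>checksum' for a file block."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] == "D" and len(fields) == 2:
        return FolderRecord(fields[1])
    if fields[0] == "B" and len(fields) == 5:
        return FileBlockRecord(
            fields[1], int(fields[2]), int(fields[3]), int(fields[4])
        )
    raise ValueError(f"unrecognized index record: {line!r}")
```

Keeping one record per line matches the description of the index being read record by record, and it lets a verifier process the index as a stream without loading it whole.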
The method proceeds according to the following steps:
Step 1: Obtain the check value of the backup index file from the backup metadata;
Step 2: Verify the entire backup index file with a streaming check algorithm to determine whether the backup index file is correct;
Step 3: If the backup index file is correct, start verification threads to verify the file blocks;
Step 4: Read the backup index file one record (line) at a time and examine each record:
If it is a folder, determine whether the folder exists, and increment the folder count by 1;
If it is a file block, start a verification thread to verify the file block, and add the size of the file block to the total file size; if the start offset of the file block is 0, increment the file count by 1;
Step 5: Obtain the folder count, the file count, and the total file size from the results of Step 4;
Step 6: Traverse the backup directory; the values obtained in this step are to be compared with those of Step 5;
If an entry encountered while traversing the backup directory is a folder, increment the folder count by 1;
If an entry encountered while traversing the backup directory is a file, increment the file count by 1;
Step 7: Compare the total file size, the folder count, and the file count obtained in Step 5 with those obtained in Step 6.
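Steps 2, 4, and 5 can be sketched as a streaming check of the index file followed by a single pass that keeps three counters. The sketch below rests on several assumptions that are not in the source: CRC-32 (via Python's `zlib`) stands in for the unspecified check algorithm, index lines use a hypothetical tab-separated layout (`D<TAB>path` for folders, `B<TAB>path<TAB>offset<TAB>size<TAB>checksum` for file blocks), and the names `stream_crc32`, `tally_index`, and `Totals` are invented for illustration.

```python
import os
import zlib
from typing import Iterable, NamedTuple


class Totals(NamedTuple):
    folders: int  # folder count
    files: int    # file count
    size: int     # total file size in bytes


def stream_crc32(path: str, chunk_size: int = 1 << 20) -> int:
    """Step 2: check value of a file, computed chunk by chunk so the
    whole file never has to be held in memory (CRC-32 is an assumption)."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def tally_index(lines: Iterable[str], backup_root: str) -> Totals:
    """Steps 4-5: one pass over the index records with O(1) extra state."""
    folders = files = size = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "D":  # folder record: must exist in the backup
            if not os.path.isdir(os.path.join(backup_root, fields[1])):
                raise FileNotFoundError(fields[1])
            folders += 1
        elif fields[0] == "B":  # file block record
            offset, block_size = int(fields[2]), int(fields[3])
            size += block_size
            if offset == 0:  # the block at offset 0 marks a new file
                files += 1
            # Verifying the block itself would be handed to a worker here.
        else:
            raise ValueError(f"unrecognized index record: {line!r}")
    return Totals(folders, files, size)
```

A caller would first compare `stream_crc32(index_path)` with the value stored in the backup metadata and only then run `tally_index` over the opened index file.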
Further, the backup index file is written in append mode. Each time a file block is backed up, one record is added to the backup index file; at the same time, the check value of the backup index file itself is updated from the newly appended data and replaces the check value computed for the previously written data. When the backup ends, the check value of the backup index file itself is written into the backup metadata.
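The append-and-update behavior can be illustrated with a running check value that is folded forward on every append. This is a minimal sketch under stated assumptions: CRC-32 from Python's `zlib` stands in for the check algorithm (the source does not name one), the index is assumed to start as a fresh file, and `IndexWriter` is a hypothetical name.

```python
import zlib


class IndexWriter:
    """Hypothetical writer that appends records to a fresh index file and
    keeps the check value of the index itself up to date as it grows."""

    def __init__(self, path: str) -> None:
        # Assumes a new index per backup; records are only added at the end.
        self._file = open(path, "wb")
        self.check_value = 0  # running CRC-32 of everything written so far

    def append(self, record_line: str) -> None:
        data = record_line.encode("utf-8")
        self._file.write(data)
        # Fold the appended bytes into the running value; the result
        # replaces the value that covered the previously written data.
        self.check_value = zlib.crc32(data, self.check_value)

    def close(self) -> int:
        self._file.close()
        return self.check_value  # value to store in the backup metadata
```

Because the check value is updated incrementally, finishing the backup does not require re-reading the index file to compute its check value.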
Further, when the verification thread in Step 3 verifies a file block, it can compute the check value of the file block while the information about the file block is written into the backup index file; the verification of the file block and the writing of the file block's information can proceed in parallel.
Further, the verification threads in Step 4 are multiple threads operating in parallel.
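Because each file block is verified independently of the others, the work maps naturally onto a thread pool. The following sketch assumes CRC-32 as the check algorithm and represents a block as a plain tuple; the names `Block`, `block_ok`, and `failed_blocks` are hypothetical and do not come from the source.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple

# (file path, start offset, block size, expected check value)
Block = Tuple[str, int, int, int]


def block_ok(block: Block) -> bool:
    """Read one block from its file and compare its check value."""
    path, offset, size, expected = block
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size)
    # A short read means the file is truncated, so the block is invalid.
    return len(data) == size and zlib.crc32(data) == expected


def failed_blocks(blocks: Iterable[Block], workers: int = 4) -> List[Block]:
    """Verify all blocks on a pool of worker threads; return the failures."""
    block_list = list(blocks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(block_ok, block_list))
    return [b for b, ok in zip(block_list, results) if not ok]
```

An empty result from `failed_blocks` corresponds to every file block having verified correctly; no per-file merging of block results is needed.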
In addition, so that verification is fast while the backup data is still verified strictly, the invention treats the backup data being strictly correct as equivalent to the following four conditions holding simultaneously:
Condition 1. All file blocks verify correctly;
Condition 2. The total size of the file blocks recorded in the backup index file equals the total size of the files in the backup directory;
Condition 3. The file count and folder count recorded in the backup index file are consistent with the file count and folder count in the backup data;
Condition 4. All folders recorded in the index file exist.
Regarding Condition 2, the total size of the file blocks recorded in the backup index file can be computed while the index file is traversed, with time complexity O(n) and space complexity O(1). Regarding Condition 3, because a file may be divided into multiple file blocks, the number of file blocks whose start offset is 0 can be taken as the total number of files recorded in the index file; the folder count can also be computed during the traversal, with time complexity O(n) and space complexity O(1). Regarding Condition 4, whether all folders recorded in the index file exist can likewise be checked during the traversal, with time complexity O(n) and space complexity O(1).
For Conditions 2 and 3, the total file size, file count, and folder count of the actually backed-up data can be obtained in one single traversal of the backup data. Because this traversal only needs the names and sizes of the files, it is much faster than reading the data; the time complexity of the traversal is O(1) and the space complexity is O(1).
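The single traversal of the backup directory can be sketched with `os.walk`, which needs only directory listings and file sizes and never reads file contents. This is an illustrative sketch, not the patented implementation: `Totals`, `tally_directory`, and `totals_match` are hypothetical names, and the example assumes that the index file and the backup metadata are stored outside the directory being walked so that they are not counted.

```python
import os
from typing import NamedTuple


class Totals(NamedTuple):
    folders: int  # folder count
    files: int    # file count
    size: int     # total file size in bytes


def tally_directory(backup_root: str) -> Totals:
    """Walk the backup directory once, using names and sizes only."""
    folders = files = size = 0
    for dirpath, dirnames, filenames in os.walk(backup_root):
        folders += len(dirnames)  # every subfolder is listed exactly once
        for name in filenames:
            files += 1
            size += os.path.getsize(os.path.join(dirpath, name))
    return Totals(folders, files, size)


def totals_match(from_index: Totals, from_directory: Totals) -> bool:
    """Conditions 2 and 3: the three totals must agree exactly."""
    return from_index == from_directory
```

Only three integers are carried through the walk, and comparing them with the totals derived from the index is a constant-time check.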
The invention has the following advantages and positive effects. When data is verified before recovery, the backup index file is read first and a check calculation is performed on it to ensure that the backup index file is correct; then each file block recorded in the backup index file is computed and verified. Because the file block is the minimum unit of operation, large-scale merging is avoided. Because operations on file blocks are mutually independent, parallel computation is easy to apply: each file block is verified separately, making full use of multi-core CPU resources. In addition, by means of equivalent conditions, the invention verifies the backup data with algorithms of low time and space complexity while ensuring that the backup data is accurate.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the invention;
Fig. 2 is a structural schematic diagram of a specific embodiment of the invention.
Detailed description of the embodiment
As shown in Fig. 2, the index file of this embodiment is named checksum_result. First, the check value of the index file checksum_result itself is determined to be 3127472607, which is the Value shown in the drawing. The verification then proceeds in the following steps:
Step one: verify the index file itself, checking whether its check value equals 3127472607;
Step two: if step one succeeds, read the index file;
First read the first record of the embodiment: it is a folder, so determine whether this folder exists; at the same time, increment the folder count by 1;
Then read the second record: it is also a folder, so determine whether this folder exists; at the same time, increment the folder count by 1;
Read the third record: it is a file block, so verify this file block, checking whether its check value 2049854218 is correct; at the same time, add the size of the file block just read, 12846085, to the total file size;
Read the fourth record: it is also a file block, so verify this file block, checking whether its check value 2117240010 is correct; at the same time, add the size of the file block just read, 13200431, to the total file size.
Step three: if step two succeeds, traverse the actual backup data folder;
Compute the folder count and file count in the folder, as well as the sum of the file sizes; check whether these three values equal the three counts obtained in step two; if they are equal, the backup data is completely correct.
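The counters that step two accumulates for the four records of this embodiment can be tallied as below. Only the record kinds and the two block sizes come from the embodiment; the tuple representation is an assumption made for the example, and the file count is left out because the embodiment does not state the start offsets of the two blocks.

```python
# (kind, block size) for the four records read in step two.
records = [
    ("folder", None),
    ("folder", None),
    ("block", 12846085),
    ("block", 13200431),
]

folder_count = sum(1 for kind, _ in records if kind == "folder")
total_size = sum(size for kind, size in records if kind == "block")

print(folder_count, total_size)  # 2 26046516
```

Step three would then expect the traversal of the backup data folder to report 2 folders and a total file size of 26046516 bytes.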
One embodiment of the present of invention has been described in detail above, but the content is only preferable implementation of the invention Example, it is impossible to be considered as limiting practical range of the invention.All impartial changes made according to the present patent application scope and improvement Deng all should still belong within patent covering scope of the invention.

Claims (5)

1. A fast verification method for large-scale backup data based on a file system, characterized in that the method comprises a backup index file; the backup index file records the folders and file blocks of the actually backed-up data; each folder record contains the path of the folder; each file block record contains the path of the file to which the block belongs, the start offset, the file block size, and the check value;
The method proceeds according to the following steps:
Step 1: Obtain the check value of the backup index file from the backup metadata;
Step 2: Verify the entire backup index file to determine whether the backup index file is correct;
Step 3: If the backup index file is correct, start verification threads to verify the file blocks;
Step 4: Read the backup index file one record (line) at a time and examine each record:
If it is a folder, determine whether the folder exists, and increment the folder count by 1;
If it is a file block, start a verification thread to verify the file block, and add the size of the file block to the total file size; if the start offset of the file block is 0, increment the file count by 1;
Step 5: Obtain the folder count, the file count, and the total file size from the results of Step 4;
Step 6: Traverse the backup directory; the values obtained in this step are to be compared with those of Step 5;
If an entry encountered while traversing the backup directory is a folder, increment the folder count by 1;
If an entry encountered while traversing the backup directory is a file, increment the file count by 1;
Step 7: Compare the total file size, the folder count, and the file count obtained in Step 5 with those obtained in Step 6.
2. The fast verification method for large-scale backup data based on a file system according to claim 1, characterized in that: the backup index file is written in append mode; each time a file block is backed up, one record is added to the backup index file; at the same time, the check value of the backup index file itself is updated from the newly appended data and replaces the check value computed for the previously written data; when the backup ends, the check value of the backup index file itself is written into the backup metadata.
3. The fast verification method for large-scale backup data based on a file system according to claim 1, characterized in that: when the verification thread in Step 3 verifies a file block, it can compute the check value of the file block while the information about the file block is written into the backup index file; the verification of the file block and the writing of the file block's information can proceed in parallel.
4. The fast verification method for large-scale backup data based on a file system according to claim 1, characterized in that: the verification threads in Step 4 are multiple threads operating in parallel.
5. The fast verification method for large-scale backup data based on a file system according to claim 1, characterized in that: the actually backed-up data must satisfy the following four conditions simultaneously:
Condition 1. All file blocks verify correctly;
Condition 2. The total size of the file blocks recorded in the backup index file equals the total size of the files in the backup directory;
Condition 3. The file count and folder count recorded in the backup index file are consistent with the file count and folder count in the backup data;
Condition 4. All folders recorded in the index file exist.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410664300.XA CN104375906B (en) 2014-11-19 2014-11-19 Fast verification method for large-scale backup data based on a file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410664300.XA CN104375906B (en) 2014-11-19 2014-11-19 Fast verification method for large-scale backup data based on a file system

Publications (2)

Publication Number Publication Date
CN104375906A CN104375906A (en) 2015-02-25
CN104375906B true CN104375906B (en) 2017-06-13

Family

ID=52554842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410664300.XA Active CN104375906B (en) 2014-11-19 2014-11-19 Fast verification method for large-scale backup data based on a file system

Country Status (1)

Country Link
CN (1) CN104375906B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105573863A (en) * 2015-12-14 2016-05-11 北京尚易德科技有限公司 Index file recovery method and apparatus and video monitoring system
CN106372160A (en) * 2016-08-31 2017-02-01 天津南大通用数据技术股份有限公司 Distributive database and management method
CN110633164B (en) * 2019-08-09 2023-05-16 锐捷网络股份有限公司 Message-oriented middleware fault recovery method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1627274A (en) * 1995-03-23 2005-06-15 切恩尼软件(英国)有限公司 Backup system and backup method
CN101477486A (en) * 2009-01-22 2009-07-08 中国人民解放军国防科学技术大学 File backup recovery method based on sector recombination
CN102394894A (en) * 2011-11-28 2012-03-28 武汉大学 Network virtual disk file safety management method based on cloud computing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004005222A (en) * 2002-05-31 2004-01-08 Internatl Business Mach Corp <Ibm> Backup technique for recording devices with different storage formats

Also Published As

Publication number Publication date
CN104375906A (en) 2015-02-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant