CN106708657A - Metadata management method and apparatus, and distributed file system - Google Patents

Metadata management method and apparatus, and distributed file system

Info

Publication number
CN106708657A
Authority
CN
China
Prior art keywords
metadata
data
backup
backed
differential
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201510468075.7A
Other languages
Chinese (zh)
Inventor
郑跃杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201510468075.7A
Priority to PCT/CN2015/092114 (WO2016145838A1)
Publication of CN106708657A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation

Abstract

The invention provides a metadata management method and apparatus, and a distributed file system. The method comprises: during metadata backup, calculating differential data between the current metadata and backed-up metadata using a differential algorithm; and during metadata recovery, recovering the current metadata using the differential data and the backed-up metadata. With this method and apparatus, metadata backup and recovery are performed quickly using a fast differential algorithm and a differential recovery algorithm; the disk space used for backup is reduced from the order of GB to the order of MB; fast backup and recovery of the metadata are achieved; and the backup duration for the massive metadata of a distributed file system can be shortened from a few hours to a few minutes.

Description

A metadata management method and apparatus, and a distributed file system
Technical field
The present invention relates to the field of distributed file storage, and in particular to a metadata management method and apparatus, and a distributed file system.
Background
Network storage devices are currently used in products such as CRBT, MS and WAP gateways. To meet requirements for large capacity, high throughput and high reliability, and as the reliability requirements on these products rise, the price of such devices often rises exponentially, and their cost frequently exceeds 50% of the whole system. Building a distributed file system on an inexpensive general-purpose hardware platform has therefore become an inevitable trend in the development of storage-type services.
A distributed file system is a general-purpose storage software platform that runs on commodity hardware. It provides storage platform support for products that need storage services, offering storage, query, retrieval and management of the mass data those products generate, such as multimedia content storage and service data storage. A single domain supports a storage capacity of 10 PB (one billion files), the system as a whole reaches EB-level mass storage capacity (a hundred billion files), and overall system performance grows linearly with storage scale. A massive number of files produces more than a hundred gigabytes of metadata; system upgrade and backup procedures each need more than a hundred gigabytes of disk space for the backup, and a backup takes 1 to 2 hours. As system capacity keeps growing, the system metadata keeps growing too, so backing up and restoring system metadata becomes the bottleneck during system upgrades or daily backups and consumes a large amount of CPU and disk.
Therefore, how to provide a metadata management method that can back up and recover metadata quickly is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
The present invention provides a metadata management method and apparatus, and a distributed file system, to achieve fast backup and recovery of metadata.
The present invention provides a metadata management method, comprising: during metadata backup, calculating differential data between the current metadata and backed-up metadata using a differential algorithm; and during metadata recovery, recovering the current metadata using the differential data and the backed-up metadata.
Further, the backed-up metadata comprises the metadata of the most recent full backup.
Further, the differential data comprises an identifier of the backed-up metadata, a difference data offset, and the difference data.
Further, recovering the current metadata using the differential data and the backed-up metadata comprises: determining the backed-up metadata to which the differential data corresponds, determining the data to be replaced in the backed-up metadata according to the difference data offset, and replacing the data to be replaced with the difference data to generate the current metadata.
Further, the differential data also comprises the original difference data in the backed-up metadata.
Further, the method also comprises: performing metadata backup and metadata recovery using multiple threads in parallel.
Further, after the metadata backup, the method also comprises: determining, according to the size of the differential data, the backup mode for the next backup, the backup mode comprising full backup and differential backup.
The present invention provides a metadata management apparatus, comprising: a backup module, configured to calculate, during metadata backup, differential data between the current metadata and backed-up metadata using a differential algorithm; and a recovery module, configured to recover, during metadata recovery, the current metadata using the differential data and the backed-up metadata.
Further, the backed-up metadata comprises the metadata of the most recent full backup.
Further, the differential data comprises an identifier of the backed-up metadata, a difference data offset, and the difference data.
Further, the recovery module is configured to determine the backed-up metadata to which the differential data corresponds, determine the data to be replaced in the backed-up metadata according to the difference data offset, and replace the data to be replaced with the difference data to generate the current metadata.
Further, the differential data also comprises the original difference data in the backed-up metadata.
Further, the backup module is further configured to perform metadata backup using multiple threads in parallel, and the recovery module is further configured to perform metadata recovery using multiple threads in parallel.
Further, after the metadata backup, the backup module is further configured to determine, according to the size of the differential data, the backup mode for the next backup, the backup mode comprising full backup and differential backup.
The present invention provides a distributed file system comprising the metadata management apparatus provided by the present invention.
Beneficial effects of the present invention:
The present invention provides a new metadata management method that backs up and recovers metadata quickly by means of a fast differential algorithm and a differential restoration algorithm. The disk space used for backup is reduced from the order of GB to the order of MB, fast backup and recovery of metadata are achieved, and the backup duration for the massive metadata of a distributed file system can be shortened from a few hours to a few minutes.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the metadata management apparatus provided by the first embodiment of the present invention;
Fig. 2 is a flow chart of the metadata management method provided by the second embodiment of the present invention;
Fig. 3 is a flow chart of the metadata management method provided by the third embodiment of the present invention;
Fig. 4 is a schematic comparison of metadata in the third embodiment of the present invention.
Detailed description of embodiments
The present invention is now explained in further detail by way of specific embodiments in conjunction with the accompanying drawings.
First embodiment:
Fig. 1 is a schematic structural diagram of the metadata management apparatus provided by the first embodiment of the present invention. As shown in Fig. 1, in this embodiment the metadata management apparatus 1 provided by the present invention comprises:
a backup module 11, configured to calculate, during metadata backup, differential data between the current metadata and backed-up metadata using a differential algorithm;
a recovery module 12, configured to recover, during metadata recovery, the current metadata using the differential data and the backed-up metadata.
In some embodiments, the backed-up metadata in the above embodiment comprises the metadata of the most recent full backup. In practice, metadata changes gradually, so using the most recent metadata as the basis of the differential calculation greatly reduces the amount of differential data. Of course, those skilled in the art may, as needed, choose the metadata of a full backup taken at a particular point in time as the basis of the differential calculation.
In some embodiments, the differential data in the above embodiment comprises an identifier of the backed-up metadata, a difference data offset, and the difference data. During recovery, the position of the difference data can then be determined quickly, and the inverse differential calculation yields the current metadata.
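The following Python sketch shows one way such differential data could be represented and computed. It assumes both metadata images are byte strings of equal length; the names `DiffRecord`, `DiffFile` and `compute_diff` are hypothetical, since the text does not specify a concrete layout or algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiffRecord:
    """One changed run of bytes (hypothetical layout, not taken from the patent)."""
    offset: int       # position of the run inside the backed-up metadata
    new_bytes: bytes  # the bytes found at that position in the current metadata


@dataclass(frozen=True)
class DiffFile:
    backup_id: str             # identifier of the backed-up metadata used as the base
    records: List[DiffRecord]  # one entry per changed run


def compute_diff(backup_id: str, base: bytes, current: bytes) -> DiffFile:
    """Compare two equal-length byte streams and collect the changed runs."""
    if len(base) != len(current):
        # Assumption of this sketch: growth or shrinkage is out of scope.
        raise ValueError("this sketch only handles equal-length metadata")
    records = []
    i = 0
    while i < len(base):
        if base[i] == current[i]:
            i += 1
            continue
        start = i
        # Extend the run while the two streams keep disagreeing.
        while i < len(base) and base[i] != current[i]:
            i += 1
        records.append(DiffRecord(start, current[start:i]))
    return DiffFile(backup_id, records)
```

Grouping adjacent changed bytes into one record keeps the number of offsets small when a change touches a contiguous field.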
In some embodiments, the recovery module 12 in the above embodiment is configured to determine the backed-up metadata to which the differential data corresponds, determine the data to be replaced in the backed-up metadata according to the difference data offset, and replace the data to be replaced with the difference data to generate the current metadata.
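The three recovery steps (find the base backup, locate the bytes to replace by offset, overwrite them) can be sketched as follows. The function name `restore_current`, the dictionary of backups and the `(offset, new_bytes)` tuple layout are illustrative assumptions, not part of the described apparatus.

```python
from typing import Dict, List, Tuple


def restore_current(
    backups: Dict[str, bytes],
    backup_id: str,
    records: List[Tuple[int, bytes]],
) -> bytes:
    """Rebuild the current metadata from a base backup plus (offset, new_bytes) records."""
    base = backups[backup_id]  # step 1: find the backed-up metadata the diff refers to
    out = bytearray(base)
    for offset, new_bytes in records:
        end = offset + len(new_bytes)  # step 2: the span to be replaced
        if end > len(out):
            raise ValueError("difference record runs past the end of the backup")
        out[offset:end] = new_bytes    # step 3: overwrite with the difference data
    return bytes(out)
```

Because only the listed spans are touched, the cost of applying the records depends on the size of the differential data; copying the base backup is the remaining fixed cost.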
In some embodiments, the differential data in the above embodiment also comprises the original difference data in the backed-up metadata. This allows a comparison during recovery: if the data to be replaced is identical to the original difference data, there is no error and recovery can proceed directly; if they differ, an error exists in the differential data or in the backed-up metadata, and whether and how to recover must be decided according to the actual situation.
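A minimal sketch of this consistency check, assuming each record carries the original bytes alongside the new bytes and that both have the same length. The names are hypothetical, and raising an error on a mismatch is only one possible policy; the text deliberately leaves that decision open.

```python
from typing import List, Tuple

# (offset, original bytes expected in the backup, new bytes from the current metadata)
CheckedRecord = Tuple[int, bytes, bytes]


def mismatched_offsets(base: bytes, records: List[CheckedRecord]) -> List[int]:
    """Return the offsets where the backup no longer holds the recorded original bytes."""
    return [
        offset
        for offset, old, _new in records
        if base[offset:offset + len(old)] != old
    ]


def checked_restore(base: bytes, records: List[CheckedRecord]) -> bytes:
    """Apply the records only if every original-bytes check passes."""
    bad = mismatched_offsets(base, records)
    if bad:
        # Assumption: this sketch refuses to restore; other policies are possible.
        raise ValueError(f"backup or differential data is inconsistent at offsets {bad}")
    out = bytearray(base)
    for offset, old, new in records:
        # Assumes len(old) == len(new), so the overall length is preserved.
        out[offset:offset + len(old)] = new
    return bytes(out)
```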
In some embodiments, the backup module 11 in the above embodiment is further configured to perform metadata backup using multiple threads in parallel, and the recovery module 12 is further configured to perform metadata recovery using multiple threads in parallel. The multi-threaded parallel approach can further increase the speed of metadata backup and recovery.
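As an illustration of the parallel approach, the sketch below diffs several metadata files at once with a thread pool. The function names and the in-memory dictionaries are assumptions made for the example; a real system would read the files from disk. In standard CPython builds the global interpreter lock limits the speed-up of pure-Python byte comparison, so the benefit of threads here would come mainly from overlapping file I/O.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Record = Tuple[int, bytes]  # (offset, new bytes)


def diff_bytes(base: bytes, current: bytes) -> List[Record]:
    """Collect changed runs; assumes both inputs have the same length."""
    records = []
    i, n = 0, min(len(base), len(current))
    while i < n:
        if base[i] != current[i]:
            start = i
            while i < n and base[i] != current[i]:
                i += 1
            records.append((start, current[start:i]))
        else:
            i += 1
    return records


def parallel_diff(
    bases: Dict[str, bytes],
    currents: Dict[str, bytes],
    workers: int = 4,
) -> Dict[str, List[Record]]:
    """Diff every metadata file against its backup, one task per file."""
    names = sorted(currents)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda name: diff_bytes(bases[name], currents[name]), names)
        return dict(zip(names, results))
```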
In some embodiments, after the metadata backup, the backup module 11 in the above embodiment is further configured to determine, according to the size of the differential data, the backup mode for the next backup, the backup mode comprising full backup and differential backup. Specifically, a threshold can be set. If the differential data produced by a given differential backup exceeds the threshold, the current metadata differs greatly from the backed-up metadata, which usually means that the user has made a major change to the data stored in the distributed file system, such as replacing hardware. In that case a differential backup alone is prone to missing difference data, so the next backup is set to a full backup, which then serves as the basis for subsequent differential backups.
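The threshold rule reduces to a single comparison, sketched here with hypothetical names. The text only says what happens when the differential data is larger than the threshold, so treating a size exactly equal to the threshold as "differential" is an assumption of this sketch.

```python
FULL = "full"
DIFFERENTIAL = "differential"


def next_backup_mode(diff_size: int, threshold: int) -> str:
    """Choose the mode of the next backup from the size of the latest differential data."""
    # Assumption: a size exactly equal to the threshold stays differential.
    return FULL if diff_size > threshold else DIFFERENTIAL
```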
Correspondingly, the present invention provides a distributed file system comprising the metadata management apparatus 1 provided by the present invention.
Second embodiment:
Fig. 2 is a flow chart of the metadata management method provided by the second embodiment of the present invention. As shown in Fig. 2, in this embodiment the metadata management method provided by the present invention comprises the following steps:
S201: During metadata backup, calculate differential data between the current metadata and backed-up metadata using a differential algorithm;
S202: During metadata recovery, recover the current metadata using the differential data and the backed-up metadata.
In some embodiments, the backed-up metadata in the above embodiment comprises the metadata of the most recent full backup.
In some embodiments, the differential data in the above embodiment comprises an identifier of the backed-up metadata, a difference data offset, and the difference data.
In some embodiments, recovering the current metadata using the differential data and the backed-up metadata in the above embodiment comprises: determining the backed-up metadata to which the differential data corresponds, determining the data to be replaced in the backed-up metadata according to the difference data offset, and replacing the data to be replaced with the difference data to generate the current metadata.
In some embodiments, the differential data in the above embodiment also comprises the original difference data in the backed-up metadata.
In some embodiments, the method in the above embodiment also comprises: performing metadata backup and metadata recovery using multiple threads in parallel.
In some embodiments, the method in the above embodiment also comprises, after the metadata backup: determining, according to the size of the differential data, the backup mode for the next backup, the backup mode comprising full backup and differential backup.
The present invention is explained further below in conjunction with a concrete application scenario.
Third embodiment:
Fig. 3 is a flow chart of the metadata management method provided by the third embodiment of the present invention. As shown in Fig. 3, in this embodiment the metadata management method provided by the present invention comprises the following steps:
S301: Set the backup cycle and the full backup threshold.
At system initialization, the backup mode is configured, with a backup cycle of 3 hours and a full backup threshold of 1 GB.
S302: The first backup time arrives; perform a full backup.
When the backup cycle elapses, this is defined as the first backup time. A full metadata backup is performed, and the next backup mode is configured as differential backup. The generated backup file uses the current system version and a timestamp as its backup ID.
S303: The second backup time arrives; perform a differential backup.
When the next backup cycle elapses, the backup data ID and the backup mode are checked, and this backup is performed according to the backup ID and the backup mode. A full backup generates a backup data ID when it completes. A differential backup generates a backup data ID and the delta file produced by the backup. Multiple threads are used during backup to diff different metadata files at the same time, generating the corresponding differential data files.
S304: Compare the differential data with the full backup threshold and determine the next backup mode.
After the backup in step S303 completes, it is judged whether the size of the backup file from this backup is within the set threshold. If it is less than the set threshold of 1 GB, the next backup mode is set to differential backup; if it is greater than the threshold, the next backup mode is set to full backup.
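The S301 to S304 cycle can be simulated with a small planner. This is a sketch under stated assumptions: the function name `plan_backups` is hypothetical, the "1G" threshold is read as 2**30 bytes, and a full backup is always followed by a differential one, as in S302.

```python
from typing import Iterable, List

GIB = 2 ** 30  # assumption: the 1 GB threshold is taken as 2**30 bytes

FULL = "full"
DIFFERENTIAL = "differential"


def plan_backups(later_diff_sizes: Iterable[int], threshold: int = GIB) -> List[str]:
    """Return the mode used at each backup time.

    `later_diff_sizes` holds, for every backup time after the first, the size
    the differential file has at that time. The size is ignored when that
    backup turns out to be a full one.
    """
    modes = [FULL]            # S302: the first backup is always full
    next_mode = DIFFERENTIAL  # S302 also schedules a differential backup next
    for size in later_diff_sizes:
        mode = next_mode
        modes.append(mode)
        if mode == FULL:
            next_mode = DIFFERENTIAL
        else:
            # S304: an oversized differential triggers a full backup next time
            next_mode = FULL if size > threshold else DIFFERENTIAL
    return modes
```

For example, a differential that exceeds the threshold at the third backup time makes the fourth backup a full one, after which the cycle returns to differential backups.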
S305: Perform metadata recovery using the differential data.
To restore a backup file, that is, when the original data needs to be restored, restoration is performed according to the backup data ID, the backup mode and the original data. Likewise, differential restoration uses multiple threads: from the corresponding differential data and the current metadata, the differential restoration algorithm generates the corresponding original data, thereby restoring the metadata.
This embodiment uses a specific differential algorithm to diff the massive metadata files of a distributed file system and generates the corresponding differential files for fast backup and fast restoration. Using the metadata file differential algorithm and the reverse restoration algorithm, the original file and the target file are diffed efficiently to generate a differential file; when data needs to be restored, the original file can be restored from the differential file and the target file.
In actual production, most of the massive number of files in a distributed file system are never modified, which means the metadata produced by most files does not change. Differential backup and restoration is therefore the best way to back up and restore such largely unchanging massive metadata.
As shown in Fig. 4, two copies of the metadata from different times are compared in binary. With existing metadata backup methods, all of the metadata would have to be fully backed up. With the present invention, only 3 bytes, at positions 140 to 142, differ. The byte-stream differential algorithm compares the files and generates a differential file that records only the offset of the differing data, the 3 bytes of new data (used for data backup) and the original 3 bytes (used for data restoration). This procedure is the differential file generation algorithm. Typically, a metadata change amounts to a difference of only a few dozen or even a few bytes, so this algorithm saves a large amount of storage space and speeds up metadata backup.
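The Fig. 4 case can be reproduced with made-up data: two 200-byte images that differ only at offsets 140 to 142. The function `byte_diff` and the `(offset, original, new)` tuple layout are hypothetical, and the byte values are invented for the example, since the figure contents are not reproduced in the text.

```python
from typing import List, Tuple

# (offset, original bytes, new bytes)
Record = Tuple[int, bytes, bytes]


def byte_diff(original: bytes, target: bytes) -> List[Record]:
    """Record each changed run together with both its old and its new bytes."""
    if len(original) != len(target):
        raise ValueError("this sketch only handles equal-length inputs")
    records = []
    i = 0
    while i < len(original):
        if original[i] == target[i]:
            i += 1
            continue
        start = i
        while i < len(original) and original[i] != target[i]:
            i += 1
        records.append((start, original[start:i], target[start:i]))
    return records


# Invented sample data: 200 zero bytes, then three bytes changed at 140..142.
older = bytes(200)
changed = bytearray(older)
changed[140:143] = b"\x01\x02\x03"
newer = bytes(changed)

records = byte_diff(older, newer)
# records holds a single entry: offset 140, the 3 original bytes, the 3 new bytes.
```

Here the differential data is one offset plus six payload bytes, against 200 bytes for a full copy; the ratio improves further as the unchanged portion grows.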
Metadata recovery is the reverse of differential backup. Given the differential file, the original file can easily be generated from the current file and the offsets and difference bytes in the differential file; this is the reverse restoration algorithm.
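A sketch of this reverse direction, assuming each record stores the offset, the original bytes and the new bytes, with both byte runs of equal length. The name `roll_back` is hypothetical, and the sanity check that the current file really contains the recorded new bytes is an addition of this sketch.

```python
from typing import List, Tuple

# (offset, original bytes, new bytes)
Record = Tuple[int, bytes, bytes]


def roll_back(current: bytes, records: List[Record]) -> bytes:
    """Regenerate the original file from the current file and the differential records."""
    out = bytearray(current)
    for offset, old, new in records:
        end = offset + len(new)
        # Added check (assumption): the current file must hold the recorded new bytes.
        if out[offset:end] != new:
            raise ValueError(f"current file does not match the record at offset {offset}")
        out[offset:end] = old  # put the original bytes back
    return bytes(out)
```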
In summary, implementing the present invention provides at least the following beneficial effects:
The present invention provides a new metadata management method that backs up and recovers metadata quickly by means of a fast differential algorithm and a differential restoration algorithm. It achieves fast backup and recovery of metadata, standardizes the coordinated procedure of system upgrades, and can shorten the backup duration for the massive metadata of a distributed file system from a few hours to a few minutes. The disk space used for backup is reduced from the order of GB to the order of MB, and the performance advantage becomes more apparent as the metadata grows.
The above are merely specific embodiments of the present invention and do not limit the present invention in any form. Any simple modification, equivalent variation, combination or alteration made to the above embodiments according to the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (15)

1. A metadata management method, characterized by comprising:
during metadata backup, calculating differential data between the current metadata and backed-up metadata using a differential algorithm;
during metadata recovery, recovering the current metadata using the differential data and the backed-up metadata.
2. The metadata management method of claim 1, characterized in that the backed-up metadata comprises the metadata of the most recent full backup.
3. The metadata management method of claim 1, characterized in that the differential data comprises an identifier of the backed-up metadata, a difference data offset, and the difference data.
4. The metadata management method of claim 3, characterized in that recovering the current metadata using the differential data and the backed-up metadata comprises: determining the backed-up metadata to which the differential data corresponds, determining the data to be replaced in the backed-up metadata according to the difference data offset, and replacing the data to be replaced with the difference data to generate the current metadata.
5. The metadata management method of claim 3, characterized in that the differential data also comprises the original difference data in the backed-up metadata.
6. The metadata management method of claim 1, characterized by also comprising: performing metadata backup and metadata recovery using multiple threads in parallel.
7. The metadata management method of any one of claims 1 to 6, characterized by also comprising, after the metadata backup: determining, according to the size of the differential data, the backup mode for the next backup, the backup mode comprising full backup and differential backup.
8. A metadata management apparatus, characterized by comprising:
a backup module, configured to calculate, during metadata backup, differential data between the current metadata and backed-up metadata using a differential algorithm;
a recovery module, configured to recover, during metadata recovery, the current metadata using the differential data and the backed-up metadata.
9. The metadata management apparatus of claim 8, characterized in that the backed-up metadata comprises the metadata of the most recent full backup.
10. The metadata management apparatus of claim 8, characterized in that the differential data comprises an identifier of the backed-up metadata, a difference data offset, and the difference data.
11. The metadata management apparatus of claim 10, characterized in that the recovery module is configured to determine the backed-up metadata to which the differential data corresponds, determine the data to be replaced in the backed-up metadata according to the difference data offset, and replace the data to be replaced with the difference data to generate the current metadata.
12. The metadata management apparatus of claim 10, characterized in that the differential data also comprises the original difference data in the backed-up metadata.
13. The metadata management apparatus of claim 8, characterized in that the backup module is further configured to perform metadata backup using multiple threads in parallel, and the recovery module is further configured to perform metadata recovery using multiple threads in parallel.
14. The metadata management apparatus of any one of claims 8 to 13, characterized in that, after the metadata backup, the backup module is further configured to determine, according to the size of the differential data, the backup mode for the next backup, the backup mode comprising full backup and differential backup.
15. A distributed file system, characterized by comprising the metadata management apparatus of any one of claims 8 to 14.
CN201510468075.7A 2015-08-03 2015-08-03 Metadata management method and apparatus, and distributed file system Withdrawn CN106708657A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510468075.7A CN106708657A (en) 2015-08-03 2015-08-03 Metadata management method and apparatus, and distributed file system
PCT/CN2015/092114 WO2016145838A1 (en) 2015-08-03 2015-10-16 Metadata management method and device, and distributed file system

Publications (1)

Publication Number Publication Date
CN106708657A (en) 2017-05-24

Family

ID=56919642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510468075.7A Withdrawn CN106708657A (en) 2015-08-03 2015-08-03 Metadata management method and apparatus, and distributed file system

Country Status (2)

Country Link
CN (1) CN106708657A (en)
WO (1) WO2016145838A1 (en)

Also Published As

Publication number Publication date
WO2016145838A1 (en) 2016-09-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20170524