WO2016145838A1 - Metadata management method and apparatus, and distributed file system - Google Patents

Metadata management method and apparatus, and distributed file system

Info

Publication number
WO2016145838A1
WO2016145838A1 PCT/CN2015/092114 CN2015092114W WO2016145838A1 WO 2016145838 A1 WO2016145838 A1 WO 2016145838A1 CN 2015092114 W CN2015092114 W CN 2015092114W WO 2016145838 A1 WO2016145838 A1 WO 2016145838A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
backup
data
backed
difference data
Prior art date
Application number
PCT/CN2015/092114
Other languages
English (en)
French (fr)
Inventor
郑跃杰
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016145838A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation

Definitions

  • This document relates to, but is not limited to, the field of distributed file storage, and more particularly to a metadata management method, apparatus, distributed file system, and computer readable storage medium.
  • A distributed file system is a general-purpose storage software platform that runs on general-purpose hardware and provides storage platform support for products that need storage services. It provides storage, query, retrieval and management services for the massive data that those products generate, such as multimedia content storage and service data storage.
  • A single domain supports a storage capacity of 10 PB (1 billion files), the system as a whole can reach EB-level capacity (100 billion files), and overall system performance grows linearly with storage scale.
  • Such a massive number of files generates hundreds of gigabytes of metadata.
  • System upgrades and backups each need hundreds of gigabytes of disk space and take 1-2 hours. As system capacity grows, so does the system metadata; backing up and restoring it becomes a bottleneck for system upgrades and routine backups, and consumes a large amount of CPU and disk resources.
  • Embodiments of the invention provide a metadata management method and apparatus, and a distributed file system, so as to achieve fast backup and recovery of metadata.
  • An embodiment of the present invention provides a metadata management method, which includes: at metadata backup time, using a difference algorithm to calculate the differential data between the current metadata and the backed-up metadata; at metadata recovery time, restoring the current metadata using the differential data and the backed-up metadata.
  • Optionally, the backed-up metadata includes the metadata of the most recent full backup.
  • Optionally, the differential data includes an identifier of the backed-up metadata, a difference data offset, and the difference data.
  • Optionally, restoring the current metadata using the differential data and the backed-up metadata includes: determining the backed-up metadata that corresponds to the differential data, locating the data to be replaced within the backed-up metadata according to the difference data offset, and replacing that data with the difference data to generate the current metadata.
  • Optionally, the differential data also includes the original difference data within the backed-up metadata.
  • Optionally, the method further includes performing metadata backup and metadata recovery in parallel using multiple threads.
  • Optionally, after the metadata backup, the method further includes determining the backup mode for the next data backup according to the size of the differential data, where the backup mode is either full backup or differential backup.
  • An embodiment of the present invention provides a metadata management apparatus, including: a backup module, configured to calculate, at metadata backup time, the differential data between the current metadata and the backed-up metadata using a difference algorithm; and a recovery module, configured to restore, at metadata recovery time, the current metadata using the differential data and the backed-up metadata.
  • Optionally, the backed-up metadata includes the metadata of the most recent full backup.
  • Optionally, the differential data includes an identifier of the backed-up metadata, a difference data offset, and the difference data.
  • Optionally, the recovery module is configured to determine the backed-up metadata that corresponds to the differential data, locate the data to be replaced within the backed-up metadata according to the difference data offset, and replace that data with the difference data to generate the current metadata.
  • Optionally, the differential data also includes the original difference data within the backed-up metadata.
  • Optionally, the backup module is further configured to perform metadata backup in parallel using multiple threads, and the recovery module is further configured to perform metadata recovery in parallel using multiple threads.
  • Optionally, after the metadata backup, the backup module is further configured to determine the backup mode for the next data backup according to the size of the differential data, where the backup mode is either full backup or differential backup.
  • An embodiment of the invention provides a distributed file system that includes the metadata management apparatus provided by the embodiments of the invention.
  • The embodiments of the invention provide a new metadata management method that backs up and restores metadata quickly using a fast difference algorithm and a differential restoration algorithm. Backup disk usage drops from the GB level to the MB level, and the backup time for the massive metadata of a distributed file system is shortened from a few hours to a few minutes.
  • FIG. 1 is a schematic structural diagram of a metadata management apparatus according to a first embodiment of the present invention
  • FIG. 2 is a flowchart of a metadata management method according to a second embodiment of the present invention.
  • FIG. 3 is a flowchart of a metadata management method according to a third embodiment of the present invention.
  • FIG. 4 is a schematic diagram of comparison of metadata in a third embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of the metadata management apparatus provided by the first embodiment of the present invention. As shown in FIG. 1, the metadata management apparatus 1 provided in this embodiment includes:
  • a backup module 11, configured to calculate, at metadata backup time, the differential data between the current metadata and the backed-up metadata using a difference algorithm;
  • a recovery module 12, configured to restore, at metadata recovery time, the current metadata using the differential data and the backed-up metadata.
  • In some embodiments, the backed-up metadata includes the metadata of the most recent full backup. In practice metadata changes gradually, so using the most recent metadata as the basis of the difference calculation greatly reduces the volume of differential data. Those skilled in the art may instead choose the metadata of a full backup taken at a specific point in time as the basis of the difference calculation, as needed.
  • In some embodiments, the differential data includes an identifier of the backed-up metadata, a difference data offset, and the difference data. At recovery time the position of the difference data can then be located quickly, and the inverse difference calculation yields the current metadata.
  • In some embodiments, the recovery module 12 is configured to determine the backed-up metadata that corresponds to the differential data, locate the data to be replaced within the backed-up metadata according to the difference data offset, and replace that data with the difference data to generate the current metadata.
  • In some embodiments, the differential data further includes the original difference data within the backed-up metadata. A comparison can then be made during recovery: if the data to be replaced is identical to the original difference data, there is no error and recovery can proceed directly. If it is not identical, the differential data or the backed-up metadata contains an error, and whether and how to recover must be decided according to the actual situation.
  • In some embodiments, the backup module 11 is further configured to perform metadata backup in parallel using multiple threads, and the recovery module 12 is further configured to perform metadata recovery in parallel using multiple threads. Running in parallel speeds up both backup and recovery of metadata.
  • In some embodiments, after the metadata backup, the backup module 11 is further configured to determine the backup mode for the next data backup according to the size of the differential data, where the backup mode is either full backup or differential backup.
  • Optionally, a threshold may be set. If the differential data produced by a differential backup is larger than the threshold, the current metadata differs substantially from the backed-up metadata, which usually means that the user has made major changes to the data stored in the distributed file system, for example by replacing hardware. Relying on differential backup alone in that situation can easily cause difference data to be missed, so the next backup is set to be a full backup of the original data, which then serves as the basis for subsequent differential backups.
  • FIG. 2 is a flowchart of the metadata management method provided by the second embodiment of the present invention. As shown in FIG. 2, the metadata management method provided in this embodiment includes the following steps:
  • S201: At metadata backup time, use a difference algorithm to calculate the differential data between the current metadata and the backed-up metadata.
  • S202: At metadata recovery time, restore the current metadata using the differential data and the backed-up metadata.
  • In some embodiments, the backed-up metadata includes the metadata of the most recent full backup.
  • In some embodiments, the differential data includes an identifier of the backed-up metadata, a difference data offset, and the difference data.
  • In some embodiments, restoring the current metadata using the differential data and the backed-up metadata includes: determining the backed-up metadata that corresponds to the differential data, locating the data to be replaced within the backed-up metadata according to the difference data offset, and replacing that data with the difference data to generate the current metadata.
  • In some embodiments, the differential data further includes the original difference data within the backed-up metadata.
  • In some embodiments, the method further includes performing metadata backup and metadata recovery in parallel using multiple threads.
  • In some embodiments, after the metadata backup, the method further includes determining the backup mode for the next data backup according to the size of the differential data, where the backup mode is either full backup or differential backup.
  • FIG. 3 is a flowchart of the metadata management method provided by the third embodiment of the present invention. As shown in FIG. 3, the metadata management method provided in this embodiment includes the following steps:
  • S301: Set a backup period and a full backup threshold.
  • At system initialization, the backup mode is configured, the backup period is set to 3 hours, and the full backup threshold is set to 1G.
  • S302: When the first backup time arrives, perform a full backup.
  • When the backup period elapses, this is taken as the first backup time: a full metadata backup is performed, and the next backup mode is set to differential backup. The generated backup file uses the current system version and a timestamp as its backup ID.
  • S303: When the second backup time arrives, perform a differential backup.
  • When the next backup period elapses, the backup data ID and backup mode are checked, and this backup is performed according to them. For a full backup, a backup data ID is generated once the backup completes. For a differential backup, a backup data ID and the differential data file produced by the backup are generated. The backup uses multiple threads, diffing different metadata files at the same time and generating a differential data file for each.
  • S304: Compare the differential data with the full backup threshold to determine the next backup mode.
  • After the backup in step S303 completes, it is determined whether the size of the backup file produced by this backup is within the configured threshold. If it is smaller than the 1G threshold, the next backup mode is set to differential backup; if it is larger than the threshold, the next backup mode is set to full backup.
  • S305: Perform metadata recovery using the differential data.
  • To restore from backup files when the original data needs to be recovered, the restoration is carried out according to the backup data ID, the backup mode and the original data.
  • Likewise, differential restoration uses multiple threads: from the corresponding differential data and the current metadata, the differential restoration algorithm generates the corresponding original data, thereby restoring the metadata.
  • This embodiment uses a difference algorithm to diff the massive metadata files of a distributed file system and generates corresponding differential files for fast backup and fast restoration. Using a metadata file difference algorithm and a reverse restoration algorithm, it efficiently diffs the original file against the target file to produce a differential file; when data needs to be restored, the original file can be recovered from the differential file and the target file.
  • As shown in FIG. 4, two copies of metadata from different times are compared in binary, and only the 3 bytes at positions 140 to 142 differ. A byte-stream difference algorithm compares the files and generates a differential file, which records only the offset of the differing data, the 3 bytes of new data (used for data backup), and the 3 bytes of original data (used for data restoration). This process is the differential file generation algorithm. Typically a metadata change amounts to a difference of only tens of bytes or even a few bytes, so this algorithm saves a large amount of storage space and speeds up metadata backup.
  • Metadata recovery is the reverse of differential backup. From the differential file, the original file can easily be generated using the current file together with the offsets and difference bytes in the differential file; this implements the reverse restoration algorithm.
  • Correspondingly, a distributed file system is also provided, which includes the metadata management apparatus 1 provided by the embodiments of the present invention.
  • An embodiment of the present invention further provides a computer-readable storage medium storing program instructions that, when executed by a processor, implement the metadata management method provided by the embodiments of the present invention.
  • In summary, the embodiments provide a new metadata management method that backs up and restores metadata using a fast difference algorithm and a differential restoration algorithm. This achieves fast backup and recovery of metadata and standardizes the collaborative process of system upgrades. The backup time for the massive metadata of a distributed file system is shortened from a few hours to a few minutes, and backup disk usage drops from the GB level to the MB level. The benefit becomes more pronounced as the amount of metadata grows.
  • Optionally, all or part of the steps of the above embodiments may be implemented using integrated circuits: the steps may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module.
  • Each device, function module or functional unit in the above embodiments may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed across a network of multiple computing devices.
  • When a device, function module or functional unit in the above embodiments is implemented as a software function module and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium.
  • The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
  • Industrial applicability: the embodiments of the invention provide a new metadata management method that backs up and restores metadata using a fast difference algorithm and a differential restoration algorithm, achieving fast backup and recovery of metadata and standardizing the collaborative process of system upgrades. The backup time for the massive metadata of a distributed file system is shortened from a few hours to a few minutes, and backup disk usage drops from the GB level to the MB level.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

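Embodiments of the present invention provide a metadata management method and apparatus, a distributed file system, and a computer-readable storage medium. The method includes: at metadata backup time, using a difference algorithm to calculate the differential data between the current metadata and the backed-up metadata; at metadata recovery time, restoring the current metadata using the differential data and the backed-up metadata.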

Description

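Metadata management method and apparatus, and distributed file system
Technical Field
This document relates to, but is not limited to, the field of distributed file storage, and in particular to a metadata management method and apparatus, a distributed file system, and a computer-readable storage medium.
Background
Products such as ring-back tone services, MS, and WAP gateways currently all use network storage devices. To meet requirements for large capacity, high throughput and high reliability, and as the reliability requirements on these products rise, the price of such devices tends to rise exponentially, and they often account for more than 50% of the cost of the whole system. Building distributed file systems on inexpensive general-purpose hardware platforms has therefore become the inevitable trend for storage services.
A distributed file system is a general-purpose storage software platform that runs on general-purpose hardware and provides storage platform support for products that need storage services. It provides storage, query, retrieval and management services for the massive data that those products generate, such as multimedia content storage and service data storage. A single domain supports a storage capacity of 10 PB (1 billion files), the system as a whole can reach EB-level capacity (100 billion files), and overall system performance grows linearly with storage scale. Such a massive number of files generates hundreds of gigabytes of metadata. System upgrades and backups each need hundreds of gigabytes of disk space and take 1-2 hours. As system capacity grows, so does the system metadata; backing up and restoring it becomes a bottleneck for system upgrades and routine backups, and consumes a large amount of CPU and disk resources.
How to provide a metadata management method that can back up and restore metadata quickly is therefore a technical problem that those skilled in the art urgently need to solve.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present invention provide a metadata management method and apparatus, and a distributed file system, so as to achieve fast backup and recovery of metadata.
An embodiment of the present invention provides a metadata management method, which includes: at metadata backup time, using a difference algorithm to calculate the differential data between the current metadata and the backed-up metadata; at metadata recovery time, restoring the current metadata using the differential data and the backed-up metadata.
Optionally, the backed-up metadata includes the metadata of the most recent full backup.
Optionally, the differential data includes an identifier of the backed-up metadata, a difference data offset, and the difference data.
Optionally, restoring the current metadata using the differential data and the backed-up metadata includes: determining the backed-up metadata that corresponds to the differential data, locating the data to be replaced within the backed-up metadata according to the difference data offset, and replacing that data with the difference data to generate the current metadata.
Optionally, the differential data also includes the original difference data within the backed-up metadata.
Optionally, the method further includes performing metadata backup and metadata recovery in parallel using multiple threads.
Optionally, after the metadata backup, the method further includes determining the backup mode for the next data backup according to the size of the differential data, where the backup mode is either full backup or differential backup.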
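An embodiment of the present invention provides a metadata management apparatus, including: a backup module, configured to calculate, at metadata backup time, the differential data between the current metadata and the backed-up metadata using a difference algorithm; and a recovery module, configured to restore, at metadata recovery time, the current metadata using the differential data and the backed-up metadata.
Optionally, the backed-up metadata includes the metadata of the most recent full backup.
Optionally, the differential data includes an identifier of the backed-up metadata, a difference data offset, and the difference data.
Optionally, the recovery module is configured to determine the backed-up metadata that corresponds to the differential data, locate the data to be replaced within the backed-up metadata according to the difference data offset, and replace that data with the difference data to generate the current metadata.
Optionally, the differential data also includes the original difference data within the backed-up metadata.
Optionally, the backup module is further configured to perform metadata backup in parallel using multiple threads, and the recovery module is further configured to perform metadata recovery in parallel using multiple threads.
Optionally, after the metadata backup, the backup module is further configured to determine the backup mode for the next data backup according to the size of the differential data, where the backup mode is either full backup or differential backup.
An embodiment of the present invention provides a distributed file system that includes the metadata management apparatus provided by the embodiments of the present invention.
Beneficial effects of the embodiments of the present invention:
The embodiments of the present invention provide a new metadata management method that backs up and restores metadata quickly using a fast difference algorithm and a differential restoration algorithm. Backup disk usage drops from the GB level to the MB level, fast backup and recovery of metadata is achieved, and the backup time for the massive metadata of a distributed file system is shortened from a few hours to a few minutes.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.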
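Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the metadata management apparatus provided by the first embodiment of the present invention;
FIG. 2 is a flowchart of the metadata management method provided by the second embodiment of the present invention;
FIG. 3 is a flowchart of the metadata management method provided by the third embodiment of the present invention;
FIG. 4 is a schematic comparison of metadata in the third embodiment of the present invention.
Preferred Embodiments of the Invention
The solutions of the embodiments of the present invention are further explained below through specific implementations in conjunction with the drawings.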
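First embodiment:
FIG. 1 is a schematic structural diagram of the metadata management apparatus provided by the first embodiment of the present invention. As shown in FIG. 1, the metadata management apparatus 1 provided in this embodiment includes:
a backup module 11, configured to calculate, at metadata backup time, the differential data between the current metadata and the backed-up metadata using a difference algorithm;
a recovery module 12, configured to restore, at metadata recovery time, the current metadata using the differential data and the backed-up metadata.
Optionally, in some embodiments, the backed-up metadata in the above embodiment includes the metadata of the most recent full backup. In practice metadata changes gradually, so using the most recent metadata as the basis of the difference calculation greatly reduces the volume of differential data. Those skilled in the art may of course instead choose the metadata of a full backup taken at a specific point in time as the basis of the difference calculation, as needed.
Optionally, in some embodiments, the differential data in the above embodiment includes an identifier of the backed-up metadata, a difference data offset, and the difference data. At recovery time the position of the difference data can then be located quickly, and the inverse difference calculation yields the current metadata.
Optionally, in some embodiments, the recovery module 12 in the above embodiment is configured to determine the backed-up metadata that corresponds to the differential data, locate the data to be replaced within the backed-up metadata according to the difference data offset, and replace that data with the difference data to generate the current metadata.
Optionally, in some embodiments, the differential data in the above embodiment further includes the original difference data within the backed-up metadata. A comparison can then be made during recovery: if the data to be replaced is identical to the original difference data, there is no error and recovery can proceed directly. If it is not identical, the differential data or the backed-up metadata contains an error, and whether and how to recover must be decided according to the actual situation.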
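The record layout and the consistency check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `DiffRecord` and `restore_current` are hypothetical, and the sketch assumes that each record replaces a run of bytes with a run of the same length and that records do not overlap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DiffRecord:
    """One entry of differential data (hypothetical layout)."""
    backup_id: str    # identifier of the backed-up (base) metadata
    offset: int       # difference data offset within the base
    new_bytes: bytes  # difference data: bytes of the current metadata
    old_bytes: bytes  # original difference data: bytes of the base


def restore_current(base: bytes, base_id: str, records: list[DiffRecord]) -> bytes:
    """Rebuild the current metadata from the base and its diff records."""
    out = bytearray(base)
    for rec in records:
        if rec.backup_id != base_id:
            raise ValueError("record belongs to a different backup")
        end = rec.offset + len(rec.old_bytes)
        # Compare the data to be replaced with the recorded original bytes.
        if base[rec.offset:end] != rec.old_bytes:
            raise ValueError(f"mismatch at offset {rec.offset}")
        # Assumption: new_bytes has the same length as old_bytes.
        out[rec.offset:end] = rec.new_bytes
    return bytes(out)
```

Raising an exception on a mismatch is only one possible policy; the text leaves the decision of whether and how to recover to the operator.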
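Optionally, in some embodiments, the backup module 11 in the above embodiment is further configured to perform metadata backup in parallel using multiple threads, and the recovery module 12 is further configured to perform metadata recovery in parallel using multiple threads. Running in parallel speeds up both backup and recovery of metadata.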
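A hedged sketch of the parallel arrangement, using Python's standard `concurrent.futures.ThreadPoolExecutor`. The function names and the per-file diff (a plain list of changed positions) are assumptions made for illustration; the text states only that multiple threads are used. In CPython, threads mainly help when reading and writing files dominates, because byte comparison in pure Python is serialized by the interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def diff_one(job):
    """Diff one metadata file; returns (name, [(offset, new_byte), ...]).

    Assumes the base and current versions have the same length.
    """
    name, base, current = job
    changed = [(i, c) for i, (b, c) in enumerate(zip(base, current)) if b != c]
    return name, changed


def parallel_diff(files, workers=4):
    """files maps a file name to a (base_bytes, current_bytes) pair."""
    jobs = [(name, base, cur) for name, (base, cur) in files.items()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, one result per metadata file.
        return dict(pool.map(diff_one, jobs))
```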
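Optionally, in some embodiments, after the metadata backup, the backup module 11 in the above embodiment is further configured to determine the backup mode for the next data backup according to the size of the differential data, where the backup mode is either full backup or differential backup. Optionally, a threshold may be set. If the differential data produced by a differential backup is larger than the threshold, the current metadata differs substantially from the backed-up metadata, which usually means that the user has made major changes to the data stored in the distributed file system, for example by replacing hardware. Relying on differential backup alone in that situation can easily cause difference data to be missed, so the next backup is set to be a full backup of the original data, which then serves as the basis for subsequent differential backups.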
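The threshold rule can be written as a single comparison. The function and constant names below are hypothetical, and the handling of a differential size exactly equal to the threshold (treated here as differential) is an assumption, since the text only covers the "larger than" case.

```python
FULL = "full"
DIFFERENTIAL = "differential"


def next_backup_mode(diff_size_bytes: int, threshold_bytes: int) -> str:
    """Choose the mode of the next backup from the size of this diff."""
    # A diff larger than the threshold signals large changes, so the
    # next backup becomes a full backup and a new base for later diffs.
    return FULL if diff_size_bytes > threshold_bytes else DIFFERENTIAL
```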
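Second embodiment:
FIG. 2 is a flowchart of the metadata management method provided by the second embodiment of the present invention. As shown in FIG. 2, the metadata management method provided in this embodiment includes the following steps:
S201: At metadata backup time, use a difference algorithm to calculate the differential data between the current metadata and the backed-up metadata;
S202: At metadata recovery time, restore the current metadata using the differential data and the backed-up metadata.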
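Optionally, in some embodiments, the backed-up metadata in the above embodiment includes the metadata of the most recent full backup.
Optionally, in some embodiments, the differential data in the above embodiment includes an identifier of the backed-up metadata, a difference data offset, and the difference data.
Optionally, in some embodiments, restoring the current metadata using the differential data and the backed-up metadata in the above embodiment includes: determining the backed-up metadata that corresponds to the differential data, locating the data to be replaced within the backed-up metadata according to the difference data offset, and replacing that data with the difference data to generate the current metadata.
Optionally, in some embodiments, the differential data in the above embodiment further includes the original difference data within the backed-up metadata.
Optionally, in some embodiments, the method in the above embodiment further includes performing metadata backup and metadata recovery in parallel using multiple threads.
Optionally, in some embodiments, after the metadata backup, the method in the above embodiment further includes determining the backup mode for the next data backup according to the size of the differential data, where the backup mode is either full backup or differential backup.
The embodiments of the present invention are further explained below with reference to a specific application scenario.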
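Third embodiment:
FIG. 3 is a flowchart of the metadata management method provided by the third embodiment of the present invention. As shown in FIG. 3, the metadata management method provided in this embodiment includes the following steps:
S301: Set a backup period and a full backup threshold.
At system initialization, the backup mode is configured, the backup period is set to 3 hours, and the full backup threshold is set to 1G.
S302: When the first backup time arrives, perform a full backup.
When the backup period elapses, this is taken as the first backup time: a full metadata backup is performed, and the next backup mode is set to differential backup. The generated backup file uses the current system version and a timestamp as its backup ID.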
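One way to form a backup ID from the system version and a timestamp is shown below. The text does not specify a format, so the separator, the UTC compact timestamp layout and the function name `make_backup_id` are all assumptions.

```python
from datetime import datetime, timezone


def make_backup_id(system_version: str, when: datetime) -> str:
    """Combine the system version and a UTC timestamp into a backup ID."""
    # Hypothetical format: <version>-<YYYYMMDD>T<HHMMSS>Z
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{system_version}-{stamp}"
```

For example, `make_backup_id("V3.0", datetime(2015, 8, 3, 12, 0, tzinfo=timezone.utc))` returns `"V3.0-20150803T120000Z"`; the version string is made up for the example.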
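S303: When the second backup time arrives, perform a differential backup.
When the next backup period elapses, the backup data ID and backup mode are checked, and this backup is performed according to them. For a full backup, a backup data ID is generated once the backup completes. For a differential backup, a backup data ID and the differential data file produced by the backup are generated. The backup uses multiple threads, diffing different metadata files at the same time and generating a differential data file for each.
S304: Compare the differential data with the full backup threshold to determine the next backup mode.
After the backup in step S303 completes, it is determined whether the size of the backup file produced by this backup is within the configured threshold. If it is smaller than the 1G threshold, the next backup mode is set to differential backup; if it is larger than the threshold, the next backup mode is set to full backup.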
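The mode transitions across steps S301 to S304 can be traced with a small state holder. This is a sketch under stated assumptions: the class name `BackupPolicy` is hypothetical, "1G" is read as 1 GiB, and a full backup is assumed to be followed by a differential one, which the text states explicitly only for the first full backup.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class BackupPolicy:
    period_hours: int = 3          # S301: backup period
    full_threshold: int = 1 * GIB  # S301: full backup threshold ("1G")
    next_mode: str = "full"        # S302: the first backup is a full backup

    def record_backup(self, backup_size: int) -> str:
        """Record a finished backup and return the mode of the next one."""
        if self.next_mode == "full":
            # After a full backup, later backups diff against it.
            self.next_mode = "differential"
        elif backup_size > self.full_threshold:
            # S304: the differential file outgrew the threshold.
            self.next_mode = "full"
        else:
            self.next_mode = "differential"
        return self.next_mode
```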
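S305: Perform metadata recovery using the differential data.
To restore from backup files when the original data needs to be recovered, the restoration is carried out according to the backup data ID, the backup mode and the original data. Likewise, differential restoration uses multiple threads: from the corresponding differential data and the current metadata, the differential restoration algorithm generates the corresponding original data, thereby restoring the metadata.
This embodiment uses a difference algorithm to diff the massive metadata files of a distributed file system and generates corresponding differential files for fast backup and fast restoration. Using a metadata file difference algorithm and a reverse restoration algorithm, it efficiently diffs the original file against the target file to produce a differential file; when data needs to be restored, the original file can be recovered from the differential file and the target file.
In real production, most of the massive number of files in a distributed file system are never modified, so the metadata generated for most files does not change. Differential backup and restoration is therefore the best way to back up and restore massive, largely unchanging metadata.
As shown in FIG. 4, two copies of metadata from different times are compared in binary. With the existing metadata backup method, all of the metadata would need a full backup. According to the present invention, only the 3 bytes at positions 140 to 142 differ. A byte-stream difference algorithm compares the files and generates a differential file, which records only the offset of the differing data, the 3 bytes of new data (used for data backup), and the 3 bytes of original data (used for data restoration). This process is the differential file generation algorithm. Typically a metadata change amounts to a difference of only tens of bytes or even a few bytes, so this algorithm saves a large amount of storage space and speeds up metadata backup.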
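A byte-stream diff of the kind described can be sketched in a few lines. The function name `byte_diff` and the tuple layout are hypothetical, and the sketch assumes both versions of the metadata file have the same length, which is what offset-based replacement implies.

```python
def byte_diff(base: bytes, current: bytes):
    """Return a list of (offset, new_bytes, old_bytes) runs that differ."""
    if len(base) != len(current):
        raise ValueError("this sketch only handles equal-length inputs")
    runs = []
    i, n = 0, len(base)
    while i < n:
        if base[i] == current[i]:
            i += 1
            continue
        start = i
        # Extend the run while consecutive bytes keep differing.
        while i < n and base[i] != current[i]:
            i += 1
        runs.append((start, current[start:i], base[start:i]))
    return runs


# Mirrors the FIG. 4 situation: 3 differing bytes at positions 140 to 142.
base = bytes(200)
current = bytearray(base)
current[140:143] = b"\x01\x02\x03"
print(byte_diff(base, bytes(current)))
# -> [(140, b'\x01\x02\x03', b'\x00\x00\x00')]
```

The byte values in the example are made up; only the offsets follow the figure description.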
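Metadata recovery is the reverse of differential backup. From the differential file, the original file can easily be generated using the current file together with the offsets and difference bytes in the differential file; this implements the reverse restoration algorithm.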
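The reverse direction, from the current file back to the original, can be sketched by writing the recorded original bytes back at each offset. The name `rollback` and the `(offset, new_bytes, old_bytes)` tuple layout are assumptions for illustration, and runs are assumed to be non-overlapping and length-preserving.

```python
def rollback(current: bytes, runs) -> bytes:
    """Rebuild the original file from the current file and diff runs.

    Each run is an (offset, new_bytes, old_bytes) tuple.
    """
    out = bytearray(current)
    for offset, new_bytes, old_bytes in runs:
        end = offset + len(new_bytes)
        # Sanity check: the current file should hold the new bytes here.
        if current[offset:end] != new_bytes:
            raise ValueError(f"current file does not match diff at {offset}")
        out[offset:end] = old_bytes
    return bytes(out)
```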
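Fourth embodiment:
Correspondingly, a distributed file system is also provided, which includes the metadata management apparatus 1 provided by the embodiments of the present invention.
Fifth embodiment:
An embodiment of the present invention further provides a computer-readable storage medium storing program instructions that, when executed by a processor, implement the metadata management method provided by the embodiments of the present invention.
In summary, the solutions of the embodiments of the present invention have at least the following beneficial effects:
A new metadata management method is provided that backs up and restores metadata using a fast difference algorithm and a differential restoration algorithm. This achieves fast backup and recovery of metadata and standardizes the collaborative process of system upgrades. The backup time for the massive metadata of a distributed file system is shortened from a few hours to a few minutes, and backup disk usage drops from the GB level to the MB level. The benefit becomes more pronounced as the amount of metadata grows.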
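Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented as a computer program flow. The computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, device, apparatus or component); when executed, it performs one of the steps of the method embodiments or a combination of them.
Optionally, all or part of the steps of the above embodiments may also be implemented using integrated circuits: the steps may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module.
Each device, function module or functional unit in the above embodiments may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed across a network of multiple computing devices.
When a device, function module or functional unit in the above embodiments is implemented as a software function module and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
Industrial Applicability
The embodiments of the present invention provide a new metadata management method that backs up and restores metadata using a fast difference algorithm and a differential restoration algorithm, achieving fast backup and recovery of metadata and standardizing the collaborative process of system upgrades. The backup time for the massive metadata of a distributed file system is shortened from a few hours to a few minutes, and backup disk usage drops from the GB level to the MB level.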

Claims (16)

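  1. A metadata management method, comprising:
    at metadata backup time, using a difference algorithm to calculate differential data between current metadata and backed-up metadata;
    at metadata recovery time, restoring the current metadata using the differential data and the backed-up metadata.
  2. The metadata management method of claim 1, wherein the backed-up metadata comprises the metadata of the most recent full backup.
  3. The metadata management method of claim 1, wherein the differential data comprises an identifier of the backed-up metadata, a difference data offset, and difference data.
  4. The metadata management method of claim 3, wherein restoring the current metadata using the differential data and the backed-up metadata comprises: determining the backed-up metadata that corresponds to the differential data, determining the data to be replaced within the backed-up metadata according to the difference data offset, and replacing the data to be replaced with the difference data to generate the current metadata.
  5. The metadata management method of claim 3, wherein the differential data further comprises original difference data within the backed-up metadata.
  6. The metadata management method of claim 1, further comprising: performing metadata backup and metadata recovery in parallel using multiple threads.
  7. The metadata management method of any one of claims 1 to 6, further comprising, after the metadata backup: determining, according to the size of the differential data, the backup mode for the next data backup; the backup mode comprising full backup and differential backup.
  8. A metadata management apparatus, comprising a backup module and a recovery module, wherein
    the backup module is configured to calculate, at metadata backup time, differential data between current metadata and backed-up metadata using a difference algorithm;
    the recovery module is configured to restore, at metadata recovery time, the current metadata using the differential data and the backed-up metadata.
  9. The metadata management apparatus of claim 8, wherein the backed-up metadata comprises the metadata of the most recent full backup.
  10. The metadata management apparatus of claim 8, wherein the differential data comprises an identifier of the backed-up metadata, a difference data offset, and difference data.
  11. The metadata management apparatus of claim 10, wherein the recovery module is configured to determine the backed-up metadata that corresponds to the differential data, determine the data to be replaced within the backed-up metadata according to the difference data offset, and replace the data to be replaced with the difference data to generate the current metadata.
  12. The metadata management apparatus of claim 10, wherein the differential data further comprises original difference data within the backed-up metadata.
  13. The metadata management apparatus of claim 8, wherein the backup module is further configured to perform metadata backup in parallel using multiple threads, and the recovery module is further configured to perform metadata recovery in parallel using multiple threads.
  14. The metadata management apparatus of any one of claims 8 to 13, wherein, after the metadata backup, the backup module is further configured to determine, according to the size of the differential data, the backup mode for the next data backup; the backup mode comprising full backup and differential backup.
  15. A distributed file system, comprising the metadata management apparatus of any one of claims 8 to 14.
  16. A computer-readable storage medium storing program instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
PCT/CN2015/092114 2015-08-03 2015-10-16 Metadata management method and apparatus, and distributed file system WO2016145838A1 (zh)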

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510468075.7A CN106708657A (zh) 2015-08-03 2015-08-03 Metadata management method and apparatus, and distributed file system
CN201510468075.7 2015-08-03

Publications (1)

Publication Number Publication Date
WO2016145838A1 (zh) 2016-09-22

Family

ID=56919642

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/092114 WO2016145838A1 (zh) 2015-08-03 2015-10-16 Metadata management method and apparatus, and distributed file system

Country Status (2)

Country Link
CN (1) CN106708657A (zh)
WO (1) WO2016145838A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
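CN107480010A (zh) * 2017-08-21 2017-12-15 郑州云海信息技术有限公司 Method and apparatus for recovering metadata
CN108089947B (zh) * 2017-12-15 2021-09-24 安徽长泰信息安全服务有限公司 Method for efficient multi-node differential backup
CN111198902B (zh) * 2018-11-16 2023-06-16 长鑫存储技术有限公司 Metadata management method and apparatus, storage medium, and electronic device
CN111045870B (zh) * 2019-12-27 2022-06-10 北京浪潮数据技术有限公司 Method, apparatus and medium for saving and restoring metadata
CN112905221A (zh) * 2021-02-20 2021-06-04 百度在线网络技术(北京)有限公司 Version rollback method and apparatus, electronic device, and storage medium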

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1752939A (zh) * 2004-09-22 2006-03-29 微软公司 Method and system for synthetic backup and restore
CN101051285A (zh) * 2006-09-21 2007-10-10 上海交通大学 Method for file matching in computer network data backup

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193842A (zh) * 2010-03-15 2011-09-21 成都市华为赛门铁克科技有限公司 Data backup method and apparatus
US8943356B1 (en) * 2010-09-30 2015-01-27 Emc Corporation Post backup catalogs
US8676763B2 (en) * 2011-02-08 2014-03-18 International Business Machines Corporation Remote data protection in a networked storage computing environment
CN103019888B (zh) * 2012-12-21 2016-03-30 华为技术有限公司 Backup method and apparatus
CN103049353B (zh) * 2012-12-21 2016-01-06 华为技术有限公司 Data backup method and related apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI, SHANGZHONG: "Analysis and Study of Data Backup Strategy", CHINA SCIENCE AND TECHNOLOGY INFORMATION, 29 February 2008 (2008-02-29), ISSN: 1001-8972 *
LIU, HUIMIN: "Analysis of Data Backup Strategy", FUJIAN COMPUTER, 31 August 2007 (2007-08-31), ISSN: 1673-2782 *

Also Published As

Publication number Publication date
CN106708657A (zh) 2017-05-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15885208

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15885208

Country of ref document: EP

Kind code of ref document: A1