CN104199963A - Method and device for HBase data backup and recovery

Publication number: CN104199963A
Authority: CN (China)
Prior art keywords: data, hbase, recovery, file, region
Legal status: Pending (as listed by Google Patents; this status is an assumption, not a legal conclusion)
Application number: CN201410483014.3A
Other languages: Chinese (zh)
Inventors: 刘璧怡 (Liu Biyi), 郭美思 (Guo Meisi), 吴楠 (Wu Nan)
Current assignee: Inspur Beijing Electronic Information Industry Co Ltd
Original assignee: Inspur Beijing Electronic Information Industry Co Ltd
Priority date: 2014-09-19
Filing date: 2014-09-19
Publication date: 2014-12-10

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for HBase data backup and recovery. The method includes: during backup of HBase data, flushing the data held in HBase memory into HFiles; creating a corresponding reference file for each HFile in each Region under the HBase table structure, and backing up the HFiles of each HBase table; during recovery of HBase data, if the data to be recovered is persistent data, performing persistent data recovery according to the reference files corresponding to that data, and if the data to be recovered is in-memory data, recovering the HBase in-memory data according to log files. With the method and the device, HBase data can be backed up and recovered efficiently and completely.

Description

Method and device for HBase data backup and recovery
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for HBase (Hadoop Database) data backup and recovery.
Background
With the arrival of the era of massive data, computing models have been evolving through various forms, and the shift from single computers to distributed computing is an inevitable consequence of continuously growing data volumes. Traditional databases cannot meet demands such as the analysis, management and mining of large data sets: according to statistics, the structured data handled by database tools is on the order of gigabytes, and traditional techniques cannot scale beyond that. Among the technologies and tools now available, the most mature is the Hadoop file storage and computing framework together with the components built on it. Hadoop itself consists of the Hadoop Distributed File System (HDFS) and the distributed computing framework MapReduce. MapReduce is mainly suited to batch processing, whereas HBase is the main technology used for real-time data analysis and processing.
HBase is built on the distributed file system HDFS and is a distributed database system that offers high reliability, column-oriented storage, high performance, scalability, and real-time reads and writes. It lies between non-relational and relational databases and retrieves data by row key and by row-key range. Because data in HBase changes over time, using the data as it stood during a particular period requires backing it up and restoring it. HBase data backup and recovery is therefore very important.
In HBase, data is stored in two parts: one part resides in memory, and the other is persisted on HDFS as HFile (Hadoop File) files. A backup and recovery must therefore cover both parts. However, existing HBase backup and recovery methods must stop the HBase service while they run, which interrupts user operations, and because they rely on the HBase log files, it is difficult to guarantee the integrity of both parts of the data during backup and recovery.
Summary of the invention
To solve the above technical problems, the present invention provides a method and device for HBase data backup and recovery that can back up and recover HBase data efficiently and completely.
To achieve this object, the present invention provides an HBase data backup and recovery method, comprising: when HBase data is backed up, flushing the data in HBase memory into HFiles; creating a corresponding reference file for each HFile in each Region under the HBase table structure, and backing up the HFiles of each HBase table; when HBase data is recovered, if the data to be recovered is persistent data, performing persistent data recovery according to the reference files corresponding to the data to be recovered; and if the data to be recovered is in-memory data, recovering the HBase in-memory data according to log files.
Further, performing persistent data recovery according to the reference files corresponding to the data to be recovered, when that data is persistent data, comprises: placing the reference files corresponding to the data to be recovered under the corresponding Region directory of the HBase table to perform persistent data recovery.
Further, the recovered persistent data is organized into complete data when the HBase Regions are merged; the merge operation combines multiple small HFiles into one large file and is triggered automatically when the HBase merge configuration parameter value is reached.
Further, the log file records the HBase table on which an operation was performed, the Region corresponding to that table, and the operation performed.
Further, recovering the HBase in-memory data according to log files, when the data to be recovered is in-memory data, comprises: returning each log file to the corresponding Region directory according to the HBase table name and Region name recorded in the log file; once the reference files and logs have been placed at the corresponding positions under the Region directories, enabling the HBase table, whereupon the Regions are assigned to the corresponding RegionServers and each Region reads its corresponding log file into its own MemStore, which completes the HBase data recovery.
An HBase data backup and recovery device comprises: a backup unit, configured to, when HBase data is backed up, flush the data in HBase memory into HFiles, create a corresponding reference file for each HFile in each Region under the HBase table structure, and back up the HFiles of each HBase table; and a recovery unit, configured to, when HBase data is recovered, perform persistent data recovery according to the reference files corresponding to the data to be recovered if that data is persistent data, and recover the HBase in-memory data according to log files if that data is in-memory data.
Compared with the prior art, the present invention flushes the data in HBase memory into HFiles before backup, so that the backup can operate uniformly on HFiles; this guarantees the integrity of both the in-memory and the HDFS parts of HBase data during backup and recovery. In addition, creating a corresponding reference file for each HFile backs up the locations where HBase stores its data, and recovery is then performed through these reference files, so the data remains available for normal use while backup and recovery are in progress. HBase data can therefore be backed up and recovered efficiently and completely.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture in which HBase stores data according to the present invention.
Fig. 2 is a flowchart of the HBase data backup and recovery method of the present invention.
Fig. 3 is a structural diagram of the HBase data backup and recovery device of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings. These exemplary embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Logical, implementation-related and other changes may be made to them without departing from the spirit and scope of the invention.
Fig. 1 is a schematic diagram of the architecture in which HBase stores data. As shown in Fig. 1, an HBase table is logically stored in RegionServers in the form of Regions. When the number of records in an HBase table keeps growing and exceeds a certain threshold, the table is automatically split into multiple Regions, and the Master assigns the different Regions to the corresponding RegionServers for management.
The Region is the smallest unit of distributed storage and load balancing in HBase. Different Regions can be assigned to different RegionServers, but a single Region cannot be split across multiple RegionServers. Note that although the Region is the smallest unit of distributed storage, it is not the smallest unit of storage; the smallest unit of storage is the Store.
A Region consists of one or more Stores, each Store holds one column family, and each Store in turn consists of one in-memory store (MemStore) and multiple StoreFiles, where the StoreFiles are persisted on HDFS in HFile format. Therefore, backing up HBase data requires backing up both the MemStore and the HFiles, and recovering HBase data requires restoring both the HFile data and the MemStore data.
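To make the Region, Store and MemStore hierarchy concrete, here is a minimal, hypothetical Python model; it is not the HBase client API, and all class and function names are illustrative. A row count stands in for the split threshold described above, whereas HBase's built-in split policies are typically size-based and avoid rewriting HFiles at split time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical toy model of the storage hierarchy; not the HBase API.


@dataclass
class Store:
    """One column family inside a Region: a MemStore plus persisted StoreFiles (HFiles)."""
    family: str
    memstore: Dict[str, str] = field(default_factory=dict)          # row key -> value, memory only
    storefiles: List[Dict[str, str]] = field(default_factory=list)  # each dict models one HFile on HDFS

    def rows(self) -> Set[str]:
        keys = set(self.memstore)
        for hfile in self.storefiles:
            keys.update(hfile)
        return keys


@dataclass
class Region:
    """Holds rows with start_key <= key < end_key (end_key None means unbounded)."""
    start_key: str
    end_key: Optional[str]
    stores: Dict[str, Store] = field(default_factory=dict)  # column family -> Store

    def rows(self) -> Set[str]:
        keys: Set[str] = set()
        for store in self.stores.values():
            keys |= store.rows()
        return keys


def split_region(region: Region, threshold: int) -> List[Region]:
    """Split a Region at its median row key once it holds more than `threshold` rows."""
    rows = sorted(region.rows())
    if len(rows) <= threshold:
        return [region]
    mid = rows[len(rows) // 2]
    left, right = Region(region.start_key, mid), Region(mid, region.end_key)
    for family, store in region.stores.items():
        left.stores[family], right.stores[family] = Store(family), Store(family)
        # MemStore entries go to whichever daughter Region owns the row key.
        for key, value in store.memstore.items():
            (left if key < mid else right).stores[family].memstore[key] = value
        # Each HFile is partitioned the same way; empty halves are dropped.
        for hfile in store.storefiles:
            low = {k: v for k, v in hfile.items() if k < mid}
            high = {k: v for k, v in hfile.items() if k >= mid}
            if low:
                left.stores[family].storefiles.append(low)
            if high:
                right.stores[family].storefiles.append(high)
    return [left, right]
```

Because each Store belongs to exactly one Region, a Region can move between RegionServers as a unit, which is why the Region (not the Store) is the unit of distribution.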
Fig. 2 is a flowchart of the HBase data backup and recovery method of the present invention. As shown in Fig. 2, the method first backs up the HBase data and then recovers it, and specifically comprises the following steps.
Step 21: when HBase data is backed up, flush the data in HBase memory into HFiles.
In this step, note that data in HBase is stored in two parts: one part resides in HBase memory, and the other is persisted on HDFS as HFiles.
When HBase data is backed up, the data in HBase memory is first flushed into HFiles to ensure that all data is persisted, so that the subsequent backup can operate uniformly on HFiles.
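Step 21 can be sketched with a toy model in which a MemStore is a dict and each HFile is an immutable, key-sorted tuple. This illustrates the idea under those assumptions and is not HBase code; the function names are hypothetical, and on a real cluster the flush would be requested from HBase itself, for example with the HBase shell's `flush` command.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: an HFile is an immutable, row-key-sorted tuple of (row key, value).
HFile = Tuple[Tuple[str, str], ...]


def flush(memstore: Dict[str, str], storefiles: List[HFile]) -> None:
    """Write the MemStore out as one new sorted, immutable HFile, then empty it."""
    if not memstore:
        return  # nothing buffered; do not create an empty HFile
    storefiles.append(tuple(sorted(memstore.items())))
    memstore.clear()


def flush_then_backup(stores: Dict[str, Tuple[Dict[str, str], List[HFile]]]) -> Dict[str, List[HFile]]:
    """Flush every Store first, so the HFiles alone contain all data; then copy the HFile lists."""
    for memstore, storefiles in stores.values():
        flush(memstore, storefiles)
    return {family: list(files) for family, (_, files) in stores.items()}
```

Flushing before copying is what lets the backup treat HFiles as the single source of truth: after the flush, nothing that was written before the backup lives only in memory.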
Step 22: create a corresponding reference file for each HFile in each Region under the HBase table structure, and back up the HFiles of each HBase table.
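Step 22 can be sketched on a local file system, assuming a simplified `<table>/<region>/<family>/<hfile>` layout (real HBase table directories on HDFS contain additional metadata files). The function `backup_table`, the `.ref` suffix, and storing the backup copy's absolute path inside each reference file are all hypothetical choices; the patent does not specify a reference-file format. The three directories are assumed to be disjoint.

```python
import shutil
from pathlib import Path
from typing import List


def backup_table(table_dir: Path, backup_dir: Path, ref_dir: Path) -> List[Path]:
    """Copy every HFile of a table into backup_dir and write one small reference file per HFile.

    Hypothetical layout: <table_dir>/<region>/<family>/<hfile>. Each reference file mirrors
    that relative path under ref_dir and contains only the path of the backed-up HFile.
    """
    refs = []
    for hfile in sorted(p for p in table_dir.rglob("*") if p.is_file()):
        rel = hfile.relative_to(table_dir)              # <region>/<family>/<hfile>
        copy = backup_dir / rel
        copy.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(hfile, copy)                       # full copy of the HFile data
        ref = ref_dir / rel.with_name(rel.name + ".ref")
        ref.parent.mkdir(parents=True, exist_ok=True)
        ref.write_text(str(copy.resolve()))             # reference = pointer to the backup copy
        refs.append(ref)
    return refs
```

The reference files preserve each HFile's Region and column-family position while staying only a few bytes long, which is what makes the later restore step cheap.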
Step 23: when HBase data is recovered, determine whether the data to be recovered is persistent data; if so, go to step 24; if not, go to step 25.
Step 24: perform persistent data recovery according to the reference files corresponding to the data to be recovered.
Specifically, the reference files corresponding to the data to be recovered are placed under the corresponding Region directory of the HBase table to perform persistent data recovery. Because reference files occupy very little disk space, they can be copied very quickly.
When data is read, the reference file is used to locate the corresponding data in the backup directory, and that data is read.
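Step 24 and the read path can be sketched in the same hypothetical layout used for the backup: restoring copies only the small reference files back into place, and a read follows a reference to the backed-up HFile. The function names and the reference format (a file ending in `.ref` that holds a path) are illustrative assumptions.

```python
import shutil
from pathlib import Path
from typing import List


def restore_references(ref_dir: Path, table_dir: Path) -> List[Path]:
    """Copy each small reference file back under its <region>/<family>/ directory (hypothetical layout)."""
    restored = []
    for ref in sorted(ref_dir.rglob("*.ref")):
        dest = table_dir / ref.relative_to(ref_dir)   # same Region/family position as at backup time
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(ref, dest)                       # only a path is copied, not the HFile data
        restored.append(dest)
    return restored


def read_store_file(path: Path) -> str:
    """Read a store file; a '.ref' file is followed to the backed-up HFile it points at."""
    if path.suffix == ".ref":
        return Path(path.read_text()).read_text()
    return path.read_text()
```

Restore time therefore scales with the number of HFiles rather than their size, at the cost of one extra indirection on each read until the data is merged.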
The recovered persistent data is organized into complete data only when the HBase Regions are merged. The merge operation is triggered automatically when the value of the HBase merge configuration parameter is reached; it combines multiple small HFiles into one large file, so that operations such as queries need to open only one file instead of several. As a result, HBase can resume providing service soon after recovery, which improves efficiency.
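The merge described above behaves like a k-way merge of sorted files. The sketch below rests on several assumptions: each file is a list of sorted (row key, value) pairs, files are ordered oldest to newest, and the threshold is a plain integer standing in for the HBase merge configuration parameter (in HBase, compaction triggering is governed by settings such as `hbase.hstore.compactionThreshold`). Deletes and multiple versions are ignored, and `maybe_compact` is a hypothetical name.

```python
import heapq
from typing import List, Tuple

# Hypothetical model: an HFile is a list of row-key-sorted (row key, value) pairs.
HFile = List[Tuple[str, str]]


def maybe_compact(storefiles: List[HFile], threshold: int) -> List[HFile]:
    """Merge all HFiles of a Store into one once there are at least `threshold` of them.

    `storefiles` is ordered oldest -> newest; when a row key appears in several files,
    the value from the newest file is kept.
    """
    if len(storefiles) < threshold:
        return storefiles
    # Tag entries with -file_index so that, for equal row keys, the newest file sorts first.
    tagged = [[(key, -i, value) for key, value in hfile] for i, hfile in enumerate(storefiles)]
    merged: HFile = []
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # older version of a row key that was already written
        merged.append((key, value))
    return [merged]
```

For example, merging `[("a", "1"), ("c", "1")]` (older) with `[("a", "2"), ("b", "2")]` (newer) yields a single file in which row `a` carries the newer value `"2"`.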
Step 25: recover the HBase in-memory data according to the log files.
This step recovers the data that had not yet been persisted, restoring the HBase in-memory data from the log files. Because each Region has its own memory, the data must be recovered into the memory of each Region.
The HBase log file records every operation, including the HBase table on which the operation was performed, the Region of that table, and the operation itself. Based on the HBase table name and Region name recorded in the log file, each log file is returned to the corresponding Region directory.
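Returning log records to their Regions amounts to grouping them by table name and Region name while preserving their original order. The `LogEntry` record type and the `<table>/<region>` key below are hypothetical; HBase's real WAL format differs, although its own log splitting similarly produces per-Region edits.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log record: which table and Region an operation touched, and what it did."""
    table: str
    region: str
    op: str                      # "put" or "delete" in this sketch
    row: str
    value: Optional[str] = None


def route_logs(entries: List[LogEntry]) -> Dict[str, List[LogEntry]]:
    """Group log entries by '<table>/<region>' so each Region directory gets back only its own log.

    Entries keep their original relative order, which replay depends on.
    """
    per_region: Dict[str, List[LogEntry]] = defaultdict(list)
    for entry in entries:
        per_region[f"{entry.table}/{entry.region}"].append(entry)
    return dict(per_region)
```

Preserving order matters because a put followed by a delete of the same row must replay in that sequence to yield the correct final state.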
Once the reference files and logs have all been placed at the corresponding positions under the Region directories, the HBase table can be enabled. When the table is enabled, its Regions are assigned to the corresponding RegionServers, and each Region reads its corresponding log file into its own MemStore, which completes the HBase data recovery.
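The final replay into each Region's MemStore can be sketched as applying the routed operations in order. This toy model (hypothetical names; only puts and deletes) also shows why a delete is recorded as a tombstone rather than by removing the key: reads consult the MemStore before the HFiles, so an older value still present in an HFile must stay hidden.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Op = Tuple[str, str, Optional[str]]  # (operation, row key, value): hypothetical log record format
TOMBSTONE = object()                 # marks a deleted row so older HFile versions stay hidden


def replay_log(log: Iterable[Op], memstore: Dict[str, object]) -> Dict[str, object]:
    """Apply logged operations, in order, to a Region's MemStore."""
    for op, row, value in log:
        if op == "put":
            memstore[row] = value
        elif op == "delete":
            memstore[row] = TOMBSTONE
        else:
            raise ValueError(f"unknown log operation: {op!r}")
    return memstore


def get(row: str, memstore: Dict[str, object], hfiles_newest_first: List[Dict[str, str]]) -> Optional[str]:
    """Read path: the MemStore is consulted before the HFiles, newest HFile first."""
    if row in memstore:
        value = memstore[row]
        return None if value is TOMBSTONE else value
    for hfile in hfiles_newest_first:
        if row in hfile:
            return hfile[row]
    return None
```

After replay, reads combine the restored HFiles (reached through references) with the rebuilt MemStore, which is how both parts of the data become visible again.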
By flushing the data in HBase memory into HFiles and performing the backup uniformly on HFiles, the present invention guarantees the integrity of both the in-memory and the HDFS parts of HBase data during backup and recovery. In addition, by creating a corresponding reference file for each HFile, it backs up the locations where HBase stores data and performs recovery through the reference files, so the data remains available for normal use during backup and recovery. HBase data can therefore be backed up and recovered efficiently and completely.
Fig. 3 is a structural diagram of the HBase data backup and recovery device of the present invention. As shown in Fig. 3, the device specifically comprises:
Backup unit: when HBase data is backed up, flushes the data in HBase memory into HFiles, creates a corresponding reference file for each HFile in each Region under the HBase table structure, and backs up the HFiles of each HBase table;
Recovery unit: when HBase data is recovered, determines whether the data to be recovered is persistent data (HFiles); if so, performs persistent data recovery according to the reference files corresponding to that data; if not, recovers the HBase in-memory data according to the log files.
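The two units of the device can be modelled as two small classes. This is a hypothetical sketch, not the patent's implementation: the MemStore and HFiles are dicts, the backup is a list of copies, the recovery unit "references" the backed-up HFiles by holding the same objects rather than copying them, and only put operations are modelled in the log.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, Optional[str]]  # hypothetical log record: (operation, row key, value)


@dataclass
class RegionState:
    memstore: Dict[str, str] = field(default_factory=dict)
    hfiles: List[Dict[str, str]] = field(default_factory=list)  # oldest -> newest


class BackupUnit:
    def backup(self, region: RegionState) -> List[Dict[str, str]]:
        """Flush the MemStore into a new HFile, then return independent copies of every HFile."""
        if region.memstore:
            region.hfiles.append(dict(region.memstore))
            region.memstore.clear()
        return [dict(hfile) for hfile in region.hfiles]


class RecoveryUnit:
    def recover(self, region: RegionState, persistent: bool,
                backup: Optional[List[Dict[str, str]]] = None,
                log: Optional[List[Op]] = None) -> None:
        """Step 23 dispatch: persistent data via references, in-memory data via log replay."""
        if persistent:
            # Hold the backed-up HFile objects themselves (stand-in for reference files), no copy.
            region.hfiles = list(backup or [])
        else:
            for op, row, value in log or []:
                if op != "put":
                    raise ValueError(f"only 'put' is modelled in this sketch, got {op!r}")
                region.memstore[row] = value
```

Keeping the dispatch in one `recover` method mirrors the single decision point of step 23 while leaving each recovery path independent.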
The HBase data backup and recovery device corresponds to the HBase data backup and recovery method; for implementation details, refer to the method, which are not repeated here.
Accordingly, the device offers the same advantages as the method: flushing the data in HBase memory into HFiles and backing up HFiles uniformly guarantees the integrity of both the in-memory and the HDFS parts of HBase data, and recovering through per-HFile reference files keeps the data available for normal use during backup and recovery, so HBase data can be backed up and recovered efficiently and completely.
It should be understood that, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is used only for clarity; those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be combined appropriately to form other embodiments that those skilled in the art can understand.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present invention and are not intended to limit its scope of protection; all equivalent embodiments or changes made without departing from the technical spirit of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for HBase data backup and recovery, characterized by comprising:
when HBase data is backed up, flushing the data in HBase memory into HFiles; creating a corresponding reference file for each HFile in each Region under the HBase table structure, and backing up the HFiles of each HBase table;
when HBase data is recovered, if the data to be recovered is persistent data, performing persistent data recovery according to the reference files corresponding to the data to be recovered; and if the data to be recovered is in-memory data, recovering the HBase in-memory data according to log files.
2. The method for HBase data backup and recovery according to claim 1, characterized in that performing persistent data recovery according to the reference files corresponding to the data to be recovered, if the data to be recovered is persistent data, comprises:
if the data to be recovered is persistent data, placing the reference files corresponding to the data to be recovered under the corresponding Region directory of the HBase table to perform persistent data recovery.
3. The method for HBase data backup and recovery according to claim 1 or 2, characterized in that the recovered persistent data is organized into complete data when the HBase Regions are merged, wherein the merge operation combines multiple small HFiles into one large file and is triggered automatically when the HBase merge configuration parameter value is reached.
4. The method for HBase data backup and recovery according to claim 1, characterized in that the log file comprises the HBase table on which an operation is performed, the Region corresponding to that HBase table, and the operation performed.
5. The method for HBase data backup and recovery according to claim 4, characterized in that recovering the HBase in-memory data according to log files, if the data to be recovered is in-memory data, comprises:
if the data to be recovered is in-memory data, returning each log file to the corresponding Region directory according to the HBase table name and the Region name in the log file; when the reference files and logs have been placed at the corresponding positions under the Region directories, enabling the HBase table, whereupon the Regions are assigned to the corresponding RegionServers and each Region reads its corresponding log file into its own MemStore, completing the HBase data recovery.
6. An HBase data backup and recovery device, characterized by comprising:
a backup unit, configured to, when HBase data is backed up, flush the data in HBase memory into HFiles, create a corresponding reference file for each HFile in each Region under the HBase table structure, and back up the HFiles of each HBase table;
a recovery unit, configured to, when HBase data is recovered, perform persistent data recovery according to the reference files corresponding to the data to be recovered if that data is persistent data, and recover the HBase in-memory data according to log files if that data is in-memory data.
7. The HBase data backup and recovery device according to claim 6, characterized in that the recovery unit being configured to perform persistent data recovery according to the reference files corresponding to the data to be recovered, if the data to be recovered is persistent data, comprises:
the recovery unit is configured to, if the data to be recovered is persistent data, place the reference files corresponding to the data to be recovered under the corresponding Region directory of the HBase table to perform persistent data recovery.
8. The HBase data backup and recovery device according to claim 6 or 7, characterized in that the recovered persistent data is organized into complete data when the HBase Regions are merged, wherein the merge operation combines multiple small HFiles into one large file and is triggered automatically when the HBase merge configuration parameter value is reached.
9. The HBase data backup and recovery device according to claim 6, characterized in that the log file comprises the HBase table on which an operation is performed, the Region corresponding to that HBase table, and the operation performed.
10. The HBase data backup and recovery device according to claim 9, characterized in that the recovery unit being configured to recover the HBase in-memory data according to log files, if the data to be recovered is in-memory data, comprises:
the recovery unit is configured to, if the data to be recovered is in-memory data, return each log file to the corresponding Region directory according to the HBase table name and the Region name in the log file; when the reference files and logs have been placed at the corresponding positions under the Region directories, enable the HBase table, whereupon the Regions are assigned to the corresponding RegionServers and each Region reads its corresponding log file into its own MemStore, completing the HBase data recovery.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410483014.3A CN104199963A (en) 2014-09-19 2014-09-19 Method and device for HBase data backup and recovery

Publications (1)

Publication Number Publication Date
CN104199963A 2014-12-10

Family

ID=52085256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410483014.3A Pending CN104199963A (en) 2014-09-19 2014-09-19 Method and device for HBase data backup and recovery

Country Status (1)

Country Link
CN (1) CN104199963A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7725470B2 (en) * 2006-08-07 2010-05-25 Bea Systems, Inc. Distributed query search using partition nodes
CN101957863A (en) * 2010-10-14 2011-01-26 广州从兴电子开发有限公司 Data parallel processing method, device and system
CN102779185A (en) * 2012-06-29 2012-11-14 浙江大学 High-availability distribution type full-text index method
CN103106286A (en) * 2013-03-04 2013-05-15 曙光信息产业(北京)有限公司 Method and device for managing metadata

Non-Patent Citations (1)

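Title
Luan Yangyang (栾洋洋): "Research on Fault Recovery Methods for the Distributed Database HBase" (《分布式数据库HBase故障恢复方法研究》), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *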

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN105988995A * 2015-01-27 2016-10-05 杭州海康威视数字技术股份有限公司 HFile based data batch loading method
CN105988995B * 2015-01-27 2019-05-24 杭州海康威视数字技术股份有限公司 Method for batch loading data based on HFile
CN104778097A * 2015-03-27 2015-07-15 新浪网技术(中国)有限公司 Data recovery method and data recovery device
CN105159945A * 2015-08-10 2015-12-16 北京思特奇信息技术股份有限公司 Method and system for extracting and converting data between Hbase and Hdfs
US11119863B2 2015-09-25 2021-09-14 Huawei Technologies Co., Ltd. Data backup method and data processing system
US11132260B2 2015-09-25 2021-09-28 Huawei Technologies Co., Ltd. Data processing method and apparatus
CN106294008A * 2016-08-05 2017-01-04 浙江宇视科技有限公司 Data reconstruction method and device
CN106294008B * 2016-08-05 2019-06-11 浙江宇视科技有限公司 Data reconstruction method and device
CN108228752A * 2017-12-21 2018-06-29 中国联合网络通信集团有限公司 Full data export method, data distribution device and data export node

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141210