CN104199963A - Method and device for HBase data backup and recovery - Google Patents
Method and device for HBase data backup and recovery
- Publication number: CN104199963A
- Application number: CN201410483014.3A
- Authority: CN (China)
- Prior art keywords: data, hbase, recovery, file, region
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a device for HBase data backup and recovery. During backup of HBase data, the method flushes the data in HBase memory to HFile files, creates a corresponding reference file for each HFile in each Region under the HBase table structure, and backs up the HFile files of each HBase table. During recovery of HBase data, if the data to be recovered are persistent data, the persistent data are recovered according to the reference files corresponding to those data; if the data to be recovered are in-memory data, the HBase in-memory data are recovered according to log files. With the method and the device, HBase data can be backed up and recovered efficiently and completely.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and a device for backing up and recovering HBase (Hadoop Database) data.
Background
With the arrival of the mass-data era, computation models have also evolved through various forms. The evolution from single-machine computing to distributed computing is an inevitable consequence of continually growing data volumes. At present, traditional databases cannot meet the demands of analyzing, managing, and mining large data sets. According to statistics, the structured data that database tools handle is at the GB level, and conventional techniques cannot scale beyond it. Among the technologies and tools now available, the most mature are the Hadoop file storage and computing framework and the components built on it. Hadoop itself consists of the Hadoop Distributed File System (HDFS) and the distributed computing framework MapReduce. MapReduce is mainly suited to batch file processing, whereas HBase is the technology mainly used for real-time data analysis and processing.
HBase is built on the distributed file system HDFS and is a highly reliable, column-oriented, high-performance, scalable distributed database system that supports real-time reads and writes. It sits between non-relational and relational databases and retrieves data by primary key and primary-key range. Data in HBase change over time, and a user may want to back up and recover the data as of a certain period. HBase data backup and recovery is therefore very important.
In HBase, data are stored in two parts: one part resides in memory, and the other part is persisted on HDFS in the form of HFile (Hadoop File) files. Backup and recovery must therefore cover both parts. However, existing HBase backup and recovery methods need to stop the HBase service while they run, which disrupts user operations, and they back up and recover data according to the HBase log files, which makes it difficult to guarantee the integrity of both parts of the data.
Summary of the invention
To solve the above technical problems, the invention provides a method and a device for HBase data backup and recovery that can back up and recover HBase data efficiently and completely.
To achieve this object, the invention provides a method for HBase data backup and recovery, comprising: when HBase data are backed up, flushing the data in HBase memory to HFile files; creating a corresponding reference file for each HFile in each Region under the HBase table structure, and backing up the HFile files of each HBase table; when HBase data are recovered, if the data to be recovered are persistent data, recovering the persistent data according to the reference files corresponding to the data to be recovered; and if the data to be recovered are in-memory data, recovering the HBase in-memory data according to log files.
Further, recovering the persistent data according to the corresponding reference files comprises: if the data to be recovered are persistent data, placing the reference files corresponding to the data to be recovered under the corresponding Region folder of the HBase table, and performing persistent data recovery.
Further, during persistent data recovery, the data are organized into complete data when the Regions of HBase are merged. The merge operation combines multiple small HFile files into one large file and is triggered automatically when the value of the HBase merge configuration parameter is reached.
Further, the log file records the HBase table on which an operation was performed, the Region of that table on which it was performed, and the operation itself.
Further, recovering the HBase in-memory data according to log files comprises: if the data to be recovered are in-memory data, returning each log file to the corresponding Region folder according to the HBase table name and Region name recorded in the log file; once the reference files and log files have been placed at the corresponding positions under the Region folders, starting the HBase table, whereupon each Region is assigned to a corresponding RegionServer and reads its corresponding log file into its own MemStore, completing the HBase data recovery.
A device for HBase data backup and recovery comprises: a backup unit, configured to, when HBase data are backed up, flush the data in HBase memory to HFile files, create a corresponding reference file for each HFile in each Region under the HBase table structure, and back up the HFile files of each HBase table; and a recovery unit, configured to, when HBase data are recovered, recover the persistent data according to the reference files corresponding to the data to be recovered if those data are persistent data, and recover the HBase in-memory data according to log files if those data are in-memory data.
Compared with the prior art, the invention comprises: when HBase data are backed up, flushing the data in HBase memory to HFile files; creating a corresponding reference file for each HFile in each Region under the HBase table structure, and backing up the HFile files of each HBase table; when HBase data are recovered, if the data to be recovered are persistent data, recovering the persistent data according to the corresponding reference files; and if the data to be recovered are in-memory data, recovering the HBase in-memory data according to log files. By flushing the data in HBase memory to HFile files and performing the backup operation uniformly on the HFile files, the invention guarantees the integrity of both the in-memory part and the HDFS part of the HBase data during backup and recovery. In addition, a corresponding reference file is created for each HFile so that the locations at which data are stored in HBase are backed up; during recovery, HBase data are recovered through the reference files, which keeps the data normally usable while backup and recovery are in progress. HBase data can therefore be backed up and recovered efficiently and completely.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture in which HBase stores data according to the invention.
Fig. 2 is a flowchart of the HBase data backup and recovery method of the invention.
Fig. 3 is a schematic structural diagram of the HBase data backup and recovery device of the invention.
Embodiments
The invention is described in further detail below with reference to the accompanying drawings. These exemplary embodiments are described in enough detail to enable those skilled in the art to practice the invention. Logical, implementation, and other changes may be made to the embodiments without departing from the spirit and scope of the invention.
Fig. 1 is a schematic diagram of the architecture in which HBase stores data according to the invention. As shown in Fig. 1, an HBase table is logically stored in RegionServers in the form of Regions. When the number of records in an HBase table keeps growing and exceeds a certain threshold, the table is automatically split into multiple Regions, and the Master distributes the different Regions to corresponding RegionServers for management.
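The splitting and assignment behaviour described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `Region`, `split_if_needed`, `assign`, the row-count threshold `max_rows`, and the round-robin placement are all hypothetical stand-ins chosen for illustration, not HBase's actual split policy or the Master's real assignment logic.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A sorted slice of one table's row keys (hypothetical model)."""
    rows: list = field(default_factory=list)


def split_if_needed(region, max_rows):
    """Split a region in half once it holds more than max_rows rows."""
    if len(region.rows) <= max_rows:
        return [region]
    mid = len(region.rows) // 2
    return [Region(region.rows[:mid]), Region(region.rows[mid:])]


def assign(regions, servers):
    """Round-robin placement, standing in for the Master's assignment."""
    return {index: servers[index % len(servers)] for index in range(len(regions))}


table = Region(["r1", "r2", "r3", "r4", "r5"])
regions = split_if_needed(table, max_rows=4)
placement = assign(regions, ["rs-a", "rs-b"])
```

Each resulting region is placed on exactly one server in this model, which mirrors the statement that a Region is never spread over several RegionServers.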
A Region is the smallest unit of distributed storage and load balancing in HBase. Different Regions can be assigned to different RegionServers, but a single Region is never split across multiple RegionServers. Note that the Region is the smallest unit of distributed storage but not the smallest unit of storage; the smallest unit of storage is the Store.
A Region consists of one or more Stores, and each Store holds one column family. Each Store in turn consists of one in-memory buffer (MemStore) and multiple StoreFiles, where the StoreFiles are persisted on HDFS in the form of HFiles. Therefore, HBase data backup must cover both the MemStore and the HFile files, and HBase data recovery must restore both the HFile data and the MemStore data.
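The Region, Store, MemStore, and StoreFile hierarchy can be sketched as plain data classes. The class and method names below (`Store`, `Region.put`, `unpersisted_rows`) are assumptions made for illustration and do not correspond to HBase's internal classes; the point is only to show why data that sits in a MemStore is at risk until it is flushed or logged.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """One column family: an in-memory buffer plus persisted files."""
    family: str
    memstore: dict = field(default_factory=dict)     # row key -> value, not yet persisted
    store_files: list = field(default_factory=list)  # each entry stands in for one HFile


@dataclass
class Region:
    name: str
    stores: dict = field(default_factory=dict)       # column family -> Store

    def put(self, family, row, value):
        """Writes land in the MemStore of the column family's Store."""
        store = self.stores.setdefault(family, Store(family))
        store.memstore[row] = value

    def unpersisted_rows(self):
        """Count entries that exist only in memory."""
        return sum(len(store.memstore) for store in self.stores.values())


region = Region("region-0001")
region.put("cf1", "row1", "a")
region.put("cf2", "row1", "b")
```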
Fig. 2 is a flowchart of the HBase data backup and recovery method of the invention. As shown in Fig. 2, the method first backs up the HBase data and then recovers the HBase data, and specifically comprises:
Step 21: when HBase data are backed up, flush the data in HBase memory to HFile files.
In HBase, data are stored in two parts: one part is in HBase memory, and the other part is persisted on HDFS in the form of HFile files.
When HBase data are backed up, the data in HBase memory are first flushed to HFiles so that all data are persisted; the subsequent backup operations can then be performed uniformly on the HFile files.
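A flush of this kind can be sketched as follows. The function `flush` and the representation of an HFile as a sorted tuple of key/value pairs are illustrative assumptions, not HBase's on-disk format; the sketch only shows that after the flush nothing remains in memory and every row is reachable through a persisted file.

```python
def flush(memstore, hfiles):
    """Persist buffered rows as one new sorted, immutable file and clear the buffer."""
    if not memstore:
        return hfiles
    hfiles.append(tuple(sorted(memstore.items())))
    memstore.clear()
    return hfiles


memstore = {"row2": "b", "row1": "a"}
hfiles = [(("row0", "z"),)]   # one file that was already persisted
flush(memstore, hfiles)
```

After this call a backup routine only has to look at `hfiles`, which is the uniform treatment the step describes.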
Step 22: create a corresponding reference file for each HFile in each Region under the HBase table structure, and back up the HFile files of each HBase table.
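One possible reading of step 22 is sketched below on a local file system instead of HDFS. The directory layout (`<table>/<region>/<name>.hfile`), the `.ref` suffix, the idea that a reference file stores the path of the backed-up copy, and the function `backup_table` are all assumptions for illustration; the patent does not specify the reference file format.

```python
import shutil
import tempfile
from pathlib import Path


def backup_table(table_dir, backup_dir, ref_dir):
    """Copy every HFile to backup_dir and write one small reference file per HFile."""
    refs = []
    for hfile in sorted(Path(table_dir).glob("*/*.hfile")):
        region = hfile.parent.name
        target = Path(backup_dir) / region / hfile.name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(hfile, target)
        ref = Path(ref_dir) / region / (hfile.name + ".ref")
        ref.parent.mkdir(parents=True, exist_ok=True)
        ref.write_text(str(target))   # the reference holds only the backup location
        refs.append(ref)
    return refs


root = Path(tempfile.mkdtemp())
(root / "table" / "region-a").mkdir(parents=True)
(root / "table" / "region-a" / "f1.hfile").write_text("row1=a\n")
refs = backup_table(root / "table", root / "backup", root / "refs")
```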
Step 23: when HBase data are recovered, determine whether the data to be recovered are persistent data. If so, go to step 24; if not, go to step 25.
Step 24: recover the persistent data according to the reference files corresponding to the data to be recovered.
Specifically, the reference files corresponding to the data to be recovered are placed under the corresponding Region folder of the HBase table, and persistent data recovery is performed. A reference file takes up very little disk space, so it can be copied very quickly.
When data are read, the corresponding data are located in the backup folder according to the reference file and read from there.
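Step 24 and the read path can be sketched in the same local-file-system model. The names `restore_references` and `read_through`, the `.ref` suffix, and the convention that a reference file contains the path of the backed-up data are hypothetical; they are chosen only to show that recovery copies small pointers while reads follow them into the backup folder.

```python
import tempfile
from pathlib import Path


def restore_references(ref_dir, table_dir):
    """Copy the small reference files into the matching Region folders."""
    placed = []
    for ref in sorted(Path(ref_dir).glob("*/*.ref")):
        dest = Path(table_dir) / ref.parent.name / ref.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(ref.read_text())
        placed.append(dest)
    return placed


def read_through(ref_path):
    """Follow a reference to the backed-up data it points at."""
    return Path(Path(ref_path).read_text()).read_text()


# Set up one backed-up file and its reference, then restore.
root = Path(tempfile.mkdtemp())
backup_file = root / "backup" / "region-a" / "f1.hfile"
backup_file.parent.mkdir(parents=True)
backup_file.write_text("row1=a\n")
ref = root / "refs" / "region-a" / "f1.hfile.ref"
ref.parent.mkdir(parents=True)
ref.write_text(str(backup_file))

placed = restore_references(root / "refs", root / "table")
```

Only the reference is copied during the restore, so the cost is independent of the size of the backed-up data.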
Persistent data recovery organizes the data into complete data only when the Regions of HBase are merged. The merge operation is triggered automatically when the value of the HBase merge configuration parameter is reached. It combines multiple small HFile files into one large file, so that operations such as queries need to open only one file rather than many. HBase can therefore resume service as soon as possible after data recovery, which improves efficiency.
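The threshold-triggered merge can be sketched as below. The function `maybe_compact` and its `min_files` parameter are hypothetical stand-ins for the merge configuration parameter mentioned in the text, and each file is modelled as a list of key/value pairs ordered from oldest file to newest; this is not HBase's actual compaction algorithm.

```python
def maybe_compact(hfiles, min_files=3):
    """Merge all files into one when at least min_files exist; otherwise do nothing."""
    if len(hfiles) < min_files:
        return hfiles
    merged = {}
    for hfile in hfiles:          # oldest first, so newer values overwrite older ones
        merged.update(hfile)
    return [sorted(merged.items())]


few = maybe_compact([[("a", 1)], [("b", 2)]])
many = maybe_compact([[("a", 1)], [("b", 2)], [("a", 3)]])
```

After the merge a lookup touches a single file, which is the query-time benefit the paragraph describes.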
Step 25: recover the HBase in-memory data according to log files.
This step handles data that had not yet been persisted: the HBase in-memory data are recovered from the data recorded in the log files. Each Region has its own memory, so the data must be recovered into the memory of each Region.
The HBase log file records every operation. It includes the HBase table on which an operation was performed, the Region of that table, and the operation itself. According to the HBase table name and Region name recorded in the log file, each log file is returned to the corresponding Region folder.
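The log record layout and the routing by table name and Region name can be sketched as follows. `LogEntry`, its fields, the tuple encoding of operations, and `route_logs` are assumptions for illustration; the patent names the three pieces of information a record carries but not their concrete format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    table: str
    region: str
    operation: tuple   # hypothetical encoding, e.g. ("put", row, value) or ("delete", row)


def route_logs(entries):
    """Group logged operations by (table, region), preserving their order."""
    by_region = defaultdict(list)
    for entry in entries:
        by_region[(entry.table, entry.region)].append(entry.operation)
    return dict(by_region)


entries = [
    LogEntry("t1", "region-a", ("put", "r1", "x")),
    LogEntry("t1", "region-b", ("put", "r9", "y")),
    LogEntry("t1", "region-a", ("delete", "r1")),
]
routed = route_logs(entries)
```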
Once the reference files and log files have all been placed at the corresponding positions under the Region folders, the HBase table can be started. When the table starts, each Region is assigned to a corresponding RegionServer and reads its corresponding log file into its own MemStore, which completes the HBase data recovery.
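Reading a log into a Region's MemStore amounts to replaying the recorded operations in order. The sketch below assumes a hypothetical tuple encoding of operations, `("put", row, value)` and `("delete", row)`, and a dictionary as the MemStore; it is not HBase's actual log replay code.

```python
def replay(operations):
    """Rebuild a MemStore by applying logged operations in their original order."""
    memstore = {}
    for op in operations:
        if op[0] == "put":
            _, row, value = op
            memstore[row] = value
        elif op[0] == "delete":
            memstore.pop(op[1], None)
    return memstore


recovered = replay([
    ("put", "r1", "a"),
    ("put", "r2", "b"),
    ("delete", "r1"),
    ("put", "r2", "c"),
])
```

Order matters here: replaying the same operations in a different sequence could leave a deleted row present or an overwritten value stale.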
By flushing the data in HBase memory to HFile files and performing the backup operation uniformly on the HFile files, the invention guarantees the integrity of both the in-memory part and the HDFS part of the HBase data during backup and recovery. In addition, a corresponding reference file is created for each HFile so that the locations at which data are stored in HBase are backed up; during recovery, HBase data are recovered through the reference files, which keeps the data normally usable while backup and recovery are in progress. HBase data can therefore be backed up and recovered efficiently and completely.
Fig. 3 is a schematic structural diagram of the HBase data backup and recovery device of the invention. As shown in Fig. 3, the device specifically comprises:
A backup unit, configured to, when HBase data are backed up, flush the data in HBase memory to HFile files, create a corresponding reference file for each HFile in each Region under the HBase table structure, and back up the HFile files of each HBase table;
A recovery unit, configured to, when HBase data are recovered, determine whether the data to be recovered are persistent data (HFile files); if so, recover the persistent data according to the reference files corresponding to the data to be recovered; if not, recover the HBase in-memory data according to log files.
The HBase data backup and recovery device corresponds to the HBase data backup and recovery method, so implementation details can be found in the description of the method and are not repeated here.
By flushing the data in HBase memory to HFile files and performing the backup operation uniformly on the HFile files, the HBase data backup and recovery device of the invention guarantees the integrity of both the in-memory part and the HDFS part of the HBase data during backup and recovery. In addition, a corresponding reference file is created for each HFile so that the locations at which data are stored in HBase are backed up; during recovery, HBase data are recovered through the reference files, which keeps the data normally usable while backup and recovery are in progress. HBase data can therefore be backed up and recovered efficiently and completely.
It should be understood that, although this specification is organized by embodiment, not every embodiment contains only one independent technical solution. This manner of description is adopted solely for clarity. Those skilled in the art should treat the specification as a whole; the technical solutions in the embodiments may also be suitably combined to form other embodiments that those skilled in the art can understand.
The detailed descriptions listed above are merely specific illustrations of feasible embodiments of the invention and are not intended to limit its scope of protection. All equivalent embodiments or changes that do not depart from the technical spirit of the invention shall fall within its scope of protection.
Claims (10)
1. A method for HBase data backup and recovery, characterized by comprising:
when HBase data are backed up, flushing the data in HBase memory to HFile files; creating a corresponding reference file for each HFile in each Region under the HBase table structure, and backing up the HFile files of each HBase table;
when HBase data are recovered, if the data to be recovered are persistent data, recovering the persistent data according to the reference files corresponding to the data to be recovered; if the data to be recovered are in-memory data, recovering the HBase in-memory data according to log files.
2. The method for HBase data backup and recovery according to claim 1, characterized in that recovering the persistent data according to the reference files corresponding to the data to be recovered, if the data to be recovered are persistent data, comprises:
if the data to be recovered are persistent data, placing the reference files corresponding to the data to be recovered under the corresponding Region folder of the HBase table, and performing persistent data recovery.
3. The method for HBase data backup and recovery according to claim 1 or 2, characterized in that, in the persistent data recovery, the data are organized into complete data when the Regions of HBase are merged; the merge operation combines multiple small HFile files into one large file and is triggered automatically when the value of the HBase merge configuration parameter is reached.
4. The method for HBase data backup and recovery according to claim 1, characterized in that the log file comprises the HBase table on which an operation was performed, the Region of that HBase table on which the operation was performed, and the operation performed.
5. The method for HBase data backup and recovery according to claim 4, characterized in that recovering the HBase in-memory data according to log files, if the data to be recovered are in-memory data, comprises:
if the data to be recovered are in-memory data, returning each log file to the corresponding Region folder according to the HBase table name and Region name recorded in the log file; once the reference files and log files have been placed at the corresponding positions under the Region folders, starting the HBase table, whereupon each Region is assigned to a corresponding RegionServer and reads its corresponding log file into its own MemStore, completing the HBase data recovery.
6. A device for HBase data backup and recovery, characterized by comprising:
a backup unit, configured to, when HBase data are backed up, flush the data in HBase memory to HFile files, create a corresponding reference file for each HFile in each Region under the HBase table structure, and back up the HFile files of each HBase table;
a recovery unit, configured to, when HBase data are recovered, recover the persistent data according to the reference files corresponding to the data to be recovered if those data are persistent data, and recover the HBase in-memory data according to log files if those data are in-memory data.
7. The device for HBase data backup and recovery according to claim 6, characterized in that the recovery unit being configured to recover the persistent data according to the reference files corresponding to the data to be recovered, if those data are persistent data, comprises:
the recovery unit being configured to, if the data to be recovered are persistent data, place the reference files corresponding to the data to be recovered under the corresponding Region folder of the HBase table and perform persistent data recovery.
8. The device for HBase data backup and recovery according to claim 6 or 7, characterized in that, in the persistent data recovery, the data are organized into complete data when the Regions of HBase are merged; the merge operation combines multiple small HFile files into one large file and is triggered automatically when the value of the HBase merge configuration parameter is reached.
9. The device for HBase data backup and recovery according to claim 6, characterized in that the log file comprises the HBase table on which an operation was performed, the Region of that HBase table on which the operation was performed, and the operation performed.
10. The device for HBase data backup and recovery according to claim 9, characterized in that the recovery unit being configured to recover the HBase in-memory data according to log files, if the data to be recovered are in-memory data, comprises:
the recovery unit being configured to, if the data to be recovered are in-memory data, return each log file to the corresponding Region folder according to the HBase table name and Region name recorded in the log file; once the reference files and log files have been placed at the corresponding positions under the Region folders, start the HBase table, whereupon each Region is assigned to a corresponding RegionServer and reads its corresponding log file into its own MemStore, completing the HBase data recovery.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410483014.3A CN104199963A (en) | 2014-09-19 | 2014-09-19 | Method and device for HBase data backup and recovery |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104199963A (en) | 2014-12-10 |
Family ID: 52085256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410483014.3A Pending CN104199963A (en) | 2014-09-19 | 2014-09-19 | Method and device for HBase data backup and recovery |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104199963A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778097A (en) * | 2015-03-27 | 2015-07-15 | 新浪网技术(中国)有限公司 | Data recovery method and data recovery device |
CN105159945A (en) * | 2015-08-10 | 2015-12-16 | 北京思特奇信息技术股份有限公司 | Method and system for extracting and converting data between Hbase and Hdfs |
CN105988995A (en) * | 2015-01-27 | 2016-10-05 | 杭州海康威视数字技术股份有限公司 | HFile based data batch loading method |
CN106294008A (en) * | 2016-08-05 | 2017-01-04 | 浙江宇视科技有限公司 | A kind of data reconstruction method and device |
CN108228752A (en) * | 2017-12-21 | 2018-06-29 | 中国联合网络通信集团有限公司 | Data full dose deriving method, data distribution device and data export node |
US11119863B2 (en) | 2015-09-25 | 2021-09-14 | Huawei Technologies Co., Ltd. | Data backup method and data processing system |
US11132260B2 (en) | 2015-09-25 | 2021-09-28 | Huawei Technologies Co., Ltd. | Data processing method and apparatus |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7725470B2 (en) * | 2006-08-07 | 2010-05-25 | Bea Systems, Inc. | Distributed query search using partition nodes |
CN101957863A (en) * | 2010-10-14 | 2011-01-26 | 广州从兴电子开发有限公司 | Data parallel processing method, device and system |
CN102779185A (en) * | 2012-06-29 | 2012-11-14 | 浙江大学 | High-availability distribution type full-text index method |
CN103106286A (en) * | 2013-03-04 | 2013-05-15 | 曙光信息产业(北京)有限公司 | Method and device for managing metadata |
Non-Patent Citations (1)
Title |
---|
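Luan Yangyang: "Research on Fault Recovery Methods for the Distributed Database HBase", China Master's Theses Full-text Database, Information Science and Technology * |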
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105988995A (en) * | 2015-01-27 | 2016-10-05 | 杭州海康威视数字技术股份有限公司 | HFile based data batch loading method |
CN105988995B (en) * | 2015-01-27 | 2019-05-24 | 杭州海康威视数字技术股份有限公司 | A method of based on HFile batch load data |
CN104778097A (en) * | 2015-03-27 | 2015-07-15 | 新浪网技术(中国)有限公司 | Data recovery method and data recovery device |
CN105159945A (en) * | 2015-08-10 | 2015-12-16 | 北京思特奇信息技术股份有限公司 | Method and system for extracting and converting data between Hbase and Hdfs |
US11119863B2 (en) | 2015-09-25 | 2021-09-14 | Huawei Technologies Co., Ltd. | Data backup method and data processing system |
US11132260B2 (en) | 2015-09-25 | 2021-09-28 | Huawei Technologies Co., Ltd. | Data processing method and apparatus |
CN106294008A (en) * | 2016-08-05 | 2017-01-04 | 浙江宇视科技有限公司 | A kind of data reconstruction method and device |
CN106294008B (en) * | 2016-08-05 | 2019-06-11 | 浙江宇视科技有限公司 | A kind of data reconstruction method and device |
CN108228752A (en) * | 2017-12-21 | 2018-06-29 | 中国联合网络通信集团有限公司 | Data full dose deriving method, data distribution device and data export node |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20141210 |