CN110287172A - A method of formatting HBase data

Info

Publication number: CN110287172A
Application number: CN201910588013.8A
Authority: CN
Inventors: 李烨
Original assignee: Sichuan XW Bank Co Ltd
Current assignee: Sichuan XW Bank Co Ltd
Priority date: 2019-07-01
Filing date: 2019-07-01
Publication date: 2019-09-27
Anticipated expiration: 2039-07-01
Also published as: CN110287172B

Abstract

The invention discloses a method for formatting HBase data, belonging to the field of data formatting. It addresses the problem that formatting HBase data in the prior art is cumbersome and time-consuming. The method stops all services of the HBase cluster while keeping the ZooKeeper and Hadoop services that the cluster depends on running normally; on the HBase cluster, it first deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it; after the deletion, it starts all services of the HBase cluster, yielding an HBase in its initial state. The invention is used to format HBase data quickly.

Description

A method of formatting HBase data

Technical field

A method for formatting HBase data, used to format HBase data quickly, belonging to the field of data formatting.

Background technique

Data formatting means deleting all data and metadata in a system so that the system is restored to its initial state. When the data in a system is no longer useful, or the system is in an abnormal state, formatting restores the system to a clean, usable state.

ZooKeeper: ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby, an important component on which Hadoop and HBase depend, and currently a top-level open-source project of the Apache community. It provides consistency services for distributed applications, including configuration maintenance, naming services, distributed synchronization, and group services.

Hadoop: Hadoop comprises the distributed file system HDFS and the distributed computing framework MapReduce, and is currently a top-level project of the Apache community. Hadoop is highly fault-tolerant and designed to be deployed on inexpensive hardware; it provides high-throughput access to application data and suits applications with very large data sets.

HBase is a popular distributed, column-oriented NoSQL database and a top-level open-source project of the Apache community. Its main application scenarios are the storage of massive data and retrieval by fixed conditions under high concurrency. In development and test environments, when the data in HBase is no longer useful or HBase is in an abnormal state, formatting the HBase data quickly yields an HBase in its initial state, that is, an HBase without any data. HBase depends on ZooKeeper and Hadoop: its metadata is stored on ZooKeeper and its data on Hadoop. HBase itself provides no method or tool for formatting; no patent on formatting HBase was found among published patents, and no detailed description of a method similar to the one described here was found on the Internet. One obvious solution that achieves the same goal is to uninstall the original HBase cluster, which requires deleting all HBase data, metadata, software packages, configuration files, and so on, and then to build a completely new HBase cluster (reinstalling the software packages and resetting the configuration files on every node of the cluster). This approach, however, is cumbersome and time-consuming.

Summary of the invention

In view of the problems described above, the purpose of the present invention is to provide a method for formatting HBase data that solves the prior-art problem that formatting HBase data by uninstalling the original HBase cluster and building a completely new one is cumbersome and time-consuming.

To achieve the above purpose, the present invention adopts the following technical solution:

A method for formatting HBase data, comprising the following steps:

S1. Stop all services of the HBase cluster while keeping the ZooKeeper and Hadoop services that the HBase cluster depends on running normally;

S2. After step S1, on the HBase cluster, first delete the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and then delete the root directory that stores HBase data on Hadoop, together with all subdirectories under it; or first delete the root directory that stores HBase data on Hadoop, together with all subdirectories under it, and then delete the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it; or delete the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and at the same time delete the root directory that stores HBase data on Hadoop, together with all subdirectories under it;

S3. After the deletion, start all services of the HBase cluster to obtain an HBase in its initial state.

Further, in step S2:

The root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, is deleted on the HBase cluster as follows: look up the root node that stores HBase metadata on ZooKeeper in the zookeeper.znode.parent property of the HBase cluster's configuration file hbase-site.xml; once it is found, delete that root node and all child nodes under it on ZooKeeper;

The root directory that stores HBase data on Hadoop, together with all subdirectories under it, is deleted on the HBase cluster as follows: look up the root directory that stores HBase data on Hadoop in the hbase.rootdir property of the HBase cluster's configuration file hbase-site.xml; once it is found, delete that root directory and all subdirectories under it on Hadoop.
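The lookup described above can be sketched in a few lines. The example below is only an illustration, assuming Python's standard xml.etree.ElementTree parser in place of any particular XML library; the function name read_hbase_property and the sample configuration (host name, port, paths) are hypothetical and not taken from the invention.

```python
import xml.etree.ElementTree as ET


def read_hbase_property(xml_text, name):
    """Return the <value> paired with <name>name</name>, or None if absent.

    Hypothetical helper: expects the text of an hbase-site.xml file.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative configuration only; the host, port and paths are assumptions.
SAMPLE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:8020/hbase</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>"""

# Root node holding HBase metadata on ZooKeeper.
znode_root = read_hbase_property(SAMPLE, "zookeeper.znode.parent")
# Root directory holding HBase data on Hadoop.
hdfs_root = read_hbase_property(SAMPLE, "hbase.rootdir")
print(znode_root, hdfs_root)
```

The helper returns None when a property is not set in the file, so a caller can decide how to handle a cluster that relies on default values instead of guessing a path to delete.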

Further, a processor receives a request to format HBase data and stops all services of the HBase cluster, while keeping the ZooKeeper and Hadoop services that the HBase cluster depends on running normally;

Then, in response to a lookup-and-delete instruction, the processor invokes the lookup and deletion programs in memory and, on the HBase cluster, first deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and then deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it; or, in response to the lookup-and-delete instruction, the processor invokes the lookup and deletion programs in memory and, on the HBase cluster, first deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it, and then deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it; or, in response to the lookup-and-delete instruction, the processor invokes the lookup and deletion programs in memory and, on the HBase cluster, deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and at the same time deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it;

After the deletion, the processor starts all services of the HBase cluster to obtain an HBase in its initial state.

Compared with the prior art, the beneficial effects of the present invention are as follows:

First, the present invention deletes all metadata that the HBase cluster stores on ZooKeeper and all data that it stores on Hadoop. This simplifies the implementation steps and makes the operation less cumbersome, so that HBase data can be formatted quickly and the computer's handling of its internal objects is optimized.

Brief description of the drawings

Fig. 1 is a flow diagram of the present invention in which all metadata stored on ZooKeeper is deleted first and the data stored on Hadoop is deleted afterwards.

Specific embodiment

The invention is further described below with reference to the drawing and a specific embodiment.

A method for formatting HBase data, comprising the following steps:

The root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, is deleted on the HBase cluster as follows: look up the root node that stores HBase metadata on ZooKeeper in the zookeeper.znode.parent property of the HBase cluster's configuration file hbase-site.xml; once it is found, delete that root node and all child nodes under it on ZooKeeper;

In the lookup and deletion described above, the lookup in the HBase cluster's configuration file hbase-site.xml and the subsequent deletion may be carried out manually, that is, a person inspects the file visually to find the relevant entry and then issues the deletion instruction; or a program, on receiving a lookup instruction, searches hbase-site.xml automatically and deletes according to the result.

The program that looks up the root node storing HBase metadata on ZooKeeper: write an XML parsing program (for example, one that calls a common XML parsing library such as DOM4J) that finds, in hbase-site.xml, the value of the <value>...</value> tag corresponding to the <name>zookeeper.znode.parent</name> tag, then run the program to perform the lookup. The program that looks up the root directory storing HBase data on Hadoop: write an XML parsing program (for example, one that calls a common XML parsing library such as DOM4J) that finds, in hbase-site.xml, the value of the <value>...</value> tag corresponding to the <name>hbase.rootdir</name> tag, then run the program to perform the lookup.

The deletion programs: to delete a node on ZooKeeper, the zkCli.sh script, ZooKeeper's Java API for deleting nodes, or the API of another language may be used; to delete a directory on Hadoop, either of Hadoop's built-in commands hdfs dfs -rm -r <directory> or hadoop fs -rm -r <directory> may be used, or Hadoop's Java API for deleting a directory tree, or the API of another language.
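As a rough illustration of the command-line route to deletion, the sketch below only builds the two commands as argument lists; it does not run them. It assumes that zkCli.sh and hdfs are on the PATH, that the ZooKeeper release provides the recursive deleteall command (ZooKeeper 3.5 and later; older releases offered rmr instead), and that the quorum address and paths are placeholders. The function names and the guard against deleting a bare root path are additions made for this example, not part of the invention.

```python
import shlex
from urllib.parse import urlparse


def _require_non_root(path):
    # Added safety guard (an assumption of this sketch): refuse empty paths
    # and the bare root "/", so a misread config cannot wipe everything.
    if not path or urlparse(path.strip()).path in ("", "/"):
        raise ValueError("refusing to delete root or empty path: %r" % path)
    return path.strip()


def zk_delete_command(zk_quorum, znode_root, zkcli="zkCli.sh"):
    # 'deleteall' removes a znode and all of its children (ZooKeeper 3.5+).
    return [zkcli, "-server", zk_quorum, "deleteall",
            _require_non_root(znode_root)]


def hdfs_delete_command(hdfs_root):
    # 'hdfs dfs -rm -r <directory>' removes the directory tree recursively.
    return ["hdfs", "dfs", "-rm", "-r", _require_non_root(hdfs_root)]


# Placeholder values; in practice they come from hbase-site.xml.
for cmd in (zk_delete_command("zk1:2181", "/hbase"),
            hdfs_delete_command("hdfs://namenode:8020/hbase")):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the paths have been reviewed; keeping command construction separate from execution makes it easy to inspect exactly what will be deleted.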

The data flow of the formatting process is as follows:

A processor receives a request to format HBase data and stops all services of the HBase cluster, while keeping the ZooKeeper and Hadoop services that the HBase cluster depends on running normally;

In response to a lookup-and-delete instruction, the processor invokes the lookup and deletion programs in memory and, on the HBase cluster, first deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and then deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it; or, in response to the lookup-and-delete instruction, the processor invokes the lookup and deletion programs in memory and, on the HBase cluster, first deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it, and then deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it; or, in response to the lookup-and-delete instruction, the processor invokes the lookup and deletion programs in memory and, on the HBase cluster, deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and at the same time deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it;
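The three permitted orderings of this data flow, followed by the restart of step S3, can be sketched as a small driver. This is a minimal sketch under stated assumptions: the four callables (stop, delete_zk, delete_hdfs, start) are hypothetical placeholders supplied by the caller, the order names are invented for the example, and the simultaneous variant is modelled with two threads.

```python
from concurrent.futures import ThreadPoolExecutor

ORDERS = ("zk_first", "hdfs_first", "concurrent")


def format_hbase(stop, delete_zk, delete_hdfs, start, order="zk_first"):
    """Stop HBase, delete metadata and data in the chosen order, restart.

    All four arguments are caller-supplied callables (hypothetical hooks).
    """
    if order not in ORDERS:
        raise ValueError("unknown order: %r" % order)

    stop()  # S1: ZooKeeper and Hadoop are left running
    if order == "zk_first":
        delete_zk()
        delete_hdfs()
    elif order == "hdfs_first":
        delete_hdfs()
        delete_zk()
    else:
        # Run both deletions at the same time and wait for both to finish.
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(delete_zk), pool.submit(delete_hdfs)]
            for future in futures:
                future.result()  # re-raise a failed deletion before restart
    start()  # S3: restart only after both deletions have completed


# Dry run with stand-in actions that only record what would happen.
log = []
format_hbase(lambda: log.append("stop"),
             lambda: log.append("delete znode root"),
             lambda: log.append("delete hdfs root"),
             lambda: log.append("start"),
             order="zk_first")
print(log)
```

In a real deployment the stop and start hooks might wrap HBase's stop-hbase.sh and start-hbase.sh scripts, and the two deletion hooks might wrap the ZooKeeper and Hadoop deletion commands; those bindings are left to the caller here so that the ordering logic can be checked without touching a cluster.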

The above is only a representative embodiment among the many specific applications of the present invention and does not limit its scope of protection in any way. Any technical solution formed by transformation or equivalent substitution falls within the scope of protection of the present invention.

Claims

1. A method for formatting HBase data, characterized by the following steps:

S1. Stop all services of the HBase cluster while keeping the ZooKeeper and Hadoop services that the HBase cluster depends on running normally;

S2. After step S1, on the HBase cluster, first delete the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and then delete the root directory that stores HBase data on Hadoop, together with all subdirectories under it; or first delete the root directory that stores HBase data on Hadoop, together with all subdirectories under it, and then delete the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it; or delete the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and at the same time delete the root directory that stores HBase data on Hadoop, together with all subdirectories under it;

2. The method for formatting HBase data according to claim 1, characterized in that, in step S2:

The root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, is deleted on the HBase cluster as follows: look up the root node that stores HBase metadata on ZooKeeper in the zookeeper.znode.parent property of the HBase cluster's configuration file hbase-site.xml; once it is found, delete that root node and all child nodes under it on ZooKeeper;

The root directory that stores HBase data on Hadoop, together with all subdirectories under it, is deleted on the HBase cluster as follows: look up the root directory that stores HBase data on Hadoop in the hbase.rootdir property of the HBase cluster's configuration file hbase-site.xml; once it is found, delete that root directory and all subdirectories under it on Hadoop.

3. The method for formatting HBase data according to claim 1, characterized in that:

A processor receives a request to format HBase data and stops all services of the HBase cluster, while keeping the ZooKeeper and Hadoop services that the HBase cluster depends on running normally;

Then, in response to a lookup-and-delete instruction, the processor invokes the lookup and deletion programs in memory and, on the HBase cluster, first deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and then deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it; or, in response to the lookup-and-delete instruction, the processor invokes the lookup and deletion programs in memory and, on the HBase cluster, first deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it, and then deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it; or, in response to the lookup-and-delete instruction, the processor invokes the lookup and deletion programs in memory and, on the HBase cluster, deletes the root node that stores HBase metadata on ZooKeeper, together with all child nodes under it, and at the same time deletes the root directory that stores HBase data on Hadoop, together with all subdirectories under it;