CN110287172B

CN110287172B - Method for formatting HBase data

Info

Publication number: CN110287172B
Application number: CN201910588013.8A
Authority: CN
Inventors: Li Ye (李烨)
Original assignee: Sichuan XW Bank Co Ltd
Current assignee: Sichuan XW Bank Co Ltd
Priority date: 2019-07-01
Filing date: 2019-07-01
Publication date: 2023-05-02
Anticipated expiration: 2039-07-01
Also published as: CN110287172A

Abstract

The invention discloses a method for formatting HBase data. It belongs to the field of data formatting and addresses the complicated, time-consuming operation required to format HBase data in the prior art. In the method, all services of the HBase cluster are stopped while the ZooKeeper and Hadoop on which the HBase cluster depends are kept running normally. The root node that stores HBase metadata on ZooKeeper is deleted first, together with all of its child nodes, and then the root directory that stores HBase data on Hadoop is deleted, together with all of its subdirectories. After the deletion, all services of the HBase cluster are started, yielding HBase in its initial state. The method is used to format HBase data quickly.

Description

Method for formatting HBase data

Technical Field

The invention belongs to the field of data formatting and relates to a method for rapidly formatting HBase data.

Background

Data formatting refers to deleting all data and metadata in the system, and restoring the system to an initial state. When the data in the system is no longer useful or the system state is abnormal, the system can be quickly restored to a clean and usable state by performing data formatting.

ZooKeeper: ZooKeeper is an open-source distributed application coordination service and an open-source implementation of Google's Chubby. It is an important component on which Hadoop and HBase depend, and it is currently a top-level project of the Apache community. It provides consistency services for distributed applications, including configuration maintenance, naming service, distributed synchronization and group services.

Hadoop: Hadoop comprises the distributed file system HDFS and the distributed computing framework MapReduce, and is currently a top-level project of the Apache community. Hadoop is highly fault-tolerant, is designed to be deployed on inexpensive hardware, and provides high-throughput access to application data, which makes it suitable for applications with very large data sets.

HBase is a very popular distributed, column-oriented NoSQL database and a top-level open-source project of the Apache community. It is mainly used for massive data storage and for retrieval by fixed conditions under high concurrency. In development and test environments, when the data in HBase is no longer useful or HBase is in an abnormal state, formatting the HBase data quickly yields an HBase in its initial state, i.e., an HBase without any data. HBase depends on ZooKeeper and Hadoop to run: its metadata is stored on ZooKeeper and its data on Hadoop. HBase itself provides no formatting method or tool. No published patent on formatting HBase was found, and no detailed description of a formatting method similar to the one described herein is available on the Internet. An obvious way to achieve the same goal is to uninstall the original HBase cluster, deleting all of its data, metadata, software packages, configuration files and so on, and then build a brand-new HBase cluster, reinstalling the software packages and configuration files on every node. However, this approach is complicated and time-consuming.

Disclosure of Invention

In view of the above problems, the invention provides a method for formatting HBase data that avoids the complicated, time-consuming prior-art approach of uninstalling the original HBase cluster and rebuilding a brand-new one in order to format the HBase data.

In order to achieve the above purpose, the invention adopts the following technical scheme:

A method of formatting HBase data, comprising the following steps:

S1, stopping all services of the HBase cluster while keeping the ZooKeeper and Hadoop on which the HBase cluster depends running normally;

S2, after step S1 is executed, performing the deletion in one of the following orders: first deleting the root node that stores HBase metadata on ZooKeeper, together with all of its child nodes, and then deleting the root directory that stores HBase data on Hadoop, together with all of its subdirectories; or first deleting the root directory on Hadoop and its subdirectories, and then deleting the root node on ZooKeeper and its child nodes; or deleting the root node on ZooKeeper and its child nodes while simultaneously deleting the root directory on Hadoop and its subdirectories;

and S3, after the deletion, starting all services of the HBase cluster to obtain HBase in its initial state.
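Steps S1 to S3 can be sketched as a dry-run command plan. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the standard HBase scripts `stop-hbase.sh` and `start-hbase.sh` are on the PATH, that ZooKeeper is managed outside HBase, that `zkCli.sh` comes from ZooKeeper 3.5 or later (which provides `deleteall`), and that the znode and HDFS paths are already known. The function name `format_plan` and the example host names are hypothetical, and only the first S2 ordering (ZooKeeper before Hadoop) is shown.

```python
from typing import List


def format_plan(zk_server: str, znode_parent: str, hbase_rootdir: str) -> List[List[str]]:
    """Return the S1-S3 commands in execution order without running them (dry run)."""
    return [
        # S1: stop HBase only. Assumes an externally managed ZooKeeper
        # (HBASE_MANAGES_ZK=false); otherwise stop-hbase.sh would stop ZooKeeper too.
        ["stop-hbase.sh"],
        # S2, ZooKeeper first: remove the metadata root znode and everything below it
        # (`deleteall` assumes ZooKeeper 3.5 or later).
        ["zkCli.sh", "-server", zk_server, "deleteall", znode_parent],
        # S2, then HDFS: remove the HBase root directory recursively.
        ["hdfs", "dfs", "-rm", "-r", hbase_rootdir],
        # S3: start HBase again on the now-empty znode and HDFS directory.
        ["start-hbase.sh"],
    ]


if __name__ == "__main__":
    # Example values are hypothetical; substitute the cluster's own settings.
    for cmd in format_plan("zk1:2181", "/hbase", "hdfs://nn:8020/hbase"):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once reviewed; keeping the plan separate from execution makes the destructive S2 commands easy to inspect before anything is deleted.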

Further, in step S2,

the root node that stores HBase metadata on ZooKeeper and all child nodes under it are deleted as follows: the root node is located through the zookeeper.znode.parent tag in the HBase cluster's configuration file hbase-site.xml, and the root node and all of its child nodes are then deleted on ZooKeeper;

the root directory that stores HBase data on Hadoop and all subdirectories under it are deleted as follows: the root directory is located through the hbase.rootdir tag in the HBase cluster's configuration file hbase-site.xml, and the root directory and all of its subdirectories are then deleted on Hadoop.
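Locating the two roots can be sketched with a small XML reader. The patent mentions DOM4J (a Java library); this sketch swaps in Python's standard `xml.etree.ElementTree` instead. The helper names `read_properties` and `locate_hbase_roots` are hypothetical, and the fallback of `/hbase` for `zookeeper.znode.parent` reflects HBase's usual default rather than anything stated in the patent.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def read_properties(xml_text: str) -> Dict[str, str]:
    """Map each <name> in an hbase-site.xml <configuration> to its <value>."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def locate_hbase_roots(xml_text: str) -> Tuple[str, str]:
    """Return (metadata root znode, data root directory) from hbase-site.xml."""
    props = read_properties(xml_text)
    # zookeeper.znode.parent defaults to /hbase when it is not set explicitly.
    znode_parent = props.get("zookeeper.znode.parent", "/hbase")
    # hbase.rootdir has no HDFS default, so refuse to guess a directory to delete.
    rootdir = props.get("hbase.rootdir")
    if not rootdir:
        raise ValueError("hbase.rootdir is not set in hbase-site.xml")
    return znode_parent, rootdir
```

The returned pair is what step S2 hands to the ZooKeeper and HDFS deletion commands. Raising an error instead of falling back to a guessed path for `hbase.rootdir` is a deliberate choice, since the value feeds a recursive delete.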

Further, a processor receives a request to format HBase data, stops all services of the HBase cluster, and keeps the ZooKeeper and Hadoop on which the HBase cluster depends running normally;

then, according to a query-and-delete instruction, the processor calls a query-and-delete program stored in memory and either: first deletes the root node storing HBase metadata on ZooKeeper and all of its child nodes, and then deletes the root directory storing HBase data on Hadoop and all of its subdirectories; or first deletes the root directory on Hadoop and its subdirectories, and then deletes the root node on ZooKeeper and its child nodes; or deletes the root node on ZooKeeper and its child nodes while simultaneously deleting the root directory on Hadoop and its subdirectories;

after the deletion, the processor starts all services of the HBase cluster, yielding HBase in its initial state.
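The processor-side flow, including the choice among the three S2 orderings, might be sketched as follows. The callables `stop`, `delete_zk`, `delete_hdfs` and `start`, the function name `run_format` and the order labels are hypothetical placeholders for whatever stop/start mechanism and deletion programs a deployment actually uses; the concurrent variant uses Python's `ThreadPoolExecutor` as one possible way to run both deletions at the same time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Step = Callable[[], None]


def run_format(stop: Step, delete_zk: Step, delete_hdfs: Step, start: Step,
               order: str = "zk_first") -> None:
    """Run S1-S3, choosing one of the three deletion orders described for S2."""
    if order not in ("zk_first", "hdfs_first", "parallel"):
        raise ValueError(f"unknown order: {order}")
    stop()  # S1: HBase stops; ZooKeeper and Hadoop stay up
    if order == "zk_first":
        delete_zk()
        delete_hdfs()
    elif order == "hdfs_first":
        delete_hdfs()
        delete_zk()
    else:
        # Both deletions run concurrently; result() re-raises any failure,
        # so run_format exits before restarting HBase on a half-deleted state.
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(delete_zk), pool.submit(delete_hdfs)]
            for future in futures:
                future.result()
    start()  # S3: restart HBase, which now begins from its initial state
```

Validating `order` before calling `stop()` avoids taking the cluster down for a request that cannot be completed.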

Compared with the prior art, the invention has the beneficial effects that:

1. By deleting all metadata that the HBase cluster stores on ZooKeeper and all data it stores on Hadoop, the method simplifies the implementation steps, reduces operational complexity, formats HBase data quickly, and optimizes how the computer processes these internal objects.

Drawings

FIG. 1 is a flow chart of the invention in which all metadata stored on ZooKeeper is deleted first and the data stored on Hadoop is deleted afterwards.

Detailed Description

The invention will be further described with reference to the drawings and detailed description.

A method of formatting HBase data, comprising the steps of:

The search and deletion can be performed manually: an operator inspects the HBase cluster configuration file hbase-site.xml, finds the relevant entries by eye, and issues deletion commands according to what was found. Alternatively, a program receives a search instruction, automatically searches the HBase cluster's configuration file hbase-site.xml, and deletes according to the search result.

The program that finds the root node storing HBase metadata on ZooKeeper is an XML parsing program (for example, one that calls a common XML parsing library such as DOM4J): it finds, in hbase-site.xml, the value of the <value></value> element corresponding to the <name>zookeeper.znode.parent</name> element, and the program then performs the search with that value. The program that finds the root directory storing HBase data on Hadoop is likewise an XML parsing program (for example, one using DOM4J): it finds the value of the <value></value> element corresponding to the <name>hbase.rootdir</name> element in hbase-site.xml, and the program then performs the search.

The deletion is performed as follows: nodes on ZooKeeper can be deleted with the zkCli.sh script, with the ZooKeeper Java API for deleting nodes, or with APIs in other languages; the directory on Hadoop can be deleted with the commands hdfs dfs -rm -r <directory> or hadoop fs -rm -r <directory>, both of which ship with Hadoop, or with Hadoop's Java API for deleting directories or APIs in other languages.
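ZooKeeper's basic delete operation refuses to remove a znode that still has children, which is why recursive helpers such as `zkCli.sh deleteall` (or `rmr` in older releases) and the Java client's `ZKUtil.deleteRecursive` exist. The sketch below shows the underlying post-order traversal, in which children are deleted before their parent. `ZnodeClient` is a hypothetical interface, not a real library API, and `InMemoryTree` is a toy stand-in used only to demonstrate the traversal without a live ensemble.

```python
from typing import List, Protocol


class ZnodeClient(Protocol):
    """Hypothetical minimal client interface (not a real library API)."""
    def get_children(self, path: str) -> List[str]: ...
    def delete(self, path: str) -> None: ...


def delete_recursive(client: ZnodeClient, path: str) -> None:
    """Delete `path` and everything below it, children before parents."""
    for child in client.get_children(path):
        delete_recursive(client, path.rstrip("/") + "/" + child)
    client.delete(path)  # safe now: every child has already been removed


class InMemoryTree:
    """Toy stand-in for ZooKeeper: refuses to delete a node that has children."""

    def __init__(self, paths: List[str]) -> None:
        self.nodes = set(paths)

    def get_children(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        # Immediate child names only, e.g. "/hbase/rs/host1" -> "rs" under "/hbase".
        return sorted({p[len(prefix):].split("/")[0] for p in self.nodes if p.startswith(prefix)})

    def delete(self, path: str) -> None:
        if self.get_children(path):
            raise RuntimeError(f"node not empty: {path}")
        self.nodes.discard(path)
```

Against a real ensemble the same traversal would be driven by the chosen client library's children-listing and delete calls. HDFS needs no equivalent loop, because `hdfs dfs -rm -r` already deletes recursively.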

The data stream that implements the formatting is as follows:

the processor receives a request to format HBase data, stops all services of the HBase cluster, and keeps the ZooKeeper and Hadoop on which the HBase cluster depends running normally;

according to a query-and-delete instruction, the processor calls a query-and-delete program stored in memory and either: first deletes the root node storing HBase metadata on ZooKeeper and all of its child nodes, and then deletes the root directory storing HBase data on Hadoop and all of its subdirectories; or first deletes the root directory on Hadoop and its subdirectories, and then deletes the root node on ZooKeeper and its child nodes; or deletes the root node on ZooKeeper and its child nodes while simultaneously deleting the root directory on Hadoop and its subdirectories.

The above are merely representative examples of the many specific applications of the invention and should not be construed as limiting its scope in any way. All technical solutions formed by transformation or equivalent substitution fall within the protection scope of the invention.

Claims

1. A method of formatting HBase data, comprising the steps of:

the root node that stores HBase metadata on ZooKeeper and all child nodes under it are deleted as follows: the root node is located through the zookeeper.znode.parent tag in the HBase cluster's configuration file hbase-site.xml, and the root node and all of its child nodes are then deleted on ZooKeeper; the root directory that stores HBase data on Hadoop and all subdirectories under it are deleted as follows: the root directory is located through the hbase.rootdir tag in the HBase cluster's configuration file hbase-site.xml, and the root directory and all of its subdirectories are then deleted on Hadoop;

2. The method for formatting HBase data according to claim 1, wherein a processor receives a request to format HBase data, stops all services of the HBase cluster, and keeps the ZooKeeper and Hadoop on which the HBase cluster depends running normally; then, according to a query-and-delete instruction, the processor calls a query-and-delete program stored in memory and either: first deletes the root node storing HBase metadata on ZooKeeper and all of its child nodes, and then deletes the root directory storing HBase data on Hadoop and all of its subdirectories; or first deletes the root directory on Hadoop and its subdirectories, and then deletes the root node on ZooKeeper and its child nodes; or deletes the root node on ZooKeeper and its child nodes while simultaneously deleting the root directory on Hadoop and its subdirectories; after the deletion, the processor starts all services of the HBase cluster, yielding HBase in its initial state.